I need to decode x86 machine language for control flow instructions

Post by olcott
I need to know ALL of the numerical values for every aspect of x86
control flow machine language bytes so that I can fully decode all of
these bytes.

There are also the INT* and IRET* instructions. Later CPUs also have
fast system-call instructions (SYSENTER, SYSCALL) for reaching the core
OS functions.

Post by olcott
I posted some good online documentation (see below), yet some of the
numerical values are not listed in this documentation.
I don't understand exactly which numeric values I need to look for with
cb, cw, cd, cp, iw.
I don't know what these mean: /2, /3, /4, /5
Jump if Condition Is Met
https://c9x.me/x86/html/file_module_x86_id_146.html
Jump
https://c9x.me/x86/html/file_module_x86_id_147.html
Call Procedure
https://c9x.me/x86/html/file_module_x86_id_26.html
Return from Procedure
https://c9x.me/x86/html/file_module_x86_id_280.html

x86 encodes most of the opcode in the primary opcode byte, and for some
opcodes a few additional opcode bits in the Reg field of the Mod/Reg/RM
byte.

If you search online for an older IA-32 manual set (Pentium or 486)
you'll find a small manual that will get you the flow control
instructions and describe how the x86 opcode decoding unit operates.

https://www.cs.cmu.edu/~410/doc/intel-isr.pdf

Specifically, see Volume 2: Instruction Set Reference, on page 31. You
can see samples of the Intel instruction encoding syntax on page 42 and
in the CMC example.

--
Rick C. Hodgin

olcott

2020-11-25 05:40:43 UTC

That was very helpful; I almost have what I need. I found a small
inconsistency:

The text seems to indicate that the SIB byte always follows a ModR/M byte:

Certain encodings of the ModR/M byte require a second
addressing byte, the SIB byte,

Yet the instructions below only match Table 2-3 and do not seem to have
a ModR/M byte:

                01 010 101 55
                00 010 101 15
                11 010 001 D1

[0000057a](03) ff5508              call dword [ebp+08]
[00000589](06) ff150b020000        call dword [0000020b]
[0000059d](02) ffd1                call ecx

https://www.cs.cmu.edu/~410/doc/intel-isr.pdf
Table 2-3. 32-Bit Addressing Forms with the SIB Byte (Page 2-7)

--
Copyright 2020 Pete Olcott

"Great spirits have always encountered violent opposition from mediocre
minds." Einstein

Rick C. Hodgin

2020-11-25 13:30:34 UTC

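Whether a SIB byte can appear depends on the addressing mode in effect
(set by the CPU mode or by an address-size override prefix), and its
presence is signaled by a special encoding in the Mod/Reg/RM byte: with
32-bit addressing, RM = 100 together with any Mod other than 11.
See page 36 and the reference to the encoding in the Mod/Reg/RM byte.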

x86 is a little hairy to decode. But, it does follow rules. If you get
a handle on how they work, you can decode anything.

Prefix bytes override defaults for the current cpu mode
Opcode bytes drive the base instruction
Mod/Reg/RM bytes convey follow-on information, including signaling if a
SIB follows
SIB is optional
Offset data is optional
Immediate data is optional

An instruction can be at most 15 bytes long; anything longer raises a
fault. And when you get into AMD64 mode, there are REX prefixes, which
take over some opcodes from their 32-bit meanings and add extra bits
that widen the Reg and RM fields to 4 bits (16 registers instead of 8).
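
As a small illustration of the Mod/Reg/RM layout described above, here
is a minimal Python sketch that splits one byte into its three fields.
The function name split_modrm is made up for this example; the three
sample bytes are the ones from the call instructions quoted earlier in
the thread.

```python
def split_modrm(byte):
    """Split a Mod/Reg/RM byte into its (mod, reg, rm) bit fields."""
    mod = (byte >> 6) & 0b11    # top two bits: addressing mode
    reg = (byte >> 3) & 0b111   # middle three bits: register or opcode extension
    rm = byte & 0b111           # low three bits: register or memory operand
    return mod, reg, rm


# The second byte of each of the three call instructions in question.
for b in (0x55, 0x15, 0xD1):
    mod, reg, rm = split_modrm(b)
    print(f"{b:02X}: mod={mod:02b} reg={reg:03b} rm={rm:03b}")
```

All three bytes come out with reg = 010, which is consistent with them
being Mod/Reg/RM bytes that follow the FF opcode rather than SIB bytes.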

--
Rick C. Hodgin

olcott

2020-11-25 14:41:23 UTC

My issue is that the handbook seems to say that you can't have a SIB
byte unless you have a ModR/M byte preceding it. The three call
instructions listed above seem to have a SIB byte (55, 15, D1) without a
ModR/M byte preceding it. How can this be? Is the handbook wrong?

Rick C. Hodgin

2020-11-25 15:02:04 UTC

Post by olcott
Yet the instructions below only match Table 2-3 and do not seem to
have a MODR/M BYTE
                 01 010 101 55
                 00 010 101 15
                 11 010 001 D1
[0000057a](03) ff5508              call dword [ebp+08]
[00000589](06) ff150b020000        call dword [0000020b]
[0000059d](02) ffd1                call ecx
https://www.cs.cmu.edu/~410/doc/intel-isr.pdf
Table 2-3. 32-Bit Addressing Forms with the SIB Byte (Page 2-7)

Look at the encoding for the call instruction with the 0xff format on
page 93. It shows that there is a Mod/Reg/RM byte in byte two, followed
by displacement data in the first two instructions and by no other data
in the call ecx instruction.

--
Rick C. Hodgin

olcott

2020-11-25 15:34:05 UTC

My issue is that the handbook seems to say that you can't have a SIB
byte unless you have a MODR/M byte preceding it. The three call
instruction listed above do have an SIB byte (55,15,D1) without a
MODR/M byte preceding it. How can this be, is the handbook wrong?

That is really weird.
Last night this table was giving me the wrong decode values for the
above call instructions and this morning they are correct:
Table 2-2. 32-Bit Addressing Forms with the ModR/M Byte

olcott

2020-11-25 22:18:43 UTC

My issue is that the handbook seems to say that you can't have a SIB
byte unless you have a MODR/M byte preceding it. The three call
instruction listed above do have an SIB byte (55,15,D1) without a
MODR/M byte preceding it. How can this be, is the handbook wrong?

Look at the encoding for the call instruction with the 0xff format on
page 93. It shows that there is a Mod/Reg/RM byte there in byte two,

I can't see how it says that. I don't understand the codes.

Post by Rick C. Hodgin
and then the immediate data afterwards in the first two, and no other
data following in the call ecx instruction.

Andrew Cooper

2020-11-25 15:07:22 UTC

A ModRM byte is optional, determined by the opcode.
A SIB byte is optional, determined by the ModRM encoding.

SIB is generally only used for the more complicated memory addressing
modes. Simple instructions tend not to use them.
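
To make the "determined by the ModRM encoding" rule concrete, here is a
minimal Python sketch assuming 32-bit addressing with no address-size
prefix. The function name modrm_tail is invented for this example. It
looks only at the ModRM byte, so it does not cover the one case where
the SIB byte itself adds a 4-byte displacement (Mod = 00 with a SIB base
field of 101).

```python
def modrm_tail(modrm):
    """Return (has_sib, disp_size) for a ModRM byte, 32-bit addressing assumed."""
    mod = (modrm >> 6) & 0b11
    rm = modrm & 0b111
    if mod == 0b11:
        # Register operand: nothing follows the ModRM byte.
        return False, 0
    has_sib = rm == 0b100          # rm=100 is the "SIB follows" escape
    if mod == 0b01:
        disp_size = 1              # 8-bit displacement
    elif mod == 0b10:
        disp_size = 4              # 32-bit displacement
    elif rm == 0b101:
        disp_size = 4              # mod=00, rm=101: bare 32-bit address
    else:
        disp_size = 0              # mod=00: plain register-indirect
    return has_sib, disp_size


for b in (0xD1, 0x55, 0x15, 0x14, 0x54):
    print(f"{b:02X} ->", modrm_tail(b))
```

Whether to expect a ModRM byte at all still has to come from an opcode
table; this sketch only answers what comes after one.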

~Andrew

olcott

2020-11-25 20:33:31 UTC

Post by Andrew Cooper

A ModRM byte is optional, determined by the opcode.
A SIB byte is optional, determined by the ModRM encoding.
SIB is generally only used for the more complicated memory addressing
modes. Simple instructions tend not to use them.
~Andrew

I can't understand the specified notational conventions well enough to
know whether to look for a ModRM byte, a SIB byte, both, or neither.

https://www.cs.cmu.edu/~410/doc/intel-isr.pdf
We can use CALL on page 3-53 (PDF page 93) as one concrete example.

Andrew Cooper

2020-11-25 21:18:33 UTC

Post by olcott
I can't understand the specified notational conventions so that I know
to look for a ModRM byte a SIB byte both or neither.
https://www.cs.cmu.edu/~410/doc/intel-isr.pdf
We can use page 3-53 pdf(93) CALL for one of concrete examples.

(As a tangent, that is an exceedingly obsolete version of the spec. You
can obtain up-to-date ones from https://intel.com/sdm/ or
https://developer.amd.com/resources/developer-guides-manuals/ but it
doesn't matter for this specific purpose.)

As for `CALL indirect`, the general form looks like this:

FF <modrm>

There is *always* a ModRM byte. You know this because you know the
opcode FF always has one.

From the examples given previously:

FF D1 - CALL ecx
The ModRM byte is D1, encoding just ecx.

FF 55 08 - CALL DWORD [ebp + 8]
The ModRM byte is 55, encoding ebp, and also that a single-byte
displacement (08) follows.

FF 15 0b020000 - CALL DWORD [0000020b]
The ModRM byte is 15, encoding a memory address composed only of a
4-byte displacement.

If you want to start getting into SIB, then you want the more
complicated addressing modes, such as:

FF 14 03 - CALL DWORD [ebx + eax]
The ModRM byte is 14 (it doesn't encode any registers itself, but does
signal the presence of a SIB byte), and the SIB byte is 03, encoding a
base of ebx, an index of eax, and a scale of 1.

In all cases, the value of the ModRM byte controls what follows. It
might be nothing, or it might be a displacement (1 or 4 bytes with
32-bit addressing, 1 or 2 bytes with 16-bit addressing), and/or there
might be a SIB byte.

An example using both might be:

FF 54 03 08 - CALL DWORD [ebx + eax + 8]
The ModRM byte is 54, encoding both a SIB byte (03) and a 1-byte
displacement (08) to follow.
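
The five examples above can be checked with a short Python sketch. It
assumes 32-bit operand and address sizes and no prefix bytes, and it
handles only FF /2; the names REGS and decode_call_ff2 are invented for
illustration and do not come from any manual or library.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_call_ff2(code):
    """Decode one FF /2 (near indirect CALL); return (text, length in bytes)."""
    if code[0] != 0xFF:
        raise ValueError("expected opcode FF")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if reg != 2:
        raise ValueError("reg field is not /2")
    pos = 2
    if mod == 3:                       # register operand, nothing follows
        return f"call {REGS[rm]}", pos

    regs = []
    disp_size = {0: 0, 1: 1, 2: 4}[mod]
    if rm == 4:                        # rm=100: a SIB byte follows
        sib = code[pos]
        pos += 1
        scale, index, base = 1 << (sib >> 6), (sib >> 3) & 7, sib & 7
        if mod == 0 and base == 5:     # no base register, disp32 instead
            disp_size = 4
        else:
            regs.append(REGS[base])
        if index != 4:                 # index=100 means "no index register"
            regs.append(REGS[index] if scale == 1 else f"{REGS[index]}*{scale}")
    elif mod == 0 and rm == 5:         # bare 32-bit address, no register
        disp_size = 4
    else:
        regs.append(REGS[rm])

    expr = "+".join(regs)
    if disp_size:
        raw = bytes(code[pos:pos + disp_size])
        pos += disp_size
        if regs:                       # signed displacement added to registers
            disp = int.from_bytes(raw, "little", signed=True)
            expr += f"{'-' if disp < 0 else '+'}0x{abs(disp):x}"
        else:                          # absolute address
            expr = f"0x{int.from_bytes(raw, 'little'):08x}"
    return f"call dword [{expr}]", pos


for hexstr in ("ffd1", "ff5508", "ff150b020000", "ff1403", "ff540308"):
    print(hexstr, "->", decode_call_ff2(bytes.fromhex(hexstr)))
```

The returned length matters as much as the text: a decoder that walks a
code stream needs it to find where the next instruction starts.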

~Andrew

Andrew Cooper

2020-11-25 21:19:15 UTC

olcott

2020-11-25 22:14:27 UTC

I need to know how to decode this so that I can tell whether to look
for a ModRM byte, a SIB byte, both, or neither, for all instructions.

FF /2 CALL r/m16 Call near, absolute indirect, address given in r/m16
FF /2 CALL r/m32 Call near, absolute indirect, address given in r/m32

Is the /2 somehow supposed to tell us this?

Rick C. Hodgin

2020-11-25 23:13:35 UTC

Post by olcott
I need to know how to decode this to understand that it whether or not
to look for a ModRM byte a SIB byte both or neither for all instructions.
FF /2 CALL r/m16 Call near, absolute indirect, address given in r/m16
FF /2 CALL r/m32 Call near, absolute indirect, address given in r/m32
Is the /2 somehow supposed to tell us this?

It's part of the opcode. The opcode overflows from the 8 bits of the
first byte into the 3 Reg bits of the Mod/Reg/RM byte.

On page 93 it shows two separate encodings for the 0xff opcode. The
first is the /2, and the second is the /3. That means you'll find the
bit pattern 010 (for /2) or 011 (for /3) in the Reg bits.

If you look on page 42, it says: "/digit - A digit between 0 and 7
indicates that the Mod/Reg/RM byte of the instruction uses only the RM
(register or memory) operand. The Reg field contains the digit that
provides an extension to the instruction's opcode."

In cases where you see the "/r" encoding, that means the Reg field
actually does contain a register. This would be for two-operand
instructions, like "mov eax,ebx". In that case, it would use both the
Reg and RM fields to indicate the two registers. In the case of "call
ecx" you're only using one register, so the Reg bits are freed up, and
the x86 designers decided to use those bits to allow for additional
encodings.

For the CALL instruction, they added /2 and /3, which yields two
completely different call operations, such as call r/m32, and call
m16:16, or call m16:32 depending on which mode you're in, either
natively, or due to override prefixes.

Fast-forward to page 186, and you see the DEC instruction uses the 0xff
/1 encoding, meaning the same 0xff opcode, but the /1 indicates it's not
a CALL instruction, but rather a DEC instruction.
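
As a rough sketch of how the /digit selects the operation, here is a
Python lookup for opcode 0xff in 32-bit mode. The names FF_GROUP and
ff_operation are invented for this example; the table contents follow
the /0 through /6 entries that the instruction reference lists for 0xff.

```python
# Reg field of the Mod/Reg/RM byte -> operation, for primary opcode 0xFF.
FF_GROUP = {
    0: "inc r/m",
    1: "dec r/m",
    2: "call r/m (near, indirect)",
    3: "call m16:16 or m16:32 (far, indirect)",
    4: "jmp r/m (near, indirect)",
    5: "jmp m16:16 or m16:32 (far, indirect)",
    6: "push r/m",
}


def ff_operation(modrm):
    """Return the operation selected by the Reg field for opcode 0xFF."""
    reg = (modrm >> 3) & 0b111
    return FF_GROUP.get(reg, "undefined")  # /7 has no assigned operation


for b in (0xD1, 0x55, 0x15, 0x0D, 0xE0):
    print(f"FF {b:02X} ->", ff_operation(b))
```

So a decoder cannot tell from the 0xff byte alone whether it is looking
at control flow; it has to read the Reg bits of the next byte first.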

Make sense?

--
Rick C. Hodgin

Rick C. Hodgin

2020-11-25 23:17:11 UTC

Post by Andrew Cooper

Post by Rick C. Hodgin
https://www.cs.cmu.edu/~410/doc/intel-isr.pdf
We can use page 3-53 pdf(93) CALL for one of concrete examples.

I gave him that reference because it's an old and simple x86 ISA manual.
The newer ones have that information as well, but they're also
cluttered with 64-bit support, new instructions, different operating
modes, etc.

When you're learning, I find it easiest to start with something simple.
In fact, the MASM 6.x Reference manual that came with MASM 6.1 when I
bought it was my introduction to 80386 programming. I wrote my first
assemblers, debuggers, and compilers there. I have different colored
highlighter markers all the way through it for where I would code and
test and validate the various instructions.

I just thought it would be easier to start out with something simpler,
like what existed back in the 90s before 64-bit, before virtualization,
before extended ISAs (beyond early SIMD anyway).

--
Rick C. Hodgin

wolfgang kern

2020-11-26 13:48:01 UTC