Use of SHR and SHL instructions

Discussion:

Hac4u

2018-04-25 00:27:12 UTC

What is the purpose of using the SHR and SHL instructions?

For example, I have the following instructions:

0x0804854c <+12>: mov eax,0x0
0x08048551 <+17>: add eax,0xf
0x08048554 <+20>: add eax,0xf
0x08048557 <+23>: shr eax,0x4
0x0804855a <+26>: shl eax,0x4
=> 0x0804855d <+29>: sub esp,eax

Why were shr and shl used?

I am debugging using GDB and I am not using any optimization switch.

Robert Wessel

2018-04-25 01:51:33 UTC

On Tue, 24 Apr 2018 17:27:12 -0700 (PDT), Hac4u

Post by Hac4u
What is the purpose of using the SHR and SHL instructions?
For example, I have the following instructions:
0x0804854c <+12>: mov eax,0x0
0x08048551 <+17>: add eax,0xf
0x08048554 <+20>: add eax,0xf
0x08048557 <+23>: shr eax,0x4
0x0804855a <+26>: shl eax,0x4
=> 0x0804855d <+29>: sub esp,eax
Why were shr and shl used?
I am debugging using GDB and I am not using any optimization switch.

This appears to be the entry to a routine, allocating some space on
the stack. It's rather bad code because of the lack of optimization,
but...

The first two instructions appear to set the space requirements on the
stack (15 bytes).

The two shifts are to mask off the bottom four bits, IOW, to truncate
it to a multiple of 16 bytes (to maintain stack alignment). The right
shift shifts the four bottom bits off the right, the left shift shifts
four zero bits back in.

The third instruction and the truncation from the two shifts combine
to round the space requirement *up*: 15 + 15 = 30, which truncates to
16, so the required number of 16-byte chunks (one, in this case) to
hold the required locals (15 bytes) is computed.

The last instruction actually allocates the space on the stack.
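
As a rough check of the walk-through above, the register arithmetic can be replayed in C. This is only a sketch: trace_alloc_size is a hypothetical helper name, and it assumes eax behaves as a 32-bit unsigned value.

```c
#include <stdint.h>

/* Replays the disassembled sequence on a 32-bit value standing in for eax.
   trace_alloc_size is a hypothetical name used only for illustration. */
uint32_t trace_alloc_size(void)
{
    uint32_t eax = 0x0;  /* mov eax,0x0 */
    eax += 0xf;          /* add eax,0xf  -> 15 */
    eax += 0xf;          /* add eax,0xf  -> 30 */
    eax >>= 4;           /* shr eax,0x4  -> 1  (low four bits dropped) */
    eax <<= 4;           /* shl eax,0x4  -> 16 (four zero bits shifted in) */
    return eax;          /* value consumed by sub esp,eax */
}
```

Calling trace_alloc_size() yields 16, i.e. one 16-byte chunk is subtracted from esp.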

At even -O1, I'd expect most of that to be collapsed to one or two
instructions, perhaps as little as a "sub esp,16", or even omitted
entirely if the optimization can eliminate the need for a stack
frame.

Rod Pemberton

2018-04-25 02:17:54 UTC

On Tue, 24 Apr 2018 20:51:33 -0500

Post by Robert Wessel
On Tue, 24 Apr 2018 17:27:12 -0700 (PDT), Hac4u

Post by Hac4u
What is the purpose of using the SHR and SHL instructions?
0x08048557 <+23>: shr eax,0x4
0x0804855a <+26>: shl eax,0x4
=> 0x0804855d <+29>: sub esp,eax
Why were shr and shl used?

The two shifts are to mask off the bottom four bits, IOW, to truncate
it to a multiple of 16 bytes (to maintain stack alignment). The right
shift shifts the four bottom bits off the right, the left shift shifts
four zero bits back in.

While that is the effect, the code seems bizarre, even if it is
expected to be optimized away. Was there some reason to use SHR and
SHL instead of AND with a mask? E.g., need to keep CF clear for
possible SBB instead of SUB? E.g., code works for 32-bit and 64-bit
unlike a fixed size mask?
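
For the 32-bit case in the listing, the shift pair and an AND mask produce the same value. A minimal C sketch of that equivalence, assuming the register is treated as an unsigned integer (the function names are illustrative only, and the 64-bit variant is just one way a width-independent mask could be written):

```c
#include <stdint.h>

/* Equivalent of: shr eax,4 ; shl eax,4 */
uint32_t clear_low4_shift(uint32_t x) { return (x >> 4) << 4; }

/* Equivalent of: and eax,0xfffffff0 */
uint32_t clear_low4_mask(uint32_t x) { return x & 0xfffffff0u; }

/* The same mask built from ~15 at the operand's own width (64-bit here). */
uint64_t clear_low4_mask64(uint64_t x) { return x & ~(uint64_t)15; }
```

Both 32-bit helpers map 30 to 16, matching the value that ends up in eax in the original sequence.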

Rod Pemberton

--
I believe in the right to life. That's why I oppose gun control.

Robert Wessel

2018-04-25 02:52:58 UTC

On Tue, 24 Apr 2018 22:17:54 -0400, Rod Pemberton

Post by Rod Pemberton
On Tue, 24 Apr 2018 20:51:33 -0500

Post by Robert Wessel
On Tue, 24 Apr 2018 17:27:12 -0700 (PDT), Hac4u

Post by Hac4u
What is the purpose of using the SHR and SHL instructions?
0x08048557 <+23>: shr eax,0x4
0x0804855a <+26>: shl eax,0x4
=> 0x0804855d <+29>: sub esp,eax
Why were shr and shl used?

It's not code size: the operand for an AND with 0xfffffff0 gets
sign-extended from one byte, so that's actually a shorter sequence.
I wonder if somewhere in GCC's architecture definition for the
platform the required rounding is defined as "((n+15)/16)*16". The
code would be a literal translation of that.
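
To illustrate both points, here is a small C sketch. The helper names are made up for illustration, and the actual GCC definition is not shown in this thread. It checks that an 8-bit immediate 0xf0 sign-extends to 0xfffffff0, and that the ((n+15)/16)*16 rounding matches the add-then-shift form for unsigned values. For reference, each shift in the listing occupies three bytes (0x08048557, 0x0804855a, 0x0804855d).

```c
#include <stdint.h>

/* Sign-extend an 8-bit immediate to 32 bits, as the short AND form does. */
uint32_t sign_extend_imm8(uint8_t imm)
{
    return (imm & 0x80u) ? (0xffffff00u | imm) : imm;
}

/* Literal form of the rounding expression quoted above. */
uint32_t round_up16_divmul(uint32_t n) { return ((n + 15) / 16) * 16; }

/* The same rounding written as add 15, shr 4, shl 4. */
uint32_t round_up16_shift(uint32_t n) { return ((n + 15) >> 4) << 4; }
```

With n = 15, both rounding helpers return 16, which is the amount subtracted from esp in the original code.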

OTOH, you really can't infer too much from GCC at -O0. It generates
really bad code. After all, he just used five instructions to
generate a constant he could have trivially calculated at compile
time.
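
As a sketch of that last point: the whole computation involves only constants, so a C compiler can fold it at compile time. The names below are illustrative and assume the 15-byte requirement inferred earlier in the thread.

```c
/* All operands are constants, so this is an integer constant expression. */
enum {
    LOCALS_BYTES = 15,                              /* assumed space requirement */
    ALLOC_BYTES  = ((LOCALS_BYTES + 15) >> 4) << 4  /* rounded up to 16 */
};

/* Compile-time check (C11): the build fails if the folded value is not 16. */
_Static_assert(ALLOC_BYTES == 16,
               "15 bytes of locals round up to one 16-byte chunk");
```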

Bartc

2018-04-25 11:23:44 UTC

Post by Robert Wessel
It's not code size: the operand for an AND with 0xfffffff0 gets
sign-extended from one byte, so that's actually a shorter sequence.
I wonder if somewhere in GCC's architecture definition for the
platform the required rounding is defined as "((n+15)/16)*16". The
code would be a literal translation of that.
OTOH, you really can't infer too much from GCC at -O0. It generates
really bad code.

No, really bad code would have used DIV and MUL (I'm an expert on that).

--
bartc