Discussion:
BSR EAX,EAX into a c function?
(too old to reply)
LiloLilo
2009-06-30 20:48:16 UTC
Permalink
Hi all!

while I was looking for a fast way to detect the highest set bit
position into a number, I found this:

* Integer Log base 2 *)
function IntLog2 Value: integer ): integer;
asm
BSR EAX, EAX
end;

Till now I was doing this operation this way (C++)

unsigned BitSize(long long Data) { unsigned Size = 0; while (Data)
{ Data >>= 1; Size ++; } return Size; }

I'd like to understand if BSR EAX, EAX can fit into my c function to
speed up things. It can be done?

Thank you all for help!
Rod Pemberton
2009-07-01 20:58:12 UTC
Permalink
Post by LiloLilo
while I was looking for a fast way to detect the highest set bit
* Integer Log base 2 *)
function IntLog2 Value: integer ): integer;
asm
BSR EAX, EAX
end;
Till now I was doing this operation this way (C++)
unsigned BitSize(long long Data) { unsigned Size = 0; while (Data)
{ Data >>= 1; Size ++; } return Size; }
I'd like to understand if BSR EAX, EAX can fit into my c function to
speed up things. It can be done?
Ok. Other than the mention of "BSR EAX,EAX" this is really all C related...
Post by LiloLilo
c function
(C++)
Was that C or C++? Which compiler?
Post by LiloLilo
I'd like to understand if BSR EAX, EAX can fit into my c function
C compilers usually support inline assembly. I'd think C++ compilers do
also. So, yes, it can fit into a C function.
Post by LiloLilo
to speed up things.
You'll have to test it out. Usually, inline assembly in C has two problems.
First, inline assembly usually disables or interferes with code
optimization. Second, inline assembly forces the compiler to generate extra
instructions. All or most of the cpu registers are being used by the C
code. Extra instructions are used to save and restore the registers that
the C code is using which conflict with the register usage of the inline
assembly. So, the only way to know for sure if it'll speed up things is to
code it and test it.
Post by LiloLilo
doing this operation this way
unsigned BitSize(long long Data) { unsigned Size = 0; while (Data)
{ Data >>= 1; Size ++; } return Size; }
First, do you want "unsigned long long Data"?

I'll assume a long long is 64-bits... If the compiler optimizes BitSize
into a small tight loop, it might be very difficult to outperform it. The
addition of a branch from a comparison can slow things down tremendously.
But, you could try some novice things like breaking Data into bytes. At
most, you'd need 8 shifts, but you'd need another loop to find the first
non-zero uppermost byte.

/* untested */
unsigned BitSize(unsigned long long Data)
{ unsigned Size = 56, Idx = 7, Byte;
/* find first non-zero upper most byte */
while (!(Byte=((unsigned char *)&Data)[Idx])&&Idx) {Idx--;Size-=8;}
/* shift Byte, not Data */
while (Byte){ Byte >>= 1; Size ++; }
return Size; }

However, my personal suspicions are that a loop and/or comparison will add
too much... but it is for 64-bits.


Rod Pemberton
LiloLilo
2009-07-02 19:27:33 UTC
Permalink
Was that C or C++? =A0Which compiler?
C++, Visual C++ 2008 Express Edition
First, do you want "unsigned long long Data"?
I'll assume a long long is 64-bits... =A0
Yes 64 bit unsigned

I wasn't able to write a 64 bit working asm function, but I was able
to write a 32 bit one but it isn't so much faster.

unsigned BitSize(int Data) {

__asm {

MOV EAX, Data; BSR EAX, EAX; CMP EAX, 0;

JNG ZERO

ADD EAX, 1;

ZERO:
}
}
Terje Mathisen
2009-07-03 10:04:26 UTC
Permalink
Post by LiloLilo
unsigned BitSize(int Data) {
__asm {
MOV EAX, Data; BSR EAX, EAX; CMP EAX, 0;
JNG ZERO
I think the logic is broken:

EAX will be zero if the original value was either 0 or 1, you want to
separate them, right?
Post by LiloLilo
ADD EAX, 1;
You can try to get rid of the branch with a little trick:

mov ebx,[data]
mov eax,-1
bsr eax,ebx ; Does not modify EAX if EBX was zero!
add eax,1 ; Return 0-32

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Steven Fuerst
2009-07-03 17:31:08 UTC
Permalink
Post by Terje Mathisen
bsr eax,ebx ; Does not modify EAX if EBX was zero!
Are you sure that is always true? In all the documentation I've seen, it
states that EAX is undefined if EBX is zero. However, testing it shows you
are right, at least on the machines I have access to.

Does anyone know of any machines where this statement isn't the case? If
not, then it means implementing a 64bit bsr and bsr in 32bit asm became a
whole lot more efficient. :-)

Steven
Richard Russell
2009-07-03 18:57:43 UTC
Permalink
Does anyone know of any machines where this statement isn't the case? =A0=
If
not, then it means implementing a 64bit bsr and bsr in 32bit asm became a
whole lot more efficient. :-)
Maybe I'm being unduly pedantic here, but surely what matters is not
how current machines behave but how the instruction is specified? If
Intel's and AMD's documentation says EAX is undefined (rather than
unchanged), then one day, on some (maybe future) CPU, under some
condition, it may get altered and you'd have nobody to blame but
yourself!

Richard.
http://www.rtrussell.co.uk/
To reply by email change 'news' to my forename.
Terje Mathisen
2009-07-05 03:06:57 UTC
Permalink
Post by Richard Russell
Does anyone know of any machines where this statement isn't the case? =A0=
If
not, then it means implementing a 64bit bsr and bsr in 32bit asm became a
whole lot more efficient. :-)
Maybe I'm being unduly pedantic here, but surely what matters is not
how current machines behave but how the instruction is specified? If
Intel's and AMD's documentation says EAX is undefined (rather than
unchanged), then one day, on some (maybe future) CPU, under some
condition, it may get altered and you'd have nobody to blame but
yourself!
You are absolutely correct:

mov eax,[data]
mov ebx,-1
bsr eax,eax
cmovz eax,ebx
inc eax

is much safer, and still branchless.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Hendrik van der Heijden
2009-07-02 20:46:57 UTC
Permalink
Post by LiloLilo
Hi all!
while I was looking for a fast way to detect the highest set bit
* Integer Log base 2 *)
function IntLog2 Value: integer ): integer;
asm
BSR EAX, EAX
end;
Till now I was doing this operation this way (C++)
unsigned BitSize(long long Data) { unsigned Size = 0; while (Data)
{ Data >>= 1; Size ++; } return Size; }
I'd like to understand if BSR EAX, EAX can fit into my c function to
speed up things. It can be done?
We just had that.

MS Visual Studio has the _BitScanForward intrinsic for 32 bits,
on x86_64 there's also _BitScanForward64. The gcc equivalents
(except for 0-argument handling) are __builtin_clz and __builtin_clzl.
These intrinsics compile to the bsf CPU instruction.

If you need a 64bit bsf on x86_32, check whether the high 32bits
are zero and then bsf the lower or upper half accordingly.


Hendrik vdH
Loading...