Discussion:
help spot the error in this floating point code
(too old to reply)
Bin Xin
2008-03-18 22:26:40 UTC
Permalink
Dear all,

I would appreciate any help in spotting what might be wrong in the
code snippet (attached at the end). A little bit of background: for
reasons not particularly relevant here, I cannot use C math library as
it is, so I have to supply my own implementation of a few math
functions that are used in my code. I searched around and eventually
enclose the following code in my C program (taken from the dietlibc
code base, with some change). The goal is to replace calls to 'log/
pow' with my version 'my_log/my_pow'.

It's working to some extent. For example the following code:
printf("%.3f\n", my_pow(2.0, 35.3));

//... some other code

printf("%.3f\n", my_pow(2.0, 35.3));
<<<
outputs:

42301799935.756
2.000

In other words, the first invocation get the right results, but not
the second one. My intuition is that both log and pow are pure
function, so this should not happen. But then again, I have limited
experience in FP coding at assembly level, so there might be some
thing I overlooked. Like maybe the special ways the FP registers or
FP status register have to be manipulated.

Thanks ahead for any comments.

-bx

Begin code snippet:
======================
double my_log(double);
double my_pow(double x, double y);

asm(".text\n"
".global my_log,my_pow,__finexp\n"
".type my_log,@function\n"
".type my_pow,@function\n"
".type __finexp,@function\n"
"my_log:\n"
"fldln2\n"
"fldl 4(%esp)\n"
"fyl2x\n"
"ret\n"
#ifndef __DYN_LIB
"__finexp:\n"
#endif
".Lfinexp:\n"
"fst %st(1)\n"
"frndint\n"
"fst %st(2)\n"
"fsubrp\n"
"f2xm1\n"
"fld1\n"
"faddp\n"
"fscale\n"
"ret\n"
#ifdef __DYN_LIB
"__finexp:\n"
"PIC_RESTORE\n"
"jmp .Lfinexp\n"
#endif
"my_pow:\n"
"fldl 4(%esp)\n"
"fldl 12(%esp)\n"
".L__pow:\n"
"ftst\n"
"fstsw %ax\n"
"fld1\n"
"sahf\n"
"jz 1f\n"
"fcomp %st(1)\n"
"fstsw %ax\n"
"fxch\n"
"sahf\n"
"jz 1f\n"
"ftst\n"
"fstsw %ax\n"
"sahf\n"
"jz 1f\n"
"jnc .Lfinpow\n"
"fxch\n"
"fld %st(0)\n"
"frndint\n"
"fcomp %st(1)\n"
"fstsw %ax\n"
"fxch\n"
"sahf\n"
"jnz .Lfinpow\n"
"fld1\n"
"fadd %st(0)\n"
"fdivr %st(2),%st(0)\n"
"frndint\n"
"fadd %st(0),%st(0)\n"
"fcomp %st(2)\n"
"fstsw %ax\n"
"fchs\n"
"sahf\n"
"jz .Lfinpow\n"
"call .Lfinpow\n"
"fchs\n"
"1: ret\n"
".Lfinpow:\n"
"fyl2x\n"
#ifdef __DYN_LIB
"PIC_SAVE\n"
"PIC_INIT\n"
"jmp ***@PLT\n"
#else
"jmp __finexp\n"
#endif
".Lende:\n"
".size my_log,.Lende-my_log\n"
".size my_pow,.Lende-my_pow\n"
".size __finexp,.Lende-__finexp\n"
);
======================
End code snippet.
Tim Roberts
2008-03-19 05:13:52 UTC
Permalink
Post by Bin Xin
...
printf("%.3f\n", my_pow(2.0, 35.3));
//... some other code
printf("%.3f\n", my_pow(2.0, 35.3));
<<<
42301799935.756
2.000
In other words, the first invocation get the right results, but not
the second one. My intuition is that both log and pow are pure
function, so this should not happen. But then again, I have limited
experience in FP coding at assembly level, so there might be some
thing I overlooked. Like maybe the special ways the FP registers or
FP status register have to be manipulated.
I admit that I lost track part way through, but it looks to me like you
aren't cleaning up the floating point stack when you are done. If you
leave stuff on the stack, sooner or later the stack will overflow (there
are only 8 entries, after all), which triggers floating point exceptions.
--
Tim Roberts, ***@probo.com
Providenza & Boekelheide, Inc.
Terje Mathisen
2008-03-19 13:02:16 UTC
Permalink
Post by Tim Roberts
Post by Bin Xin
printf("%.3f\n", my_pow(2.0, 35.3));
<<<
42301799935.756
2.000
In other words, the first invocation get the right results, but not
the second one. My intuition is that both log and pow are pure
[snip]
Post by Tim Roberts
I admit that I lost track part way through, but it looks to me like you
aren't cleaning up the floating point stack when you are done. If you
leave stuff on the stack, sooner or later the stack will overflow (there
are only 8 entries, after all), which triggers floating point exceptions.
Tim is almost certainly right!

The easy way to check this is to call your my_pow() function in a loop,
preferably while inside the debugger, and watch the x87 stack contents.

Anyway, if you're going to get rid of pow/log, do you still need full
double precision results?

You can do lots of fun stuff with polynomial approximations. :-)

For a sw DX9/DX10 implementation I came up with ways to do this with
sufficient precision that were one or two orders of magnitude faster
than the x87 hw stuff.

Terje
--
- <***@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
Bin Xin
2008-04-16 22:23:00 UTC
Permalink
Post by Terje Mathisen
Post by Tim Roberts
I admit that I lost track part way through, but it looks to me like you
aren't cleaning up thefloatingpointstack when you are done. If you
leave stuff on the stack, sooner or later the stack will overflow (there
are only 8 entries, after all), which triggersfloatingpointexceptions.
The easy way to check this is to call your my_pow() function in a loop,
preferably while inside the debugger, and watch the x87 stack contents.
Thanks for replying.

So I try to follow the stack state after each call to my_pow, it
looks ok for a few calls (9), then I see a weird thing: when "fld1" is
executed at some point, it loads a "nan" instead of "1" into st0:

(gdb) ni
0x08048397 270
8: $st7 = 15
7: $st6 = 14
6: $st5 = 13
5: $st4 = 12
4: $st3 = 2
3: $st2 = 0
2: $st1 = 2
1: $st0 = 0
(gdb) ni
0x08048399 270
8: $st7 = 14
7: $st6 = 13
6: $st5 = 12
5: $st4 = 2
4: $st3 = 0
3: $st2 = 2
2: $st1 = 0
1: $st0 = -nan(0xc000000000000000)

So what am I missing here? I have attached the disassembled code at
the end for my_pow.

Thanks,
-Bin




my_pow:
804838a: dd 44 24 04 fldl 0x4(%esp)
804838e: dd 44 24 0c fldl 0xc(%esp)
8048392: d9 e4 ftst
8048394: 9b df e0 fstsw %ax
8048397: d9 e8 fld1
8048399: 9e sahf
804839a: 74 3f je 80483db <my_pow+0x51>
804839c: d8 d9 fcomp %st(1)
804839e: 9b df e0 fstsw %ax
80483a1: d9 c9 fxch %st(1)
80483a3: 9e sahf
80483a4: 74 35 je 80483db <my_pow+0x51>
80483a6: d9 e4 ftst
80483a8: 9b df e0 fstsw %ax
80483ab: 9e sahf
80483ac: 74 2d je 80483db <my_pow+0x51>
80483ae: 73 2c jae 80483dc <my_pow+0x52>
80483b0: d9 c9 fxch %st(1)
80483b2: d9 c0 fld %st(0)
80483b4: d9 fc frndint
80483b6: d8 d9 fcomp %st(1)
80483b8: 9b df e0 fstsw %ax
80483bb: d9 c9 fxch %st(1)
80483bd: 9e sahf
80483be: 75 1c jne 80483dc <my_pow+0x52>
80483c0: d9 e8 fld1
80483c2: d8 c0 fadd %st(0),%st
80483c4: d8 fa fdivr %st(2),%st
80483c6: d9 fc frndint
80483c8: d8 c0 fadd %st(0),%st
80483ca: d8 da fcomp %st(2)
80483cc: 9b df e0 fstsw %ax
80483cf: d9 e0 fchs
80483d1: 9e sahf
80483d2: 74 08 je 80483dc <my_pow+0x52>
80483d4: e8 03 00 00 00 call 80483dc <my_pow+0x52>
80483d9: d9 e0 fchs
80483db: c3 ret
80483dc: d9 f1 fyl2x
80483de: e9 96 ff ff ff jmp 8048379 <__finexp>


08048379 <__finexp>:
8048379: dd d1 fst %st(1)
804837b: d9 fc frndint
804837d: dd d2 fst %st(2)
804837f: de e9 fsubrp %st,%st(1)
8048381: d9 f0 f2xm1
8048383: d9 e8 fld1
8048385: de c1 faddp %st,%st(1)
8048387: d9 fd fscale
8048389: c3 ret
Wolfgang Kern
2008-04-17 11:28:18 UTC
Permalink
"Bin Xin" wrote:
[...]
Post by Bin Xin
So I try to follow the stack state after each call to my_pow, it
looks ok for a few calls (9), then I see a weird thing: when "fld1" is
(gdb) ni
0x08048397 270
8: $st7 = 15
7: $st6 = 14
6: $st5 = 13
5: $st4 = 12
4: $st3 = 2
3: $st2 = 0
2: $st1 = 2
1: $st0 = 0
(gdb) ni
0x08048399 270
8: $st7 = 14
7: $st6 = 13
6: $st5 = 12
5: $st4 = 2
4: $st3 = 0
3: $st2 = 2
2: $st1 = 0
1: $st0 = -nan(0xc000000000000000)
So what am I missing here?
A classical stack-overrun ?
I'd try to insert a FFREE %st(7) before the FLD1
__
wolfgang
Jentje Goslinga
2008-04-18 01:25:15 UTC
Permalink
Post by Bin Xin
Post by Terje Mathisen
Post by Tim Roberts
I admit that I lost track part way through, but it looks to me like you
aren't cleaning up thefloatingpointstack when you are done. If you
leave stuff on the stack, sooner or later the stack will overflow (there
are only 8 entries, after all), which triggersfloatingpointexceptions.
The easy way to check this is to call your my_pow() function in a loop,
preferably while inside the debugger, and watch the x87 stack contents.
Thanks for replying.
So I try to follow the stack state after each call to my_pow, it
looks ok for a few calls (9), then I see a weird thing: when "fld1" is
As Terje says, check for FPU stack overflow.

If you are using normal calling conventions (as enforced by
compilers), your FPU stack should be empty upon entering any
function, and upon exit should be either empty (I) or contain
a single entry (II), in the case where the function returns
a floating point number:

void myfunc(...) (I)

double myfunc(...) (II)

To find out how many entries out off the available eight of the
FPU stack are used, inspect the FPU TAGS word with your debugger.
(NB: The TAGS word; NOT the FPU CONTROL OR STATUS word).
The TAGS word should read FFFF upon entry to the function (just
prior to the hardware call instruction) and upon exit should read
FFFF in case (I) and 3FFF in case (II), if I am not mistaken.
This is because the coding is from left to right and is 11 for an
empty register and 00 for a used register, so 3FFF means
0011 1111 1111 1111
which means that the first FPU register "st0" is in use and
st1...st7 are empty. Note that some debuggers display garbage even
in empty registers, so do not rely on the debugger FPU stack display.

This is as much as I know; I am not sure whether the used registers
should be contiguous, or what happens upon stack rotation, I haven't
investigated that.

Jen
Bin Xin
2008-04-18 20:47:00 UTC
Permalink
Post by Jentje Goslinga
Post by Bin Xin
Post by Terje Mathisen
Post by Tim Roberts
I admit that I lost track part way through, but it looks to me like you
aren't cleaning up thefloatingpointstack when you are done. If you
leave stuff on the stack, sooner or later the stack will overflow (there
are only 8 entries, after all), which triggersfloatingpointexceptions.
The easy way to check this is to call your my_pow() function in a loop,
preferably while inside the debugger, and watch the x87 stack contents.
Thanks for replying.
So I try to follow the stack state after each call to my_pow, it
looks ok for a few calls (9), then I see a weird thing: when "fld1" is
As Terje says, check for FPU stack overflow.
If you are using normal calling conventions (as enforced by
compilers), your FPU stack should be empty upon entering any
function, and upon exit should be either empty (I) or contain
a single entry (II), in the case where the function returns
void myfunc(...) (I)
double myfunc(...) (II)
To find out how many entries out off the available eight of the
FPU stack are used, inspect the FPU TAGS word with your debugger.
(NB: The TAGS word; NOT the FPU CONTROL OR STATUS word).
The TAGS word should read FFFF upon entry to the function (just
prior to the hardware call instruction) and upon exit should read
FFFF in case (I) and 3FFF in case (II), if I am not mistaken.
This is because the coding is from left to right and is 11 for an
empty register and 00 for a used register, so 3FFF means
0011 1111 1111 1111
which means that the first FPU register "st0" is in use and
st1...st7 are empty. Note that some debuggers display garbage even
in empty registers, so do not rely on the debugger FPU stack display.
This is as much as I know; I am not sure whether the used registers
should be contiguous, or what happens upon stack rotation, I haven't
investigated that.
Thanks, Jen. This is informative. Is there an existing tool that can
check a piece of assembly code, for registers not cleaned up?

bin
Jentje Goslinga
2008-04-19 02:08:18 UTC
Permalink
Post by Bin Xin
Post by Jentje Goslinga
Post by Bin Xin
Post by Terje Mathisen
Post by Tim Roberts
I admit that I lost track part way through, but it looks to me like you
aren't cleaning up thefloatingpointstack when you are done. If you
leave stuff on the stack, sooner or later the stack will overflow (there
are only 8 entries, after all), which triggersfloatingpointexceptions.
The easy way to check this is to call your my_pow() function in a loop,
preferably while inside the debugger, and watch the x87 stack contents.
Thanks for replying.
So I try to follow the stack state after each call to my_pow, it
looks ok for a few calls (9), then I see a weird thing: when "fld1" is
As Terje says, check for FPU stack overflow.
If you are using normal calling conventions (as enforced by
compilers), your FPU stack should be empty upon entering any
function, and upon exit should be either empty (I) or contain
a single entry (II), in the case where the function returns
void myfunc(...) (I)
double myfunc(...) (II)
To find out how many entries out off the available eight of the
FPU stack are used, inspect the FPU TAGS word with your debugger.
(NB: The TAGS word; NOT the FPU CONTROL OR STATUS word).
The TAGS word should read FFFF upon entry to the function (just
prior to the hardware call instruction) and upon exit should read
FFFF in case (I) and 3FFF in case (II), if I am not mistaken.
This is because the coding is from left to right and is 11 for an
empty register and 00 for a used register, so 3FFF means
0011 1111 1111 1111
which means that the first FPU register "st0" is in use and
st1...st7 are empty. Note that some debuggers display garbage even
in empty registers, so do not rely on the debugger FPU stack display.
This is as much as I know; I am not sure whether the used registers
should be contiguous, or what happens upon stack rotation, I haven't
investigated that.
Thanks, Jen. This is informative. Is there an existing tool that can
check a piece of assembly code, for registers not cleaned up?
bin
There is no existing tool per se but the FPU STATUS word reflects
the results of FPU operations and it contains a stack overflow bit.

Let's start simple and create a FPU stack overflow

finit ; initialize processor, setting control word to 0x0037
fldz
fldz
fldz
fldz
fldz
fldz
fldz
fldz
fldz ; trying to load a ninth element on the stack

The finit instruction initializes the FPU and sets the CONTROL
WORD to 0x0037. This means that all exceptions except overflow
are "masked". Masked means that the corresponding bit is 1 and
not 0 which would be unmasked. If an exception is masked then
the processor "takes a default internal action" which means that
execution merrily continues... almost certainly infecting your
data with Inf's and Nan's; if an exception is unmasked, then a
software exception handler is called. I don't know how this works
and it may be later, possibly MUCH later. I would stay away from
OS generated interrupts.

The lower bits in the FPU CONTROL WORD are
0 invalid operation
1 denormalized operand
2 zerodivide
3 overflow
4 underflow
5 precision
6 RESERVED
7 interrupt enable, 8087 only
8-9 rounding control 00=nearest etc

The bits in the FPU STATUS WORD are
0 invalid operation
1 denormalized operand
2 zerodivide
3 overflow
4 underflow
5 precision
6 STACK FAULT
7 interrupt request 8087, otherwise error summary status
8-10,14 condition code bits

We are not concerned here with the higher bits which relate
to arithmetic condition codes, much like the flags on the CPU.
Note that bit 6 in the CONTROL WORD is reserved, obviously
because stack faults should never be masked, since they are
programmer errors, not the result of "unlucky data".

Stepping through this sequence of instructions with a debugger
shows that on the 9-th FPU load, the status word is set to
0x3A41 which equals 00111010 01000001 in binary.
^ ^
[It also seems that st0 contains infinity].
The lower eight bits indicate that the "invalid operation"
and the "stack fault" bits are set, see table.
There are probably other conditions which generate an
"invalid operation", so we should concentrate on the "stack
fault" condition as reflected by the stack fault bit.

One way would be to write a function which tests the stack
fault bit something like:

int stack_fault()
{
__asm {
fstsw ax ; FPU status word to ax
and eax, 0x0040 ; isolate the stack fault bit
shr eax, 6 ; shift to least significant
};

/* return code is in eax as per C convention */
/* hope compiler accepts this; otherwise write */
/* asm function */
}

I am using inline assembler for clarity, not recommending it
and hope I haven't made a mistake but you get the idea. I
have no time to test this right now, have to go to work.
[I hope you have at least a 386, otherwise you have to store
the FPU status word to memory (presumably a word on the CPU
stack) and from there move it to ax.]
Now you can call this test function following some suspect
test code to see if there was a stack fault.

One more thing: The FPU exception status codes are "sticky"
and will remain set until someone clears FPU exceptions using
FCLEX or resets the FPU with FINIT.

Jen

nbaker2328
2008-03-19 06:08:55 UTC
Permalink
Arithmatic expressions are difficult to get correct no matter how good
you are at visualizing the stack manipulation. Trust me... save
yourself a lot of time and do it the easy way. Here is a hueristic
that works every time:

http://webster.cs.ucr.edu/AoA/Windows/HTML/RealArithmetica3.html#1001073

Also, remember that the Sub and Div instructions have alternate
incarnations which reverse the order of the operands.

Nathan.
Continue reading on narkive:
Loading...