8086 32-bit multiply

Paul Edwards

2021-04-23 11:42:29 UTC

Hi.

Since 1994 I have been working on a project to
create a public domain version of MSDOS, called
PDOS. There is an 8086 version and an 80386
version which can be found here:

http://pdos.sourceforge.net/

I took some shortcuts along the way to get it to
work at all, and one of those has finally bitten me.

I'm getting incorrect results from this:

https://sourceforge.net/p/pdos/gitcode/ci/master/tree/pdpclib/dossupa.asm

; multiply cx:bx by dx:ax, result in dx:ax

public __I4M
__I4M:
public __U4M
__U4M:
public f_lxmul@
f_lxmul@ proc
push bp
mov bp,sp
push cx

push ax
mul cx
mov cx, ax
pop ax
mul bx
add dx, cx

pop cx
pop bp
ret
f_lxmul@ endp

Does anyone have some public domain (explicit notice)
8086 (not 80386) code they are willing to share to do
this? Not LGPL. Not BSD. Public domain. The entire
codebase of tens of thousands of lines of code is
public domain.

Also let me know if you wish to be acknowledged in
the source code and/or code check-in. Some people
prefer to remain anonymous.

There are other routines in there that may not work
properly either, but I haven't come across them yet.

Thanks. Paul.

DJ Delorie

2021-04-23 23:24:37 UTC

Post by Paul Edwards
; multiply cx:bx by dx:ax, result in dx:ax

Such would have three multiplies and a few adds:

LSW = bx * ax (lower 16, save upper 16 in XX)

MSW = bx * dx + cx * ax + XX (from lsw)
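The two lines above can be sketched in a few lines of Python, which is a convenient way to check the arithmetic before committing it to assembly. This is only a model under stated assumptions: the function name `mul32_low` and the argument order are made up for illustration, and each argument stands for the 16-bit value held in the register of the same name.

```python
MASK16 = 0xFFFF

def mul32_low(cx, bx, dx, ax):
    """Low 32 bits of (cx:bx) * (dx:ax), returned as (msw, lsw).

    Hypothetical helper: models the LSW/MSW recipe with 16-bit words.
    """
    lo = bx * ax                  # full 32-bit product of the two low words
    lsw = lo & MASK16             # lower 16 bits become the result's LSW
    xx = lo >> 16                 # upper 16 bits are carried into the MSW
    # Cross terms plus the carried word; anything above 16 bits would land
    # in bits 32 and up of the full product, so it is discarded.
    msw = (bx * dx + cx * ax + xx) & MASK16
    return msw, lsw
```

The cx * dx term never appears because it only contributes to bits 32 and above, which do not fit in a dx:ax result.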

Paul Edwards

2021-04-24 03:16:04 UTC

Post by DJ Delorie

Post by Paul Edwards
; multiply cx:bx by dx:ax, result in dx:ax

LSW = bx * ax (lower 16, save upper 16 in XX)
MSW = bx * dx + cx * ax + XX (from lsw)

Thanks for the algorithm! I thought I might be able to do that,
but my brain started to melt down. Here's what I came up with,
which causes a hang, but at least it happened after I got the
results of some calculations. I'll see if I can figure out what
is happening.

; multiply cx:bx by dx:ax, result in dx:ax

public __I4M
__I4M:
public __U4M
__U4M:
public f_lxmul@
f_lxmul@ proc
push bp
mov bp,sp
push bx
push cx
push si
push di

push ax
push bx

; I think this multiplies bx * ax and puts the upper 16 bits in ax
; and lower 16 bits in bx
mul bx

; Save upper 16 in si and lower 16 in di
mov si, ax
mov di, bx

; This does the equivalent of bx * dx
pop bx
mov ax, dx
mul bx
mov dx, ax

; Now we do cx * ax with upper 16 bits in ax and lower in cx
pop ax
mul cx

; Now we need to add the results of those two multiplies together
; lower 16 bits first, so we can get the carry
push bp ; ran out of registers!
mov bp, bx
mov bx, ax
mov ax, 1
add dx, cx
jc noone
mov ax, 1
noone:

push ax

; Now the other lower 16 bits we saved
mov ax, 1
add dx, di
jc noone2
mov ax, 1
noone2:

push ax

; Upper 16 bits
mov ax, bx
add bx, ax
pop ax
add bx, ax ; one carry
pop ax
add bx, ax ; the other carry
mov ax, bp
add bx, ax

; store in proper output register
mov dx, bx

pop bp

pop di
pop si
pop cx
pop bx
pop bp
ret
f_lxmul@ endp

BFN. Paul.

wolfgang kern

2021-04-24 00:46:36 UTC

On 23.04.2021 13:42, Paul Edwards wrote:

[x8086 only]

Post by Paul Edwards
; multiply cx:bx by dx:ax, result in dx:ax

the result of 32*32 bit doesn't fit into 32 bit.
either go with the given limits (16*16 bit) or
build a cascade with intermediate variables aka
MUL-ADD chains.
__
wolfgang

Paul Edwards

2021-04-24 03:17:33 UTC

Post by wolfgang kern
[x8086 only]

Post by Paul Edwards
; multiply cx:bx by dx:ax, result in dx:ax

the result of 32*32 bit doesn't fit into 32 bit.

Good point. I didn't think of that. I can't multiply
17 bits by 17 bits, one of the registers needs to
be 0. But I assume I need to at least overflow in
a predictable manner.

Post by wolfgang kern
either go with the given limits (16*16 bit) or
build a cascade with intermediate variables aka
MUL-ADD chains.

See my most recent post. :-)

BFN. Paul.

wolfgang kern

2021-04-24 08:36:46 UTC

Post by Paul Edwards

Post by wolfgang kern
[x8086 only]

Post by Paul Edwards
; multiply cx:bx by dx:ax, result in dx:ax

the result of 32*32 bit doesn't fit into 32 bit.

Good point. I didn't think of that. I can't multiply
17 bits by 17 bits, one of the registers needs to
be 0. But I assume I need to at least overflow in
a predictable manner.

Post by wolfgang kern
either go with the given limits (16*16 bit) or
build a cascade with intermediate variables aka
MUL-ADD chains.

See my most recent post. :-)

you create a stack frame but don't use a single variable in it,
and it may hang because your stack isn't balanced.
__
wolfgang

Anton Ertl

2021-04-24 14:01:21 UTC

Post by Paul Edwards

Post by wolfgang kern
[x8086 only]

Post by Paul Edwards
; multiply cx:bx by dx:ax, result in dx:ax

the result of 32*32 bit doesn't fit into 32 bit.

Good point. I didn't think of that. I can't multiply
17 bits by 17 bits, one of the registers needs to
be 0. But I assume I need to at least overflow in
a predictable manner.

The usual way is to produce the lower 32 bits of the result, i.e.,
produce a*b mod 2^32. And thanks to the magic of 2s-complement
arithmetic, the result is the same for unsigned multiplication and for
signed multiplication (the results for the high 32 bits would differ,
but you are not interested in that).
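A small Python sketch can illustrate this point. It is only a model: the helper names (`to_signed32`, `split64`, `unsigned_mul`, `signed_mul`) are invented for the example, and Python's unbounded integers stand in for the 64-bit product.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(u):
    """Reinterpret a 32-bit pattern as a 2s-complement signed value."""
    return u - (1 << 32) if u & 0x80000000 else u

def split64(product):
    """Split a product into its (high 32 bits, low 32 bits)."""
    return (product >> 32) & MASK32, product & MASK32

def unsigned_mul(a, b):
    # Treat both 32-bit patterns as unsigned.
    return split64(a * b)

def signed_mul(a, b):
    # Treat the same 32-bit patterns as signed.
    return split64(to_signed32(a) * to_signed32(b))
```

For example, with a = 0xFFFFFFFE (which is -2 when read as signed) and b = 3, both functions return a low half of 0xFFFFFFFA, while the high halves are 2 and 0xFFFFFFFF respectively.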

- anton

--
M. Anton Ertl Some things have to be seen to be believed
***@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

Terje Mathisen

2021-04-24 10:17:08 UTC

Post by Paul Edwards
Hi.
Since 1994 I have been working on a project to
create a public domain version of MSDOS, called
PDOS. There is an 8086 version and an 80386
http://pdos.sourceforge.net/
I took some shortcuts along the way to get it to
work at all, and one of those has finally bitten me.
https://sourceforge.net/p/pdos/gitcode/ci/master/tree/pdpclib/dossupa.asm
; multiply cx:bx by dx:ax, result in dx:ax
public __I4M
public __U4M
push bp
mov bp,sp
push cx
push ax
mul cx
mov cx, ax
pop ax
mul bx
add dx, cx
pop cx
pop bp
ret

As several have noted, the code above is missing at least one MUL!
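To see which product is missing, here is a rough Python model of the quoted routine. It is a sketch under assumptions: `mul16` and `original_lxmul` are made-up names, and each variable simply mirrors the 16-bit register of the same name.

```python
MASK16 = 0xFFFF

def mul16(a, b):
    """Model of 8086 MUL r16: returns (dx, ax), the high and low words of a * b."""
    p = a * b
    return p >> 16, p & MASK16

def original_lxmul(cx, bx, dx, ax):
    """Model of the quoted routine; the incoming dx is never multiplied."""
    saved_ax = ax                 # push ax
    dx, ax = mul16(ax, cx)        # mul cx   (this overwrites the incoming dx)
    cx = ax                       # mov cx, ax
    ax = saved_ax                 # pop ax
    dx, ax = mul16(ax, bx)        # mul bx
    dx = (dx + cx) & MASK16       # add dx, cx
    return dx, ax
```

In this model the term bx * dx is never computed, so the result is only right when the incoming dx is 0 (or bx is 0). For instance, cx:bx = 0000:0002 times dx:ax = 0001:0000 comes out as 0000:0000 instead of 0002:0000.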

Please test it, then feel free to use (with or without attribution) this
totally untested but reasonably efficient/short code:

mov si,ax
mov di,dx
mul cx ;; hi * lo
xchg ax,di ;; First mul saved, grab org dx
mul bx ;; lo * hi
add di,ax ;; top word of result

mov ax,si ;; retrieve original AX
mul bx ;; lo * lo
add dx,di

At this point DX:AX has the low 32 bits of the multiplication result.
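One way to test the routine without an assembler is to step through it in a register-level model. The Python below is such a sketch, not the routine itself: `mul16` and `lxmul` are hypothetical names, each variable mirrors the 16-bit register of the same name, and flags are ignored because the routine does not use them.

```python
MASK16 = 0xFFFF

def mul16(a, b):
    """Model of 8086 MUL r16: returns (dx, ax), the high and low words of a * b."""
    p = a * b
    return p >> 16, p & MASK16

def lxmul(cx, bx, dx, ax):
    """Step-by-step model; returns (dx, ax) = low 32 bits of cx:bx * dx:ax."""
    si = ax                       # mov si,ax
    di = dx                       # mov di,dx
    dx, ax = mul16(ax, cx)        # mul cx      ;; hi * lo
    ax, di = di, ax               # xchg ax,di  ;; di = low word of that product
    dx, ax = mul16(ax, bx)        # mul bx      ;; lo * hi
    di = (di + ax) & MASK16       # add di,ax   ;; sum of both cross terms
    ax = si                       # mov ax,si
    dx, ax = mul16(ax, bx)        # mul bx      ;; lo * lo
    dx = (dx + di) & MASK16       # add dx,di
    return dx, ax
```

Comparing `lxmul` against `(a * b) & 0xFFFFFFFF` for a handful of operand pairs, including ones with both high words non-zero, is a quick sanity check. Note that the model, like the assembly, overwrites si and di, so a caller that needs those registers preserved would have to save them.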

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Paul Edwards

2021-04-24 21:00:07 UTC

Post by Terje Mathisen

As several have noted, the code above is missing at least one MUL!
Please test it, then feel free to use (with or without attribution) this
mov si,ax
mov di,dx
mul cx ;; hi * lo
xchg ax,di ;; First mul saved, grab org dx
mul bx ;; lo * hi
add di,ax ;; top word of result
mov ax,si ;; retrieve original AX
mul bx ;; lo * lo
add dx,di
At this point DX:AX has the low 32 bits of the multiplication result.

Thanks so much!!!

I have tested it and it works fine. I have committed the
change, with attribution:

https://sourceforge.net/p/pdos/gitcode/ci/master/tree/pdpclib/dossupa.asm

BFN. Paul.