Discussion:
Assembly contest idea
Rick C. Hodgin
2020-10-13 19:47:32 UTC
I was thinking if we could come up with a way to optimize a real-world
problem I have right now.

I have two rectangles r1 and r2. I need to determine if r1 is 1) fully
inside, 2) partially inside, or 3) not inside r2.

The coordinates are all signed integers ranging from -32768 to +32767.

What do you think?

-----
BTW, the reason I haven't been here for our last competition is because
I've been pursuing an idea that occurred to me about ten years ago.
I've been working on an Open GL simulation bringing the idea to a
our solar system:

Perspective:

Orthographic:

--
Rick C. Hodgin
Melzzzzz
2020-10-15 02:00:52 UTC
Post by Rick C. Hodgin
I was thinking if we could come up with a way to optimize a real-world
problem I have right now.
I have two rectangles r1 and r2. I need to determine if r1 is 1) fully
inside, 2) partially inside, or 3) not inside r2.
The coordinates are all signed integers ranging from -32768 to +32767.
Too old for this ;)

I think that I solved this problem some 30 years ago :P
Post by Rick C. Hodgin
What do you think?
--
current job title: senior software engineer

press any key to continue or any other to quit...
U ničemu ja ne uživam kao u svom statusu INVALIDA -- Zli Zec
Svi smo svedoci - oko 3 godine intenzivne propagande je dovoljno da jedan narod poludi -- Zli Zec
Na divljem zapadu i nije bilo tako puno nasilja, upravo zato jer su svi
Rick C. Hodgin
2020-10-15 04:11:41 UTC
Post by Melzzzzz
Post by Rick C. Hodgin
I have two rectangles r1 and r2. I need to determine if r1 is 1) fully
inside, 2) partially inside, or 3) not inside r2.
The coordinates are all signed integers ranging from -32768 to +32767.
Too old for this ;)
I think that I solved this problem some 30 years ago :P
I spent some time searching online for branchless compares to see if I
could find some optimizations that would be useful.

I remember coming across a page that showed all kinds of optimizations
for adding certain values, subtracting certain values, etc.

I'm thinking there is a way to take the values and perform only ADD,
SUB, INC, DEC, OR, and XOR to get a result. If it could be done for a
type of between() function, the rest becomes academic.
--
Rick C. Hodgin
Terje Mathisen
2020-10-15 06:22:31 UTC
Post by Rick C. Hodgin
Post by Melzzzzz
I have two rectangles r1 and r2.  I need to determine if r1 is 1) fully
inside, 2) partially inside, or 3) not inside r2.
The coordinates are all signed integers ranging from -32768 to +32767.
Too old for this ;)
I think that I solved this problem some 30 years ago :P
I spent some time searching online for branchless compares to see if I
could find some optimizations that would be useful.
I remember coming across a page that showed all kinds of optimizations
for adding certain values, subtracting certain values, etc.
I'm thinking there is a way to take the values and perform only ADD,
SUB, INC, DEC, OR, and XOR to get a result.  If it could be done for a
type of between() function, the rest becomes academic.
You can obviously do a bunch of parallel compares, using SIMD opcodes:

Each AVX-512 register would hold 32 16-bit ints, so you can splat the
ll/ur corners (4 values) of both rectangles (8 total) across up to 8
locations:
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Terje Mathisen
2020-10-15 06:40:41 UTC
Post by Rick C. Hodgin
Post by Melzzzzz
I have two rectangles r1 and r2.  I need to determine if r1 is 1) fully
inside, 2) partially inside, or 3) not inside r2.
The coordinates are all signed integers ranging from -32768 to +32767.
Too old for this ;)
I think that I solved this problem some 30 years ago :P
I spent some time searching online for branchless compares to see if I
could find some optimizations that would be useful.
I remember coming across a page that showed all kinds of optimizations
for adding certain values, subtracting certain values, etc.
I'm thinking there is a way to take the values and perform only ADD,
SUB, INC, DEC, OR, and XOR to get a result.  If it could be done for a
type of between() function, the rest becomes academic.
Did not finish my previous post:

The idea is that we need to compare the left edge of rectangle one
against both the left and right edge of rectangle2, and the same for the
right edge, so this is 8 compares for the vertical lines. Add the same
for the horizontal and you need 16 compares which would fit in a much
more common AVX register.

R2 totally inside R1 becomes

(R2.l >= R1.l) AND (R2.r <= R1.r) AND
(R2.b >= R1.b) AND (R2.a <= R1.a)

i.e. relatively simple, while zero overlap is much harder, and related
to the clipping question from last week: It requires all of R2 to be
either to the left, right, above or below, so there are 4 different
masks that give the same result and you have to OR those together:

(R2.r < R1.l) OR (R2.a < R1.b) OR (R2.l >= R1.r) OR (R2.b >= R1.a)

All that remains is to setup all those values in the correct spot so
that these 8 compares can be done (some of them needs to be inverted
after the fact obviously!) with a single compare_greater_or_equal()
instruction. Seems doable by duplicating the R1 values so that each
occur twice, then use a permute on the R2 values to put them in the
right spots, do the compare, flip half the results and then the slowest
part: Horizontally combine them with and AND for the inside result, with
OR for outside, and then all partial overlaps fall out by being neither
of these.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Rick C. Hodgin
2020-10-15 16:31:29 UTC
Post by Terje Mathisen
The idea is that we need to compare the left edge of rectangle one
against both the left and right edge of rectangle2, and the same for the
right edge, so this is 8 compares for the vertical lines. Add the same
for the horizontal and you need 16 compares which would fit in a much
more common AVX register.
R2 totally inside R1 becomes
(R2.l >= R1.l) AND (R2.r <= R1.r) AND
(R2.b >= R1.b) AND (R2.a <= R1.a)
i.e. relatively simple, while zero overlap is much harder, and related
to the clipping question from last week: It requires all of R2 to be
either to the left, right, above or below, so there are 4 different
(R2.r < R1.l) OR (R2.a < R1.b) OR (R2.l >= R1.r) OR (R2.b >= R1.a)
All that remains is to setup all those values in the correct spot so
that these 8 compares can be done (some of them needs to be inverted
after the fact obviously!) with a single compare_greater_or_equal()
instruction. Seems doable by duplicating the R1 values so that each
occur twice, then use a permute on the R2 values to put them in the
right spots, do the compare, flip half the results and then the slowest
part: Horizontally combine them with and AND for the inside result, with
OR for outside, and then all partial overlaps fall out by being neither
of these.
I'm thinking there's a relationship between the various values, such
that for an inside test we will know:

t
___
l | | r // left, right, top, bottom
|___|

b

r2.l is between r1.l and r1.r
r2.r is between r1.l and r1.r

And the same will be true for r2.top and r2.bottom:

r2.t is between r1.t and r1.b
r2.b is between r1.t and r1.b

And if we assert that each l is less than r, and each t is less than b,
is there some mathematical formula we can conclude?

if r2.r - r1.l - r2.l > x
and r2.r - r1.r - r2.l > x
and r2.b - r1.t - r2.t > y
and r2.t - r1.t - r2.t > y

then we know the relationship is true? It would be determining x and y,
which surely are proportional to the rectangles and/or their
relationships. And once determined, we could possibly simplify the
expressions to their minimums.

Something like that? I don't know. My mind is hazy on all this.
--
Rick C. Hodgin
wolfgang kern
2020-10-15 07:10:45 UTC