Discussion:
Bit Swizzling
(too old to reply)
Rick C. Hodgin
2020-09-05 00:33:23 UTC
Permalink
I'm not sure where to ask this question, so I pushed it out to several
groups. You need not reply to all of them if you don't think it is a
topical subject.

I included comp.compilers in a previous message, but it apparently holds
the message until it passes moderation. So, I've not included it here.
A duplicate message may post if/when the comp.compilers moderator John
Levine approves it.

-----
Are there any algorithms which take a known-at-compile-time sequence
of bitwise operations on an 8-bit to 64-bit quantity, and optimize
them down to their minimal set of operations?

For example, if I have an 8-bit byte and I want to swizzle the bits
thusly:

Input: 07 06 05 04 03 02 01 00
Output: 05 04 07 02 01 03 00 06

I can easily swizzle the data using a brute force method:

v = get_value();
o = 0;
swizzle1(o, v, 0, 6);
swizzle1(o, v, 1, 0);
swizzle1(o, v, 2, 3);
swizzle1(o, v, 3, 1);
swizzle1(o, v, 4, 2);
swizzle1(o, v, 5, 7);
swizzle1(o, v, 6, 4);
swizzle1(o, v, 7, 5);

// Untested, off the top of my head
void swizzle(unsigned char& o, unsigned char v, int s, int d)
{
o |= (((v & (1 << s)) >> s) << d);
}

And, of course, that algorithm can be optimized based on relative
values of s and d, and if s is 0, etc.

However, there also exists a minimal set of steps to complete that
swizzling because in the operation above, bits 5 and 4 are used in
sequence. It could be done using a swizzle2() operation, for example
and handle 2 bits at a time.

Are there any existing algorithms which examine the operations that
must be conducted and the optimized / minimal sequence of steps to
conduct it?

Thank you in advance.
--
Rick C. Hodgin
Alexei A. Frounze
2020-09-05 01:01:00 UTC
Permalink
On Friday, September 4, 2020 at 5:44:20 PM UTC-7, Rick C. Hodgin wrote:
...
Post by Rick C. Hodgin
Are there any algorithms which take a known-at-compile-time sequence
of bitwise operations on an 8-bit to 64-bit quantity, and optimize
them down to their minimal set of operations?
For example, if I have an 8-bit byte and I want to swizzle the bits
Input: 07 06 05 04 03 02 01 00
Output: 05 04 07 02 01 03 00 06
What's wrong with a 256-byte look-up table?

If you had to swizzle entire bytes, you could look up
the appropriate SIMD instruction.
MIPS MSA has a shuffle instruction for that.
I'm not up to date on x86 SSE/AVX.

Alex
Bart
2020-09-05 11:33:32 UTC
Permalink
Post by Rick C. Hodgin
-----
Are there any algorithms which take a known-at-compile-time sequence
of bitwise operations on an 8-bit to 64-bit quantity, and optimize
them down to their minimal set of operations?
For example, if I have an 8-bit byte and I want to swizzle the bits
    Input:   07 06 05 04 03 02 01 00
   Output:   05 04 07 02 01 03 00 06
If it's just 8-bit bytes, then it's easy: use any method to build a map
that translates one 8-bit value to its swizzled version. I assume it
will be used many millions of times.
Post by Rick C. Hodgin
    v = get_value();
    o = 0;
    swizzle1(o, v, 0, 6);
    swizzle1(o, v, 1, 0);
    swizzle1(o, v, 2, 3);
    swizzle1(o, v, 3, 1);
    swizzle1(o, v, 4, 2);
    swizzle1(o, v, 5, 7);
    swizzle1(o, v, 6, 4);
    swizzle1(o, v, 7, 5);
    // Untested, off the top of my head
    void swizzle(unsigned char& o, unsigned char v, int s, int d)
    {
        o |= (((v & (1 << s)) >> s) << d);
    }
It took me a while to figure out what this means, which appears to be
take bit s of v, and store it as bit d of o. (But o has to start at 0
since |= means it can't change a 1 in o to 0.)

In my language it would be simply this:

o.[d] := v.[s]

although there, it can write both 1s and 0s into o.
Post by Rick C. Hodgin
And, of course, that algorithm can be optimized based on relative
values of s and d, and if s is 0, etc.
However, there also exists a minimal set of steps to complete that
swizzling because in the operation above, bits 5 and 4 are used in
sequence.  It could be done using a swizzle2() operation, for example
and handle 2 bits at a time.
Are there any existing algorithms which examine the operations that
must be conducted and the optimized / minimal sequence of steps to
conduct it?
The algorithm that works on what, bits of C++ code? What is its input?
What are the aims: to make it fast? Will the mapping always be known
constants, or variables? Will all bits be translated or just some?

And, what would be its output?

If it's just reordering the bits in a word (where you don't map, for
example, both bits 5 and 7 to bit 3), that doesn't sound too hard
(although it probably is!). But importantly, this isn't done in-place.

Up to 8 or 16 bits, probably you can use tables, but the tables need to
be set up.

Otherwise the simplest thing I can think of, for an N-bit word, is a
table of translation pairs. For your 8-bit example this will just be the
parameters to your swizzle function, but as data:

(0,6)
(1,0)
(2,3)
(3,1)
(4,2)
(5,7)
(6,4)
(7,5)

This be input to a routine that will do the job, or it can be the input
to an algorithem that generates, for example, one giant logical expression.

I can probably just about do that last part, if it doesn't need to be
optimised, example:

y = ((x&1)<<6)|((x&2)>>1)|((x&4)<<1)|((x&8)>>2)|
((x&16)>>2)|((x&32)<<2)|((x&64)>>2)|((x&128)>>2);

where x is the input word, and y is the output word. I did this with a
12-line script based on that data input. (Not tested; may need extra
parens.)

I'm sure gcc-O3 will find some optimisations in there if there are any.
olcott
2020-09-08 16:25:58 UTC
Permalink
Post by Rick C. Hodgin
I'm not sure where to ask this question, so I pushed it out to several
groups.  You need not reply to all of them if you don't think it is a
topical subject.
I included comp.compilers in a previous message, but it apparently holds
the message until it passes moderation.  So, I've not included it here.
A duplicate message may post if/when the comp.compilers moderator John
Levine approves it.
-----
Are there any algorithms which take a known-at-compile-time sequence
of bitwise operations on an 8-bit to 64-bit quantity, and optimize
them down to their minimal set of operations?
For example, if I have an 8-bit byte and I want to swizzle the bits
    Input:   07 06 05 04 03 02 01 00
   Output:   05 04 07 02 01 03 00 06
    v = get_value();
    o = 0;
    swizzle1(o, v, 0, 6);
    swizzle1(o, v, 1, 0);
    swizzle1(o, v, 2, 3);
    swizzle1(o, v, 3, 1);
    swizzle1(o, v, 4, 2);
    swizzle1(o, v, 5, 7);
    swizzle1(o, v, 6, 4);
    swizzle1(o, v, 7, 5);
    // Untested, off the top of my head
    void swizzle(unsigned char& o, unsigned char v, int s, int d)
    {
        o |= (((v & (1 << s)) >> s) << d);
    }
And, of course, that algorithm can be optimized based on relative
values of s and d, and if s is 0, etc.
However, there also exists a minimal set of steps to complete that
swizzling because in the operation above, bits 5 and 4 are used in
sequence.  It could be done using a swizzle2() operation, for example
and handle 2 bits at a time.
Are there any existing algorithms which examine the operations that
must be conducted and the optimized / minimal sequence of steps to
conduct it?
Thank you in advance.
https://www.techopedia.com/definition/22413/swizzling
--
Copyright 2020 Pete Olcott
olcott
2020-09-08 16:50:03 UTC
Permalink
Post by olcott
Post by Rick C. Hodgin
I'm not sure where to ask this question, so I pushed it out to several
groups.  You need not reply to all of them if you don't think it is a
topical subject.
I included comp.compilers in a previous message, but it apparently holds
the message until it passes moderation.  So, I've not included it here.
A duplicate message may post if/when the comp.compilers moderator John
Levine approves it.
-----
Are there any algorithms which take a known-at-compile-time sequence
of bitwise operations on an 8-bit to 64-bit quantity, and optimize
them down to their minimal set of operations?
For example, if I have an 8-bit byte and I want to swizzle the bits
     Input:   07 06 05 04 03 02 01 00
    Output:   05 04 07 02 01 03 00 06
     v = get_value();
     o = 0;
     swizzle1(o, v, 0, 6);
     swizzle1(o, v, 1, 0);
     swizzle1(o, v, 2, 3);
     swizzle1(o, v, 3, 1);
     swizzle1(o, v, 4, 2);
     swizzle1(o, v, 5, 7);
     swizzle1(o, v, 6, 4);
     swizzle1(o, v, 7, 5);
     // Untested, off the top of my head
     void swizzle(unsigned char& o, unsigned char v, int s, int d)
     {
         o |= (((v & (1 << s)) >> s) << d);
     }
And, of course, that algorithm can be optimized based on relative
values of s and d, and if s is 0, etc.
However, there also exists a minimal set of steps to complete that
swizzling because in the operation above, bits 5 and 4 are used in
sequence.  It could be done using a swizzle2() operation, for example
and handle 2 bits at a time.
Are there any existing algorithms which examine the operations that
must be conducted and the optimized / minimal sequence of steps to
conduct it?
Thank you in advance.
https://www.techopedia.com/definition/22413/swizzling
glTexParameteri(Target, GL_TEXTURE_SWIZZLE_R, Format.Swizzle[0]);
glTexParameteri(Target, GL_TEXTURE_SWIZZLE_G, Format.Swizzle[1]);
glTexParameteri(Target, GL_TEXTURE_SWIZZLE_B, Format.Swizzle[2]);
glTexParameteri(Target, GL_TEXTURE_SWIZZLE_A, Format.Swizzle[3]);
https://gli.g-truc.net/0.7.0/index.html
--
Copyright 2020 Pete Olcott
Kaz Kylheku
2020-09-08 17:47:38 UTC
Permalink
Post by olcott
https://www.techopedia.com/definition/22413/swizzling
[Newsgroups trimmed]

I'm afraid that is a poorly researched, woefully incomplete article.

The Wikipedia has a substantial better coverage, starting with the
disambiguating page that currently leads to four topics.

https://en.wikipedia.org/wiki/Swizzling

The general pattern is that swizzling refers to any sort of clever
substitution or rearrangement that solves a problem. The reference to
mixing alcoholic cocktail drinks gives the word a shady nuance, like the
rearrangement retrofitted into a solution that was designed without it,
between components that are none the wiser, and are being fooled. This
nuance is supported in with British English, in which "swizzle" is an
alternative word for "swiz", which refers to a swindle or disappointing
situation in general. If we combine the "mixing" meaning with the
"swindle" nuance, the word gives a sense that something is being
exchanged without someone's or something's knowledge.
R.Wieser
2020-09-08 18:26:31 UTC
Permalink
(Too many xpost groups, had to remove one)

Rick,
Post by Rick C. Hodgin
o |= (((v & (1 << s)) >> s) << d);
If you reverse the two tables (having the output bits in order from high to
low) you could left-shift the output by one and than OR the output with the
right-shifted input masked with 1. In this specific case (all eight bits
swizzeled) you do not even need to clear the output.

My C(++) isn't worth anything, but I imagine it could look something like
this :

o <<= 1
o |= (v >> s) & 1

That takes, at least on a x86, 4 machine instructions per bit.

The thing with writing C(++) that should work /everywhere/ is that you can't
use optimalisations for a specific processor.

If you would use the x86, which has instructions that use the Carry flag as
the ninth bit, you would only need 2 machine instructions per bit (rotate
desired source bit into carry, rotate carry bit into target).

And a remark : you've made that "swizzle" a function. Which means you will
probably have a relative large overhead, possibly doubling if not tripling
the ammount of instructions executed for each bit extraction and insertion
(starting with the "call" and "return"). Rewiting it as a macro would
probably be a good idea.

In the case of an x86 and some smart-ass usage of its nine-bit rotate
instructions means that a full 8 bit swizzle uses only 16 instructions (or
17 if you want the source to be unchanged afterwards) - both code /and/
execution ...

Regards,
Rudy Wieser
David Brown
2020-09-08 20:00:49 UTC
Permalink
Post by R.Wieser
(Too many xpost groups, had to remove one)
Rick,
Post by Rick C. Hodgin
o |= (((v & (1 << s)) >> s) << d);
If you reverse the two tables (having the output bits in order from high to
low) you could left-shift the output by one and than OR the output with the
right-shifted input masked with 1. In this specific case (all eight bits
swizzeled) you do not even need to clear the output.
My C(++) isn't worth anything, but I imagine it could look something like
o <<= 1
o |= (v >> s) & 1
That takes, at least on a x86, 4 machine instructions per bit.
The thing with writing C(++) that should work /everywhere/ is that you can't
use optimalisations for a specific processor.
If you would use the x86, which has instructions that use the Carry flag as
the ninth bit, you would only need 2 machine instructions per bit (rotate
desired source bit into carry, rotate carry bit into target).
A good compiler can sometimes (not always) do that kind of thing in the
generated code, if it recognises the pattern. Often, however, these
kind of small instructions are effectively free on x86 processors as you
wait for memory (of course that depends very heavily on the rest of the
code and how you are using this function).
Post by R.Wieser
And a remark : you've made that "swizzle" a function. Which means you will
probably have a relative large overhead, possibly doubling if not tripling
the ammount of instructions executed for each bit extraction and insertion
(starting with the "call" and "return"). Rewiting it as a macro would
probably be a good idea.
Such a function would be (or should be) inlined by the compiler - using
an inline function is almost always a better choice than a macro if you
don't /need/ it to be a macro.
Post by R.Wieser
In the case of an x86 and some smart-ass usage of its nine-bit rotate
instructions means that a full 8 bit swizzle uses only 16 instructions (or
17 if you want the source to be unchanged afterwards) - both code /and/
execution ...
Regards,
Rudy Wieser
Rick C. Hodgin
2020-09-09 01:51:01 UTC
Permalink
Post by R.Wieser
(Too many xpost groups, had to remove one)
Rick,
Post by Rick C. Hodgin
o |= (((v & (1 << s)) >> s) << d);
If you reverse the two tables (having the output bits in order from high to
low) you could left-shift the output by one and than OR the output with the
right-shifted input masked with 1. In this specific case (all eight bits
swizzeled) you do not even need to clear the output.
My C(++) isn't worth anything, but I imagine it could look something like
o <<= 1
o |= (v >> s) & 1
That takes, at least on a x86, 4 machine instructions per bit.
The thing with writing C(++) that should work /everywhere/ is that you can't
use optimalisations for a specific processor.
I appreciate your response, Rudy. I'm writing my own CAlive language,
and my request for an algorithm is to have my internal intermediate
language able to then generate optimum code sequences for a target ISA
given the constraints of that ISA.

It's a fairly complex task which is why I reached out for it. I'm
adding swizzle operator support so you can name an operator and apply
inlined swizzle operation to data (as an operator and not as a func-
tion call).
Post by R.Wieser
And a remark : you've made that "swizzle" a function. Which means you will
probably have a relative large overhead, possibly doubling if not tripling
the ammount of instructions executed for each bit extraction and insertion
(starting with the "call" and "return"). Rewiting it as a macro would
probably be a good idea.
It was an example only of a worst-case scenario for swizzling data. I'm
sorry that wasn't more clear. :-)
--
Rick C. Hodgin
R.Wieser
2020-09-09 08:53:56 UTC
Permalink
Rick,
Post by Rick C. Hodgin
I appreciate your response, Rudy. I'm writing my own CAlive language,
I already thought I recognised your name (and its weight around here), and
by it wondered if I could tell you anything at all.
Post by Rick C. Hodgin
and my request for an algorithm is to have my internal intermediate
language able to then generate optimum code sequences for a target ISA
given the constraints of that ISA.
It seems to me that I concentrated on the wrong thing. Thanks for not
biting my head off. :-)

Regards,
Rudy Wieser

Loading...