think I would store the two bits in the bottom byte, and store the
22 bits in the upper 24 bits of the 32-bit word.
To load:
int32_t packed = array[x];            // load 22 + 2 bits
int32_t two_bits = packed & 3;        // extract the 2-bit field
int32_t twentytwo_bits = packed >> 8; // extract the signed 22-bit (actually 24-bit) field
To store:
int32_t twentytwo_bits; // 22-bit signed value, sign-extended to 32 bits
int32_t two_bits;       // two-bit field (not sign-extended)
array[x] = (twentytwo_bits << 8) | two_bits;
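Put together, the load and store above can be sketched as a few small helpers. This is only a sketch: the function names (pack, unpack_value, unpack_tag, check_roundtrip) are made up for illustration, and it assumes a two's complement target where >> on a negative int32_t is an arithmetic shift, which is implementation-defined in C but true of essentially every current compiler. The store shifts as unsigned, because left-shifting a negative signed value is undefined behaviour in C.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a sign-extended value (must fit in 24 bits) and a 2-bit field
 * into one 32-bit word: value in the upper 24 bits, field in the low byte. */
int32_t pack(int32_t value, int32_t tag)
{
    /* Shift as unsigned to avoid undefined behaviour on negative values. */
    uint32_t word = ((uint32_t)value << 8) | ((uint32_t)tag & 3u);
    return (int32_t)word; /* assumes the usual two's complement conversion */
}

/* Recover the signed value; assumes >> sign-extends (arithmetic shift). */
int32_t unpack_value(int32_t packed)
{
    return packed >> 8;
}

/* Recover the 2-bit field from the bottom byte. */
int32_t unpack_tag(int32_t packed)
{
    return packed & 3;
}

/* Round-trip check for one (value, tag) pair. */
void check_roundtrip(int32_t value, int32_t tag)
{
    int32_t packed = pack(value, tag);
    assert(unpack_value(packed) == value);
    assert(unpack_tag(packed) == tag);
}
```

Calling check_roundtrip(-2097152, 3) and check_roundtrip(2097151, 0) exercises the two ends of the signed 22-bit range.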
Depending on what sort of comparison needs to be done, you can
probably vectorize this without too much trouble. Certainly the
shifts and Boolean operations are all there. vperm will be useful
for doing one-step extraction of parts. You can do alignment at the
same time.
Hello,
I realise this was a few months back, but reading it got me thinking
a lot about this.
I think this terrain generator would be a good candidate for
vectorization, but I'm not completely clued up on the subject, so
that's a separate topic.
On this topic, I was considering packing two 4-bit values into a
single byte to keep memory usage to the minimum possible, but I'm not
sure whether the operation overhead this would cause is worth it.
If I were to use a whole byte for each value, that would double the
memory used, but if I pack them, it would double the ops required to
read them, correct?
Or is it possible to get the memory reduction without adding extra
ops somehow?
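For concreteness, here is a minimal sketch of the packed layout in question. It assumes unsigned 4-bit values, with the even index in the low nibble and the odd index in the high nibble of each byte. The names nibble_get, nibble_set and nibble_check are hypothetical, just for illustration. It shows that the extra work on a read is one index shift plus a shift or mask of the loaded byte, and that a write becomes a read-modify-write of the shared byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the 4-bit value at logical index i; two values share each byte. */
uint8_t nibble_get(const uint8_t *buf, size_t i)
{
    uint8_t byte = buf[i >> 1];              /* i / 2 selects the byte */
    return (i & 1) ? (uint8_t)(byte >> 4)    /* odd index: high nibble */
                   : (uint8_t)(byte & 0x0F); /* even index: low nibble */
}

/* Write the 4-bit value v at logical index i, leaving its neighbour intact. */
void nibble_set(uint8_t *buf, size_t i, uint8_t v)
{
    uint8_t *p = &buf[i >> 1];
    if (i & 1)
        *p = (uint8_t)((*p & 0x0F) | ((v & 0x0F) << 4));
    else
        *p = (uint8_t)((*p & 0xF0) | (v & 0x0F));
}

/* Small self-check: four values stored in two bytes. */
void nibble_check(void)
{
    uint8_t buf[2] = {0, 0};
    nibble_set(buf, 0, 5);
    nibble_set(buf, 1, 10);
    nibble_set(buf, 3, 15);
    assert(buf[0] == 0xA5 && buf[1] == 0xF0);
    assert(nibble_get(buf, 1) == 10);
    assert(nibble_get(buf, 2) == 0);
}
```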