Sat 21 March 2020 | -- (permalink)
If you've done much work with old-school Internet protocols, you'll
have encountered a lot of densly-packed protocol header definitions.
One sees this less often now, but back in the day, bits were
expensive, each bit of core memory having to be whittled by hand by a
master craftsman, so we tended to pack things in pretty tightly.
Python's struct
library is good for anything that's aligned on a
byte boundary, but what about a two bit field, followed by a three bit
field, followed by a seven bit field...? Yeah, one can do messy
things with shifting and masking, but it sure would be nice to
encapsulate all of that in one place so the rest of your protocol
implementation could just deal with the semantics rather than the
encoding.
For this, we need to roll back to the days when dinosaurs roamed the machine rooms and Real Programmers wrote large programs in assembly language. Modern folk mythology notwithstanding, programmers in those days were just as fond of code correctness and labor saving devices as programmers now, it's just that the available tools were different.
One way of dealing with header definitions in that kind of environment was with bit masks. Yes, of course one could specify both a mask and a shift value, but that was error prone and hard to read. Even in those days, many programmers believed in letting the machine do the dirty work.
So, without further ado, we bring you a sample of a mask-based bitfield implementation, using Python closures and properties to present a sane attribute-based API.
class Whatever:
def __init__(self, _value = 0, **kwargs):
self._value = _value
for k, v in kwargs.items():
setattr(self, k, v)
def __int__(self):
return self._value
def _bitfield(mask):
scale = mask & (1 + ~mask)
assert (mask + scale) & mask == 0, "Non-contiguous mask"
def getter(self):
return (self._value & mask) // scale
def setter(self, newval):
self._value &= ~mask
self._value |= mask & (newval * scale)
return property(getter, setter)
# Sample field definitions, replace with whatever your protocol needs
f1 = _bitfield(0xFF000000)
f2 = _bitfield(0x00800000)
f3 = _bitfield(0x00700000)
f4 = _bitfield(0x000C0000)
f5 = _bitfield(0x00030000)
f6 = _bitfield(0x0000FFF0)
f7 = _bitfield(0x0000000F)
# Sample usage
whatever1 = Whatever(f1 = 3, f5 = 1, f6 = 18)
protocol_send(struct.pack("!I", whatever1))
whatever2 = Whatever(struct.unpack("!I", protocol_receive())[0])
if whatever2.f1 > 3:
print("{0.f1} {0.f2} {0.f3} {0.f4} {0.f5} {0.f6} {0.f7}".format(whatever2))
One could of course code this as a generic bitfield constructor using
getattr()
and setattr()
, combine it with struct
packing and
unpacking for a more complex header, and so forth.
The expression mask & (1 + ~mask)
is an ancient bit of
ones-complement math which returns the least significant bit of a
contiguous mask. In its simplest form, as used here, this allows one
to shift values in and out of fields via multiplication and division.
For most purposes, this should be fine, but if one is really worried
about runtime efficiency, one can translate this to shifts:
def _bitfield(mask):
scale = mask & (1 + ~mask)
shift = scale.bit_length() - 1
assert (mask + scale) & mask == 0, "Non-contiguous mask"
def getter(self):
return (self._value & mask) >> shift
def setter(self, newval):
self._value &= ~mask
self._value |= mask & (newval << shift)
If one is still stuck with Python 2, one can also calculate shift
this way:
shift = int(math.log(scale, 2))
Or, if the call to math.log()
is too scary, this way:
shift = 0
assert scale != 0
while scale & 1 == 0:
scale >>= 1
shift += 1
Keep in mind that scale
and shift
are only calculated during
construction of the property closures, not during execution of the
resulting properties, so extreme effort here won't gain very much.
Last, if one wants to go really hard-core, one can specify the masks in binary rather than in hexadecimal, so that one can read them right out of the ASCII-graphics specifications, eg:
# +--------------------------------+
# |00000000001111111111222222222233|
# |01234567890123456789012345678901|
# +--------------------------------+
f1 = _bitfield(0b11111111000000000000000000000000)
f2 = _bitfield(0b00000000100000000000000000000000)
f3 = _bitfield(0b00000000011100000000000000000000)
f4 = _bitfield(0b00000000000011000000000000000000)
f5 = _bitfield(0b00000000000000110000000000000000)
f6 = _bitfield(0b00000000000000001111111111110000)
f7 = _bitfield(0b00000000000000000000000000001111)
Making the mask argument a string to be fed to int(x, 2)
after
stripping non-digits so that one can define fields directly from ASCII
graphics and figuring out how the contiguous-mask assertion works left
as exercises for the reader.