"I'm not proud of being a congenital pain in the ass. But I will take money for it."

Bitfields in Python using masks, properties, and closures

Sat 21 March 2020 | -- (permalink)

If you've done much work with old-school Internet protocols, you'll have encountered a lot of densly-packed protocol header definitions. One sees this less often now, but back in the day, bits were expensive, each bit of core memory having to be whittled by hand by a master craftsman, so we tended to pack things in pretty tightly. Python's struct library is good for anything that's aligned on a byte boundary, but what about a two bit field, followed by a three bit field, followed by a seven bit field...? Yeah, one can do messy things with shifting and masking, but it sure would be nice to encapsulate all of that in one place so the rest of your protocol implementation could just deal with the semantics rather than the encoding.

For this, we need to roll back to the days when dinosaurs roamed the machine rooms and Real Programmers wrote large programs in assembly language. Modern folk mythology notwithstanding, programmers in those days were just as fond of code correctness and labor saving devices as programmers now, it's just that the available tools were different.

One way of dealing with header definitions in that kind of environment was with bit masks. Yes, of course one could specify both a mask and a shift value, but that was error prone and hard to read. Even in those days, many programmers believed in letting the machine do the dirty work.

So, without further ado, we bring you a sample of a mask-based bitfield implementation, using Python closures and properties to present a sane attribute-based API.

class Whatever:

    def __init__(self, _value = 0, **kwargs):
        self._value = _value
        for k, v in kwargs.items():
            setattr(self, k, v)

    def __int__(self):
        return self._value

    def _bitfield(mask):
        scale = mask & (1 + ~mask)

        assert (mask + scale) & mask == 0, "Non-contiguous mask"

        def getter(self):
            return (self._value & mask) // scale

        def setter(self, newval):
            self._value &= ~mask
            self._value |=  mask & (newval * scale)

        return property(getter, setter)

    # Sample field definitions, replace with whatever your protocol needs

    f1 = _bitfield(0xFF000000)
    f2 = _bitfield(0x00800000)
    f3 = _bitfield(0x00700000)
    f4 = _bitfield(0x000C0000)
    f5 = _bitfield(0x00030000)
    f6 = _bitfield(0x0000FFF0)
    f7 = _bitfield(0x0000000F)

# Sample usage

whatever1 = Whatever(f1 = 3, f5 = 1, f6 = 18)

protocol_send(struct.pack("!I", whatever1))

whatever2 = Whatever(struct.unpack("!I", protocol_receive())[0])

if whatever2.f1 > 3:
    print("{0.f1} {0.f2} {0.f3} {0.f4} {0.f5} {0.f6} {0.f7}".format(whatever2))

One could of course code this as a generic bitfield constructor using getattr() and setattr(), combine it with struct packing and unpacking for a more complex header, and so forth.

The expression mask & (1 + ~mask) is an ancient bit of ones-complement math which returns the least significant bit of a contiguous mask. In its simplest form, as used here, this allows one to shift values in and out of fields via multiplication and division. For most purposes, this should be fine, but if one is really worried about runtime efficiency, one can translate this to shifts:

    def _bitfield(mask):
        scale = mask & (1 + ~mask)
        shift = scale.bit_length() - 1

        assert (mask + scale) & mask == 0, "Non-contiguous mask"

        def getter(self):
            return (self._value & mask) >> shift

        def setter(self, newval):
            self._value &= ~mask
            self._value |=  mask & (newval << shift)

If one is still stuck with Python 2, one can also calculate shift this way:

        shift = int(math.log(scale, 2))

Or, if the call to math.log() is too scary, this way:

        shift = 0
        assert scale != 0
        while scale & 1 == 0:
            scale >>= 1
            shift +=  1

Keep in mind that scale and shift are only calculated during construction of the property closures, not during execution of the resulting properties, so extreme effort here won't gain very much.

Last, if one wants to go really hard-core, one can specify the masks in binary rather than in hexadecimal, so that one can read them right out of the ASCII-graphics specifications, eg:

    #               +--------------------------------+
    #               |00000000001111111111222222222233|
    #               |01234567890123456789012345678901|
    #               +--------------------------------+
    f1 = _bitfield(0b11111111000000000000000000000000)
    f2 = _bitfield(0b00000000100000000000000000000000)
    f3 = _bitfield(0b00000000011100000000000000000000)
    f4 = _bitfield(0b00000000000011000000000000000000)
    f5 = _bitfield(0b00000000000000110000000000000000)
    f6 = _bitfield(0b00000000000000001111111111110000)
    f7 = _bitfield(0b00000000000000000000000000001111)

Making the mask argument a string to be fed to int(x, 2) after stripping non-digits so that one can define fields directly from ASCII graphics and figuring out how the contiguous-mask assertion works left as exercises for the reader.