arrow_bitmap (arrow v0.1.0)
Validity Bitmap implementation for arrow.
Defines the function validity_bitmap/1, which returns the Validity Bitmap[1] of an Array along with its Null Count[2].
One important aspect of our Null Count implementation is that it must treat both undefined and nil as null values, since these are the conventions for null values in Erlang and Elixir, respectively.
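The null-counting rule described above can be sketched as follows. This is only an illustration, not the library's actual code: the module name null_count_sketch and the functions is_null/1 and null_count/1 are hypothetical names introduced here, and the input is assumed to be a plain Erlang list.

```erlang
%% Hypothetical sketch; these names are not part of arrow_bitmap.
-module(null_count_sketch).
-export([is_null/1, null_count/1]).

%% Both atoms count as null: `undefined` (Erlang) and `nil` (Elixir).
is_null(undefined) -> true;
is_null(nil) -> true;
is_null(_) -> false.

%% Counts the null elements of a list.
null_count(Values) when is_list(Values) ->
    length([V || V <- Values, is_null(V)]).
```

For example, null_count_sketch:null_count([1, undefined, nil, 4]) returns 2.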
There are 5 important characteristics to remember about the validity bitmap:
- A null value is represented by a 0 bit, and a non-null value by a 1 bit.
- The validities of every 8 elements are batched into a byte, whose bits are then reversed because Arrow uses least-significant bit (LSB) numbering, so the first element of a batch maps to the lowest bit of its byte (more in attached reference).
- If a "batch" consists of fewer than 8 elements, its validities need to be padded with 0 bits so that they make up a full byte.
- Each byte is stored in a slot of a Buffer (see docs for arrow_buffer). This buffer, with the validities of each batch of 8 elements, makes up what is called the Validity Bitmap.
- If the Null Count is 0, we can allocate the Validity Bitmap as a NULL pointer (which in Erlang's case is undefined).
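The characteristics above can be illustrated with a minimal sketch. It is not the implementation in arrow_bitmap: the module validity_sketch and the function sketch_bitmap/1 are hypothetical names, the result is a plain binary rather than the #buffer{} record used by arrow_buffer, and returning undefined when the Null Count is 0 is an assumption based on the last point in the list.

```erlang
%% Hypothetical sketch; it returns a plain binary, not a #buffer{} record.
-module(validity_sketch).
-export([sketch_bitmap/1]).

%% Returns {Bitmap, NullCount}; Bitmap is `undefined` when nothing is null.
sketch_bitmap(Values) when is_list(Values) ->
    Bits = [validity_bit(V) || V <- Values],
    NullCount = length([B || B <- Bits, B =:= 0]),
    case NullCount of
        0 -> {undefined, 0};
        _ -> {pack(Bits, <<>>), NullCount}
    end.

%% Null values (`undefined` or `nil`) map to 0, everything else to 1.
validity_bit(undefined) -> 0;
validity_bit(nil) -> 0;
validity_bit(_) -> 1.

%% Packs the bits one batch of 8 at a time, one byte per batch.
pack([], Acc) -> Acc;
pack(Bits, Acc) ->
    {Batch, Rest} = take_batch(Bits),
    Byte = batch_to_byte(Batch),
    pack(Rest, <<Acc/binary, Byte>>).

%% Takes the next 8 bits, padding a short final batch with 0 bits.
take_batch(Bits) when length(Bits) >= 8 ->
    lists:split(8, Bits);
take_batch(Bits) ->
    {Bits ++ lists:duplicate(8 - length(Bits), 0), []}.

%% LSB numbering: reverse the batch so that its first element ends up
%% in the lowest bit of the byte.
batch_to_byte(Batch) ->
    lists:foldl(fun(Bit, Acc) -> (Acc bsl 1) bor Bit end,
                0, lists:reverse(Batch)).
```

With this sketch, validity_sketch:sketch_bitmap([1, undefined, 3]) returns {<<5>>, 1}: the validities 1, 0, 1 are padded to 1, 0, 1, 0, 0, 0, 0, 0 and then reversed, which gives the byte 2#00000101.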
[1]: https://arrow.apache.org/docs/format/Columnar.html#validity-bitmaps
[2]: https://arrow.apache.org/docs/format/Columnar.html#null-count
Summary
Functions
validity_bitmap/1: Returns the Validity Bitmap of an Array, along with its Null Count.
Functions
-spec validity_bitmap(Value :: [arrow_type:native_type()] | list()) -> {Bitmap :: #buffer{}, non_neg_integer()}.