arrow_array behaviour (arrow v0.1.0)
Provides a record, a behaviour, and functions to work with Apache Arrow Arrays.
This module provides conveniences for working with Apache Arrow Arrays[1] of different Layouts[2]. First, it provides a record that represents every Layout. Second, it provides a behaviour that each layout implements to expose common functionality. Last, it provides functions for working with Arrays in a uniform manner.
The Structure of an Array
arrow's implementation of an Array has the following fields in its record definition:
layout, of type atom(), which represents the Layout of the Array.
type, of type arrow_type:arrow_type(), which represents the Logical Type[3] of the Array.
len, of type pos_integer(), which represents the Array's Length[4].
element_len, of type pos_integer() or undefined, which represents the Length of each element in an Array. Currently it only has an integer value in the Fixed-Size List Layout[5].
null_count, of type non_neg_integer(), which represents the Array's Null Count[4], or the number of undefined values in the Array.
validity_bitmap, which is a buffer (arrow_buffer) or the atom undefined, which represents the Array's Validity Bitmap[6].
offsets, which is a buffer (arrow_buffer), which represents the Offsets[7], or the start position of each slot in the data buffer of an Array.
data, which is a buffer (arrow_buffer), which represents the Array's Value Buffer, whose layout differs based on the Array Layout.
Certain fields are not required for certain layouts. For example, the Fixed-Size Primitive Layout does not need the offsets field, so it is set to undefined. Similarly, the validity_bitmap is not required when the Array has no null values, in which case it is also set to undefined.
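The field rules above can be illustrated with a small, self-contained sketch. It does not use the arrow library: the module name array_record_sketch, the placeholder_type atom, and the use of plain binaries and integer lists in place of arrow_buffer records are all assumptions made for illustration. The record only mirrors the field names listed above.

```erlang
-module(array_record_sketch).
-export([fixed_primitive/1, variable_binary/1, summary/1]).

%% Hypothetical mirror of the record described above. Field names follow
%% the documentation; the defaults are assumptions of this sketch.
-record(array, {
    layout,
    type,
    len,
    element_len = undefined,
    null_count = 0,
    validity_bitmap = undefined,
    offsets = undefined,
    data
}).

%% Fixed-size primitive layout: every slot has the same width, so the
%% offsets field stays undefined. Values are assumed to be integers that
%% fit in one byte, and none of them is null.
fixed_primitive(Values) ->
    #array{layout = fixed_primitive,
           type = placeholder_type,
           len = length(Values),
           data = << <<V:8>> || V <- Values >>}.

%% Variable-size binary layout: offsets record where each slot starts in
%% the data, followed by the end position of the last slot. A plain list
%% of integers stands in for the real offsets buffer.
variable_binary(Binaries) ->
    {Starts, End} = lists:mapfoldl(
        fun(Bin, Pos) -> {Pos, Pos + byte_size(Bin)} end, 0, Binaries),
    #array{layout = variable_binary,
           type = placeholder_type,
           len = length(Binaries),
           offsets = Starts ++ [End],
           data = iolist_to_binary(Binaries)}.

%% Exposes a few fields as a map so callers do not need the record.
summary(#array{layout = L, len = N, validity_bitmap = V, offsets = O, data = D}) ->
    #{layout => L, len => N, validity_bitmap => V, offsets => O, data => D}.
```

In both constructors the validity_bitmap keeps its undefined default because the inputs contain no null values.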
The Behaviour of an Array
Currently, a layout only needs to implement the from_erlang/2 callback. The from_erlang/3 function then uses this callback to create new arrays.
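One way to picture the relationship between the callback and from_erlang/3 is a dispatch on the layout atom. The sketch below is an assumption about the general shape, not the library's actual code: the module name layout_dispatch_sketch, the module_for/1 helper, and the map returned in place of an array record are all hypothetical.

```erlang
-module(layout_dispatch_sketch).
-export([from_erlang/3, from_erlang/2]).

%% Hypothetical dispatcher: picks the module that implements the layout
%% and delegates to its from_erlang/2 callback.
from_erlang(Layout, Values, Opts) ->
    Module = module_for(Layout),
    Module:from_erlang(Values, Opts).

%% Only one layout is wired up in this sketch; it maps to this module.
module_for(fixed_primitive) -> ?MODULE;
module_for(Layout) -> erlang:error({unsupported_layout, Layout}).

%% Stand-in for a layout's from_erlang/2 callback. It returns a map
%% rather than an array record to keep the sketch self-contained.
from_erlang(Values, Opts) ->
    #{layout => fixed_primitive, len => length(Values), opts => Opts}.
```

With this shape, supporting another layout means adding one clause to module_for/1 and one module that exports from_erlang/2.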
Functions for working with Arrays
Currently, the module only provides functions to access an array's fields, to create new arrays, and to serialize arrays to the Arrow binary form.
References
[1]: https://arrow.apache.org/docs/format/Glossary.html#term-array
[2]: https://arrow.apache.org/docs/format/Glossary.html#term-physical-layout
[3]: https://arrow.apache.org/docs/format/Glossary.html#term-type
[4]: https://arrow.apache.org/docs/format/Columnar.html#null-count
[5]: https://arrow.apache.org/docs/format/Columnar.html#fixed-size-list-layout
[6]: https://arrow.apache.org/docs/format/Columnar.html#validity-bitmaps
[7]: https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout
Types
-type array() :: #array{}.
-type layout() :: fixed_primitive | variable_binary | fixed_list | variable_list.
Callbacks
-callback from_erlang(Value :: [arrow_type:native_type()], Opts :: map()) -> Array :: #array{}.
Functions
-spec data(Array :: #array{}) -> Data :: #buffer{} | #array{} | undefined.
-spec element_len(Array :: #array{}) -> Length :: pos_integer() | undefined.
-spec from_erlang(Layout :: layout(), Value :: [arrow_type:native_type()], Opts :: map() | arrow_type:arrow_type()) -> Array :: #array{}.
-spec layout(Array :: #array{}) -> Layout :: layout().
-spec len(Array :: #array{}) -> Length :: pos_integer().
-spec null_count(Array :: #array{}) -> NullCount :: non_neg_integer().
-spec offsets(Array :: #array{}) -> Offsets :: #buffer{} | undefined.
-spec to_arrow(Array :: #array{}) -> Arrow :: binary().
Serializes an array into the Arrow binary form.
Serializes the buffers of an Array and concatenates them in the following order:
validity
offsets
data
If an array does not have one of these buffers, that buffer is omitted (e.g. validity in arrays with a null count of 0, offsets in fixed primitive arrays). In the case of a nested array, data will be the serialized form of the nested array.
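The ordering and omission rules described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's implementation: the module name to_arrow_sketch is hypothetical, arrays are modelled as maps, buffers as plain binaries, and any padding or alignment the real serializer may apply is ignored.

```erlang
-module(to_arrow_sketch).
-export([serialize/1]).

%% Concatenates the buffers of an array (modelled as a map) in the order
%% validity, offsets, data. Missing or undefined buffers are skipped.
serialize(Array) when is_map(Array) ->
    Buffers = [maps:get(Key, Array, undefined)
               || Key <- [validity_bitmap, offsets, data]],
    iolist_to_binary([serialize_buffer(B) || B <- Buffers, B =/= undefined]).

%% A plain binary is emitted as-is; a nested array is serialized
%% recursively, so its bytes take the place of the data buffer.
serialize_buffer(Bin) when is_binary(Bin) -> Bin;
serialize_buffer(Nested) when is_map(Nested) -> serialize(Nested).
```

For example, an array with only a data buffer serializes to that buffer alone, while an array whose data is itself an array contributes the nested array's serialized bytes after its own offsets.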
-spec type(Array :: #array{}) -> Type :: arrow_type:arrow_type().
-spec validity_bitmap(Array :: #array{}) -> ValidityBitmap :: #buffer{} | undefined.