arrow_array behaviour (arrow v0.1.0)

Provides a record, a behaviour, and functions to work with Apache Arrow Arrays.

As mentioned, this module provides conveniences for working with Apache Arrow Arrays[1] of different Layouts[2]. Firstly, it provides a record to represent all Layouts. Secondly, it provides a behaviour so that the different layouts adhere to common functionality. Lastly, it provides functions to work with Arrays in a common manner.

The Structure of an Array

arrow's implementation of an Array has the following fields in its record definition:

  1. layout, of type atom(), which represents the Layout of the Array.
  2. type, of type arrow_type:arrow_type(), which represents the Logical Type[3] of the Array.
  3. len, of type pos_integer(), which represents the Array's Length[4].
  4. element_len, of type pos_integer() or undefined, which represents the Length of each element in an Array. Currently it only has an integer value in the Fixed-Size List Layout[5].
  5. null_count, of type non_neg_integer(), which represents the Array's Null Count[4], or the number of undefined values in the Array.
  6. validity_bitmap, which is a buffer (arrow_buffer) or the atom undefined, which represents the Array's Validity Bitmap[6].
  7. offsets, which is a buffer (arrow_buffer) or the atom undefined, which represents the Offsets[7], or the start position of each slot in the data buffer of an Array.
  8. data, which is a buffer (arrow_buffer), or a nested Array in nested layouts, which represents the Array's Value Buffer, whose layout differs based on the Array Layout.
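
To make the offsets field concrete, here is a minimal sketch of how offsets can be derived for a variable-size binary array. It assumes the Arrow columnar convention of signed 32-bit little-endian offsets with one more entry than the number of slots. The module name offsets_sketch and the function offsets/1 are hypothetical and not part of arrow, and the result is a plain binary rather than an arrow_buffer.

```erlang
-module(offsets_sketch).
-export([offsets/1]).

%% Hypothetical helper, not part of the arrow library.
%% Given a list of binaries, return the offsets as a plain binary of
%% signed 32-bit little-endian integers. Entry I is the start position
%% of slot I in the data buffer; the final entry is the total length.
offsets(Binaries) ->
    {Reversed, _Total} =
        lists:foldl(
          fun(Bin, {Acc, Pos}) ->
                  Next = Pos + byte_size(Bin),
                  {[Next | Acc], Next}
          end,
          {[0], 0},
          Binaries),
    << <<O:32/little-signed>> || O <- lists:reverse(Reversed) >>.
```

For the input [<<"joe">>, <<>>, <<"mark">>] the offsets are 0, 3, 3 and 7: the empty second slot starts and ends at position 3.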

Certain fields are not required for certain layouts. For example, the Fixed-Size Primitive Layout does not need the offsets field, so it is set to undefined. Similarly, the validity_bitmap is not required if there are no null values, in which case it is also set to undefined.
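
The following sketch illustrates how these fields might be populated for a Fixed-Size Primitive array. The record below only mirrors the documented field list; the real definition lives in the arrow library and may differ in field order or defaults. The module name array_fields_sketch and its functions are hypothetical. The bitmap is a plain, unpadded binary here, whereas the library wraps buffers in arrow_buffer; the type and data fields are left at their default of undefined to keep the sketch short.

```erlang
-module(array_fields_sketch).
-export([describe/1, validity_bitmap/1]).

%% Hypothetical record mirroring the documented fields (an assumption,
%% not the library's actual definition).
-record(array, {layout, type, len, element_len, null_count,
                validity_bitmap, offsets, data}).

%% Fill in the fields that can be derived from the values alone.
%% undefined in the input list marks a null slot.
describe(Values) ->
    NullCount = length([V || V <- Values, V =:= undefined]),
    Bitmap = case NullCount of
                 0 -> undefined;              % no nulls: bitmap not required
                 _ -> validity_bitmap(Values)
             end,
    #array{layout = fixed_primitive,
           len = length(Values),
           element_len = undefined,           % only set for fixed-size lists
           null_count = NullCount,
           validity_bitmap = Bitmap,
           offsets = undefined}.              % not required for this layout

%% Arrow uses least-significant-bit numbering: slot I is stored in
%% bit (I rem 8) of byte (I div 8), with 1 meaning "valid".
validity_bitmap(Values) ->
    pack([case V of undefined -> 0; _ -> 1 end || V <- Values]).

pack([]) -> <<>>;
pack(Bits) ->
    {Byte, Rest} = take8(Bits, 0, 0),
    <<Byte:8, (pack(Rest))/binary>>.

%% Consume up to eight bits, accumulating them into one byte.
take8(Rest, 8, Acc) -> {Acc, Rest};
take8([], _N, Acc) -> {Acc, []};
take8([B | T], N, Acc) -> take8(T, N + 1, Acc bor (B bsl N)).
```

For [1, 2, undefined, 4] the null count is 1 and the bitmap is the single byte 2#00001011; for a list without undefined values the bitmap stays undefined.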

The Behaviour of an Array

Currently, a layout needs to implement the from_erlang/2 callback, which the from_erlang/3 function then uses to create new arrays.
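
As a rough illustration of this callback pattern, the sketch below defines a toy callback with the same argument shape as from_erlang/2 and a dispatcher that forwards to it. How arrow_array:from_erlang/3 resolves a layout atom to a callback module is not documented here, so the dispatcher simply takes the module as an argument. The module behaviour_sketch, the function dispatch/3, and the returned map are all hypothetical; a real callback returns an #array{} record.

```erlang
-module(behaviour_sketch).
-export([from_erlang/2, dispatch/3]).

%% Toy callback shaped like from_erlang/2: a list of values plus a map
%% of options. It returns a map instead of an #array{} record so that
%% the sketch stays self-contained.
from_erlang(Value, Opts) ->
    #{len => length(Value),
      null_count => length([V || V <- Value, V =:= undefined]),
      opts => Opts}.

%% Hypothetical dispatcher: calls the from_erlang/2 callback of the
%% module it is given.
dispatch(Module, Value, Opts) ->
    Module:from_erlang(Value, Opts).
```

Calling behaviour_sketch:dispatch(behaviour_sketch, [1, undefined, 3], #{}) runs the toy callback and returns a map with len 3 and null_count 1.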

Functions for working with Arrays

Currently, the only functions that exist are those to access the various fields, to create new arrays, and to serialize arrays to the Arrow binary form.

References

[1]: https://arrow.apache.org/docs/format/Glossary.html#term-array

[2]: https://arrow.apache.org/docs/format/Glossary.html#term-physical-layout

[3]: https://arrow.apache.org/docs/format/Glossary.html#term-type

[4]: https://arrow.apache.org/docs/format/Columnar.html#null-count

[5]: https://arrow.apache.org/docs/format/Columnar.html#fixed-size-list-layout

[6]: https://arrow.apache.org/docs/format/Columnar.html#validity-bitmaps

[7]: https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout

Summary

Types

Represents an Array.
Represents the Layout of an Array.

Callbacks

Creates a new array of a certain layout from its Erlang representation, given its value and options.

Functions

Returns the data of an array.
Returns the length of each element in an array.
A common way to create a new array from its Erlang representation, given its layout, value, and options.
Returns the layout of an array.
Returns the length of an array.
Returns the null count of an array.
Returns the offsets of an array.

Serializes an array into the Arrow binary form.

Returns the type of an array.
Returns the validity bitmap of an array.

Types

-type array() :: #array{}.
Represents an Array.
-type layout() :: fixed_primitive | variable_binary | fixed_list | variable_list.
Represents the Layout of an Array.

Callbacks

-callback from_erlang(Value :: [arrow_type:native_type()], Opts :: map()) -> Array :: #array{}.
Creates a new array of a certain layout from its Erlang representation, given its value and options.

Functions

-spec data(Array :: #array{}) -> Data :: #buffer{} | #array{} | undefined.
Returns the data of an array.
-spec element_len(Array :: #array{}) -> Length :: pos_integer() | undefined.
Returns the length of each element in an array.
-spec from_erlang(Layout :: layout(),
            Value :: [arrow_type:native_type()],
            Opts :: map() | arrow_type:arrow_type()) ->
               Array :: #array{}.
A common way to create a new array from its Erlang representation, given its layout, value, and options.
-spec layout(Array :: #array{}) -> Layout :: layout().
Returns the layout of an array.
-spec len(Array :: #array{}) -> Length :: pos_integer().
Returns the length of an array.
-spec null_count(Array :: #array{}) -> NullCount :: non_neg_integer().
Returns the null count of an array.
-spec offsets(Array :: #array{}) -> Offsets :: #buffer{} | undefined.
Returns the offsets of an array.
-spec to_arrow(Array :: #array{}) -> Arrow :: binary().

Serializes an array into the Arrow binary form.

Serializes the buffers of an Array and concatenates them in the following order:

  1. validity
  2. offsets
  3. data

If an array lacks any of the above buffers, that buffer is omitted (e.g. validity in arrays with a null count of 0, offsets in fixed primitive arrays). In the case of a nested array, data is the serialized form of the nested array.

Note that this is only a binary form containing the buffers of an Array, not the Arrow IPC format.
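
The concatenation order described above can be sketched as follows. This is not the library's implementation: the module to_arrow_sketch and the function serialize/3 are hypothetical, the buffers are plain binaries rather than arrow_buffer records, and any padding the real to_arrow/1 may apply is ignored.

```erlang
-module(to_arrow_sketch).
-export([serialize/3]).

%% Hypothetical helper, not part of the arrow library.
%% Concatenate the validity, offsets and data buffers in that order,
%% leaving out any buffer that is undefined.
serialize(Validity, Offsets, Data) ->
    Present = [B || B <- [Validity, Offsets, Data], B =/= undefined],
    iolist_to_binary(Present).
```

With only a data buffer, as in a fixed primitive array without nulls, serialize(undefined, undefined, <<1, 2, 3>>) returns the data unchanged; when all three buffers are present, the validity bytes come first and the data bytes last.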
-spec type(Array :: #array{}) -> Type :: arrow_type:arrow_type().
Returns the type of an array.
-spec validity_bitmap(Array :: #array{}) -> ValidityBitmap :: #buffer{} | undefined.
Returns the validity bitmap of an array.