Home¶
Overview¶
batcharray is a lightweight library built on top of NumPy
to manipulate nested data structures containing NumPy arrays.
This library provides functions for arrays where the first axis is the batch axis.
It also provides functions for arrays representing a batch of sequences where the first axis
is the batch axis and the second axis is the sequence axis.
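The axis convention described above can be illustrated with plain NumPy indexing. This is only a minimal sketch of the convention (batch on axis 0, sequence on axis 1); the variable names are illustrative and no batcharray function is used here.

```python
import numpy as np

# A batch of 2 sequences (axis 0), each with 3 time steps (axis 1).
batch = np.arange(6).reshape(2, 3)
# [[0 1 2]
#  [3 4 5]]

# Slicing along the batch axis keeps whole sequences.
first_item = batch[:1]  # shape (1, 3)

# Slicing along the sequence axis keeps every item but fewer time steps.
first_two_steps = batch[:, :2]  # shape (2, 2)
```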
Key Features¶
- Batch Operations: Manipulate arrays with a dedicated batch dimension
- Sequence Support: Handle time-series and sequential data efficiently
- Nested Structures: Work with dictionaries and lists of arrays seamlessly
- Masked Arrays: Built-in support for handling missing or invalid data
- Type Safety: Computation models for different array types
- Minimal Dependencies: Lightweight, with NumPy as the only core dependency
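To give a sense of what masked data along a batch axis looks like, here is a small sketch that uses only NumPy's own `numpy.ma` module. It does not show batcharray's masked-array functions; it only illustrates the kind of input they are meant for, and the data values are made up for the example.

```python
import numpy as np

# Batch of 3 items with 2 features each; the mask flags one invalid entry.
data = np.array([[1.0, 2.0], [3.0, -1.0], [5.0, 6.0]])
mask = np.array([[False, False], [False, True], [False, False]])
batch = np.ma.masked_array(data, mask=mask)

# Reductions along the batch axis (axis 0) skip masked entries.
mean_per_feature = batch.mean(axis=0)

# Slicing along the batch axis carries the mask along with the data.
first_two = batch[:2]
```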
Quick Links¶
- Get Started - Installation and setup
- Tutorials - Step-by-step guides
- User Guide - Detailed documentation
- API Reference - Complete function reference
Motivation¶
Suppose you have a batch represented by a dictionary of three arrays, and you want
to keep only the first 2 items.
batcharray provides the function slice_along_batch, which slices all the arrays at once:
>>> import numpy as np
>>> from batcharray.nested import slice_along_batch
>>> batch = {
... "a": np.array([[2, 6], [0, 3], [4, 9], [8, 1], [5, 7]]),
... "b": np.array([4, 3, 2, 1, 0]),
... "c": np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
... }
>>> slice_along_batch(batch, stop=2)
{'a': array([[2, 6], [0, 3]]), 'b': array([4, 3]), 'c': array([1., 2.])}
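For comparison, the same result can be obtained by hand for a flat dictionary with a dictionary comprehension. The helper below, `slice_flat_batch`, is a hypothetical name introduced for illustration; it is not part of batcharray and is not how the library is implemented. It assumes a flat dictionary, whereas slice_along_batch is described as working on nested structures.

```python
import numpy as np


def slice_flat_batch(batch, start=0, stop=None, step=1):
    """Hypothetical helper: slice every array of a flat dict along axis 0."""
    return {key: value[start:stop:step] for key, value in batch.items()}


batch = {
    "a": np.array([[2, 6], [0, 3], [4, 9], [8, 1], [5, 7]]),
    "b": np.array([4, 3, 2, 1, 0]),
    "c": np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
}
first_two = slice_flat_batch(batch, stop=2)
```

Writing this comprehension for every operation, and extending it to lists and nested dictionaries, is the repetitive work the library aims to remove.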
Similarly, you can split a batch into multiple smaller batches with the
function split_along_batch:
>>> import numpy as np
>>> from batcharray.nested import split_along_batch
>>> batch = {
... "a": np.array([[2, 6], [0, 3], [4, 9], [8, 1], [5, 7]]),
... "b": np.array([4, 3, 2, 1, 0]),
... "c": np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
... }
>>> split_along_batch(batch, split_size_or_sections=2)
[{'a': array([[2, 6], [0, 3]]), 'b': array([4, 3]), 'c': array([1., 2.])},
{'a': array([[4, 9], [8, 1]]), 'b': array([2, 1]), 'c': array([3., 4.])},
{'a': array([[5, 7]]), 'b': array([0]), 'c': array([5.])}]
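The chunking behaviour shown above (chunks of 2 items, with a smaller final chunk) can be sketched in plain NumPy as follows. `split_flat_batch` is a hypothetical helper written for this illustration, not a batcharray function; it assumes a flat dictionary whose arrays all share the same batch size, and it only covers the case where an integer chunk size is given.

```python
import numpy as np


def split_flat_batch(batch, split_size):
    """Hypothetical helper: split a flat dict of arrays into chunks along axis 0."""
    # Assumes every array has the same length along the batch axis.
    batch_size = len(next(iter(batch.values())))
    return [
        {key: value[i : i + split_size] for key, value in batch.items()}
        for i in range(0, batch_size, split_size)
    ]


batch = {
    "a": np.array([[2, 6], [0, 3], [4, 9], [8, 1], [5, 7]]),
    "b": np.array([4, 3, 2, 1, 0]),
    "c": np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
}
chunks = split_flat_batch(batch, split_size=2)
```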
Please check the documentation to see all the implemented functions.
Documentation Structure¶
For Beginners¶
- Get Started - Install and set up batcharray
- Working with Batches Tutorial - Learn basic batch operations
For Users¶
- Working with Sequences Tutorial - Handle time-series data
- Advanced Nested Operations Tutorial - Master complex structures
- User Guide: Array Operations - Complete guide to array functions
- User Guide: Nested Structures - Working with nested data
- User Guide: Computation Models - Low-level abstractions
API Reference¶
- array module - Single array operations
- nested module - Nested structure operations
- computation module - Computation models
- constants module - Package constants
- types module - Type definitions
API stability¶
While batcharray is in the development stage, no API is guaranteed to be stable from one
release to the next.
In fact, it is very likely that the API will change multiple times before a stable 1.0.0 release.
In practice, this means that upgrading batcharray to a new version may break code
that was written against an older version.
License¶
batcharray is licensed under the BSD 3-Clause "New" or "Revised" license, available
in the LICENSE file.