
Overview

batchtensor is a lightweight library built on top of PyTorch for manipulating nested data structures of PyTorch tensors. It provides functions for tensors whose first dimension is the batch dimension, as well as for tensors representing batches of sequences, where the first dimension is the batch dimension and the second dimension is the sequence dimension.
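To make this dimension convention concrete, here is a small sketch using plain PyTorch indexing on a tensor of shape (batch, sequence, feature):

```python
import torch

# A batch of 4 sequences, each with 6 time steps and 2 features:
# dim 0 is the batch dimension, dim 1 is the sequence dimension.
x = torch.randn(4, 6, 2)

print(x[:2].shape)     # first 2 items in the batch -> torch.Size([2, 6, 2])
print(x[:, :3].shape)  # first 3 steps of every sequence -> torch.Size([4, 3, 2])
```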

Key Features

  • Nested Structure Support: Work with dictionaries, lists, and tuples containing tensors
  • Batch Operations: Efficiently process batches of data along the batch dimension
  • Sequence Operations: Handle sequential/temporal data along the sequence dimension
  • Consistent API: Unified interface for both single tensors and nested structures
  • Type Safety: Fully typed with comprehensive type hints
  • Well Documented: Extensive documentation with examples for all functions
  • Lightweight: Minimal dependencies (PyTorch and coola)
  • Performance: Leverages PyTorch's optimized operations
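The nested-structure support above can be pictured as a recursive map over containers. The following is a minimal sketch of that idea, not batchtensor's actual implementation, and it assumes only dicts, lists, and tuples of tensors:

```python
import torch

def recursive_apply(data, fn):
    """Apply ``fn`` to every tensor found in a nested dict/list/tuple.

    A simplified sketch of the idea; batchtensor's internals may differ.
    """
    if isinstance(data, torch.Tensor):
        return fn(data)
    if isinstance(data, dict):
        return {key: recursive_apply(value, fn) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return type(data)(recursive_apply(item, fn) for item in data)
    return data  # leave non-tensor leaves untouched

batch = {"a": torch.arange(10).view(5, 2), "b": [torch.ones(5), torch.zeros(5)]}
sliced = recursive_apply(batch, lambda t: t[:2])  # keep the first 2 batch items
print(sliced["a"].shape)  # torch.Size([2, 2])
```

Applying the same callable to every leaf tensor is what lets a single function handle both a bare tensor and an arbitrarily nested structure with one consistent interface.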

Motivation

Imagine you have a batch represented by a dictionary of three tensors, and you want to take the first two items. batchtensor provides the function slice_along_batch, which slices all the tensors at once:

>>> import torch
>>> from batchtensor.nested import slice_along_batch
>>> batch = {
...     "a": torch.tensor([[2, 6], [0, 3], [4, 9], [8, 1], [5, 7]]),
...     "b": torch.tensor([4, 3, 2, 1, 0]),
...     "c": torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0]),
... }
>>> slice_along_batch(batch, stop=2)
{'a': tensor([[2, 6], [0, 3]]), 'b': tensor([4, 3]), 'c': tensor([1., 2.])}

Similarly, it is possible to split a batch into multiple smaller batches with the function split_along_batch:

>>> import torch
>>> from batchtensor.nested import split_along_batch
>>> batch = {
...     "a": torch.tensor([[2, 6], [0, 3], [4, 9], [8, 1], [5, 7]]),
...     "b": torch.tensor([4, 3, 2, 1, 0]),
...     "c": torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0]),
... }
>>> split_along_batch(batch, split_size_or_sections=2)
({'a': tensor([[2, 6], [0, 3]]), 'b': tensor([4, 3]), 'c': tensor([1., 2.])},
 {'a': tensor([[4, 9], [8, 1]]), 'b': tensor([2, 1]), 'c': tensor([3., 4.])},
 {'a': tensor([[5, 7]]), 'b': tensor([0]), 'c': tensor([5.])})
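Conceptually, this is equivalent to applying torch.split to each tensor along dim 0 and regrouping the chunks into one dictionary per chunk. The sketch below shows that equivalence with plain PyTorch; the library's actual implementation may differ:

```python
import torch

batch = {
    "a": torch.tensor([[2, 6], [0, 3], [4, 9], [8, 1], [5, 7]]),
    "b": torch.tensor([4, 3, 2, 1, 0]),
}

# Split each tensor along the batch dimension (dim 0) into chunks of size 2...
chunks = {key: value.split(2, dim=0) for key, value in batch.items()}

# ...then regroup the i-th chunk of every tensor into one dict per chunk.
num_chunks = len(next(iter(chunks.values())))
result = tuple(
    {key: parts[i] for key, parts in chunks.items()} for i in range(num_chunks)
)
print(result[-1])  # {'a': tensor([[5, 7]]), 'b': tensor([0])}
```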

Please check the user guide and the API reference for the full list of implemented functions and detailed examples.

API stability

⚠ While batchtensor is in its development stage, no API is guaranteed to be stable from one release to the next. In fact, the API is likely to change multiple times before a stable 1.0.0 release. In practice, this means that upgrading batchtensor to a new version may break code that relied on the previous version.

License

batchtensor is licensed under the BSD 3-Clause "New" or "Revised" license, which is available in the LICENSE file.