Constants¶

The batchtensor.constants module defines the constants that batchtensor uses to identify the batch and sequence dimensions of a tensor.

Overview¶

The constants module provides standardized dimension indices that ensure consistency across all batchtensor operations. These constants are used internally by the library and can also be used in your own code when working with batchtensor functions.

Available Constants¶

BATCH_DIM¶

The batch dimension constant identifies the dimension used for batching in tensors.

>>> from batchtensor.constants import BATCH_DIM
>>> BATCH_DIM
0

Usage: This constant is set to 0, so batchtensor functions always treat the first dimension (dimension 0) of a tensor as the batch dimension.

Convention:

For batch tensors: shape is (batch_size, *)
The batch dimension contains independent samples
Operations along this dimension act across the samples in the batch (for example, summing along it adds the samples together)
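
As a minimal sketch of this convention, the snippet below splits a batch into its independent samples with plain PyTorch. The try/except fallback is an assumption added only so the sketch runs without batchtensor installed; it reuses the value documented on this page.

```python
import torch

try:
    from batchtensor.constants import BATCH_DIM
except ImportError:  # fallback so the sketch runs without batchtensor installed
    BATCH_DIM = 0  # same value as documented above

# A batch of 3 independent samples with 4 features each: shape (batch_size, *).
batch = torch.arange(12).reshape(3, 4)

# Splitting along the batch dimension yields one tensor per sample.
samples = torch.unbind(batch, dim=BATCH_DIM)
num_samples = len(samples)
first_sample = samples[0]

print(num_samples)         # number of samples in the batch
print(first_sample.shape)  # shape of a single sample
```

Each element of samples has the batch dimension removed, so it carries only the per-sample shape.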

SEQ_DIM¶

The sequence dimension constant identifies the dimension used for sequences in tensors.

>>> from batchtensor.constants import SEQ_DIM
>>> SEQ_DIM
1

Usage: This constant is set to 1, so batchtensor functions always treat the second dimension (dimension 1) of a tensor as the sequence dimension.

Convention:

For sequence tensors: shape is (batch_size, seq_len, *)
The sequence dimension contains sequential/temporal data
Operations along this dimension act across time steps or sequence positions
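
The following sketch illustrates two typical operations along the sequence dimension using plain PyTorch: a running sum over time steps and selecting the last time step. The try/except fallback is an assumption added only so the sketch runs without batchtensor installed.

```python
import torch

try:
    from batchtensor.constants import SEQ_DIM
except ImportError:  # fallback so the sketch runs without batchtensor installed
    SEQ_DIM = 1  # same value as documented above

# Shape: (batch_size=2, seq_len=3, features=2)
data = torch.tensor(
    [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]]
)

# Running sum over the time steps of each sequence; the shape is unchanged.
running = data.cumsum(dim=SEQ_DIM)

# Last time step of every sequence; the sequence dimension is dropped,
# leaving shape (batch_size, features).
last_index = data.size(SEQ_DIM) - 1
last_step = data.select(dim=SEQ_DIM, index=last_index)

print(running.shape)
print(last_step.shape)
```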

Why Use Constants?¶

Using constants instead of hard-coded numbers provides several benefits:

Code Clarity: BATCH_DIM is more readable than 0
Consistency: Ensures all operations use the same dimension conventions
Maintainability: If dimension conventions ever change, updating the constant updates all uses
Self-Documentation: Code using these constants is self-explanatory
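
To make the clarity argument concrete, here is a small sketch comparing a hard-coded index with the named constant. Both calls compute the same result; only the second states which dimension is being reduced. The try/except fallback is an assumption added only so the sketch runs without batchtensor installed.

```python
import torch

try:
    from batchtensor.constants import SEQ_DIM
except ImportError:  # fallback so the sketch runs without batchtensor installed
    SEQ_DIM = 1  # same value as documented above

# Shape: (batch_size=2, seq_len=3, features=4)
x = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)

hard_coded = x.mean(dim=1)   # the reader has to know what dimension 1 means
named = x.mean(dim=SEQ_DIM)  # the intent (average over time steps) is explicit

same = torch.equal(hard_coded, named)
print(same)
```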

Practical Examples¶

Using Constants in Your Code¶

When working with batchtensor functions, you can use these constants to make your code more explicit:

>>> import torch
>>> from batchtensor.constants import BATCH_DIM, SEQ_DIM
>>> # Create a batch of sequences
>>> # Shape: (batch_size=2, seq_len=3, features=4)
>>> data = torch.randn(2, 3, 4)
>>> # Check dimensions
>>> batch_size = data.size(BATCH_DIM)
>>> seq_len = data.size(SEQ_DIM)
>>> print(f"Batch size: {batch_size}, Sequence length: {seq_len}")
Batch size: 2, Sequence length: 3

Manual Operations with Constants¶

When you need to perform operations directly with PyTorch but want to stay consistent with batchtensor conventions, pass the constants as the dim argument:

>>> import torch
>>> from batchtensor.constants import BATCH_DIM, SEQ_DIM
>>> data = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
>>> # Sum along batch dimension
>>> batch_sum = data.sum(dim=BATCH_DIM)
>>> batch_sum
tensor([[ 6,  8],
        [10, 12]])
>>> # Sum along sequence dimension
>>> seq_sum = data.sum(dim=SEQ_DIM)
>>> seq_sum
tensor([[ 4,  6],
        [12, 14]])

Verifying Tensor Shapes¶

Use constants to verify that your tensors have the expected shape:

>>> import torch
>>> from batchtensor.constants import BATCH_DIM, SEQ_DIM
>>> def validate_batch_tensor(tensor, expected_batch_size):
...     """Validate that a tensor has the expected batch size."""
...     actual_batch_size = tensor.size(BATCH_DIM)
...     if actual_batch_size != expected_batch_size:
...         raise ValueError(
...             f"Expected batch size {expected_batch_size}, got {actual_batch_size}"
...         )
...     return True
...
>>> tensor = torch.randn(32, 10)  # batch_size=32, features=10
>>> validate_batch_tensor(tensor, expected_batch_size=32)
True

Creating Batches and Sequences¶

Use constants when manually creating or reshaping tensors:

>>> import torch
>>> from batchtensor.constants import BATCH_DIM, SEQ_DIM
>>> # Create individual samples
>>> sample1 = torch.tensor([[1, 2], [3, 4]])  # seq_len=2, features=2
>>> sample2 = torch.tensor([[5, 6], [7, 8]])
>>> # Stack into a batch along batch dimension
>>> batch = torch.stack([sample1, sample2], dim=BATCH_DIM)
>>> batch.shape
torch.Size([2, 2, 2])
>>> # Verify dimensions
>>> print(f"Batch size: {batch.size(BATCH_DIM)}")
Batch size: 2
>>> print(f"Sequence length: {batch.size(SEQ_DIM)}")
Sequence length: 2

Internal Usage¶

These constants are used internally throughout batchtensor. For example (simplified pseudocode):

import torch

from batchtensor.constants import BATCH_DIM, SEQ_DIM


# In batchtensor.tensor.reduction
def sum_along_batch(tensor: torch.Tensor, keepdim: bool = False) -> torch.Tensor:
    """Sum tensor along the batch dimension."""
    return tensor.sum(dim=BATCH_DIM, keepdim=keepdim)


# In batchtensor.tensor.slicing (simplified for illustration)
def select_along_seq(tensor: torch.Tensor, index: int) -> torch.Tensor:
    """Select a specific index along the sequence dimension."""
    return tensor.select(dim=SEQ_DIM, index=index)

This ensures consistency across all functions in the library.

Compatibility with Other Libraries¶

While batchtensor uses these specific dimension conventions, they are compatible with common PyTorch practices:

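Batch-first convention: Many PyTorch modules work with batch as the first dimension. nn.Linear accepts inputs of shape (*, in_features), and recurrent modules such as nn.GRU use a batch-first layout when constructed with batch_first=True (their default is sequence-first)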
Standard computer vision: Images are typically (batch, channels, height, width), compatible with BATCH_DIM=0
NLP sequence models: With batch_first=True, sequences are (batch, seq_len, features), matching our conventions
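
The compatibility points above can be sketched with standard PyTorch modules. The layer sizes here are arbitrary illustration values, and the try/except fallback is an assumption added only so the sketch runs without batchtensor installed.

```python
import torch
from torch import nn

try:
    from batchtensor.constants import BATCH_DIM, SEQ_DIM
except ImportError:  # fallback so the sketch runs without batchtensor installed
    BATCH_DIM, SEQ_DIM = 0, 1  # same values as documented above

# Shape: (batch_size=4, seq_len=7, features=5)
inputs = torch.randn(4, 7, 5)

# With batch_first=True, nn.GRU consumes and produces (batch, seq_len, *) tensors.
gru = nn.GRU(input_size=5, hidden_size=8, batch_first=True)
outputs, _ = gru(inputs)

# nn.Linear acts on the last dimension only, so the leading dimensions are preserved.
head = nn.Linear(8, 2)
logits = head(outputs)

# Data for a sequence-first module can be obtained by swapping the two dimensions.
seq_first = inputs.transpose(BATCH_DIM, SEQ_DIM)

print(outputs.shape, logits.shape, seq_first.shape)
```

If a module is left at its sequence-first default, the transposed tensor is the one to pass in, and BATCH_DIM and SEQ_DIM no longer describe that tensor's layout.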

Best Practices¶

Import the constants when you need to reference dimensions explicitly
Use them in assertions to validate tensor shapes in your code
Prefer batchtensor functions over manual dimension handling when possible
Document assumptions about tensor shapes in your own functions
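
A minimal sketch of these practices combined is shown below. The helper mean_over_time is hypothetical (it is not part of batchtensor); it documents its shape assumptions in the docstring and checks them with an assertion. The try/except fallback is an assumption added only so the sketch runs without batchtensor installed.

```python
import torch

try:
    from batchtensor.constants import SEQ_DIM
except ImportError:  # fallback so the sketch runs without batchtensor installed
    SEQ_DIM = 1  # same value as documented above


def mean_over_time(tensor: torch.Tensor) -> torch.Tensor:
    """Average a sequence tensor over its time steps (hypothetical helper).

    Args:
        tensor: A floating-point tensor of shape (batch_size, seq_len, *).

    Returns:
        A tensor of shape (batch_size, *).
    """
    # A sequence tensor needs at least a batch and a sequence dimension.
    assert tensor.dim() > SEQ_DIM, "expected shape (batch_size, seq_len, *)"
    return tensor.mean(dim=SEQ_DIM)


out = mean_over_time(torch.ones(2, 3, 4))
print(out.shape)
```

An assertion is enough for internal sanity checks; for validating user input, raising a ValueError as in the validate_batch_tensor example is usually the better choice because assertions can be disabled.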