Data Loading
The XPCSDataLoader class provides a
unified interface for loading two-time correlation matrices from various
file formats produced by synchrotron beamline pipelines.
Supported Formats
The loader auto-detects the file format from the internal structure:
- HDF5 (.h5, .hdf5): The most common output from beamline reduction pipelines. Auto-detection distinguishes:
  - APS-U format – Current APS upgrade pipeline layout.
  - APS legacy format – Older 8-ID-I / XPCS analysis pipeline.
  - Generic HDF5 – Any HDF5 file containing a C2 or two_time dataset at a known path.
- NumPy (.npz): Compressed NumPy archives containing a c2 matrix plus t1 and t2 time-axis arrays. Useful for sharing preprocessed data or synthetic test cases.
- MATLAB (.mat): Version 5 MAT files with variables C2 and t.
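Detection from internal structure, rather than from the file extension, can be illustrated at the container level with the standard leading bytes of each format. This is only a sketch and not the loader's actual implementation: `sniff_container` is a hypothetical helper, and it does not attempt the APS-U / APS legacy / generic distinction, which depends on dataset paths inside the HDF5 file.

```python
# Standard leading bytes of each container format.
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"      # HDF5 superblock signature
ZIP_MAGIC = b"PK\x03\x04"              # .npz archives are ZIP files
MAT5_MAGIC = b"MATLAB 5.0 MAT-file"    # start of the MAT v5 header text


def sniff_container(path):
    """Return 'hdf5', 'npz', 'mat5', or 'unknown' (hypothetical helper).

    Assumes the signature sits at offset 0, which is the usual case.
    """
    with open(path, "rb") as fh:
        head = fh.read(32)
    if head.startswith(HDF5_MAGIC):
        return "hdf5"
    if head.startswith(ZIP_MAGIC):
        return "npz"
    if head.startswith(MAT5_MAGIC):
        return "mat5"
    return "unknown"
```

Because the check reads content rather than the name, a misnamed file (for example an NPZ archive saved with an .h5 extension) would still be classified by what it actually contains.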
Basic Usage
from heterodyne.data.xpcs_loader import XPCSDataLoader
# Load a single-angle dataset
loader = XPCSDataLoader("run42_q3.h5")
data = loader.load()
print(data.c2.shape) # (N_frames, N_frames)
print(data.t1[:5]) # First 5 row time-axis values in seconds
print(data.t2[:5]) # First 5 column time-axis values
The returned XPCSData object carries
the \(C_2\) matrix, the t1 and t2 time axes, and any metadata
present in the source file (q-value, temperature, exposure time, etc.).
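To make the shape of the returned object concrete, the following is a minimal stand-in built from the description above. It is not the real XPCSData class: the name `XPCSDataSketch` and the `metadata` attribute are assumptions for illustration, and the values are synthetic.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class XPCSDataSketch:
    """Hypothetical stand-in for XPCSData; the real class may differ."""
    c2: np.ndarray          # two-time correlation matrix, shape (N, N)
    t1: np.ndarray          # row time axis, shape (N,)
    t2: np.ndarray          # column time axis, shape (N,)
    metadata: dict = field(default_factory=dict)  # attribute name is assumed


# Synthetic example: 4 frames at 0.1 s spacing, flat correlation.
t = np.arange(4) * 0.1
example = XPCSDataSketch(
    c2=np.ones((4, 4)),
    t1=t,
    t2=t.copy(),
    metadata={"q": 0.01},   # made-up q-value, for illustration only
)
print(example.c2.shape, example.t1[-1])
```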
Multi-Angle Data
For heterodyne analysis each azimuthal angle \(\phi\) typically corresponds to a separate file or dataset group. Load them individually and pass the angle values to the fitting functions:
import numpy as np
phi_angles = [0.0, 22.5, 45.0, 67.5, 90.0, 112.5, 135.0, 157.5]
datasets = []
for phi in phi_angles:
    loader = XPCSDataLoader(f"run42_phi{phi:.1f}.h5")
    datasets.append(loader.load())
# Stack C2 matrices for multi-angle fitting
c2_stack = np.stack([d.c2 for d in datasets], axis=0)
print(c2_stack.shape) # (8, N_frames, N_frames)
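`np.stack` only works when every matrix has the same shape, so a mismatch in frame count between angles surfaces as a fairly terse error. A small wrapper can report the offending shapes up front. `stack_c2` below is a hypothetical helper, not part of the package, and the demo uses synthetic arrays in place of loaded data.

```python
import numpy as np


def stack_c2(matrices):
    """Stack equally shaped square matrices along a new leading axis."""
    shapes = {m.shape for m in matrices}
    if len(shapes) != 1:
        # Covers both an empty input and mixed shapes.
        raise ValueError(f"inconsistent C2 shapes: {sorted(shapes)}")
    (shape,) = shapes
    if len(shape) != 2 or shape[0] != shape[1]:
        raise ValueError(f"expected square 2-D matrices, got {shape}")
    return np.stack(matrices, axis=0)


# Synthetic stand-ins for eight loaded C2 matrices of 16 frames each.
demo = [np.ones((16, 16)) for _ in range(8)]
print(stack_c2(demo).shape)
```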
Inspecting Loaded Data
Before fitting, verify that the data is well-formed:
data = loader.load()
# Check for NaN or non-positive values on the diagonal
diag = np.diag(data.c2)
assert not np.any(np.isnan(diag)), "NaN on C2 diagonal"
assert np.all(diag > 0), "Non-positive diagonal values"
# Verify time axes are monotonically increasing
dt1 = np.diff(data.t1)
dt2 = np.diff(data.t2)
assert np.all(dt1 > 0), "Non-monotonic t1"
assert np.all(dt2 > 0), "Non-monotonic t2"
# Print summary
print(f"Frames: {data.c2.shape[0]}")
print(f"Duration: {data.t1[-1] - data.t1[0]:.1f} s")
print(f"Frame rate: {1.0 / np.median(dt1):.1f} Hz")
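When screening many files, it can be more convenient to collect every problem than to stop at the first failed assertion. The sketch below applies the same checks as above and returns a list of messages. `find_c2_problems` is a hypothetical helper that takes plain arrays, so it does not depend on the loader.

```python
import numpy as np


def find_c2_problems(c2, t1, t2):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if c2.ndim != 2 or c2.shape[0] != c2.shape[1]:
        return [f"C2 is not a square matrix: shape {c2.shape}"]
    if c2.shape != (len(t1), len(t2)):
        problems.append("time-axis lengths do not match C2 shape")
    diag = np.diag(c2)
    nan_mask = np.isnan(diag)
    if nan_mask.any():
        problems.append("NaN on C2 diagonal")
    if np.any(diag[~nan_mask] <= 0):
        problems.append("non-positive diagonal values")
    for name, axis in (("t1", t1), ("t2", t2)):
        if np.any(np.diff(axis) <= 0):
            problems.append(f"non-monotonic {name}")
    return problems


# Synthetic demo: a deliberately broken t1 axis.
t = np.array([0.0, 0.1, 0.2])
print(find_c2_problems(np.ones((3, 3)), np.array([0.0, 0.2, 0.1]), t))
```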
NPZ Caching
For large HDF5 files, the loader supports transparent NPZ caching.
On the first load, a .npz companion file is written next to the
source. Subsequent loads read the NPZ directly, which is significantly
faster.
Cache validity is checked via mtime comparison: if the source file is newer than the cache, the cache is regenerated automatically.
loader = XPCSDataLoader("run42_q3.h5", use_cache=True)
data = loader.load() # First call: reads HDF5, writes .npz cache
data = loader.load() # Second call: reads .npz (faster)
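The mtime rule described above can be sketched with the standard library alone. This is an illustration of the comparison, not the loader's code: `cache_is_fresh` is a hypothetical helper, and the cache path is passed in explicitly because the exact companion file name is not specified here.

```python
from pathlib import Path


def cache_is_fresh(source, cache):
    """True if `cache` exists and `source` is not newer than it."""
    source, cache = Path(source), Path(cache)
    if not cache.exists():
        return False            # nothing cached yet: must read the source
    # Regenerate only when the source was modified after the cache.
    return source.stat().st_mtime <= cache.stat().st_mtime
```

One caveat of any mtime-based scheme: copying or restoring files can change timestamps without changing content, so deleting the .npz file is a simple way to force a rebuild.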
Memory Management
For datasets with thousands of frames the \(C_2\) matrix can consume gigabytes of memory. The loader uses adaptive chunking when reading HDF5 files to avoid peak-memory spikes:
- Files smaller than 2 GB are read in a single pass.
- Larger files are read in row-chunks and assembled incrementally.
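The two branches above can be sketched as follows. This is a simplified illustration, not the loader's implementation: the function names, the `chunk_rows` default, and passing the size in as `nbytes` are all assumptions. The `dataset` argument only needs `.shape`, `.dtype`, and row slicing, which both NumPy arrays and h5py datasets provide, so the demo uses a NumPy array as a stand-in.

```python
import numpy as np

TWO_GB = 2 * 1024**3  # threshold from the text; how size is measured is assumed


def read_rows_chunked(dataset, chunk_rows=256):
    """Copy a 2-D array-like into memory one block of rows at a time."""
    n_rows, n_cols = dataset.shape
    out = np.empty((n_rows, n_cols), dtype=dataset.dtype)
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        out[start:stop] = dataset[start:stop]   # only this block is in flight
    return out


def read_adaptive(dataset, nbytes, chunk_rows=256):
    """Single pass below the threshold, row-chunks at or above it."""
    if nbytes < TWO_GB:
        return np.asarray(dataset[...])
    return read_rows_chunked(dataset, chunk_rows)


# Stand-in for an on-disk dataset.
source = np.arange(25.0).reshape(5, 5)
assert np.array_equal(read_rows_chunked(source, chunk_rows=2), source)
```

Preallocating the output and filling it block by block keeps the transient overhead to one chunk, instead of holding a full temporary copy alongside the final array.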
If memory is still a concern, consider trimming the frame range before fitting:
# Load only frames 100 to 500
data = loader.load(frame_start=100, frame_end=500)
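To judge how much trimming helps, note that a dense matrix over N frames holds N² values, so memory grows quadratically with the frame count. The estimate below assumes 8-byte (float64) elements, which is an assumption about the stored dtype. `c2_nbytes` is a hypothetical helper.

```python
def c2_nbytes(n_frames, itemsize=8):
    """Bytes for a dense n_frames x n_frames matrix (float64 by default)."""
    return n_frames * n_frames * itemsize


print(c2_nbytes(20_000) / 1e9)  # 3.2  (GB for 20,000 frames)
print(c2_nbytes(400) / 1e6)     # 1.28 (MB for a window of about 400 frames)
```

Halving the frame range cuts the matrix to a quarter of its size, so even modest trimming can matter for large runs.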