Recording Setup
This section explains how to define a Recording object, the primary input required to run spike sorting.
SpikeSift reads extracellular recordings directly from disk using memory-mapped access.
This requires a binary file with a well-defined structure and known probe geometry.
What format should the data have?
The binary file must represent a 2D array of shape (num_samples, num_channels).
Data must be stored in sample-major (channel-interleaved) order — all channel values for the first sample come first, followed by all channel values for the second sample, and so on.
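To make the byte order concrete, here is a small sketch that writes a tiny int16 array with NumPy and reads the raw values back. It uses only NumPy, not SpikeSift, and the file name and sizes are illustrative. Because ndarray.tofile writes in C (row-major) order, a (num_samples, num_channels) array is saved in exactly the sample-major layout described above.

```python
import os
import tempfile

import numpy as np

# Toy recording: 3 samples x 4 channels. Each value is 10 * sample + channel,
# so its position in the file shows which sample/channel it belongs to.
num_samples, num_channels = 3, 4
data = np.array(
    [[10 * s + c for c in range(num_channels)] for s in range(num_samples)],
    dtype=np.int16,
)

# tofile writes the array in C (row-major) order: for shape
# (num_samples, num_channels) this is sample-major, channel-interleaved.
path = os.path.join(tempfile.mkdtemp(), "toy.dat")
data.tofile(path)

# Raw values on disk: all channels of sample 0, then sample 1, then sample 2.
flat = np.fromfile(path, dtype=np.int16)

# Reshaping to (num_samples, num_channels) recovers the original array.
restored = flat.reshape(num_samples, num_channels)
```

If the file were channel-major instead (all samples of channel 0 first), the same reshape would silently scramble the data, which is why the ordering requirement matters.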
Each value must match the specified data_type. Supported data types include:
int8, uint8
int16, uint16
int32, uint32
int64, uint64
float32, float64
The file must contain only the signal data, with no headers, markers, or extra content,
unless these are skipped explicitly via the header and sample_offset parameters.
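A quick way to catch a mismatched data_type, a wrong channel count, or an unskipped header is to check that the file size is a whole number of samples. The helper infer_num_samples below is hypothetical (it is not part of SpikeSift) and assumes a headerless file in the layout described above. Note that the check cannot catch every mistake: a file can divide evenly under a wrong data_type too.

```python
import os
import tempfile

import numpy as np


def infer_num_samples(path, data_type, num_channels):
    """Hypothetical helper: number of complete samples in a headerless file."""
    itemsize = np.dtype(data_type).itemsize
    frame_bytes = itemsize * num_channels  # bytes per sample (all channels)
    file_bytes = os.path.getsize(path)
    if file_bytes % frame_bytes != 0:
        raise ValueError(
            f"{file_bytes} bytes is not a multiple of {frame_bytes} bytes per sample "
            "(wrong data_type, wrong channel count, or an unskipped header?)"
        )
    return file_bytes // frame_bytes


# Example: 1000 samples x 16 channels of int16 = 32,000 bytes.
path = os.path.join(tempfile.mkdtemp(), "recording.dat")
np.zeros((1000, 16), dtype=np.int16).tofile(path)
n = infer_num_samples(path, "int16", 16)
```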
What does the probe geometry represent?
The probe_geometry defines the physical layout of the recording sites.
It must be a NumPy array of shape (num_channels, 2), where each row gives the (x, y) position (in micrometers) of one channel on the probe.
This layout is used for:
Grouping nearby channels during waveform extraction
Tracking and aligning drifting neurons across time
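This section does not describe the exact rule SpikeSift uses to group nearby channels. As an illustration only, the sketch below assumes a simple distance criterion: the neighbors of a channel are all channels whose (x, y) positions lie within a hypothetical radius_um of it. The helper neighbors_within is hypothetical as well.

```python
import numpy as np

# A vertical 16-channel probe: x = 0, y = 0, 20, ..., 300 micrometers.
probe = np.array([[0, i * 20] for i in range(16)], dtype=np.float32)


def neighbors_within(probe_geometry, radius_um):
    """Hypothetical helper: indices of all channels within radius_um of each channel."""
    # Pairwise differences, shape (num_channels, num_channels, 2).
    diff = probe_geometry[:, None, :] - probe_geometry[None, :, :]
    # Euclidean distances, shape (num_channels, num_channels).
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Each channel counts as its own neighbor (distance 0).
    return [np.flatnonzero(row <= radius_um) for row in dist]


groups = neighbors_within(probe, radius_um=40.0)
# Channel 5 sits at y = 100, so its neighbors are channels 3 through 7.
```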
Warning
The order of channels in probe_geometry must exactly match the order of channels in the binary file. If your acquisition software saves channels in a different order, reorder the rows of probe_geometry accordingly.
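One way to do that reordering with NumPy is sketched below. It assumes you have a channel map (hypothetical here, for example taken from your acquisition settings) in which entry k names the physical site recorded in column k of the binary file.

```python
import numpy as np

# Site positions in physical order along the shank (tip first), 20 um pitch.
probe_by_site = np.array([[0, i * 20] for i in range(4)], dtype=np.float32)

# Hypothetical channel map: column k of the binary file was recorded
# from site channel_map[k].
channel_map = np.array([2, 0, 3, 1])

# Sanity check: the map must be a permutation of all sites.
assert sorted(channel_map.tolist()) == list(range(len(probe_by_site)))

# Fancy indexing reorders the rows so that row k describes file column k.
probe_geometry = probe_by_site[channel_map]
```

If your map goes the other way (entry i gives the file column of site i), invert it first with np.argsort(channel_map), since argsort of a permutation is its inverse.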
How to define the probe layout
You can define the probe manually or load it from a file.
Example: a vertical probe with 16 channels spaced 20 micrometers apart:
import numpy as np
from spikesift import Recording
# 16 sites in a single column (x = 0), spaced 20 micrometers apart along y
probe = np.array([[0, i * 20] for i in range(16)], dtype=np.float32)
recording = Recording(
    binary_file="recording.dat",
    data_type="int16",
    probe_geometry=probe,
    sampling_frequency=20000,
)
Note
If your probe is stored in .prb, .json, .nwb, or other formats, consider using the probeinterface library to convert it into a NumPy array.
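For simple layouts, a plain text file may be enough. The sketch below assumes a hypothetical probe.csv with one "x,y" line per channel, in micrometers and in binary-file channel order, and loads it with numpy.loadtxt. Passing ndmin=2 keeps the result two-dimensional even for a single-channel probe.

```python
import os
import tempfile

import numpy as np

# Write a hypothetical probe.csv: 16 sites, x = 0, y every 20 micrometers.
path = os.path.join(tempfile.mkdtemp(), "probe.csv")
with open(path, "w") as f:
    for i in range(16):
        f.write(f"0,{i * 20}\n")

# Load it as a (num_channels, 2) float array of (x, y) positions.
probe = np.loadtxt(path, delimiter=",", dtype=np.float32, ndmin=2)
if probe.shape[1] != 2:
    raise ValueError(f"expected 2 columns (x, y), got shape {probe.shape}")
```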
How to handle headers and padding
If the binary file contains metadata or padding before the actual data, use:
header: number of bytes to skip at the beginning of the file
sample_offset: number of samples (not bytes) to skip after the header
These options help SpikeSift locate the start of the valid signal.
Example: skip a 1024-byte header and 1000 samples of padding:
recording = Recording(
    binary_file="recording_with_header.dat",
    data_type="float32",
    probe_geometry=probe,
    sampling_frequency=30000,
    header=1024,
    sample_offset=1000,
)
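Assuming that sample_offset skips whole samples (all channels of each sample), the signal starts header + sample_offset * num_channels * itemsize bytes into the file. With the 16-channel probe defined earlier and float32 data, that is 1024 + 1000 * 16 * 4 = 65,024 bytes. The sketch below checks this arithmetic against a synthetic file using numpy.memmap; it illustrates the offset calculation and is not SpikeSift's own reader.

```python
import os
import tempfile

import numpy as np

num_channels = 16
data_type = np.dtype("float32")
header, sample_offset = 1024, 1000

# Assumption: sample_offset skips whole samples (all channels), so the
# valid signal starts this many bytes into the file.
start_byte = header + sample_offset * num_channels * data_type.itemsize
# 1024 + 1000 * 16 * 4 = 65024

# Build a synthetic file: header bytes, padding samples, then 50 signal samples.
signal = np.arange(50 * num_channels, dtype=data_type).reshape(50, num_channels)
path = os.path.join(tempfile.mkdtemp(), "recording_with_header.dat")
with open(path, "wb") as f:
    f.write(b"\x00" * header)
    np.zeros((sample_offset, num_channels), dtype=data_type).tofile(f)
    signal.tofile(f)

# Memory-map only the valid signal as a (num_samples, num_channels) view.
view = np.memmap(path, dtype=data_type, mode="r", offset=start_byte,
                 shape=(50, num_channels))
```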
Other useful parameters
Additional arguments provide greater flexibility:
num_samples: Restricts how many samples to read (after applying header and sample_offset). Useful if the file contains trailing padding or you only want to sort a portion of it.
recording_offset: Sets the logical start time (in samples) of this recording within a larger session. It does not affect how data are read, only how spike times are reported. This is essential for aligning multiple recordings.
Example: read 5 seconds of data and report spike times as if the recording started at 60 seconds:
recording = Recording(
    binary_file="block.dat",
    data_type="int16",
    probe_geometry=probe,
    sampling_frequency=30000,
    num_samples=5 * 30000,  # only read 5 seconds
    recording_offset=60 * 30000,  # treat this as starting at t = 60s
)
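This section states only that recording_offset changes how spike times are reported. Assuming each reported time is the spike's sample index within the read window plus recording_offset, converting to session time in seconds is a single division. The spike indices below are made up for illustration.

```python
import numpy as np

sampling_frequency = 30000
recording_offset = 60 * sampling_frequency  # block starts at t = 60 s

# Made-up spike sample indices, counted from the start of the read window
# (num_samples = 5 * 30000, so valid indices run from 0 to 149999).
local_spike_samples = np.array([0, 1500, 149999])

# Assumption: reported time = local index + recording_offset.
session_samples = local_spike_samples + recording_offset
session_seconds = session_samples / sampling_frequency
```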
Note
recording_offset ensures that spike times from separate files or blocks remain correctly aligned in time when they are merged.
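When a session is split into consecutive files with no gaps between them, each block's recording_offset can be taken as the total number of samples in all earlier blocks. The sketch below assumes that contiguity and uses made-up block lengths and spike indices; if there are gaps between files, use each file's true start sample instead.

```python
import numpy as np

# Hypothetical session: three consecutive blocks of 5 s, 8 s and 3 s at 30 kHz.
block_lengths = [5 * 30000, 8 * 30000, 3 * 30000]

# Each block's recording_offset is the total length of the blocks before it.
offsets = np.concatenate(([0], np.cumsum(block_lengths)[:-1]))

# Made-up spike times per block, as local sample indices.
local_spikes = [np.array([100, 2000]), np.array([50]), np.array([0, 89999])]

# Shift each block by its offset, then merge into one session-wide train.
merged = np.sort(np.concatenate([s + o for s, o in zip(local_spikes, offsets)]))
```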