API Reference

This section documents the main user-facing functions and classes provided by SpikeSift. All internal modules and helper functions are omitted for clarity.

class spikesift.Recording(*, binary_file, data_type, probe_geometry, sampling_frequency, num_samples=None, header=0, sample_offset=0, recording_offset=0)

Represents an extracellular recording stored in a flat binary file.

This class manages metadata and provides efficient access to raw voltage data for spike sorting. It assumes a flat binary layout with channels interleaved sample-wise.

Parameters:

binary_file (str) – Path to the binary file containing the raw recording.
data_type (dtype) – NumPy-compatible data type (e.g., float32, int16).
probe_geometry (ndarray of shape (recording_channels, 2)) – Spatial coordinates (in micrometers) of each recording channel.
sampling_frequency (float) – Sampling rate in Hz. Must be at least 1000.
num_samples (int, optional) – Total number of samples to load. If omitted, the number is inferred from file size and header.
header (int, optional (default=0)) – Number of bytes to skip at the beginning of the file.
sample_offset (int, optional (default=0)) – Number of samples to skip after the header.
recording_offset (int, optional (default=0)) – Logical start time in samples, used for aligning or merging segments. Does not affect how data are read.

Warning

After creation, this object should be treated as read-only.
Binary layout must be flat and channel-interleaved sample-wise.
The order of channels in probe_geometry must match the binary file.
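
The expected file layout can be illustrated without SpikeSift. The following is a minimal NumPy sketch, not SpikeSift's own reader: it writes a small sample-interleaved file with a header, infers the sample count from the file size, and reads a block back. The file name, the channel count, and the helper `read_block` are hypothetical, and the way `sample_offset` enters the inferred count is an assumption.

```python
import os
import tempfile

import numpy as np

def read_block(path, dtype, num_channels, header=0, sample_offset=0, num_samples=None):
    """Read a (num_samples, num_channels) block from a flat, sample-interleaved file."""
    dtype = np.dtype(dtype)
    frame_bytes = dtype.itemsize * num_channels  # one sample across all channels
    if num_samples is None:
        # Assumption: everything after the header and the skipped samples is data.
        total = (os.path.getsize(path) - header) // frame_bytes
        num_samples = total - sample_offset
    flat = np.fromfile(
        path,
        dtype=dtype,
        count=num_samples * num_channels,
        offset=header + sample_offset * frame_bytes,
    )
    # Consecutive values belong to consecutive channels of the same sample.
    return flat.reshape(num_samples, num_channels)

# Build a toy file: 16-byte header followed by 5 samples x 4 channels of int16.
data = np.arange(20, dtype=np.int16).reshape(5, 4)
path = os.path.join(tempfile.mkdtemp(), "toy_recording.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 16)
    f.write(data.tobytes())  # C order: sample 0 (all channels), then sample 1, ...

# Skip the header and the first two samples; the remaining three are returned.
block = read_block(path, "int16", num_channels=4, header=16, sample_offset=2)
```

Row `i` of `block` holds one sample across all channels, so column `j` must correspond to row `j` of `probe_geometry`.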

VALID_DATA_TYPES = ('int8', 'uint8', 'int16', 'uint16', 'int32', 'uint32', 'int64', 'uint64', 'float32', 'float64')

read(*, start, num_samples)

Reads a segment of the binary recording.

Parameters:

start (int) – Index of the first sample to read, counted from the first sample that follows the header and sample_offset.
num_samples (int) – Number of consecutive samples to read.

Returns:

Extracted signal data as a NumPy array.

Return type:

ndarray, shape (num_samples, recording_channels)

Warning

This method is intended for debugging and manual inspection only.
SpikeSift handles all necessary data access internally during sorting.
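
For manual inspection, a short window can be summarized per channel. This is a rough sketch that relies only on the documented return shape of `read()`; the helper name `channel_summary` and its default window length are hypothetical.

```python
import numpy as np

def channel_summary(recording, start=0, num_samples=30000):
    """Per-channel median and robust spread of a short window, for eyeballing data."""
    # read() is documented to return an array of shape (num_samples, recording_channels).
    window = recording.read(start=start, num_samples=num_samples).astype(np.float64)
    median = np.median(window, axis=0)
    # Median absolute deviation: a spread estimate that large spikes barely affect.
    mad = np.median(np.abs(window - median), axis=0)
    return median, mad
```

A channel whose spread is zero or far larger than its neighbours is often a sign of a wrong `data_type`, channel count, or `header` value.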

validate(*, verbose=False)

Finalizes setup and verifies recording consistency.

Parameters: verbose (bool, optional) – If True, prints a summary of the recording.
Raises: ValueError – If any of the file, geometry, or offset parameters are invalid.

Warning

This method is called automatically during spike sorting.
Manual calls are typically only necessary for debugging or inspection.
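
A possible way to construct a recording and surface configuration errors before sorting starts is sketched below. The single-column geometry, the 20 µm pitch, the 30 kHz rate, and the helper names are hypothetical; passing the type as a string from VALID_DATA_TYPES is an assumption.

```python
import numpy as np

def linear_probe_geometry(num_channels, pitch_um=20.0):
    """(x, y) positions in micrometers for a hypothetical single-column probe."""
    geometry = np.zeros((num_channels, 2), dtype=np.float64)
    geometry[:, 1] = np.arange(num_channels) * pitch_um
    return geometry

def open_recording(path, num_channels=32, sampling_frequency=30000.0):
    """Build a Recording and validate it, reporting configuration errors early."""
    import spikesift  # imported here so the geometry helper works without SpikeSift

    recording = spikesift.Recording(
        binary_file=path,
        data_type="int16",  # assumption: type names from VALID_DATA_TYPES are accepted
        probe_geometry=linear_probe_geometry(num_channels),
        sampling_frequency=sampling_frequency,
    )
    try:
        # Optional: sorting calls validate() automatically.
        recording.validate(verbose=True)
    except ValueError as err:
        raise ValueError(f"{path}: invalid recording setup: {err}") from err
    return recording
```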

class spikesift.core.SortedRecording(*, sorted_segments, assignment_chain, probe_geometry)

Represents a fully sorted and drift-corrected extracellular recording.

This class merges spike clusters across multiple independently sorted segments, and provides access to global spike times, amplitude vectors, and segment boundaries.

Parameters:

sorted_segments (list of SortedSegment (internal)) – List of sorted segments, each containing spike clusters and amplitude representations.
assignment_chain (list of ndarray of shape (num_clusters,)) –
One-to-one mappings between adjacent segments.
- Each array maps cluster indices from one segment to the next.
- Unassigned entries are marked with -1.
probe_geometry (ndarray of shape (recording_channels, 2)) – 2D electrode layout used for drift compensation.

Warning

Do not modify sorted_segments, assignment_chain, or probe_geometry in place. They are shared across recordings and treated as immutable.
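
The role of assignment_chain can be illustrated by composing the per-segment mappings by hand. This is an illustrative sketch, not SpikeSift's internal code: it assumes each array is indexed by the cluster index in the earlier segment and stores the index in the next segment, with -1 for unassigned entries, and that the chain contains at least one mapping. How SpikeSift derives its public cluster IDs from the chain is not specified here.

```python
import numpy as np

def compose_chain(assignment_chain):
    """Follow each cluster of the first segment through every later segment.

    Returns an array whose entry i is the index of cluster i in the last
    segment, or -1 if the cluster was lost at any step.
    """
    current = np.arange(len(assignment_chain[0]))
    for mapping in assignment_chain:
        mapping = np.asarray(mapping)
        alive = current >= 0                  # clusters still being tracked
        nxt = np.full_like(current, -1)
        nxt[alive] = mapping[current[alive]]  # lost clusters stay at -1
        current = nxt
    return current

# Three segments, hence two mappings between adjacent segments.
chain = [np.array([1, 0, -1]), np.array([-1, 2, 0])]
tracked = compose_chain(chain)  # array([ 2, -1, -1])
```

Only the first cluster survives both steps in this toy chain, which matches the rule that a cluster must be present in every segment to be valid.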

all_spikes()

Returns spike times for all valid clusters.

Returns: Dictionary mapping cluster IDs to spike times.
Return type: dict of int -> ndarray
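
As a usage sketch, mean firing rates can be derived from all_spikes() together with start_time() and end_time(). The helper name is hypothetical, and the sampling frequency is passed in by the caller because this reference does not document an accessor for it on SortedRecording.

```python
def firing_rates(sorted_recording, sampling_frequency):
    """Mean firing rate in Hz for every valid cluster."""
    # start_time() and end_time() are in samples, so convert the span to seconds.
    span = sorted_recording.end_time() - sorted_recording.start_time()
    duration_s = span / sampling_frequency
    return {
        cluster_id: len(times) / duration_s
        for cluster_id, times in sorted_recording.all_spikes().items()
    }
```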

amplitude_vectors(cluster_id)

Returns the amplitude vectors for a single cluster across all segments.

Parameters: cluster_id (int) – ID of the cluster.
Returns: Amplitude vector for each segment.
Return type: ndarray of shape (num_segments, recording_channels)
Raises: ValueError – If the cluster ID is not valid for this recording.

Warning

Values reflect both spike-related activity and background fluctuations, and may be nonzero even on channels where the neuron is inactive.
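
One way to inspect drift is to reduce each amplitude vector to an amplitude-weighted position on the probe. The sketch below is a simple center-of-mass estimate and not the drift model used by SpikeSift; the helper name and the choice of absolute amplitudes as weights are assumptions.

```python
import numpy as np

def amplitude_centroids(amplitudes, probe_geometry):
    """Amplitude-weighted (x, y) position per segment, in micrometers."""
    weights = np.abs(np.asarray(amplitudes, dtype=np.float64))  # (segments, channels)
    geometry = np.asarray(probe_geometry, dtype=np.float64)     # (channels, 2)
    # Weighted average of channel positions; rows of all zeros would divide by zero.
    return (weights @ geometry) / weights.sum(axis=1, keepdims=True)

geometry = np.array([[0.0, 0.0], [0.0, 20.0], [0.0, 40.0]])
amplitudes = np.array([[-1.0, -3.0, 0.0], [0.0, -3.0, -1.0]])
centroids = amplitude_centroids(amplitudes, geometry)  # y moves from 15 to 25 µm
```

Because the vectors also contain background fluctuations, the estimate is pulled toward the middle of the probe; restricting the weights to the largest channels is a common remedy.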

cluster_ids()

Returns all valid cluster IDs for this recording.

Returns: Set of cluster IDs that are valid across the entire recording.
Return type: set of int

Warning

IDs may refer to different units across different SortedRecording objects.
To compare clusters between recordings, use map_clusters().

end_time()

Returns the global end time of the recording (in samples).

Returns: End time in samples.
Return type: int

segment_boundaries()

Returns start and end sample indices for all segments.

Returns: List of (start_sample, end_sample) pairs, one per segment.
Return type: list of tuple
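
The boundaries can be combined with spike times to count spikes per segment. This sketch treats each pair as a half-open interval [start_sample, end_sample), which is an assumption, and the helper name is hypothetical.

```python
import numpy as np

def spikes_per_segment(spike_times, boundaries):
    """Count spikes in each (start_sample, end_sample) interval."""
    spike_times = np.sort(np.asarray(spike_times))
    counts = []
    for start, end in boundaries:
        # Assumption: intervals are half-open, [start, end).
        lo = np.searchsorted(spike_times, start, side="left")
        hi = np.searchsorted(spike_times, end, side="left")
        counts.append(int(hi - lo))
    return counts

counts = spikes_per_segment([5, 120, 130, 250, 299], [(0, 100), (100, 200), (200, 300)])
```

With a real result, the arguments would be `sorted_recording.spikes(cluster_id)` and `sorted_recording.segment_boundaries()`.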

spikes(cluster_id)

Returns spike times for the specified cluster.

Parameters: cluster_id (int) – The cluster ID to retrieve.
Returns: 1D NumPy array of spike times for the selected cluster.
Return type: ndarray
Raises: ValueError – If the cluster ID is not valid for this recording.

Warning

Cluster IDs are only valid within this SortedRecording instance.
To avoid invalid lookups, use .cluster_ids() to retrieve the set of valid cluster IDs.
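
A guarded lookup along these lines might look as follows. The helper name is hypothetical, and the conversion assumes that spike times are sample indices on the same axis as start_time().

```python
import numpy as np

def spike_times_seconds(sorted_recording, cluster_id, sampling_frequency):
    """Spike times of one cluster in seconds, or None if the ID is not valid."""
    if cluster_id not in sorted_recording.cluster_ids():
        return None  # avoids the ValueError that spikes() raises for unknown IDs
    # Assumption: spike times are sample indices, so dividing by Hz gives seconds.
    return np.asarray(sorted_recording.spikes(cluster_id)) / sampling_frequency
```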

split_into_segments()

Splits the recording into its original unmerged segments.

Returns: List of SortedRecording objects, one per original segment.
Return type: list of SortedRecording

start_time()

Returns the global start time of the recording (in samples).

Returns: Start time in samples.
Return type: int

valid_cluster_id(cluster_id)

Checks whether a cluster ID is valid across the entire recording.

Parameters: cluster_id (int) – The cluster ID to validate.
Returns: True if the cluster is consistently matched across all segments; False otherwise.
Return type: bool

Warning

A cluster is considered valid only if it is present in every segment of the recording.
Clusters that disappear or fragment in later segments will return False.

spikesift.perform_spike_sorting(recording, *, min_segment_length=10, detection_sensitivity=10, min_spikes_per_cluster=5, merging_threshold=0.4, max_drift=30, detection_polarity=-1, verbose=True)

Performs complete spike sorting on an extracellular recording.

Parameters:

recording (Recording) – The input recording object.
min_segment_length (float, optional (default=10)) –
Minimum segment duration (in seconds) for adaptive segmentation.
- Controls how the recording is partitioned
- Must be at least 0.1 seconds
- Values below 0.1 are automatically clipped to 0.1
- If the recording itself is shorter than this, it is processed as a single segment
detection_sensitivity (float, optional (default=10)) –
Multiplier for spike detection thresholds.
- Must be positive
- Higher values reduce false positives, but may miss weaker spikes
- Lower values increase sensitivity, but may introduce noise
min_spikes_per_cluster (float, optional (default=5)) –
Minimum number of spikes required for a cluster to be considered valid.
- Must be at least 2
- Values below 2 are silently clipped to 2
- Although spike counts are integers, this threshold is treated as a float and compared directly against the spike count
merging_threshold (float, optional (default=0.4)) –
Similarity threshold for merging clusters based on spatial waveform differences.
- Must be between 0 and 1 (exclusive)
- Higher values allow more aggressive merging
- Lower values enforce stricter separation
max_drift (float, optional (default=30)) –
Maximum vertical shift (in micrometers) used for aligning clusters across segments.
- Must be non-negative
- Internally rounded to the nearest multiple of 5
- Larger values enable alignment over larger displacements
detection_polarity (float, optional (default=-1)) –
Scalar applied to the signal prior to spike detection.
- Use -1.0 to detect negative-going spikes (default)
- Use +1.0 to detect positive-going spikes
- Any other nonzero value is allowed; only the sign affects detection
verbose (bool, optional (default=True)) – If True, displays a progress bar and recording information.
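
The clipping and rounding rules listed above can be mirrored in a few lines to preview the values that will effectively be used. This is an illustrative sketch, not SpikeSift's validation code: the helper name is hypothetical, and the handling of exact ties when rounding max_drift (for example 12.5) is an assumption.

```python
def normalize_parameters(min_segment_length=10, min_spikes_per_cluster=5, max_drift=30):
    """Mirror the documented clipping and rounding rules (illustrative only)."""
    if max_drift < 0:
        raise ValueError("max_drift must be non-negative")
    return {
        # Values below the documented minimum are clipped up to it.
        "min_segment_length": max(float(min_segment_length), 0.1),
        "min_spikes_per_cluster": max(float(min_spikes_per_cluster), 2.0),
        # Nearest multiple of 5; SpikeSift may resolve exact ties differently.
        "max_drift": 5.0 * round(max_drift / 5.0),
    }

effective = normalize_parameters(min_segment_length=0.01, min_spikes_per_cluster=1, max_drift=32)
```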

Returns:

A fully sorted recording, including spike times, cluster identities, and amplitude vectors.

Return type:

SortedRecording

Raises:

ValueError – If any input parameter is invalid or improperly typed.

Warning

Recordings shorter than 10 milliseconds cannot be processed and will raise an error.
SpikeSift requires at least 4 channels for spike sorting.
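
A typical call, preceded by a check of the documented minimum requirements, might look like the sketch below. The helper names and the `positive_spikes` flag are hypothetical; only the keyword arguments passed to perform_spike_sorting come from the signature above.

```python
def check_sortable(num_channels, num_samples, sampling_frequency):
    """Raise early if the documented minimum requirements are not met."""
    if sampling_frequency < 1000:
        raise ValueError("sampling_frequency must be at least 1000 Hz")
    if num_channels < 4:
        raise ValueError("SpikeSift requires at least 4 channels")
    if num_samples / sampling_frequency < 0.010:
        raise ValueError("recording is shorter than 10 milliseconds")

def sort_recording(recording, *, positive_spikes=False):
    """Run the sorter with default settings apart from polarity and verbosity."""
    import spikesift  # imported lazily so check_sortable works without SpikeSift

    return spikesift.perform_spike_sorting(
        recording,
        detection_polarity=1.0 if positive_spikes else -1.0,
        verbose=False,
    )
```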

spikesift.merge_recordings(sorted_recordings, *, max_drift=30)

Aligns and merges multiple independently sorted recordings into a unified result.

Parameters:

sorted_recordings (list of SortedRecording) –
List of independently sorted recordings to be merged. Each entry must:
- Contain at least one valid segment
- Use the same probe geometry
- Be sorted in time and have non-overlapping segments
max_drift (float, optional (default=30)) –
Maximum vertical shift (in micrometers) allowed when aligning clusters across segments.
- Must be non-negative
- Internally rounded to the nearest multiple of 5
- Higher values allow alignment over larger displacements

Returns:

A single merged recording containing all aligned spike clusters.

Return type:

SortedRecording

Raises:

ValueError – If the input list is empty, contains invalid types, includes inconsistent geometries, or includes overlapping segment time ranges.

Warning

This function assumes all inputs were produced by SpikeSift and remain unmodified.
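
Since the inputs must be ordered in time and must not overlap, it can help to check this before merging. The sketch below uses only start_time() and end_time(); the helper names are hypothetical, and treating end_time() as exclusive (so that touching ranges are allowed) is an assumption.

```python
def order_and_check(sorted_recordings):
    """Return the recordings ordered by start time, rejecting overlaps."""
    ordered = sorted(sorted_recordings, key=lambda rec: rec.start_time())
    for earlier, later in zip(ordered, ordered[1:]):
        # Assumption: end_time() is exclusive, so touching ranges are allowed.
        if later.start_time() < earlier.end_time():
            raise ValueError("recordings overlap in time")
    return ordered

def merge_in_time_order(sorted_recordings, max_drift=30):
    """Merge independently sorted chunks after ordering them in time."""
    import spikesift  # imported lazily so order_and_check works without SpikeSift

    return spikesift.merge_recordings(order_and_check(sorted_recordings), max_drift=max_drift)
```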

spikesift.map_clusters(source, target, *, max_drift=30)

Computes a one-to-one mapping from clusters in source to their counterparts in target.

Parameters:

source (SortedRecording) – First sorted recording to compare.
target (SortedRecording) – Second sorted recording to compare.
max_drift (float, optional (default=30)) –
Maximum vertical displacement (in micrometers) used during alignment.
- Must be non-negative
- Internally rounded to the nearest multiple of 5
- Higher values permit alignment across larger drift magnitudes

Returns:

Mapping from cluster IDs in source to corresponding cluster IDs in target. Only valid, unambiguous one-to-one matches are included.

Return type:

dict of int -> int

Raises:

ValueError – If inputs are invalid or incompatible (e.g., mismatched geometry).

Warning

This function assumes that both source and target were generated using SpikeSift and have not been manually modified.
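
Because only unambiguous one-to-one matches are returned, some clusters on either side may be missing from the mapping. A small sketch for separating matched pairs from leftovers follows; the helper names are hypothetical.

```python
def summarize_mapping(source_ids, target_ids, mapping):
    """Split cluster IDs into matched pairs and the leftovers on each side."""
    matched = sorted(mapping.items())
    unmatched_source = sorted(set(source_ids) - set(mapping))
    unmatched_target = sorted(set(target_ids) - set(mapping.values()))
    return matched, unmatched_source, unmatched_target

def compare_recordings(source, target, max_drift=30):
    """Map clusters between two sorted recordings and report what was left out."""
    import spikesift  # imported lazily so summarize_mapping works without SpikeSift

    mapping = spikesift.map_clusters(source, target, max_drift=max_drift)
    return summarize_mapping(source.cluster_ids(), target.cluster_ids(), mapping)
```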