API Reference

This section documents the main user-facing functions and classes provided by SpikeSift. All internal modules and helper functions are omitted for clarity.

class spikesift.Recording(*, binary_file, data_type, probe_geometry, sampling_frequency, num_samples=None, header=0, sample_offset=0, recording_offset=0)

Represents an extracellular recording stored in a flat binary file.

This class manages metadata and provides efficient access to raw voltage data for spike sorting. It assumes a flat binary layout with channels interleaved sample-wise: the values of all channels at sample 0, then the values of all channels at sample 1, and so on.

Parameters:
  • binary_file (str) – Path to the binary file containing the raw recording.

  • data_type (dtype) – NumPy-compatible data type (e.g., float32, int16).

  • probe_geometry (ndarray of shape (recording_channels, 2)) – Spatial coordinates (in micrometers) of each recording channel.

  • sampling_frequency (float) – Sampling rate in Hz. Must be at least 1000.

  • num_samples (int, optional) – Total number of samples to load. If omitted, the number is inferred from file size and header.

  • header (int, optional (default=0)) – Number of bytes to skip at the beginning of the file.

  • sample_offset (int, optional (default=0)) – Number of samples to skip after the header.

  • recording_offset (int, optional (default=0)) – Logical start time in samples, used for aligning or merging segments. Does not affect how data are read.

Warning

  • After creation, this object should be treated as read-only.

  • Binary layout must be flat and channel-interleaved sample-wise.

  • The order of channels in probe_geometry must match the binary file.
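To make the expected layout concrete, the sketch below uses plain NumPy (not SpikeSift) to write a small int16 array behind a dummy header, infer the sample count from the file size, and read it back. The file name, header size, and shapes are hypothetical, and the inference formula assumes the file contains only the header followed by whole frames of num_channels values.

```python
import os
import tempfile

import numpy as np

# Hypothetical example values.
num_channels = 4
num_samples = 6
header = 16  # bytes of arbitrary metadata before the first sample

# Row t holds the value of every channel at sample t (C order).
data = np.arange(num_samples * num_channels, dtype=np.int16).reshape(num_samples, num_channels)

path = os.path.join(tempfile.mkdtemp(), "example.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * header)  # placeholder header bytes
    f.write(data.tobytes())    # C-order bytes: sample 0 (ch 0..3), sample 1 (ch 0..3), ...

# Infer the sample count from file size and header (assumes whole frames only).
frame_bytes = num_channels * np.dtype(np.int16).itemsize
inferred_samples = (os.path.getsize(path) - header) // frame_bytes

# Read back: skip the header, then fold the flat stream into (samples, channels).
restored = np.fromfile(path, dtype=np.int16, offset=header).reshape(-1, num_channels)
```

Writing a C-ordered (num_samples, num_channels) array is one straightforward way to produce a file in this layout.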

VALID_DATA_TYPES = ('int8', 'uint8', 'int16', 'uint16', 'int32', 'uint32', 'int64', 'uint64', 'float32', 'float64')

Data type names accepted for data_type.

read(*, start, num_samples)

Reads a segment of the binary recording.

Parameters:
  • start (int) – Index of the first sample to read, counted from the first sample after the header and sample_offset (start=0 is the first sample that is not skipped).

  • num_samples (int) – Number of consecutive samples to read.

Returns:

Extracted signal data as a NumPy array.

Return type:

ndarray, shape (num_samples, recording_channels)

Warning

  • This method is intended for debugging and manual inspection only.

  • SpikeSift handles all necessary data access internally during sorting.
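The offset arithmetic implied by header, sample_offset, and start can be sketched with numpy.memmap. The helpers byte_offset and read_like below are hypothetical stand-ins written for illustration; they are not SpikeSift's implementation of read(), and the demo file is synthetic.

```python
import os
import tempfile

import numpy as np


def byte_offset(start, *, header, sample_offset, num_channels, dtype):
    """Hypothetical helper: byte position of sample `start`, counted after header and sample_offset."""
    frame_bytes = num_channels * np.dtype(dtype).itemsize  # one sample across all channels
    return header + (sample_offset + start) * frame_bytes


def read_like(path, *, start, num_samples, header, sample_offset, num_channels, dtype):
    """Hypothetical stand-in for read(): returns an array of shape (num_samples, num_channels)."""
    offset = byte_offset(start, header=header, sample_offset=sample_offset,
                         num_channels=num_channels, dtype=dtype)
    mm = np.memmap(path, dtype=dtype, mode="r", offset=offset,
                   shape=(num_samples, num_channels))
    return np.array(mm)  # copy so the result does not keep the file mapped


# Demo file: 8-byte header, then 10 samples x 4 channels of float32.
data = np.arange(40, dtype=np.float32).reshape(10, 4)
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 8)
    f.write(data.tobytes())

# With sample_offset=2, start=1 refers to stored sample 3.
segment = read_like(path, start=1, num_samples=3, header=8, sample_offset=2,
                    num_channels=4, dtype="float32")
```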

validate(*, verbose=False)

Finalizes setup and verifies recording consistency.

Parameters:

verbose (bool, optional) – If True, prints a summary of the recording.

Raises:

ValueError – If any of the file, geometry, or offset parameters are invalid.

Warning

  • This method is called automatically during spike sorting.

  • Manual calls are typically only necessary for debugging or inspection.
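Before constructing a Recording, you may want to catch obvious mistakes yourself. The hypothetical precheck function below restates only constraints documented on this page (VALID_DATA_TYPES, the (recording_channels, 2) geometry shape, and the 1000 Hz minimum), plus one assumption of its own: that the bytes after the header form whole frames. It is not the logic of validate().

```python
import numpy as np

# Copied from the documented Recording.VALID_DATA_TYPES tuple.
VALID_DATA_TYPES = ('int8', 'uint8', 'int16', 'uint16', 'int32', 'uint32',
                    'int64', 'uint64', 'float32', 'float64')


def precheck(*, file_size, data_type, probe_geometry, sampling_frequency, header=0):
    """Hypothetical pre-flight check; returns a list of problem descriptions (empty if none)."""
    problems = []
    name = np.dtype(data_type).name
    if name not in VALID_DATA_TYPES:
        problems.append(f"unsupported data_type {name!r}")
    geometry = np.asarray(probe_geometry)
    if geometry.ndim != 2 or geometry.shape[1] != 2:
        problems.append("probe_geometry must have shape (recording_channels, 2)")
    if sampling_frequency < 1000:
        problems.append("sampling_frequency must be at least 1000 Hz")
    if not problems:
        # Assumption: the data after the header consists of whole frames.
        frame_bytes = geometry.shape[0] * np.dtype(data_type).itemsize
        if (file_size - header) % frame_bytes != 0:
            problems.append("file size minus header is not a whole number of frames")
    return problems
```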

class spikesift.core.SortedRecording(*, sorted_segments, assignment_chain, probe_geometry)

Represents a fully sorted and drift-corrected extracellular recording.

This class merges spike clusters across multiple independently sorted segments and provides access to global spike times, amplitude vectors, and segment boundaries.

Parameters:
  • sorted_segments (list of SortedSegment (internal)) – List of sorted segments, each containing spike clusters and amplitude representations.

  • assignment_chain (list of ndarray of shape (num_clusters,)) –

    One-to-one mappings between adjacent segments.

    • Each array maps cluster indices from one segment to the next.

    • Unassigned entries are marked with -1.

  • probe_geometry (ndarray of shape (recording_channels, 2)) – 2D electrode layout used for drift compensation.
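One way to read an assignment chain is to follow a cluster from the first segment through each mapping in turn. The sketch assumes each array is indexed by cluster index in the earlier segment and holds the matching index in the next segment; a cluster that reaches -1 at any step is not matched in every segment. The helper follow_chain and the chain values are hypothetical, and how SortedRecording assigns global cluster IDs is not described here.

```python
import numpy as np


def follow_chain(cluster, assignment_chain):
    """Hypothetical helper: track `cluster` from segment 0 through each mapping; -1 means unmatched."""
    path = [cluster]
    for mapping in assignment_chain:
        if cluster != -1:
            cluster = int(mapping[cluster])
        path.append(cluster)
    return path


# Hypothetical chain for three segments, each with three clusters.
chain = [np.array([1, -1, 0]),   # segment 0 -> segment 1
         np.array([2, 0, -1])]   # segment 1 -> segment 2
```

Here cluster 0 of the first segment is followed as 0 → 1 → 0, while cluster 1 is lost after the first step.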

Warning

  • Do not modify sorted_segments, assignment_chain, or probe_geometry in place. They are shared across recordings and treated as immutable.

all_spikes()

Returns spike times for all valid clusters.

Returns:

Dictionary mapping cluster IDs to spike times.

Return type:

dict of int -> ndarray
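For raster plots or event-based analyses it is often convenient to flatten the dictionary into a single time-sorted array of spike times plus a parallel array of cluster labels. The helper flatten_spikes and the input dictionary are hypothetical; the values stand in for the output of all_spikes().

```python
import numpy as np


def flatten_spikes(spike_dict):
    """Turn {cluster_id: spike_times} into time-sorted (times, labels) arrays."""
    if not spike_dict:
        return np.array([], dtype=np.int64), np.array([], dtype=np.int64)
    times = np.concatenate([np.asarray(t) for t in spike_dict.values()])
    labels = np.concatenate([np.full(len(t), cid) for cid, t in spike_dict.items()])
    order = np.argsort(times, kind="stable")  # stable keeps ties in dictionary order
    return times[order], labels[order]


# Hypothetical stand-in for all_spikes().
spikes = {3: np.array([100, 450]), 7: np.array([20, 300, 460])}
times, labels = flatten_spikes(spikes)
```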

amplitude_vectors(cluster_id)

Returns the amplitude vectors for a single cluster across all segments.

Parameters:

cluster_id (int) – ID of the cluster.

Returns:

Amplitude vector for each segment.

Return type:

ndarray of shape (num_segments, recording_channels)

Raises:

ValueError – If the cluster ID is not valid for this recording.

Warning

  • Values reflect both spike-related activity and background fluctuations, and may be nonzero even on channels where the neuron is inactive.
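A quick way to inspect how a unit moves along the probe is to take, for each segment, the channel with the largest absolute amplitude and look up its coordinates in probe_geometry. This is a rough heuristic for visual inspection, not SpikeSift's drift-correction method; because amplitudes include background fluctuations, a weighted centroid may be more robust. The helper and arrays below are hypothetical.

```python
import numpy as np


def peak_positions(amplitudes, probe_geometry):
    """For each segment (row), return the (x, y) position of the channel with the largest |amplitude|."""
    peak_channels = np.argmax(np.abs(amplitudes), axis=1)
    return np.asarray(probe_geometry)[peak_channels]


# Hypothetical 4-channel linear probe (micrometers) and amplitude vectors for 3 segments.
geometry = np.array([[0, 0], [0, 20], [0, 40], [0, 60]])
amps = np.array([[-80.0, -30.0, -5.0, -2.0],
                 [-40.0, -75.0, -10.0, -3.0],
                 [-10.0, -35.0, -70.0, -6.0]])
positions = peak_positions(amps, geometry)
```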

cluster_ids()

Returns all valid cluster IDs for this recording.

Returns:

Set of cluster IDs that are valid across the entire recording.

Return type:

set of int

Warning

  • IDs may refer to different units across different SortedRecording objects.

  • To compare clusters between recordings, use map_clusters().

end_time()

Returns the global end time of the recording (in samples).

Returns:

End time in samples.

Return type:

int

segment_boundaries()

Returns start and end sample indices for all segments.

Returns:

List of (start_sample, end_sample) pairs, one per segment.

Return type:

list of tuple
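Segment boundaries are given in samples, so converting them to durations requires the sampling frequency of the original Recording. The sketch assumes end_sample is exclusive (half-open ranges), which this page does not state; the boundary values are hypothetical.

```python
def segment_durations(boundaries, sampling_frequency):
    """Convert (start_sample, end_sample) pairs to durations in seconds (end treated as exclusive)."""
    return [(end - start) / sampling_frequency for start, end in boundaries]


# Hypothetical stand-in for segment_boundaries() at 30 kHz.
bounds = [(0, 300_000), (300_000, 750_000)]
durations = segment_durations(bounds, 30_000.0)
```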

spikes(cluster_id)

Returns spike times for the specified cluster.

Parameters:

cluster_id (int) – The cluster ID to retrieve.

Returns:

1D NumPy array of spike times for the selected cluster.

Return type:

ndarray

Raises:

ValueError – If the cluster ID is not valid for this recording.

Warning

  • Cluster IDs are only valid within this SortedRecording instance.

  • To avoid invalid lookups, use .cluster_ids() to retrieve the set of valid cluster IDs.
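Assuming spike times share the sample units of start_time() and end_time(), a cluster's mean firing rate and its spike times in seconds follow directly. SortedRecording does not document a sampling_frequency attribute, so the sketch takes it as an argument (for example, from the Recording you sorted). The helpers and values are hypothetical.

```python
import numpy as np


def firing_rate_hz(spike_times, start_time, end_time, sampling_frequency):
    """Mean firing rate over [start_time, end_time), with all times in samples."""
    duration_s = (end_time - start_time) / sampling_frequency
    return len(spike_times) / duration_s


def to_seconds(spike_times, sampling_frequency):
    """Convert spike times from samples to seconds."""
    return np.asarray(spike_times) / sampling_frequency


# Hypothetical stand-ins for spikes(cid), start_time(), and end_time().
spk = np.array([3_000, 15_000, 27_000])
rate = firing_rate_hz(spk, 0, 30_000, 30_000.0)
secs = to_seconds(spk, 30_000.0)
```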

split_into_segments()

Splits the recording into its original unmerged segments.

Returns:

Each entry corresponds to one original segment.

Return type:

list of SortedRecording

start_time()

Returns the global start time of the recording (in samples).

Returns:

Start time in samples.

Return type:

int

valid_cluster_id(cluster_id)

Checks whether a cluster ID is valid across the entire recording.

Parameters:

cluster_id (int) – The cluster ID to validate.

Returns:

True if the cluster is consistently matched across all segments; False otherwise.

Return type:

bool

Warning

  • A cluster is considered valid only if it is present in every segment of the recording.

  • Clusters that disappear or fragment in later segments will return False.

spikesift.perform_spike_sorting(recording, *, min_segment_length=10, detection_sensitivity=10, min_spikes_per_cluster=5, merging_threshold=0.4, max_drift=30, detection_polarity=-1, verbose=True)

Performs complete spike sorting on an extracellular recording.

Parameters:
  • recording (Recording) – The input recording object.

  • min_segment_length (float, optional (default=10)) –

    Minimum segment duration (in seconds) for adaptive segmentation.

    • Controls how the recording is partitioned

    • Must be at least 0.1 seconds

    • Values below 0.1 are automatically clipped to 0.1

    • If the recording itself is shorter than this, it is processed as a single segment

  • detection_sensitivity (float, optional (default=10)) –

    Multiplier for spike detection thresholds.

    • Must be positive

    • Higher values reduce false positives, but may miss weaker spikes

    • Lower values increase sensitivity, but may introduce noise

  • min_spikes_per_cluster (float, optional (default=5)) –

    Minimum number of spikes required for a cluster to be considered valid.

    • Must be at least 2

    • Values below 2 are silently clipped to 2

    • Although spike counts are integers, this threshold is treated as a float and compared directly

  • merging_threshold (float, optional (default=0.4)) –

    Similarity threshold for merging clusters based on spatial waveform differences.

    • Must be between 0 and 1 (exclusive)

    • Higher values allow more aggressive merging

    • Lower values enforce stricter separation

  • max_drift (float, optional (default=30)) –

    Maximum vertical shift (in micrometers) used for aligning clusters across segments.

    • Must be non-negative

    • Internally rounded to the nearest multiple of 5

    • Larger values enable alignment over larger displacements

  • detection_polarity (float, optional (default=-1)) –

    Scalar applied to the signal prior to spike detection.

    • Use -1.0 to detect negative-going spikes (default)

    • Use +1.0 to detect positive-going spikes

    • Any other nonzero value is allowed; only the sign affects detection

  • verbose (bool, optional (default=True)) – If True, displays progress bar and recording information.
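The documented parameter rules can be restated as a small normalization function, which may help when choosing values. This is a hypothetical re-statement, not SpikeSift's code: in particular, how ties such as 12.5 µm are rounded is not documented (Python's round() rounds halves to even), and rejecting a zero detection_polarity is inferred from "any other nonzero value is allowed".

```python
import math


def normalize_params(*, min_segment_length=10, detection_sensitivity=10,
                     min_spikes_per_cluster=5, merging_threshold=0.4,
                     max_drift=30, detection_polarity=-1):
    """Hypothetical re-statement of the documented rules for perform_spike_sorting()."""
    if detection_sensitivity <= 0:
        raise ValueError("detection_sensitivity must be positive")
    if not 0 < merging_threshold < 1:
        raise ValueError("merging_threshold must lie strictly between 0 and 1")
    if max_drift < 0:
        raise ValueError("max_drift must be non-negative")
    if detection_polarity == 0:  # assumption: zero is rejected
        raise ValueError("detection_polarity must be nonzero")
    return {
        "min_segment_length": max(min_segment_length, 0.1),        # clipped to 0.1 s
        "detection_sensitivity": detection_sensitivity,
        "min_spikes_per_cluster": max(min_spikes_per_cluster, 2),  # clipped to 2
        "merging_threshold": merging_threshold,
        "max_drift": 5 * round(max_drift / 5),                     # nearest multiple of 5 (ties: half-to-even)
        "detection_polarity": math.copysign(1.0, detection_polarity),  # only the sign matters
    }
```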

Returns:

A fully sorted recording, including spike times, cluster identities, and amplitude vectors.

Return type:

SortedRecording

Raises:

ValueError – If any input parameter is invalid or improperly typed.

Warning

  • Recordings shorter than 10 milliseconds cannot be processed and will raise an error.

  • SpikeSift requires at least 4 channels for spike sorting.
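The two hard limits above can be checked before starting a long sorting run. The function below is a hypothetical convenience check based only on those two documented limits.

```python
def meets_minimum_requirements(num_samples, sampling_frequency, num_channels):
    """Hypothetical check for the documented limits of perform_spike_sorting()."""
    long_enough = num_samples / sampling_frequency >= 0.010  # at least 10 ms of data
    enough_channels = num_channels >= 4                      # at least 4 channels
    return long_enough and enough_channels
```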

spikesift.merge_recordings(sorted_recordings, *, max_drift=30)

Aligns and merges multiple independently sorted recordings into a unified result.

Parameters:
  • sorted_recordings (list of SortedRecording) –

    List of independently sorted recordings to be merged. Each entry must:

    • Contain at least one valid segment

    • Use the same probe geometry

    • Be listed in chronological order, with segment time ranges that do not overlap

  • max_drift (float, optional (default=30)) –

    Maximum vertical shift (in micrometers) allowed when aligning clusters across segments.

    • Must be non-negative

    • Internally rounded to the nearest multiple of 5

    • Higher values allow alignment over larger displacements

Returns:

A single merged recording containing all aligned spike clusters.

Return type:

SortedRecording

Raises:

ValueError – If the input list is empty, contains invalid types, includes inconsistent geometries, or includes overlapping segment time ranges.

Warning

  • This function assumes all inputs were produced by SpikeSift and remain unmodified.
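To catch ordering or overlap problems before calling merge_recordings(), you can collect each recording's segment_boundaries() and check them in sequence. The helper below is hypothetical and assumes end samples are exclusive, so a segment may start exactly where the previous one ends.

```python
def check_chronological(boundaries_per_recording):
    """Hypothetical check: recordings listed in time order with no overlapping segments.

    Each element is a list of (start_sample, end_sample) pairs, such as the output
    of segment_boundaries(). End samples are treated as exclusive (an assumption).
    """
    flat = [pair for bounds in boundaries_per_recording for pair in bounds]
    for (_, prev_end), (next_start, _) in zip(flat, flat[1:]):
        if next_start < prev_end:  # starts before the previous segment ends
            return False
    return True
```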

spikesift.map_clusters(source, target, *, max_drift=30)

Computes a one-to-one mapping from clusters in source to their counterparts in target.

Parameters:
  • source (SortedRecording) – First sorted recording to compare.

  • target (SortedRecording) – Second sorted recording to compare.

  • max_drift (float, optional (default=30)) –

    Maximum vertical displacement (in micrometers) used during alignment.

    • Must be non-negative

    • Internally rounded to the nearest multiple of 5

    • Higher values permit alignment across larger drift magnitudes

Returns:

Mapping from cluster IDs in source to corresponding cluster IDs in target. Only valid, unambiguous one-to-one matches are included.

Return type:

dict of int -> int

Raises:

ValueError – If inputs are invalid or incompatible (e.g., mismatched geometry).

Warning

  • This function assumes that both source and target were generated using SpikeSift and have not been manually modified.
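Because cluster IDs are local to each SortedRecording, the dictionary returned by map_clusters() is what lets you compare units between two results. The sketch re-keys a source spike dictionary by target IDs, drops unmatched clusters, and checks that a mapping is one-to-one. The helpers and the mapping values are hypothetical stand-ins.

```python
def relabel(spike_dict, mapping):
    """Re-key a {source_id: spikes} dict by target IDs; clusters without a match are dropped."""
    return {mapping[cid]: times for cid, times in spike_dict.items() if cid in mapping}


def is_one_to_one(mapping):
    """True if no two source clusters map to the same target cluster."""
    return len(set(mapping.values())) == len(mapping)


# Hypothetical stand-ins for map_clusters(source, target) and source.all_spikes().
mapping = {0: 4, 2: 1}
source_spikes = {0: [10, 50], 1: [20], 2: [30, 90]}
relabelled = relabel(source_spikes, mapping)
```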