


% Demonstration code for the spike sorter derived from the Fee et al.
% (J. Neurosci Meth (1996) 69: 175-88) algorithm. Requires Matlab
% version 6.1 or later and the Statistics Toolbox.
% Last Modified: 8/13/06
% samar mehta, sbmehta@ucsd.edu
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% INTRO %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The sorting procedure itself is fairly easy to perform (although it is
% currently mostly command line driven -- not much GUI yet -- and you
% have to perform your own spike detection to create the inputs described
% below). Automated spike sorting, however, always requires a human
% 'sanity check' and can sometimes also need intervention. To this end,
% several visualization tools are included. This file steps through an
% extended example of their use.
%
% We start with an overview of how to sort if you want to get started
% quickly. This is followed by example code and example data which
% will hopefully clarify the details. The example does not, however,
% cover everything that the code can do ... if there is something that
% you feel is not working, or information that you think would be useful,
% please feel free to look at the code itself (which is fully documented),
% to use HELP, or to email sbmehta@ucsd.edu.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% OVERVIEW %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The input to the sorting code is a Matlab structure with the following
% fields: {WAVEFORMS, SPIKETIMES, FS, THRESHT, THRESHV}
% WAVEFORMS is an N x M matrix of spike data, where each row contains
% the voltage values from a single spike (this can be from a single
% electrode or e.g., the concatenated data from a tetrode). More
% generally, any feature (not necessarily voltage) can appear in this
% matrix, although it is your responsibility to make sure that they
% are normalized such that differing units between features do not
% dominate the variance of the data.
% SPIKETIMES is an N x 1 vector of spike times (in seconds). The nth
% entry should correspond to the nth row of 'waveforms'.
% FS is the sampling rate (in Hz) of the recording.
% THRESHT is the column index in WAVEFORMS where threshold crossing
% occurred (i.e., we are assuming that the spikes were extracted from a
% raw data trace via threshold detection). For example, if THRESHT
% is 10 and WAVEFORMS is N x 30, then the 10th column of WAVEFORMS
% contains the sample that crossed threshold, with 9 samples before and
% 20 samples after. The threshold detection must be done in such a way
% that no two detected spikes are ever less than M samples apart.
% THRESHV is a pair of numbers corresponding to the high and low voltage
% thresholds used to extract the spikes. If only one threshold was used,
% the other can be given as +/- Inf. For example, [-50, Inf] means that
% spikes were detected when the voltage dropped below -50 (arbitrary units)
% and no positive-going threshold was defined.
% The WAVEFORMS, SPIKETIMES, and FS fields are required. The code will run
% if THRESHT and THRESHV are not defined, but the dejittering step will
% not work and the height/width graphical displays may produce errors.
% Sorting options can also be defined in the SPIKES structure to direct the
% algorithm. The currently defined options are:
% SPIKES.OPTIONS.OUTLIER_CUTOFF (see help for SS_OUTLIERS)
% SPIKES.OPTIONS.REFRACTORY_PERIOD (see help for SS_AGGREGATE)
% These are not mandatory; if not defined, default values are used.
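%
% As a rough illustration only (this snippet is not part of the sorting package),
% here is one way such a structure might be assembled from a threshold-detected
% raw trace. The variables 'v' (a filtered voltage trace), 'Fs' (its sampling
% rate), 'thresh', 'window', and 'pretrigger' are hypothetical, and the exact
% capitalization of the Fs/THRESHT/THRESHV field names follows the OVERVIEW
% above -- check the package help if your version differs.
% window = 30; pretrigger = 9;        % M = 30 samples per spike, crossing lands in column 10
% thresh = -50;                       % single negative threshold, same units as v
% crossings = find(v(1:end-1) > thresh & v(2:end) <= thresh) + 1;   % downward crossings
% crossings = crossings(:);
% crossings = crossings(crossings > pretrigger & crossings <= length(v) - (window - pretrigger - 1));
% crossings = crossings([true; diff(crossings) >= window]);         % crude: keep events >= M samples apart
% inds = repmat(crossings, 1, window) + repmat(-pretrigger:(window - pretrigger - 1), length(crossings), 1);
% spikes.waveforms  = v(inds);         % N x M, one spike per row
% spikes.spiketimes = crossings ./ Fs; % N x 1, in seconds
% spikes.Fs      = Fs;
% spikes.threshT = pretrigger + 1;     % column containing the threshold crossing
% spikes.threshV = [thresh, Inf];      % no positive-going threshold was used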
%
% The end product of the sorting is an N x 1 vector of assignments, with any
% outliers assigned to a cluster with label 0. If you prefer to have the
% outliers segregated from the data, see HELP SS_AGGREGATE; in this case,
% the assignment vector will be of size P x 1, where P is the number of
% spikes not treated as outliers, and the WAVEFORMS and SPIKETIMES variables
% will also be modified to have P rows. The remaining spikes (i.e., the
% outliers) can then be found in a new field called OUTLIERS.
% The assignments are numerical labels with no inherent meaning; their only
% interpretation is that spikes labeled with, e.g., 1 belong to the same cluster.
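%
% For example, once an assignments vector has been produced (see the snippet
% below and the STEP-BY-STEP section), the spikes belonging to one cluster can
% be pulled out with ordinary logical indexing. This is just an illustrative
% fragment, not part of the package:
% labels = unique(assignments);
% labels = labels(labels ~= 0);                             % drop the outlier label, if present
% times1 = spikes.spiketimes(assignments == labels(1));     % spike times for one cluster
% waves1 = spikes.waveforms(assignments == labels(1), :);   % and its waveforms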
%
% If the input information is collected into a Matlab structure called 'spikes',
% the following code will produce the output assignments vector:
% spikes = ss_dejitter(spikes);
% spikes = ss_outliers(spikes);
% spikes = ss_kmeans(spikes);
% spikes = ss_energy(spikes);
% spikes = ss_aggregate(spikes);
% assignments = spikes.hierarchy.assigns;
%
% If this seems cryptic, read on ...
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% STEP-BY-STEP %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The above code will sort the spikes but provides little feedback along
% the way. Until this is all wrapped into a GUI, effective sorting requires
% the use of the included command line tools to give you a sense for how
% the data look. Note that some of these steps might be easier to
% follow if you browse through the Fee et al. paper first.
%
% If you have put your own data into a 'spikes' data structure as described
% above, you can run the code on your own data. This code, however, includes
% three demonstration data sets chosen to highlight the functionality. I
% suggest that you try one of these data sets first to make sure that
% things run the same on your computer as they do on mine.

% *** The directory containing this file, the supporting code, the demo data
% and all of its subdirectories must be added to the Matlab path. The
% following line accomplishes this if you haven't done it already.
% addpath(genpath(pwd), '-end');
% You can also just run the startup.m script in this directory.

%%%%% LOADING DATA
% If you are using your own data, load it into a variable named 'spikes' now.
% If you want to use the demo data, you can just uncomment and run one of the
% following lines:

% load spikes1;   % Real data (barrel cortex). Good sig to noise but electrode drifts 3/4 way through.
% load spikes2;   % Simulated data (Bionics simulator). High sig to noise (ideal case).
% load spikes3;   % Real data (barrel cortex). Mediocre sig to noise; tough to sort manually.


%%%%% LOOKING AT THE RAW DATA
% The following commands plot the raw data in two ways (there are, of course,
% many others). The top plot is simply all of the raw waveforms overlaid on top
% of one another. If you have more than 100-200 spikes, it becomes impossible to
% determine the density of the spikes, however.
% So the bottom plot displays the same information but as a density after
% binning in time/voltage (see help for 'histxt'). The demo file 'spikes2' is
% a good example of why this helps.
figure(1); colormap jet;
subplot(2,1,1); plot(spikes.waveforms'); axis tight; title('Raw Data');
subplot(2,1,2); histxt(spikes.waveforms); h = gca;

% You can also use the following two functions at any point throughout the
% following code. Try running them at various points to visualize the data
% cluster in 2-D or 3-D as it is sorted. Also try:
%    Click on the axis labels to change the features used for the projection.
%    Double-click outside the plot to draw the current data as a density.
%    For SSG_DATABROWSE2D, clicking on a data point will make that point's
%       cluster easier to see.
%    For SSG_DATABROWSE3D, clicking and dragging on the axis allows you to
%       rotate the data in 3 dimensions.
ssg_databrowse2d(spikes);
ssg_databrowse3d(spikes);

%%%%% ALIGNING SPIKES
% Noise on the electrode can jitter the exact time at which threshold crossing
% occurs. If you are using this method to extract your spikes, this can be
% a significant source of variability for spikes from the same neuron. The
% next step 'de-jitters' the data by aligning all spikes on the same sample.
% (Note: this requires THRESHT and THRESHV to be defined as described above.)
% The data are then replotted just as they were in figure 1. Put both plots
% side by side; the density plot for the dejittered data should look tighter.
spikes = ss_dejitter(spikes);
figure(2); colormap jet;
subplot(2,1,1); plot(spikes.waveforms'); axis tight; title('Centered Data');
subplot(2,1,2); histxt(spikes.waveforms); clim = get(gca, 'CLim'); if (ishandle(h)), set(h, 'CLim', clim); end;

%%%%% REMOVING OUTLIERS
% The density estimation techniques used in spike sorting routines are typically
% not robust -- that is, they are sensitive to outliers -- so we need to remove
% outliers. 'Outliers' are spikes that look so much unlike the other events in
% the data that they cannot be sorted reliably. This does not mean that they are
% not interesting (they often include overlapping spikes/doublets in addition to
% electrical artifact), it just means that you'll have to look at them by hand.
% The following code removes the worst offenders. See the help for 'ss_outliers'
% if you want to be more/less conservative. (We don't replot the density because
% it won't change much, since outliers are only a small percent of the data.)
spikes = ss_outliers(spikes);
figure(3);
plot(spikes.waveforms'); axis tight; title('Centered Data w/ Outliers Removed');
% (note: if you want to look at the outliers themselves, you can do this as follows:)
% figure(9); plot(spikes.outliers.waveforms'); axis tight;
% In this plot, some of the spikes might not look like outliers. This is often
% a visual artifact arising from plotting overlapping waveforms -- try looking at
% them one by one before resorting to the following:
% NOTE: If the above function is removing spikes that you do not consider
%       outliers, you have two choices. You can manually assign these waveforms
%       to clusters after the automatic clustering is done or you can lower the
%       sensitivity of the outliers function.
%       Lowering the sensitivity may cause problems, though, because if severe
%       enough outliers are left in the data, they will degrade the quality of
%       the automatic clustering.
%       With that warning, to change the sensitivity, set the following:
%            spikes.options.outlier_cutoff = cutoff;
%       before the ss_outliers call above. The cutoff is normally (1 - 1/M),
%       where M is the total number of spikes; this means that, using the
%       outlier statistical heuristics, on average one spike in the data set
%       will be rejected incorrectly. In practice, the number can be anything
%       close to 1.0 -- e.g., to make the algorithm throw away fewer waveforms,
%       try a cutoff of (1 - 0.0001/M).

%%%%% INITIAL CLUSTERING
% The Fee algorithm deals with possibly non-Gaussian data (e.g., bursting neurons)
% by doing the sorting in two steps. The first step fits many local Gaussian
% spheres to the data to identify groups of spikes with similar shapes; these will
% later be combined into spike assignments. This two-step procedure is a good place
% to do a sanity check: do the results of the local clustering look like the
% algorithm is capturing local density? The following code plots the waveforms
% (now colored according to local similarity) and the type of height-width plot
% that is often used for manual sorting (colored similarly).
spikes = ss_kmeans(spikes);
figure(4); set(gcf, 'Renderer', 'OpenGL');
clusterXT(spikes, spikes.overcluster.assigns); title('Local Clusters');
% (note: this is a good time to check the stationarity of your data. For example,
% the data set in 'spikes1' contains electrode drift towards the end of the
% record. You can see this by looking at the local clusters vs. time:)
% figure(9); plot(spikes.spiketimes, spikes.overcluster.assigns, '.'); xlabel('Time (sec)'); ylabel('Cluster #');
% (note: you can get much of the information in the above two plots by re-running
% the SSG_DATABROWSE functions, e.g.,)
% ssg_databrowse2d(spikes);

%%%%% FINAL CLUSTERING
% The local clusters are now combined into larger groups corresponding (with luck)
% to neurons. This step is typically the most time consuming (especially for
% larger data sets). After the aggregation, the final results are summarized in
% a series of plots.
% Figure 5 contains one row of graphs for each final cluster. The left plot is
% a density image and the two right plots are interspike interval histograms
% (at two time scales). If the clustering is good, there should be few events
% with interspike intervals less than 2 msec; the ISI score displayed in the
% middle column reflects this (smaller numbers are better).
% Figure 6 looks like Figure 4 above, but recolored after aggregation.
% The bottom plot also contains a legend matching the colors to the cluster #
% labels in Figure 5.
% Figure 7 shows how the local clusters were combined into the final clusters.
% The color and length of the lines in this tree are described in the help
% text for SS_AGGREGATE.
% Figure 8 shows cross-correlations in spike timing between the final clusters.
spikes = ss_energy(spikes); spikes = ss_aggregate(spikes);
figure(5); colormap jet;
showclust(spikes, spikes.hierarchy.assigns);
figure(6); set(gcf, 'Renderer', 'OpenGL');
clusterXT(spikes, spikes.hierarchy.assigns); title('Final Clusters');
figure(7);
aggtree(spikes); title('Aggregation Tree');
figure(8);
correlations(spikes); title('Auto- and Cross- Correlations');

% This is, once again, a good time to run the SSG_DATABROWSE functions to look at
% projections of the clustered data and to check if the results seem reasonable.
% ssg_databrowse3d(spikes);


% (note: For neurons with relatively high firing rates, the algorithm can watch
% for refractory periods to determine when to stop the aggregation. Since
% this does not work well for neurons with very low firing rates, it can
% sometimes make mistakes towards the end of the aggregation, joining
% clusters that should not be joined or vice versa. A person looking at
% figures 5-7 can usually see this fairly quickly. The problem can be
% corrected using the following types of commands:
% spikes = merge_clusters(spikes, clusternumber1, clusternumber2);
% spikes = split_cluster(spikes, clusternumber);
% After you do this, replot figures 5-7 above and see if things look better.)

%%%%% GETTING THE CLUSTER ASSIGNMENTS
% The vector of cluster assignments can be found in the following location:
% spikes.hierarchy.assigns
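
%%%%% (OPTIONAL) SUMMARIZING THE RESULTS
% The lines below are not part of the sorting package; they are just one way to
% tabulate the final clusters using the fields documented above (a spike count
% and a mean waveform per cluster). They assume the default behavior in which
% any outliers carry the label 0. The figure number 10 is arbitrary.
% assigns = spikes.hierarchy.assigns;
% labels  = unique(assigns(assigns ~= 0));
% for k = 1:length(labels)
%     fprintf('Cluster %d: %d spikes\n', labels(k), sum(assigns == labels(k)));
% end
% meanwaves = zeros(length(labels), size(spikes.waveforms, 2));
% for k = 1:length(labels)
%     meanwaves(k, :) = mean(spikes.waveforms(assigns == labels(k), :), 1);
% end
% figure(10); plot(meanwaves'); axis tight; title('Mean Waveform per Cluster');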