Home > chronux_1_50 > spikesort > SpikeSortingDemo.m

SpikeSortingDemo

PURPOSE ^

% Demonstration code for the spike sorter derived from the Fee et al.

SYNOPSIS ^

This is a script file.

DESCRIPTION ^

% Demonstration code for the spike sorter derived from the Fee et al.
% (J. Neurosci Meth (1996) 69: 175-88) algorithm.  Requires Matlab
% version 6.1 or later and the Statistics Toolbox.
%                                          Last Modified: 8/13/06
%                                          samar mehta, sbmehta@ucsd.edu
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% INTRO %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The sorting procedure itself is fairly easy to perform (although it is
%  currently mostly command line driven -- not much GUI yet -- and you 
%  have to perform your own spike detection to create the inputs described
%  below).  Automated spike sorting, however, always requires a human
%  'sanity check' and can sometimes also need intervention.  To this end,
%  several visualization tools are included.  This file steps through an
%  extended example of their use.
%
% We start with an overview of how to sort if you want to get started
%  quickly.  This is followed by example code and example data which
%  will hopefully clarify the details.  The example does not, however,
%  cover everything that the code can do . . . if there is something that
%  you feel is not working or information that you feel would be useful,
%  please feel free to look at the code itself (which is fully documented)
%  and to use HELP.  (or email sbmehta@ucsd.edu)
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% OVERVIEW %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The input to the sorting code is a Matlab structure with the following
%  fields:    {WAVEFORMS, SPIKETIMES, FS, THRESHT, THRESHV}
%     WAVEFORMS is an N x M matrix of spike data, where each row contains
%      the voltage values from a single spike (this can be from a single
%      electrode or e.g., the concatenated data from a tetrode).  More
%      generally, any feature (not necessarily voltage) can appear in this
%      matrix, although it is your responsibility to make sure that they
%      are normalized such that differing units between features do not
%      dominate the variance of the data.
%     SPIKETIMES is an N x 1 vector of spike times (in seconds).  The nth
%      entry should correspond to the nth row of 'waveforms'.
%     FS is the sampling rate (in Hz) of the recording.
%     THRESHT is the column index in WAVEFORMS where threshold crossing
%      occurred (i.e., we are assuming the the spikes were extracted from a
%      raw data trace via threshold detection).  For example, if the THRESHT
%      is 10 and WAVEFORMS is N x 30, then the 10th column of WAVEFORMS
%      contains the sample that crossed threshold, with 9 samples before and
%      20 samples after.  The threshold detection must be done in such a way
%      that no two detected spikes are ever less than M samples apart.
%     THRESHV is a pair of numbers corresponding to the high and low voltage
%      thresholds used to extract the spikes.  If only one threshold was used,
%      the other can be given as +/- Inf.  For example, [-50, Inf] means that
%      spikes were detected when the voltage dropped below -50 (arbitrary units)
%      and no positive going threshold was defined.
%  The WAVEFORMS, SPIKETIMES and FS are required.  The code will run
%   if THRESHT and THRESHV are not defined, but the dejittering step will
%   not work and the height/width graphical displays may produce errors.
%  Sorting options can also be defined in the SPIKES structure to direct the
%   algorithm.  The currently defined options are:
%     SPIKES.OPTIONS.OUTLIER_CUTOFF     (see help for SS_OUTLIERS)
%     SPIKES.OPTIONS.REFRACTORY_PERIOD  (see help for SS_AGGREGATE)
%   These are not mandatory; if not defined, default values are used.
%
% The end product of the sorting is an N x 1 vector of assignments, with any
%   outliers assigned to a cluster with label 0.  If you prefer to have the
%   outliers segregated from the data, see HELP SS_AGGREGATE; in this case,
%   the assignment vector will be of size P x 1, where P is the number of
%   spikes not treated as outliers, and the WAVEFORMS and SPIKETIMES variables
%   will also be modified to have P rows.  The remaining spikes (i.e., the
%   outliers) can then be found in a new field called OUTLIERS.
%   The assignments are numerical labels with no inherent meaning; their only
%   interpretation is that spikes labeled with, e.g., 1 belong to the same cluster.
%
% If the input information is collected into a Matlab structure called 'spikes',
%  the following code will produce the output assignments vector:
          spikes = ss_dejitter(spikes);
          spikes = ss_outliers(spikes);
          spikes = ss_kmeans(spikes);
          spikes = ss_energy(spikes);
          spikes = ss_aggregate(spikes);
          assignments = spikes.hierarchy.assigns;
%
% If this seems cryptic, read on ...
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% STEP-BY-STEP %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The above code will sort the spikes but provides little feedback along
%  the way.  Until this is all wrapped into a GUI, effective sorting requires
%  the use of the included command line tools to give you a sense for how
%  the data look.  Note that this some of these steps might be easier to
%  follow if you browse through the Fee et al. paper first.
%
% If you have put your own data into a 'spikes' data structure as described
%  above, you can run the code on your own data.  This code, however, includes
%  three demonstration data sets chosen to highlight the functionality.  i
%  suggest that that you try one of these data sets first to make sure that
%  things run the same on your computer as they do on mine.

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 %% Demonstration code for the spike sorter derived from the Fee et al.
0002 %% (J. Neurosci Meth (1996) 69: 175-88) algorithm.  Requires Matlab
0003 %% version 6.1 or later and the Statistics Toolbox.
0004 %%                                          Last Modified: 8/13/06
0005 %%                                          samar mehta, sbmehta@ucsd.edu
0006 %%
0007 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% INTRO %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0008 %% The sorting procedure itself is fairly easy to perform (although it is
0009 %%  currently mostly command line driven -- not much GUI yet -- and you
0010 %%  have to perform your own spike detection to create the inputs described
0011 %%  below).  Automated spike sorting, however, always requires a human
0012 %%  'sanity check' and can sometimes also need intervention.  To this end,
0013 %%  several visualization tools are included.  This file steps through an
0014 %%  extended example of their use.
0015 %%
0016 %% We start with an overview of how to sort if you want to get started
0017 %%  quickly.  This is followed by example code and example data which
0018 %%  will hopefully clarify the details.  The example does not, however,
0019 %%  cover everything that the code can do . . . if there is something that
0020 %%  you feel is not working or information that you feel would be useful,
0021 %%  please feel free to look at the code itself (which is fully documented)
0022 %%  and to use HELP.  (or email sbmehta@ucsd.edu)
0023 %%
0024 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% OVERVIEW %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0025 %% The input to the sorting code is a Matlab structure with the following
0026 %%  fields:    {WAVEFORMS, SPIKETIMES, FS, THRESHT, THRESHV}
0027 %%     WAVEFORMS is an N x M matrix of spike data, where each row contains
0028 %%      the voltage values from a single spike (this can be from a single
0029 %%      electrode or e.g., the concatenated data from a tetrode).  More
0030 %%      generally, any feature (not necessarily voltage) can appear in this
0031 %%      matrix, although it is your responsibility to make sure that they
0032 %%      are normalized such that differing units between features do not
0033 %%      dominate the variance of the data.
0034 %%     SPIKETIMES is an N x 1 vector of spike times (in seconds).  The nth
0035 %%      entry should correspond to the nth row of 'waveforms'.
0036 %%     FS is the sampling rate (in Hz) of the recording.
0037 %%     THRESHT is the column index in WAVEFORMS where threshold crossing
0038 %%      occurred (i.e., we are assuming the the spikes were extracted from a
0039 %%      raw data trace via threshold detection).  For example, if the THRESHT
0040 %%      is 10 and WAVEFORMS is N x 30, then the 10th column of WAVEFORMS
0041 %%      contains the sample that crossed threshold, with 9 samples before and
0042 %%      20 samples after.  The threshold detection must be done in such a way
0043 %%      that no two detected spikes are ever less than M samples apart.
0044 %%     THRESHV is a pair of numbers corresponding to the high and low voltage
0045 %%      thresholds used to extract the spikes.  If only one threshold was used,
0046 %%      the other can be given as +/- Inf.  For example, [-50, Inf] means that
0047 %%      spikes were detected when the voltage dropped below -50 (arbitrary units)
0048 %%      and no positive going threshold was defined.
0049 %%  The WAVEFORMS, SPIKETIMES and FS are required.  The code will run
0050 %%   if THRESHT and THRESHV are not defined, but the dejittering step will
0051 %%   not work and the height/width graphical displays may produce errors.
0052 %%  Sorting options can also be defined in the SPIKES structure to direct the
0053 %%   algorithm.  The currently defined options are:
0054 %%     SPIKES.OPTIONS.OUTLIER_CUTOFF     (see help for SS_OUTLIERS)
0055 %%     SPIKES.OPTIONS.REFRACTORY_PERIOD  (see help for SS_AGGREGATE)
0056 %%   These are not mandatory; if not defined, default values are used.
0057 %%
0058 %% The end product of the sorting is an N x 1 vector of assignments, with any
0059 %%   outliers assigned to a cluster with label 0.  If you prefer to have the
0060 %%   outliers segregated from the data, see HELP SS_AGGREGATE; in this case,
0061 %%   the assignment vector will be of size P x 1, where P is the number of
0062 %%   spikes not treated as outliers, and the WAVEFORMS and SPIKETIMES variables
0063 %%   will also be modified to have P rows.  The remaining spikes (i.e., the
0064 %%   outliers) can then be found in a new field called OUTLIERS.
0065 %%   The assignments are numerical labels with no inherent meaning; their only
0066 %%   interpretation is that spikes labeled with, e.g., 1 belong to the same cluster.
0067 %%
0068 %% If the input information is collected into a Matlab structure called 'spikes',
0069 %%  the following code will produce the output assignments vector:
0070 %          spikes = ss_dejitter(spikes);
0071 %          spikes = ss_outliers(spikes);
0072 %          spikes = ss_kmeans(spikes);
0073 %          spikes = ss_energy(spikes);
0074 %          spikes = ss_aggregate(spikes);
0075 %          assignments = spikes.hierarchy.assigns;
0076 %%
0077 %% If this seems cryptic, read on ...
0078 %%
0079 %%
0080 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% STEP-BY-STEP %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0081 %% The above code will sort the spikes but provides little feedback along
0082 %%  the way.  Until this is all wrapped into a GUI, effective sorting requires
0083 %%  the use of the included command line tools to give you a sense for how
0084 %%  the data look.  Note that this some of these steps might be easier to
0085 %%  follow if you browse through the Fee et al. paper first.
0086 %%
0087 %% If you have put your own data into a 'spikes' data structure as described
0088 %%  above, you can run the code on your own data.  This code, however, includes
0089 %%  three demonstration data sets chosen to highlight the functionality.  i
0090 %%  suggest that that you try one of these data sets first to make sure that
0091 %%  things run the same on your computer as they do on mine.
0092 
0093 %% *** The directory containing this file, the supporting code, the demo data
0094 %%     and all of its subdirectories must be added to the Matlab path.  The
0095 %%     following line accomplishes this if you haven't done it already.
0096 % addpath(genpath(pwd), '-end');
0097 %%     You can also just run the startup.m script in this directory.
0098 
0099 %%%%% LOADING DATA
0100 % If you are using your own data, load it into a variable named 'spikes' now.
0101 % If you want to use the demo data, you can just uncomment and run one of the
0102 % following lines:
0103 
0104 % load spikes1;    % Real data (barrel cortex).  Good sig to noise but electrode drifts 3/4 way through.
0105 % load spikes2;    % Simulated data (Bionics simulator).  High sig to noise (ideal case).
0106 % load spikes3;    % Real data (barrel cortex).  Mediocre sig to noise; tough to sort manually.
0107 
0108 
0109 %%%%% LOOKING AT THE RAW DATA
0110 % The following commands plot the raw data in two ways (there are, of course,
0111 % many others).  The top plot is simply all of the raw waveforms overlaid on top
0112 % of one another.  If you have more than 100-200 spikes, it becomes impossible to
0113 % determine the density of the spikes, however.  So the bottom plot displays the
0114 % same information but as a density after binning in time/voltage (see help for
0115 % 'histxt').  The demo file 'spikes2' is a good example of why this helps.
0116 figure(1); colormap jet;
0117 subplot(2,1,1); plot(spikes.waveforms'); axis tight; title('Raw Data');
0118 subplot(2,1,2); histxt(spikes.waveforms); h = gca;
0119 
0120 % You can also use the following two functions at any point throughout the
0121 % following code.  Try running them at various points to visualize the data
0122 % cluster in 2-D or 3-D as it is sorted.  Also try:
0123 %    Click on the axis labels to change the features used for the projection.
0124 %    Double-click outside the plot to draw the current data as a density.
0125 %    For SSG_DATABROWSE2D, clicking on a data point will make that point's
0126 %        cluster easier to see.
0127 %    For SSG_DATABROWSE3D, clicking and dragging on the axis allows you to
0128 %        rotate the data in 3 dimensions.
0129 ssg_databrowse2d(spikes);
0130 ssg_databrowse3d(spikes);
0131 
0132 %%%%% ALIGNING SPIKES
0133 % Noise on the electrode can jitter the exact time at which threshold crossing
0134 % occurs.  If you are using this method to extract your spikes, this can be
0135 % a significant source of variability for spikes from the same neuron.  The
0136 % next step 'de-jitters' the data by aligning all spikes on the same sample.
0137 % (Note: this requires THRESHT and THRESHV to be defined as described above).
0138 % The data are then replotted just as they were in figure 1.  Put both plots
0139 % side by side.  The density plot for the dejittered data should look tighter.
0140 spikes = ss_dejitter(spikes);
0141 figure(2); colormap jet;
0142 subplot(2,1,1); plot(spikes.waveforms'); axis tight; title('Centered Data');
0143 subplot(2,1,2); histxt(spikes.waveforms); clim = get(gca, 'CLim'); if(ishandle(h)), set(h, 'CLim', clim); end;
0144 
0145 %%%%% REMOVING OUTLIERS
0146 % The density estimation techniques used in spike sorting routines are typically
0147 % not robust -- that is, they are sensitive to outliers -- so we need to remove
0148 % outliers.  'outliers' are spikes that look so much unlike the other events in
0149 % the data that they cannot be sorted reliably.  This does not mean that they are
0150 % not interesting (they often include overlapping spikes/doublets in addition to
0151 % electrical artifact), it just means that you'll have to look at them by hand.
0152 % The following code removes the worst offenders.  See the help for 'ss_outliers'
0153 % if you want to be more/less conservative.  (We don't replot density because
0154 % this won't change much since outliers are only a small percent of the data).
0155 spikes = ss_outliers(spikes);
0156 figure(3); 
0157 plot(spikes.waveforms'); axis tight; title('Centered Data w/ Outliers Removed');
0158 % (note: if you want to look at the outliers themselves, you can do this as follows:)
0159 % figure(9); plot(spikes.outliers.waveforms'); axis tight;
0160 % In this plot, some of the spikes might not look like outliers.  This is often
0161 % a visual artifact arising from plotting overlapping waveforms -- try looking at
0162 % them one by one before resorting to the following:
0163 % NOTE: If the above function is removing spikes that you do not consider
0164 %         outliers, you have two choices.  You can manually assign these waveforms
0165 %         to clusters after the automatic clustering is done or you can lower the
0166 %         sensitivity of the outliers function.  Lowering the sensitivity may
0167 %         cause problems, though, because if severe enough outliers are left in
0168 %         the data, they will confuse the quality of the automatic clustering.
0169 %         With that warning, to change sensitivity, set the following:
0170 %                   spikes.options.outlier_cutoff = cutoff;
0171 %         before the ss_outliers call above.  Cutoff is normally (1 - 1/M),
0172 %         where M is the total number of spikes; this means that _using the
0173 %         outliers statistical heuristics_ on average one spike in the data set
0174 %         will be rejected incorrectly.  In practice, the number can be anything
0175 %         close to 1.0 -- e.g., to make the algorithm throw away fewer waveforms,
0176 %         try a cutoff of (1 - 0.0001/M).
0177 
0178 %%%%% INITIAL CLUSTERING
0179 % The Fee algorithm deals with possibly non-Gaussian data (e.g., bursting neurons)
0180 % by doing the sorting in two steps.  The first step fits many local Gaussian
0181 % spheres to the data to identify groups of spikes with similar shapes; these will
0182 % later be combined into spike assignments.  This two step procedure is a good place
0183 % to do a sanity check; do the results of the local clustering look like the
0184 % algorithm is capturing local density?  The following code plots the waveforms
0185 % (now colored according to local similarity) and the type of height-width that
0186 % is often used for manual sorting (colored similarly).
0187 spikes = ss_kmeans(spikes);
0188 figure(4);  set(gcf, 'Renderer', 'OpenGL');
0189 clusterXT(spikes, spikes.overcluster.assigns);  title('Local Clusters');
0190 % (note: this is a good time to check stationarity of your data.  For example,
0191 %         the data set in 'spikes1' contains electrode drift towards the end of the
0192 %         record.  You can see this be looking at the local clusters vs. time:)
0193 % figure(9); plot(spikes.spiketimes, spikes.overcluster.assigns, '.'); xlabel('Time (sec)'); ylabel('Cluster #');
0194 % (note: you can get much of the information from the above two plots by re-running
0195 %         the SSG_DATABROWSE functions.  e.g.,)
0196 % ssg_databrowse2d(spikes);
0197 
0198 %%%%% FINAL CLUSTERING
0199 % The local clusters are now combined into larger groups corresponding (with luck)
0200 % to neurons.  This step is typically the most time consuming (especially for
0201 % larger data sets).  After the aggregation, the final results are summarized in
0202 % a series of plots.
0203 % Figure 5 contains one row of graphs for each final cluster.  The left plot is
0204 % a density image and the two right plots are interspike interval histograms
0205 % (at two time scales).  If the clustering is good, there should be few events
0206 % with interspike intervals less than 2 msec; the ISI score displayed in the
0207 % middle column reflects this (smaller numbers are better).
0208 % Figure 6 looks like Figure 4 from above, but recolored after aggregation.
0209 % The bottom plot also contains a legend matching the colors to the cluster #
0210 % labels in Figure 5.
0211 % Figure 7 shows how the local clusters were combined into the final clusters.
0212 % The color and length of the lines in this tree are described in the help
0213 % text for SS_AGGREGATE.
0214 % Figure 8 shows cross-correlations in spike timing between the final clusters.
0215 spikes = ss_energy(spikes); spikes = ss_aggregate(spikes);
0216 figure(5); colormap jet;
0217 showclust(spikes, spikes.hierarchy.assigns);
0218 figure(6); set(gcf, 'Renderer', 'OpenGL');
0219 clusterXT(spikes, spikes.hierarchy.assigns); title('Final Clusters');
0220 figure(7); 
0221 aggtree(spikes); title('Aggregation Tree');
0222 figure(8);
0223 correlations(spikes);  title('Auto- and Cross- Correlations');
0224 
0225 % This is, once again, a good time to run the SSG_DATABROWSE function to look at
0226 % projections of the clustered data and to check if the results seem reasonable.
0227 % ssg_databrowse3d(spikes);
0228 
0229 
0230 % (note: For neurons with relatively high firing rates, the algorithm can watch
0231 %         for refractory periods to determine when to stop the aggregation.  Since
0232 %         this does not work well for neurons with very low firing rates, it can
0233 %         sometimes make mistakes towards the end of the aggregation, joining
0234 %         clusters that should not be joined or vice versa.  A person looking at
0235 %         figures 5-7 can usually see this fairly quickly.  The problem can be
0236 %         corrected using the following types of commands:)
0237 % spikes = merge_clusters(spikes, clusternumber1, clusternumber2);
0238 % spikes = split_cluster(spikes, clusternumber);
0239 %         After you do this, replot figures 5-7 above and see if things look better).
0240 
0241 %%%%% Getting the cluster assignments
0242 % The vector of cluster assignments can be found in the following location:
0243 %                          spikes.hierarchy.assigns

Generated on Mon 09-Oct-2006 00:54:52 by m2html © 2003