


SS_AGGREGATE ISI-restricted hierarchical cluster aggregation.
SPIKES = SS_AGGREGATE(SPIKES) takes and returns a spike-sorting object
SPIKES after aggregating clusters (requires a previous overclustering
and an interface energy matrix calculation). The aggregation tree is
stored in SPIKES.HIERARCHY.TREE and the new assignments are stored in
SPIKES.HIERARCHY.ASSIGNS.
The algorithm computes a similarity/connection matrix using the interface
energy. It then chooses the cluster pair with the highest connection
strength, aggregates them, recalculates connection strengths, and then
repeats the process. Cluster aggregation is contingent on passing an
interspike interval (ISI) test if SPIKES.SPIKETIMES is defined; if this
test is not passed, the pair is not aggregated and aggregation continues.
Aggregation stops when either (1) all remaining pairs fail the ISI test
or (2) the connection strength drops below a (heuristic) cutoff of 0.01.
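The connection-strength computation in the loop above can be sketched as follows. This is an illustrative Python/NumPy translation (the toolbox itself is MATLAB, and the function name here is hypothetical); `interface_energy` is the pairwise interface-energy matrix from the earlier calculation and `numpts` holds the per-cluster spike counts:

```python
import numpy as np

def connection_strength(interface_energy, numpts):
    """Normalized connection strength between all cluster pairs.

    interface_energy : (K, K) symmetric energy matrix (illustrative; the
                       MATLAB source stores only the upper triangle)
    numpts           : (K,) spike counts per cluster
    """
    K = len(numpts)
    # Off-diagonal pairs have Na*Nb cross-cluster point pairs; the diagonal
    # has (Na^2 - Na)/2 unordered within-cluster pairs.
    norm = np.outer(numpts, numpts) - np.diag(numpts)
    norm = norm - np.diag(0.5 * np.diag(norm))
    norm_energy = interface_energy / norm
    # Connection strength: cross-energy relative to the two self-energies.
    self_e = np.tile(np.diag(norm_energy)[:, None], (1, K))
    strength = 2.0 * norm_energy / (self_e + self_e.T)
    np.fill_diagonal(strength, 0.0)  # never aggregate a cluster with itself
    return strength
```

The greedy loop then repeatedly takes the largest remaining off-diagonal entry of this matrix, subject to the ISI test.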
The ISI test needs an estimate of the expected refractory period to test
for violations; by default, it uses 1.5 times the temporal width of the
waveforms in the SPIKES structure (floored at 2 msec). To override this, define the
value SPIKES.OPTIONS.REFRACTORY_PERIOD with units of seconds (e.g.,
set to 0.0017 to indicate an expected refractory period of 1.7 msec).
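The default described above works out as follows (a minimal Python sketch with an illustrative function name; the source computes the same quantity from size(spikes.waveforms,2) and spikes.Fs):

```python
def default_refractory(num_samples, Fs, refractory_period=None):
    """Expected refractory period in seconds.

    num_samples       : waveform width in samples
    Fs                : sampling rate in Hz
    refractory_period : optional user override (seconds), as in
                        SPIKES.OPTIONS.REFRACTORY_PERIOD
    """
    if refractory_period is not None:
        return refractory_period
    tmin = num_samples / Fs        # minimum possible time between spikes
    return max(0.002, 1.5 * tmin)  # crude guess, floored at 2 msec
```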
The SPIKES.HIERARCHY.TREE output is a matrix describing the aggregation.
Each aggregation step produces one row, and rows are listed in the order
the steps were performed. The first two columns are the indices of the clusters
that were aggregated; the index assigned to the aggregate for future
entries is the lower of the two original indices. The third column is
the connection strength between the clusters before aggregation and the
fourth column is the ISI statistic for the aggregate (0 if ISI statistics
are not being used).
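Reading the tree back is straightforward: replaying its rows against the overclustered labels reproduces the final assignments. A minimal Python sketch (function name is hypothetical; rows follow the column layout described above):

```python
def apply_tree(assigns, tree):
    """Replay aggregation steps over a list of cluster labels.

    Each tree row (first, second, strength, isi_stat) merges cluster
    'second' into cluster 'first', i.e. relabels its spikes.
    """
    assigns = list(assigns)
    for first, second, strength, isi_stat in tree:
        assigns = [first if a == second else a for a in assigns]
    return assigns
```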
After aggregation, outliers that were previously removed are typically
reinserted into the spikes list so that the list aligns with the original
(pre-sorted) list. The outlier waveforms and spike times are thus by
default added back to the SPIKES.WAVEFORMS and SPIKES.SPIKETIMES fields
respectively, after all other aggregation is complete. These waveforms
are assigned the label 0 in SPIKES.HIERARCHY.ASSIGNS and the other,
non-outlier, spikes are renumbered accordingly. To prevent this from
occurring, pass 0 as the second argument to this function, i.e.,
SPIKES = SS_AGGREGATE(SPIKES, 0);
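The reintegration step amounts to scattering the surviving labels back to their original positions and marking outlier positions with the label 0. A minimal sketch (Python, using 0-based indices for illustration; the MATLAB source works with 1-based goodinds/badinds index vectors):

```python
def reintegrate_outliers(assigns, goodinds, n_total):
    """Rebuild the full-length label list after aggregation.

    assigns  : labels of the spikes that survived outlier removal
    goodinds : original (0-based) positions of those surviving spikes
    n_total  : length of the original, pre-sorted spike list
    """
    full = [0] * n_total               # 0 marks outliers
    for lbl, idx in zip(assigns, goodinds):
        full[idx] = lbl
    return full
```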
References:
Fee MS et al (1996). J Neurosci Methods 69: 175-88.

function spikes = ss_aggregate(spikes, reintegrate_outliers)
% SS_AGGREGATE  ISI-restricted hierarchical cluster aggregation.

% Last Modified By: sbm on Sun Oct 9 22:53:16 2005

debug = 1;

starttime = clock;

cutoff = 0.01;   % arbitrarily stop aggregation when overlap density is < 1% of main cluster density

%%%%% ARGUMENT CHECKING
if (~isfield(spikes, 'hierarchy') || ~isfield(spikes.hierarchy, 'interface_energy'))
    error('SS:energy_not_computed', 'An energy matrix must be computed before aggregation.');
elseif (~isfield(spikes, 'overcluster') || ~isfield(spikes.overcluster, 'assigns'))
    error('SS:overcluster_not_computed', 'The data must be overclustered before aggregation.');
elseif (isfield(spikes, 'spiketimes') && ~isfield(spikes, 'Fs'))
    error('SS:bad_stop_condition', 'A sampling frequency Fs must be supplied for an ISI stop condition.');
end
if (isfield(spikes.hierarchy, 'assigns') && any(spikes.hierarchy.assigns == 0))
    error('SS:aggregate_after_outliers', 'Aggregation can not be performed after outliers are reintegrated into the data.');
end
if (nargin < 2), reintegrate_outliers = 1; end

tmin = size(spikes.waveforms,2) ./ spikes.Fs;   % minimum possible time btw spikes
if (isfield(spikes, 'options') && isfield(spikes.options, 'refractory_period'))
    tref = spikes.options.refractory_period;
else
    tref = max(0.002, tmin*1.5);   % crude guess as to refractory period
end

%%%%% INITIALIZE A FEW THINGS
assignments = spikes.overcluster.assigns;
interface_energy = spikes.hierarchy.interface_energy;
numclusts = max(assignments);
numpts = full(sparse(assignments, 1, 1, numclusts, 1));
tree = [];
untested = 3*ones(numclusts);   % they're all initially untested

handle_fig = figure;  handle_img = imagesc(untested);  axis square;
colormap([0 0 0; 0.9 0.4 0.4; 1 1 0; 0.4 0.8 0.2]);   % [bk, rd, yl, gn] => (0 combined, 1 not allowed, 2 testing, 3 untested)

%%%%% AGGREGATE HIGHEST CONNECTION STRENGTHS UNTIL ALL TRIED
while (any(any(triu(untested,1) == 3)))   % only the upper triangular half is meaningful
    % Compute connection strengths from interface energies.
    % First, normalize energy:
    normalize = ((numpts * numpts') - diag(numpts));     % Off diag: Na*Nb, On diag: Na^2-Na ...
    normalize = normalize - diag(0.5*diag(normalize));   % ... and divide diagonal by 2
    norm_energy = interface_energy ./ normalize;
    % Then, compute connection strength:
    self = repmat(diag(norm_energy), [1,numclusts]);
    connect_strength = 2 .* norm_energy ./ (self + self');
    connect_strength = connect_strength .* (1-eye(numclusts));   % diag entries <- 0, so we won't agg clusters with themselves

    % Find best remaining pair
    remaining = ((untested == 3) .* connect_strength);
    best = max(remaining(:));   % highest untested connection strength

    if (best < cutoff)   % No point continuing if connection strengths have gotten really lousy
        break;
    end

    [clust1, clust2] = find(connect_strength == best);   % who're the lucky winners?
    first = min(clust1(1), clust2(1));    % if we get 2 best pairs, just take the 1st
    second = max(clust1(1), clust2(1));
    untested(first,second) = 2;  untested(second,first) = 2;   % mark that we're trying this one
    set(handle_img, 'CData', triu(untested));  title(['Trying ' num2str(first) ' and ' num2str(second)]);  drawnow;

    % Is this aggregation allowed?
    if (isfield(spikes, 'spiketimes'))   % if we were given spike times, use them ...
        t1 = spikes.spiketimes(assignments == first);
        t2 = spikes.spiketimes(assignments == second);
        if (debug)
            score = sum(diff(sort([t1; t2])) < tref) ./ (length(t1) + length(t2));
            allow = (score < 0.05);
            scores = [0 0 score];
        else
            [allow, scores] = isiQuality(t1, t2, tmin, 0.010, tref, spikes.Fs);
        end
    else   % ... otherwise, there are no restrictions on aggregation
        allow = 1;
        scores = [0 0 0];
    end

    if (allow)   % Bookkeeping ...
        % Aggregation subsumes the higher-index cluster into the lower.  Start by adding
        % (denormalized) interaction energies for the second (higher index) cluster
        % to those of the first and zeroing the old entries of the second.  Because of the
        % triangular structure of the connection matrix, repeat for both rows and columns ...
        interface_energy(first,:) = interface_energy(first,:) + interface_energy(second,:);
        interface_energy(second,:) = 0;
        interface_energy(:,first) = interface_energy(:,first) + interface_energy(:,second);
        interface_energy(:,second) = 0;
        interface_energy(second,second) = 1;   % keep self-energy at 1 (we may divide by it later)
        % Since we added rows & columns, some energy values will have spilled over into the
        % lower half of the energy matrix (which must be upper triangular).  The next 2 steps
        % recover those values.
        overflow = tril(interface_energy, -1);                        % spillover below diagonal
        interface_energy = interface_energy + overflow' - overflow;   % reflect above diagonal

        % Update counts vector
        numpts(first) = numpts(first) + numpts(second);
        numpts(second) = 2;   % leaving this as 2 prevents div by zero during normalization above

        % Make a tree entry for the aggregation we just performed
        tree = cat(1, tree, [first, second, best, scores(3)]);

        % Now actually change the numbers
        assignments(assignments == second) = first;

        % Finally, indicate that potential aggregations between the new cluster and
        % other (nonempty) clusters are untested while pairs involving clusters that
        % have already been emptied should not be tested.
        untested(first,:) = 3;  untested(:,first) = 3;
        untested(tree(:,2),:) = 0;  untested(:,tree(:,2)) = 0;
    else
        untested(first,second) = 1;  untested(second,first) = 1;
    end
end
close(handle_fig);

spikes.hierarchy.tree = tree;
spikes.hierarchy.assigns = assignments;

if (reintegrate_outliers && isfield(spikes, 'outliers') && ~isempty(spikes.outliers.badinds))
    % First, we make room by putting all of the non-outliers back into their original places
    spikes.waveforms(spikes.outliers.goodinds,:) = spikes.waveforms;
    spikes.spiketimes(spikes.outliers.goodinds,:) = spikes.spiketimes;
    spikes.hierarchy.assigns(spikes.outliers.goodinds) = spikes.hierarchy.assigns;

    % Then we fill in the outliers ...
    spikes.waveforms(spikes.outliers.badinds,:) = spikes.outliers.waveforms;
    spikes.spiketimes(spikes.outliers.badinds,:) = spikes.outliers.spiketimes;
    spikes.hierarchy.assigns(spikes.outliers.badinds) = 0;   % ... and add the '0' label.

    % We'll also want to add the assignments to the 'overcluster' list (this is
    % important for post-clustering splitting).
    spikes.overcluster.assigns(spikes.outliers.goodinds) = spikes.overcluster.assigns;
    spikes.overcluster.assigns(spikes.outliers.badinds) = 0;

    spikes = rmfield(spikes, 'outliers');   % don't need this any more -- it's redundant
end

spikes.tictoc.aggregate = etime(clock, starttime);
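The simple ISI screen used in the source's debug branch (count the inter-spike intervals of the merged train that fall below the refractory period, divide by the total spike count, and compare against a 5% ceiling) can be sketched in Python; the function name and keyword are illustrative:

```python
def isi_test(t1, t2, tref, max_violation_frac=0.05):
    """ISI screen for a candidate merge of two spike trains.

    Mirrors the simple test in the source: merge the two trains, sort,
    and score the fraction of ISIs shorter than tref (note the source
    divides by the total spike count, not the number of intervals).
    Returns (allow, score).
    """
    times = sorted(list(t1) + list(t2))
    isis = [b - a for a, b in zip(times, times[1:])]
    score = sum(i < tref for i in isis) / (len(t1) + len(t2))
    return score < max_violation_frac, score
```

The full version (the isiQuality call in the non-debug branch) applies a more elaborate statistic, but the pass/fail logic is the same.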