
compareAssignments

PURPOSE

COMPAREASSIGNMENTS Computes a quality measure of the similarity between assignments.

SYNOPSIS

function [score, scoreIndep, p] = compareAssignments(assigns1, assigns2, showTables)

DESCRIPTION

 COMPAREASSIGNMENTS   Computes a quality measure of the similarity between assignments.

 [score, scoreIfIndependent] = compareAssignments(assignments1, assignments2)
     The inputs, 'assignments1' and 'assignments2', must be two column vectors of
     the same length where each row contains an integer category label for the
     corresponding sample.  The integer labels used in the assignment vectors need
     have no intrinsic meaning (in particular, e.g., category 1 in 'assignments1'
     has no relationship to category 1 in 'assignments2').

     The first output, 'score', is a scalar between 0 and 1 that measures the
     similarity between the two classifications.  A 'score' of 1 implies perfect
     correspondence, ignoring actual labels.  For example, if all samples in
     'assignments1' are labelled by 1 and relabelled as 2 in 'assignments2', the
     'score' would be 1.  Deviations from this correspondence are penalized in a
     fashion that recognizes category splitting/merging and penalizes these less
     than completely random redistribution.
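
     As a minimal illustration of this label invariance (a hypothetical sketch, not
     part of the file itself):

            a1 = [1; 1; 2; 2];
            a2 = [5; 5; 9; 9];                   % same partition, different labels
            score = compareAssignments(a1, a2);  % identical groupings, so score should be 1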

     The algorithm is motivated by a Chi^2 two-way classification; however, here we
     return a similarity score rather than simply testing the hypothesis that the
     classifications are independent.  The expected score if the classifications
     were independent is returned as the second output, 'scoreIfIndependent', with
     the standard Chi^2 two-way p-value returned as an optional third output (this
     requires the Statistics Toolbox).  This p-value is the probability of observing at
     least this much association between the two assignments if they were in fact independent.
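
     For intuition, unrelated labelings should score near the independence baseline
     (a hypothetical sketch):

            a1 = ceil(3 * rand(1000, 1));        % 1000 samples, 3 arbitrary categories
            a2 = ceil(4 * rand(1000, 1));        % an unrelated labeling with 4 categories
            [score, scoreIndep] = compareAssignments(a1, a2);
            % score and scoreIndep should be approximately equal for independent labelings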

     Conceptually (though not computationally), the algorithm considers all N*(N-1)
     pairs of data samples and counts pairs that cosegregate, where a pair of samples
     is defined as cosegregating if they either share the same category in both
     assignments or if they do not share a category in either assignment.  For example,
     consider the following assignments:
            sample #          assignments1         assignments2
               1                   1                     2
               2                   1                     2
               3                   2                     3
               4                   1                     3
     The pairs (1,2) and (1,3) cosegregate while the pair (1,4) does not (since they
     share a label in 'assignments1' but not in 'assignments2').  'score' is the fraction
     of pairs that cosegregate between the two assignments.

     (An optional third boolean input argument, 'showTables' (default 0), produces graphical
     output showing the contingency table, conditional probabilities, and marginals for the
     assignments.  The 'score' described above is computed efficiently from these matrices.)
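
     Running the four-sample example above through the function (a hypothetical usage
     sketch; the third output requires the Statistics Toolbox):

            a1 = [1; 1; 2; 1];
            a2 = [2; 2; 3; 3];
            [score, scoreIndep, p] = compareAssignments(a1, a2, 1);  % also draw the tables
            % 3 of the 6 unordered pairs cosegregate, so score should be 0.5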

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SOURCE CODE

0001 function [score, scoreIndep, p] = compareAssignments(assigns1, assigns2, showTables)
0002 
0003 % COMPAREASSIGNMENTS   Computes a quality measure of the similarity between assignments.
0004 %
0005 % [score, scoreIfIndependent] = compareAssignments(assignments1, assignments2)
0006 %     The inputs, 'assignments1' and 'assignments2', must be two column vectors of
0007 %     the same length where each row contains an integer category label for the
0008 %     corresponding sample.  The integer labels used in the assignment vectors need
0009 %     have no intrinsic meaning (in particular, e.g., category 1 in 'assignments1'
0010 %     has no relationship to category 1 in 'assignments2').
0011 %
0012 %     The first output, 'score', is a scalar between 0 and 1 that measures the
0013 %     similarity between the two classifications.  A 'score' of 1 implies perfect
0014 %     correspondence, ignoring actual labels.  For example, if all samples in
0015 %     'assignments1' are labelled by 1 and relabelled as 2 in 'assignments2', the
0016 %     'score' would be 1.  Deviations from this correspondence are penalized in a
0017 %     fashion that recognizes category splitting/merging and penalizes these less
0018 %     than completely random redistribution.
0019 %
0020 %     The algorithm is motivated by a Chi^2 two-way classification; however, here we
0021 %     return a similarity score rather than simply testing the hypothesis that the
0022 %     classifications are independent.  The expected score if the classifications
0023 %     were independent is returned as the second output, 'scoreIfIndependent', with
0024 %     the standard Chi^2 two-way p-value returned as an optional third output (this
0025 %     requires the Statistics Toolbox).  This p-value is the probability of observing at
0026 %     least this much association between the two assignments if they were in fact independent.
0027 %
0028 %     Conceptually (though not computationally), the algorithm considers all N*(N-1)
0029 %     pairs of data samples and counts pairs that cosegregate, where a pair of samples
0030 %     is defined as cosegregating if they either share the same category in both
0031 %     assignments or if they do not share a category in either assignment.  For example,
0032 %     consider the following assignments:
0033 %            sample #          assignments1         assignments2
0034 %               1                   1                     2
0035 %               2                   1                     2
0036 %               3                   2                     3
0037 %               4                   1                     3
0038 %     The pairs (1,2) and (1,3) cosegregate while the pair (1,4) does not (since they
0039 %     share a label in 'assignments1' but not in 'assignments2').  'score' is the fraction
0040 %     of pairs that cosegregate between the two assignments.
0041 %
0042 %     (An optional third boolean input argument, 'showTables' (default 0), produces graphical
0043 %     output showing the contingency table, conditional probabilities, and marginals for the
0044 %     assignments.  The 'score' described above is computed efficiently from these matrices.)
0045 
0046 %   Last Modified By: sbm on Thu Jun  2 17:25:54 2005
0047 
0048 if ((size(assigns1, 2) > 1) || (size(assigns2, 2) > 1) || (size(assigns1,1) ~= size(assigns2, 1)))
0049     error('Error in assignment vectors.  The first two inputs must be column vectors of equal length.');
0050 end
0051 
0052 if ((nargin < 3) || (showTables == 0))    % short-circuit, since 'showTables' may not exist; if we're not doing graphics, this is more memory efficient.
0053     assigns1 = sortassignments(assigns1);
0054     assigns2 = sortassignments(assigns2);
0055     showTables = 0;
0056 end
0057 
0058 s = warning('off');    % silence warnings (the divisions below can divide by zero) and remember the previous state
0059 
0060 numSamples = size(assigns1, 1);
0061 numCategories1 = length(unique(assigns1));
0062 numCategories2 = length(unique(assigns2));
0063 
0064 %  Construct classification table and marginals
0065 joint = full(sparse(assigns1, assigns2, 1, max(assigns1), max(assigns2))) ./ numSamples;
0066 marginal1 = sum(joint, 2);
0067 marginal2 = sum(joint, 1);
0068 
0069 % The fraction of cosegregating ordered pairs reduces to the expression computed here:
0070 % score = 1 + N/(N-1) * (2*sum(p_ij^2) - sum_i(p_i.^2) - sum_j(p_.j^2)), with p the joint table.
0071 score = (2 * joint(:)' * joint(:)) - sum(sum(joint' * joint)) - sum(sum(joint * joint'));
0072 score = 1 + (numSamples / (numSamples - 1)) * score;
0073 
0074 % Now get the score expected if the classifications were independent; we do this by
0075 % reconstructing a joint under the assumption of independent classifications (i.e.,
0076 % p(x,y) = p(x)p(y)) and then using the same expression as above to find the score.
0077 jointIndep = (marginal1 * marginal2);
0078 scoreIndep = (2 * jointIndep(:)' * jointIndep(:)) ...
0079              - sum(sum(jointIndep' * jointIndep)) - sum(sum(jointIndep * jointIndep'));
0080 scoreIndep = 1 + (numSamples / (numSamples-1)) * scoreIndep;
0081 
0082 % if a p-value was requested, compute Chi^2
0083 if (nargout > 2)
0084     X2 = numSamples .* (((joint - jointIndep).^2)./jointIndep);  % chi^2
0085     X2(isnan(X2)) = 0;  % (clean up divide by zeros)
0086     X2 = sum(X2(:));
0087     df = (numCategories1 - 1) * (numCategories2 - 1);  % degrees of freedom
0088     p = 1 - chi2cdf(X2,df);
0089 end
0090 
0091 % Optional graphical output
0092 if (showTables)
0093     % construct conditional tables
0094     oneGivenTwo = joint ./ repmat(marginal2, [size(joint,1), 1]);
0095     oneGivenTwo(isnan(oneGivenTwo)) = 0;  % (deal with divide by zeros)
0096     twoGivenOne = joint ./ repmat(marginal1, [1, size(joint,2)]);
0097     twoGivenOne(isnan(twoGivenOne)) = 0; % (deal with divide by zeros)
0098 
0099     figure;
0100     subplot(2,2,1);  imagesc(joint);
0101     title('Two-Way Classification Table'); ylabel('Assignments 1'); xlabel('Assignments 2');
0102     subplot(2,2,2);  imagesc(oneGivenTwo);
0103     title('Assignments 1  given  Assignments 2'); ylabel('Assignments 1'); xlabel('Assignments 2');
0104     subplot(2,2,3);  imagesc(twoGivenOne);
0105     title('Assignments 2  given  Assignments 1'); ylabel('Assignments 1'); xlabel('Assignments 2');
0106     subplot(4,2,6);  bar(marginal1); axis tight;
0107     title('Assignments 1 Marginal');
0108     subplot(4,2,8);  bar(marginal2); axis tight;
0109     title('Assignments 2 Marginal');
0110     pixval on;
0111 end
0112 
0113 warning(s);    % restore the previous warning state
