compareAssignments

PURPOSE

COMPAREASSIGNMENTS Computes a quality measure of the similarity between assignments.

SYNOPSIS

function [score, scoreIndep, p] = compareAssignments(assigns1, assigns2, showTables)

DESCRIPTION

 COMPAREASSIGNMENTS   Computes a quality measure of the similarity between assignments.

 [score, scoreIfIndependent] = compareAssignments(assignments1, assignments2)
     The inputs, 'assignments1' and 'assignments2', must be two column vectors of
     the same length where each row contains an integer category label for the
     corresponding sample.  The integer labels used in the assignment vectors need
     have no intrinsic meaning (in particular, e.g., category 1 in 'assignments1'
     has no relationship to category 1 in 'assignments2').
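A minimal sketch of one way to prepare labels that meet these input requirements, assuming only base MATLAB. The variable 'rawLabels' is hypothetical; the third output of unique() maps every sample to a positive integer label.

```matlab
% Hypothetical raw labels: arbitrary integers, here stored as a row vector.
rawLabels = [7 7 42 7];

% The third output of unique() gives, for each sample, the index of its
% label in the sorted list of distinct labels, i.e. integers 1..K.
[distinctLabels, ~, assignments1] = unique(rawLabels(:));

% Force a column vector regardless of the MATLAB version in use.
assignments1 = assignments1(:);
```

Because the labels carry no intrinsic meaning, this relabelling does not change the partition being compared.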

     The first output, 'score', is a scalar between 0 and 1 that measures the
     similarity between the two classifications.  A 'score' of 1 implies perfect
     correspondence, ignoring actual labels.  For example, if all samples in
     'assignments1' are labelled by 1 and relabelled as 2 in 'assignments2', the
     'score' would be 1.  Deviations from this correspondence are penalized in a
     fashion that recognizes category splitting/merging and penalizes these less
     than completely random redistribution.

     The algorithm is motivated by a Chi^2 two-way classification; however, here we
     return a similarity score rather than simply testing the hypothesis that the
     classifications are independent.  The expected score if the classifications
     were independent is returned as the second output, 'scoreIfIndependent', with
     the standard Chi^2 two-way p-value returned as an optional third output (this
     requires the Statistics Toolbox).  This p-value is the probability of observing
     a Chi^2 statistic at least this large if the two assignments were independent.
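The Chi^2 two-way computation can be sketched as follows. This is an illustration only, not the function's own code: it assumes base MATLAB and uses gammainc() for the upper tail of the Chi^2 distribution so that no toolbox is needed. The four-sample data set is far too small for the Chi^2 approximation to be trustworthy and serves only to show the arithmetic.

```matlab
% Small example assignments (column vectors).
assignments1 = [1; 1; 2; 1];
assignments2 = [2; 2; 3; 3];
N = numel(assignments1);

% Relabel to 1..K so the table has no empty rows or columns.
[~, ~, r] = unique(assignments1);
[~, ~, c] = unique(assignments2);

% Observed counts and the counts expected under independence.
observed = accumarray([r(:), c(:)], 1);
expected = sum(observed, 2) * sum(observed, 1) / N;

% Pearson Chi^2 statistic and degrees of freedom.
X2 = sum((observed(:) - expected(:)).^2 ./ expected(:));
df = (size(observed, 1) - 1) * (size(observed, 2) - 1);

% Upper-tail probability via the regularized incomplete gamma function.
p = gammainc(X2 / 2, df / 2, 'upper');
```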

     Conceptually (though not computationally), the algorithm considers all N*(N-1)
     pairs of data samples and counts pairs that cosegregate, where a pair of samples
     is defined as cosegregating if they either share the same category in both
     assignments or if they do not share category in either assignment.  For example,
     consider the following assignments:
            sample #          assignments1         assignments2
               1                   1                     2
               2                   1                     2
               3                   2                     3
               4                   1                     3
     The pairs (1,2) and (1,3) cosegregate while the pair (1,4) does not (since they
     share a label in 'assignments1' but not in 'assignments2').  'score' is the fraction
     of pairs that cosegregate between the two assignments.
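The conceptual definition can be checked by brute force on the example table. The loop below is a sketch for illustration only; it visits each unordered pair once, which gives the same fraction as counting all N*(N-1) ordered pairs.

```matlab
% Example assignments from the table above.
assignments1 = [1; 1; 2; 1];
assignments2 = [2; 2; 3; 3];
N = numel(assignments1);

numCosegregating = 0;
numPairs = 0;
for i = 1:N-1
    for j = i+1:N
        same1 = (assignments1(i) == assignments1(j));  % share a label in assignments1?
        same2 = (assignments2(i) == assignments2(j));  % share a label in assignments2?
        % A pair cosegregates when both answers agree (both true or both false).
        numCosegregating = numCosegregating + (same1 == same2);
        numPairs = numPairs + 1;
    end
end
bruteForceScore = numCosegregating / numPairs;
```

For this example, 3 of the 6 pairs cosegregate, namely (1,2), (1,3) and (2,3), so the fraction is 0.5.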

     An optional third boolean input argument, 'showTables' (default 0), produces a graphical
     output with the contingency table, conditional probabilities and marginals for the
     assignments.  The 'score' described above is calculated efficiently using these matrices.
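As a sketch of how the pair fraction follows from these matrices, the snippet below builds the joint table with accumarray() and evaluates 1 + N/(N-1) * (2*sum of squared joint entries - sum of squared marginals of each assignment). It assumes the labels are already positive integers and uses the four-sample example from this description; the variable names 'raw' and 'tableScore' are illustrative.

```matlab
% Example assignments (positive integer labels, column vectors).
assignments1 = [1; 1; 2; 1];
assignments2 = [2; 2; 3; 3];
N = numel(assignments1);

% Joint probability table (contingency table divided by N) and its marginals.
joint = accumarray([assignments1, assignments2], 1) / N;
marginal1 = sum(joint, 2);
marginal2 = sum(joint, 1);

% Twice the sum of squared joint entries minus the sums of squared marginals.
raw = 2 * sum(joint(:).^2) - sum(marginal1.^2) - sum(marginal2.^2);

% Rescale so the result is the fraction of sample pairs that cosegregate.
tableScore = 1 + (N / (N - 1)) * raw;
```

This avoids the loop over all pairs: the cost depends on the size of the table rather than on N^2. For the example data the result is 0.5, matching a direct count of cosegregating pairs.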

SOURCE CODE

function [score, scoreIndep, p] = compareAssignments(assigns1, assigns2, showTables)

% COMPAREASSIGNMENTS   Computes a quality measure of the similarity between assignments.
%
% [score, scoreIfIndependent] = compareAssignments(assignments1, assignments2)
%     The inputs, 'assignments1' and 'assignments2', must be two column vectors of
%     the same length where each row contains an integer category label for the
%     corresponding sample.  The integer labels used in the assignment vectors need
%     have no intrinsic meaning (in particular, e.g., category 1 in 'assignments1'
%     has no relationship to category 1 in 'assignments2').
%
%     The first output, 'score', is a scalar between 0 and 1 that measures the
%     similarity between the two classifications.  A 'score' of 1 implies perfect
%     correspondence, ignoring actual labels.  For example, if all samples in
%     'assignments1' are labelled by 1 and relabelled as 2 in 'assignments2', the
%     'score' would be 1.  Deviations from this correspondence are penalized in a
%     fashion that recognizes category splitting/merging and penalizes these less
%     than completely random redistribution.
%
%     The algorithm is motivated by a Chi^2 two-way classification; however, here we
%     return a similarity score rather than simply testing the hypothesis that the
%     classifications are independent.  The expected score if the classifications
%     were independent is returned as the second output, 'scoreIfIndependent', with
%     the standard Chi^2 two-way p-value returned as an optional third output (this
%     requires the Statistics Toolbox).  This p-value is the probability of observing
%     a Chi^2 statistic at least this large if the two assignments were independent.
%
%     Conceptually (though not computationally), the algorithm considers all N*(N-1)
%     pairs of data samples and counts pairs that cosegregate, where a pair of samples
%     is defined as cosegregating if they either share the same category in both
%     assignments or if they do not share category in either assignment.  For example,
%     consider the following assignments:
%            sample #          assignments1         assignments2
%               1                   1                     2
%               2                   1                     2
%               3                   2                     3
%               4                   1                     3
%     The pairs (1,2) and (1,3) cosegregate while the pair (1,4) does not (since they
%     share a label in 'assignments1' but not in 'assignments2').  'score' is the fraction
%     of pairs that cosegregate between the two assignments.
%
%     An optional third boolean input argument, 'showTables' (default 0), produces a graphical
%     output with the contingency table, conditional probabilities and marginals for the
%     assignments.  The 'score' described above is calculated efficiently using these matrices.

if ((size(assigns1, 2) > 1) || (size(assigns2, 2) > 1) || (size(assigns1,1) ~= size(assigns2, 1)))
    error('Error in assignment vectors.  The first two inputs must be column vectors of equal length.');
end

if ((nargin < 3) || (showTables == 0))    % if we're not doing graphics, this is more memory efficient.
    assigns1 = sortassignments(assigns1);
    assigns2 = sortassignments(assigns2);
    showTables = 0;
end

% Suppress divide-by-zero warnings; the previous warning state is restored at the end.
s = warning('off', 'MATLAB:divideByZero');

numSamples = size(assigns1, 1);
numCategories1 = length(unique(assigns1));
numCategories2 = length(unique(assigns2));

%  Construct classification table and marginals
joint = full(sparse(assigns1, assigns2, 1, max(assigns1), max(assigns2))) ./ numSamples;
marginal1 = sum(joint, 2);
marginal2 = sum(joint, 1);

% Compute the score described above.  sum(sum(joint' * joint)) equals sum(marginal1.^2)
% and sum(sum(joint * joint')) equals sum(marginal2.^2), so the first line is twice the
% sum of squared joint entries minus the sums of squared marginals.  Rescaling by
% numSamples/(numSamples-1) and adding 1 gives the fraction of pairs that cosegregate.
score = (2 * joint(:)' * joint(:)) - sum(sum(joint' * joint)) - sum(sum(joint * joint'));
score = 1 + (numSamples / (numSamples - 1)) * score;

% Now get the score expected if the classifications were independent; we do this by
% reconstructing a joint under the assumption of independent classifications (i.e.,
% p(x,y) = p(x)p(y)) and then using the same expression to find the score.
jointIndep = (marginal1 * marginal2);
scoreIndep = (2 * jointIndep(:)' * jointIndep(:)) ...
             - sum(sum(jointIndep' * jointIndep)) - sum(sum(jointIndep * jointIndep'));
scoreIndep = 1 + (numSamples / (numSamples-1)) * scoreIndep;

% if a p-value was requested, compute Chi^2
if (nargout > 2)
    X2 = numSamples .* (((joint - jointIndep).^2)./jointIndep);  % chi^2
    X2(isnan(X2)) = 0;  % (clean up divide by zeros)
    X2 = sum(X2(:));
    df = (numCategories1 - 1) * (numCategories2 - 1);  % degrees of freedom
    p = 1 - chi2cdf(X2,df);  % upper-tail probability (requires the Statistics Toolbox)
end

% Optional graphical output
if (showTables)
    % construct conditional tables
    oneGivenTwo = joint ./ repmat(marginal2, [size(joint,1), 1]);
    oneGivenTwo(isnan(oneGivenTwo)) = 0;  % (deal with divide by zeros)
    twoGivenOne = joint ./ repmat(marginal1, [1, size(joint,2)]);
    twoGivenOne(isnan(twoGivenOne)) = 0;  % (deal with divide by zeros)

    figure;
    subplot(2,2,1);  imagesc(joint);
    title('Two-Way Classification Table'); ylabel('Assignments 1'); xlabel('Assignments 2');
    subplot(2,2,2);  imagesc(oneGivenTwo);
    title('Assignments 1  given  Assignments 2'); ylabel('Assignments 1'); xlabel('Assignments 2');
    subplot(2,2,3);  imagesc(twoGivenOne);
    title('Assignments 2  given  Assignments 1'); ylabel('Assignments 1'); xlabel('Assignments 2');
    subplot(4,2,6);  bar(marginal1); axis tight;
    title('Assignments 1 Marginal');
    subplot(4,2,8);  bar(marginal2); axis tight;
    title('Assignments 2 Marginal');
    % pixval belongs to the Image Processing Toolbox and is obsolete in newer releases,
    % where impixelinfo is its replacement.
    pixval on;
end

warning(s);  % restore the warning state saved above

Generated on Fri 15-Aug-2008 11:35:42 by m2html © 2003