Is an external variable (not part of the data used for clustering) to be used to evaluate the clustering result?
Please give substantive justification.
If the analysis has a pragmatic aim (i.e., the data have to be clustered in order to help with some kind of task, whether or not the clustering corresponds with a true underlying structure): Which external variables are relevant for this pragmatic aim, and which type of relationship should they bear to the clustering?
Examples: clustering should provide good prediction of external variable, clusters should be homogeneous with respect to external variable, external variables should discriminate between clusters, clustering should more or less correspond with one or more known useful clusterings as represented by external variables, prespecified subsets of objects (as characterized by external variables) should be assigned to the same or to different clusters, etc.
Please give substantive justification.
If yes: What are these? Please give substantive justification.
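One of the relationships listed among the examples above, clusters being homogeneous with respect to an external variable, can be made concrete with a simple purity score. The following is only a minimal sketch under stated assumptions: the function name `cluster_purity`, the categorical external variable, and the toy labels are all hypothetical and not taken from the source, and purity is just one of several possible ways to quantify such homogeneity.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_labels, external_labels):
    """Share of objects whose external category is the majority one in their cluster.

    Hypothetical helper for illustration; assumes a categorical external variable.
    """
    if len(cluster_labels) != len(external_labels):
        raise ValueError("both label sequences must have the same length")
    if not cluster_labels:
        raise ValueError("at least one object is required")

    # Collect the external categories observed inside each cluster.
    members = defaultdict(list)
    for cluster, category in zip(cluster_labels, external_labels):
        members[cluster].append(category)

    # Count, per cluster, the objects that belong to its most frequent category.
    majority_total = sum(
        Counter(categories).most_common(1)[0][1] for categories in members.values()
    )
    return majority_total / len(cluster_labels)


# Made-up toy data: 8 objects, 3 clusters, external variable with categories a/b/c.
clusters = [0, 0, 0, 1, 1, 1, 2, 2]
external = ["a", "a", "b", "b", "b", "b", "c", "a"]
purity = cluster_purity(clusters, external)
print(purity)
```

A value of 1 would mean that every cluster contains a single external category. Purity tends to increase with the number of clusters, so it is best compared across clusterings with the same number of clusters.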
If yes: Please specify your preference. Please give substantive justification.
If yes: Please specify your preference. In case of a preference for not all objects being clustered, please specify which types of objects may not be assigned to any cluster. If objects that are not to be assigned to any cluster are known, please specify which ones. Please give substantive justification.
If yes: Please specify your preference. In case of a preference for overlapping clusters, please specify whether overlapping clusters are required to be nested and whether a hierarchy is required. Please give substantive justification.
If yes: Is there a minimum or maximum cluster size, and, if yes, which one(s)? Should clusters be similar or dissimilar in size, and if yes, in which respect? Please give substantive justification.
If yes: What are these requirements? Small within-cluster dissimilarities (and, if yes, in which respect)? Common pattern of values (and, if yes, which type of pattern)? Other (and, if yes, what form do these requirements take)? Please give substantive justification.
If yes: Should the within-cluster heterogeneity take the form of a particular geometric pattern (and, if yes, which one)? Within-cluster independence of variables or a specific type of within-cluster dependence structure (and, if yes, which one)? Other (and, if yes, what form do these requirements take)? Please give substantive justification.
If yes: What form do these requirements take? E.g., large between-cluster dissimilarities (and, if yes, in which respect)? Separation (and, if yes, of which kind)? Other (and, if yes, what form do these requirements take)? Please give substantive justification.
If yes: What form do these requirements on between-cluster differences take? E.g., should the differences lie in a low-dimensional space? Please give substantive justification.
If yes: What form should this similarity take? Please give substantive justification.
If yes: Please specify in which respect the clustering should be stable. Please give substantive justification.
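A stability requirement of this kind is easier to state once a measure of agreement between two clusterings of the same objects has been fixed. As a hedged illustration only, the sketch below uses the Rand index (the share of object pairs on which two clusterings agree); the choice of index, the function name `rand_index`, and the two toy labelings are assumptions made for this example, not something the source prescribes.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of object pairs treated the same way (together or apart) by both clusterings.

    Hypothetical helper for illustration; cluster names need not match across runs.
    """
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    if n < 2:
        raise ValueError("at least two objects are required")

    agreements = 0
    pairs = 0
    for i, j in combinations(range(n), 2):
        together_a = labels_a[i] == labels_a[j]
        together_b = labels_b[i] == labels_b[j]
        # A pair counts as an agreement if both runs put it together or both split it.
        if together_a == together_b:
            agreements += 1
        pairs += 1
    return agreements / pairs


# Made-up labelings of six objects, e.g. from two runs on perturbed data.
run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [1, 1, 0, 0, 2, 0]
stability = rand_index(run_1, run_2)
print(stability)
```

Depending on the respect in which stability is required (resampling of objects, perturbation of variables, changes in tuning parameters), the two labelings would come from different pairs of runs, and a chance-corrected index might be preferred.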
If yes: Please specify for which population characteristics inferential quality is an issue and in which respect. Please give substantive justification.