Respiratory Admission Data for Glasgow 2008 to 2011

Downloads: respiratory_glasgow.arff, respiratory_glasgow.csv, respiratory_glasgow.RData

If you use this data set in publications, please cite:
Anderson, C., Lee, D., and Dean, N. (2014) Identifying clusters in Bayesian disease mapping. Biostatistics, 15(3), pp. 457-469

Summary of the data

Contributor:
Nema Dean
Source:
Scottish Neighbourhood Statistics (http://www.sns.gov.uk); NHS; Duncan Lee, University of Glasgow (Duncan.Lee@glasgow.ac.uk)
License:
ODbL Open Database License

General information about the data

Abstract:
The data comprise observed and expected counts of hospital admissions for respiratory disease in the Intermediate Geography (IG) areas of the Greater Glasgow and Clyde health board from 2008 to 2011, together with a binary neighbourhood matrix.
Subject matter background:
The goal of the analysis is to identify spatially contiguous clusters of areas, where each cluster has its own level of respiratory disease risk and that level differs between clusters. The problem arises in the area of disease mapping. The objects for clustering are the 271 administrative units called Intermediate Geographies (IGs), which partition the Greater Glasgow and Clyde health board. Each IG has a population between 2,244 and 10,877 (median 4,239). The observed hospital admission counts are the numbers of individuals admitted in the calendar year in question with a primary diagnosis of respiratory disease, corresponding to the International Classification of Disease tenth revision codes J00-J99 and R09.1. The expected counts are calculated using external standardization (based on age- and sex-adjusted rates for Scotland as a whole). The main time point of interest is 2011, but observed and expected counts are also available for the previous years 2008 to 2010.
Data structure:
The first column gives the identifier of each Intermediate Geography in the Greater Glasgow and Clyde health board. Columns 2 through 5 give the observed hospital admission counts for each year from 2008 to 2011, that is, the numbers of individuals admitted with a primary diagnosis of respiratory disease corresponding to the International Classification of Disease tenth revision codes J00-J99 and R09.1. Columns 6 through 9 give the expected respiratory admission counts for each year from 2008 to 2011, calculated using external standardization (based on age- and sex-adjusted rates for Scotland as a whole). Columns 10 through 280 give the binary neighbourhood matrix for the 271 Intermediate Geographies (IGs), where 1 indicates a pair of IGs sharing a common border and 0 indicates no shared border.
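The column layout above can be read positionally. The following is a minimal sketch, not part of the original data release: the function names split_row and read_table are hypothetical, and it assumes that respiratory_glasgow.csv is comma-separated with one header row and one row per IG, which should be checked against the actual file.

```python
import csv

N_AREAS = 271  # number of Intermediate Geographies stated in the description


def split_row(row, n_areas=N_AREAS):
    """Split one table row into (id, observed, expected, neighbours) by position."""
    expected_width = 1 + 4 + 4 + n_areas  # id + 4 observed + 4 expected + matrix row
    if len(row) != expected_width:
        raise ValueError(f"expected {expected_width} columns, found {len(row)}")
    area_id = row[0]
    observed = [int(float(v)) for v in row[1:5]]    # whole-number counts, 2008 to 2011
    expected = [float(v) for v in row[5:9]]         # real-valued counts, 2008 to 2011
    neighbours = [int(float(v)) for v in row[9:]]   # 0/1 row of the neighbourhood matrix
    return area_id, observed, expected, neighbours


def read_table(path, has_header=True):
    """Read the whole file; has_header is an assumption about the CSV layout."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader)  # skip the (assumed) header row
        return [split_row(row) for row in reader]


# Assumed usage with the downloadable file:
# rows = read_table("respiratory_glasgow.csv")
```

Reading by position rather than by column name avoids relying on header names, which are not documented here.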
Data objects and variables:
The objects are the 271 Intermediate Geographies from the Greater Glasgow and Clyde health board. There are no external variables that need to be excluded from the clustering.
Data values:
The counts, both expected and observed, are non-negative numbers (whole numbers for the observed counts, real numbers for the expected counts). The neighbourhood matrix is made up of 0s and 1s. There are no missing values.
Preprocessing:
The usual measure for analysis is either the Standardised Incidence Ratio (SIR), which is the ratio of observed to expected counts, or log(SIR). The SIR gives a raw estimate of disease risk for each area. The log transformation is sometimes used to normalise the data. Neither transformation has been applied to the current data.
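Since these transformations are left to the user, the sketch below shows how they might be computed for a single area and year. The function names sir and log_sir are hypothetical, and the handling of edge cases (rejecting a non-positive expected count, returning negative infinity when the observed count is zero) is an assumption of this sketch rather than a convention stated by the data contributors.

```python
import math


def sir(observed, expected):
    """Standardised Incidence Ratio: observed count divided by expected count."""
    if expected <= 0:
        raise ValueError("expected count must be positive to form a ratio")
    return observed / expected


def log_sir(observed, expected):
    """Natural log of the SIR; negative infinity when no admissions were observed."""
    ratio = sir(observed, expected)
    return math.log(ratio) if ratio > 0 else float("-inf")
```

An SIR above 1 means more admissions were observed than expected, and an SIR below 1 means fewer. Areas with zero observed admissions have no finite log(SIR), so they need separate handling before any method that assumes finite values.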
Other relevant papers:
The cluster results from previous publications are not available in the current dataset.
Justification for clustering:
The identification of spatially contiguous clusters with high respiratory risk will enable further investigation into the potential causes of this elevated risk and allow for future interventions.

Internal criteria for clustering quality: cluster membership

Number of clusters
Smaller numbers of clusters are desirable from a practical point of view (with a view to intervention policies and cost).
Nature of clusters
Crisp clustering is preferred, as ambiguities in cluster membership would make practical follow-up decisions more difficult.
All objects clustered or not
All objects should be clustered, but singleton clusters (or outliers) are allowed. Again, the reason is practical: the clustering will be used to guide decisions about future investigation and intervention.

Internal criteria for clustering quality: within and between cluster features

Ground for objects to belong to the same cluster:
Objects belonging to the same cluster should be spatially contiguous (i.e. share at least one common border with a member of the same cluster). Objects in the same cluster should have similar disease risk.
Weight: 4, moderately important
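The contiguity requirement can be checked against the binary neighbourhood matrix. The following is a minimal sketch with hypothetical function names. It tests a slightly stronger condition than the one stated above, namely that each cluster forms a single connected region, and it treats singleton clusters as contiguous. W is assumed to be a symmetric 0/1 matrix given as a list of lists, and labels is assumed to hold one cluster label per area in the same order as the rows of W.

```python
from collections import deque


def is_contiguous(members, W):
    """True if the areas in `members` form one connected region under W."""
    members = list(members)
    if not members:
        return True
    member_set = set(members)
    seen = {members[0]}
    queue = deque([members[0]])
    # Breadth-first search restricted to areas inside the cluster.
    while queue:
        i = queue.popleft()
        for j in member_set:
            if j not in seen and W[i][j] == 1:
                seen.add(j)
                queue.append(j)
    return seen == member_set


def all_clusters_contiguous(labels, W):
    """True if every cluster in the assignment `labels` is contiguous."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    return all(is_contiguous(m, W) for m in clusters.values())
```

For example, with four areas arranged in a line (0-1-2-3), the assignment [0, 0, 1, 1] passes the check, while [0, 1, 0, 1] fails because areas 0 and 2 do not share a border.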
Stability of clustering:
Clusters should be stable with respect to outliers.
Weight: 3, of some importance