About this Analysis

This interactive dashboard accompanies a study of the spatial dimensions of local news coverage. We present the results of 25 clustering experiments applied to a set of spatial features derived from the locations mentioned in three consecutive years (2020–2022) of news from 358 subnational media outlets in the UK. Each experiment uses a different configuration of input features and clustering algorithm (e.g., k-means, HDBSCAN) and aims to identify distinct types of local news outlets based on the spatial patterns of the locations they report.

Selected Experiment: Minimal Features + K-means

For the main study findings, we selected the 'Minimal - kmeans' experiment, which applies k-means clustering to a reduced set of minimally correlated spatial features. We chose this configuration because it offered the best balance between statistical quality and interpretability. Other experiments achieved higher silhouette scores (particularly DIANA clustering on PCA-reduced features), but they often produced severely imbalanced solutions in which most outlets fell into a single dominant cluster, limiting their analytical value.
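The trade-off between silhouette score and cluster balance can be illustrated with a small sketch. It uses scikit-learn's `KMeans` and `silhouette_score` on synthetic data, not the study's features. The "largest cluster share" balance measure (`cluster_balance`, a hypothetical helper) is an assumption, since the study does not state how imbalance was quantified; the point is only that a solution with one dominant group can still score a high silhouette.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def cluster_balance(labels):
    """Share of points in the largest cluster (1.0 means everything in one group)."""
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / counts.sum()


def evaluate(X, labels):
    """Return (silhouette score, largest-cluster share) for one clustering solution."""
    return silhouette_score(X, labels), cluster_balance(labels)


rng = np.random.default_rng(0)
# Synthetic stand-in data: one large dense group plus two small, distant groups.
X = np.vstack([
    rng.normal(0, 1, size=(300, 2)),
    rng.normal(10, 0.5, size=(10, 2)),
    rng.normal(-10, 0.5, size=(10, 2)),
])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
sil, top_share = evaluate(X, labels)
print(f"silhouette={sil:.2f}, largest cluster share={top_share:.2f}")
```

Here the well-separated small groups push the silhouette up even though almost all points sit in one cluster, which is why a balance check alongside the silhouette is useful when comparing experiments.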

The minimal feature approach addresses the multicollinearity inherent in spatial data while preserving the substantive meaning of geographic coverage patterns. This method identified six distinct outlet typologies spanning more than three orders of magnitude in typical coverage area: Hyperlocal, Local-Regional, Rural, Metropolitan, Sub-Regional, and National outlets. The centroid-based approach of k-means proved well suited to capturing the hierarchical structure of spatial news organization, from neighborhood-focused hyperlocals (∼33 km²) to national broadcasters (∼79,000 km²).
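One common way to obtain a "minimally correlated" feature set is a greedy correlation filter: keep a feature only if its absolute Pearson correlation with every already-kept feature is below a threshold. The study does not describe its exact selection procedure, so the function `prune_correlated`, the 0.7 threshold, the column order, and the synthetic data below are all assumptions for illustration.

```python
import numpy as np
import pandas as pd


def prune_correlated(df: pd.DataFrame, threshold: float = 0.7) -> list[str]:
    """Greedily keep columns whose |Pearson r| with every already-kept column
    is below `threshold`. Columns are considered in their original order."""
    corr = df.corr().abs()
    kept: list[str] = []
    for col in df.columns:
        if all(corr.loc[col, k] < threshold for k in kept):
            kept.append(col)
    return kept


rng = np.random.default_rng(1)
n = 358  # one row per outlet, matching the study's count; values are synthetic
area = rng.lognormal(mean=6, sigma=2, size=n)
features = pd.DataFrame({
    "log_area": np.log10(area),
    # In this toy data radius is tied to area, so the filter should drop it.
    "log_radius": 0.5 * np.log10(area) + rng.normal(0, 0.05, n),
    "entropy": rng.uniform(0, 1, n),
    "moran_i": rng.uniform(-1, 1, n),
})

print(prune_correlated(features))
```

Because the filter is order-dependent, placing the most interpretable feature of each correlated group first determines which one survives.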

Interactive Features

Users can:

  • Select and explore different experiments, comparing PCA-based and feature-based visualisations of the cluster structure.
  • Inspect cluster profiles showing aggregate feature statistics for each group.
  • Browse cluster members and filter by outlet name.
  • View feature distributions to assess the variability of spatial indicators within and across clusters.

This tool is designed to support transparency and interpretation in computational analyses of local media, offering researchers a way to investigate emergent patterns without enforcing a predefined typology. By comparing across experiments, users can assess the robustness of clustering solutions and understand how methodological choices influence the identification of media outlet spatial typologies.

Experiment Comparison

This table shows the performance metrics for all 25 clustering experiments conducted in the study.

Dashboard panels: PCA Feature Contributions; Cluster Characteristics; Distribution of Features by Cluster.

Clustering Approach

This app presents the results of clustering experiments applied to UK local news outlets. Each experiment groups outlets based on spatial characteristics of their coverage, derived from place mentions in news content.

Input Variables

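  • Area: Total area covered by an outlet's reported locations (km²).
  • Radius: Average radius of the coverage area.
  • Districts: Number of unique local authority districts mentioned.
  • Entropy: Evenness of the distribution of location mentions (higher values indicate more uniform coverage).
  • Gini: Inequality in the distribution of location mentions (higher values indicate more concentrated coverage).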
  • Moran's I: Spatial autocorrelation of coverage.
  • DistCV: Coefficient of variation in distances between locations.
  • Pct10km: Percentage of mentions within 10 km of outlet centroid.
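Several of these indicators can be computed directly from the geocoded mentions of a single outlet. The sketch below assumes mentions arrive as (latitude, longitude) pairs plus a district label per mention, that Entropy and Gini are taken over per-district mention counts, that DistCV uses pairwise great-circle distances, and that the Pct10km reference point is passed in explicitly; none of these definitions is confirmed by the study, and all function names are hypothetical. Moran's I is omitted because it additionally needs a spatial weights matrix.

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def entropy(district_labels):
    """Shannon entropy (natural log) of mention counts per district."""
    counts = np.array(list(Counter(district_labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


def gini(district_labels):
    """Gini coefficient of mention counts per district (0 = perfectly even)."""
    x = np.sort(np.array(list(Counter(district_labels).values()), dtype=float))
    n = len(x)
    ranks = np.arange(1, n + 1)
    return float(2 * (ranks * x).sum() / (n * x.sum()) - (n + 1) / n)


def dist_cv(points):
    """Coefficient of variation of pairwise great-circle distances between points."""
    d = np.array([haversine_km(*a, *b) for a, b in combinations(points, 2)])
    return float(d.std() / d.mean())


def pct_within(points, ref, radius_km=10.0):
    """Percentage of points lying within `radius_km` of the reference point `ref`."""
    hits = sum(haversine_km(*p, *ref) <= radius_km for p in points)
    return 100.0 * hits / len(points)


# Toy example: two mentions in central Oxford and one in Bristol.
mentions = [(51.752, -1.258), (51.760, -1.250), (51.454, -2.588)]
districts = ["Oxford", "Oxford", "Bristol"]
print(entropy(districts), gini(districts), dist_cv(mentions))
print(pct_within(mentions, ref=(51.752, -1.258)))
```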

Clustering Algorithms

  • K-means: Partitions outlets into k clusters by minimizing within-cluster variance.
  • Agglomerative Hierarchical: Builds a hierarchy of clusters bottom-up by successively merging the most similar clusters.
  • DIANA (DIvisive ANAlysis): A divisive hierarchical method that starts with all outlets in a single cluster and splits it recursively.
  • HDBSCAN: A density-based clustering method that detects clusters of varying shapes and densities, and can identify outliers.
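Of these four methods, DIANA is the one scikit-learn does not provide (it is available, for example, as `diana` in R's `cluster` package). The following minimal sketch of the divisive procedure is for illustration only and is not the implementation used in the study: it repeatedly picks the cluster with the largest diameter, seeds a "splinter group" with the point of highest average dissimilarity, then moves over any point that is, on average, closer to the splinter group than to the rest. The function name `diana` and the use of Euclidean distance are assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def diana(X, n_clusters):
    """Minimal DIANA sketch: split the widest cluster until n_clusters exist."""
    if not 1 <= n_clusters <= len(X):
        raise ValueError("n_clusters must be between 1 and the number of points")
    D = squareform(pdist(X))  # pairwise Euclidean dissimilarities
    clusters = [list(range(len(X)))]  # start with every point in one cluster
    while len(clusters) < n_clusters:
        # Split the cluster with the largest diameter (ties go to the larger cluster).
        idx = max(
            range(len(clusters)),
            key=lambda i: (D[np.ix_(clusters[i], clusters[i])].max(), len(clusters[i])),
        )
        remaining = clusters.pop(idx)
        # Seed the splinter group with the point of highest average dissimilarity.
        avg = [D[i, remaining].sum() / (len(remaining) - 1) for i in remaining]
        splinter = [remaining.pop(int(np.argmax(avg)))]
        # Move points that are, on average, closer to the splinter group.
        while len(remaining) > 1:
            gains = [
                D[i, [j for j in remaining if j != i]].mean() - D[i, splinter].mean()
                for i in remaining
            ]
            best = int(np.argmax(gains))
            if gains[best] <= 0:
                break
            splinter.append(remaining.pop(best))
        clusters.extend([remaining, splinter])
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
labels = diana(X, n_clusters=2)
print(labels)
```

Because each split only ever peels points away from the largest remaining cluster, early splits tend to isolate small, distant groups first, which is consistent with the imbalanced solutions DIANA can produce.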

Experimental Design

Each experiment represents a unique combination of input variables and clustering algorithms, optionally preceded by PCA for dimensionality reduction. The goal is to explore how varying these conditions affects the structure of resulting clusters.
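A grid of experiments of this kind can be sketched as a loop over feature sets, algorithms, and an optional PCA step, scoring each result. The sketch below is not the study's pipeline: the feature-set memberships, the column names, the choice of six clusters, standardisation before clustering, and the 90% explained-variance PCA threshold are all assumptions, and it produces 8 toy combinations rather than the study's 25. HDBSCAN is left out for brevity; scikit-learn provides it as `sklearn.cluster.HDBSCAN` from version 1.3, but its noise label (-1) would need handling before computing a silhouette score.

```python
from itertools import product

import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical feature sets; the study's actual subsets may differ.
FEATURE_SETS = {
    "all": ["area", "radius", "districts", "entropy", "gini", "morans_i", "dist_cv", "pct10km"],
    "minimal": ["area", "entropy", "morans_i"],
}
ALGORITHMS = {
    "kmeans": lambda k: KMeans(n_clusters=k, n_init=10, random_state=0),
    "agglomerative": lambda k: AgglomerativeClustering(n_clusters=k),
}


def run_grid(df, n_clusters=6, use_pca=(False, True)):
    """Cluster every feature-set x algorithm x PCA combination and score it."""
    rows = []
    for (fs_name, cols), (alg_name, make_alg), pca in product(
        FEATURE_SETS.items(), ALGORITHMS.items(), use_pca
    ):
        X = StandardScaler().fit_transform(df[cols])
        if pca:
            X = PCA(n_components=0.9).fit_transform(X)  # keep 90% of the variance
        labels = make_alg(n_clusters).fit_predict(X)
        rows.append({
            "experiment": f"{fs_name}{' + PCA' if pca else ''} - {alg_name}",
            "silhouette": silhouette_score(X, labels),
            "largest_share": np.bincount(labels).max() / len(labels),
        })
    return pd.DataFrame(rows).sort_values("silhouette", ascending=False)


rng = np.random.default_rng(3)
cols = FEATURE_SETS["all"]
demo = pd.DataFrame(rng.normal(size=(358, len(cols))), columns=cols)  # synthetic data
results = run_grid(demo)
print(results)
```

Reporting the largest-cluster share next to the silhouette makes it easy to spot configurations that score well only because one group absorbs most outlets.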