Census Quality Framework
Data quality assessment and weighting methodology for the Italian Integrated System of Registers (SIR) based on the Austrian Framework.
Modern official statistics are shifting from traditional censuses to register-based censuses. This transition requires robust tools to evaluate the reliability and accuracy of administrative data. This project, conducted at ISTAT, adapted the Austrian Quality Framework to the Italian context and developed objective mathematical methods to aggregate quality indicators.
Objectives
- Assess the reliability of administrative registers by adapting a multi-phase quality framework.
- Compare and implement objective weighting strategies for composite quality indicators.
My role
- Conducted a comparative analysis of four aggregation methodologies: Wroclaw Taxonomic Method, Mazziotta-Pareto Index, Mean-Min Function, and PCA.
- Developed R scripts for data standardization, normalization, and indicator weighting.
- Validated the framework on the Italian “Base Register of Individuals” (RBI) for the “Sex” variable.
- Identified the most reliable sources for the Permanent Census based on objective quality scores.
Tech Stack
| Component | Details |
|---|---|
| Language | R |
| Methodology | Wroclaw Taxonomic Method, Mazziotta-Pareto Index, Mean-Min Function, PCA |
| Context | Official Statistics |
The Challenge: Beyond Arithmetic Means
In official statistics, register quality is multi-faceted. Standard assessments often use simple arithmetic means, which treat all metadata and data dimensions as equally important. However, in a complex system like the Italian SIR, the weight that documentation deserves may differ significantly from the weight of pre-processing accuracy.
The project applied four mathematical models to 11 registers to determine which model could best handle the “Legal Marital Status” (LMS) variable. The goal was a final quality indicator ($q_{.j}$) that reflects the reliability of each source.
Methodology & Code
The pipeline used a modular approach to compare aggregation strategies:
- Normalization: All raw indicators were converted into 0-1 indices to ensure comparability across different units of measurement.
- Wroclaw Taxonomic Method (WTM): A distance-based approach measuring how far each register is from a “theoretical ideal” unit.
- Mazziotta-Pareto Index (MPI): A method that penalizes variability among sub-indicators, identifying “unbalanced” registers.
- Principal Component Analysis (PCA): A synthesis method that extracts weights from the first principal component, which captured roughly 70-75% of total variance.
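The normalization, Wroclaw, and Mazziotta-Pareto steps above can be sketched in a few lines of R. This is a minimal illustration, not the project's actual scripts: the function names (`normalize_minmax`, `wroclaw_score`, `mpi_score`), the indicator names, and the three registers with their values are all hypothetical. The Wroclaw variant shown here assumes an ideal unit equal to 1 on every normalized indicator and divides the distance by its largest possible value; the classical formulation instead scales by the mean distance plus two standard deviations. The MPI function uses the usual penalized form for indicators where higher is better.

```r
# Min-max normalization: rescale one raw indicator to the 0-1 range.
# Assumes higher raw values mean better quality and a non-constant column.
normalize_minmax <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Wroclaw-style score: Euclidean distance of each register (row) from an
# ideal unit, rescaled so that 1 means "coincides with the ideal".
# Assumption: the ideal is 1 on every indicator and the distance is divided
# by its largest possible value, sqrt(number of indicators).
wroclaw_score <- function(indicators, ideal = rep(1, ncol(indicators))) {
  diffs <- sweep(indicators, 2, ideal, FUN = "-")
  distance <- sqrt(rowSums(diffs^2))
  1 - distance / sqrt(ncol(indicators))
}

# Mazziotta-Pareto Index, penalized form for "higher is better" indicators.
# Each indicator is standardized to mean 100 and standard deviation 10; the
# row mean is then reduced by the row's standard deviation times its
# coefficient of variation, so unbalanced registers score lower.
mpi_score <- function(indicators) {
  pop_sd <- function(x) sqrt(mean((x - mean(x))^2))  # divides by n, not n - 1
  z <- apply(indicators, 2, function(x) 100 + 10 * (x - mean(x)) / pop_sd(x))
  row_mean <- rowMeans(z)
  row_sd <- apply(z, 1, pop_sd)
  row_mean - row_sd * (row_sd / row_mean)
}

# Hypothetical raw indicators for three made-up registers (not project data).
raw <- data.frame(
  documentation = c(10, 7, 4),
  accuracy      = c(0.99, 0.90, 0.80),
  row.names     = c("reg_A", "reg_B", "reg_C")
)

# Normalize column by column and keep the register names as row names.
normalized <- sapply(raw, normalize_minmax)
rownames(normalized) <- rownames(raw)

scores <- wroclaw_score(normalized)
mpi <- mpi_score(normalized)

print(round(scores, 3))
print(round(mpi, 2))
```

In this toy data `reg_A` is the best register on every indicator, so it coincides with the ideal and scores 1, while `reg_C` is the worst on every indicator and scores 0. Note that the two indices live on different scales: the Wroclaw-style score is bounded by 0 and 1, whereas the MPI is centred around 100.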
The analysis showed that PCA, while powerful, often requires a forced normalization step after aggregation. The Wroclaw Taxonomic Method proved better suited to ranking registers because it measures each register against the theoretical maximum directly, without losing information.
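A short R sketch may help show where the post-normalization issue comes from. It uses base R's `prcomp` on made-up data; the five registers, the indicator names, and the choice of squared first-component loadings as weights are assumptions for illustration, not the project's actual procedure.

```r
# Hypothetical 0-1 indicators for five made-up registers (not project data).
indicators <- cbind(
  documentation = c(0.90, 0.70, 0.60, 0.40, 0.20),
  accuracy      = c(0.95, 0.80, 0.50, 0.45, 0.10),
  timeliness    = c(0.85, 0.60, 0.65, 0.30, 0.25)
)
rownames(indicators) <- paste0("reg_", LETTERS[1:5])

# PCA on centred and scaled indicators.
pca <- prcomp(indicators, center = TRUE, scale. = TRUE)

# Share of total variance captured by the first principal component.
explained <- pca$sdev[1]^2 / sum(pca$sdev^2)

# Loadings of the first component. The sign of a component is arbitrary,
# so flip it if needed to make the loadings point towards "higher is better".
loadings <- pca$rotation[, 1]
if (sum(loadings) < 0) loadings <- -loadings

# Assumption: weights proportional to squared loadings, rescaled to sum to 1.
weights <- loadings^2 / sum(loadings^2)

# A weighted mean of the 0-1 indicators stays within [0, 1].
composite <- as.vector(indicators %*% weights)
names(composite) <- rownames(indicators)

# Raw first-component scores, by contrast, are centred on zero ...
pc1_scores <- as.vector(scale(indicators) %*% loadings)
names(pc1_scores) <- rownames(indicators)

# ... so they need a min-max rescaling before they read as a 0-1 index.
pc1_rescaled <- (pc1_scores - min(pc1_scores)) /
  (max(pc1_scores) - min(pc1_scores))

print(round(explained, 3))
print(round(weights, 3))
print(round(composite, 3))
print(round(pc1_rescaled, 3))
```

The rescaled scores always assign exactly 1 to the best register in the sample and 0 to the worst, regardless of how close either is to the theoretical maximum. That is the loss of context a distance from a fixed ideal unit avoids.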
Results: Defining the “Gold Standard”
The final phase applied the Wroclaw Taxonomic Method to the Italian Integrated System of Registers (SIR) for the variable “Sex”, comparing six primary sources, including Municipal Registries (LAC), the Tax Agency (AT), and Social Security (INPS).
By aggregating sub-dimensions such as Clarity, Punctuality, and Timeliness, the analysis revealed:
- Top Performer: The LAC (Municipal Registries) achieved a perfect score of 1.000, confirming its role as the primary quality standard.
- Critical Sources: Registers such as INPSDMAG and ISCRNAS showed lower reliability scores, suggesting they should be used as secondary or comparative sources.
This objective ranking provides ISTAT with a reproducible metric to prioritize data sources, ensuring the Permanent Census is built on the most accurate data available.
Note: This project used the methodologies described in the Austrian Journal of Statistics (2016) and in ISTAT technical reports.