IRIS - Res&Arch Institutional Research Information System - Research & Archive

Robust regression using probabilistically linked data

Chambers R. L.; Fabrizi E.; Ranalli M. G.; Salvati N.; Wang S.

2023

Abstract

There is growing interest in a data integration approach to survey sampling, particularly where population registers are linked for sampling and subsequent analysis. The reason for doing this is simple: it is only by linking the same individuals in the different sources that it becomes possible to create a data set suitable for analysis. But data linkage is not error free. Many linkages are non-deterministic, based on how likely a linking decision corresponds to a correct match, that is, it brings together the same individual in all sources. High quality linking will ensure that the probability of this happening is high. Analysis of the linked data should take account of this additional source of error when this is not the case. This is especially true for secondary analysis carried out without access to the linking information, that is, the often confidential data that agencies use in their record matching. We describe an inferential framework that allows for linkage errors when sampling from linked registers. After first reviewing current research activity in this area, we focus on secondary analysis and linear regression modeling, including the important special case of estimation of subpopulation and small area means. In doing so we consider both robustness and efficiency of the resulting linked data inferences.
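The effect of linkage error on regression described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method: it assumes an exchangeable linkage error model in which every record is correctly linked with a known probability `lam`, and it applies a simple bias adjustment in the style of Lahiri and Larsen (regressing the linked response on the expected linked design matrix). The function names, the parameter values, and the simulated data are all hypothetical.

```python
import numpy as np


def simulate_linked_sample(n=2000, lam=0.8, beta=(1.0, 2.0), seed=0):
    """Simulate a regression data set whose responses are partly mislinked.

    Assumption: a fraction (1 - lam) of records receive the response of
    another record, chosen without regard to the covariates.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    y = X @ np.asarray(beta) + rng.normal(size=n)

    # Pick the records to mislink and rotate their responses by one place,
    # so that every selected record ends up with someone else's response.
    m = int(round((1.0 - lam) * n))
    y_star = y.copy()
    if m >= 2:
        wrong = rng.choice(n, size=m, replace=False)
        y_star[wrong] = y[np.roll(wrong, 1)]
    return X, y_star


def naive_ols(X, y_star):
    """Ordinary least squares that ignores linkage error."""
    return np.linalg.lstsq(X, y_star, rcond=None)[0]


def linkage_adjusted_ols(X, y_star, lam):
    """Least squares on the expected linked design matrix E[A] X.

    Under the assumed exchangeable model, E[A] = (lam - gamma) I + gamma 1 1^T
    with gamma = (1 - lam) / (n - 1), so E[A] X can be formed without
    building the n x n matrix.
    """
    n = X.shape[0]
    gamma = (1.0 - lam) / (n - 1)
    W = (lam - gamma) * X + gamma * X.sum(axis=0)
    return np.linalg.lstsq(W, y_star, rcond=None)[0]


if __name__ == "__main__":
    X, y_star = simulate_linked_sample()
    print("naive slope:   ", naive_ols(X, y_star)[1])
    print("adjusted slope:", linkage_adjusted_ols(X, y_star, lam=0.8)[1])
```

In this setting the naive slope is attenuated towards zero by roughly the factor `lam`, while the adjusted fit targets the true slope. The adjustment relies on `lam` being known or well estimated, which a secondary analyst without access to the linking information may not be able to assume.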

Year: 2023
Journal in which the work is published: WILEY INTERDISCIPLINARY REVIEWS. COMPUTATIONAL STATISTICS
Appears in types: 1.1 Journal article

Files in this record:

File	Size	Format
WIREs Computational Stats - 2022 - Chambers - Robust regression using probabilistically linked data (1).pdf (open access; description: article; attachment type: publisher's PDF; license: Creative Commons)	2.9 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11391/1555394
