Measures of Utility for Synthetic Data

60 mins, 421.37 MB, WebM 640x360, 29.97 fps, 44100 Hz, 958.86 kbits/sec

About this item

Description:	Raab, G (University of Edinburgh) Thursday 3rd November 2016 - 15:30 to 16:30


Created:	2016-11-11 15:50
Collection:	Data Linkage and Anonymisation
Publisher:	Isaac Newton Institute
Copyright:	Raab, G
Language:	eng (English)


Abstract:	When synthetic data are produced to avoid potential disclosure, they can be used either in place of the original data or, more commonly, to allow researchers to develop code that will ultimately be run on the original data. The utility of synthetic data can be measured by comparing the results of the final analysis on the synthetic and the original data, but this is not possible until the final analysis is complete. General utility measures, which quantify the overall differences between the original and synthetic data, are therefore more useful to those creating synthetic data. This presentation will discuss two such measures. The first is a propensity score measure originally proposed by Woo et al. (2009), and the second is based on comparing tables, as suggested by Voas and Williamson (2001). Their null distributions when the synthesis model is "correct" will be discussed, as well as their practical implementation as part of the synthpop package.
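A minimal sketch of the two kinds of general utility measure, in plain Python, assuming categorical records stored as tuples. The propensity score measure (pMSE) stacks the original and synthetic records, models the probability that a record is synthetic, and averages the squared difference between each fitted propensity and c, the synthetic share of the combined data. For simplicity the propensity model here is a saturated one (each cell's fitted propensity is its share of synthetic records); synthpop offers fitted models such as logistic regression or CART instead. The table comparison uses a standard Pearson chi-squared test of homogeneity on the original-versus-synthetic table, swapped in as an illustration rather than Voas and Williamson's exact statistic. The function names `pmse_saturated` and `table_chisq` are hypothetical and not part of synthpop, and the sketch does not address the null distributions discussed in the talk.

```python
from collections import Counter


def pmse_saturated(original, synthetic):
    """Hypothetical helper: pMSE with a saturated propensity model.

    Records are tuples of categorical values. Under a saturated model the
    fitted propensity of every record in a cell is that cell's share of
    synthetic records, so pMSE = (1/N) * sum over cells of m_j * (p_j - c)^2.
    """
    orig_counts, syn_counts = Counter(original), Counter(synthetic)
    n = len(original) + len(synthetic)
    c = len(synthetic) / n  # synthetic share of the combined data
    total = 0.0
    for cell in set(orig_counts) | set(syn_counts):
        m = orig_counts[cell] + syn_counts[cell]  # combined count in this cell
        p = syn_counts[cell] / m  # fitted propensity for records in this cell
        total += m * (p - c) ** 2
    return total / n


def table_chisq(original, synthetic):
    """Hypothetical helper: Pearson chi-squared on the 2 x K table of
    original vs synthetic counts over the K observed cells."""
    orig_counts, syn_counts = Counter(original), Counter(synthetic)
    n_orig, n_syn = len(original), len(synthetic)
    n = n_orig + n_syn
    stat = 0.0
    for cell in set(orig_counts) | set(syn_counts):
        m = orig_counts[cell] + syn_counts[cell]
        # Expected counts if both data sets shared one distribution
        exp_o = n_orig * m / n
        exp_s = n_syn * m / n
        stat += (orig_counts[cell] - exp_o) ** 2 / exp_o
        stat += (syn_counts[cell] - exp_s) ** 2 / exp_s
    return stat


if __name__ == "__main__":
    original = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
    synthetic = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
    print("pMSE:", pmse_saturated(original, synthetic))
    print("chi-squared:", table_chisq(original, synthetic))
```

For the toy data above the sketch gives pMSE = 1/12 and a chi-squared value of 8/3. With a saturated propensity model the two measures are directly linked: the chi-squared statistic equals N·pMSE/(c(1−c)), here 8 × (1/12) / 0.25 = 8/3.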

Available Formats

Format	Quality (resolution or sample rate)	Bitrate	Size
MPEG-4 Video	640x360	1.93 Mbits/sec	870.80 MB
WebM	640x360	958.86 kbits/sec	421.37 MB
iPod Video	480x270	496.67 kbits/sec	218.26 MB
MP3	44100 Hz	253.11 kbits/sec	111.23 MB