Data compression with statistical guarantees

Duration: 44 mins 40 secs

About this item
Description: Richardson, S
Monday 3rd July 2017 - 13:30 to 14:15
 
Created: 2017-07-21 09:59
Collection: Scalable inference; statistical, algorithmic, computational aspects
Publisher: Isaac Newton Institute
Copyright: Richardson, S
Language: eng (English)
 
Abstract: Joint talk with Daniel Ahfock (MRC Biostatistics Unit, University of Cambridge)

The talk is concerned with translating recent ideas from computer science on probabilistic data-compression techniques into a statistical framework that can be ‘safely’ applied to speed up linear regression analyses for very large sample sizes in biomedicine.

Our motivation is to facilitate the use of multivariate regression and model exploration in tall data sets, so that, for example, genetic association analyses carried out on hundreds of thousands of subjects can investigate multivariate effects for a set of explanatory features, rather than being restricted to one-feature-at-a-time associations for reasons of computational feasibility.

Among the many approaches to dealing with tall data, probabilistic data-compression techniques based on random linear mappings, developed in the computer science community and known as sketching, are particularly suitable for linear regression problems. In the first part of the talk, we will present a hierarchical representation of sketching, which allows the distributional properties of different sketching algorithms to be derived. In particular, we will discuss how the signal-to-noise ratio in the original data set affects the choice of sketching algorithm. In the second part of the talk, we will further refine some of the approximation guarantees and consider iterative sketches. The talk will be illustrated with a genetic analysis of the link between a blood cell trait and the HLA region, involving a sample of 130,000 people.
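
To make the idea of sketching concrete, the following is a minimal illustration in Python with NumPy, assuming a dense Gaussian random projection as the sketching matrix. It is only one of several possible sketches and is not necessarily the algorithm or implementation used in the talk. The function name gaussian_sketch_ols, the simulated data, and the sizes n, p and k are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_sketch_ols(X, y, k, rng):
    """Least squares on a k-row random compression of (X, y).

    Assumption: S has i.i.d. N(0, 1/k) entries (a Gaussian sketch).
    """
    n = X.shape[0]
    # Random linear map from n rows down to k rows.
    S = rng.standard_normal((k, n)) / np.sqrt(k)
    X_sketch = S @ X  # k x p compressed design matrix
    y_sketch = S @ y  # length-k compressed response
    # Ordinary least squares on the compressed data.
    beta, *_ = np.linalg.lstsq(X_sketch, y_sketch, rcond=None)
    return beta


# Simulated tall data set (illustrative sizes only).
rng = np.random.default_rng(0)
n, p, k = 5000, 5, 500
X = rng.standard_normal((n, p))
beta_true = np.arange(1.0, p + 1.0)
y = X @ beta_true + rng.standard_normal(n)

# Full-data estimate for comparison with the sketched estimate.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_sketch = gaussian_sketch_ols(X, y, k, rng)

print("full data:", np.round(beta_full, 3))
print("sketched: ", np.round(beta_sketch, 3))
```

The sketched estimate is computed from k rows instead of n, at the cost of extra variability around the full-data estimate. That variability grows with the residual noise in the original data, which is one way to see why the signal-to-noise ratio matters for the choice of sketch.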

http://arxiv.org/abs/1706.03665