Data compression with statistical guarantees
Duration: 44 mins 40 secs
About this item
Description: | Richardson, S; Monday 3rd July 2017 - 13:30 to 14:15 |
---|---|
Created: | 2017-07-21 09:59 |
Collection: | Scalable inference; statistical, algorithmic, computational aspects |
Publisher: | Isaac Newton Institute |
Copyright: | Richardson, S |
Language: | eng (English) |
Abstract: | Joint talk with Daniel Ahfock (MRC Biostatistics Unit @ University of Cambridge)
The talk is concerned with translating recent ideas from computer science on probabilistic data-compression techniques into a statistical framework that can be ‘safely’ applied to speed up linear regression analyses for very large sample sizes in biomedicine. Our motivation is to facilitate the use of multivariate regression and model exploration in tall data sets, so that, for example, genetic association analyses carried out on hundreds of thousands of subjects can investigate multivariate effects for a set of explanatory features, rather than being restricted to one-feature-at-a-time associations for computational feasibility. Among the many approaches to dealing with tall data, probabilistic data-compression techniques using random linear mappings, developed in the computer science community and known as sketching, are particularly suitable for linear regression problems. In the first part of the talk, we will present a hierarchical representation of sketching, which allows the distributional properties of different sketching algorithms to be derived. In particular, we will discuss how the signal-to-noise ratio in the original data set is important for the choice of sketching algorithm. In the second part of the talk, we will further refine some of the approximation guarantees and consider iterative sketches. The talk will be illustrated with a genetic analysis of the link between a blood cell trait and the HLA region involving a sample of 130,000 people. http://arxiv.org/abs/1706.03665 |
---|
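As a rough illustration of the sketching idea described in the abstract, the following minimal Python sketch compresses a tall regression problem with a Gaussian random linear mapping and then solves ordinary least squares on the compressed data. It is an assumption-laden toy, not the speakers' method: the function name `gaussian_sketch_ols`, the simulated data, and the sizes `n`, `p`, `k` are all hypothetical, and the Gaussian sketch is only one of several sketching algorithms the talk compares.

```python
import numpy as np


def gaussian_sketch_ols(X, y, k, rng):
    """Least squares on a Gaussian sketch of (X, y).

    Hypothetical helper for illustration: S is a k x n matrix with
    i.i.d. N(0, 1/k) entries, so the n rows are compressed to k rows.
    """
    n = X.shape[0]
    S = rng.standard_normal((k, n)) / np.sqrt(k)
    X_s = S @ X  # compressed design matrix, shape (k, p)
    y_s = S @ y  # compressed response, shape (k,)
    beta_s, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)
    return beta_s


# Simulated tall data set (assumed sizes, chosen only for the example).
rng = np.random.default_rng(0)
n, p, k = 5000, 5, 500
X = rng.standard_normal((n, p))
beta_true = np.arange(1, p + 1, dtype=float)
y = X @ beta_true + rng.standard_normal(n)

# Full-data least squares versus the sketched estimate.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_sketch = gaussian_sketch_ols(X, y, k, rng)

print("full data:", np.round(beta_full, 3))
print("sketched: ", np.round(beta_sketch, 3))
```

The sketched estimate is computed from `k` rows instead of `n`, at the price of extra variance around the full-data estimate; how large that price is relative to the noise already in the data is the kind of question the abstract raises when it mentions the signal-to-noise ratio.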
Available Formats
Format | Quality | Bitrate | Size |
---|---|---|---|
MPEG-4 Video | 640x360 | 1.94 Mbits/sec | 650.73 MB |
WebM | 640x360 | 552.83 kbits/sec | 180.93 MB |
iPod Video | 480x270 | 522.28 kbits/sec | 170.86 MB |
MP3 | 44100 Hz | 249.75 kbits/sec | 81.80 MB |