Transposably invariant sample reuse: the pigeonhole bootstrap and blockwise cross-validation
Duration: 59 mins 2 secs
About this item
Description: | Owen, A (Stanford). Wednesday 25 June 2008, 11:30-12:30. Future Directions in High-Dimensional Data Analysis
---|---
Created: | 2008-07-08 15:23
Collection: | Statistical Theory and Methods for Complex, High-Dimensional Data
Publisher: | Isaac Newton Institute
Copyright: | Owen, A
Language: | eng (English)
Distribution: | World
Explicit content: | No
Aspect Ratio: | 4:3
Screencast: | No
Bumper: | /sms-ingest/static/new-4x3-bumper.dv
Trailer: | /sms-ingest/static/new-4x3-trailer.dv

Abstract: Sample reuse methods like the bootstrap and cross-validation are widely used in statistics and machine learning. They provide measures of accuracy with some face-value validity that does not depend on strong model assumptions.

These methods depend on repeating or omitting cases, while keeping all the variables in those cases. But for many data sets, it is not obvious whether the rows are cases and the columns are variables, or vice versa. For example, with movie ratings organized by movie and customer, both movie and customer IDs can be thought of as variables. This talk looks at bootstrap and cross-validation methods that treat the rows and columns of the matrix symmetrically, so that we get the same answer on X as on X'.

McCullagh has proved that no exact bootstrap exists in a certain framework of this type (crossed random effects). We show that a method based on resampling both rows and columns of the data matrix tracks the true error, for some simple statistics applied to large data matrices.

Similarly, we look at a method of cross-validation that leaves out blocks of the data matrix, generalizing a proposal due to Gabriel that is used in the crop science literature. We find empirically that this approach provides a good way to choose the number of terms in a truncated SVD model or a non-negative matrix factorization. We also apply some recent results in random matrix theory to the truncated SVD case.

Related Links
* http://stat.stanford.edu/~owen/reports - Page with research articles
* http://stat.stanford.edu/~owen/reports/cvsvd.pdf - Technical report on bi-cross-validation (in revision)
* http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&page=toc&handle=euclid.aoas/1196438015 - AOAS page with link to Pigeonhole bootstrap paper
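
The two resampling ideas in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's code: the function names `pigeonhole_bootstrap` and `bcv_error` are hypothetical, the data matrix is assumed dense and fully observed (ratings data would normally be sparse), and the held-out block is predicted as B D_k^+ C from a rank-k truncated SVD of the retained block D, which is my reading of the linked bi-cross-validation report rather than something stated in the abstract. The synthetic rank-2 matrix is made up for the demonstration.

```python
import numpy as np


def pigeonhole_bootstrap(X, statistic, n_boot=200, seed=0):
    """Recompute `statistic` on matrices whose rows and columns are both
    resampled with replacement, independently of each other."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    reps = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n_rows, size=n_rows)  # resampled row indices
        cols = rng.integers(0, n_cols, size=n_cols)  # resampled column indices
        reps[b] = statistic(X[np.ix_(rows, cols)])
    return reps


def bcv_error(X, rank, row_folds=2, col_folds=2, seed=0):
    """Blockwise cross-validation error of a rank-`rank` truncated SVD.

    Each (row fold, column fold) block A is held out in turn and predicted
    from the rest of the matrix, partitioned as [[A, B], [C, D]].
    Assumes D has at least `rank` nonzero singular values.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    row_fold = rng.permutation(n_rows) % row_folds  # balanced random folds
    col_fold = rng.permutation(n_cols) % col_folds
    total = 0.0
    for i in range(row_folds):
        for j in range(col_folds):
            r_out, c_out = row_fold == i, col_fold == j
            A = X[np.ix_(r_out, c_out)]    # held-out block
            B = X[np.ix_(r_out, ~c_out)]   # held-out rows, retained columns
            C = X[np.ix_(~r_out, c_out)]   # retained rows, held-out columns
            D = X[np.ix_(~r_out, ~c_out)]  # retained rows and columns
            U, s, Vt = np.linalg.svd(D, full_matrices=False)
            Uk, sk, Vtk = U[:, :rank], s[:rank], Vt[:rank]
            # Prediction B @ pinv(D_k) @ C, with pinv(D_k) = V_k diag(1/s_k) U_k^T
            A_hat = B @ (Vtk.T / sk) @ (Uk.T @ C)
            total += float(np.sum((A - A_hat) ** 2))
    return total


# Synthetic example: a rank-2 signal plus unit-variance noise.
rng = np.random.default_rng(1)
signal = (rng.normal(size=(60, 2)) * [10.0, 5.0]) @ rng.normal(size=(2, 50))
X = signal + rng.normal(size=(60, 50))

reps = pigeonhole_bootstrap(X, np.mean)
errs = [bcv_error(X, k) for k in range(5)]
print("bootstrap standard error of the grand mean:", reps.std(ddof=1))
print("blockwise CV error by rank:", errs)
```

In this sketch the spread of `reps` serves as the accuracy measure for the chosen statistic, and the rank with the smallest entry in `errs` would be the candidate number of SVD terms. Both functions treat rows and columns in the same way, in the spirit of the row/column symmetry the abstract describes.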
Available Formats
Format | Quality | Bitrate | Size
---|---|---|---
MPEG-4 Video | 480x360 | 1.84 Mbits/sec | 816.26 MB
WebM | 480x360 | 580.22 kbits/sec | 250.94 MB
Flash Video | 480x360 | 806.5 kbits/sec | 349.30 MB
iPod Video | 480x360 | 505.21 kbits/sec | 218.81 MB
QuickTime | 384x288 | 848.36 kbits/sec | 367.43 MB
MP3 | 44100 Hz | 125.02 kbits/sec | 53.93 MB