Transposably invariant sample reuse: the pigeonhole bootstrap and blockwise cross-validation

Duration: 59 mins 2 secs


About this item
Description: Owen, A (Stanford)
Wednesday 25 June 2008, 11:30-12:30
Future Directions in High-Dimensional Data Analysis
 
Created: 2008-07-08 15:23
Collection: Statistical Theory and Methods for Complex, High-Dimensional Data
Publisher: Isaac Newton Institute
Copyright: Owen, A
Language: eng (English)
Distribution: World     (downloadable)
Credits:
Author:  Owen, A
Explicit content: No
Aspect Ratio: 4:3
Screencast: No
 
Abstract: Sample reuse methods like the bootstrap and cross-validation are widely used in statistics and machine learning. They provide measures of accuracy with a face validity that does not depend on strong model assumptions.

These methods depend on repeating or omitting cases, while keeping all the variables in those cases. But for many data sets, it is not obvious whether the rows are cases and columns are variables, or vice versa. For example, with movie ratings organized by movie and customer, both movie and customer IDs can be thought of as variables.

This talk looks at bootstrap and cross-validation methods that treat rows and columns of the matrix symmetrically, so that we get the same answer on X as on its transpose X'. McCullagh has proved that no exact bootstrap exists in a certain framework of this type (crossed random effects). We show that a method based on resampling both rows and columns of the data matrix tracks the true error, for some simple statistics applied to large data matrices.
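
The row-and-column resampling idea can be sketched as follows. This is a minimal illustration, not the speaker's implementation: it assumes a fully observed dense matrix, draws row indices and column indices independently with replacement, and evaluates a user-supplied statistic on each resampled matrix. The function name `pigeonhole_bootstrap` and its parameters `stat`, `n_boot` and `seed` are hypothetical.

```python
import numpy as np

def pigeonhole_bootstrap(X, stat, n_boot=200, seed=0):
    """Resample rows and columns of X independently, with replacement.

    Assumption: X is a dense, fully observed 2-D array. `stat` is any
    function mapping a 2-D array to a scalar (for example np.mean).
    Returns an array of n_boot bootstrap replicates of the statistic.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    n_rows, n_cols = X.shape
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        # Rows and columns are treated the same way: each index set is
        # drawn uniformly with replacement from its own range.
        r = rng.integers(0, n_rows, size=n_rows)
        c = rng.integers(0, n_cols, size=n_cols)
        # np.ix_ builds the resampled matrix from the crossed indices.
        replicates[b] = stat(X[np.ix_(r, c)])
    return replicates
```

The spread of the returned replicates (for instance their variance) would then serve as the accuracy estimate for the statistic. Because rows and columns are drawn by the same rule, the procedure has the same distribution whether it is applied to X or to X'.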

Similarly we look at a method of cross-validation that leaves out blocks of the data matrix, generalizing a proposal due to Gabriel that is used in the crop science literature. We find empirically that this approach provides a good way to choose the number of terms in a truncated SVD model or a non-negative matrix factorization. We also apply some recent results in random matrix theory to the truncated SVD case.
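
One way to sketch the block hold-out for a truncated SVD is shown below. The abstract does not spell out the estimator, so the details here are assumptions: the matrix is split into a held-out block A, the blocks B and C that share its rows or columns, and the remaining block D; a rank-k SVD is fitted to D alone; and A is predicted as B D_k^+ C, where D_k^+ is the pseudo-inverse of the rank-k truncation of D. The function name `bcv_block_error` and its arguments are hypothetical.

```python
import numpy as np

def bcv_block_error(X, held_rows, held_cols, k):
    """Squared error for predicting one held-out block at rank k.

    Assumptions: X is dense and fully observed; for k > 0 the top k
    singular values of the retained block D are nonzero.
    """
    X = np.asarray(X, dtype=float)
    rows = np.zeros(X.shape[0], dtype=bool)
    cols = np.zeros(X.shape[1], dtype=bool)
    rows[held_rows] = True
    cols[held_cols] = True

    A = X[np.ix_(rows, cols)]      # held-out block
    B = X[np.ix_(rows, ~cols)]     # held-out rows, retained columns
    C = X[np.ix_(~rows, cols)]     # retained rows, held-out columns
    D = X[np.ix_(~rows, ~cols)]    # fully retained block

    if k == 0:
        A_hat = np.zeros_like(A)   # rank-0 model predicts zero
    else:
        U, s, Vt = np.linalg.svd(D, full_matrices=False)
        # Pseudo-inverse of the rank-k truncation of D.
        D_k_pinv = Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T
        A_hat = B @ D_k_pinv @ C
    return float(np.sum((A - A_hat) ** 2))
```

In use, one would sum this error over a partition of the rows and columns into blocks and pick the k with the smallest total. The held-out block is never seen when fitting, and the construction does not distinguish rows from columns.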

Related Links

* http://stat.stanford.edu/~owen/reports - Page with research articles
* http://stat.stanford.edu/~owen/reports/cvsvd.pdf - Technical report on bi-cross-validation (in revision)
* http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&page=toc&handle=euclid.aoas/1196438015 - AOAS page with link to Pigeonhole bootstrap paper

Available Formats

Format         Quality    Bitrate            Size
MPEG-4 Video   480x360    1.84 Mbits/sec     816.26 MB
WebM           480x360    580.22 kbits/sec   250.94 MB
Flash Video    480x360    806.5 kbits/sec    349.30 MB
iPod Video     480x360    505.21 kbits/sec   218.81 MB
QuickTime      384x288    848.36 kbits/sec   367.43 MB
MP3            44100 Hz   125.02 kbits/sec   53.93 MB