We consider the on-line predictive version of the standard problem of linear regression; the goal is to predict each consecutive response given the corresponding explanatory variables and all the previous observations. The standard treatment of prediction in linear regression analysis has two drawbacks: (1) the usual prediction intervals guarantee that the probability of error equals the nominal significance level $\epsilon$, but this property per se does not imply that the long-run frequency of error is close to $\epsilon$; (2) it is not suitable for prediction in complex systems, as it assumes that the number of observations exceeds the number of parameters. We state a general result showing that in the on-line protocol the frequency of error does equal the nominal significance level, up to statistical fluctuations, and we describe alternative regression models in which informative prediction intervals can be found before the number of observations exceeds the number of parameters. One of these models, which assumes only that the observations are independent and identically distributed, is popular in machine learning but greatly underused in the statistical theory of regression.
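The on-line protocol described in the abstract can be illustrated with a small simulation. This is only a hedged sketch, not the authors' actual method: it fits ordinary least squares to the past observations at each step and takes the $(1-\epsilon)$ empirical quantile of past absolute residuals as the interval half-width (a simple calibration in the spirit of, but not identical to, the conformal construction the paper would use). The data-generating model, the warm-up length of 50, and all variable names are assumptions made for illustration. The point it demonstrates is the on-line frequency property: the realized error rate hovers near the nominal level $\epsilon$, up to statistical fluctuation.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.1          # nominal significance level
n, p = 2000, 3     # observations and parameters (illustrative choices)

# Hypothetical i.i.d. data-generating model for the simulation
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(size=n)

errors = 0
trials = 0
for t in range(50, n):  # start after a short warm-up of past observations
    Xt, yt = X[:t], y[:t]
    # Least-squares fit on all previous observations
    coef, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
    pred = X[t] @ coef
    # Interval half-width: empirical (1 - eps) quantile of past absolute residuals
    half_width = np.quantile(np.abs(yt - Xt @ coef), 1 - eps)
    trials += 1
    if abs(y[t] - pred) > half_width:  # response falls outside the interval
        errors += 1

freq = errors / trials
print(f"nominal level: {eps}, realized error frequency: {freq:.3f}")
```

Under the i.i.d. assumption the realized frequency should come out near 0.1; exact equality up to fluctuations is what the paper's general result guarantees for its (properly conformalized) intervals, which this quantile-based sketch only approximates.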
Published: 21 Nov 2005
MSC classification: 62G08; 62J07