Abstract
This poster presents the current state of our work in progress on the development and application of the Inductive Venn-Abers Predictive Distribution (IVAPD) framework.
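To make the framework more concrete, here is a minimal sketch of the general Venn-Abers idea applied to regression. It is written for illustration only and is not our implementation. It assumes that the data are split into a proper training set (used to fit some underlying regressor, not shown) and a calibration set. For each threshold t the labels are turned into binary events y ≤ t, the negated regressor output is used as the score (our assumption), and isotonic regression is fitted twice: once with the test label set to 0 and once with it set to 1. The resulting pair (p0, p1) brackets the predictive CDF at t. All helper names (`pava`, `isotonic_at`, `venn_abers`, `ivapd_sketch`) are hypothetical.

```python
def pava(values, weights):
    """Weighted pool-adjacent-violators: best non-decreasing fit to `values`,
    which must already be ordered by increasing score."""
    blocks = []  # each block: (mean, total weight, number of points)
    for v, w in zip(values, weights):
        blocks.append((float(v), float(w), 1))
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append(((m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2))
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def isotonic_at(scores, labels, query):
    """Value at `query` (one of `scores`) of the isotonic regression of labels on scores.
    Points with tied scores are pooled before running PAVA."""
    groups = {}
    for s, h in zip(scores, labels):
        total, n = groups.get(s, (0.0, 0))
        groups[s] = (total + h, n + 1)
    keys = sorted(groups)
    means = [groups[k][0] / groups[k][1] for k in keys]
    weights = [groups[k][1] for k in keys]
    return pava(means, weights)[keys.index(query)]


def venn_abers(cal_scores, cal_labels, test_score):
    """Venn-Abers pair (p0, p1): isotonic fits with the test label set to 0 and to 1."""
    scores = list(cal_scores) + [test_score]
    p0 = isotonic_at(scores, list(cal_labels) + [0], test_score)
    p1 = isotonic_at(scores, list(cal_labels) + [1], test_score)
    return p0, p1


def ivapd_sketch(cal_preds, cal_y, test_pred, thresholds):
    """For each threshold t, return (t, p0, p1): bounds on P(y <= t) for the test object.
    Score = -prediction (an assumption), so a larger score should mean y <= t is more likely."""
    scores = [-p for p in cal_preds]
    result = []
    for t in thresholds:
        events = [1 if y <= t else 0 for y in cal_y]
        p0, p1 = venn_abers(scores, events, -test_pred)
        result.append((t, p0, p1))
    return result
```

With only four calibration points the bounds are wide; for example, `ivapd_sketch([1, 2, 3, 4], [1.1, 2.2, 2.9, 4.1], 2.5, [0, 2.5, 5])` gives the interval [0, 1] for P(y ≤ 2.5). A larger calibration set narrows the gap between p0 and p1.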
As a sample task, we consider the problem of real-time household energy consumption forecasting. Concretely, the machine learning problem is to predict the evening consumption (at 18:00) from the morning consumption (0:00–12:00) of the same day.
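The task definition can be sketched as a small feature-construction step. The layout below is an assumption made for illustration: the raw readings are taken to be already aggregated to one value per hour, and "0:00–12:00" is read as hours 0 to 11. The names `make_examples`, `MORNING_HOURS` and `EVENING_HOUR` are hypothetical.

```python
from typing import List, Sequence, Tuple

MORNING_HOURS = range(0, 12)  # 0:00 up to (not including) 12:00, an assumption
EVENING_HOUR = 18             # label: consumption at 18:00


def make_examples(days: Sequence[Sequence[float]]) -> Tuple[List[List[float]], List[float]]:
    """Build one (features, label) pair per day from hourly readings.

    Hypothetical layout: days[d][h] is the consumption in hour h (0..23) of day d.
    """
    X, y = [], []
    for day in days:
        if len(day) != 24:
            raise ValueError("expected 24 hourly values per day")
        X.append([day[h] for h in MORNING_HOURS])
        y.append(day[EVENING_HOUR])
    return X, y
```

Each day thus becomes one example with 12 morning features and a single evening label.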
We use the UCI public dataset on household power consumption (ECP) (1) to make predictions for the first 300 days. Only data from one particular household are used, processed in a fair on-line (real-time) mode: the prediction for each day is made by a model trained on past days only, never on future ones.
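The fair on-line mode can be sketched as a loop in which the model for day t only ever sees days 0 to t−1. The `predict(train_X, train_y, x)` interface and the `min_history` parameter are hypothetical; any predictor, including a Venn-Abers-based one, could be plugged in.

```python
from typing import Any, Callable, List, Optional, Sequence

# predict(train_X, train_y, x) -> prediction for x (hypothetical interface)
Predictor = Callable[[Sequence[Any], Sequence[float], Any], Any]


def online_predictions(X: Sequence[Any], y: Sequence[float], predict: Predictor,
                       n_days: int = 300, min_history: int = 1) -> List[Optional[Any]]:
    """Fair on-line protocol: the prediction for day t is made from days 0..t-1 only."""
    preds: List[Optional[Any]] = []
    for t in range(min(n_days, len(X))):
        if t < min_history:
            preds.append(None)  # not enough past days to train on yet
            continue
        # Slicing to [:t] guarantees that neither day t nor any later day is seen.
        preds.append(predict(X[:t], y[:t], X[t]))
    return preds
```

Passing a predictor that simply reports the size and the latest label of its training set is a quick way to check that no future day leaks into training.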
The method has the following advantages. First, it outputs well-calibrated predictions that are valid under weak assumptions. Second, this approach to regression gives a rich prediction in the form of a whole predictive distribution. The distribution can be converted into a confidence interval of any probability, with a flexible choice of its location (a lower or upper ray, an interval with a given centre, or the interval of the smallest length). Third, it adapts easily to non-linear dependencies.
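The conversion of a predictive distribution into intervals can be sketched as follows, assuming the distribution is discrete (sorted atoms with probabilities summing to 1). The `kind` names and the tolerance `EPS` are our own choices for illustration, not part of the framework. Each branch returns an interval whose probability is at least `c`.

```python
import math
from itertools import accumulate
from typing import Sequence

EPS = 1e-12  # tolerance for floating-point sums of probabilities


def quantile(atoms: Sequence[float], probs: Sequence[float], p: float) -> float:
    """Smallest atom z with F(z) >= p; atoms sorted ascending, probs summing to 1."""
    for z, cdf in zip(atoms, accumulate(probs)):
        if cdf >= p - EPS:
            return z
    return atoms[-1]


def interval(atoms, probs, c, kind="central", centre=None):
    """Interval with probability at least c under the discrete predictive distribution."""
    if kind == "bounded_above":   # ray (-inf, u]
        return (-math.inf, quantile(atoms, probs, c))
    if kind == "bounded_below":   # ray [l, +inf)
        return (quantile(atoms, probs, 1 - c), math.inf)
    if kind == "central":         # tails of probability (1 - c) / 2 on each side
        return (quantile(atoms, probs, (1 - c) / 2), quantile(atoms, probs, (1 + c) / 2))
    if kind == "centred":         # symmetric around a given centre
        pairs = sorted((abs(z - centre), p) for z, p in zip(atoms, probs))
        r = quantile([d for d, _ in pairs], [p for _, p in pairs], c)
        return (centre - r, centre + r)
    if kind == "shortest":        # two-pointer scan over windows of consecutive atoms
        best = (atoms[0], atoms[-1])
        j, mass = 0, 0.0
        for i in range(len(atoms)):
            while j < len(atoms) and mass < c - EPS:
                mass += probs[j]
                j += 1
            if mass < c - EPS:
                break  # no window starting at i or later reaches probability c
            if atoms[j - 1] - atoms[i] < best[1] - best[0]:
                best = (atoms[i], atoms[j - 1])
            mass -= probs[i]
        return best
    raise ValueError(f"unknown kind: {kind}")
```

For a skewed distribution the shortest interval can differ noticeably from the central one: with atoms `[1, 2, 3, 4, 10]` and probabilities `[0.1, 0.3, 0.3, 0.2, 0.1]`, the central 80% interval is (1, 4) while the shortest is (2, 4).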
In the previous poster (2) we also developed a way of combining two predictors by using bivariate isotonic regression as a merging tool, and demonstrated its advantage.
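Bivariate isotonic regression can be sketched as a least-squares fit that must be non-decreasing in each of two scores. In a merging setting the two scores would be the outputs of the two predictors and the labels would be binary events such as y ≤ t; this pairing is our illustration. The solver below is Hildreth's dual coordinate ascent, chosen only because it is short; we do not claim it is the solver used in (2). The function name and its `max_iter`/`tol` parameters are hypothetical.

```python
from typing import List, Sequence


def bivariate_isotonic(s1: Sequence[float], s2: Sequence[float], labels: Sequence[float],
                       max_iter: int = 10000, tol: float = 1e-12) -> List[float]:
    """Least-squares fit f that is non-decreasing in both scores:
    f[i] <= f[j] whenever s1[i] <= s1[j] and s2[i] <= s2[j].

    Hildreth's dual coordinate ascent with one multiplier per comparable pair.
    There are O(n^2) constraints, so this is meant for small illustrative inputs only.
    """
    n = len(labels)
    f = [float(v) for v in labels]
    pairs = [(i, j) for i in range(n) for j in range(n)
             if i != j and s1[i] <= s1[j] and s2[i] <= s2[j]]
    lam = [0.0] * len(pairs)
    for _ in range(max_iter):
        largest_step = 0.0
        for k, (i, j) in enumerate(pairs):
            # Constraint f[i] - f[j] <= 0: update its multiplier, keeping it non-negative.
            step = max(0.0, lam[k] + (f[i] - f[j]) / 2.0) - lam[k]
            lam[k] += step
            f[i] -= step
            f[j] += step
            largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return f
```

Points that are incomparable in the partial order (one score higher, the other lower) impose no constraint on each other, which is what lets the merged fit use both predictors at once.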
The results are shown in Tab. 1. We merge two versions of the k-Nearest-Neighbours algorithm with different values of k. Our evaluation criterion is the Continuous Ranked Probability Score (CRPS), the integrated squared difference between the predicted CDF and the CDF of the true label (a step function at the observed value); lower values are better. As earlier in (2), the best quality is achieved by combining a relatively small (20) and a relatively large (100) value of the parameter.
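The CRPS can be computed exactly for a discrete predictive distribution, because both the predicted CDF F and the step function H of the true label are piecewise constant. The sketch below assumes sorted atoms with probabilities summing to 1; the function name `crps` is ours.

```python
from typing import Sequence


def crps(atoms: Sequence[float], probs: Sequence[float], y: float) -> float:
    """CRPS of a discrete predictive distribution at outcome y:
    the integral over u of (F(u) - H(u))**2, where H(u) = 1 if u >= y else 0.
    Atoms must be sorted ascending; probs must sum to 1."""
    points = sorted(set(atoms) | {y})
    total, F, k = 0.0, 0.0, 0
    # Outside [points[0], points[-1]] F and H agree (both 0 or both 1), so only
    # the finite pieces between consecutive breakpoints contribute.
    for left, right in zip(points, points[1:]):
        while k < len(atoms) and atoms[k] <= left:
            F += probs[k]  # F on [left, right) is the mass of atoms <= left
            k += 1
        H = 1.0 if left >= y else 0.0
        total += (F - H) ** 2 * (right - left)
    return total
```

For a point prediction the CRPS reduces to the absolute error: `crps([2.0], [1.0], 5.0)` is 3.0. Averaging `crps` over the predicted days gives a single number for comparing settings such as different values of k.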
A sample result for this setting is shown in Figs. 1 and 2. We present the predicted distributions as box-plots, where the green box spans the 25% to 75% quantiles. The true labels are shown as black points. The plot is given in two versions: in the original chronological order used for on-line prediction, and re-ordered by the median of the predictive distribution. In the second version, similar predictions are shown next to each other, so that the spread of the real labels can be compared visually with the predicted range.
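The box-plot summaries and the re-ordering used for the second plot can be sketched as follows, again assuming each predictive distribution is discrete (sorted atoms with probabilities). The names `box_summary` and `order_by_median` are hypothetical and no plotting library is involved.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

EPS = 1e-12  # tolerance for floating-point sums of probabilities


def quantile(atoms: Sequence[float], probs: Sequence[float], p: float) -> float:
    """Smallest atom z with F(z) >= p."""
    for z, cdf in zip(atoms, accumulate(probs)):
        if cdf >= p - EPS:
            return z
    return atoms[-1]


def box_summary(atoms, probs) -> Tuple[float, float, float]:
    """25% quantile, median and 75% quantile: the box of the box-plot."""
    return tuple(quantile(atoms, probs, q) for q in (0.25, 0.5, 0.75))


def order_by_median(distributions: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> List[int]:
    """Day indices sorted by predictive median (chronological order breaks ties)."""
    medians = [quantile(a, p, 0.5) for a, p in distributions]
    return sorted(range(len(distributions)), key=lambda i: medians[i])
```

Plotting the days in the order returned by `order_by_median` places predictions with similar medians side by side, which is what makes the spread of the true labels easy to compare with the predicted boxes.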
Original language | English |
---|---|
Type | Poster |
Media of output | presented at conference |
Publisher | Proceedings of Machine Learning Research, 2022 Conformal and Probabilistic Prediction and Applications |
Number of pages | 2 |
Volume | 179 |
Publication status | Published - 26 Aug 2022 |