Stay with me: Lifetime maximization through heteroscedastic linear bandits with reneging

Ping-Chun Hsieh*, Xi Liu, Anirban Bhattacharya, P. R. Kumar

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection. In these applications, a "reneging" phenomenon, where participants may disengage from future interactions after observing an unsatisfying outcome, is rather prevalent. To address this issue, this paper proposes a model of heteroscedastic linear bandits with reneging, which allows each participant to have a distinct "satisfaction level," with any interaction outcome falling short of that level causing the participant to renege. Moreover, it allows the variance of the outcome to be context-dependent. Based on this model, we develop a UCB-type policy, namely HR-UCB, and prove that it achieves $\mathcal{O}\big(\sqrt{T(\log(T))^{3}}\big)$ regret. Finally, we validate the performance of HR-UCB via simulations.
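To make the setting concrete, below is a minimal simulation sketch of a linear bandit with context-dependent noise and reneging. It is not the authors' HR-UCB policy: the log-linear variance model exp(phi . x), the exploration weight alpha, the fixed satisfaction level, and the single-user episode loop are all illustrative assumptions, and the index used is a generic ridge-regression UCB rather than the paper's heteroscedasticity-aware construction.

```python
import numpy as np

rng = np.random.default_rng(0)

d, K, T = 5, 10, 2000           # context dimension, arms per round, horizon
theta = rng.normal(size=d)      # mean-reward parameter (unknown to the learner)
phi = 0.3 * rng.normal(size=d)  # variance parameter: noise scale exp(0.5 * phi . x) is a hypothetical form

lam, alpha = 1.0, 1.0           # ridge regularizer and exploration weight (assumed values)
A = lam * np.eye(d)             # regularized Gram matrix of played contexts
b = np.zeros(d)                 # accumulated reward-weighted contexts

def ucb_scores(X, A_inv, theta_hat):
    """Generic optimistic index: estimated mean plus ellipsoidal confidence width."""
    means = X @ theta_hat
    widths = np.sqrt(np.einsum("ij,jk,ik->i", X, A_inv, X))
    return means + alpha * widths

satisfaction = -1.0             # the user's satisfaction level; reneging occurs below it
lifetime = 0
for t in range(T):
    X = rng.normal(size=(K, d)) / np.sqrt(d)   # candidate contexts this round
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b                      # ridge estimate of theta
    x = X[np.argmax(ucb_scores(X, A_inv, theta_hat))]
    sigma = np.exp(0.5 * phi @ x)              # context-dependent (heteroscedastic) noise scale
    reward = theta @ x + sigma * rng.normal()
    A += np.outer(x, x)
    b += reward * x
    if reward < satisfaction:                  # outcome below satisfaction level: user reneges
        break
    lifetime += 1

print("rounds survived before reneging:", lifetime)
```

Raising alpha in this sketch trades exploration against the risk of drawing a low outcome that ends the episode, which is the tension the reneging model makes explicit.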

Original language: English
Title of host publication: 36th International Conference on Machine Learning, ICML 2019
Publisher: International Machine Learning Society (IMLS)
Pages: 4957-4966
Number of pages: 10
ISBN (Electronic): 9781510886988
State: Published - 1 Jan 2019
Event: 36th International Conference on Machine Learning, ICML 2019 - Long Beach, United States
Duration: 9 Jun 2019 - 15 Jun 2019

Publication series

Name: 36th International Conference on Machine Learning, ICML 2019
Volume: 2019-June

Conference

Conference: 36th International Conference on Machine Learning, ICML 2019
Country: United States
City: Long Beach
Period: 9/06/19 - 15/06/19
