Bandit Problems With Infinitely Many Arms

Authors: Berry, Donald A; Chen, Robert W; Zame, Alan; Heath, David C; Shepp, Larry A
Dates listed in the record: 2023-05-23; 2023-05-23; 1997; 2016-08-04
URL: https://repository.upenn.edu/handle/20.500.14332/47683

Abstract: We consider a bandit problem consisting of a sequence of n choices from an infinite number of Bernoulli arms, with n → ∞. The objective is to minimize the long-run failure rate. The Bernoulli parameters are independent observations from a distribution F. We first assume F to be the uniform distribution on (0, 1) and consider various extensions. In the uniform case we show that the best lower bound for the expected failure proportion is between √2/√n and 2/√n, and we exhibit classes of strategies that achieve the latter.

Keywords: bandit problems; sequential experimentation; dynamic allocation of Bernoulli processes; staying with a winner; switching with a loser
Subject: Applied Statistics
Type: Article
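
The setting in the abstract can be explored numerically. The following is a minimal simulation sketch, not the authors' method: it assumes a simple "stay with a winner, switch on a loser" rule in which an arm is discarded at its first failure unless it has already produced `run_length` consecutive successes, after which it is kept for all remaining pulls. The rule, the function names, and the parameter choices (`run_length` near √n, the trial count, the seed) are illustrative assumptions and are not taken from the record. Arm parameters are drawn from the uniform distribution on (0, 1), as in the abstract.

```python
import math
import random


def simulate_failure_proportion(n, run_length, rng):
    """Play n pulls and return the observed proportion of failures.

    Assumed rule (hypothetical, for illustration only):
    - keep pulling the current arm while it succeeds;
    - once an arm reaches `run_length` consecutive successes,
      commit to it for all remaining pulls;
    - otherwise discard the arm at its first failure and draw a new one.
    """
    failures = 0
    p = rng.random()        # success probability of the current arm, Uniform(0, 1)
    streak = 0              # consecutive successes on the current arm
    committed = False       # True once an arm has earned a full run
    for _ in range(n):
        if rng.random() < p:
            streak += 1
            if streak >= run_length:
                committed = True
        else:
            failures += 1
            if not committed:
                p = rng.random()   # switch to a fresh, untried arm
                streak = 0
    return failures / n


def average_failure_proportion(n, run_length, trials, seed=0):
    """Average the failure proportion over independent simulated runs."""
    rng = random.Random(seed)
    total = sum(simulate_failure_proportion(n, run_length, rng)
                for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    n = 10_000
    m = round(math.sqrt(n))  # assumed run length of order sqrt(n)
    estimate = average_failure_proportion(n, m, trials=200)
    print(f"simulated failure proportion: {estimate:.4f}")
    print(f"sqrt(2)/sqrt(n) = {math.sqrt(2 / n):.4f}, 2/sqrt(n) = {2 / math.sqrt(n):.4f}")
```

Running the script prints the simulated proportion next to the two quantities named in the abstract, √2/√n and 2/√n. The rates in the abstract are asymptotic in n, so a finite simulation should be read as a rough comparison only.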