Thompson Sampling Guided Stochastic Searching on the Line for Adversarial Learning

Sondre Glimsdal; Ole-Christoffer Granmo

doi:10.1007/978-3-319-23868-5_22

Conference paper, 2015

Sondre Glimsdal, University of Agder

Ole-Christoffer Granmo, University of Agder

Abstract

The multi-armed bandit problem has been studied for decades. In brief, a gambler repeatedly pulls one of N slot machine arms, randomly receiving a reward or a penalty from each pull. The aim of the gambler is to maximize the expected number of rewards received when the probabilities of receiving rewards are unknown. Thus, the gambler must, as quickly as possible, identify the arm with the largest probability of producing rewards, a task that compactly captures the exploration-exploitation dilemma in reinforcement learning. In this paper we introduce a particularly challenging variant of the multi-armed bandit problem, inspired by the so-called N-Door Puzzle. In this variant, the gambler is only told whether the optimal arm lies to the “left” or to the “right” of the one pulled, with the feedback being erroneous with probability 1 − p. Our novel scheme for this problem is based on a Bayesian representation of the solution space, and combines this representation with Thompson sampling to balance exploration against exploitation. Furthermore, we introduce the possibility of traitorous environments that lie about the direction of the optimal arm (an adversarial learning problem). Empirical results show that our scheme deals with both traitorous and non-traitorous environments, significantly outperforming competing algorithms.
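
To make the feedback model in the abstract concrete, the following is a minimal Python sketch of directional feedback combined with a discrete Bayesian posterior and Thompson sampling. It is not the authors' scheme, which the abstract does not specify in detail. The function names (`noisy_direction`, `bayes_update`, `thompson_search`), the assumption that p is known to the learner, the uniform prior, and the convention that pulling the optimal arm itself yields a fair coin flip between "left" and "right" are all assumptions made for illustration.

```python
import random


def noisy_direction(pulled, optimal, p, rng):
    """Hypothetical environment: say where the optimal arm lies relative to `pulled`.

    The true direction is reported with probability p and flipped otherwise.
    Assumption: if the pulled arm is the optimal one, the reply is a fair coin flip.
    """
    if pulled == optimal:
        return rng.choice(["left", "right"])
    truth = "left" if optimal < pulled else "right"
    if rng.random() < p:
        return truth
    return "right" if truth == "left" else "left"


def bayes_update(posterior, pulled, feedback, p):
    """Bayes rule over the hypotheses 'arm h is the optimal arm' (p assumed known)."""
    updated = []
    for arm, mass in enumerate(posterior):
        if arm == pulled:
            likelihood = 0.5  # tie convention assumed above
        else:
            predicted = "left" if arm < pulled else "right"
            likelihood = p if predicted == feedback else 1.0 - p
        updated.append(mass * likelihood)
    total = sum(updated)
    return [mass / total for mass in updated]


def thompson_search(n_arms, optimal, p, rounds, seed=0):
    """Thompson sampling: draw an arm from the posterior, pull it, then update."""
    rng = random.Random(seed)
    posterior = [1.0 / n_arms] * n_arms  # uniform prior (an assumption)
    for _ in range(rounds):
        pulled = rng.choices(range(n_arms), weights=posterior)[0]
        feedback = noisy_direction(pulled, optimal, p, rng)
        posterior = bayes_update(posterior, pulled, feedback, p)
    return posterior
```

Calling `thompson_search(20, 7, 0.9, 1000)` returns a posterior over 20 arms whose mass should concentrate on arm 7 under these assumptions. When p is known, the same update remains a valid likelihood for p below 0.5; how the paper's scheme detects or handles traitorous environments is not stated in the abstract and is not modelled here.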

Keywords

N-Door Puzzle; Multi-armed Bandit Problem; Adversarial Learning; Bayesian Learning; Thompson Sampling

Domains

Computer Science [cs]

Main file

978-3-319-23868-5_22_Chapter.pdf (264.19 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01385366

Submitted on: Friday, October 21, 2016, 11:43:13 AM

Last modification on: Friday, June 5, 2020, 5:10:10 PM

Dates and versions

hal-01385366 , version 1 (21-10-2016)

Licence

Attribution

Identifiers

HAL Id : hal-01385366 , version 1
DOI : 10.1007/978-3-319-23868-5_22

Cite

Sondre Glimsdal, Ole-Christoffer Granmo. Thompson Sampling Guided Stochastic Searching on the Line for Adversarial Learning. 11th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI 2015), Sep 2015, Bayonne, France. pp.307-317, ⟨10.1007/978-3-319-23868-5_22⟩. ⟨hal-01385366⟩

Collections

IFIP IFIP-AICT IFIP-TC IFIP-WG IFIP-TC12 IFIP-AIAI IFIP-WG12-5 IFIP-AICT-458
