Here is the bonus question from last semester's final exam in my dynamic optimization class. It is based on Problem 8, Chapter II, of Ross's book Introduction to Stochastic Dynamic Programming, where it appears as 'guess the optimal policy' without a request for proof. The question seems very natural, but I couldn't find any information about it (nor, apparently, could the students). I have a solution, but it is tricky and too specific to this problem. I will describe my solution next week, but perhaps somebody can show me a better solution or a reference?
We have two coins, a red one and a green one. When flipped, one lands heads with probability P1 and the other with probability P2. We do not know which coin is the P1 coin; we initially attach probability p to the red coin being the P1 coin. In each period we choose one of the coins and flip it. We receive one dollar for each heads, and our objective is to maximize the total expected discounted return with discount factor β. Describe the optimal policy, including a proof of optimality.
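Since the post withholds the solution, here is only a numerical sketch of the problem, not the requested proof: discretize the belief p = P(red is the P1 coin), apply Bayes' rule after each flip, and run value iteration on the posterior. The values of P1, P2 and β below are illustrative assumptions, not from the post.

```python
import numpy as np

P1, P2, BETA = 0.7, 0.4, 0.9          # assumed example parameters
grid = np.linspace(0.0, 1.0, 1001)    # discretized belief p

def flip(p, a, b):
    """Flip the coin whose heads probability is `a` if red is the P1 coin
    and `b` otherwise. Returns (prob of heads, posterior after heads,
    posterior after tails), where the posterior is P(red is the P1 coin)."""
    q = p * a + (1 - p) * b
    p_heads = p * a / q               # Bayes update on seeing heads
    p_tails = p * (1 - a) / (1 - q)   # Bayes update on seeing tails
    return q, p_heads, p_tails

V = np.zeros_like(grid)
for _ in range(300):                  # contraction with factor BETA; 0.9^300 is negligible
    vals = []
    for a, b in ((P1, P2), (P2, P1)): # action 1: flip red; action 2: flip green
        q, pH, pT = flip(grid, a, b)
        vals.append(q * (1 + BETA * np.interp(pH, grid, V))
                    + (1 - q) * BETA * np.interp(pT, grid, V))
    V = np.maximum(vals[0], vals[1])
```

Comparing `vals[0]` and `vals[1]` at each grid point gives the numerically optimal coin as a function of the belief p; the pattern it reveals is the policy one would then try to prove optimal.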
6 comments
October 28, 2014 at 11:20 pm
Jyotirmoy Bhattacharya
Isn’t this a multi-armed bandit problem to which we can apply Gittins’s Index Theorem? [Gittins et al, ‘Multi-armed Bandit Allocation Indices’, Wiley, 2011]
October 29, 2014 at 1:37 pm
Dan
This is not a multi-armed bandit problem, because the two arms have the same state.
October 30, 2014 at 1:01 am
Anonymous
I do not see it that way. With an arm for each coin, the state of an arm is the outcome of the next toss of that coin. If q_i is the probability of heads for coin i, then, because of independence, from any starting state the next state is heads with probability q_i and tails with probability 1 − q_i.
October 30, 2014 at 1:02 am
Jyotirmoy
The comment at 1:01 a.m. is from me. Couldn’t log in.
October 31, 2014 at 8:02 am
Anonymous
No it’s not, it’s from me
November 8, 2014 at 10:33 am
Red coin, green coin | The Leisure of the Theory Class
[…] Here is the question I posted last week […]