Here is the bonus question from the final exam in my dynamic optimization class of last semester. It is based on problem 8 Chapter II in Ross’ book Introduction to stochastic dynamic programming. It appears there as `guess the optimal policy’ without asking for proof. The question seems very natural, but I couldn’t find any information about it (nor apparently the students). I have a solution but it is tricky and too specific to this problem. I will describe my solution next week but perhaps somebody can show me a better solution or a reference ?


We have two coins, a red one and a green one. When flipped, one lands heads with probability P1 and the other with probability P2. We do not know which coin is the P1 coin. We initially attach probability p to the red coin being the P1 coin. We receive one dollar for each heads and our objective is to maximize the total expected discounted return with discount factor β. Describe the optimal policy, including a proof of optimality.