Yatzy RL environment

A Yatzy program to play with, and to compute a policy and outcome probabilities

Features

single player setting to start with
encoding state
transition function for rules
checking terminal state and computing reward
actions to roll dice and select categories
value iteration to compute value function / policy
scorecard - which categories are used
check if category can be used

Getting probabilities

The value iteration framework naturally gives us expected values (average scores), but if we want the full probability distribution of outcomes, we have a couple of options:

Extend our existing transition function - Since we’re already building transitions (state → action → next state), we can track probabilities through these transitions. This is essentially using the Markov chain structure we’re already creating.
Monte Carlo simulation - Once we have our optimal policy, we can run many simulations and collect the distribution of final scores empirically.

The first approach is exact but computationally intensive. The second is only approximate, but it is simpler to implement and often sufficient.
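To make the Monte Carlo option concrete, here is a minimal sketch on a toy problem: a single turn scored as the sum of the dice (the chance category). The names `play_chance_turn`, `reroll_low_dice` and `score_distribution` are made up for this illustration, and `reroll_low_dice` is only a placeholder policy, not the optimal one.

```python
import random
from collections import Counter

def play_chance_turn(rng, policy):
    """Play one turn (initial roll plus up to two rerolls) and return the dice sum."""
    dice = [rng.randint(1, 6) for _ in range(5)]
    for rolls_left in (2, 1):
        mask = policy(tuple(dice), rolls_left)  # 1 = reroll this die, 0 = keep it
        dice = [rng.randint(1, 6) if m else d for d, m in zip(dice, mask)]
    return sum(dice)

def reroll_low_dice(dice, rolls_left):
    """Placeholder policy (an assumption): reroll every die showing 3 or less."""
    return tuple(int(d <= 3) for d in dice)

def score_distribution(policy, n_games=20_000, seed=0):
    """Estimate P(final score) empirically by simulating many turns."""
    rng = random.Random(seed)
    counts = Counter(play_chance_turn(rng, policy) for _ in range(n_games))
    return {score: c / n_games for score, c in sorted(counts.items())}

dist = score_distribution(reroll_low_dice)
mean = sum(score * p for score, p in dist.items())
print(f"estimated mean score: {mean:.2f}")
```

Swapping `reroll_low_dice` for a policy derived from the value function would give the score distribution under that policy instead.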

I’ve decided on the following state representation:

Current dice values - a sorted tuple (1, 2, 3, 3, 5)
Rolls left - int - determines whether we can reroll or must score
Filled categories - frozenset of strings - determines which scoring options are still available
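A quick way to see how large this state space gets is to count it. The sketch below assumes 15 categories and that rolls left is one of 0, 1 or 2; the result is an upper bound, since not every combination is reachable.

```python
from itertools import combinations_with_replacement

N_CATEGORIES = 15

# Sorted dice tuples are exactly the multisets of 5 values drawn from 1..6.
dice_states = list(combinations_with_replacement(range(1, 7), 5))

n_dice = len(dice_states)            # distinct sorted dice tuples
n_rolls = 3                          # rolls_left can be 0, 1 or 2
n_category_sets = 2 ** N_CATEGORIES  # every subset of filled categories

n_states = n_dice * n_rolls * n_category_sets
print(n_dice, n_states)
```

Sorting the dice is what keeps the first factor at 252 instead of 6^5 = 7776, and the total of about 24.8 million states is small enough to keep in a dict.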

Value function could be a dict:

# V = {state: value}
state = (dice, rolls, categories)
value = V[state]
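To get a feel for how such a value function is computed, here is a minimal backward-induction sketch on a toy version of the problem: a single turn where the only goal is to maximise the dice sum (the chance category). The helper names `keep_outcomes` and `value` are made up for this example, and the cache plays the role of the `V` dict.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def keep_outcomes(kept):
    """Distribution over sorted 5-dice tuples after rerolling the dice not in `kept`."""
    n = 5 - len(kept)
    dist = {}
    for roll in product(range(1, 7), repeat=n):
        outcome = tuple(sorted(kept + roll))
        dist[outcome] = dist.get(outcome, 0.0) + 1 / 6 ** n
    return dist

@lru_cache(maxsize=None)
def value(dice, rolls_left):
    """Expected final sum under optimal play from (dice, rolls_left)."""
    best = float(sum(dice))  # option 1: stop and score now
    if rolls_left == 0:
        return best
    # Option 2: keep some subset of the dice and reroll the rest.
    keeps = {tuple(d for d, m in zip(dice, mask) if m == 0)
             for mask in product((0, 1), repeat=5)}
    for kept in keeps:
        expected = sum(p * value(nxt, rolls_left - 1)
                       for nxt, p in keep_outcomes(kept).items())
        best = max(best, expected)
    return best

print(value((1, 1, 1, 1, 1), 2))
```

The full game works the same way, except that the state also carries the filled categories and the best action may be to score a category instead of stopping.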

Plan to Start:

Define the 15 category names (as strings for frozenset)
Write scoring functions for each category
Implement dice rolling and state transitions
Define valid actions (which dice to keep, which category to score)
Build value iteration algorithm
Test and iterate

categories

frozenset({'chance',
           'fives',
           'four_same',
           'fours',
           'full_house',
           'large_straight',
           'one_pair',
           'ones',
           'sixes',
           'small_straight',
           'three_same',
           'threes',
           'two_pairs',
           'twos',
           'yatzy'})

# dice values
dice_vals = [5, 4, 3, 2, 1]
sorted_vals = tuple(sorted(dice_vals))
sorted_vals

(1, 2, 3, 4, 5)

score = sum(sorted_vals)
score

random.choices([1,2,3,4,5,6], k=5)

[3, 1, 6, 3, 4]

roll_n_dice

 roll_n_dice (n)

roll_n_dice(5)

[2, 1, 4, 2, 5]

Planning more about how to represent state and actions

Okay, thinking about actions.

I would have 2 rolls left.

I can choose which dice to hold and which to re-roll, so the choice is basically a 5-bit binary word.

I can also choose the small straight category, but not chance, as it’s already taken.

I can also choose to put a zero in any category that I haven’t taken yet.

So an action could look like this:

# (reroll mask, category to score, category to zero)
action = ["00000", "small_straight", None]
# or
action = ["11000", None, None]
# or
action = ["00000", None, "ones"]

I need a function to compute the possible actions:

* allow a re-roll if “rolls left” is > 0
* check which categories can be selected
* check which categories could be set to zero

Back to implementing

State

 State (dice_values:tuple[int,...], rolls_left:int,
        categories_picked:frozenset[str])

state = State((1, 2, 3, 4, 5), 2, frozenset({'chance', 'large_straight', 'yatzy'}))
state

State(dice_values=(1, 2, 3, 4, 5), rolls_left=2, categories_picked=frozenset({'chance', 'large_straight', 'yatzy'}))

Action

 Action (type:Literal['reroll','score'], value:tuple[int,...]|str)

Action("reroll", (0,0,1,0,0))

Action(type='reroll', value=(0, 0, 1, 0, 0))

Action("score", "ones")

Action(type='score', value='ones')

reroll_masks

[(0, 0, 0, 0, 0),
 (0, 0, 0, 0, 1),
 (0, 0, 0, 1, 0),
 (0, 0, 0, 1, 1),
 (0, 0, 1, 0, 0),
 (0, 0, 1, 0, 1),
 (0, 0, 1, 1, 0),
 (0, 0, 1, 1, 1),
 (0, 1, 0, 0, 0),
 (0, 1, 0, 0, 1),
 (0, 1, 0, 1, 0),
 (0, 1, 0, 1, 1),
 (0, 1, 1, 0, 0),
 (0, 1, 1, 0, 1),
 (0, 1, 1, 1, 0),
 (0, 1, 1, 1, 1),
 (1, 0, 0, 0, 0),
 (1, 0, 0, 0, 1),
 (1, 0, 0, 1, 0),
 (1, 0, 0, 1, 1),
 (1, 0, 1, 0, 0),
 (1, 0, 1, 0, 1),
 (1, 0, 1, 1, 0),
 (1, 0, 1, 1, 1),
 (1, 1, 0, 0, 0),
 (1, 1, 0, 0, 1),
 (1, 1, 0, 1, 0),
 (1, 1, 0, 1, 1),
 (1, 1, 1, 0, 0),
 (1, 1, 1, 0, 1),
 (1, 1, 1, 1, 0),
 (1, 1, 1, 1, 1)]
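Many of these 32 masks are equivalent whenever the dice contain repeated values, because only the multiset of kept dice matters. A small sketch of how the duplicates could be collapsed (the helper `distinct_keeps` is hypothetical, and it assumes a mask bit of 1 means "reroll this die"):

```python
from itertools import product

def distinct_keeps(dice):
    """Return the distinct multisets of kept dice over all 32 reroll masks."""
    keeps = {
        tuple(d for d, bit in zip(dice, mask) if bit == 0)  # bit 0 = keep
        for mask in product((0, 1), repeat=5)
    }
    return sorted(keeps)

for dice in [(1, 2, 3, 4, 5), (1, 1, 1, 2, 2), (6, 6, 6, 6, 6)]:
    print(dice, len(distinct_keeps(dice)))
```

For value iteration this can shrink the number of reroll actions to evaluate per state considerably.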

available_actions = []
if state.rolls_left > 0:
    rerolls = [Action("reroll", mask) for mask in reroll_masks]
    available_actions.extend(rerolls)
available_actions

[Action(type='reroll', value=(0, 0, 0, 0, 0)),
 Action(type='reroll', value=(0, 0, 0, 0, 1)),
 Action(type='reroll', value=(0, 0, 0, 1, 0)),
 Action(type='reroll', value=(0, 0, 0, 1, 1)),
 Action(type='reroll', value=(0, 0, 1, 0, 0)),
 Action(type='reroll', value=(0, 0, 1, 0, 1)),
 Action(type='reroll', value=(0, 0, 1, 1, 0)),
 Action(type='reroll', value=(0, 0, 1, 1, 1)),
 Action(type='reroll', value=(0, 1, 0, 0, 0)),
 Action(type='reroll', value=(0, 1, 0, 0, 1)),
 Action(type='reroll', value=(0, 1, 0, 1, 0)),
 Action(type='reroll', value=(0, 1, 0, 1, 1)),
 Action(type='reroll', value=(0, 1, 1, 0, 0)),
 Action(type='reroll', value=(0, 1, 1, 0, 1)),
 Action(type='reroll', value=(0, 1, 1, 1, 0)),
 Action(type='reroll', value=(0, 1, 1, 1, 1)),
 Action(type='reroll', value=(1, 0, 0, 0, 0)),
 Action(type='reroll', value=(1, 0, 0, 0, 1)),
 Action(type='reroll', value=(1, 0, 0, 1, 0)),
 Action(type='reroll', value=(1, 0, 0, 1, 1)),
 Action(type='reroll', value=(1, 0, 1, 0, 0)),
 Action(type='reroll', value=(1, 0, 1, 0, 1)),
 Action(type='reroll', value=(1, 0, 1, 1, 0)),
 Action(type='reroll', value=(1, 0, 1, 1, 1)),
 Action(type='reroll', value=(1, 1, 0, 0, 0)),
 Action(type='reroll', value=(1, 1, 0, 0, 1)),
 Action(type='reroll', value=(1, 1, 0, 1, 0)),
 Action(type='reroll', value=(1, 1, 0, 1, 1)),
 Action(type='reroll', value=(1, 1, 1, 0, 0)),
 Action(type='reroll', value=(1, 1, 1, 0, 1)),
 Action(type='reroll', value=(1, 1, 1, 1, 0)),
 Action(type='reroll', value=(1, 1, 1, 1, 1))]

can_select = categories - state.categories_picked
can_select

frozenset({'fives',
           'four_same',
           'fours',
           'full_house',
           'one_pair',
           'ones',
           'sixes',
           'small_straight',
           'three_same',
           'threes',
           'two_pairs',
           'twos'})

score_ns

 score_ns (n, values)

score_n_same

 score_n_same (n, values)

n_same

 n_same (n, values)

score_two_pairs

 score_two_pairs (values)

score_full_house

 score_full_house (values)

score_large_straight

 score_large_straight (values)

score_small_straight

 score_small_straight (values)

for category in can_select:
    score = score_fn[category](state.dice_values)
    print(category, score)

four_same 0
three_same 0
threes 3
small_straight 15
fours 4
fives 5
sixes 0
ones 1
one_pair 0
two_pairs 0
full_house 0
twos 2
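The bodies of the scoring functions are not shown here, so the following is only a sketch of how they might look, assuming the common Scandinavian Yatzy rules (small straight 15, large straight 20, pairs and n-of-a-kind score the dice involved, full house scores all five dice). It agrees with the scores printed above for (1, 2, 3, 4, 5).

```python
from collections import Counter

def score_ns(n, values):
    """Upper section: sum of the dice showing face n."""
    return n * values.count(n)

def n_same(n, values):
    """Faces that appear at least n times, in ascending order."""
    return sorted(face for face, c in Counter(values).items() if c >= n)

def score_n_same(n, values):
    """One pair / three same / four same: n times the highest qualifying face."""
    faces = n_same(n, values)
    return n * max(faces) if faces else 0

def score_two_pairs(values):
    """Two different pairs: sum of both pairs."""
    pairs = n_same(2, values)
    return 2 * sum(pairs) if len(pairs) == 2 else 0

def score_full_house(values):
    """A pair plus three of a different face: sum of all dice."""
    return sum(values) if sorted(Counter(values).values()) == [2, 3] else 0

def score_small_straight(values):
    return 15 if tuple(sorted(values)) == (1, 2, 3, 4, 5) else 0

def score_large_straight(values):
    return 20 if tuple(sorted(values)) == (2, 3, 4, 5, 6) else 0

print(score_n_same(2, (1, 1, 3, 6, 6)), score_full_house((2, 2, 3, 3, 3)))
```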

for cat in can_select:
    score = score_fn[cat](state.dice_values)
    available_actions.append(Action(type="score", value=cat))
available_actions

[Action(type='reroll', value=(0, 0, 0, 0, 0)),
 Action(type='reroll', value=(0, 0, 0, 0, 1)),
 Action(type='reroll', value=(0, 0, 0, 1, 0)),
 Action(type='reroll', value=(0, 0, 0, 1, 1)),
 Action(type='reroll', value=(0, 0, 1, 0, 0)),
 Action(type='reroll', value=(0, 0, 1, 0, 1)),
 Action(type='reroll', value=(0, 0, 1, 1, 0)),
 Action(type='reroll', value=(0, 0, 1, 1, 1)),
 Action(type='reroll', value=(0, 1, 0, 0, 0)),
 Action(type='reroll', value=(0, 1, 0, 0, 1)),
 Action(type='reroll', value=(0, 1, 0, 1, 0)),
 Action(type='reroll', value=(0, 1, 0, 1, 1)),
 Action(type='reroll', value=(0, 1, 1, 0, 0)),
 Action(type='reroll', value=(0, 1, 1, 0, 1)),
 Action(type='reroll', value=(0, 1, 1, 1, 0)),
 Action(type='reroll', value=(0, 1, 1, 1, 1)),
 Action(type='reroll', value=(1, 0, 0, 0, 0)),
 Action(type='reroll', value=(1, 0, 0, 0, 1)),
 Action(type='reroll', value=(1, 0, 0, 1, 0)),
 Action(type='reroll', value=(1, 0, 0, 1, 1)),
 Action(type='reroll', value=(1, 0, 1, 0, 0)),
 Action(type='reroll', value=(1, 0, 1, 0, 1)),
 Action(type='reroll', value=(1, 0, 1, 1, 0)),
 Action(type='reroll', value=(1, 0, 1, 1, 1)),
 Action(type='reroll', value=(1, 1, 0, 0, 0)),
 Action(type='reroll', value=(1, 1, 0, 0, 1)),
 Action(type='reroll', value=(1, 1, 0, 1, 0)),
 Action(type='reroll', value=(1, 1, 0, 1, 1)),
 Action(type='reroll', value=(1, 1, 1, 0, 0)),
 Action(type='reroll', value=(1, 1, 1, 0, 1)),
 Action(type='reroll', value=(1, 1, 1, 1, 0)),
 Action(type='reroll', value=(1, 1, 1, 1, 1)),
 Action(type='score', value='four_same'),
 Action(type='score', value='three_same'),
 Action(type='score', value='threes'),
 Action(type='score', value='small_straight'),
 Action(type='score', value='fours'),
 Action(type='score', value='fives'),
 Action(type='score', value='sixes'),
 Action(type='score', value='ones'),
 Action(type='score', value='one_pair'),
 Action(type='score', value='two_pairs'),
 Action(type='score', value='full_house'),
 Action(type='score', value='twos')]

get_available_actions

 get_available_actions (state:__main__.State)

get_available_actions(state)

[Action(type='reroll', value=(0, 0, 0, 0, 0)),
 Action(type='reroll', value=(0, 0, 0, 0, 1)),
 Action(type='reroll', value=(0, 0, 0, 1, 0)),
 Action(type='reroll', value=(0, 0, 0, 1, 1)),
 Action(type='reroll', value=(0, 0, 1, 0, 0)),
 Action(type='reroll', value=(0, 0, 1, 0, 1)),
 Action(type='reroll', value=(0, 0, 1, 1, 0)),
 Action(type='reroll', value=(0, 0, 1, 1, 1)),
 Action(type='reroll', value=(0, 1, 0, 0, 0)),
 Action(type='reroll', value=(0, 1, 0, 0, 1)),
 Action(type='reroll', value=(0, 1, 0, 1, 0)),
 Action(type='reroll', value=(0, 1, 0, 1, 1)),
 Action(type='reroll', value=(0, 1, 1, 0, 0)),
 Action(type='reroll', value=(0, 1, 1, 0, 1)),
 Action(type='reroll', value=(0, 1, 1, 1, 0)),
 Action(type='reroll', value=(0, 1, 1, 1, 1)),
 Action(type='reroll', value=(1, 0, 0, 0, 0)),
 Action(type='reroll', value=(1, 0, 0, 0, 1)),
 Action(type='reroll', value=(1, 0, 0, 1, 0)),
 Action(type='reroll', value=(1, 0, 0, 1, 1)),
 Action(type='reroll', value=(1, 0, 1, 0, 0)),
 Action(type='reroll', value=(1, 0, 1, 0, 1)),
 Action(type='reroll', value=(1, 0, 1, 1, 0)),
 Action(type='reroll', value=(1, 0, 1, 1, 1)),
 Action(type='reroll', value=(1, 1, 0, 0, 0)),
 Action(type='reroll', value=(1, 1, 0, 0, 1)),
 Action(type='reroll', value=(1, 1, 0, 1, 0)),
 Action(type='reroll', value=(1, 1, 0, 1, 1)),
 Action(type='reroll', value=(1, 1, 1, 0, 0)),
 Action(type='reroll', value=(1, 1, 1, 0, 1)),
 Action(type='reroll', value=(1, 1, 1, 1, 0)),
 Action(type='reroll', value=(1, 1, 1, 1, 1)),
 Action(type='score', value='four_same'),
 Action(type='score', value='three_same'),
 Action(type='score', value='threes'),
 Action(type='score', value='small_straight'),
 Action(type='score', value='fours'),
 Action(type='score', value='fives'),
 Action(type='score', value='sixes'),
 Action(type='score', value='ones'),
 Action(type='score', value='one_pair'),
 Action(type='score', value='two_pairs'),
 Action(type='score', value='full_house'),
 Action(type='score', value='twos')]

What I have

state representation
action representation
categories
dice rolling
functions to check if dice values fit category
available actions function

What is missing

* transition function
* logic for reroll
* logic for selecting and zeroing categories

Transition function

state + action –> next state

next_state = transition_func(state, action)

What transitions are there?

Action types:

* reroll
* zero
* score

reroll

* reroll the dice according to the mask
* set the new dice in the state
* decrement rolls left

zero

* mark the category as selected
* check whether any categories are left and return the terminal state if none are
* reroll all five dice
* reset rolls left to 2

score

* compute the score
* mark the category as selected
* add the score to the scoreboard or total sum
* check whether any categories are left and return the terminal state if none are
* reroll all five dice
* reset rolls left to 2
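One way to carry the score out of a transition is the usual RL convention of returning `(next_state, reward, done)`. The sketch below shows the idea on a toy game with only two categories; `step`, `toy_score` and the two-category `CATEGORIES` set are assumptions made for this illustration, not the implementation used here.

```python
import random
from dataclasses import dataclass

CATEGORIES = frozenset({"ones", "chance"})  # toy subset, just for the sketch

@dataclass(frozen=True)
class State:
    dice_values: tuple
    rolls_left: int
    categories_picked: frozenset

def toy_score(category, dice):
    """Score for the two toy categories only."""
    return dice.count(1) if category == "ones" else sum(dice)

def step(state, category, rng):
    """Score `category` with the current dice; return (next_state, reward, done)."""
    assert category not in state.categories_picked
    reward = toy_score(category, state.dice_values)
    picked = state.categories_picked | {category}
    done = picked == CATEGORIES  # terminal once every category is filled
    new_dice = tuple(sorted(rng.randint(1, 6) for _ in range(5)))
    return State(new_dice, 2, picked), reward, done

rng = random.Random(0)
state = State((1, 1, 2, 3, 4), 0, frozenset())
state, reward, done = step(state, "ones", rng)
print(reward, done)
```

A zero action would follow the same path with `reward = 0`.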

Let’s have a state and action to work with.

state

State(dice_values=(1, 2, 3, 4, 5), rolls_left=2, categories_picked=frozenset({'chance', 'large_straight', 'yatzy'}))

action = Action(type='reroll', value=(1, 1, 1, 0, 0))
action

Action(type='reroll', value=(1, 1, 1, 0, 0))

What would it mean to transition to the next state, given this state and action?

reroll_masked

 reroll_masked (values, mask)

reroll_masked((1,2,3,4,5), (1,1,1,0,0))

(4, 4, 5, 5, 6)
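`reroll_masked` samples one random outcome. For value iteration, or for tracking exact probabilities through the transitions, we need the whole distribution over outcomes instead. A minimal sketch, where `reroll_distribution` is a hypothetical helper that assumes a mask bit of 1 means "reroll this die":

```python
from fractions import Fraction
from itertools import product

def reroll_distribution(values, mask):
    """Map each possible sorted dice tuple to its exact probability."""
    kept = [v for v, bit in zip(values, mask) if bit == 0]
    n = sum(mask)  # number of dice being rerolled
    dist = {}
    for roll in product(range(1, 7), repeat=n):
        outcome = tuple(sorted(kept + list(roll)))
        dist[outcome] = dist.get(outcome, Fraction(0)) + Fraction(1, 6 ** n)
    return dist

dist = reroll_distribution((1, 2, 3, 4, 5), (1, 1, 1, 0, 0))
print(len(dist), dist[(4, 4, 5, 5, 6)])
```

Using `Fraction` keeps the probabilities exact, so they sum to exactly 1; floats would be faster if that matters.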

transition_func

 transition_func (state:__main__.State, action:__main__.Action)

transition_func(state, action)

State(dice_values=(1, 2, 3, 4, 5), rolls_left=1, categories_picked=frozenset({'chance', 'large_straight', 'yatzy'}))

transition_func(state, Action("score", "ones"))

State(dice_values=(1, 2, 2, 6, 6), rolls_left=2, categories_picked=frozenset({'chance', 'ones', 'large_straight', 'yatzy'}))

state = State(tuple(sorted(roll_n_dice(5))), 2, frozenset())
while True:
    actions = get_available_actions(state)
    # for i, action in enumerate(actions): print(i, action)
    print(f"State: {state}, choose action (i) or quit (q)")
    inp = input(">")
    if inp == "q":
        break

    # A string of digits such as 11000 is read as a reroll mask,
    # anything else as the name of a category to score.
    if inp.isdigit():
        action = Action("reroll", tuple(int(i) for i in inp))
    else:
        action = Action("score", inp)

    assert action in actions

    # idx = int(inp)
    state = transition_func(state, action)

!cat yatzy/yatzy.py

# AUTOGENERATED! DO NOT EDIT! File to edit: ../nbs/yatzy_dup1.ipynb.

# %% auto 0
__all__ = ['categories', 'reroll_masks', 'checkers', 'roll_n_dice', 'State', 'Action', 'has_ns', 'n_same', 'has_n_same',
           'has_two_pairs', 'has_full_house', 'has_small_straight', 'has_large_straight', 'get_available_actions',
           'reroll_masked', 'transition_func']

# %% ../nbs/yatzy_dup1.ipynb 5
import random
from dataclasses import dataclass
from typing import Literal
import itertools
from collections import Counter
from functools import partial

# %% ../nbs/yatzy_dup1.ipynb 6
# categories

categories = frozenset([
    "ones",
    "twos",
    "threes",
    "fours",
    "fives",
    "sixes",
    "one_pair",
    "two_pairs",
    "three_same",
    "four_same",
    "small_straight",
    "large_straight",
    "full_house",
    "chance",
    "yatzy",
])

# %% ../nbs/yatzy_dup1.ipynb 11
def roll_n_dice(n):
    return random.choices([1,2,3,4,5,6], k=n)

# %% ../nbs/yatzy_dup1.ipynb 16
@dataclass(frozen=True)  # frozen makes State hashable, so it can be a key in V = {state: value}
class State:
    dice_values: tuple[int, ...]
    rolls_left: int
    categories_picked: frozenset[str]

# %% ../nbs/yatzy_dup1.ipynb 18
@dataclass
class Action:
    type: Literal["reroll", "score", "zero"]
    value: tuple[int, ...] | str

# %% ../nbs/yatzy_dup1.ipynb 20
reroll_masks = list(itertools.product((0,1), repeat=5))

# %% ../nbs/yatzy_dup1.ipynb 33
def has_ns(n, values):
    return n in values

assert has_ns(3, (1,2,3,4,5))
assert not has_ns(6, (1,2,3,4,5))

def n_same(n, values):
    counts = Counter(values)
    return [k for k, v in counts.items() if v >= n]

assert n_same(2, (1,2,2,3,3)) == [2, 3]

def has_n_same(n, values):
    return bool(n_same(n, values))

def has_two_pairs(values):
    # Exactly two distinct faces must each appear at least twice.
    return len(n_same(2, values)) == 2

assert has_two_pairs((1,1,2,2,3))
assert not has_two_pairs((1,1,2,3,4))
assert not has_two_pairs((1,1,1,1,2))

def has_full_house(values):
    pairs = n_same(2, values)
    three_sames = n_same(3, values)
    if pairs and three_sames:
        pairs.remove(three_sames[0])
        return len(pairs) > 0
    return False

assert has_full_house((2,2,3,3,3))
assert not has_full_house((1,2,3,3,3))

def has_small_straight(values):
    return values == (1,2,3,4,5)

def has_large_straight(values):
    return values == (2,3,4,5,6)

# %% ../nbs/yatzy_dup1.ipynb 35
checkers = {
    "ones": partial(has_ns, 1),
    "twos": partial(has_ns, 2),
    "threes": partial(has_ns, 3),
    "fours": partial(has_ns, 4),
    "fives": partial(has_ns, 5),
    "sixes": partial(has_ns, 6),
    "one_pair": partial(has_n_same, 2),
    "two_pairs": has_two_pairs,
    "three_same": partial(has_n_same, 3),
    "four_same": partial(has_n_same, 4),
    "small_straight": has_small_straight,
    "large_straight": has_large_straight,
    "full_house": has_full_house,
    "chance": lambda values: True,  # chance accepts any dice
    "yatzy": partial(has_n_same, 5),
}

# %% ../nbs/yatzy_dup1.ipynb 39
def get_available_actions(state: State) -> list[Action]:
    available_actions = []
    if state.rolls_left > 0:
        rerolls = [Action("reroll", mask) for mask in reroll_masks]
        available_actions.extend(rerolls)
    
    can_zero = categories - state.categories_picked
    for cat in can_zero:
        available_actions.append(Action(type="zero", value=cat))
        can_select = checkers[cat](state.dice_values)
        if can_select:
            available_actions.append(Action(type="score", value=cat))
    
    return available_actions

# %% ../nbs/yatzy_dup1.ipynb 46
def reroll_masked(values, mask):
    """Reroll the dice whose mask bit is 1 and return the new sorted dice."""
    next_values = list(values)
    reroll = roll_n_dice(5)
    for i, bit in enumerate(mask):
        if bit == 1:
            next_values[i] = reroll[i]
    return tuple(sorted(next_values))

# %% ../nbs/yatzy_dup1.ipynb 48
# Sentinel for "game over". Assumption: TERMINAL_STATE was used but never defined.
TERMINAL_STATE = State((), 0, categories)

def transition_func(state: State, action: Action) -> State:
    match action.type:
        case "reroll":
            assert state.rolls_left > 0
            new_values = reroll_masked(state.dice_values, action.value)
            return State(new_values, state.rolls_left - 1, state.categories_picked)
        case "score" | "zero":
            # Both mark the category as used; the score itself is not part of the state.
            assert action.value not in state.categories_picked
            new_categories = state.categories_picked | {action.value}
            # Check what is left after this pick, not before it.
            if not categories - new_categories:
                return TERMINAL_STATE
            new_values = tuple(sorted(roll_n_dice(5)))
            return State(new_values, 2, new_categories)