Converting a pyspiel game state to a dictionary of array-likes #1254

Open
kurtamohler opened this issue Jul 25, 2024 · 3 comments
Labels: contribution welcome (It's a nice feature! But we do not have the time to do it ourselves. Contribution welcomed!)

kurtamohler commented Jul 25, 2024

Would it be possible to convert a pyspiel game's State object to a dictionary of array-likes and back again, in an efficient way? If that is currently not supported, would it be possible to add this feature?

At the moment, it seems to me that this is not possible. pyspiel states are implemented in C++ and bound to Python with pybind11, and it doesn't look like any of the bound methods or properties provide a dict-of-arrays representation of the state.

I'm asking about this because I am looking into adding an environment wrapper class for OpenSpiel to TorchRL. Ideally, the wrapper would be stateless, so the state would need to be provided to the wrapper's step function as part of a TensorDict, which is a dictionary of array-likes.
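To make the request concrete, here is a rough sketch of the kind of round-trip I have in mind (the method names are hypothetical; nothing like this exists in pyspiel today):

import pyspiel

game = pyspiel.load_game("tic_tac_toe")
state = game.new_initial_state()

# Hypothetical round-trip -- neither method exists in pyspiel today:
# state_dict = state.to_dict_of_arrays()
#   -> e.g. {'board': array(...), 'current_player': array(0), ...}
# restored = game.new_state_from_dict_of_arrays(state_dict)
# assert restored.serialize() == state.serialize()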

Some other RL environment libraries support dict-of-arrays representations, like Brax and Jumanji. Just to give an example:

import jumanji
import jax
env = jumanji.make('Snake-v1')
key = jax.random.PRNGKey(0)
state, _ = env.reset(key)

def state_to_dict_of_arrays(state):
    res = {}
    for key, value in state.items():
        if hasattr(value, '_fields'):
            # NamedTuple value (e.g. a position): expand each of its
            # fields into its own array.
            res[key] = {}
            for field in value._fields:
                res[key][field] = jax.numpy.asarray(getattr(value, field))
        else:
            res[key] = jax.numpy.asarray(value)
    return res

state_to_dict_of_arrays(state)
{'body': Array([[False, False, False, False, False, False, False, False, False,
         False, False, False],
         ...
        [False, False, False, False, False, False, False, False, False,
         False, False, False]], dtype=bool),
 'body_state': Array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
...
  'col': Array([2, 4], dtype=int32)},
 'length': Array(1, dtype=int32),
 'step_count': Array(0, dtype=int32),
 'action_mask': Array([ True,  True,  True,  True], dtype=bool),
 'key': Array([2467461003,  428148500], dtype=uint32)}
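For comparison, the closest I can get with the existing pyspiel API is a one-way, lossy conversion built from per-state accessors (the helper name below is my own, and observation_tensor assumes the game provides one, as tic_tac_toe does). As far as I can tell, there is no supported way to rebuild a State from such a dict, which is the missing piece:

import numpy as np
import pyspiel

game = pyspiel.load_game("tic_tac_toe")
state = game.new_initial_state()

def partial_state_dict(state):
    # One-way conversion using existing accessors. Lossy: hidden
    # information and the full game history are not captured, so the
    # State cannot be reconstructed from this dict.
    return {
        'current_player': np.asarray(state.current_player()),
        'observation': np.asarray(state.observation_tensor()),
        'legal_actions_mask': np.asarray(state.legal_actions_mask()),
        'is_terminal': np.asarray(state.is_terminal()),
    }

partial_state_dict(state)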
lanctot (Collaborator) commented Jul 28, 2024

That would be a great feature to have. You are correct: it does not currently exist. We don't have the time to add this ourselves, but it would make a welcome contribution to the code base!

lanctot added the contribution welcome label on Jul 28, 2024
Ayah-Saleh added a commit to Ayah-Saleh/open_spiel that referenced this issue Sep 26, 2024
lanctot (Collaborator) commented Sep 27, 2024

Someone has submitted an implementation: #1279

Can I ask a quick question about the technical path forward?

My understanding from glancing over a few threads is that the use case here is to make it easier for RL frameworks that use a dict-of-arrays representation to consume OpenSpiel environments.

However, wouldn't it be better to do this via our observer framework rather than directly over pyspiel states? The pyspiel states contain everything, including information that should be hidden from RL agents, whereas the observer is designed to expose exactly the information that the RL agents should see.

I just want to settle this design choice before we get too far into core API additions, but I definitely want to support integration with RL frameworks.

@elkhrt any opinions on this?

elkhrt (Member) commented Sep 27, 2024

I'm not completely clear on what's needed here. If you want a structured view of the state from the point of view of a player, we have the interfaces for that, but only a handful of games have implemented them.

import random

import pyspiel
from open_spiel.python import observation

game = pyspiel.load_game("leduc_poker")
state = game.new_initial_state()
obs = observation.make_observation(game)

# Play a random game, printing the current player's structured
# observation (a dict of numpy arrays) at each decision node.
while not state.is_terminal():
  if state.current_player() >= 0:
    obs.set_from(state, state.current_player())
    print(state.current_player(), obs.dict)
  state.apply_action(random.choice(state.legal_actions()))

This emits something like:

0 {'player': array([1., 0.], dtype=float32), 'private_card': array([1., 0., 0., 0., 0., 0.], dtype=float32), 'community_card': array([0., 0., 0., 0., 0., 0.], dtype=float32), 'pot_contribution': array([1., 1.], dtype=float32)}
1 {'player': array([0., 1.], dtype=float32), 'private_card': array([0., 0., 0., 1., 0., 0.], dtype=float32), 'community_card': array([0., 0., 0., 0., 0., 0.], dtype=float32), 'pot_contribution': array([1., 1.], dtype=float32)}
0 {'player': array([1., 0.], dtype=float32), 'private_card': array([1., 0., 0., 0., 0., 0.], dtype=float32), 'community_card': array([0., 0., 0., 0., 1., 0.], dtype=float32), 'pot_contribution': array([1., 1.], dtype=float32)}
1 {'player': array([0., 1.], dtype=float32), 'private_card': array([0., 0., 0., 1., 0., 0.], dtype=float32), 'community_card': array([0., 0., 0., 0., 1., 0.], dtype=float32), 'pot_contribution': array([1., 1.], dtype=float32)}
0 {'player': array([1., 0.], dtype=float32), 'private_card': array([1., 0., 0., 0., 0., 0.], dtype=float32), 'community_card': array([0., 0., 0., 0., 1., 0.], dtype=float32), 'pot_contribution': array([1., 5.], dtype=float32)}
1 {'player': array([0., 1.], dtype=float32), 'private_card': array([0., 0., 0., 1., 0., 0.], dtype=float32), 'community_card': array([0., 0., 0., 0., 1., 0.], dtype=float32), 'pot_contribution': array([9., 5.], dtype=float32)}

If the game doesn't support these structured observations, you'll just get a single tensor, e.g. tiny_hanabi:

0 {'observation': array([0., 1., 0., 0., 0., 0., 0., 0.], dtype=float32)}
1 {'observation': array([0., 1., 1., 0., 0., 0., 0., 0.], dtype=float32)}

Is the idea to add things like the action_mask to this? If so, I suggest the right thing to do is to add a flag to _Observation, passed through make_observation, which adds the extra fields to the dict and updates them in set_from. It could be mildly more efficient to do some of this in the C++ layer, but that's probably no big deal.

For reference, the line where each field is written into the observation dict:

self.dict[tensor_info.name] = values
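In the meantime, a workaround along those lines can be sketched in user code by wrapping the existing observer and appending the mask by hand (the helper name here is mine, not part of the API):

import random

import numpy as np
import pyspiel
from open_spiel.python import observation

def observe_with_mask(obs, state, player):
    # Fill the observer from the state, then append the action mask
    # as an extra entry alongside the structured observation fields.
    obs.set_from(state, player)
    d = dict(obs.dict)
    d['action_mask'] = np.asarray(state.legal_actions_mask(player), dtype=bool)
    return d

game = pyspiel.load_game("leduc_poker")
state = game.new_initial_state()
obs = observation.make_observation(game)

# Leduc poker starts with chance nodes (card deals); skip ahead to a
# decision node before observing.
while state.is_chance_node():
    state.apply_action(random.choice(state.legal_actions()))
print(observe_with_mask(obs, state, state.current_player()))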
