solve(policies, horizon, evaluate, debug=False, identical=False)
Performs an exhaustive search for the optimal joint policy over the given horizon.
- Parameters:
evaluate (lambda ObservationPolicy: float) - function that takes a policy object and returns its expected value
identical (bool) - if True, assume that all agents use an identical policy (default is False)
horizon (int)
policies (ObservationPolicy[])
- Returns: float - the value of the best policy found
Warning:
As a side effect, every policy in the policies list is set to the best one found.
If you don't like it, too bad.
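The general idea of an exhaustive joint-policy search can be sketched as follows. This is a minimal illustration, not the code behind solve: the function name exhaustive_search, the candidates list, and the tuple-based joint policy are all hypothetical stand-ins, the evaluate callable here takes a tuple of per-agent choices rather than an ObservationPolicy, and the horizon is not modeled. The sketch also returns the best assignment instead of mutating its inputs.

```python
from itertools import product


def exhaustive_search(candidates, n_agents, evaluate, identical=False):
    """Hypothetical sketch: try every joint assignment, keep the best.

    candidates -- candidate policies available to a single agent (assumed)
    n_agents   -- number of agents in the joint policy (assumed)
    evaluate   -- callable mapping a tuple of per-agent policies to a float
    identical  -- if True, only consider joints where all agents agree
    """
    if identical:
        # One joint policy per candidate: every agent uses the same one.
        joint_space = ((c,) * n_agents for c in candidates)
    else:
        # Full Cartesian product: len(candidates) ** n_agents joints.
        joint_space = product(candidates, repeat=n_agents)

    best_value, best_joint = float("-inf"), None
    for joint in joint_space:
        value = evaluate(joint)
        if value > best_value:  # strict '>' keeps the first of any ties
            best_value, best_joint = value, joint
    return best_value, best_joint


# Toy objective: agent 0 wants a high number, agent 1 a low one.
score = lambda joint: joint[0] * 2 - joint[1]
print(exhaustive_search([0, 1, 2], 2, score))
print(exhaustive_search([0, 1, 2], 2, score, identical=True))
```

Restricting the search with identical=True shrinks the space from len(candidates) ** n_agents joint policies to len(candidates), at the cost of possibly missing a better asymmetric solution, as the toy objective shows.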