Policy subclass that looks a fixed number of turns into the future and
examines the expected reward received in response to the actions of other
agents.
actionValue(self, state, actStruct, debug=None, horizon=-1)
    Returns the expected value of performing the given action.
evaluateChoices(self, state, choices=[], debug=None, horizon=-1) -> dict
    Evaluates the expected reward of a set of possible actions.
findBest(self, state, choices=[], debug=None, horizon=-1, explain=False) -> dict
    Determines the option with the highest expected reward.
setHorizon(self, horizon=1)
    Sets the default horizon of lookahead (which can still be overridden
    by an argument to a method call).
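The interplay of these methods can be sketched as follows. This is a minimal illustrative implementation, not the library's actual one: the constructor, the `transition` and `reward` callables, and the action representation are assumptions introduced for the example, and the `debug`/`explain` parameters are omitted.

```python
class LookaheadPolicy:
    """Minimal sketch of a fixed-horizon lookahead policy (illustrative only)."""

    def __init__(self, transition, reward, actions, horizon=1):
        # transition(state, action) -> list of (probability, next_state)
        # reward(state) -> float
        self.transition = transition
        self.reward = reward
        self.actions = actions
        self.horizon = horizon

    def setHorizon(self, horizon=1):
        # Default lookahead depth; a per-call horizon argument overrides it.
        self.horizon = horizon

    def actionValue(self, state, action, horizon=-1):
        # Expected cumulative reward of taking `action` now, then acting
        # greedily for the remaining turns of the horizon.
        if horizon < 0:
            horizon = self.horizon
        value = 0.0
        for prob, nextState in self.transition(state, action):
            future = 0.0
            if horizon > 1:
                future = max(self.evaluateChoices(nextState,
                                                  horizon=horizon - 1).values())
            value += prob * (self.reward(nextState) + future)
        return value

    def evaluateChoices(self, state, choices=None, horizon=-1):
        # Map each candidate action to its expected value (returns a dict).
        if choices is None:
            choices = self.actions
        return {action: self.actionValue(state, action, horizon)
                for action in choices}

    def findBest(self, state, choices=None, horizon=-1):
        # Option with the highest expected reward.
        values = self.evaluateChoices(state, choices, horizon)
        return max(values, key=values.get)


# Usage: a number line where the reward is the current position,
# so moving right is always the better option.
transition = lambda s, a: [(1.0, s + (1 if a == "right" else -1))]
reward = float
policy = LookaheadPolicy(transition, reward, ["left", "right"], horizon=2)
best = policy.findBest(0)  # "right"
```

Raising the horizon makes the recursion in `actionValue` look further ahead at the cost of a search tree that grows exponentially in depth, which is why a small fixed default that individual calls can override is a reasonable design.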