Package teamwork :: Package policy :: Module policyTable :: Class PolicyTable

Class PolicyTable

             generic.Policy --+    
                              |    
LookaheadPolicy.LookaheadPolicy --+
                                  |
              pwlTable.PWLTable --+
                                  |
                                 PolicyTable

Super-nifty class for representing policies as tables and using policy iteration to optimize them.
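
A rough usage sketch (hypothetical: the entity, its action options, and the KeyedVector state come from elsewhere in the teamwork package and are only assumed here, not defined on this page):

    from teamwork.policy.policyTable import PolicyTable

    # entity, options, state, and observations are placeholders for objects
    # built elsewhere in the teamwork package.
    policy = PolicyTable(entity, actions=options, horizon=2)

    # Optimize the table by policy iteration (seeded according to the
    # seedMethod class variable, 'greedy' by default).
    policy.solve(search='exhaustive')

    # Look up the optimized right-hand side for the current situation.
    action = policy.execute(state, observations={})
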

Instance Methods

__init__(self, entity, actions=[], horizon=1)
    Same arguments as the constructor for the LookaheadPolicy superclass
Action[] execute(self, state, observations={}, history=None, choices=[], index=None, debug=False, explain=False, entities={}, cache={})
    Applies this policy to the given state and observation history
Action[] default(self, choices, state, observations)
    Generates a default RHS, presumably with minimal effort.
_defaultRandom(self, choices, state, observations)
    Default RHS is a random choice
_defaultGreedy(self, choices, state, observations)
    Default RHS is the optimal action over a one-step time horizon
generateObservations(self, remaining=None, result=None)
str[] getLookahead(self)
    Returns the turn sequence used for the forward projection
str→PolicyTable getPolicies(self)
    Returns a dictionary of the policies of all of the agents in this entity's lookahead
solve(self, horizon=None, choices=None, debug=False, policies=None, interrupt=None, search='exhaustive', progress=None)
iterate(self, choices, policies, state, recurse=False, debug=False, interrupt=None)
    Exhaustive policy search
bool perturb(self, policies, interrupt=None, debug=False)
    Consider a random perturbation of this policy
bool parallelSolve(self, policies, interrupt=None, progress=None, debug=False)
    Generates an abstract state space and does value iteration to generate a policy when agents all act in parallel, each with its own LHS
bool abstractSolve(self, policies, interrupt=None, progress=None)
    Generates an abstract state space (defined by the LHS attributes) and does value iteration to generate a policy
abstractTransition(self, policies, interrupt=None, progress=None)
    Generates a transition probability function over the abstract state space (defined by the LHS attributes)
float abstractIterate(self, policies, rewards, interrupt)
    One pass of value iteration over the abstract state space.
abstractReward(self, intervals, goals, tree, interrupt=None)
oldReachable(self, choices, policies, state, observations, debug=False)
float evaluate(self, policies, state, observations, history=None, debug=False, fixed=True, start=0, details=False)
    Computes the expected value of this policy in response to the given policies for the other agents
Action[] localSolve(self, policies, state, observations, update=False, debug=False)
    Determines the best action out of the available options, given the current state and observation history, while holding fixed the expected policies of the other agents.
expectedValue(self, state, action, goals=None, debug=False)
(KeyedVector,dict:str→Action[],int) chooseRule(self)
    Generates a random state and observation history and finds the rule corresponding to them
dict[] abstract(self, index)
    Returns the abstract state subspace where the given rule is applicable, in the form of a list of intervals, one for each attribute, where each interval is a dictionary with keys weights, index, lo, and hi
fromIndex(self, index, choices=None)
    Fills in the rules using the given number as an n-ary representation of the RHS values (where n is the number of possible RHS values)
int toIndex(self, choices=None)
    Returns the n-ary representation of the RHS values (where n is the number of possible RHS values)
updateAttributes(self, actor, attributes, diffs, leaves)
importTable(self, table)
    Takes the given table and uses it to set the LHS and RHS of this policy (making sure that the RHS refers to my entity instead)
generateLHS(self, horizon=None, choices=None, debug=False)
OLDgenerateLHS(self, horizon=None, choices=None, debug=False)
__copy__(self)
__xml__(self)
parse(self, element)

Inherited from LookaheadPolicy.LookaheadPolicy: __contains__, __str__, actionValue, evaluateChoices, findBest, setHorizon

Inherited from pwlTable.PWLTable: OLDfactored2index, __add__, __getitem__, __len__, __mul__, addAttribute, consistentp, copy, delAttribute, factorString, factored2index, fromTree, getTable, index, index2factored, initialize, mapIndex, max, mergeZero, prune, pruneAttributes, pruneRules, reset, star, subIndex, valueString

Class Variables
str seedMethod = 'greedy'
the method to use to generate the initial RHS values to seed policy iteration.
Instance Variables
Action[][] rules
table of RHS, in dictionary form, indexed by row number

Inherited from LookaheadPolicy.LookaheadPolicy: consistentTieBreaking, horizon

Inherited from pwlTable.PWLTable: attributes, values, zeroPlanes

Inherited from pwlTable.PWLTable (private): _attributes, _consistency

Method Details

__init__(self, entity, actions=[], horizon=1)
(Constructor)

Same arguments as the constructor for the LookaheadPolicy superclass

Parameters:
  • entity - the entity whose policy this is (not sure whether this is necessary)
  • actions - the options considered by this policy (used by superclass)
  • horizon - the lookahead horizon
Overrides: pwlTable.PWLTable.__init__

execute(self, state, observations={}, history=None, choices=[], index=None, debug=False, explain=False, entities={}, cache={})

Applies this policy to the given state and observation history

Parameters:
  • state (KeyedVector) - the current state
  • observations (dict:str→Action[]) - the current observation history
  • history (dict) - dictionary of actions that have been performed so far
  • entities - a dictionary of entities to be used as a value tree cache
  • cache - values computed so far
Returns: Action[]
the corresponding action in the table
Overrides: generic.Policy.execute

default(self, choices, state, observations)

Generates a default RHS, presumably with minimal effort. The exact method is determined by the seedMethod class attribute.

Parameters:
  • state (KeyedVector) - the current state
  • observations (dict:str→Action[]) - the current observation history
Returns: Action[]
the corresponding action in the table

getLookahead(self)

Returns: str[]
the turn sequence used for the forward projection

getPolicies(self)

Returns: str→PolicyTable
a dictionary of the policies of all of the agents in this entity's lookahead

perturb(self, policies, interrupt=None, debug=False)

Consider a random perturbation of this policy

Returns: bool
True iff a better variation was found
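
In effect this is one hill-climbing step over the rule table: change one rule's right-hand side at random, re-evaluate, and keep the change only if it does better. A self-contained toy sketch of that accept-if-better step (not the actual implementation; score stands in for evaluating the table against the other agents' fixed policies):

    import random

    def perturb_once(rules, options, score):
        # rules: dict mapping rule index -> chosen action (the RHS)
        # options: candidate RHS values; score: callable judging a rule table
        baseline = score(rules)
        row = random.choice(list(rules))
        old_rhs = rules[row]
        rules[row] = random.choice(options)   # try a random alternative RHS
        if score(rules) > baseline:
            return True                       # a better variation was found
        rules[row] = old_rhs                  # otherwise revert
        return False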

abstractSolve(self, policies, interrupt=None, progress=None)

Generates an abstract state space (defined by the LHS attributes) and does value iteration to generate a policy

Returns: bool

Warning: assumes that all of the agents have the same LHS attribute breakdown! (not hard to overcome this assumption, but annoying to code)

abstractIterate(self, policies, rewards, interrupt)

One pass of value iteration over abstract state space.

Parameters:
  • rewards (str→(str→float)[]) - rewards[name][state][action] = the reward that agent name receives in the given state (in index form) for performing action
Returns: float
the total change to the value function

Warning: Currently works for only two agents (easy, but messy, to generalize)
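
As a rough illustration of what one such pass does, here is a generic value-iteration sweep over a discrete, indexed state space (a single-agent sketch under assumed reward and transition tables, not the two-agent implementation used here):

    def value_iteration_pass(values, rewards, transition, actions, gamma=0.9):
        # values:     dict state -> current value estimate
        # rewards:    dict state -> dict action -> immediate reward
        # transition: dict state -> dict action -> dict next_state -> probability
        delta = 0.0
        for s in values:
            best = max(
                rewards[s][a] + gamma * sum(p * values[s2]
                                            for s2, p in transition[s][a].items())
                for a in actions
            )
            delta += abs(best - values[s])
            values[s] = best
        return delta   # total change to the value function, as described above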

evaluate(self, policies, state, observations, history=None, debug=False, fixed=True, start=0, details=False)

Computes the expected value of this policy in response to the given policies for the other agents

Parameters:
  • policies (dict:str→PolicyTable) - the policies that the other agents are expected to follow
  • state (KeyedVector) - the current state
  • observations (dict:str→Action[]) - the current observation history
  • history (dict) - dictionary of actions that have been performed so far (not changed by this method)
  • fixed (bool) - flag, if True, then the other agents follow their given policies; otherwise, they can optimize
  • start (int) - the time offset to use in the lookahead (0 starts with this agent, and is the default)
  • details - flag, if True, then the details of this evaluation are returned in addition to the expected value
Returns: float
expected value of this policy (and sequence of actions if details is True)
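
For example (hypothetical names: others maps each other agent's name to its PolicyTable, and state is the current KeyedVector):

    # With fixed=True the other agents follow the given policies exactly;
    # details=True would also return the projected sequence of actions.
    value = policy.evaluate(others, state, observations={}, fixed=True)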

localSolve(self, policies, state, observations, update=False, debug=False)

Determines the best action out of the available options, given the current state and observation history, and while holding fixed the expected policies of the other agents.

Parameters:
  • update (bool) - if True, as a side effect, this policy is modified to have this best action be the RHS of the applicable rule.
Returns: Action[]
the best action

chooseRule(self)

Generates a random state and observation history and finds the rule corresponding to them

Returns: (KeyedVector,dict:str→Action[],int)
a tuple of the state, observation history, and rule index

abstract(self, index)

Parameters:
  • index (int) - the rule index of interest
Returns: dict[]
the abstract state subspace where the given rule is applicable, in the form of a list of intervals, one for each attribute, where each interval is a dictionary with keys weights, index, lo, and hi
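
A sketch of the shape of that return value, with hypothetical numbers and placeholder weight vectors (one interval dictionary per LHS attribute):

    region = [
        {'weights': ...,  # weight vector for attribute 0 (placeholder)
         'index': 0, 'lo': 0.0, 'hi': 0.5},
        {'weights': ...,  # weight vector for attribute 1 (placeholder)
         'index': 1, 'lo': 0.5, 'hi': 1.0},
    ]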

toIndex(self, choices=None)

Returns: int
the n-ary representation of the RHS values (where n is the number of possible RHS values)
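
The encoding is a standard positional number system in base n: each rule contributes one digit, the digit being the position of that rule's RHS within the list of choices. A self-contained sketch of the idea (the digit ordering in the actual implementation may differ):

    def rules_to_index(digits, n):
        # digits: one choice position (0..n-1) per rule
        index = 0
        for digit in reversed(digits):
            index = index * n + digit
        return index

    def index_to_rules(index, n, num_rules):
        # unpack the n-ary integer back into one choice position per rule
        digits = []
        for _ in range(num_rules):
            index, digit = divmod(index, n)
            digits.append(digit)
        return digits

    assert index_to_rules(rules_to_index([2, 0, 1], 3), 3, 3) == [2, 0, 1]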

updateAttributes(self, actor, attributes, diffs, leaves)

Parameters:
  • actor (str) - the name of the agent whose turn it is
  • attributes (KeyedPlane[]) - the attributes already found

importTable(self, table)

Takes the given table and uses it to set the LHS and RHS of this policy (making sure that the RHS refers to my entity instead)

Parameters:
  • table (PWLTable) - the PWL table that contains the relevant LHS and RHS

Warning: Does not import value function, just policy itself

__copy__(self)

Overrides: pwlTable.PWLTable.__copy__

__xml__(self)

Overrides: LookaheadPolicy.LookaheadPolicy.__xml__

parse(self, element)

Overrides: LookaheadPolicy.LookaheadPolicy.parse

Class Variable Details

seedMethod

the method to use to generate the initial RHS values to seed policy iteration. There are currently two types supported:
  • random: the RHS values are randomly generated
  • greedy: the initial RHS value provides the best immediate expected reward (i.e., over a one-step horizon)

The default is greedy.

Type:
str
Value:
'greedy'
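
A rough sketch of the difference between the two seeding strategies (conceptual only; options and one_step_value are placeholders, the latter standing in for the one-step expected reward that _defaultGreedy computes):

    import random

    def seed_rhs(options, one_step_value, seed_method='greedy'):
        # Pick an initial RHS for a rule before policy iteration begins.
        if seed_method == 'random':
            return random.choice(options)        # cf. _defaultRandom
        # 'greedy': best immediate expected reward over a one-step horizon
        return max(options, key=one_step_value)  # cf. _defaultGreedy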