Randomly generates observations following the given observation_spec.
Inherits From: PyEnvironment
```python
tf_agents.environments.RandomPyEnvironment(
    observation_spec: tf_agents.typing.types.NestedArray,
    action_spec: Optional[types.NestedArray] = None,
    episode_end_probability: tf_agents.typing.types.Float = 0.1,
    discount: tf_agents.typing.types.Float = 1.0,
    reward_fn: Optional[tf_agents.environments.random_py_environment.RewardFn] = None,
    batch_size: Optional[types.Int] = None,
    auto_reset: bool = True,
    seed: tf_agents.typing.types.Seed = 42,
    render_size: Sequence[int] = (2, 2, 3),
    min_duration: tf_agents.typing.types.Int = 0,
    max_duration: Optional[types.Int] = None
)
```
If an action_spec is provided, the environment validates that the actions used to step the environment fall within the defined spec.
| Raises | |
|---|---|
| ValueError | If the batch_size argument is not None and does not match the shapes of discount or reward. |
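For instance, a minimal construction sketch; the specs and argument values below are illustrative choices, not defaults mandated by the API:

```python
import numpy as np

from tf_agents.environments import random_py_environment
from tf_agents.specs import array_spec

# Illustrative specs: a 4-dimensional float observation and a binary integer action.
observation_spec = array_spec.BoundedArraySpec(
    shape=(4,), dtype=np.float32, minimum=-1.0, maximum=1.0, name='observation')
action_spec = array_spec.BoundedArraySpec(
    shape=(), dtype=np.int32, minimum=0, maximum=1, name='action')

env = random_py_environment.RandomPyEnvironment(
    observation_spec,
    action_spec=action_spec,
    episode_end_probability=0.1,  # each step ends the episode with probability 0.1
    discount=1.0,
    seed=42)
```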
Methods
action_spec
```python
action_spec() -> tf_agents.typing.types.NestedArraySpec
```
Defines the actions that should be provided to step().
May use a subclass of ArraySpec that specifies additional properties such as min and max bounds on the values.
| Returns | |
|---|---|
| An ArraySpec, or a nested dict, list or tuple of ArraySpecs. |
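As a sketch, the returned spec can be used to draw conforming actions. This assumes the env built in the constructor example above and uses array_spec.sample_spec_nest purely as a convenience; any NumPy array matching the spec works.

```python
import numpy as np
from tf_agents.specs import array_spec

spec = env.action_spec()
rng = np.random.RandomState(0)
# Draw a random action whose shape, dtype, and bounds match the spec.
action = array_spec.sample_spec_nest(spec, rng)
```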
close
```python
close() -> None
```
Frees any resources used by the environment.
Implement this method for an environment backed by an external process.
This method can be used directly:

```python
env = Env(...)
# Use env.
env.close()
```

or via a context manager:

```python
with Env(...) as env:
  # Use env.
```

current_time_step
```python
current_time_step() -> tf_agents.trajectories.TimeStep
```
Returns the current timestep.
discount_spec
```python
discount_spec() -> tf_agents.typing.types.NestedArraySpec
```
Defines the discount that is returned by step().
Override this method to define an environment that uses non-standard discount values, for example an environment with array-valued discounts.
| Returns | |
|---|---|
| An ArraySpec, or a nested dict, list or tuple of ArraySpecs. |
get_info
```python
get_info() -> tf_agents.typing.types.NestedArray
```
Returns the environment info returned on the last step.
| Returns | |
|---|---|
| Info returned by last call to step(). None by default. |
| Raises | |
|---|---|
| NotImplementedError | If the environment does not use info. |
get_state
```python
get_state() -> Any
```
Returns the state of the environment.
The state contains everything required to restore the environment to the current configuration. This can contain e.g.
- The current time_step.
- The number of steps taken in the environment (for finite horizon MDPs).
- Hidden state (for POMDPs).
Callers should not assume anything about the contents or format of the returned state. It should be treated as a token that can be passed back to set_state() later.
Note that the returned state handle should not be modified by the environment later on, and ensuring this (e.g. using copy.deepcopy) is the responsibility of the environment.
| Returns | |
|---|---|
| state | The current state of the environment. |
observation_spec
```python
observation_spec() -> tf_agents.typing.types.NestedArraySpec
```
Defines the observations provided by the environment.
May use a subclass of ArraySpec that specifies additional properties such as min and max bounds on the values.
| Returns | |
|---|---|
| An ArraySpec, or a nested dict, list or tuple of ArraySpecs. |
render
```python
render(mode: Text = 'rgb_array') -> np.ndarray
```
Renders the environment.
| Args | |
|---|---|
| mode | One of ['rgb_array', 'human']. Renders to a numpy array, or brings up a window where the environment can be visualized. |
| Returns | |
|---|---|
| An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise, returns nothing and renders directly to a display window. |
| Raises | |
|---|---|
| NotImplementedError | If the environment does not support rendering. |
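A hedged usage sketch: for RandomPyEnvironment the rgb_array frame shape is expected to follow the render_size constructor argument, but the exact dtype and the behavior of other modes may differ by environment.

```python
# Frame shape is expected to follow render_size (default (2, 2, 3));
# 'human' mode may not be supported and could raise NotImplementedError.
frame = env.render(mode='rgb_array')
print(frame.shape, frame.dtype)
```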
reset
```python
reset() -> tf_agents.trajectories.TimeStep
```
Starts a new sequence and returns the first TimeStep of this sequence.
| Returns | |
|---|---|
| A TimeStep namedtuple containing: step_type: A StepType of FIRST. reward: 0.0, indicating the reward. discount: 1.0, indicating the discount. observation: A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec(). |
reward_spec
```python
reward_spec() -> tf_agents.typing.types.NestedArraySpec
```
Defines the rewards that are returned by step().
Override this method to define an environment that uses non-standard reward values, for example an environment with array-valued rewards.
| Returns | |
|---|---|
| An ArraySpec, or a nested dict, list or tuple of ArraySpecs. |
seed
```python
seed(seed: tf_agents.typing.types.Seed) -> None
```
Seeds the environment.
| Args | |
|---|---|
| seed | Value to use as seed for the environment. |
set_state
```python
set_state(state: Any) -> None
```
Restores the environment to a given state.
See definition of state in the documentation for get_state().
| Args | |
|---|---|
| state | A state to restore the environment to. |
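A hypothetical snapshot/restore pattern built on get_state()/set_state(); it only applies to environments that actually implement both methods (the base class may raise NotImplementedError), and action is assumed to be a valid action array.

```python
# Save an opaque snapshot of the environment's current configuration.
snapshot = env.get_state()

# ... interact with the environment ...
time_step = env.step(action)

# Roll the environment back to the saved configuration.
env.set_state(snapshot)
```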
should_reset
```python
should_reset(current_time_step: tf_agents.trajectories.TimeStep) -> bool
```
Whether the Environment should reset given the current timestep.
By default it only resets when all time_steps are LAST.
| Args | |
|---|---|
| current_time_step | The current TimeStep. |
| Returns | |
|---|---|
| A bool indicating whether the Environment should reset or not. |
step
```python
step(action: tf_agents.typing.types.NestedArray) -> tf_agents.trajectories.TimeStep
```
Updates the environment according to the action and returns a TimeStep.
If the environment returned a TimeStep with StepType.LAST at the previous step, the implementation of _step in the environment should call reset to start a new sequence and ignore action.
This method will start a new sequence if called after the environment has been constructed and reset has not been called. In this case action will be ignored.
If should_reset(current_time_step) is True, then this method will reset by itself. In this case action will be ignored.
| Args | |
|---|---|
| action | A NumPy array, or a nested dict, list or tuple of arrays corresponding to action_spec(). |
| Returns | |
|---|---|
| A TimeStep namedtuple containing: step_type: A StepType value. reward: A NumPy array, reward value for this timestep. discount: A NumPy array, discount in the range [0, 1]. observation: A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec(). |
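Putting reset() and step() together, a typical single-episode loop looks like the sketch below; the integer action assumes the binary action_spec from the constructor example above.

```python
import numpy as np

time_step = env.reset()
episode_return = 0.0

while not time_step.is_last():
  # Any NumPy array conforming to env.action_spec() is a valid action.
  action = np.array(np.random.randint(0, 2), dtype=np.int32)
  time_step = env.step(action)
  episode_return += time_step.reward

print('Episode return:', episode_return)
```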
time_step_spec
```python
time_step_spec() -> tf_agents.trajectories.TimeStep
```
Describes the TimeStep fields returned by step().
Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with array-valued rewards.
| Returns | |
|---|---|
| A TimeStep namedtuple containing (possibly nested) ArraySpecs defining the step_type, reward, discount, and observation structure. |
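The returned namedtuple can be inspected field by field; the comments below describe typical defaults for a Python environment and are illustrative rather than guaranteed.

```python
spec = env.time_step_spec()

print(spec.step_type)    # ArraySpec for the StepType value
print(spec.reward)       # reward spec (scalar float by default)
print(spec.discount)     # discount spec (bounded to [0, 1] by default)
print(spec.observation)  # the observation_spec the environment was built with
```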
__enter__
```python
__enter__()
```
Allows the environment to be used in a with-statement context.
__exit__
```python
__exit__(unused_exception_type, unused_exc_value, unused_traceback)
```
Allows the environment to be used in a with-statement context.