Randomly generates observations following the given observation_spec.
Inherits From: PyEnvironment
```python
tf_agents.environments.RandomPyEnvironment(
    observation_spec: tf_agents.typing.types.NestedArray,
    action_spec: Optional[types.NestedArray] = None,
    episode_end_probability: tf_agents.typing.types.Float = 0.1,
    discount: tf_agents.typing.types.Float = 1.0,
    reward_fn: Optional[tf_agents.environments.random_py_environment.RewardFn] = None,
    batch_size: Optional[types.Int] = None,
    auto_reset: bool = True,
    seed: tf_agents.typing.types.Seed = 42,
    render_size: Sequence[int] = (2, 2, 3),
    min_duration: tf_agents.typing.types.Int = 0,
    max_duration: Optional[types.Int] = None
)
```
If an action_spec is provided, the environment validates that the actions used to step the environment fall within the defined spec.
| Raises | |
|---|---|
| ValueError | If the batch_size argument is not None and does not match the shapes of discount or reward. |
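For instance, a minimal construction sketch; the specs and argument values below are illustrative choices, not defaults mandated by the API:

```python
import numpy as np

from tf_agents.environments import random_py_environment
from tf_agents.specs import array_spec

# Illustrative specs: a 4-dimensional float observation and a binary integer action.
observation_spec = array_spec.BoundedArraySpec(
    shape=(4,), dtype=np.float32, minimum=-1.0, maximum=1.0, name='observation')
action_spec = array_spec.BoundedArraySpec(
    shape=(), dtype=np.int32, minimum=0, maximum=1, name='action')

env = random_py_environment.RandomPyEnvironment(
    observation_spec,
    action_spec=action_spec,
    episode_end_probability=0.1,  # each step ends the episode with probability 0.1
    discount=1.0,
    seed=42)
```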
Methods
action_spec
```python
action_spec() -> tf_agents.typing.types.NestedArraySpec
```
Defines the actions that should be provided to step().
May use a subclass of ArraySpec that specifies additional properties such as min and max bounds on the values.
| Returns | |
|---|---|
| An ArraySpec, or a nested dict, list or tuple of ArraySpecs. |
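As a sketch, the returned spec can be used to draw conforming actions. This assumes the env built in the constructor example above and uses array_spec.sample_spec_nest purely as a convenience; any NumPy array matching the spec works.

```python
import numpy as np
from tf_agents.specs import array_spec

spec = env.action_spec()
rng = np.random.RandomState(0)
# Draw a random action whose shape, dtype, and bounds match the spec.
action = array_spec.sample_spec_nest(spec, rng)
```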
close
```python
close() -> None
```
Frees any resources used by the environment.
Implement this method for an environment backed by an external process.
This method can be used directly:

```python
env = Env(...)
# Use env.
env.close()
```

or via a context manager:

```python
with Env(...) as env:
  # Use env.
```

current_time_step
```python
current_time_step() -> tf_agents.trajectories.TimeStep
```
Returns the current timestep.
discount_spec
```python
discount_spec() -> tf_agents.typing.types.NestedArraySpec
```
Defines the discount that is returned by step().
Override this method to define an environment that uses non-standard discount values, for example an environment with array-valued discounts.
| Returns | |
|---|---|
| An ArraySpec, or a nested dict, list or tuple of ArraySpecs. |
get_info
```python
get_info() -> tf_agents.typing.types.NestedArray
```
Returns the environment info returned on the last step.
| Returns | |
|---|---|
| Info returned by last call to step(). None by default. |
| Raises | |
|---|---|
| NotImplementedError | If the environment does not use info. |
get_state
```python
get_state() -> Any
```
Returns the state of the environment.
The state contains everything required to restore the environment to the current configuration. This can contain e.g.
- The current time_step.
- The number of steps taken in the environment (for finite horizon MDPs).
- Hidden state (for POMDPs).
Callers should not assume anything about the contents or format of the returned state. It should be treated as a token that can be passed back to set_state() later.
Note that the returned state handle should not be modified by the environment later on, and ensuring this (e.g. using copy.deepcopy) is the responsibility of the environment.
| Returns | |
|---|---|
| state | The current state of the environment. |
observation_spec
```python
observation_spec() -> tf_agents.typing.types.NestedArraySpec
```
Defines the observations provided by the environment.
May use a subclass of ArraySpec that specifies additional properties such as min and max bounds on the values.
| Returns | |
|---|---|
| An ArraySpec, or a nested dict, list or tuple of ArraySpecs. |
render
```python
render(mode: Text = 'rgb_array') -> np.ndarray
```
Renders the environment.
| Args | |
|---|---|
| mode | One of ['rgb_array', 'human']. Renders to a numpy array, or brings up a window where the environment can be visualized. |
| Returns | |
|---|---|
| An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise, returns nothing and renders directly to a display window. |
| Raises | |
|---|---|
| NotImplementedError | If the environment does not support rendering. |
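A hedged usage sketch: for RandomPyEnvironment the rgb_array frame shape is expected to follow the render_size constructor argument, but the exact dtype and the behavior of other modes may differ by environment.

```python
# Frame shape is expected to follow render_size (default (2, 2, 3));
# 'human' mode may not be supported and could raise NotImplementedError.
frame = env.render(mode='rgb_array')
print(frame.shape, frame.dtype)
```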
reset
```python
reset() -> tf_agents.trajectories.TimeStep
```
Starts a new sequence and returns the first TimeStep of this sequence.
| Returns | |
|---|---|
| A TimeStep namedtuple containing: step_type: A StepType of FIRST. reward: 0.0, indicating the reward. discount: 1.0, indicating the discount. observation: A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec(). |
reward_spec
```python
reward_spec() -> tf_agents.typing.types.NestedArraySpec
```
Defines the rewards that are returned by step().
Override this method to define an environment that uses non-standard reward values, for example an environment with array-valued rewards.
| Returns | |
|---|---|
| An ArraySpec, or a nested dict, list or tuple of ArraySpecs. |
seed
```python
seed(seed: tf_agents.typing.types.Seed) -> None
```
Seeds the environment.
| Args | |
|---|---|
| seed | Value to use as seed for the environment. |
set_state
```python
set_state(state: Any) -> None
```
Restores the environment to a given state.
See definition of state in the documentation for get_state().
| Args | |
|---|---|
| state | A state to restore the environment to. |
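A hypothetical snapshot/restore pattern built on get_state()/set_state(); it only applies to environments that actually implement both methods (the base class may raise NotImplementedError), and action is assumed to be a valid action array.

```python
# Save an opaque snapshot of the environment's current configuration.
snapshot = env.get_state()

# ... interact with the environment ...
time_step = env.step(action)

# Roll the environment back to the saved configuration.
env.set_state(snapshot)
```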
should_reset
```python
should_reset(current_time_step: tf_agents.trajectories.TimeStep) -> bool
```
Whether the Environment should reset given the current timestep.
By default it only resets when all time_steps are LAST.
| Args | |
|---|---|
| current_time_step | The current TimeStep. |
| Returns | |
|---|---|
| A bool indicating whether the Environment should reset or not. |
step
```python
step(action: tf_agents.typing.types.NestedArray) -> tf_agents.trajectories.TimeStep
```
Updates the environment according to the action and returns a TimeStep.
If the environment returned a TimeStep with StepType.LAST at the previous step, the implementation of _step in the environment should call reset to start a new sequence and ignore action.
This method will start a new sequence if called after the environment has been constructed and reset has not been called. In this case action will be ignored.
If should_reset(current_time_step) is True, then this method will reset by itself. In this case action will be ignored.
| Args | |
|---|---|
| action | A NumPy array, or a nested dict, list or tuple of arrays corresponding to action_spec(). |
| Returns | |
|---|---|
| A TimeStep namedtuple containing: step_type: A StepType value. reward: A NumPy array, reward value for this timestep. discount: A NumPy array, discount in the range [0, 1]. observation: A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec(). |
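Putting reset() and step() together, a typical single-episode loop looks like the sketch below; the integer action assumes the binary action_spec from the constructor example above.

```python
import numpy as np

time_step = env.reset()
episode_return = 0.0

while not time_step.is_last():
  # Any NumPy array conforming to env.action_spec() is a valid action.
  action = np.array(np.random.randint(0, 2), dtype=np.int32)
  time_step = env.step(action)
  episode_return += time_step.reward

print('Episode return:', episode_return)
```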
time_step_spec
```python
time_step_spec() -> tf_agents.trajectories.TimeStep
```
Describes the TimeStep fields returned by step().
Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with array-valued rewards.
| Returns | |
|---|---|
| A TimeStep namedtuple containing (possibly nested) ArraySpecs defining the step_type, reward, discount, and observation structure. |
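The returned namedtuple can be inspected field by field; the comments below describe typical defaults for a Python environment and are illustrative rather than guaranteed.

```python
spec = env.time_step_spec()

print(spec.step_type)    # ArraySpec for the StepType value
print(spec.reward)       # reward spec (scalar float by default)
print(spec.discount)     # discount spec (bounded to [0, 1] by default)
print(spec.observation)  # the observation_spec the environment was built with
```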
__enter__
```python
__enter__()
```
Allows the environment to be used in a with-statement context.
__exit__
```python
__exit__(unused_exception_type, unused_exc_value, unused_traceback)
```
Allows the environment to be used in a with-statement context.