
Commit dc2d761

ensemble
1 parent ef51242 commit dc2d761

File tree

3 files changed (+3, -64 lines)

README.md

Lines changed: 1 addition & 4 deletions
@@ -26,7 +26,7 @@ Used in the paper [Multi-Agent Actor-Critic for Mixed Cooperative-Competitive En
 
 - `./multiagent/policy.py`: contains code for interactive policy based on keyboard input.
 
-- `./multiagent/scenario.py`: contains base scenario object that is extended for all scenarios. Also contains base code for the ensemble scenarios.
+- `./multiagent/scenario.py`: contains base scenario object that is extended for all scenarios.
 
 - `./multiagent/scenarios/`: folder where various scenarios/ environments are stored. scenario code consists of several functions:
     1) `make_world()`: creates all of the entities that inhabit the world (landmarks, agents, etc.), assigns their capabilities (whether they can communicate, or move, or both).
@@ -46,9 +46,6 @@ You can create new scenarios by implementing the first 4 functions above (`make_
 
 | Env name in code (name in paper) | Communication? | Competitive? | Notes |
 | --- | --- | --- | --- |
-| `ensemble_adversary.py` (Physical deception) | N | Y | Same as simple_adversary below, where agents are trained with an ensemble of policies. |
-| `ensemble_push.py` (Keep-away) | N | Y | Same as simple_push below, where agents are trained with an ensemble of policies. |
-| `ensemble_tag.py` (Predator-prey) | N | Y | Same as simple_tag below, where agents are trained with an ensemble of policies. |
 | `simple.py` | N | N | Single agent sees landmark position, rewarded based on how close it gets to landmark. Not a multiagent environment -- used for debugging policies. |
 | `simple_adversary.py` (Physical deception) | N | Y | 1 adversary (red), N good agents (green), N landmarks (usually N=2). All agents observe position of landmarks and other agents. One landmark is the ‘target landmark’ (colored green). Good agents rewarded based on how close one of them is to the target landmark, but negatively rewarded if the adversary is close to target landmark. Adversary is rewarded based on how close it is to the target, but it doesn’t know which landmark is the target landmark. So good agents have to learn to ‘split up’ and cover all landmarks to deceive the adversary. |
 | `simple_crypto.py` (Covert communication) | Y | Y | Two good agents (alice and bob), one adversary (eve). Alice must send a private message to bob over a public channel. Alice and bob are rewarded based on how well bob reconstructs the message, but negatively rewarded if eve can reconstruct the message. Alice and bob have a private key (randomly generated at beginning of each episode), which they must learn to use to encrypt the message. |
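
As a quick orientation for this part of the diff: the scenario interface the README describes is small, and new environments are added by subclassing `BaseScenario` inside `./multiagent/scenarios/`. Below is a minimal sketch in the style of the repo's `simple.py` (one agent, one landmark, reward equal to negative distance). The class body and parameter choices are illustrative, not part of this commit.

```python
# Illustrative only -- a hypothetical scenario in the style of simple.py,
# not code from this commit.
import numpy as np
from multiagent.core import World, Agent, Landmark
from multiagent.scenario import BaseScenario


class Scenario(BaseScenario):
    def make_world(self):
        # create the world and the entities that inhabit it
        world = World()
        world.agents = [Agent()]
        world.landmarks = [Landmark()]
        for i, agent in enumerate(world.agents):
            agent.name = 'agent %d' % i
            agent.collide = False
            agent.silent = True          # no communication in this toy scenario
        for i, landmark in enumerate(world.landmarks):
            landmark.name = 'landmark %d' % i
            landmark.collide = False
            landmark.movable = False
        self.reset_world(world)
        return world

    def reset_world(self, world):
        # random initial positions, zero velocities and communication state
        for agent in world.agents:
            agent.state.p_pos = np.random.uniform(-1, +1, world.dim_p)
            agent.state.p_vel = np.zeros(world.dim_p)
            agent.state.c = np.zeros(world.dim_c)
        for landmark in world.landmarks:
            landmark.state.p_pos = np.random.uniform(-1, +1, world.dim_p)
            landmark.state.p_vel = np.zeros(world.dim_p)

    def reward(self, agent, world):
        # negative distance to the single landmark
        return -np.sqrt(np.sum(np.square(agent.state.p_pos - world.landmarks[0].state.p_pos)))

    def observation(self, agent, world):
        # own velocity plus the landmark position relative to the agent
        entity_pos = [lm.state.p_pos - agent.state.p_pos for lm in world.landmarks]
        return np.concatenate([agent.state.p_vel] + entity_pos)
```

A file like this dropped into `./multiagent/scenarios/` can then be loaded by name like the built-in scenarios.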

multiagent/environment.py

Lines changed: 2 additions & 3 deletions
@@ -137,8 +137,7 @@ def _get_reward(self, agent):
     # set env action for a particular agent
     def _set_action(self, action, agent, action_space, time=None):
         agent.action.u = np.zeros(self.world.dim_p)
-        #agent.action.c = np.zeros(self.world.dim_c)
-        agent.action.c *= self.comm_decay
+        agent.action.c = np.zeros(self.world.dim_c)
         # process action
         if isinstance(action_space, spaces.MultiDiscrete):
             act = []
@@ -170,7 +169,7 @@ def _set_action(self, action, agent, action_space, time=None):
                     agent.action.u[1] += action[0][3] - action[0][4]
                 else:
                     agent.action.u = action[0]
-            sensitivity = 5.0 #5.0 #1.0 #0.25
+            sensitivity = 5.0
             if agent.accel is not None:
                 sensitivity = agent.accel
             agent.action.u *= sensitivity
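
The substantive change in this file is how the communication part of the action is initialized on each `_set_action` call: the deleted lines carried the previous step's communication vector forward, scaled by a `comm_decay` attribute (which this commit also drops), whereas the restored line clears it every step. A toy numeric sketch of the difference, with made-up values:

```python
# Made-up numbers, only to contrast the two behaviours touched by this hunk.
import numpy as np

dim_c = 3
prev_c = np.array([0.4, -0.2, 0.1])  # communication emitted on the previous step

# removed behaviour: carry a decayed copy of the previous utterance forward
comm_decay = 0.9                      # hypothetical value; the attribute is gone after this commit
decayed_c = prev_c * comm_decay       # array([ 0.36, -0.18,  0.09])

# restored behaviour: start every step from silence
fresh_c = np.zeros(dim_c)             # array([0., 0., 0.])
```

The second hunk only strips stale alternative values from an inline comment; `sensitivity` still defaults to 5.0 and is overridden by `agent.accel` when that is set.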

multiagent/scenario.py

Lines changed: 0 additions & 57 deletions
@@ -8,60 +8,3 @@ def make_world(self):
     # create initial conditions of the world
     def reset_world(self, world):
         raise NotImplementedError()
-
-class EnsembleBaseScenario(BaseScenario):
-    def __init__(self):
-        self.partition = 'rand'
-        self.partition_flag = -1
-        self.measure_success = False
-
-    def select_agents(self, world):
-        good_agents = [agent for agent in world.all_agents if not agent.adversary]
-        adversary_agents = [agent for agent in world.all_agents if agent.adversary]
-        n_good = world.good_part_n
-        n_bad = world.adversary_part_n
-        if self.partition == 'rand':
-            np.random.shuffle(good_agents)
-            np.random.shuffle(adversary_agents)
-            world.agents = adversary_agents[:world.num_adversaries] + \
-                good_agents[:(world.num_agents - world.num_adversaries)]
-        elif self.partition == 'fix':
-            k = np.random.choice(world.partition_n)
-            bad_part = adversary_agents[k * n_bad: (k + 1) * n_bad]
-            np.random.shuffle(bad_part)
-            good_part = good_agents[k * n_good: (k + 1) * n_good]
-            np.random.shuffle(good_part)
-            world.agents = bad_part[:world.num_adversaries] + good_part[:(world.num_agents - world.num_adversaries)]
-        else:
-            fix_good = good_agents[:n_good]
-            rand_good_all = good_agents[n_good:]
-            np.random.shuffle(fix_good)
-            fix_bad = adversary_agents[:n_bad]
-            rand_bad_all = adversary_agents[n_bad:]
-            np.random.shuffle(fix_bad)
-            # pick a team from rand-good/bad
-            t = np.random.choice(world.partition_n - 1)  # excluding fix-team
-            rand_good = rand_good_all[t * n_good: (t+1) * n_good]
-            t = np.random.choice(world.partition_n - 1)
-            rand_bad = rand_bad_all[t * n_bad: (t+1) * n_bad]
-            np.random.shuffle(rand_good)
-            np.random.shuffle(rand_bad)
-            if self.partition == 'mix':
-                k = np.random.choice(world.partition_n)
-                if self.partition_flag > -1:  # only use fixed partition
-                    k = self.partition_flag
-                if k == 0:
-                    world.agents = fix_bad[:world.num_adversaries] + fix_good[:(world.num_agents - world.num_adversaries)]
-                else:
-                    world.agents = rand_bad[:world.num_adversaries] + \
-                        rand_good[:(world.num_agents - world.num_adversaries)]
-            else:
-                if self.partition_flag > -1:
-                    k = self.partition_flag
-                else:
-                    k = np.random.choice(2)
-                if k == 0:
-                    world.agents = fix_bad[:world.num_adversaries] + rand_good[:(world.num_agents - world.num_adversaries)]
-                else:
-                    world.agents = rand_bad[:world.num_adversaries] + fix_good[:(world.num_agents - world.num_adversaries)]
-        assert (len(world.agents) == world.num_agents)
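
For context, the deleted `EnsembleBaseScenario.select_agents` drew each episode's active agents from a larger pool: `'rand'` shuffles the whole pool, `'fix'` samples one pre-defined sub-team per side, `'mix'` chooses between the designated fixed sub-team and a randomly drawn one, and any other value pairs the fixed sub-team on one side with a random sub-team on the other. The sketch below shows the world attributes that code read; the attribute names come from the removed lines, while the numbers and the commented usage are assumptions for illustration only.

```python
# Illustration of the inputs the removed select_agents expected; all values are made up.
from multiagent.core import World, Agent

world = World()
world.num_agents = 4             # agents active in one episode
world.num_adversaries = 1
world.partition_n = 3            # sub-teams per side in the ensemble
world.good_part_n = 3            # good agents per sub-team
world.adversary_part_n = 1       # adversaries per sub-team

# the full pool of ensemble agents; select_agents picked a subset each episode
world.all_agents = []
for _ in range(world.adversary_part_n * world.partition_n):
    a = Agent()
    a.adversary = True
    world.all_agents.append(a)
for _ in range(world.good_part_n * world.partition_n):
    a = Agent()
    a.adversary = False
    world.all_agents.append(a)

# scenario = SomeEnsembleScenario()  # hypothetical subclass of the removed class
# scenario.partition = 'rand'        # or 'fix' / 'mix', as handled above
# scenario.select_agents(world)      # would fill world.agents with num_agents picks
```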
