Correction of benchmark results #87

@kvas7andy

Hi everyone,

I found several bugs while checking the code of the ipynb notebooks with benchmark results for three environments: TinyToy, ToyCTF, and Chain.

I think my findings might be useful for the community that uses this nice implementation of cyberattack simulation.

MOVED TO SEPARATE ISSUE #115

  1. Issue 1: learner.epsilon_greedy_search(...) is used to train agents with several algorithms, including DQL in dql_run. However, dql_exploit_run, which takes the network from dql_run as its policy agent and an eval_episode_count parameter for the number of episodes, gives the impression that it evaluates the trained DQN. The only difference between the two runs is that epsilon equals 0, which puts training into exploitation mode but does not disable it: during a run with learner.epsilon_greedy_search, optimizer.step() is still executed on every step through the learner.on_step(...) call in agent_dql.py.
  2. Issue 2: During training, each episode ends only when the maximum number of iterations is reached, because of a mistake in the AttackerGoal class. The default value of the parameter own_atleast_percent: float = 1.0 is included, with AND, in the condition that sets done = True. For TinyToy and ToyCTF (but not Chain), this leads to long training times, a wrong RL signal for estimating the Q function, and low sample efficiency.
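To illustrate Issue 2, here is a minimal sketch of how an AND-combined goal check behaves when the percentage threshold defaults to 1.0. It is not the library's code: ToyGoal, goal_reached, the own_atleast field, and the 10-node network size are hypothetical names and numbers chosen for illustration; only the own_atleast_percent default of 1.0 comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class ToyGoal:
    """Hypothetical stand-in for an attacker goal (not the real AttackerGoal)."""
    own_atleast: int = 0              # assumed count-based target, for illustration
    own_atleast_percent: float = 1.0  # default value described in Issue 2


def goal_reached(goal: ToyGoal, owned: int, total: int) -> bool:
    # Both conditions are combined with AND, so the default 1.0 threshold
    # requires owning every node before the episode can be marked done.
    return (owned >= goal.own_atleast
            and owned / total >= goal.own_atleast_percent)


# Goal of 6 owned nodes in an assumed 10-node network.
default_goal = ToyGoal(own_atleast=6)
relaxed_goal = ToyGoal(own_atleast=6, own_atleast_percent=0.0)

done_default = goal_reached(default_goal, owned=6, total=10)
done_relaxed = goal_reached(relaxed_goal, owned=6, total=10)
print(done_default, done_relaxed)
```

With the default threshold the check stays false at 6 owned nodes, so the episode would run until the iteration limit; setting the percentage explicitly to 0.0 lets the count-based target decide.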

MOVED TO SEPARATE ISSUE #115
3. Issue 3: The ToyCTF benchmark is inaccurate: with a correct evaluation procedure, like the one used for the chain network configuration, the agent does not reach the goal of 6 owned nodes after 200 training episodes.
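The difference between an exploitation-mode run that still learns (Issue 1) and a true evaluation run (as needed for Issue 3) can be sketched with a toy learner. Everything here is an assumption for illustration: ToyLearner, run, and the two-action reward are hypothetical and do not reproduce agent_dql.py; on_step only plays the role that the optimizer update plays in the real agent.

```python
import random


class ToyLearner:
    """Hypothetical learner standing in for a DQL agent."""

    def __init__(self) -> None:
        self.q = [0.0, 0.0]  # one value estimate per action
        self.updates = 0     # counts parameter updates

    def exploit(self) -> int:
        # Greedy action: index of the largest value estimate.
        return max(range(len(self.q)), key=self.q.__getitem__)

    def explore(self) -> int:
        return random.randrange(len(self.q))

    def on_step(self, action: int, reward: float, lr: float = 0.1) -> None:
        # Plays the role of the optimizer update: the estimate changes.
        self.q[action] += lr * (reward - self.q[action])
        self.updates += 1


def run(learner: ToyLearner, steps: int, epsilon: float, learn: bool) -> None:
    for _ in range(steps):
        explore = random.random() < epsilon
        action = learner.explore() if explore else learner.exploit()
        reward = 1.0 if action == 1 else 0.0  # assumed toy reward
        if learn:
            learner.on_step(action, reward)


random.seed(0)
agent = ToyLearner()

run(agent, steps=50, epsilon=1.0, learn=True)   # ordinary training
updates_after_training = agent.updates

run(agent, steps=20, epsilon=0.0, learn=True)   # epsilon = 0, but still learning
updates_after_exploit = agent.updates

q_before_eval = list(agent.q)
run(agent, steps=20, epsilon=0.0, learn=False)  # evaluation: no updates at all
print(updates_after_training, updates_after_exploit, agent.updates)
```

Setting epsilon to 0 only changes how actions are chosen; the update count keeps growing until the learning call itself is skipped, which is the property an evaluation run needs.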

Labels: enhancement