Add Q-Learning algorithm implementation with epsilon-greedy policy an… #13402

idklol22 · 2025-10-10T03:46:21Z

…d grid world demo

Describe your change:

Add an algorithm?
Fix a bug or typo in an existing algorithm?
Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
Documentation change?

Checklist:

I have read CONTRIBUTING.md.
This pull request is all my own work -- I have not plagiarized.
I know that pull requests will not be merged if they fail the automated tests.
This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
All new Python files are placed inside an existing directory.
All filenames are in all lowercase characters with no spaces or dashes.
All functions and variable names follow Python naming conventions.
All function parameters and return values are annotated with Python type hints.
All functions have doctests that pass the automated testing.
All new algorithms include at least one URL that points to Wikipedia or another similar explanation.
If this pull request resolves one or more open issues then the description above includes the issue number(s) with a closing keyword: "Fixes #ISSUE-NUMBER".

algorithms-keeper


🔗 Relevant Links

Repository:

Contributing guidelines

Project Euler solution guidelines

Python:

Formatted string literals (f-strings)

Type hints

doctest

unittest

pytest

Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.

algorithms-keeper commands and options

algorithms-keeper actions can be triggered by commenting on this PR:

@algorithms-keeper review to trigger the checks for only added pull request files

@algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.

NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.

algorithms-keeper · 2025-10-10T04:30:53Z

machine_learning/q_learning.py

+current_state = (0, 0)
+
+
+def get_q_value(state, action):


Please provide return type hint for the function: get_q_value. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: state

Please provide type hint for the parameter: action
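A minimal sketch of what the requested annotations could look like for get_q_value. The table layout (a nested defaultdict keyed by state, then action) and the State alias are assumptions for illustration; only the body line `return q_table[state][action]` appears in the diff.

```python
from collections import defaultdict

# Assumed shapes: a state is a (row, col) tuple and an action is an int index.
State = tuple[int, int]

# Hypothetical table layout: q_table[state][action] -> float, defaulting to 0.0.
q_table: defaultdict[State, defaultdict[int, float]] = defaultdict(
    lambda: defaultdict(float)
)


def get_q_value(state: State, action: int) -> float:
    """Return the stored Q-value for (state, action); unseen pairs give 0.0.

    >>> get_q_value((0, 0), 1)
    0.0
    """
    return q_table[state][action]
```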

algorithms-keeper · 2025-10-10T04:30:53Z

machine_learning/q_learning.py

+ return q_table[state][action]
+
+
+def get_best_action(state, available_actions):


Please provide return type hint for the function: get_best_action. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: state

Please provide type hint for the parameter: available_actions

algorithms-keeper · 2025-10-10T04:30:53Z

machine_learning/q_learning.py

+ return random.choice(best)
+
+
+def choose_action(state, available_actions):


Please provide return type hint for the function: choose_action. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: state

Please provide type hint for the parameter: available_actions
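For context, an annotated epsilon-greedy selection could be sketched as below. The EPSILON value, the extra epsilon parameter, and the table layout are assumptions and not taken from the PR; the diff only shows that get_best_action ends with `random.choice(best)` and that choose_action falls back to get_best_action.

```python
import random
from collections import defaultdict

State = tuple[int, int]  # assumed (row, col) state
q_table: defaultdict[State, defaultdict[int, float]] = defaultdict(
    lambda: defaultdict(float)
)
EPSILON = 0.1  # assumed exploration rate


def get_best_action(state: State, available_actions: list[int]) -> int:
    """Return an action with the highest Q-value, breaking ties at random."""
    max_q = max(q_table[state][a] for a in available_actions)
    best = [a for a in available_actions if q_table[state][a] == max_q]
    return random.choice(best)


def choose_action(
    state: State, available_actions: list[int], epsilon: float = EPSILON
) -> int:
    """Explore with probability epsilon, otherwise exploit the best action."""
    if random.random() < epsilon:
        return random.choice(available_actions)
    return get_best_action(state, available_actions)
```

With epsilon set to 0.0 the choice is purely greedy, which makes the function straightforward to cover with a deterministic doctest.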

algorithms-keeper · 2025-10-10T04:30:53Z

machine_learning/q_learning.py

+ return get_best_action(state, available_actions)
+
+
+def update(state, action, reward, next_state, next_available_actions, done=False):


Please provide return type hint for the function: update. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: state

Please provide type hint for the parameter: action

Please provide type hint for the parameter: reward

Please provide type hint for the parameter: next_state

Please provide type hint for the parameter: next_available_actions

Please provide type hint for the parameter: done
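As a rough illustration, the standard Q-learning update with the requested annotations might look like this. LEARNING_RATE and DISCOUNT_FACTOR are assumed names and values, and the table layout is hypothetical; the diff only confirms the signature and the final assignment `q_table[state][action] = new_q`.

```python
from collections import defaultdict

State = tuple[int, int]  # assumed (row, col) state
q_table: defaultdict[State, defaultdict[int, float]] = defaultdict(
    lambda: defaultdict(float)
)

LEARNING_RATE = 0.1  # assumed step size (alpha)
DISCOUNT_FACTOR = 0.9  # assumed discount (gamma)


def update(
    state: State,
    action: int,
    reward: float,
    next_state: State,
    next_available_actions: list[int],
    done: bool = False,
) -> None:
    """Apply Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    if done or not next_available_actions:
        # Terminal transitions have no future value to bootstrap from.
        max_next_q = 0.0
    else:
        max_next_q = max(q_table[next_state][a] for a in next_available_actions)
    old_q = q_table[state][action]
    new_q = old_q + LEARNING_RATE * (reward + DISCOUNT_FACTOR * max_next_q - old_q)
    q_table[state][action] = new_q
```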

algorithms-keeper · 2025-10-10T04:30:53Z

machine_learning/q_learning.py

+ q_table[state][action] = new_q
+
+
+def get_policy():


Please provide return type hint for the function: get_policy. If the function does not return a value, please provide the type hint as: def function() -> None:

algorithms-keeper · 2025-10-10T04:30:54Z

machine_learning/q_learning.py

+ return policy
+
+
+def reset_env():


As there is no test file in this pull request nor any test function or class in the file machine_learning/q_learning.py, please provide doctest for the function reset_env

Please provide return type hint for the function: reset_env. If the function does not return a value, please provide the type hint as: def function() -> None:
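A doctest of the kind the bot asks for could be as small as the sketch below. START_STATE is an assumed name; the diff only shows `current_state = (0, 0)` at module level and that reset_env returns current_state.

```python
State = tuple[int, int]  # assumed (row, col) state

START_STATE: State = (0, 0)  # assumed start cell, matching the diff's (0, 0)
current_state: State = START_STATE


def reset_env() -> State:
    """Reset the environment to its start cell and return that cell.

    >>> reset_env()
    (0, 0)
    """
    global current_state
    current_state = START_STATE
    return current_state
```

Running `python -m doctest -v` on the file executes the example in the docstring.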

algorithms-keeper · 2025-10-10T04:30:54Z

machine_learning/q_learning.py

+ return current_state
+
+
+def get_available_actions_env():


As there is no test file in this pull request nor any test function or class in the file machine_learning/q_learning.py, please provide doctest for the function get_available_actions_env

Please provide return type hint for the function: get_available_actions_env. If the function does not return a value, please provide the type hint as: def function() -> None:

algorithms-keeper · 2025-10-10T04:30:54Z

machine_learning/q_learning.py

+ return [0, 1, 2, 3]
+
+
+def step_env(action):


As there is no test file in this pull request nor any test function or class in the file machine_learning/q_learning.py, please provide doctest for the function step_env

Please provide return type hint for the function: step_env. If the function does not return a value, please provide the type hint as: def function() -> None:

Please provide type hint for the parameter: action

algorithms-keeper · 2025-10-10T04:30:54Z

machine_learning/q_learning.py

+ return next_state, reward, done
+
+
+def run_q_learning():


As there is no test file in this pull request nor any test function or class in the file machine_learning/q_learning.py, please provide doctest for the function run_q_learning

Please provide return type hint for the function: run_q_learning. If the function does not return a value, please provide the type hint as: def function() -> None:

algorithms-keeper · 2025-10-10T05:16:40Z

machine_learning/q_learning.py

+
+# Type alias for state
+type State = tuple[int, int]
+


An error occurred while parsing the file: machine_learning/q_learning.py

Traceback (most recent call last):
  File "/opt/render/project/src/algorithms_keeper/parser/python_parser.py", line 146, in parse
    reports = lint_file(
              ^^^^^^^^^^
libcst._exceptions.ParserSyntaxError: Syntax Error @ 16:1.
parser error: error at 15:11: expected one of !=, %, &, (, *, **, +, ,, -, ., /, //, :, ;, <, <<, <=, =, ==, >, >=, >>, @, NEWLINE, [, ^, and, if, in, is, not, or, |

type State = tuple[int, int]
^

The issue comes from the line type State = tuple[int, int], which uses the PEP 695 type alias statement introduced in Python 3.12. It works fine locally on Python 3.12 or newer, but the traceback shows the error is raised by libcst inside the algorithms-keeper bot's parser (hosted on Render), not by the Python interpreter or Ruff. That parser apparently does not recognize the type statement, probably because it targets Python 3.11 or below, so it reports a ParserSyntaxError at 16:1. To make the file parse in all environments, I tried replacing it with the old typing-style alias:
from typing import Tuple
State = Tuple[int, int]
but the Ruff check fails with this, so I had to revert.

It still works with Python 3.12+.

@cclauss, could you please review and let me know if any changes are needed?
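One further option, not tried in this thread, is a plain assignment using the built-in generic tuple (PEP 585, Python 3.9+). It avoids both the type statement and the deprecated typing.Tuple import. Whether it passes this repository's Ruff configuration is an assumption and has not been checked here; is_start is a hypothetical helper included only to show the alias used in an annotation.

```python
# Implicit type alias: an ordinary assignment that parsers for
# Python 3.9 through 3.11 can read, unlike the PEP 695 `type` statement.
State = tuple[int, int]


def is_start(state: State) -> bool:
    """Hypothetical helper: report whether state is the (0, 0) start cell.

    >>> is_start((0, 0))
    True
    >>> is_start((1, 2))
    False
    """
    return state == (0, 0)
```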

idklol22 added 2 commits October 10, 2025 09:01

Add Q-Learning algorithm implementation with epsilon-greedy policy an…

2b09382

…d grid world demo

Add Q-Learning algorithm implementation with epsilon-greedy policy an…

f0e2151

…d grid world demo code

algorithms-keeper bot added require tests Tests [doctest/unittest/pytest] are required require type hints https://docs.python.org/3/library/typing.html labels Oct 10, 2025

algorithms-keeper bot reviewed Oct 10, 2025

View reviewed changes

algorithms-keeper bot added the awaiting reviews This PR is ready to be reviewed label Oct 10, 2025

[pre-commit.ci] auto fixes from pre-commit.com hooks

3f0ec83

for more information, see https://pre-commit.ci

algorithms-keeper bot added the tests are failing Do not merge until tests pass label Oct 10, 2025

bug fixes and linting

0728312

algorithms-keeper bot removed require tests Tests [doctest/unittest/pytest] are required require type hints https://docs.python.org/3/library/typing.html labels Oct 10, 2025

algorithms-keeper bot reviewed Oct 10, 2025

View reviewed changes

pre-commit-ci bot and others added 2 commits October 10, 2025 05:16

[pre-commit.ci] auto fixes from pre-commit.com hooks

a7b5349

for more information, see https://pre-commit.ci

cls

39d121a

algorithms-keeper bot reviewed Oct 10, 2025

View reviewed changes

bug fix and hints

f3594e6

algorithms-keeper bot reviewed Oct 10, 2025

View reviewed changes

algorithms-keeper bot removed the tests are failing Do not merge until tests pass label Oct 10, 2025

idklol22 force-pushed the add-q-learning-algorithm1 branch from 9397754 to f3594e6 Compare October 10, 2025 12:18

algorithms-keeper bot reviewed Oct 10, 2025

View reviewed changes
