Commit bf2d8ea

Update README.md
1 parent 54634ca

README.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ Run src/vector_explorer.py - this script loads the embeddings and feeds them to
Run src/bug_generator.py - this script looks through the corpus of source code for specific logic patterns and alters them in a structured way; the goal is to take clean code and transform it into buggy code, giving us an arbitrarily large amount of buggy and clean code. It creates two pickle files and stores them in src/py2vec/ - these two files contain arrays of buggy and non-buggy code examples.
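
To make the structured-mutation idea concrete, here is a minimal sketch of one such transformation: parse each snippet, flip a comparison operator in the AST, and pickle the clean and buggy corpora. The operator swap and the clean.pkl/buggy.pkl file names are illustrative assumptions, not the actual rules in bug_generator.py.

```python
import ast
import pickle

class ComparisonFlipper(ast.NodeTransformer):
    """Inject a bug by swapping a comparison operator (e.g. < becomes <=)."""
    SWAPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE, ast.GtE: ast.Gt}

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [self.SWAPS[type(op)]() if type(op) in self.SWAPS else op
                    for op in node.ops]
        return node

def make_buggy(source: str) -> str:
    tree = ComparisonFlipper().visit(ast.parse(source))
    return ast.unparse(tree)  # requires Python 3.9+

clean_examples = ["if x < len(items):\n    total += items[x]"]
buggy_examples = [make_buggy(src) for src in clean_examples]

# Store the two corpora as pickle files; file names here are hypothetical,
# mirroring the src/py2vec/ layout described above.
with open("src/py2vec/clean.pkl", "wb") as f:
    pickle.dump(clean_examples, f)
with open("src/py2vec/buggy.pkl", "wb") as f:
    pickle.dump(buggy_examples, f)
```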

### Step 5
-Run src/RNN_model/.ipynb - This notebook loads the embeddings and buggy and non-buggy examples, creates an embedding matrix, builds an LSTM-RNN using Keras with a TensorFlow backend and trains it on the code examples.
+Run src/RNN_model/.ipynb - This notebook loads the embeddings and the buggy and non-buggy examples, creates an embedding matrix, builds an LSTM-RNN using Keras with a TensorFlow backend, and trains it on the code examples. Take a look at this file, because it goes into much more detail about the reasoning behind the decisions we made.
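
As a sketch of what such a model looks like (assuming the pre-trained embeddings have already been packed into an embedding_matrix array and the examples are padded integer token sequences; all names and sizes here are illustrative, using the classic Keras 2 weights= idiom for pre-trained embeddings, not the notebook's exact architecture):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative sizes: a 10k-token vocabulary, 100-d embeddings, 200-token snippets.
vocab_size, embed_dim, max_len = 10_000, 100, 200
embedding_matrix = np.random.rand(vocab_size, embed_dim)  # stand-in for the loaded py2vec embeddings

model = keras.Sequential([
    # Initialize the Embedding layer from the pre-trained matrix and freeze it.
    layers.Embedding(vocab_size, embed_dim,
                     weights=[embedding_matrix],
                     input_length=max_len,
                     trainable=False),
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),  # probability the snippet is buggy
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, validation_split=0.1, epochs=5)
```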

## Results and Final Thoughts

I believe this project takes a very interesting approach to bug detection; I was not able to find many research projects taking this kind of approach, and I took a lot of my ideas from modern Natural Language Processing. While this project was pretty small scale, I think the results we got were very promising: they suggest the concept works, and I am confident it could scale up well. We tested this project on only about 2 million lines of code, which in Deep Learning isn't much data; when creating word embeddings it's not uncommon to see corpora of upwards of 1 billion lines, and that is the number I would strive for in this project.
