# datamining-project
This is a project for FGCU's knowledge discovery and data mining course.
It focuses on spam email classification.

Part 1 of the project implements a spam email classifier from scratch in Java 8,
using both a Naive Bayes classifier and a k-nearest-neighbors classifier.
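
As an illustration of the Naive Bayes approach used in Part 1, here is a minimal Java 8 sketch. It is not the project's implementation. It assumes each email has already been tokenized into a list of words, and it uses a multinomial model with add-one (Laplace) smoothing, summing log probabilities so that long emails do not underflow to zero. The class `NaiveBayesSketch` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a multinomial Naive Bayes spam filter; not the project's actual code.
// Assumes every email has already been tokenized into a list of words.
class NaiveBayesSketch {
    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int spamDocs = 0;
    private int hamDocs = 0;
    private int spamWords = 0;
    private int hamWords = 0;

    // Records one labeled training email in the per-class word counts.
    void train(List<String> words, boolean spam) {
        Map<String, Integer> counts = spam ? spamCounts : hamCounts;
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
            vocabulary.add(w);
        }
        if (spam) {
            spamDocs++;
            spamWords += words.size();
        } else {
            hamDocs++;
            hamWords += words.size();
        }
    }

    // log P(class) + sum of log P(word | class), with add-one (Laplace) smoothing
    // so that a word never seen in this class does not zero out the whole score.
    private double logScore(List<String> words, Map<String, Integer> counts, int docs, int totalWords) {
        double score = Math.log((double) docs / (spamDocs + hamDocs));
        int vocabSize = vocabulary.size();
        for (String w : words) {
            int c = counts.getOrDefault(w, 0);
            score += Math.log((c + 1.0) / (totalWords + vocabSize));
        }
        return score;
    }

    // Classifies an email as spam if its spam score beats its non-spam score.
    boolean isSpam(List<String> words) {
        return logScore(words, spamCounts, spamDocs, spamWords)
                > logScore(words, hamCounts, hamDocs, hamWords);
    }
}
```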

Part 2 of the project is a pair of experiments in scikit-learn using various other algorithms.

## Run instructions:

### Ensure that the dataset meets the following criteria:
 - All files must be .txt files
 - All files must be stored alone in their directory, with no other files
 - All files must be divided between two subdirectories, `test` and `training`
 - All spam emails are expected to have a filename beginning with "sp"
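
The criteria above could be checked before running either part with something like the following Java 8 sketch. It assumes that `data_path` is the directory containing `test` and `training`; the class `DatasetLayout` and its methods are hypothetical and not part of the project.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the project: checks the dataset layout rules
// and derives each email's label from its filename.
class DatasetLayout {

    // Spam is identified purely by the filename prefix "sp".
    static boolean isSpamFile(File f) {
        return f.getName().startsWith("sp");
    }

    // Returns the problems found under dataRoot; an empty list means both `test` and
    // `training` exist and contain nothing but regular .txt files.
    static List<String> validate(File dataRoot) {
        List<String> problems = new ArrayList<>();
        for (String sub : new String[] {"test", "training"}) {
            File dir = new File(dataRoot, sub);
            File[] entries = dir.listFiles(); // null if dir is missing or not a directory
            if (entries == null) {
                problems.add("missing directory: " + dir);
                continue;
            }
            for (File f : entries) {
                // Subdirectories and non-.txt files both violate the layout rules.
                if (!f.isFile() || !f.getName().endsWith(".txt")) {
                    problems.add("unexpected entry: " + f);
                }
            }
        }
        return problems;
    }
}
```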

### Usage:

  Part 1:

  ```
  usage: java Classify data_path algorithm_name [k]

    data_path       Path to the dataset directory
    algorithm_name  Either "knn" or "naivebayes"
    k               The K value to use (only used if algorithm_name is "knn")
  ```
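
The optional `k` argument sets how many nearest training emails vote on the label of each test email. The Java 8 sketch below shows that voting step under assumptions that may differ from the project: each email is a bag-of-words map from word to count, distance is Euclidean, and a strict majority of spam votes is required, so an evenly split vote (possible with an even `k`) falls back to non-spam. `KnnSketch` and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical k-nearest-neighbors sketch, not the project's code. Assumes each email
// is a bag-of-words map from word to count and uses Euclidean distance.
class KnnSketch {

    // One labeled training email.
    static final class Example {
        final Map<String, Integer> counts;
        final boolean spam;

        Example(Map<String, Integer> counts, boolean spam) {
            this.counts = counts;
            this.spam = spam;
        }
    }

    // Euclidean distance between two sparse word-count vectors.
    static double distance(Map<String, Integer> a, Map<String, Integer> b) {
        double sum = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            double diff = e.getValue() - b.getOrDefault(e.getKey(), 0);
            sum += diff * diff;
        }
        for (Map.Entry<String, Integer> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) { // words present only in b
                double v = e.getValue();
                sum += v * v;
            }
        }
        return Math.sqrt(sum);
    }

    // Labels the query as spam if a strict majority of its k nearest neighbors are spam.
    static boolean isSpam(Map<String, Integer> query, List<Example> training, int k) {
        List<Example> sorted = new ArrayList<>(training);
        // Sort by distance to the query; simple, though it recomputes distances while sorting.
        sorted.sort(Comparator.comparingDouble(ex -> distance(query, ex.counts)));
        int n = Math.min(k, sorted.size());
        int spamVotes = 0;
        for (int i = 0; i < n; i++) {
            if (sorted.get(i).spam) {
                spamVotes++;
            }
        }
        return spamVotes * 2 > n;
    }
}
```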

  Part 2:

  ```
  usage: python experiments.py

  There are no arguments; the program is a script that runs experiments with various values.
  ```

### Build and run steps:

Part 1:

  1. Ensure that you have a Java 8 JDK installed (a JRE alone cannot compile the program).
  2. From the project's `part1` subdirectory, compile the program with Gradle by running `./gradlew build`.
     The generated Java class files will then be in the `build/classes/java/main` directory.
     (If the Gradle build has problems, the program can also be compiled with `javac` like a standard Java program.)
  3. Run the program as `java Classify arguments`, for example from the `build/classes/java/main` directory.
     The arguments are described in the Usage section above.

Part 2:

  1. Ensure that you have Python 3.7 or greater installed.
     - If you have _both_ Python 2 and Python 3 installed, run the program using `python3`.
  2. Ensure that you have pip installed.
     - If you have _both_ Python 2 and Python 3 installed, run pip commands using `pip3`.
  3. Install scikit-learn: `pip install scikit-learn`
  4. Install numpy: `pip install numpy`
  5. Run the program as: `python experiments.py path_to_data`