A PROJECT REPORT

ON
“Online Payment Fraud Detection”

SESSION: 2024
SCHOOL OF COMPUTER SCIENCE & ENGINEERING
GREATER NOIDA, UTTAR PRADESH, INDIA

Submitted To:
DR. PRASHANT JOHRI

Submitted By:
Divy Anant Varshney - (23SCSE2030392) (MCA sec-04)

Yadvendra Singh Rathaur - (23SCSE2030453) (MCA sec-04)

Monika Gupta - (23SCSE2030438) (MCA sec-04)

TABLE OF CONTENTS

● Literature Review

● The Problem Statement

● Exploring Data

● Statistics

● Proposed System

● Flowchart

● Methodology

● Result

● References

Literature Review

Online transactions are an easy target for fraud. E-commerce and other
online sites have increased the number of online payment methods,
raising the risk of online fraud. As fraud rates rise, machine learning
approaches can be used to identify and evaluate fraud in online
transactions. The primary goal of this project is to apply supervised
machine learning models to historical transaction records in order to
detect fraud. Transactions are grouped by transaction type, several
classifiers are trained independently, and each model is evaluated on
its predictive performance. The classifier with the best evaluation
score can then be selected as the preferred approach for predicting
fraud. We worked with the Synthetic Financial Datasets for Fraud
Detection dataset on Kaggle, collected by Edgar Lopez-Rojas. In this
project, K Nearest Neighbor, Logistic Regression, Support Vector Machine
(SVM), Decision Tree, and Random Forest models are implemented to detect
fraudulent transactions, and a comparative analysis of these algorithms
is performed to identify the best-performing solution.

The Problem Statement
Online payment fraud poses a persistent challenge in e-commerce, causing
financial losses and eroding consumer trust. The evolving tactics of
fraudsters call for advanced detection mechanisms to safeguard online
transactions. Despite existing preventive measures, detecting and
mitigating fraudulent activity remains a significant concern. What is
needed is a robust and adaptive fraud detection system that can identify
suspicious transactions in real time, minimizing risk for merchants and
consumers alike. Addressing this challenge requires a thorough
understanding of fraudulent patterns and behaviors, combined with
sophisticated data analysis techniques and machine learning algorithms.

Exploring Data
The first step of our project was choosing the right dataset. Many
online resources provide financial fraud datasets that contain
transaction information without personal user information. We
considered several, including datasets from DataHack and data.world,
and selected the Synthetic Financial Datasets for Fraud Detection
dataset collected by Edgar Lopez-Rojas. This dataset was generated with
PaySim, which simulates mobile money transactions based on a sample of
real transactions extracted from one month of financial logs of a
mobile money service operating in an African country. The original logs
were provided by a multinational company that runs the mobile financial
service in more than 14 countries around the world. The synthetic
dataset was scaled down to one quarter of the original and was created
specifically for Kaggle, from which we obtained it. It consists of
6,362,620 records covering 5 different types of transactions, with 11
columns. Of these, 6,354,407 (99.87%) are legal transactions and 8,213
(0.13%) are fraudulent, which is expected, since fraud makes up only a
very small fraction of all transactions.
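The class proportions quoted above can be checked directly from the counts. The short sketch below is plain arithmetic on the figures stated in this section (no dataset needed); it also computes the accuracy of a trivial model that labels every transaction as legal, which is useful context when reading accuracy figures on such an imbalanced dataset.

```python
# Counts taken from the dataset description above.
total = 6_362_620
fraud = 8_213
legal = total - fraud  # 6,354,407

fraud_rate = fraud / total
legal_rate = legal / total

# A "model" that predicts "legal" for every transaction is right exactly
# as often as legal transactions occur, so its accuracy equals legal_rate.
majority_baseline_accuracy = legal_rate

print(f"legal: {legal:,} ({legal_rate:.2%})")    # legal: 6,354,407 (99.87%)
print(f"fraud: {fraud:,} ({fraud_rate:.2%})")    # fraud: 8,213 (0.13%)
print(f"all-legal baseline accuracy: {majority_baseline_accuracy:.2%}")  # 99.87%
```

Because this do-nothing baseline already scores about 99.87%, fraud-class metrics such as recall and precision say more about a model's usefulness here than accuracy alone.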

The 11 columns of the dataset and what each column represents:

1. step: a unit of time, where 1 step equals 1 hour

2. type: type of online transaction (CASH_IN, CASH_OUT, DEBIT, PAYMENT or TRANSFER)

3. amount: the amount of the transaction

4. nameOrig: customer who initiated the transaction

5. oldbalanceOrg: originator's balance before the transaction

6. newbalanceOrig: originator's balance after the transaction

7. nameDest: recipient of the transaction

8. oldbalanceDest: recipient's balance before the transaction

9. newbalanceDest: recipient's balance after the transaction

10. isFraud: 1 if the transaction is fraudulent, 0 otherwise

11. isFlaggedFraud: 1 if the transaction was flagged as an illegal attempt to transfer more than 200,000 in a single transaction, 0 otherwise

Statistics

In online payment fraud detection, statistics are instrumental in
analyzing transactional data and identifying fraudulent patterns.
Descriptive statistics such as the mean (μ), median and standard
deviation (σ) summarize transaction attributes like amount and
frequency. For instance, μ and σ help assess whether a transaction
amount deviates significantly from the norm, which may indicate fraud.
The correlation coefficient (ρ) quantifies the relationship between
variables such as transaction amount and time, aiding anomaly
detection. Hypothesis tests such as the t-test or z-test evaluate
whether differences in transaction patterns between normal and
fraudulent activity are statistically significant. Moreover, predictive
models such as logistic regression estimate the probability of fraud
from historical data, improving detection accuracy. Together, these
statistical tools help fraud detection systems combat online payment
fraud effectively.
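As an illustration of the measures named above, the sketch below computes μ, σ, a z-score for a new amount, Pearson's ρ and a two-sample t-test with NumPy and SciPy. The amounts and time steps are small hypothetical values chosen so the results are easy to check; they are not taken from the dataset. The t-test used is Welch's variant (unequal variances), an assumption on our part since the text does not specify one.

```python
import numpy as np
from scipy import stats

# Hypothetical toy amounts; in practice these would come from the
# `amount` column split by the `isFraud` label.
legal_amounts = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
fraud_amounts = np.array([900.0, 1200.0, 1500.0, 1800.0])
steps = np.array([1, 2, 3, 4, 5])  # hypothetical time steps for legal_amounts

# Descriptive statistics of normal behaviour.
mu = legal_amounts.mean()
sigma = legal_amounts.std()  # population standard deviation (ddof=0)


def z_score(amount: float) -> float:
    """Number of standard deviations an amount lies from the legal mean."""
    return (amount - mu) / sigma


# Pearson correlation coefficient between time step and amount.
rho = np.corrcoef(steps, legal_amounts)[0, 1]

# Welch's two-sample t-test: do fraud and legal amounts differ in mean?
t_stat, p_value = stats.ttest_ind(fraud_amounts, legal_amounts, equal_var=False)

print(f"mu={mu:.1f}, sigma={sigma:.1f}, z(1000)={z_score(1000):.2f}")
print(f"rho={rho:.2f}, t={t_stat:.2f}, p={p_value:.4f}")
```

A z-score of about 4.95 for an amount of 1000 would mark it as far outside the legal range in this toy sample; a real system would pick its alert cut-off from validation data.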
Proposed System
The proposed system aims to bolster the security of online
payment systems by employing advanced data analysis techniques
and machine learning algorithms to detect and prevent fraudulent
transactions effectively.

Data Collection:
The system collects comprehensive transactional data from
various sources, including payment gateways, merchants, and
financial institutions. This data encompasses transaction
amounts, timestamps, user demographics, device information,
and transaction histories.
Data Preprocessing:
Upon collection, the raw transactional data undergoes
preprocessing, including data cleaning, normalization, and feature
engineering. Missing values are handled, outliers are identified
and treated, and relevant features are extracted or transformed to
enhance model performance.
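A minimal preprocessing sketch with scikit-learn, assuming the PaySim column names listed earlier: numeric columns are imputed and standardized, and the transaction `type` is one-hot encoded. The rows are hypothetical, the identifier columns `nameOrig` and `nameDest` are left out as an assumption, median imputation is our choice (it is robust to heavy-tailed amounts), and outlier treatment is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical rows using the PaySim column names; one balance is missing.
df = pd.DataFrame({
    "step": [1, 1, 2, 3],
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "TRANSFER"],
    "amount": [9839.64, 181.0, 181.0, 250000.0],
    "oldbalanceOrg": [170136.0, 181.0, np.nan, 300000.0],
    "newbalanceOrig": [160296.36, 0.0, 0.0, 50000.0],
    "oldbalanceDest": [0.0, 0.0, 21182.0, 0.0],
    "newbalanceDest": [0.0, 0.0, 0.0, 250000.0],
})

numeric_cols = ["step", "amount", "oldbalanceOrg", "newbalanceOrig",
                "oldbalanceDest", "newbalanceDest"]
categorical_cols = ["type"]

preprocess = ColumnTransformer([
    # Fill missing numeric values with the column median, then standardize.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # One-hot encode the transaction type; ignore unseen types at predict time.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

X = preprocess.fit_transform(df)
print(X.shape)  # (4, 9): 6 scaled numeric columns + 3 observed transaction types
```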
Feature Selection:
Feature selection techniques, such as correlation analysis and
feature importance ranking, are employed to identify the most
discriminative features for fraud detection. This step helps reduce
dimensionality and improve model efficiency.
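The two feature selection techniques mentioned above can be sketched as follows: absolute correlation of each feature with the label, and a feature importance ranking from a fitted decision tree. The data is a hypothetical stand-in from `make_classification` in which only `f0` and `f1` are informative, and the 0.05 importance threshold is an arbitrary assumption.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data: 5 features, only the first 2 are informative
# (with shuffle=False, make_classification places informative features first).
X, y = make_classification(
    n_samples=2000, n_features=5, n_informative=2, n_redundant=0,
    n_repeated=0, shuffle=False, weights=[0.95, 0.05], random_state=0,
)
features = [f"f{i}" for i in range(5)]
df = pd.DataFrame(X, columns=features)
df["isFraud"] = y

# 1) Correlation analysis: absolute Pearson correlation with the label.
corr = df.corr()["isFraud"].drop("isFraud").abs().sort_values(ascending=False)

# 2) Feature importance ranking from a fitted decision tree.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(df[features], y)
importance = pd.Series(tree.feature_importances_, index=features).sort_values(ascending=False)

# Keep features whose importance exceeds an (arbitrary) threshold.
selected = importance[importance > 0.05].index.tolist()
print(corr.round(3), importance.round(3), selected, sep="\n\n")
```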
Model Building:
The system utilizes machine learning algorithms, including
supervised and unsupervised techniques, to build robust fraud
detection models. Supervised algorithms such as logistic
regression, decision trees, and ensemble methods learn from
labeled data to classify transactions as either legitimate or
fraudulent. Unsupervised algorithms such as clustering and
anomaly detection identify unusual patterns indicative of
fraudulent activities without the need for labeled data.
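To make the supervised/unsupervised distinction concrete, the sketch below trains two supervised classifiers on labeled data and, separately, an Isolation Forest (our choice of unsupervised anomaly detector; the text does not name one) that never sees the labels. The data is hypothetical, and the `contamination=0.02` setting is an assumed prior on the anomaly rate.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical imbalanced stand-in data (about 2% positives = "fraud").
X, y = make_classification(n_samples=3000, n_features=6, n_informative=3,
                           weights=[0.98, 0.02], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Supervised: learn a mapping from features to the fraud label.
supervised = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=42),
}
for name, model in supervised.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", round(model.score(X_test, y_test), 4))

# Unsupervised: IsolationForest is fitted on features only (no y).
iso = IsolationForest(contamination=0.02, random_state=42).fit(X_train)
# predict() returns -1 for anomalies and +1 for inliers; map to 1 = suspected fraud.
flagged = (iso.predict(X_test) == -1).astype(int)
print("isolation forest flagged", flagged.sum(), "of", len(flagged), "test transactions")
```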
Model Training and Evaluation:
The selected models are trained on historical transaction data and
evaluated using appropriate performance metrics such as accuracy,
precision, recall and F1-score. Cross-validation estimates how well the
models generalize to unseen data, while hyperparameter tuning optimizes
their performance.
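One way to combine cross-validation and hyperparameter tuning is scikit-learn's `GridSearchCV` with stratified folds, sketched below on hypothetical data. The parameter grid and the decision to pick the best setting by mean F1 (`refit="f1"`) are assumptions for illustration, not settings from this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Hypothetical imbalanced stand-in data.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                           weights=[0.95, 0.05], random_state=0)

# Stratified folds keep the rare fraud class represented in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, 8, None], "min_samples_leaf": [1, 5, 20]},
    scoring={"accuracy": "accuracy", "precision": "precision",
             "recall": "recall", "f1": "f1"},
    refit="f1",  # choose the hyperparameters with the best mean F1 across folds
    cv=cv,
)
search.fit(X, y)

best = search.best_index_
for metric in ("accuracy", "precision", "recall", "f1"):
    print(metric, round(search.cv_results_[f"mean_test_{metric}"][best], 3))
print("best params:", search.best_params_)
```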
Real-time Monitoring:
The trained models are deployed in a real-time monitoring system
that continuously analyzes incoming transactions for signs of
fraud. Transactions flagged as suspicious trigger immediate alerts
for further investigation by fraud analysts or automated response
mechanisms.
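A minimal sketch of the scoring step in such a monitoring loop: each incoming transaction is scored with the trained model's `predict_proba`, and an alert is raised when the fraud probability crosses a threshold. `Decision`, `score_transaction` and `ALERT_THRESHOLD` are hypothetical names, the 0.5 threshold is an assumption, and the model is trained on stand-in data; a production system would also need queuing, logging and latency handling that are not shown.

```python
from dataclasses import dataclass

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Stand-in model trained offline on hypothetical data.
X, y = make_classification(n_samples=1000, n_features=4, weights=[0.95, 0.05], random_state=1)
model = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X, y)

ALERT_THRESHOLD = 0.5  # hypothetical cut-off; tune against precision/recall targets


@dataclass
class Decision:
    fraud_probability: float
    alert: bool


def score_transaction(features: np.ndarray) -> Decision:
    """Score one incoming transaction and decide whether to raise an alert."""
    # predict_proba expects a 2-D array: one row per transaction.
    proba = float(model.predict_proba(features.reshape(1, -1))[0, 1])
    return Decision(fraud_probability=proba, alert=bool(proba >= ALERT_THRESHOLD))


# Simulated stream of incoming transactions.
for row in X[:5]:
    decision = score_transaction(row)
    if decision.alert:
        print(f"ALERT: p(fraud)={decision.fraud_probability:.2f}, send to analyst queue")
```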
Adaptive Learning:
The system incorporates adaptive learning mechanisms to
continuously update and refine the fraud detection models based
on new data and emerging fraud trends. Feedback loops enable the
system to adapt to evolving fraud tactics and maintain high
detection accuracy over time.
Reporting and Visualization:
Comprehensive reports and visualizations are generated to
provide insights into the effectiveness of the fraud detection
system. Key performance indicators, trends, and patterns are
communicated to stakeholders to support decision-making and
strategic planning.
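For the reporting step, a confusion matrix and a per-class report are common starting points. The sketch below uses scikit-learn's `confusion_matrix` and `classification_report` on ten hypothetical labels and predictions.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical labels and predictions for 10 transactions (1 = fraud).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[true legal, false alarms], [missed fraud, caught fraud]]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)  # [[6 1]
           #  [1 2]]
print(classification_report(y_true, y_pred, target_names=["legal", "fraud"], digits=3))
```

Here 8 of 10 predictions are correct (accuracy 0.8), yet only 2 of the 3 frauds are caught (fraud recall about 0.667), which is the kind of gap a report for stakeholders should make visible.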
Flowchart

The flowchart shown in figure 3 provides a visual representation of the steps involved in
training and evaluating the decision tree model, aiding understanding of the overall process
and facilitating communication between different stakeholders. The steps of the model-training
flowchart in figure 3 are:

1. Start: the flowchart begins with the start symbol, indicating the beginning of the decision tree algorithm.
2. Load Dataset: the algorithm loads the dataset, which contains the input features and the target variable.
3. Define Features and Target: the feature columns and target column are defined, specifying the variables used to train the decision tree.
4. Split Data: the dataset is split into training and testing sets using the train_test_split function, reserving a portion of the data for model evaluation.
5. Data Imputation: a SimpleImputer object handles missing values in the dataset, replacing them with the mean value of the respective feature.
6. Build Decision Tree: a DecisionTreeClassifier object representing the decision tree model is created and trained on the training data using its fit method.
7. Predictions: the trained decision tree makes predictions on the test set using its predict method.
8. End: the flowchart concludes with the end symbol, indicating the completion of the decision tree algorithm.
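The flowchart steps map onto scikit-learn calls roughly as in the sketch below. `train_fraud_tree` is a hypothetical helper, the 80/20 stratified split and the accuracy/recall evaluation are assumptions, and the demo frame is synthetic (its labels come from a made-up rule) because the real CSV path is not given; with the Kaggle file one would load it with `pd.read_csv` instead.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_fraud_tree(df: pd.DataFrame, feature_cols, target_col="isFraud", seed=42):
    # Step 3: define features and target.
    X = df[feature_cols]
    y = df[target_col]
    # Step 4: hold out a stratified test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)

    # Step 5: mean imputation, fitted on the training split only so that
    # no information from the test set leaks into training.
    imputer = SimpleImputer(strategy="mean")
    X_train = imputer.fit_transform(X_train)
    X_test = imputer.transform(X_test)

    # Step 6: build and fit the decision tree.
    model = DecisionTreeClassifier(random_state=seed)
    model.fit(X_train, y_train)

    # Step 7: predict on the held-out test set and evaluate.
    y_pred = model.predict(X_test)
    return model, accuracy_score(y_test, y_pred), recall_score(y_test, y_pred)


# Step 2 (demo): a synthetic frame with PaySim-like columns instead of the real CSV.
rng = np.random.default_rng(0)
n = 1000
demo = pd.DataFrame({"amount": rng.exponential(1000, n),
                     "oldbalanceOrg": rng.exponential(5000, n)})
demo["newbalanceOrig"] = np.maximum(demo["oldbalanceOrg"] - demo["amount"], 0)
demo["isFraud"] = (demo["amount"] > demo["oldbalanceOrg"]).astype(int)  # made-up labelling rule
demo.loc[rng.choice(n, 20, replace=False), "oldbalanceOrg"] = np.nan  # inject missing values

tree, acc, rec = train_fraud_tree(demo, ["amount", "oldbalanceOrg", "newbalanceOrig"])
print(f"accuracy={acc:.4f}, recall={rec:.4f}")
```

Fitting the imputer on the training split only is a deliberate choice: computing the means on the full dataset before splitting would let test-set statistics influence training.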

Methodology

The methodology covers the algorithm used, the dataset used, and the
flowchart of how the data was processed and the model implemented. A
step-by-step explanation of the algorithm follows.

Algorithm Used: The decision tree algorithm is a widely used supervised
learning technique for both classification and regression tasks. It
builds a flowchart-like model in which each internal node tests an
input feature.

Tree Construction: The algorithm starts by treating the entire dataset
as the root node and selects the optimal feature for partitioning the
data.

Feature Split: The chosen feature is used to divide the data into
subsets, creating branches within the decision tree.

Recursive Splitting: Feature splitting is applied repeatedly to each
subset until a predefined stopping criterion is satisfied.

Leaf Node Assignment: Leaf nodes are assigned class labels or regression
values based on the majority class or the mean value of the target
variable within each subset.

Prediction: To make a prediction, the algorithm traverses the decision
tree from the root, evaluating the feature test at each node until it
reaches a leaf node, which gives the final prediction.

Advantages: Decision trees are easy to understand and interpret,
accommodate both numerical and categorical data, can handle missing
values (natively in some implementations, or after imputation as in
this project), and capture nonlinear relationships effectively.

Limitations: Decision trees are prone to overfitting, which calls for
regularization such as limiting tree depth or pruning. They can be
sensitive to small changes in the dataset, which makes them unstable,
and they exhibit a bias towards features with high cardinality (many
distinct levels).

In conclusion, decision trees offer versatility and transparency in
model interpretation. However, care must be taken to address
overfitting and to manage the algorithm's other limitations.
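The construction steps above (choose the best split, split recursively, assign majority-class leaves, traverse to predict) can be sketched from scratch in plain Python. This is a minimal CART-style illustration using Gini impurity and thresholds taken at observed feature values; it is not the scikit-learn implementation used in the project, and the six toy transactions (amount, originator balance after the transaction) are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature index, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(labels)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Split recursively until a node is pure, no split helps, or max_depth is reached."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        # Leaf node: majority class of the samples that reached it.
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Traverse from the root, following feature tests, until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical toy data: [amount, originator balance after], label 1 = fraud.
rows = [[9000, 0], [200, 1500], [8500, 0], [150, 3000], [300, 2500], [9500, 0]]
labels = [1, 0, 1, 0, 0, 1]
tree = build_tree(rows, labels)
print(tree)  # {'feature': 0, 'threshold': 300, 'left': 0, 'right': 1}
print(predict(tree, [7000, 0]), predict(tree, [120, 4000]))  # 1 0
```

The `max_depth` argument is the simplest form of the regularization mentioned under Limitations: capping depth stops the tree from growing branches that only fit noise.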

Result
The goal was to predict whether a transaction is legal or fraudulent,
which is a binary classification problem. We used supervised machine
learning models with the aim of achieving the highest prediction
accuracy. K Nearest Neighbor, Logistic Regression, Support Vector
Machine, Decision Tree and Random Forest models were trained using
k-fold cross-validation with a total of 5 folds. Accuracy kept
increasing up to the 5th fold; beyond 5 folds, accuracy started to
decrease, which we attribute to the dataset not being large enough to
support more folds. The final model was therefore trained with 5 folds
and reached an average accuracy of 88.55%. This suggests that training
Random Forest on a larger dataset with the k-fold technique could yield
a higher average accuracy.
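A comparative 5-fold evaluation of the five models named above can be sketched with `StratifiedKFold` and `cross_validate`. The sample is hypothetical and deliberately small so that SVM and Random Forest finish quickly, and the scaling pipelines and default hyperparameters are assumptions. In k-fold cross-validation each of the k scores comes from a different held-out fold, so the mean and standard deviation across folds are the usual summaries.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Small hypothetical sample so that SVM and Random Forest finish quickly.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                           weights=[0.97, 0.03], random_state=0)

models = {
    # Distance- and margin-based models are scaled first.
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    # One accuracy and one recall score per fold.
    scores = cross_validate(model, X, y, cv=cv, scoring=("accuracy", "recall"))
    acc, rec = scores["test_accuracy"], scores["test_recall"]
    results[name] = (acc.mean(), rec.mean())
    print(f"{name:20s} accuracy={acc.mean():.4f} (+/- {acc.std():.4f})  recall={rec.mean():.4f}")
```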

Overall, the Decision Tree model achieved the highest prediction
accuracy, 99.92%, with a recall of 86.96%.
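For reference, the two metrics reported above are defined from the confusion-matrix counts, with fraud as the positive class (TP and FN are caught and missed frauds; TN and FP are correctly passed and falsely flagged legal transactions):

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Recall} = \frac{TP}{TP + FN}
```

Since legal transactions make up about 99.87% of the dataset, TN dominates the accuracy numerator, so accuracy can remain very high even when a noticeable share of frauds is missed; recall measures the fraud class on its own.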

Because of the large volume of data, the Support Vector Machine and
Random Forest models could not be run to completion, even on Google
Colab. Further work could undersample the data to a 50:50 ratio of
legal to fraudulent transactions, which would greatly reduce the data
size so that SVM and Random Forest results could be obtained. The
results reported here are therefore initial results; final results
could not be compiled due to insufficient computing power.
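The 50:50 undersampling proposed above could look like the pandas sketch below: keep every fraudulent row and draw an equal number of legal rows at random. `undersample_50_50` is a hypothetical helper, and the demo frame is synthetic, with the dataset's 0.13% fraud rate at a smaller scale.

```python
import numpy as np
import pandas as pd


def undersample_50_50(df: pd.DataFrame, target: str = "isFraud", seed: int = 0) -> pd.DataFrame:
    """Randomly drop majority-class rows until both classes have equal counts."""
    fraud = df[df[target] == 1]
    legal = df[df[target] == 0].sample(n=len(fraud), random_state=seed)
    # Concatenate and shuffle so the classes are interleaved.
    return pd.concat([fraud, legal]).sample(frac=1, random_state=seed).reset_index(drop=True)


# Hypothetical frame with the same 0.13% fraud rate as the full dataset, scaled down.
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({"amount": rng.exponential(1000, n), "isFraud": 0})
df.loc[rng.choice(n, 130, replace=False), "isFraud"] = 1

balanced = undersample_50_50(df)
print(len(df), "->", len(balanced), balanced["isFraud"].value_counts().to_dict())
```

On the full dataset this would keep all 8,213 fraudulent rows plus 8,213 sampled legal rows (16,426 in total). The trade-off is that most legal transactions are discarded, so evaluating on a test set that keeps the original class ratio would still be advisable.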

References

1. Design and development of financial fraud detection using machine learning. (2024). International Journal of Emerging Trends in Engineering Research, 8(9), 5838–5843. https://doi.org/10.30534/ijeter/2020/152892020

2. Rucco, M., Giannini, F., Lupinetti, K., & Monti, M. (2019). A methodology for part classification with supervised machine learning. Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 33(1), 100–113. https://doi.org/10.1017/S0890060418000197

3. Saarikoski, J., Joutsijoki, H., Järvelin, K., Laurikkala, J., & Juhola, M. (2015). On the influence of training data quality on text document classification using machine learning methods. International Journal of Knowledge Engineering and Data Mining, 3(2), 143. https://doi.org/10.1504/IJKEDM.2015.071284
