Detection of Fraudulent Behavior in Water Consumption Using a Data Mining Based Model
By Gedela Pradeep (PG212206010), M.Sc (CS), 4th sem
By N. Naveen Kumar (PG-212202042), MCA, 4th sem
Under the guidance of Sri P. Venkata Rao Sir, Associate Professor
CONTENTS
• 1. Abstract
• 2. Existing system
• 3. Proposed system
• 4. Hardware and software requirements
• 5. Algorithm and examples
• 6. Block diagram
• 7. Conclusion
ABSTRACT
Fraud in drinking water consumption is a major issue for water supply companies and authorities. It accounts for a very high share of non-technical losses and causes large revenue losses. Developing efficient methods for detecting fraudulent activity has become an active area of research in recent years. By using intelligent data mining tools, water supply companies can identify these fraudulent activities and reduce their losses. This study examines two classification techniques, Decision Tree and Bayesian classification, for detecting suspicious water customers. The Decision Tree-based technique uses customer load profile attributes to expose abnormal behaviour known to be linked to non-technical losses.
EXISTING SYSTEM
Water fraud causes significant losses for water supply companies. Water loss falls into two categories. The first, technical loss (TL), covers problems in the production system, water transfer through the network, and network washout. The second, non-technical loss (NTL), is the amount of water delivered to consumers but not billed, which results in lost revenue. This study examines two classification techniques, Decision Tree and Bayesian classification.
PROPOSED SYSTEM
This project focuses on historical customer data. Its main goal is to use the well-known data mining techniques Decision Tree (DT) and Bayesian classifier to build a suitable model for identifying suspected fraudulent customers based on their historical metered water consumption.
The study was carried out following the CRISP-DM methodology.
HARDWARE REQUIREMENTS System : Intel Core 2 Duo. Hard Disk : 500 GB. Monitor : 15’’ LED Input Devices : Keyboard, Mouse Ram : 4GB.
SOFTWARE REQUIREMENTS
Operating System : Windows 7/8/10
Server-side Script : Python
DECISION TREE
A decision tree is a supervised learning method used in data mining for classification and regression. It builds the classification or regression model as a tree structure: the data set is split into progressively smaller subsets while the tree is grown incrementally. The final tree consists of decision nodes and leaf nodes. A decision node has at least two branches, and a leaf node represents a classification or decision; leaf nodes cannot be split further. The topmost decision node, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
ALGORITHM
• Step 1: Begin the tree with the root node, say C, which contains the complete dataset.
• Step 2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
• Step 3: Divide C into subsets, one for each possible value of the best attribute.
• Step 4: Generate the decision tree node that contains the best attribute.
• Step 5: Recursively build new decision trees from the subsets created in Step 3. Continue until the nodes cannot be classified further; each such final node is called a leaf node.
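The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: it assumes information gain as the Attribute Selection Measure (the slides do not say which measure is used), and the attribute names `usage_drop` and `meter_age` and the four toy records are made up for the example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on `attr` (assumed ASM)."""
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """Step 1: `rows`/`labels` are the dataset held by the current node."""
    # Leaf node: one class left, or no attributes left (use majority class).
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Step 2: pick the best attribute with the selection measure.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    # Step 4: create the decision node for that attribute.
    node = {"attr": best, "children": {}}
    remaining = [a for a in attrs if a != best]
    # Steps 3 and 5: split on each value of the attribute and recurse.
    for value in sorted(set(row[best] for row in rows)):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        node["children"][value] = build_tree(
            [r for r, _ in pairs], [l for _, l in pairs], remaining
        )
    return node

def predict(tree, row):
    """Walk from the root to a leaf; unseen attribute values raise KeyError."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attr"]]]
    return tree

# Hypothetical toy records, invented purely for illustration.
rows = [
    {"usage_drop": "yes", "meter_age": "old"},
    {"usage_drop": "yes", "meter_age": "new"},
    {"usage_drop": "no", "meter_age": "old"},
    {"usage_drop": "no", "meter_age": "new"},
]
labels = ["fraud", "fraud", "normal", "normal"]

tree = build_tree(rows, labels, ["usage_drop", "meter_age"])
print(tree)
print(predict(tree, {"usage_drop": "yes", "meter_age": "new"}))
```

On this toy data the root splits on `usage_drop`, because that attribute separates the two classes completely while `meter_age` gives no information gain.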
DECISION TREE EXAMPLE
Consider a factory deciding whether to expand:
• Expanding the factory costs $3 million. The probability of a good economy is 0.6 (60%), which leads to $8 million profit, and the probability of a bad economy is 0.4 (40%), which leads to $6 million profit.
• Not expanding the factory costs $0. The probability of a good economy is 0.6 (60%), which leads to $4 million profit, and the probability of a bad economy is 0.4 (40%), which leads to $2 million profit.
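The slide stops before comparing the two branches. One common way to finish the factory example is to compare expected monetary values, as in the sketch below. It assumes the stated profits are before the expansion cost is deducted, which the slide does not specify; the function name `expected_net_value` is chosen only for this illustration.

```python
def expected_net_value(cost, outcomes):
    """Probability-weighted profit minus the up-front cost (in $ millions)."""
    return sum(p * profit for p, profit in outcomes) - cost

# Figures taken from the factory example; profits assumed gross of cost.
expand = expected_net_value(3.0, [(0.6, 8.0), (0.4, 6.0)])
no_expand = expected_net_value(0.0, [(0.6, 4.0), (0.4, 2.0)])

best = "expand" if expand > no_expand else "do not expand"
print(f"expand: {expand:.1f}, do not expand: {no_expand:.1f}, choose: {best}")
```

Under that assumption expanding is worth about $4.2 million against $3.2 million for not expanding. If the profits were already net of the cost, expanding would still come out ahead ($7.2 million against $3.2 million).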
NAIVE BAYES
• Naive Bayes classifiers are a collection of classification algorithms based on Bayes' theorem. Naive Bayes is not a single algorithm but a family of algorithms that share a common principle: every pair of features being classified is assumed to be independent of each other, given the class.
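Written out, the independence assumption lets the posterior factor into per-feature terms. The symbols below ($y$ for the class, $x_1, \dots, x_n$ for the features) are introduced here for illustration and do not appear on the slides.

```latex
P(y \mid x_1, \dots, x_n)
  = \frac{P(y)\,\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}
  \;\propto\; P(y)\,\prod_{i=1}^{n} P(x_i \mid y)
```

The classifier predicts the class $y$ with the largest value of the right-hand side, since the denominator is the same for every class.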
EXAMPLE
• Naïve Bayes classifier: the classifier can be understood with the help of the following example.
• Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether or not to play on a particular day according to the weather conditions. To solve this problem, we follow these steps:
1. Convert the given dataset into frequency tables.
2. Generate the likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
• Problem: If the weather is sunny, should the player play or not?
Solution: To solve this, first consider the below dataset:
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 0.29
P(Sunny) = 0.35
So P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41
Since P(Yes|Sunny) > P(No|Sunny), the player can play the game on a sunny day.
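The calculation can be checked with exact fractions. The counts in this sketch are inferred from the figures on the slide rather than read from the dataset itself: 3/10 and 2/4 imply 10 "Yes" days and 4 "No" days, and 5 sunny days out of 14 is consistent with P(Sunny) = 0.35.

```python
from fractions import Fraction

# Counts inferred from the slide's figures (assumption, see lead-in).
n_yes, n_no = 10, 4            # days with Play = Yes / Play = No
sunny_yes, sunny_no = 3, 2     # sunny days within each class
total = n_yes + n_no

p_yes = Fraction(n_yes, total)
p_no = Fraction(n_no, total)
p_sunny = Fraction(sunny_yes + sunny_no, total)

# Bayes' theorem: P(class | Sunny) = P(Sunny | class) * P(class) / P(Sunny)
p_yes_given_sunny = Fraction(sunny_yes, n_yes) * p_yes / p_sunny
p_no_given_sunny = Fraction(sunny_no, n_no) * p_no / p_sunny

print(p_yes_given_sunny, p_no_given_sunny)
decision = "play" if p_yes_given_sunny > p_no_given_sunny else "do not play"
print(decision)
```

With exact arithmetic the two posteriors are 3/5 and 2/5, which sum to 1. The 0.41 in the hand calculation comes from rounding the intermediate probabilities to two decimals; the conclusion is unchanged.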
BLOCK DIAGRAM
CONCLUSION
Decision Tree and Bayesian classification models help reduce fraudulent behaviour in water consumption. These data mining models have many advantages over other methods and help increase profits. We used these data mining techniques to reduce fraudulent behaviour.
THANK YOU
