The PySATL change point detection subproject (abbreviated pysatl-cpd) is a module designed for detecting anomalies in time series data, that is, significant deviations from expected patterns or trends. Anomalies can indicate unusual events or changes in a system, which makes detecting them crucial for monitoring and analysis in fields such as finance, healthcare, and network security.
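
To make the idea of a change point concrete, here is a minimal, self-contained sketch that locates a single shift in the mean of a short series. It is an illustration only and is not one of the algorithms implemented in pysatl-cpd; the function names `sum_squared_error` and `locate_mean_shift` and the toy data are assumptions made for this example.

```python
from statistics import fmean


def sum_squared_error(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def locate_mean_shift(series, min_segment=2):
    """Return the split index that best separates the series into two
    segments with different means, or None if no split helps.

    The returned index is the position of the first element of the
    second segment.
    """
    total = sum_squared_error(series)
    best_index, best_gain = None, 0.0
    for k in range(min_segment, len(series) - min_segment + 1):
        # Gain = how much the error drops when each side gets its own mean.
        gain = total - sum_squared_error(series[:k]) - sum_squared_error(series[k:])
        if gain > best_gain:
            best_index, best_gain = k, gain
    return best_index


# Hypothetical toy data: 20 points near 0.0 followed by 20 points near 5.0,
# so the shift is planted at index 20.
before = [0.1 * ((i % 3) - 1) for i in range(20)]
after = [5.0 + 0.1 * ((i % 3) - 1) for i in range(20)]
change_index = locate_mean_shift(before + after)
print(change_index)
```

The algorithms listed in this README address harder settings (several change points, online data, changes in distribution rather than only in the mean), but the goal is the same: report the indices where the behaviour of the series changes.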
The module currently implements the following change point detection (CPD) algorithms:
- Bayesian algorithm (scrubbing, online and linear heuristic online versions)
- Density-based algorithms:
  - KLIEP
  - RuLSIF
- Graph algorithm
- k-NN based algorithm
- Classifier-based algorithms:
  - SVM
  - KNN
  - Decision Tree
  - Logistic Regression
  - Random Forest

Requirements:
- Python 3.10+
- Poetry 2.1.0+

Clone the repository:

    git clone https://github.com/PySATL/pysatl-cpd.git

Go to the repository directory and run:

    cd pysatl-cpd
    poetry install

On Windows, make sure that Python and Poetry are installed:

    py --version
    poetry --version

If Python is not in the PATH, reinstall it and make sure that you check the "Add Python to PATH" box.
If Poetry is not in the PATH, add it by running:

    setx PATH "%PATH%;%APPDATA%\Python\Scripts"

After that, do not forget to restart the terminal.
Go to the repository folder and run:
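
As a cross-platform alternative to the manual checks above, a short Python script can report whether the prerequisites are visible from the current shell. This is only a sketch: the helper name `check_environment` is hypothetical, and it verifies the Python version and the presence of a `poetry` executable on the PATH, not the Poetry version.

```python
import shutil
import sys

# Minimum Python version stated in the requirements.
MIN_PYTHON = (3, 10)


def check_environment():
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {found}")
    # shutil.which searches the directories listed in PATH.
    if shutil.which("poetry") is None:
        problems.append("poetry executable was not found on the PATH")
    return problems


problems = check_environment()
if problems:
    for problem in problems:
        print("Problem:", problem)
else:
    print("Python and Poetry look available.")
```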

    Set-Location pysatl-cpd
    poetry install

Usage example:

    from pathlib import Path

    from pysatl_cpd.labeled_data import LabeledCpdData

    # import change point detection solver
    from pysatl_cpd.online_cpd_solver import OnlineCpdSolver
    from pysatl_cpd.core.problem import CpdProblem

    # import algorithm
    from pysatl_cpd.core.algorithms.bayesian_online_algorithm import BayesianOnline
    from pysatl_cpd.core.algorithms.bayesian.likelihoods.gaussian_conjugate import GaussianConjugate
    from pysatl_cpd.core.algorithms.bayesian.hazards.constant import ConstantHazard
    from pysatl_cpd.core.algorithms.bayesian.detectors.threshold import ThresholdDetector
    from pysatl_cpd.core.algorithms.bayesian.localizers.argmax import ArgmaxLocalizer

    labeled_data = LabeledCpdData.generate_cp_datasets(Path("examples/configs/test_config_exp.yml"))["example"]

    # specify CPD algorithm with parameters
    algorithm = BayesianOnline(
        learning_sample_size=5,
        likelihood=GaussianConjugate(),
        hazard=ConstantHazard(rate=1.0 / (1.0 - 0.5 ** (1.0 / 500))),
        detector=ThresholdDetector(threshold=0.005),
        localizer=ArgmaxLocalizer(),
    )

    # make a solver object
    solver = OnlineCpdSolver(CpdProblem(True), algorithm, labeled_data)

    # then run algorithm
    cpd_results = solver.run()

    # print the results
    print(cpd_results)
    # output:
    # Located change points: (200;400)
    # Expected change point: (200;400)
    # Difference: ()
    # Computation time (sec): 0.2

    # visualize data with located changepoints
    cpd_results.visualize()

Benchmarking pipeline example:

    from pathlib import Path

    from benchmarking.pipeline.pipeline import Pipeline
    from benchmarking.steps.data_generation_step.data_generation_step import DataGenerationStep
    from benchmarking.steps.data_generation_step.data_handlers.generators.cpd_generator import CpdGenerator
    from benchmarking.steps.experiment_execution_step.experiment_execution_step import ExperimentExecutionStep
    from benchmarking.steps.experiment_execution_step.workers.run_complete_algorithm_worker import (
        RunCompleteAlgorithmWorker,
    )
    from benchmarking.steps.report_generation_step.report_builders.change_point_builder import CpBuilder
    from benchmarking.steps.report_generation_step.report_generation_step import ReportGenerationStep
    from benchmarking.steps.report_generation_step.report_visualizers.change_point_text_visualizer import CpTextVisualizer
    from benchmarking.steps.report_generation_step.reporters.reporter import Reporter
    from pysatl_cpd.core.algorithms.bayesian.detectors.threshold import ThresholdDetector
    from pysatl_cpd.core.algorithms.bayesian.hazards.constant import ConstantHazard
    from pysatl_cpd.core.algorithms.bayesian.likelihoods.heuristic_gaussian_vs_exponential import (
        HeuristicGaussianVsExponential,
    )
    from pysatl_cpd.core.algorithms.bayesian.localizers.argmax import ArgmaxLocalizer
    from pysatl_cpd.core.algorithms.bayesian_algorithm import BayesianAlgorithm

    # Generate data with example config and save as my_experiment_dataset
    generator = CpdGenerator(
        name="cpd_generator",
        output_storage_names={"example"},
        config=Path("examples/configs/test_config_exp.yml"),
    )
    step_1 = DataGenerationStep(
        data_handler=generator,
        name="cpd_generation_test_config_exp_step",
        output_storage_names={"example": "my_experiment_dataset"},
    )

    # Initialize BayesianAlgorithm and run with generated data
    algorithm = BayesianAlgorithm(
        learning_steps=5,
        likelihood=HeuristicGaussianVsExponential(),
        hazard=ConstantHazard(rate=1.0 / (1.0 - 0.5 ** (1.0 / 500))),
        detector=ThresholdDetector(threshold=0.005),
        localizer=ArgmaxLocalizer(),
    )
    algo_worker = RunCompleteAlgorithmWorker(algorithm=algorithm, name="run_bayesian_algorithm_worker")
    step_2 = ExperimentExecutionStep(
        worker=algo_worker,
        name="run_bayesian_algorithm_step",
        input_storage_names={"my_experiment_dataset": "dataset"},
    )

    # Generate text report with change points from Result Storage
    builder = CpBuilder()
    visualizer = CpTextVisualizer(file_name="my_experiment_change_points_report")
    reporter = Reporter(builder, visualizer, name="text_reporter")
    step_3 = ReportGenerationStep(reporter, name="ReportGeneration", input_storage_names={"change_points"})

    # configure pipeline and start the experiment
    steps = [step_1, step_2, step_3]
    pipeline = Pipeline(steps)
    pipeline.run()

Output in results/my_experiment_change_points_report.txt:

    Located change points: [25, 201, 396]

If you want to contribute, create a development environment as follows:

    poetry install --with dev

Install pre-commit hooks:

    poetry run pre-commit install

To run the hooks manually:

    poetry run pre-commit run --all-files --color always --verbose --show-diff-on-failure

This project is licensed under the terms of the MIT license. See the LICENSE for more information.
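
A note on the hazard parameter: both usage examples above pass `rate=1.0 / (1.0 - 0.5 ** (1.0 / 500))` to `ConstantHazard`. The sketch below works through that arithmetic under the assumption, which this README does not state, that `rate` is the reciprocal of the per-step change probability. Under that reading, the expression picks the rate for which a run has a 50% chance of lasting 500 steps without a change. The helper name `rate_for_median_run_length` is hypothetical.

```python
def rate_for_median_run_length(median_run_length):
    """Reciprocal of the per-step change probability p chosen so that
    (1 - p) ** median_run_length == 0.5."""
    per_step_probability = 1.0 - 0.5 ** (1.0 / median_run_length)
    return 1.0 / per_step_probability


rate = rate_for_median_run_length(500)

# Assumed interpretation: a change occurs at each step with probability 1 / rate.
per_step_probability = 1.0 / rate

# Probability that no change occurs during 500 consecutive steps.
survival_500 = (1.0 - per_step_probability) ** 500

print(f"rate = {rate:.4f}")
print(f"P(no change in 500 steps) = {survival_500:.4f}")
```

If the library interprets `rate` differently, only the reading of the number changes; the arithmetic of the expression itself is as shown.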
