ATL: Autonomous Knowledge Transfer from Many Streaming Processes
If you use this code, please cite the paper:

```
@inproceedings{10.1145/3357384.3357948,
  author    = {Pratama, Mahardhika and de Carvalho, Marcus and Xie, Renchunzi and Lughofer, Edwin and Lu, Jie},
  title     = {ATL: Autonomous Knowledge Transfer from Many Streaming Processes},
  year      = {2019},
  isbn      = {9781450369763},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3357384.3357948},
  doi       = {10.1145/3357384.3357948},
  booktitle = {Proceedings of the 28th ACM International Conference on Information and Knowledge Management},
  pages     = {269--278},
  numpages  = {10},
  keywords  = {concept drift, transfer learning, deep learning, multistream learning},
  location  = {Beijing, China},
  series    = {CIKM '19}
}
```

If you want to see the original code used for this paper, access the ATL_Matlab repository.
ATL_Python is a reconstruction of ATL_Matlab by the same author, using Python 3.6 and PyTorch (with autograd enabled and GPU support).
ATL: Autonomous Knowledge Transfer From Many Streaming Processes, ACM CIKM 2019
To run ATL_Python:

- Clone the `ATL_Python` repository to your computer, or just download the files.
- Open an Anaconda prompt and navigate to the ATL folder.
- Run the command `conda env create -f environment.yml`. This will create an environment called `atl` with every Python package/library needed to run ATL.
- Enable the ATL environment by running the command `activate atl` or `conda activate atl`.
- Provide a dataset by replacing the file `data.csv`. The current `data.csv` holds the SEA dataset. `data.csv` must be prepared as follows (see the preparation sketch after this list):
  - Each row represents one data sample.
  - Each column represents one data feature.
  - The last column holds the label for that sample. Don't use one-hot encoding; use integer labels starting from 1.
- Run `python ATL.py`.
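Here is a minimal sketch of preparing a compatible `data.csv`. Only the column layout (features first, integer label starting from 1 in the last column, no header) comes from the instructions above; the toy data from `make_classification` is purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification

# Toy stand-in for your own data: 10,000 samples with 3 features.
X, y = make_classification(n_samples=10000, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=2, random_state=0)

labels = y + 1                       # labels must start from 1, not 0
data = np.column_stack([X, labels])  # the last column is the label

# One row per sample, one column per feature, no header and no index.
np.savetxt("data.csv", data, delimiter=",")
```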
ATL will automatically normalize your data and split it into two streams (a source and a target data stream) with a bias between them, as described in the paper.
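As a rough illustration of that preprocessing, the sketch below performs min-max normalization followed by a biased source/target split. ATL's actual split follows the procedure in the paper; sorting on the first feature is only one simple, assumed way to induce a distribution bias between the two streams.

```python
import numpy as np

data = np.loadtxt("data.csv", delimiter=",")
X, y = data[:, :-1], data[:, -1]

# Min-max normalize every feature into [0, 1].
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)

# Sort by the first feature so the two halves see different input
# distributions, then cut the sorted data into two streams.
# (Illustrative only; the paper's split procedure may differ.)
order = np.argsort(X[:, 0])
X, y = X[order], y[order]
half = len(X) // 2
source_stream = (X[:half], y[:half])
target_stream = (X[half:], y[half:])
```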
ATL status is printed at the end of every minibatch, where you can follow useful information such as the following (a sketch of how these running statistics can be tracked appears after this list):

- Training time (maximum, mean, minimum, current, and accumulated)
- Testing time (maximum, mean, minimum, current, and accumulated)
- Classification rate for the source (maximum, mean, minimum, and current)
- Classification rate for the target (maximum, mean, minimum, and current)
- Classification loss for the source (maximum, mean, minimum, and current)
- Classification loss for the target (maximum, mean, minimum, and current)
- Reconstruction loss for the source (maximum, mean, minimum, and current)
- Reconstruction loss for the target (maximum, mean, minimum, and current)
- Kullback-Leibler loss (maximum, mean, minimum, and current)
- Number of nodes (maximum, mean, minimum, and current)
- A quick overview of the ATL structure (both the discriminative and generative phases), where you can see how many nodes were generated automatically
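A minimal sketch of how such per-minibatch running statistics can be tracked; this `RunningStat` helper is hypothetical and not taken from the ATL code base.

```python
class RunningStat:
    """Tracks the maximum, mean, minimum, and current value of a metric."""
    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.maximum = float("-inf")
        self.minimum = float("inf")
        self.current = None

    def update(self, value):
        self.count += 1
        self.total += value
        self.maximum = max(self.maximum, value)
        self.minimum = min(self.minimum, value)
        self.current = value

    @property
    def mean(self):
        return self.total / self.count

# Example: track the target classification rate across minibatches.
target_cr = RunningStat()
for cr in [0.81, 0.84, 0.79]:
    target_cr.update(cr)
print(f"max {target_cr.maximum:.2f}  mean {target_cr.mean:.2f}  "
      f"min {target_cr.minimum:.2f}  current {target_cr.current:.2f}")
```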
At the end of the process, ATL plots the following graphs:

- The processing time per minibatch and the total processing time, both for training and testing
- The evolution of the number of nodes over time
- The target and source classification rate evolution, as well as the final mean accuracy of the network
- The number of GMMs in the source AGMM and the target AGMM
- Losses for the source and target classification, as well as for the source and target reconstruction
- Bias and variance of the discriminative phase
- Bias and variance of the generative phase

Thank you.
As some datasets are too big to upload to GitHub, which limits the size of individual files, you can find all the datasets in CSV format at the anonymous link below. To test one, copy the desired dataset into the same folder as ATL and rename it to data.csv.
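For that last step, a one-line sketch using Python's standard library; `SEA.csv` is a placeholder name for whichever dataset you downloaded.

```python
import shutil

# Copy the downloaded dataset into the ATL folder under the expected name.
shutil.copy("SEA.csv", "data.csv")
```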