Anomaly Detection using Deep One-Class Classifier Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018
Previous Approach I
Anomaly Detection and Localization Using GAN and One-Class Classifier
Reference: Satellite Image Forgery Detection and Localization Using GAN and One-Class Classifier, https://arxiv.org/abs/1802.04881
Anomaly Detection
• Detect observations that deviate from normal → one-class classification, or one-class description.
• Here: a generative adversarial network (GAN) or an auto-encoder is used to map normal images to features, and a one-class support vector machine (SVM) determines their distribution. For a query image, we then check whether it lies inside the determined distribution.
Problem formulation
• When an unseen or unfamiliar object that is not in the training images appears, its region is marked with a binary mask as in the figure.
(Figure: trained image and its mask; query image with an unfamiliar object and its mask.)
Method ๐ด ๐‘’ X h ๐ด ๐‘‘ ๐‘‹ X min ๐บ max ๐ท ๐‘‰(๐ท, ๐บ) = ๐ธ ๐‘‹~๐‘ ๐‘‘๐‘Ž๐‘ก๐‘Ž log ๐ท ๐‘‹ + log(1 โˆ’ ๐ท ๐บ ๐‘‹ ) ๐‘‹ = ๐บ ๐‘‹ = ๐ด ๐‘‘ โ„Ž = ๐ด ๐‘‘ ๐ด ๐‘’(๐‘‹) โ€ข Auto-encoder๋ฅผ ์ด์šฉํ•˜์—ฌ image๋กœ๋ถ€ํ„ฐ feature(h) ๊ตฌํ•˜๊ณ  ์ด๋ฅผ ๋‹ค์‹œ ๋ณต์›. ๋ณต์›๋œ image์™€ ์› image๋ฅผ ์ด์šฉํ•˜์—ฌ GAN์„ ํ›ˆ๋ จ ๏ƒจ Auto- encoder ๋ณด๋‹ค ์•ฝ๊ฐ„์˜ ์„ฑ๋Šฅํ–ฅ์ƒ โ€ข ์ •์ƒ image์— ๋Œ€ํ•œ latent space์˜ distribution์„ ์ฐพ์•„ ๋ƒ„.
Method One class Classifier
Method
- Features from normal patches (i.e., red dots) cluster together, whereas features from abnormal patches (i.e., blue dots) are more distant.
- A query image is fed into the encoder of the trained auto-encoder to compute its latent vector.
- We then decide whether the computed latent vector falls inside the cluster of normal images → here a one-class SVM with radial basis functions (Gaussian kernel, a parametric model of the cluster) is used, as sketched below.
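A minimal sketch of this decision step with scikit-learn's OneClassSVM (RBF kernel); the latent vectors here are random stand-ins for the encoder outputs, and the hyper-parameter values are illustrative, not taken from the paper:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
h_train = rng.normal(size=(500, 2048))   # stand-in for latent vectors of normal patches
h_query = rng.normal(size=(10, 2048))    # stand-in for latent vectors of query patches

# RBF (Gaussian) kernel one-class SVM fitted on normal features only
oc_svm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
oc_svm.fit(h_train)

# +1 -> inside the learned normal distribution, -1 -> anomaly (e.g. a forged patch)
print(oc_svm.predict(h_query))
```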
Non-linear SVM classifier using the RBF (radial-basis function) kernel
We solve the problem of classifying nonlinearly separable patterns in a hybrid manner involving two stages:
• First: transform a given set of nonlinearly separable patterns into a new set for which, under certain conditions, the likelihood of the transformed patterns becoming linearly separable is high.
• Second: complete the solution of the classification problem using stochastic gradient descent.
Linear SVM (Support Vector Machines)
To define an optimal hyperplane we need to maximize the width of the margin, i.e. minimize ‖w‖. We find w and b by solving the corresponding objective function using quadratic programming. (Figure: separating hyperplane, margin, and support vectors.)
Non-Linear SVM (Support Vector Machines): kernel trick
• The simplest way to separate two groups of data is with a straight line (1 dimension), a flat plane (2 dimensions) or an N-dimensional hyperplane.
• However, there are situations where a nonlinear region can separate the groups more efficiently.
• The kernel function transforms the data into a higher-dimensional feature space to make it possible to perform the linear separation.
Non-Linear SVM (Support Vector Machines)
To simplify the classification task, we map from the input space to a feature space; a non-linear SVM classifier using the RBF (radial-basis function) kernel is adopted. The kernel is an inner product in feature space (a measure of similarity).
Key Idea of Kernel Methods
$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$
Key Idea of Kernel Methods
Normal condition (cluster bound): $\exp\!\left\{-\frac{(x_1-c_1)^2 + (x_2-c_2)^2}{2\sigma^2}\right\} \geq \text{Threshold}$, with $0 < \text{Threshold} \ll 1$,
which is equivalent to $(x_1-c_1)^2 + (x_2-c_2)^2 \leq r^2$, i.e. $K_1 + K_2 \leq r^2$.
(Figure: circle of radius r centered at $(c_1, c_2)$ in the $(x_1, x_2)$ plane.)
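A small numeric check of this equivalence between thresholding the Gaussian kernel and bounding the squared distance (center, width and threshold values are illustrative):

```python
import numpy as np

c = np.array([1.0, 2.0])       # cluster center (c1, c2), illustrative
sigma, threshold = 1.5, 0.1    # kernel width and threshold in (0, 1)

def inside_cluster(x):
    # Gaussian-kernel similarity to the center
    k = np.exp(-np.sum((x - c) ** 2) / (2 * sigma ** 2))
    return k >= threshold      # equivalent to ||x - c||^2 <= r^2

# the implied radius: r^2 = -2 * sigma^2 * ln(threshold)
r2 = -2 * sigma ** 2 * np.log(threshold)
x = np.array([1.5, 2.5])
print(inside_cluster(x), np.sum((x - c) ** 2) <= r2)   # the two tests agree
```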
RBFN architecture
(Figure: input layer $x_1 \dots x_n$, hidden layer of RBFs with no input weights, output layer with weights $w_1 \dots w_M$ producing f(x).)
Each of the n components of the input vector x feeds forward to M basis functions whose outputs are linearly combined with weights w into the network output f(x). The output layer performs a simple weighted sum. If the RBFN is used for regression, this output is fine; however, if pattern classification is required, a hard-limiter or sigmoid function can be placed on the output neurons to give 0/1 output values.
Input data set: $X = \{x_1, x_2, \dots, x_N\}$
RBFN architecture
• For Gaussian basis functions:
$s(x_p) = w_0 + \sum_{i=1}^{M} w_i \Phi_i(x_p) = w_0 + \sum_{i=1}^{M} w_i \exp\left\{ -\sum_{j=1}^{n} \frac{(x_{pj} - c_{ij})^2}{2\sigma_{ij}^2} \right\}$
• Assume the variance $\sigma$ across each dimension is equal:
$s(x_p) = w_0 + \sum_{i=1}^{M} w_i \exp\left\{ -\frac{1}{2\sigma_i^2} \sum_{j=1}^{n} (x_{pj} - c_{ij})^2 \right\}$
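A minimal numpy sketch of this forward pass, assuming the centers $c_i$, per-unit widths $\sigma_i$ and weights w are already known (all values below are illustrative):

```python
import numpy as np

def rbfn_forward(x, centers, sigmas, weights):
    """Evaluate s(x) = w0 + sum_i w_i * exp(-||x - c_i||^2 / (2 sigma_i^2))."""
    d2 = np.sum((centers - x) ** 2, axis=1)          # squared distances to each center
    phi = np.exp(-d2 / (2.0 * sigmas ** 2))          # Gaussian basis function outputs
    return weights[0] + phi @ weights[1:]            # bias + weighted sum

# illustrative parameters: M = 3 RBF units in a 2-D input space
centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
sigmas  = np.array([0.5, 0.7, 0.6])
weights = np.array([0.1, 1.0, -0.5, 0.8])            # [w0, w1, w2, w3]
print(rbfn_forward(np.array([0.9, 1.1]), centers, sigmas, weights))
```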
RBFN for classification
(Figure: two weighted-sum output nodes, one for Category 1 and one for Category 2.)
RBFN Learning
• Design decision
  • number of hidden neurons
    • max of neurons = number of input patterns
    • more neurons ⇒ more complex, smaller tolerance
• Parameters to be learnt
  • centers
  • radii
    • A hidden neuron is more sensitive to data points near its center. This sensitivity may be tuned by adjusting the radius.
    • smaller radius ⇒ fits training data better (overfitting)
    • larger radius ⇒ less sensitivity, less overfitting, smaller network, faster execution
  • weights between hidden and output layers
RBFN Learning
The question now is: how do we train the RBF network? In other words, how do we find:
• the number and the parameters of the hidden units (the basis functions), using unlabeled data (unsupervised learning) → K-means clustering algorithm;
• the weights between the hidden layer and the output layer → recursive least-squares estimation algorithm.
RBFN Learning
(Figure: pipeline $x_p$ → K-means → centers $c_i$ → K-nearest neighbor → widths $\sigma_i$ → basis functions (design matrix A) → linear regression → weights w.)
RBFN Learning
• Use the K-means algorithm to find the centers $c_i$.
K-means Algorithm
Step 1: K initial cluster centers are chosen randomly from the samples to form K groups.
Step 2: Each sample is assigned to the group whose mean is closest to that sample.
Step 3: Adjust the mean of each group to take account of its new points.
Step 4: Repeat steps 2 and 3 until the distance between the old means and the new means of all clusters is smaller than a predefined tolerance.
Outcome: there are K clusters whose means represent the centroid of each cluster.
Advantages: (1) a fast and simple algorithm; (2) it reduces the effect of noisy samples. A minimal sketch follows below.
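A compact numpy sketch of the steps above (random initialization from the samples; the tolerance and toy data are illustrative):

```python
import numpy as np

def kmeans(X, K, tol=1e-4, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: choose K initial cluster centers randomly from the samples
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each sample to the group whose mean is closest
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 3: adjust the mean of each group to account for its points
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        # Step 4: stop when the means move less than the predefined tolerance
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, labels

# usage on toy 2-D data
X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 5.0])
centers, labels = kmeans(X, K=2)
```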
• Use the K-nearest-neighbour rule to find the function width $\sigma$:
$\sigma_i = \sqrt{\frac{1}{K} \sum_{k=1}^{K} \| c_k - c_i \|^2}$, where $c_k$ is the k-th nearest neighbour of $c_i$.
• The objective is to cover the training points so that a smooth fit of the training samples can be achieved.
• RBF learning by gradient descent
Let $\Phi_i(x_p) = \exp\left( -\sum_{j=1}^{n} \frac{(x_{pj} - c_{ij})^2}{2\sigma_{ij}^2} \right)$ and $e(x_p) = d(x_p) - s(x_p)$, and define $E = \frac{1}{2} \sum_{p=1}^{N} e(x_p)^2$ (N: number of samples in the batch).
We then apply $\frac{\partial E}{\partial w_{i}}$, $\frac{\partial E}{\partial \sigma_{ij}}$ and $\frac{\partial E}{\partial c_{ij}}$.
• RBF learning by gradient descent: we have the corresponding update equations (given in the figure).
Gaussian Mixture Models and Expectation-Maximization Algorithm
Normal Distribution (1D Gaussian)
$f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$, with mean $\mu$ and standard deviation $\sigma$.
2D Gaussians
$f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} \sqrt{\det(\Sigma)}} \exp\left( -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right)$
• d = 2
• x = random data point (2D vector)
• $\mu$ = mean value (2D vector)
• $\Sigma$ = covariance matrix (2x2 matrix)
• The same equation holds for a 3D Gaussian.
2D Gaussians
(Figure: example 2D Gaussian densities for different means $\mu$ and covariance matrices $\Sigma$.)
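A short sketch evaluating this density with scipy and checking it against the formula directly (the mean and covariance are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])                       # illustrative 2-D mean
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])                  # illustrative covariance matrix

rv = multivariate_normal(mean=mu, cov=Sigma)
x = np.array([0.5, 1.2])
print(rv.pdf(x))                                # density f(x | mu, Sigma)

# the same value computed directly from the formula
d = len(mu)
diff = x - mu
pdf = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / \
      np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
print(pdf)
```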
Exploring Covariance Matrix
$x_i = (w_i, h_i)$: random vector, $\Sigma = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^T = \begin{pmatrix} \sigma_w^2 & \mathrm{cov}(w,h) \\ \mathrm{cov}(h,w) & \sigma_h^2 \end{pmatrix}$
• $\Sigma$ is symmetric
• $\Sigma$ has an eigendecomposition (SVD): $\Sigma = V \, D \, V^T$, with $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_d$
Covariance Matrix Geometry
$\Sigma = V D V^T$, with principal axes $a = \sqrt{\lambda_1}\, v_1$ and $b = \sqrt{\lambda_2}\, v_2$. (Figure: ellipse with semi-axes a and b.)
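A small numpy sketch of this geometry on synthetic 2-D data: the sample covariance is decomposed and the ellipse semi-axes a and b are read off the eigenvalues and eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3.0, 1.2], [1.2, 1.0]], size=1000)

mu = X.mean(axis=0)
Sigma = (X - mu).T @ (X - mu) / len(X)          # sample covariance matrix

# eigendecomposition: Sigma = V diag(lam) V^T, lam sorted in decreasing order
lam, V = np.linalg.eigh(Sigma)
order = lam.argsort()[::-1]
lam, V = lam[order], V[:, order]

a = np.sqrt(lam[0]) * V[:, 0]                   # major semi-axis of the ellipse
b = np.sqrt(lam[1]) * V[:, 1]                   # minor semi-axis
print(lam, a, b)
```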
3D Gaussians
$x_i = (r_i, g_i, b_i)$: random vector, $\Sigma = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)(x_i - \mu)^T = \begin{pmatrix} \sigma_r^2 & \mathrm{cov}(r,g) & \mathrm{cov}(r,b) \\ \mathrm{cov}(g,r) & \sigma_g^2 & \mathrm{cov}(g,b) \\ \mathrm{cov}(b,r) & \mathrm{cov}(b,g) & \sigma_b^2 \end{pmatrix}$
GMMs – Gaussian Mixture Models
• Suppose we have 1000 data points in 2D space (w, h). (Figure: scatter plot over W and H.)
GMMs – Gaussian Mixture Models
• Assume each data point is normally distributed.
• Obviously, there are 5 sets of underlying Gaussians. (Figure: the same scatter plot with five visible clusters.)
The GMM assumption
• There are K components (Gaussians).
• Each component k is specified by three parameters: weight, mean, covariance matrix.
• The total density function is:
$f_\Theta(x) = \sum_{j=1}^{K} \alpha_j \frac{1}{(2\pi)^{d/2}\sqrt{\det(\Sigma_j)}} \exp\left( -\frac{1}{2}(x-\mu_j)^T \Sigma_j^{-1} (x-\mu_j) \right)$
$\Theta = \{ \alpha_j, \mu_j, \Sigma_j \}_{j=1}^{K}$, with weights $\alpha_j \geq 0 \;\; \forall j$ and $\sum_{j=1}^{K} \alpha_j = 1$.
The EM algorithm (Dempster, Laird and Rubin, 1977)
(Figure: raw data, fitted GMM with K = 6 components with means $\mu_i$ and covariances $\Sigma_i$, and the total density function.)
EM Basics
• Objective: given N data points, find the maximum-likelihood estimate of $\Theta$: $\Theta = \arg\max_\Theta f(x_1, \dots, x_N \mid \Theta)$
• Algorithm:
  1. Guess an initial $\Theta$.
  2. Perform the E step (expectation): based on $\Theta$, associate each data point with a specific Gaussian.
  3. Perform the M step (maximization): based on the data-point clustering, maximize over $\Theta$.
  4. Repeat 2-3 until convergence (~tens of iterations).
EM Details
• E-Step (estimate the probability that point t is associated with Gaussian j):
$w_{t,j} = \frac{\alpha_j f(x_t \mid \mu_j, \Sigma_j)}{\sum_{i=1}^{K} \alpha_i f(x_t \mid \mu_i, \Sigma_i)}, \quad j = 1,\dots,K, \;\; t = 1,\dots,N$
• M-Step (estimate the new parameters):
$\alpha_j^{new} = \frac{1}{N}\sum_{t=1}^{N} w_{t,j}, \qquad \mu_j^{new} = \frac{\sum_{t=1}^{N} w_{t,j}\, x_t}{\sum_{t=1}^{N} w_{t,j}}, \qquad \Sigma_j^{new} = \frac{\sum_{t=1}^{N} w_{t,j}\, (x_t - \mu_j^{new})(x_t - \mu_j^{new})^T}{\sum_{t=1}^{N} w_{t,j}}$
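A compact numpy/scipy sketch of these E and M steps (the initialization, the choice of K and the small regularization added to the covariances are illustrative choices, not from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    alpha = np.full(K, 1.0 / K)                       # mixture weights
    mu = X[rng.choice(N, K, replace=False)]           # initial means taken from the data
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: responsibilities w[t, j]
        dens = np.column_stack([alpha[j] * multivariate_normal(mu[j], Sigma[j]).pdf(X)
                                for j in range(K)])
        w = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and covariances
        Nj = w.sum(axis=0)
        alpha = Nj / N
        mu = (w.T @ X) / Nj[:, None]
        for j in range(K):
            diff = X - mu[j]
            Sigma[j] = (w[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)
    return alpha, mu, Sigma
```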
EM Example
(Figures: successive EM iterations on the example data; for Gaussian j and data point t, the blue shading shows the responsibility $w_{t,j}$.)
RBF networks vs. MLP
• Learning speed: RBF networks very fast / MLP very slow
• Convergence: RBF networks almost guaranteed / MLP not guaranteed
• Response time: RBF networks slow / MLP fast
• Memory requirement: RBF networks very large / MLP small
• Hardware implementation: RBF networks IBM ZISC036 (www-5.ibm.com/fr/cdlab/zisc.html), Nestor Ni1000 / MLP Voice Direct 364 (www.sensoryinc.com)
• Generalization: RBF networks usually better / MLP usually poorer
• Hyper-parameters: RBF networks ? / MLP initial values are given!
Simulation
• The color image under analysis is split into patches (either overlapping or not) of size 64x64 pixels.
• An adversarially trained auto-encoder encodes each patch into a low-dimensional representation called the feature vector h (a 2,048-dimensional vector).
• A one-class SVM fed with h is used to detect forged patches as anomalies with respect to the feature distribution learned from normal patches.
• Once all patches are classified, a label mask for the entire image is obtained by grouping together all the patch labels (see the sketch below).
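A high-level sketch of this patch-based pipeline, assuming an `encoder` callable (the trained auto-encoder's encoder) and a fitted `oc_svm` such as the one above; only the 64x64 patch size comes from the slide, the rest is illustrative:

```python
import numpy as np

PATCH = 64                                   # patch size from the paper (64x64 pixels)

def image_to_patches(img, stride=PATCH):
    """Split an HxWx3 image into 64x64 patches and remember their positions."""
    patches, coords = [], []
    for y in range(0, img.shape[0] - PATCH + 1, stride):
        for x in range(0, img.shape[1] - PATCH + 1, stride):
            patches.append(img[y:y + PATCH, x:x + PATCH])
            coords.append((y, x))
    return np.stack(patches), coords

def label_mask(img, encoder, oc_svm):
    """Classify every patch and assemble the patch labels into a full-image mask."""
    patches, coords = image_to_patches(img)
    h = encoder(patches)                     # feature vectors (e.g. 2,048-dim per patch)
    labels = oc_svm.predict(h)               # +1 normal, -1 anomalous (forged)
    mask = np.zeros(img.shape[:2], dtype=np.uint8)
    for (y, x), lab in zip(coords, labels):
        if lab == -1:
            mask[y:y + PATCH, x:x + PATCH] = 1
    return mask
```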
Simulation: performance evaluated by the size of the object to be detected
• Small: the object is smaller than the patch size (approximately 32 pixels per side).
• Medium: the object is comparable to the patch size (approximately 64 pixels per side).
• Large: the object is larger than the patch size (approximately 128 pixels per side).
Simulation
(Figure: Query Images I and II with unfamiliar objects and their ground-truth masks I and II.)
Previous Approach II
Unsupervised Anomaly Detection with GANs to Guide Marker Discovery, https://arxiv.org/abs/1703.05921
TensorFlow implementation by Doyup Lee (POSTECH): https://github.com/LeeDoYup/AnoGAN
This work uses a GAN model trained only on normal data, as in the figure below, to decide for query data not only whether it is normal, but also, when it is abnormal, to localize the abnormal region.
1. ์ •์ƒ data๋ฅผ ์ด์šฉํ•˜์—ฌ Generator & Discriminator์˜ ํ›ˆ๋ จ - Deep convolutional generative adversarial network์„ ์ด์šฉํ•˜์—ฌ latent space(z)๋กœ ๋ถ€ํ„ฐ Generator๋ฅผ ์ด์šฉํ•˜์—ฌ ์ƒ์„ฑ๋œ image์™€ Real image๋ฅผ ๊ตฌ๋ณ„ํ•˜๋„๋ก Discriminator๋ฅผ ํ›ˆ๋ จ ๏ƒจ ์ •์ƒ data์˜ latent space(z) ๋ถ„ํฌ๋ฅผ ํ•™์Šต 2. ๋น„์ •์ƒ data์—ฌ๋ถ€์™€ ๋น„์ •์ƒ ์˜์—ญ ํŒŒ์•… - ํ›ˆ๋ จ๋œ Generator & Discriminator์˜ parameter๋ฅผ ๊ณ ์ •ํ•œ ์ฑ„ Query image์— ๋Œ€ํ•œ latent space(z)๋กœ์˜ mapping ์ž‘์—…์„ ์ˆ˜ํ–‰ ํ›ˆ๋ จ๋œ ์ •์ƒ data์˜ ๊ฒฝ์šฐ, ๊ธฐํ•™์Šต๋œ ์ •์ƒ data์˜ latent space(z) ๋กœ mapping์ด ๋˜์ง€๋งŒ, ๋น„์ •์ƒ data์˜ ๊ฒฝ์šฐ ๋ฒ—์–ด๋‚จ ๏ƒจ cost function์˜ ์˜ค์ฐจ๊ฐ€ ๋ฐœ์ƒ Anomaly Detection์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด 2๋‹จ๊ณ„๋กœ ์ด๋ฃจ์–ด์ง
1. Modeling normal data with a GAN: the generative model (distribution) of normal data is learned with a GAN.
Normal data $I_m$, with m = 1, 2, ..., M, where $I_m \in \mathbb{R}^{a \times b}$. At randomly chosen positions, K 2-D image patches of size c x c are extracted: $x = x_{k,m}$ with k = 1, 2, ..., K.
D and G are simultaneously optimized through the following two-player minimax game with value function V(G, D).
The discriminator is trained to maximize the probability of assigning the "real" label to real training examples and the "fake" label to samples from $p_g$.
2. Query data์˜ latent space Mapping Query image x๊ฐ€ ์ฃผ์–ด์งˆ ๊ฒฝ์šฐ, ์ด์™€ ๊ฐ€์žฅ ์œ ์‚ฌํ•œ ๊ฐ€์ƒ image์ธ G(z) ์— ํ•ด๋‹นํ•˜๋Š” latent space์ƒ์˜ ์  z์„ ์ฐพ๋Š”๋‹ค. x ์™€ G(z)์˜ ์œ ์‚ฌ์—ฌ๋ถ€๋Š” query image๊ฐ€ generator์˜ ํ›ˆ๋ จ์‹œ ์‚ฌ์šฉ๋œ ์ •์ƒ data์˜ ๋ถ„ํฌ ๐‘ ๐‘”๋ฅผ ์–ด๋А ์ •๋„ ๋”ฐ๋ฅด๋А๋ƒ์— ์˜ํ•ด ๊ฒฐ์ • z์„ ์ฐพ๊ธฐ ์œ„ํ•˜์—ฌ , latent space distribution Z์—์„œ ๋žœ๋คํ•˜๊ฒŒ ์ƒ˜ํ”Œ๋œ z1 ์„ ๊ธฐํ›ˆ๋ จ๋œ generator์— ์ž…๋ ฅํ•˜์—ฌ ์–ป์€ ์ถœ๋ ฅ G(z1)์™€ x์˜ ์ฐจ(loss ftโ€™n)๋ฅผ ์ตœ ์†Œํ™”ํ•˜๋„๋ก backpropagation์„ ํ†ตํ•˜์—ฌ latent space์˜ ์ z2๋กœ update
z ์ •์ƒ image์˜ Latent space(z)๊ฐ€ 1์ฐจ์›์ด๋ผ๊ณ  ๊ฐ€์ •ํ•˜๊ณ  Z์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ถ„ํฌ๋กœ ๊ฐ€์ •ํ•˜๋ฉด ๐œ‡ ๐‘ง z๐œ‡ ๐‘ง Query image์— ๋Œ€ํ•œ latent space(z) mapping์€ i) ์ž„์˜์˜ ๊ฐ’ ๐‘ง1์—์„œ ์‹œ์ž‘ํ•˜์—ฌ loss ftโ€™n์„ ์ตœ์†Œํ™”ํ•˜๋„๋ก update ii) ์ฃผ์–ด์ง„ ฮ“๋ฒˆ์งธ iteration ํ›„ ๐‘งฮ“์ด allowable range์•ˆ์— ๋“ค์–ด์™”๋Š”์ง€ ์—ฌ๋ถ€์— ๋•Œ๋ผ ์ •์ƒ, ๋น„์ •์ƒ์„ ๊ตฌ๋ถ„ ๐‘ง1 ๐‘ง2 ๐‘งฮ“ Allowable range
Definition of the loss function for mapping the query image
• Overall loss, or anomaly score.
• The anomaly score consists of two parts:
  • Residual loss: visual similarity.
  • Discrimination loss: enforces the generated image to lie on the manifold.
Improved discrimination loss based on feature matching
• f(·): the output of an intermediate layer of the discriminator.
• It represents some statistics of an input image.
This approach uses the trained discriminator not as a classifier but as a feature extractor.
3. Anomaly Detection
Anomaly score A(x): how well the query image x conforms to the normal images.
R(x): residual loss after Γ backpropagation steps.
D(x): discrimination loss after Γ backpropagation steps.
Abnormal image: A(x) is large. Normal image: A(x) is small.
$x_R = x - G(z_\Gamma)$. The residual error highlights the abnormal regions within the image.
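A minimal PyTorch sketch of this mapping and scoring, assuming a trained generator `G` and a discriminator feature extractor `f` (an intermediate layer output); the weighting A(x) = (1-λ)·R(x) + λ·D(x) follows the AnoGAN paper, and the step count and λ match the values quoted in the experiments below (500 steps, λ = 0.1):

```python
import torch

def map_to_latent(x, G, f, z_dim=100, n_steps=500, lam=0.1, lr=0.05):
    """Find z that makes G(z) most similar to query x; return anomaly score and residual map."""
    z = torch.randn(1, z_dim, requires_grad=True)            # start from a random z1
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(n_steps):                                  # Gamma backpropagation steps
        opt.zero_grad()
        x_hat = G(z)
        residual = torch.sum(torch.abs(x - x_hat))            # R(x): visual similarity
        discrimination = torch.sum(torch.abs(f(x) - f(x_hat)))  # D(x): feature matching
        loss = (1 - lam) * residual + lam * discrimination
        loss.backward()
        opt.step()
    with torch.no_grad():
        x_hat = G(z)
        score = (1 - lam) * torch.sum(torch.abs(x - x_hat)) + \
                lam * torch.sum(torch.abs(f(x) - f(x_hat)))   # A(x)
        residual_map = torch.abs(x - x_hat)                   # x_R highlights abnormal regions
    return score.item(), residual_map
```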
4. Experiments
The experiments use optical coherence tomography (OCT) images, which image the retinal layers in 3D.
• Data, Data Selection and Preprocessing
i) Training sets:
- 2D image patches extracted from 270 clinical OCT volumes of healthy subjects.
- The gray values were normalized to the range -1 to 1.
- In total, 1,000,000 2D training patches with a resolution of 64x64 pixels were extracted at randomly sampled positions.
ii) Testing sets:
- Patches were extracted from 10 additional healthy cases and 10 pathological cases, which contained retinal fluid.
- The test set consisted of 8,192 image patches in total and comprised normal and pathological samples.
iii) Model description
- Adopted the DCGAN architecture, which resulted in stable GAN training on images of size 64x64 pixels.
- Utilized intermediate representations with 512-256-128-64 channels (instead of 1024-512-256-128).
- Discrimination loss: feature representations of the last convolution layer of the discriminator were used.
- Training was performed for 20 epochs using the Adam optimizer.
- Ran 500 backpropagation steps for the mapping of new images to the latent space.
- Used λ = 0.1 in the loss function.
5. Experiments
i) Generative capability of the DCGAN
(Figure: given image, generated image, residual overlay, and pixel-level annotations of retinal fluid for normal and anomalous images.)
ii) Detection performance
(Figures: ROC curves; distribution of the residual score (c) and of the discrimination score (d).)
In latent space, the distributions of normal data (training data and the normal portion of the test data) are similar, but they clearly differ from the abnormal portion of the test data.
Problems in Previous Approach
- Can't control the shape and boundary of the cluster.
- Can't control the ambiguous points at the boundary.
→ Let's find a way to control the shape of the cluster and the ambiguous points at the boundary.
Support Vector Data Description (SVDD)
SVDD is the smallest enclosing ball problem, and its variants are:
• the minimum enclosing ball problem with errors;
• the minimum enclosing ball problem in an RKHS (Reproducing Kernel Hilbert Space);
• the two-class support vector data description (SVDD).
SOLUTIONS FOR SOLVING DATA DESCRIPTION
• One class is the target class, and all other data is outlier data.
• Create a spherically shaped boundary around the complete target set.
• To minimize the chance of accepting outliers, the volume of this description is minimized.
• Outlier sensitivity can be controlled by changing the ball-shaped boundary into a more flexible boundary.
• Example outliers can be included in the training procedure to find a more efficient description.
1. The minimum enclosing ball problem [Tax and Duin, 2004]
(Figure: enclosing ball with its center and radius R.)
2. The minimum enclosing ball problem with errors
2. The minimum enclosing ball problem with errors
NORMAL DATA DESCRIPTION
- We assume vectors x are column vectors.
- We have a training set {x_i}, i = 1, ..., N, for which we want to obtain a description.
- We further assume that the data shows variance in all feature directions.
• The sphere is characterized by center a and radius R > 0.
• We minimize the volume of the sphere by minimizing R², and demand that the sphere contains all training objects x_i.
• To allow the possibility of outliers in the training set, the squared distance from x_i to the center a is not required to be strictly smaller than R², but larger distances should be penalized.
- Minimization problem: $F(R, a) = R^2 + C\sum_i \xi_i$ with constraints $\|x_i - a\|^2 \leq R^2 + \xi_i$, $\xi_i \geq 0$.
2. The minimum enclosing ball problem with errors
NORMAL DATA DESCRIPTION
Lagrange function:
$L(R, a, \alpha_i, \gamma_i, \xi_i) = R^2 + C\sum_i \xi_i - \sum_i \alpha_i \left\{ R^2 + \xi_i - (\|x_i\|^2 - 2\, a \cdot x_i + \|a\|^2) \right\} - \sum_i \gamma_i \xi_i$
L should be minimized with respect to R, a, $\xi_i$ and maximized with respect to $\alpha_i$ and $\gamma_i$, subject to $0 \leq \alpha_i \leq C$.
2. The minimum enclosing ball problem with errors
NORMAL DATA DESCRIPTION
There are 3 cases for the support vectors.
The hypersphere's center can be determined as $a = \sum_i \alpha_i x_i$.
The hypersphere's radius can be determined by selecting an arbitrary support vector $x_s$ on the boundary:
$R^2 = \|x_s - a\|^2 = x_s \cdot x_s - 2\sum_i \alpha_i (x_i \cdot x_s) + \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j)$
2. The minimum enclosing ball problem with errors
TESTING A NEW DATA POINT $x_k$
To test whether a new data point $x_k$ is within the sphere, its distance to the center of the sphere has to be calculated. A test point $x_k$ is normal when this distance is smaller than the radius: $\|x_k - a\|^2 \leq R^2$.
2. The minimum enclosing ball problem with errors Please refer to Python Code for SVDD : https://wikidocs.net/3431
2. The minimum enclosing ball problem with errors
SVDD with negative examples
- When negative examples (objects which should be rejected) are available, they can be incorporated in the training to improve the description.
- In contrast with the training (target) examples, which should be within the sphere, the negative examples should be outside it.
→ Minimization problem and constraints: given in the figure.
3. The minimum enclosing ball problem in a RKHS
• Minimum enclosing ball problem with errors.
• The inner product can be substituted by a general kernel function such as the Gaussian kernel: $K(x_i, x_j) = \exp\left(-\|x_i - x_j\|^2 / s^2\right)$.
Test of a new point $x_k$:
$\|x_k - a\|^2 = K(x_k, x_k) - 2\sum_i \alpha_i K(x_i, x_k) + \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) \leq R^2$, subject to $0 \leq \alpha_i \leq C$.
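A small numpy sketch of this kernel-distance test, assuming the support vectors `X_sv`, their coefficients `alpha`, the radius `R` and the kernel width `s` have already been obtained from the SVDD optimization:

```python
import numpy as np

def gaussian_kernel(a, b, s=2.0):
    return np.exp(-np.sum((a - b) ** 2) / s ** 2)

def svdd_distance2(x_new, X_sv, alpha, s=2.0):
    """Squared kernel distance of x_new to the SVDD center a = sum_i alpha_i Phi(x_i)."""
    k_xx = gaussian_kernel(x_new, x_new, s)                       # = 1 for the RBF kernel
    k_xi = np.array([gaussian_kernel(x_i, x_new, s) for x_i in X_sv])
    k_ij = np.array([[gaussian_kernel(xi, xj, s) for xj in X_sv] for xi in X_sv])
    return k_xx - 2 * alpha @ k_xi + alpha @ k_ij @ alpha

def is_normal(x_new, X_sv, alpha, R, s=2.0):
    return svdd_distance2(x_new, X_sv, alpha, s) <= R ** 2        # inside the ball -> normal
```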
3. The minimum enclosing ball problem in a RKHS
- A test object is accepted when its kernel distance to the center is at most R (the inequality above).
- For small values of s, all objects become support vectors; for very large s, the solution approximates the original spherically shaped solution.
- Decreasing the parameter C constrains the values of $\alpha_i$ more, and more objects become support vectors.
- Also, with decreasing C the error on the target class increases, but the covered volume of the data description decreases.
4. The two class Support vector data description (SVDD)
The two class SVDD vs. one class SVDD
Deep Support Vector Data Description (Deep SVDD)
Deep SVDD learns a neural network transformation φ(· ; W) with weights W from the input space X ⊆ R^d to an output space F ⊆ R^p that attempts to map most of the data's network representations into a hypersphere of minimum volume characterized by center c and radius R. Mappings of normal examples fall within the hypersphere, whereas mappings of anomalies fall outside it.
Deep Support Vector Data Description (Deep SVDD)
Given some training data on X, we define the soft-boundary Deep SVDD objective as follows:
- The first term: minimizing R² minimizes the volume of the hypersphere.
- The second term is a penalty for points lying outside the sphere after being passed through the network, i.e. whose distance to the center is greater than the radius R.
- The last term is a regularizer on the network parameters W.
Deep Support Vector Data Description (Deep SVDD)
To achieve this, the network must extract the common factors of variation of the data. As a result, normal examples are closely mapped to the center c, whereas anomalous examples are mapped further away from the center or outside of the hypersphere. Through this we obtain a compact description of the normal class.
(Figure: normal data mapped inside the hypersphere, anomalous data mapped outside it.)
One-Class Deep SVDD objective
One-Class Deep SVDD simply employs a quadratic loss for penalizing the distance of every network representation to c; it contracts the sphere by minimizing the mean distance of all data representations to the center. A minimal sketch of both variants follows below.
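A minimal PyTorch sketch of the two Deep SVDD losses and the anomaly score; the network `phi` and center `c` are assumed given, ν is the soft-boundary trade-off parameter from the paper, and the weight-decay regularizer on W is left to the optimizer:

```python
import torch

def one_class_deep_svdd_loss(phi, x, c):
    """Quadratic loss: mean squared distance of the representations phi(x) to the center c."""
    z = phi(x)                                            # network representations, shape (B, p)
    return torch.mean(torch.sum((z - c) ** 2, dim=1))

def soft_boundary_loss(phi, x, c, R, nu=0.1):
    """Soft-boundary variant: R^2 plus a penalty for points falling outside the sphere."""
    dist2 = torch.sum((phi(x) - c) ** 2, dim=1)
    return R ** 2 + torch.mean(torch.clamp(dist2 - R ** 2, min=0)) / nu

def anomaly_score(phi, x, c):
    """Score s(x): squared distance of the representation to the hypersphere center."""
    with torch.no_grad():
        return torch.sum((phi(x) - c) ** 2, dim=1)
```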
Anomaly Score
For a given test point x ∈ X, an anomaly score s can be defined for both variants of Deep SVDD by the distance of the point's representation to the center of the hypersphere.
(Figure: anomaly-score distributions for normal and anomalous data under the conventional approach vs. Deep SVDD.)
One-class classification on MNIST and CIFAR-10: network architectures
Each convolutional module consists of a convolutional layer followed by leaky ReLU activations and 2x2 max-pooling.
On MNIST: a CNN with two modules, 8 (5x5x1) filters followed by 4 (5x5x1) filters, and a final dense layer of 32 units.
On CIFAR-10: a CNN with three modules, 32 (5x5x3) filters, 64 (5x5x3) filters, and 128 (5x5x3) filters, followed by a final dense layer of 128 units.
A batch size of 200 is used, and the weight decay hyper-parameter is set to λ = 10^-6.
One-class classification on MNIST and CIFAR-10: data setup
Both MNIST and CIFAR-10 have ten different classes, from which we create ten one-class classification setups. In each setup, one of the classes is the normal class and samples from the remaining classes are used to represent anomalies. We train only with training-set examples from the respective normal class, giving training-set sizes of n ≈ 6,000 for MNIST and n = 5,000 for CIFAR-10. Both test sets have 10,000 samples, including samples from the nine anomalous classes for each setup. All images are pre-processed with global contrast normalization using the L1 norm and finally rescaled to [0, 1] via min-max scaling.
One-class classification on MNIST and CIFAR-10 Average AUCs in % with StdDevs (over 10 seeds) per method and one-class experiment on MNIST and CIFAR-10
Anomaly Detection using One-Class Neural Networks arXiv:1802.06360v1 Code : https://github.com/raghavchalapathy/oc-nn
We want to make a neural network like this!
Model architecture of Auto-encoder and the proposed one-class neural networks
One-Class Support Vector Machine
The objective is to find a hyperplane (with normal w) and its distance r from the origin such that the decision function is positive on subset A and negative on everything outside A, while maximizing the distance from the hyperplane to the origin.
(Figure: hypersphere, hyperplane, subset A, distance r, negative region.)
One-Class Support Vector Machine
In order to obtain w and r, we need to solve the corresponding optimization problem, where w is the normal vector perpendicular to the hyperplane and r is the distance of the hyperplane from the origin.
(Figure annotation: distance of the feature vector from the origin.)
One-Class NN
Using a simple feed-forward network with one hidden layer having a linear or sigmoid activation g(·) and one output node, the OC-NN objective can be formulated analogously, where w are the weights from the hidden layer to the output node, V is the weight matrix from the input to the hidden units, and X_n is an input vector. A sketch of this objective follows below.
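A short numpy sketch of this objective as stated in the OC-NN paper (arXiv:1802.06360); the hinge form, the ν parameter and the sigmoid default are assumptions taken from that paper, and the decision rule is sign(⟨w, g(Vx)⟩ − r):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ocnn_objective(w, V, r, X, nu=0.1, g=sigmoid):
    """0.5*||w||^2 + 0.5*||V||_F^2 + (1/nu) * mean(max(0, r - y_hat)) - r,
    with y_hat_n = <w, g(V x_n)> for each input x_n."""
    y_hat = g(X @ V.T) @ w                 # network output for every input row of X
    hinge = np.maximum(0.0, r - y_hat)     # penalize outputs that fall below the margin r
    return 0.5 * w @ w + 0.5 * np.sum(V * V) + hinge.mean() / nu - r

def ocnn_decision(w, V, r, X, g=sigmoid):
    return np.sign(g(X @ V.T) @ w - r)     # +1 -> normal, -1 -> anomalous
```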
Discriminative Feature Learning
A Discriminative Feature Learning
For generic object, scene, or action recognition, the deeply learned features need to be not only separable but also discriminative.
A Discriminative Feature Learning
• Previously, only the softmax loss has been considered in classification problems.
  ✓ SOFTMAX LOSS: encourages the separability of features.
• The discriminative feature learning approach considers a center loss as well.
  ✓ CENTER LOSS: simultaneously learns a center for the deep features of each class and penalizes the distances between the deep features and their corresponding class centers.
→ JOINT SUPERVISION: minimizes the intra-class variations while keeping the features of different classes separable (a sketch follows below).
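A minimal PyTorch sketch of this joint supervision, assuming deep features `feat`, classifier `logits`, integer `labels` and a learnable matrix of class centers; the sizes and the λ value are illustrative:

```python
import torch
import torch.nn.functional as F

num_classes, feat_dim, lam = 10, 2, 0.1                 # illustrative sizes and lambda
centers = torch.zeros(num_classes, feat_dim, requires_grad=True)  # learnable class centers
# (optimize `centers` jointly with the network parameters, e.g. by adding it to the optimizer)

def joint_loss(feat, logits, labels):
    softmax_loss = F.cross_entropy(logits, labels)       # encourages separability
    center_loss = 0.5 * ((feat - centers[labels]) ** 2).sum(dim=1).mean()  # intra-class compactness
    return softmax_loss + lam * center_loss
```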
A Discriminative Feature Learning
Detailed Discussion on Center Loss
• Easy to implement. The gradient and update equation are easy to derive, and the resulting CNN model is trainable.
• Easy to train. Centers are updated based on mini-batches with an adjustable learning rate.
• Easy to input. Center loss enjoys the same requirement as the softmax loss and needs no complex sample mining and recombination, which is unavoidable with the contrastive loss and triplet loss.
• Easy to converge. Converges faster than with the softmax loss alone.
A Discriminative Feature Learning
• With only the softmax loss (λ = 0), the deeply learned features are separable, but not discriminative (significant intra-class variations).
• With a proper λ, the discriminative power of the deep features can be significantly enhanced, which is crucial for classification problems.
