Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Community Extracting using Intersection Graph and Content Analysis in Complex Network Toshiya Kuramochi Naoki Okada Kyohei Tanikawa Yoshinori Hijikata Shogo Nishida Graduate School of Engineering Science, Osaka University, Japan The 2012 IEEE/WIC/ACM International Conference on Web Intelligence ENGINEERING SCIENCE

Overview page 2 1. Background and Problems of Community Detection 2. Our Proposed Method 3. Experimentation in Real SNS Networks 4. Results and Discussions 5. Conclusion

Background page 3 Many researchers have studied about complex networks and have found the “community structure” Community structure • connection in groups is densely • connection among groups is sparsely science communities in WWW sets of web pages related to a certain topic business sport Community structure is a key characteristic of complex network

Problem (1) – overlap of communities page 4 Some nodes belong to several communities in real networks communities in WWW community of community of sports pages business pages overlap of communities (e.g., economic effect of the Olympic Games) Most of ordinary clustering methods allocate nodes one cluster They CANNOT represent the overlap of communities Community detection method should be able to allocate nodes several clusters

Problem (2) – edge inhomogeneity page 5 Edges are not homogeneous in real networks edges in SNS network same hobby work place family same university Many community detection methods assume all edges are same They CANNOT represent the edge inhomogeneity Weights of edges should be set individually

Problem (3) – appropriate number of communities page 6 The number of real communities is often unknown How many communities in this network? 4 3 2 Most hierarchical clustering methods require manual input of appropriate number of communities Number of communities should be determined automatically

Purpose of this work page 7 We solve these three problems by proposing a new community detection method • A node may belong to several communities Using the idea of intersection graph [Everett & Borgatti, 1998] • Weights of edges are set individually Content information analysis • Number of communities are automatically determined Clustering based on modularity [Newman, 2003]

Summary of our proposed method page 9 Input • Step 1: Enumeration of dense subgraphs • Step 2: Conversion to the intersection graph • Step 3: Calculation of the weights of edges • Step 4: Clustering based on modularity Output Clusters (communities)

Step 1: Enumeration of dense subgraphs page 10 dense subgraph clique, n-clique, n-clan, etc. complete graph example of clique enumeration (example of dense subgraph) { D, E, I, J } (size = 4) K I H clique threshold: threshold J of size of clique B G threshold enumerate D 3 A E 4 C F 5 { B, C, D, J, K } (size = 5) E, G, H } (size = 3) {

Step 2: Conversion to the intersection graph page 11 Z K I H E, G, J B H G {E} D A B, C, D D, E, E C F J, K I, J X { D, J } Y (common member) overlap threshold: threshold of number of common members

Step 3: Calculation of weights of edges page 12 X B C D Y K J E I

Step 4: Clustering based on modularity page 13 automatically detection of best number of clusters

Summary of our proposed method page 14 K I H J B • Step 1: Enumeration G of cliques D A E C F • Step 2: Conversion to Z X the intersection graph Y • Step 3: Calculation of Z weights of edges X Y • Step 4: Clustering Z cluster 2 based on modularity cluster 1 X Y

Overview page 15 1. Background and Problems of Community Detection 2. Our Proposed Method 3. Experimentation in Real SNS Networks 4. Experimental Results 5. Conclusion

Dataset page 16 mixi: one of the most popular SNS in Japan • test subjects: 20 mixi users • link structure: two radius from each test subject • content information: self-introduction, friend introduction, attributes (gender, address, birthday, etc.) ground truth: relation names between a test subject and person in the dataset which is enumerated by the test subject (e.g., „same university‟, „hobby friend‟, „coworker‟) evaluation: we evaluate communities that include test subject • numerical evaluation • precision • recall • F-measure • visual evaluation

Implementation page 17 parameter setting: (clique threshold, overlap threshold) = (3, 2), (4, 2), (4, 3), (5, 2), (5, 3) and (5, 4) implementation: conventional method WithCA NonCA Everett‟s Everett‟s method method* content friend introduction analysis none analysis clustering simple hierarchical clustering based on modularity method clustering # output equals to automatically determined correct data* clusters NonCA * the number of relation names which is enumerated by test subject

Numerical evaluation (F-measure) page 19 Everett's method Everett's method* NonCA WithCA 0.5 0.4 F-measure 0.3 0.2 0.1 0 (3, 2) (4, 2) (4, 3) (5, 2) (5, 3) (5, 4) (clique threshold, overlap threshold) Everett Everett* NonCA WithCA Everett bad bad bad superiority of our method Everett* good bad bad NonCA good good contribution of clustering WithCA good good based on modularity

Numerical evaluation (precision) page 20 Everett's method Everett's method* NonCA WithCA 0.8 0.7 0.6 precision 0.5 0.4 0.3 0.2 0.1 0 (3, 2) (4, 2) (4, 3) (5, 2) (5, 3) (5, 4) (clique threshold, overlap threshold) Everett Everett* NonCA WithCA Everett bad Everett* bad NonCA bad the precision become WithCA good good good better with content analysis

Numerical evaluation (recall) page 21 Everett's method Everett's method* NonCA WithCA 0.6 0.5 0.4 recall 0.3 0.2 0.1 0 (3, 2) (4, 2) (4, 3) (5, 2) (5, 3) (5, 4) (clique threshold, overlap threshold) Everett Everett* NonCA WithCA our methods overcome Everett bad bad bad the ordinary method Everett* good bad bad the recall become better by NonCA good good good using clustering method WithCA good good bad based on modularity

Visual evaluation (Everett’s method vs. WithCA) page 22 Everett‟s method WithCA

Visual evaluation (NonCA vs. WithCA) page 23 NonCA WithCA

Conclusion page 25 • Features of our proposed method – Our method can allocate nodes several clusters – Our method can represent edge inhomogeneity – Our method can automatically detect the number of clusters • Evaluation on real SNS networks – Our method overcomes conventional method in F- measure – The recall becomes better by using clustering method based on modularity – The precision becomes better with content analysis

Community Extracting Using Intersection Graph and Content Analysis in Complex Network

More Related Content

Similar to Community Extracting Using Intersection Graph and Content Analysis in Complex Network

Community Extracting Using Intersection Graph and Content Analysis in Complex Network