@@ -4,15 +4,29 @@ The objective of this pipeline is to enable autotagging of stack overflow questi
 # Sample datasets
 
 <p align="center">
-  <img src="/sample_dataset.png" width="900" />
+  <img src="/bigData/sample_dataset.png" width="900" />
 </p>
 
 # Tag
 <p align="center">
-  <img src="/tag.png" width="900" />
+  <img src="/bigData/tag.png" width="900" />
 </p>
 
 # Pipeline
 <p align="center">
-  <img src="/pipeline.png" width="900" />
+  <img src="/bigData/pipeline.png" width="900" />
 </p>
+
+# Explore the data set
+
+## Feature extraction for each question or answer
+
+Use TF-IDF to form a vector for each question or answer:
+1. TF (term frequency) is how often a word appears in a document.
+2. IDF (inverse document frequency) measures how common or rare a word is across the whole collection of documents.
+
+<p align="center">
+  <img src="/bigData/data_exploration.png" width="900" />
+</p>
+
+
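
The TF-IDF weighting described in the added README text can be illustrated with a minimal sketch. The diff does not show which library or formula the pipeline actually uses, so everything here is an assumption: the function name `tf_idf`, the whitespace tokenizer, the sample posts, and the unsmoothed variant `tf = count / document length`, `idf = log(N / df)`. Libraries such as scikit-learn use smoothed variants that give different numbers.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # TF: share of the document taken up by the term.
            # IDF: log of (number of documents / documents containing the term).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up example posts, not taken from the data set in the README.
posts = [
    "how to merge two dataframes in pandas",
    "how to join two tables in sql",
    "pandas groupby example",
]
vectors = tf_idf(posts)
```

With this variant, a term that occurs in every document gets weight 0, and a term confined to a single post (such as `merge` above) outweighs one shared by several posts (such as `how`), which is the common-versus-rare distinction the IDF item describes.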