Commit 1243f45

update README
1 parent 56ccbdd commit 1243f45

2 files changed: +17 -3 lines changed

README.md

Lines changed: 17 additions & 3 deletions
@@ -4,15 +4,29 @@ The objective of this pipeline is to enable autotagging of stack overflow questi
 # Sample datasets
 
 <p align="center">
-<img src="/sample_dataset.png" width="900"/>
+<img src="/bigData/sample_dataset.png" width="900"/>
 </p>
 
 # Tag
 <p align="center">
-<img src="/tag.png" width="900"/>
+<img src="/bigData/tag.png" width="900"/>
 </p>
 
 # Pipeline
 <p align="center">
-<img src="/pipeline.png" width="900"/>
+<img src="/bigData/pipeline.png" width="900"/>
 </p>
+
+# Explore the data set
+
+## Feature extraction for each question or answer
+
+Use TF-IDF to form a vector for each question or answer:
+1. TF (term frequency) is how often a word appears in a document
+2. IDF (inverse document frequency) measures whether a word is common or rare across all documents
+
+<p align="center">
+<img src="/bigData/data_exploration.png" width="900"/>
+</p>
+
+
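
The TF-IDF description added to the README can be illustrated with a short Python sketch. This is not the repository's code: the README does not say which TF-IDF variant the pipeline uses, so the sketch assumes the plain textbook form (term count divided by document length, times `log(N / df)`). The helper names `tokenize` and `tf_idf_vectors` and the sample question titles are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_vectors(documents):
    """Return (vocabulary, vectors), with one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    # Assumed IDF variant: log(N / df). A term found in every document gets 0.
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        # TF is the term's share of the document's tokens.
        vectors.append([counts[term] / total * idf[term] for term in vocabulary])
    return vocabulary, vectors


# Hypothetical question titles, invented for illustration only.
questions = [
    "how to merge two lists in python",
    "how to join two tables in sql",
    "python list comprehension syntax",
]
vocabulary, vectors = tf_idf_vectors(questions)
```

Each vector has one entry per vocabulary term, so rare words such as "merge" receive a higher weight than words shared by several questions. Library implementations often apply smoothing or normalization on top of this basic form, so their numbers will generally differ from the ones produced here.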

data_exlporation.png

106 KB
