README.md: 12 additions & 11 deletions
--- a/README.md
+++ b/README.md
@@ -1,22 +1,23 @@
-**STACKOVERFLOW ANALYSIS**
+# STACKOVERFLOW ANALYSIS USING STACK EXCHANGE API
+This Python-based project uses the Stack Exchange API to analyze StackOverflow data, focusing on the 'R' and 'Dot Net' programming tags. The analysis, divided into data extraction and visualization scripts, explores five key metrics: Is Answered, View Count, Answer Count, Score, and Reputation. Additional analysis highlights distinct patterns in answer resolution between the two tags. The project provides valuable insights into the dynamics of these programming communities on StackOverflow.
 
-**Language(s) used -**
+## Language(s) used -
 
 - Python
 
-**Identifier for a post -**
+## Identifier for a post -
 
 - **Question Id (question\_Id) -** Id of the question.
 
-**Files and Folders -**
+## Files and Folders -
 
 - **soDataExtraction.py -** used to extract the data from StackOverflow and generate CSV files.
 - **soVisualization.py -** used to generate plots using the CSV files generated.
 - **plots -** contains PNG files of all the plots generated.
 
-**PART 1****- PLOTS TO UNDERSTAND****DIFFERENCES IN DISTRIBUTION OF THE 5 METRICS BETWEEN THE POSTS OF THE TWO TAGS**
+## PART 1 - PLOTS TO UNDERSTAND DIFFERENCES IN DISTRIBUTION OF THE 5 METRICS BETWEEN THE POSTS OF THE TWO TAGS
 
-**1. Is Answered (is\_answered)**
+### 1. Is Answered (is\_answered)
 
 It is a boolean value which indicates whether the question has been answered or is still
 
@@ -34,7 +35,7 @@ Based upon the above plot we can infer that -
 - This could possibly suggest that R contributors are more active than Dot Net contributors.
 - Another interpretation could be that, since this is 6 months of data, contributors take more time to answer questions with the Dot Net tag than questions with the R tag.
 
-**2. View Count (view\_count)**
+### 2. View Count (view\_count)
 
 It is a numeric value which shows the number of views on a post.
 
@@ -49,7 +50,7 @@ Based upon the above plot we can infer that -
 - For the majority of questions for both the R and Dot Net tags, the view count is less than or equal to 150. This implies that contributors related to both tags are equally active when it comes to viewing the posts.
 - However, the outliers in the Dot Net tag suggest that while Dot Net contributors are more widespread, they do not use StackOverflow consistently as a platform and focus only on certain questions of interest to them.
 
-**3. Answer Count (answer\_count)**
+### 3. Answer Count (answer\_count)
 
 It is a numeric value which shows the number of answers on a question.
 
@@ -64,7 +65,7 @@ Based upon the above plot we can infer that -
 - In terms of answering questions, contributors who are familiar with R are much more active: the majority of R questions have 1 or more answers, whereas the majority of Dot Net questions have an answer count of 0 or 1.
 - The above point also supports our hypothesis from metric 1 that contributors who are familiar with R are more active.
 
-**4. Score (score)**
+### 4. Score (score)
 
 It is a numeric value which shows the difference between the number of up-votes and the number of down-votes on a question.
 
@@ -78,7 +79,7 @@ Based upon the above plot we can infer that -
 
 - The distribution of score for both tags is almost the same, with the majority of scores lying in the range 0-5. This implies that contributors for both tags are consistently active when it comes to up-voting/down-voting posts, which is a good sign as this helps the community to solve questions quickly.
 
-**5.****Reputation (reputation)**
+### 5. Reputation (reputation)
 
 It is a numeric value which shows the reputation of the owner of the question. It is a
 
@@ -99,7 +100,7 @@ Based upon the above plots we can infer that -
 - Fig 1 - The mean reputation of the owners of posts with the Dot Net tag is ~2600, whereas for the R tag it is ~1300. However, Fig 2 suggests that the reputation of the owners of Dot Net posts has more outliers than that of R, so the mean is skewed.
 - As seen above in metric 2, Dot Net seems to have only a certain set of users who are highly active and more engaged, while the majority of users lie in the 0-12.5k owner reputation range, which is similar to where most R contributors lie.
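
A rough sketch of how the five metrics discussed in this README could be pulled from the Stack Exchange API and summarized. This is not the code in soDataExtraction.py, which this diff does not show. The helper names `fetch_questions`, `question_to_row`, and `summarize_reputation` are hypothetical, and the request parameters are only one plausible way to query the `/questions` endpoint. The mean/median comparison illustrates the point made for metric 5: a few very high reputations pull the mean up while the median barely moves.

```python
import gzip
import json
import statistics
import urllib.parse
import urllib.request

# Public Stack Exchange API endpoint for questions (version 2.3).
API_URL = "https://api.stackexchange.com/2.3/questions"


def fetch_questions(tag, page=1, pagesize=100):
    """Fetch one page of questions for a tag (network call, not run here).

    Assumption: default gzip compression of the response body.
    """
    params = urllib.parse.urlencode({
        "site": "stackoverflow",
        "tagged": tag,
        "page": page,
        "pagesize": pagesize,
        "order": "desc",
        "sort": "creation",
    })
    with urllib.request.urlopen(f"{API_URL}?{params}") as resp:
        raw = resp.read()
        if resp.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    return json.loads(raw)["items"]


def question_to_row(item):
    """Flatten one API question object into the five metrics plus its id."""
    owner = item.get("owner", {})
    return {
        "question_id": item["question_id"],
        "is_answered": item["is_answered"],
        "view_count": item["view_count"],
        "answer_count": item["answer_count"],
        "score": item["score"],
        # Reputation can be absent, e.g. for deleted accounts.
        "reputation": owner.get("reputation"),
    }


def summarize_reputation(rows):
    """Return mean and median owner reputation, skipping missing values."""
    reps = [r["reputation"] for r in rows if r["reputation"] is not None]
    return {"mean": statistics.mean(reps), "median": statistics.median(reps)}


if __name__ == "__main__":
    # Made-up sample items shaped like API question objects.
    sample = [
        {"question_id": 1, "is_answered": True, "view_count": 40,
         "answer_count": 2, "score": 1, "owner": {"reputation": 100}},
        {"question_id": 2, "is_answered": False, "view_count": 15,
         "answer_count": 0, "score": 0, "owner": {"reputation": 200}},
        {"question_id": 3, "is_answered": True, "view_count": 900,
         "answer_count": 1, "score": 4, "owner": {"reputation": 50000}},
    ]
    rows = [question_to_row(item) for item in sample]
    print(summarize_reputation(rows))
```

Rows in this shape can be written to CSV with `csv.DictWriter` and plotted per tag. Comparing the median alongside the mean is one way to check how much outliers drive the Dot Net reputation figure.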