Commit 451cf7d

Update README.md
Parent commit: 96d7a6c

1 file changed: 12 additions, 11 deletions

README.md

# STACKOVERFLOW ANALYSIS USING STACK EXCHANGE API

This Python-based project uses the Stack Exchange API to analyze Stack Overflow data, focusing on the 'R' and 'Dot Net' programming tags. The analysis is split into a data-extraction script and a visualization script, and it covers five metrics: Is Answered, View Count, Answer Count, Score, and Reputation. An additional analysis highlights differences in how questions under the two tags get answered. Together, these results offer insight into how the two programming communities behave on Stack Overflow.

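The extraction step queries the Stack Exchange API for questions under each tag. The sketch below builds the URL for one page of results. It is a minimal illustration, not the code in soDataExtraction.py, and it assumes API version 2.3, the Stack Overflow tag names `r` and `.net`, and Unix-timestamp bounds for the six-month window. The function name `build_questions_url` and the chosen sort order are hypothetical.

```python
from urllib.parse import urlencode

# Public Stack Exchange API endpoint for listing questions (version 2.3 assumed).
API_URL = "https://api.stackexchange.com/2.3/questions"


def build_questions_url(tag: str, from_date: int, to_date: int, page: int = 1) -> str:
    """Return the URL for one page of Stack Overflow questions with the given tag.

    from_date and to_date are Unix timestamps (seconds) bounding the period of interest.
    """
    params = {
        "site": "stackoverflow",  # query Stack Overflow rather than another Stack Exchange site
        "tagged": tag,            # e.g. "r" or ".net" (assumed tag names)
        "fromdate": from_date,
        "todate": to_date,
        "sort": "creation",
        "order": "desc",
        "pagesize": 100,          # 100 is the largest page size the API accepts
        "page": page,
    }
    return f"{API_URL}?{urlencode(params)}"
```

Each JSON response carries an `items` list and a `has_more` flag, so a loop can keep incrementing `page` until `has_more` is false. The API documentation states that responses are compressed, so the body may need decompressing before it is parsed.
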
## Language(s) used

- Python

## Identifier for a post

- **Question Id (question\_Id) -** Unique ID of the question.

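Because every post is identified by its question ID, results combined from several API pages or requests can be checked for duplicates on that key. This is a minimal sketch, assuming each item is a dict carrying the API's `question_id` field; `dedupe_by_question_id` is a hypothetical helper, not part of the project.

```python
def dedupe_by_question_id(items: list[dict]) -> list[dict]:
    """Keep only the first occurrence of each question, keyed on question_id."""
    seen: set[int] = set()
    unique = []
    for item in items:
        qid = item["question_id"]
        if qid not in seen:  # skip any question already collected
            seen.add(qid)
            unique.append(item)
    return unique
```

Keeping the first occurrence preserves the original ordering of the collected pages.
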
## Files and Folders

- **soDataExtraction.py -** extracts data from Stack Overflow through the Stack Exchange API and writes it to CSV files.
- **soVisualization.py -** generates plots from the CSV files produced by soDataExtraction.py.
- **plots -** contains PNG files of all the generated plots.

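The hand-off between the two scripts is a CSV file. The sketch below flattens API question items into CSV text with one row per question and the five metrics used in this README. It assumes the items have the shape the API returns, with reputation nested under `owner`. The column names and the function `items_to_csv` are illustrative, not taken from the project.

```python
import csv
import io

# Columns written per question; names follow the API fields discussed in this README.
FIELDS = ["question_id", "is_answered", "view_count", "answer_count", "score", "reputation"]


def items_to_csv(items: list[dict]) -> str:
    """Flatten API question items into CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for item in items:
        writer.writerow({
            "question_id": item["question_id"],
            "is_answered": item["is_answered"],
            "view_count": item["view_count"],
            "answer_count": item["answer_count"],
            "score": item["score"],
            # reputation is nested under "owner" and can be absent (e.g. deleted users)
            "reputation": item.get("owner", {}).get("reputation", ""),
        })
    return buffer.getvalue()
```

Writing to a `StringIO` keeps the function easy to test. In practice the same writer would be pointed at a file opened with `newline=""`.
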
## PART 1 - PLOTS COMPARING THE DISTRIBUTION OF THE 5 METRICS BETWEEN POSTS OF THE TWO TAGS

### 1. Is Answered (is\_answered)

It is a boolean value which indicates whether the question has been answered or is still

Based upon the above plot, we can infer that:

- This may suggest that R contributors are more active than Dot Net contributors.
- Another possible explanation is that, because the data covers only six months, Dot Net questions take longer to receive answers than R questions, so more of them were still unanswered when the data was collected.

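The comparison above comes down to the share of questions per tag whose `is_answered` flag is true. This is a minimal sketch, assuming the rows have been read from the CSV as `(tag, is_answered)` pairs. The data layout and the function name `answered_share` are hypothetical.

```python
from collections import defaultdict


def answered_share(rows: list[tuple[str, bool]]) -> dict[str, float]:
    """Return, per tag, the fraction of questions whose is_answered flag is True."""
    totals: dict[str, int] = defaultdict(int)
    answered: dict[str, int] = defaultdict(int)
    for tag, is_answered in rows:
        totals[tag] += 1
        answered[tag] += int(is_answered)  # True counts as 1, False as 0
    return {tag: answered[tag] / totals[tag] for tag in totals}
```

For example, `answered_share([("r", True), ("r", False), (".net", False)])` gives 0.5 for `r` and 0.0 for `.net`.
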
### 2. View Count (view\_count)

It is a numeric value that shows the number of times a post has been viewed.

Based upon the above plot, we can infer that:

- For most questions under both the R and Dot Net tags, the view count is at most 150. This suggests that users of both tags are similarly active when it comes to viewing posts.
- However, the Dot Net tag shows outliers with much higher view counts. This suggests that while the Dot Net community is more widespread, its users are less consistent in using Stack Overflow and instead focus on a few questions of particular interest.

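The "majority at or below 150 views" observation can be checked numerically by computing, per tag, the fraction of view counts at or below the threshold. This is a minimal sketch; the helper `share_at_most` is hypothetical and assumes the view counts for one tag are available as a list of integers.

```python
def share_at_most(values: list[int], threshold: int) -> float:
    """Fraction of values less than or equal to threshold (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(v <= threshold for v in values) / len(values)
```

A result above 0.5 for a tag means the majority of its questions have at most `threshold` views.
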
### 3. Answer Count (answer\_count)

It is a numeric value that shows the number of answers to a question.

Based upon the above plot, we can infer that:

- R contributors are much more active in answering questions: most R questions have at least one answer, whereas most Dot Net questions have an answer count of 0 or 1.
- This supports the hypothesis from metric 1 that R contributors are more active.

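The contrast between "at least one answer" and "0 or 1 answers" can be quantified by bucketing each tag's answer counts. This is a minimal sketch; the helper `answer_count_buckets` and its bucket labels are hypothetical, and it assumes one tag's answer counts are passed as a list of integers.

```python
from collections import Counter


def answer_count_buckets(counts: list[int]) -> dict[str, float]:
    """Share of questions with 0 answers, exactly 1 answer, and at least 1 answer."""
    if not counts:
        raise ValueError("counts must not be empty")
    n = len(counts)
    tally = Counter(counts)
    zero = tally[0] / n  # Counter returns 0 for missing keys
    return {"0": zero, "1": tally[1] / n, ">=1": 1 - zero}
```

Comparing the `">=1"` share and the combined `"0"` + `"1"` share across the two tags reproduces the two halves of the first bullet.
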
### 4. Score (score)

It is a numeric value that shows the difference between the number of up-votes and the number of down-votes on a question.

Based upon the above plot, we can infer that:

- The distribution of scores is almost the same for both tags, with most scores in the range 0-5. This suggests that contributors to both tags up-vote and down-vote posts to a similar extent, which is a good sign because voting helps the community resolve questions quickly.

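The score definition and the "mostly 0-5" observation can be written out directly. The API's `score` field already holds this difference, so `question_score` below only illustrates the definition. `share_in_range` measures how much of a tag's distribution falls in the inclusive range. Both helper names are hypothetical.

```python
def question_score(up_votes: int, down_votes: int) -> int:
    """Score as defined above: number of up-votes minus number of down-votes."""
    return up_votes - down_votes


def share_in_range(scores: list[int], low: int = 0, high: int = 5) -> float:
    """Fraction of scores within the inclusive range [low, high] (0.0 if empty)."""
    if not scores:
        return 0.0
    return sum(low <= s <= high for s in scores) / len(scores)
```

Computing `share_in_range` separately for the R and Dot Net scores gives a single number per tag to compare alongside the plot.
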
### 5. Reputation (reputation)

It is a numeric value which shows the reputation of the owner of the question. It is a

Based upon the above plots, we can infer that:

- Fig 1: the mean reputation of question owners is ~2600 for the Dot Net tag and ~1300 for the R tag. However, Fig 2 shows that owner reputations for the Dot Net tag have more outliers than those for R, so the Dot Net mean is skewed upward by a few very high-reputation owners.
- As seen in metric 2, the Dot Net tag appears to have a small set of highly active, highly engaged users, while the majority of owners have a reputation between 0 and 12.5k, which is similar to the range where most R contributors lie.

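The argument about Fig 1 and Fig 2 is that a few very large reputations pull the mean well above the typical value. The sketch below checks this with the standard library by comparing the mean with the median and counting values outside Tukey's 1.5 * IQR fences. That rule is the default whisker length of matplotlib's `boxplot`, but whether Fig 2 was drawn with it is an assumption. The function `summarize_reputation` is hypothetical and needs at least two values.

```python
import statistics


def summarize_reputation(values: list[float]) -> dict[str, float]:
    """Compare mean and median, and count outliers beyond Tukey's 1.5 * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "upper_fence": upper_fence,
        "n_outliers": len(outliers),
    }
```

With eight modest reputations and one very large one, the median stays with the bulk of the data while the mean jumps far above it. This is the effect described for the Dot Net tag.
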
## PART 2 - ADDITIONAL ANALYSIS

![Additional analysis plot](https://github.com/iamber12/stackOverflowAnalysis/blob/main/plots/additional.png)