Fix events and workflow_runs datetimes in source-github (#19299)
* Fix events and workflow_runs datetimes in `source-github`
* add PR number
* whitespace
* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
docs/integrations/sources/github.md
Lines changed: 76 additions & 70 deletions
@@ -1,24 +1,25 @@
 # GitHub
+
 This page contains the setup guide and reference information for the GitHub source connector.

 ## Prerequisites
-* Start date
-* GitHub Repositories
-* Branch (Optional)
-* Page size for large streams (Optional)

-**For Airbyte Cloud:**
+- Start date
+- GitHub Repositories
+- Branch (Optional)
+- Page size for large streams (Optional)

-* Personal Access Token (see [Permissions and scopes](https://docs.airbyte.com/integrations/sources/github#permissions-and-scopes))
-* OAuth
+**For Airbyte Cloud:**

+- Personal Access Token (see [Permissions and scopes](https://docs.airbyte.com/integrations/sources/github#permissions-and-scopes))
+- OAuth

 **For Airbyte Open Source:**

-* Personal Access Token (see [Permissions and scopes](https://docs.airbyte.com/integrations/sources/github#permissions-and-scopes))
-
+- Personal Access Token (see [Permissions and scopes](https://docs.airbyte.com/integrations/sources/github#permissions-and-scopes))

 ## Setup guide
+
 ### Step 1: Set up GitHub

 Create a [GitHub Account](https://github.com).
@@ -28,19 +29,21 @@ Create a [GitHub Account](https://github.com).
 Log into [GitHub](https://github.com) and then generate a [personal access token](https://github.com/settings/tokens). To load balance your API quota consumption across multiple API tokens, input multiple tokens separated with `,`.

 ### Step 2: Set up the GitHub connector in Airbyte
+
 **For Airbyte Cloud:**

 1. [Log into your Airbyte Cloud](https://cloud.airbyte.io/workspaces) account.
 2. In the left navigation bar, click **Sources**. In the top-right corner, click **+ new source**.
 3. On the source setup page, select **GitHub** from the Source type dropdown and enter a name for this connector.
-4. Click `Authenticate your GitHub account` by selecting Oauth or Personal Access Token for Authentication.
+4. Click `Authenticate your GitHub account` by selecting Oauth or Personal Access Token for Authentication.
 5. Log in and Authorize to the GitHub account.
 6. **Start date** - The date from which you'd like to replicate data for streams: `comments`, `commit_comment_reactions`, `commit_comments`, `commits`, `deployments`, `events`, `issue_comment_reactions`, `issue_events`, `issue_milestones`, `issue_reactions`, `issues`, `project_cards`, `project_columns`, `projects`, `pull_request_comment_reactions`, `pull_requests`, `pull_requeststats`, `releases`, `review_comments`, `reviews`, `stargazers`, `workflow_runs`, `workflows`.
 7. **GitHub Repositories** - Space-delimited list of GitHub organizations/repositories, e.g. `airbytehq/airbyte` for single repository, `airbytehq/airbyte airbytehq/another-repo` for multiple repositories. If you want to specify the organization to receive data from all its repositories, then you should specify it according to the following example: `airbytehq/*`.
 8. **Branch (Optional)** - Space-delimited list of GitHub repository branches to pull commits for, e.g. `airbytehq/airbyte/master`. If no branches are specified for a repository, the default branch will be pulled. (e.g. `airbytehq/airbyte/master airbytehq/airbyte/my-branch`).
 9. **Page size for large streams (Optional)** - The GitHub connector contains several streams with a large load. The page size of such streams depends on the size of your repository. Recommended to specify values between 10 and 30.

 **For Airbyte Open Source:**
+
 1. Authenticate with **Personal Access Token**.

 ## Supported sync modes
@@ -59,85 +62,88 @@ The GitHub source connector supports the following [sync modes](https://docs.air

 This connector outputs the following full refresh streams:
 1. Only 4 streams \(`comments`, `commits`, `issues` and `review comments`\) from the above 24 incremental streams are pure incremental meaning that they:
-* read only new records;
-* output only new records.
+
+- read only new records;
+- output only new records.

 2. Streams `workflow_runs` and `worflow_jobs` is almost pure incremental:
-* read new records and some portion of old records (in past 30 days) [docs](https://docs.github.com/en/actions/managing-workflow-runs/re-running-workflows-and-jobs);
-* the `workflow_jobs` depends on the `workflow_runs` to read the data, so they both follow the same logic [docs](https://docs.github.com/pt/rest/actions/workflow-jobs#list-jobs-for-a-workflow-run);
-* output only new records.
+
+- read new records and some portion of old records (in past 30 days) [docs](https://docs.github.com/en/actions/managing-workflow-runs/re-running-workflows-and-jobs);
+- the `workflow_jobs` depends on the `workflow_runs` to read the data, so they both follow the same logic [docs](https://docs.github.com/pt/rest/actions/workflow-jobs#list-jobs-for-a-workflow-run);
+- output only new records.

 3. Other 19 incremental streams are also incremental but with one difference, they:
-* read all records;
-* output only new records.
-Please, consider this behaviour when using those 19 incremental streams because it may affect you API call limits.
+
+- read all records;
+- output only new records.
+Please, consider this behaviour when using those 19 incremental streams because it may affect you API call limits.

 4. We are passing few parameters \(`since`, `sort` and `direction`\) to GitHub in order to filter records and sometimes for large streams specifying very distant `start_date` in the past may result in keep on getting error from GitHub instead of records \(respective `WARN` log message will be outputted\). In this case Specifying more recent `start_date` may help.
-**The "Start date" configuration option does not apply to the streams below, because the GitHub API does not include dates which can be used for filtering:**
-
-* `assignees`
-* `branches`
-* `collaborators`
-* `issue_labels`
-* `organizations`
-* `pull_request_commits`
-* `pull_request_stats`
-* `repositories`
-* `tags`
-* `teams`
-* `users`
+**The "Start date" configuration option does not apply to the streams below, because the GitHub API does not include dates which can be used for filtering:**
+
+- `assignees`
+- `branches`
+- `collaborators`
+- `issue_labels`
+- `organizations`
+- `pull_request_commits`
+- `pull_request_stats`
+- `repositories`
+- `tags`
+- `teams`
+- `users`

 ### Permissions and scopes

 If you use OAuth authentication method, the oauth2.0 application requests the next list of [scopes](https://docs.github.com/en/developers/apps/building-oauth-apps/scopes-for-oauth-apps#available-scopes): **repo**, **read:org**, **read:repo_hook**, **read:user**, **read:discussion**, **workflow**. For [personal access token](https://github.com/settings/tokens) it need to manually select needed scopes.

 Your token should have at least the `repo` scope. Depending on which streams you want to sync, the user generating the token needs more permissions:

-* For syncing Collaborators, the user which generates the personal access token must be a collaborator. To become a collaborator, they must be invited by an owner. If there are no collaborators, no records will be synced. Read more about access permissions [here](https://docs.github.com/en/get-started/learning-about-github/access-permissions-on-github).
-* Syncing [Teams](https://docs.github.com/en/organizations/organizing-members-into-teams/about-teams) is only available to authenticated members of a team's [organization](https://docs.github.com/en/rest/orgs). [Personal user accounts](https://docs.github.com/en/get-started/learning-about-github/types-of-github-accounts) and repositories belonging to them don't have access to Teams features. In this case no records will be synced.
-* To sync the Projects stream, the repository must have the Projects feature enabled.
+- For syncing Collaborators, the user which generates the personal access token must be a collaborator. To become a collaborator, they must be invited by an owner. If there are no collaborators, no records will be synced. Read more about access permissions [here](https://docs.github.com/en/get-started/learning-about-github/access-permissions-on-github).
+- Syncing [Teams](https://docs.github.com/en/organizations/organizing-members-into-teams/about-teams) is only available to authenticated members of a team's [organization](https://docs.github.com/en/rest/orgs). [Personal user accounts](https://docs.github.com/en/get-started/learning-about-github/types-of-github-accounts) and repositories belonging to them don't have access to Teams features. In this case no records will be synced.
+- To sync the Projects stream, the repository must have the Projects feature enabled.

 ### Performance considerations

@@ -147,6 +153,7 @@ The GitHub connector should not run into GitHub API limitations under normal usa
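The input formats described in the setup guide in this diff (a space-delimited repository list, the `airbytehq/*` organization wildcard, and multiple tokens separated with `,`) can be illustrated with a minimal Python sketch. The helper names `parse_repositories` and `split_tokens` are hypothetical and are not taken from the connector's source; this only shows one way such strings could be interpreted, not how `source-github` actually handles them.

```python
from typing import List, Tuple


def parse_repositories(value: str) -> Tuple[List[str], List[str]]:
    """Split a space-delimited setting into (organizations, repositories).

    Hypothetical helper: "airbytehq/*" is read as a whole organization,
    "airbytehq/airbyte" as a single repository.
    """
    organizations: List[str] = []
    repositories: List[str] = []
    # str.split() with no arguments also collapses repeated whitespace.
    for entry in value.split():
        owner, sep, name = entry.partition("/")
        if not sep or not owner or not name:
            raise ValueError(f"expected 'owner/repo' or 'owner/*', got {entry!r}")
        if name == "*":
            organizations.append(owner)
        else:
            repositories.append(entry)
    return organizations, repositories


def split_tokens(value: str) -> List[str]:
    """Split a comma-separated token setting, dropping empty pieces."""
    return [token.strip() for token in value.split(",") if token.strip()]
```

For example, `parse_repositories("airbytehq/* airbytehq/airbyte")` separates the wildcard organization from the single repository, and `split_tokens` tolerates spaces around the commas.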