Source file: do not read whole file on check and discover #24278
/test connector=connectors/source-file
Build Passed
Test summary info:
| # this is to ensure we make all conditions under which the bug is reproduced, i.e. | ||
| # - chunk size < file size | ||
| # - column type in the last chunk is not `string` | ||
| @patch("source_file.client.Client.CSV_CHUNK_SIZE", 1) |
dropped this test because it doesn't represent the expected behavior from now on
| with client.reader.open(): | ||
| list(client.streams) | ||
| return AirbyteConnectionStatus(status=Status.SUCCEEDED) | ||
| list(client.streams(empty_schema=True)) |
Read only the file header when running check to ensure the connection succeeds
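As a rough illustration of the header-only idea, the sketch below uses Python's standard `csv` module instead of the connector's own reader. The function `read_header_only` and the sample data are hypothetical names made up for this example and are not part of `source_file`.

```python
import csv
import io
from typing import List, TextIO


def read_header_only(fp: TextIO) -> List[str]:
    """Return the column names from the first CSV row (hypothetical helper)."""
    reader = csv.reader(fp)
    # next() pulls exactly one parsed row; the default [] covers an empty file.
    return next(reader, [])


# Illustrative in-memory file; a real check would open the configured source.
sample = io.StringIO("id,name,price\n1,apple,0.5\n2,pear,0.75\n")
columns = read_header_only(sample)
print(columns)
```

Because only the first row is parsed, the cost of this kind of check does not grow with the number of data rows in the file.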
| if skip_data: | ||
| reader_options["nrows"] = 0 | ||
| reader_options["index_col"] = 0 | ||
| yield from reader(fp, **reader_options) |
Read only self.CSV_CHUNK_SIZE bytes of data to generate the schema. Otherwise a timeout is possible when a large file is read
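A minimal sketch of inferring a schema from a single chunk, written with the standard library rather than the connector's reader. `CHUNK_ROWS`, `infer_type`, and `schema_from_first_chunk` are hypothetical names, the chunk here is counted in rows for simplicity, and the type rules are deliberately crude assumptions.

```python
import csv
import io
from itertools import islice
from typing import Dict, TextIO

CHUNK_ROWS = 2  # hypothetical stand-in for a chunk-size setting


def infer_type(value) -> str:
    """Classify one cell as integer, number, or string (simplified rules)."""
    for cast, name in ((int, "integer"), (float, "number")):
        try:
            cast(value)
            return name
        except (ValueError, TypeError):
            continue
    return "string"


def schema_from_first_chunk(fp: TextIO, chunk_rows: int = CHUNK_ROWS) -> Dict[str, str]:
    """Infer column types from at most chunk_rows data rows."""
    reader = csv.DictReader(fp)
    schema: Dict[str, str] = {}
    # islice stops after chunk_rows rows, so later rows are never parsed.
    for row in islice(reader, chunk_rows):
        for column, value in row.items():
            seen = infer_type(value)
            previous = schema.get(column, seen)
            # Fall back to string when rows within the chunk disagree.
            schema[column] = seen if seen == previous else "string"
    return schema


sample_text = "id,price,label\n1,0.5,a\n2,0.75,b\nnot-a-number,x,c\n"
schema = schema_from_first_chunk(io.StringIO(sample_text))
print(schema)
```

The third sample row lies outside the chunk, so it does not influence the result. That is the trade-off of this approach: values that appear only later in a large file cannot change the inferred types.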
/publish connector=connectors/source-file
if you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/source-file-secure
if you have connectors that successfully published but failed definition generation, follow step 4 here
What
https://github.com/airbytehq/oncall/issues/1681
Fix timing out `check` and `discover` commands
How
Read only the header of a file on `check`.
Read a single chunk of data on `discover`.