✨ Source S3: Add handling NoSuchBucket error #31383
Conversation
| Step | Result |
|---|---|
| Build source-s3 docker image for platform(s) linux/x86_64 | ❌ |
| Code format checks | ✅ |
| Validate metadata for source-s3 | ✅ |
| Connector version semver check | ✅ |
| Connector version increment check | ✅ |
| QA checks | ✅ |
☁️ View runs for commit in Dagger Cloud
Please note that tests are only run on PRs that are ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream services on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool:

airbyte-ci connectors --name=source-s3 test

The review comment below is anchored on this excerpt:

```python
def _check_list_files(self, stream: "AbstractFileBasedStream") -> List[RemoteFile]:
    try:
        files = stream.list_files()
    except CustomFileBasedSourceException as custom_exc:
```
@tolik0 thanks for this PR, I agree that we want to be able to customize the exceptions.
Looking at this code: the error that's relevant to the user is still that there was a problem listing files, so I think we want to keep the CheckAvailabilityError, but make sure we can customize the message, e.g. for things like a missing bucket.
Since we're already raising CheckAvailabilityError ... from exc, we should get the missing-bucket error as part of the stack trace when we raise the CheckAvailabilityError, right?
If that's obscuring the error too much, instead of raising CustomFileBasedSourceException, WDYT about giving it a "message" attribute, so that here we'd raise CheckAvailabilityError(custom_exc.message, stream=stream.name) from exc?
@clnoll Thanks for the feedback!
I've removed CustomFileBasedSourceException. Do you think this is better? Should I split this PR into two: one for the file-based CDK and another for the S3 changes?
(Force-pushed from 9f586f9 to db3ff06)
clnoll left a comment
Thanks @tolik0! I think I actually prefer keeping the CustomFileBasedSourceException. I left an example of what I have in mind in a comment.
I think you'll have to split this into two PRs, since the new CDK version won't be available yet when the source-s3 connector is published.
```python
try:
    files = stream.list_files()
except Exception as exc:
    raise CheckAvailabilityError(FileBasedSourceError.ERROR_LISTING_FILES, stream=stream.name) from exc
```
I think what I'd do here is continue to raise the CustomException from your previous version, but catch it and re-raise as a CheckAvailabilityError, like this:

```python
try:
    ...
except CustomFileBasedException as exc:
    raise CheckAvailabilityError(exc.error_message, stream=stream.name) from exc
except Exception as exc:
    ...
```

This way we can avoid having to do the hasattr check. Plus, I think we'll probably want to use your CustomFileBasedException in other places too as the file-based CDK matures.
I've updated the code to use CustomFileBasedException and re-raise as CheckAvailabilityError, removing the hasattr check. Thanks for the suggestion.
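The resulting pattern can be sketched as follows; the class names follow the discussion above, but the bodies are stubbed stand-ins for illustration, not the real airbyte-cdk implementations:

```python
from typing import List


class CustomFileBasedException(Exception):
    """Stand-in for the CDK exception that carries a user-facing message."""

    def __init__(self, error_message: str):
        self.error_message = error_message
        super().__init__(error_message)


class CheckAvailabilityError(Exception):
    """Stand-in for the CDK's availability-check error."""

    def __init__(self, message: str, stream: str):
        super().__init__(f"{message} (stream={stream})")


def check_list_files(stream) -> List[str]:
    # Custom CDK exceptions surface their own message; anything else falls
    # back to a generic listing error. Both preserve the original cause.
    try:
        return stream.list_files()
    except CustomFileBasedException as exc:
        raise CheckAvailabilityError(exc.error_message, stream=stream.name) from exc
    except Exception as exc:
        raise CheckAvailabilityError("Error listing files", stream=stream.name) from exc
```

Because the custom exception is caught explicitly, no hasattr check is needed at the call site.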
```
        yield remote_file
else:
    for remote_file in self._page(s3, globs, self.config.bucket, None, seen, logger):
for current_prefix in (prefixes if prefixes else [None]):
```
Why is this change needed?
I simplified the code to avoid duplication.
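The simplification being discussed can be sketched like this, with hypothetical names (`list_keys`, `pages_for_prefix`): iterating over `prefixes if prefixes else [None]` lets a single loop cover both the prefixed and the unprefixed listing paths:

```python
from typing import Callable, Iterator, List, Optional


def list_keys(
    pages_for_prefix: Callable[[Optional[str]], List[str]],
    prefixes: List[str],
) -> Iterator[str]:
    # One loop handles both cases: each configured prefix, or a single
    # pass with prefix=None when no prefixes are configured.
    for current_prefix in (prefixes if prefixes else [None]):
        yield from pages_for_prefix(current_prefix)
```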
```
    ) from exc
    self.raise_error_listing_files(FileBasedSourceError.ERROR_LISTING_FILES, globs, exc)

def raise_error_listing_files(self, error_message: Union[str, FileBasedSourceError], globs: List[str], exc: Optional[Exception] = None):
```
Why do we need a helper method for this?
I used the helper method to reduce repetitive error-raising code.
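A minimal sketch of such a helper; the enum and exception classes below are stand-ins so the example is self-contained, and only the helper's signature matches the diff above:

```python
from enum import Enum
from typing import List, Optional, Union


class FileBasedSourceError(Enum):
    ERROR_LISTING_FILES = "Error listing files"


class ErrorListingFiles(Exception):
    """Stand-in for the CDK's listing-error exception."""


def raise_error_listing_files(
    error_message: Union[str, FileBasedSourceError],
    globs: List[str],
    exc: Optional[Exception] = None,
) -> None:
    # Centralizes the repeated raise so each call site stays one line.
    message = error_message.value if isinstance(error_message, FileBasedSourceError) else error_message
    raise ErrorListingFiles(f"{message}; globs={globs}") from exc
```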
| Step | Result |
|---|---|
| Build source-s3 docker image for platform(s) linux/x86_64 | ✅ |
| Unit tests | ✅ |
| Acceptance tests | ❌ |
| Code format checks | ✅ |
| Validate metadata for source-s3 | ✅ |
| Connector version semver check | ✅ |
| Connector version increment check | ✅ |
| QA checks | ✅ |
(Force-pushed from c60529b to 3c18702)
| Step | Result |
|---|---|
| Build source-s3 docker image for platform(s) linux/x86_64 | ❌ |
| Code format checks | ❌ |
| Validate metadata for source-s3 | ✅ |
| Connector version semver check | ✅ |
| Connector version increment check | ✅ |
| QA checks | ✅ |
| Step | Result |
|---|---|
| Build source-s3 docker image for platform(s) linux/x86_64 | ❌ |
| Code format checks | ✅ |
| Validate metadata for source-s3 | ✅ |
| Connector version semver check | ✅ |
| Connector version increment check | ✅ |
| QA checks | ✅ |
clnoll left a comment
Thanks @tolik0, looks great!
(Force-pushed from 30e354a to 3edc8af)
| Step | Result |
|---|---|
| Build source-s3 docker image for platform(s) linux/amd64 | ❌ |
| Code format checks | ❌ |
| Validate metadata for source-s3 | ✅ |
| Connector version semver check | ✅ |
| Connector version increment check | ✅ |
| QA checks | ✅ |
| Step | Result |
|---|---|
| Build source-s3 docker image for platform(s) linux/amd64 | ❌ |
| Code format checks | ✅ |
| Validate metadata for source-s3 | ✅ |
| Connector version semver check | ✅ |
| Connector version increment check | ✅ |
| QA checks | ✅ |
(Force-pushed from 0550b77 to 2c9c7b6)
| Step | Result |
|---|---|
| Build source-s3 docker image for platform(s) linux/amd64 | ✅ |
| Unit tests | ✅ |
| Acceptance tests | ✅ |
| Check our base image is used | ✅ |
| Code format checks | ✅ |
| Validate metadata for source-s3 | ✅ |
| Connector version semver check | ✅ |
| Connector version increment check | ✅ |
| QA checks | ✅ |
What
Raise a config error when the bucket name is wrong.
Closes:
#25835
How
Added a new error type to the file-based CDK that is not swallowed by the CDK's generic exception handling, so the connector can raise an AirbyteTracedException config error directly.
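The approach can be sketched as follows. `NoSuchBucket` is a real botocore client error for S3, but `ConfigError` stands in for the CDK's AirbyteTracedException with a config-error failure type, and `list_objects` is illustrative rather than the connector's actual method:

```python
from typing import Iterator


class ConfigError(Exception):
    """Stand-in for AirbyteTracedException(failure_type=config_error)."""


def list_objects(s3_client, bucket: str) -> Iterator[dict]:
    # boto3 clients expose modeled errors under client.exceptions; a wrong
    # bucket name surfaces as NoSuchBucket and becomes a clear config error
    # instead of a generic stack trace.
    paginator = s3_client.get_paginator("list_objects_v2")
    try:
        for page in paginator.paginate(Bucket=bucket):
            yield from page.get("Contents", [])
    except s3_client.exceptions.NoSuchBucket as exc:
        raise ConfigError(f"The bucket {bucket} does not exist.") from exc
```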