Skip to content

Conversation

@BenWhitehead
Copy link
Collaborator

@BenWhitehead BenWhitehead commented Jun 9, 2023

When uploading a file where we are able to rewind to an arbitrary offset, we can be more optimistic in the way we send requests to GCS.

Add new code middleware to allow PUTing an entire file to GCS in a single request, and using query resumable session to recover from the specific offset in the case of retryable error.

Benchmark Results

Methodology

Generate a random file on disk of size 128KiB..2GiB from /dev/urandom, then upload the generated file using Storage.createFrom(BlobInfo, Path).

Perform each 4096 times.

Run on a c2-standard-60 instance is us-central1 against a regional bucket located in us-central1.

Results

The following summary of throughput in MiB/s as observed between the existing implementation, and the new implementation proposed in this PR.

 count mean std min 50% 75% 90% 99% max runId ApiName createFrom - existing JSON 4096.0 66.754 10.988 3.249 67.317 73.476 78.961 91.197 107.247 createFrom - new JSON 4096.0 158.769 67.105 4.600 170.680 218.618 240.992 266.297 305.205 

Comparison

When comparing the new implementation to the existing implementation we get the following improvement to throughput (higher is better):

stat pct mean 137.841 50% 153.547 90% 205.204 99% 192.003 
@BenWhitehead BenWhitehead added do not merge Indicates a pull request not ready for merge, due to either quality or timing. owlbot:ignore instruct owl-bot to ignore a PR labels Jun 9, 2023
@BenWhitehead BenWhitehead requested a review from a team as a code owner June 9, 2023 21:12
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. api: storage Issues related to the googleapis/java-storage API. labels Jun 9, 2023
@BenWhitehead
Copy link
Collaborator Author

ITRetryConformanceTests are currently failing due to googleapis/storage-testbench#510

@BenWhitehead BenWhitehead force-pushed the put-file branch 4 times, most recently from d29afd4 to 0a6f7c7 Compare June 23, 2023 15:16
When uploading a file where we are able to rewind to an arbitrary offset, we can be more optimistic in the way we send requests to GCS. Add new code middleware to allow PUTing an entire file to GCS in a single request, and using query resumable session to recover from the specific offset in the case of retryable error. ## TODO 1. Add failure scenario: send more bytes than are specified in content-range header
@BenWhitehead BenWhitehead changed the title feat: update Storage.createFrom(BlobInfo, Path) to be smarter feat: update Storage.createFrom(BlobInfo, Path) to be 150% higher throughput Jul 5, 2023
@BenWhitehead BenWhitehead changed the title feat: update Storage.createFrom(BlobInfo, Path) to be 150% higher throughput feat: update Storage.createFrom(BlobInfo, Path) to have 150% higher throughput Jul 5, 2023
Copy link
Contributor

@frankyn frankyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had one comment on a todo; otherwise lgtm


@ParametersAreNonnullByDefault
enum JsonResumableSessionFailureScenario {
// TODO: send more bytes than are in the Content-Range header
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't the server just stop reading bytes after content-range is satisfied? Guessing this would be more of a dataloss scenario client side?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent some time thinking about this, and this would actually be a client side bug. Since the client computes both Content-Range and Content-Length if those don't sync up, we have a bug so not a server response failure we need to handle. I'll remove this TODO in the follow up PR I have queued up on my workstation to follow this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client side issue makes sense; if anything tests would serve to protect client changes that introduce such a bug.

@BenWhitehead BenWhitehead removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jul 13, 2023
@BenWhitehead BenWhitehead merged commit 4c2f44e into main Jul 13, 2023
@BenWhitehead BenWhitehead deleted the put-file branch July 13, 2023 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: storage Issues related to the googleapis/java-storage API. owlbot:ignore instruct owl-bot to ignore a PR size: xl Pull request size is extra large.

2 participants