feat(experimental): add write resumption strategy #1663
base: main
Conversation
Summary of Changes
Hello @Pulkit0110, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the reliability of asynchronous bidirectional streaming operations by introducing a comprehensive write resumption strategy and a generic retry manager. This refactoring centralizes error handling and retry logic, making both read and write operations more resilient to transient failures and network interruptions. The changes ensure that data transfers can seamlessly resume from the last known good state, improving the overall robustness of the system.
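To make the summary concrete, here is a minimal sketch of the kind of strategy interface a generic bidi retry manager could drive. The names generate_requests and recover_state_on_failure appear in the diff excerpts below; everything else (the class name, process_response, the shape of the state dict) is an assumption for illustration, not this PR's actual API.

```python
import abc
from typing import Any, AsyncIterator, Dict


class _ResumptionStrategy(abc.ABC):
    """Per-operation state holder that lets a retry manager reopen a bidi stream."""

    @abc.abstractmethod
    def generate_requests(self, state: Dict[str, Any]) -> AsyncIterator[Any]:
        """Yield the requests for a (re)opened stream, starting from the saved state."""

    @abc.abstractmethod
    def process_response(self, response: Any, state: Dict[str, Any]) -> None:
        """Fold a server response back into the saved state."""

    @abc.abstractmethod
    async def recover_state_on_failure(self, cause: Exception, state: Dict[str, Any]) -> None:
        """Adjust the saved state (e.g. pick up a new write_handle) before the next attempt."""
```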
Code Review
This pull request introduces a generic retry manager for bidi streams and applies it to read operations, which is a great improvement for resiliency. It also adds a resumption strategy for write operations. The overall structure is well-designed and the new tests are comprehensive. I've found one critical issue that will cause a TypeError at runtime due to a method being called with an unexpected argument. I've also included a couple of medium-severity comments to improve documentation clarity and maintainability in the new strategy classes.
google/cloud/storage/_experimental/asyncio/async_multi_range_downloader.py
google/cloud/storage/_experimental/asyncio/retry/reads_resumption_strategy.py
google/cloud/storage/_experimental/asyncio/retry/writes_resumption_strategy.py
Force-pushed from 7d63612 to 8c2a143
/gemini review
Code Review
This pull request introduces a write resumption strategy for handling bidi write operations, which is a valuable addition for improving error handling in experimental async features. The implementation is well-structured and accompanied by a comprehensive set of unit tests that cover various scenarios, including initial uploads, resumption, and failure recovery. My feedback focuses on enhancing type safety, improving code clarity by addressing a potentially unused flag, and increasing the robustness of the tests by using mocks instead of None.
google/cloud/storage/_experimental/asyncio/retry/writes_resumption_strategy.py
google/cloud/storage/_experimental/asyncio/retry/writes_resumption_strategy.py
/gemini summary
Summary of Changes
This pull request introduces an experimental write resumption strategy for asynchronous bidirectional write operations within the Google Cloud Storage client library. This new strategy significantly enhances the reliability of data uploads by providing robust mechanisms to handle errors, track the state of ongoing writes, and seamlessly resume operations from the last successfully persisted byte, thereby improving data integrity and resilience against transient network issues or server redirects.
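As an illustration of the state being tracked, here is a sketch of a per-upload state holder. The field names (spec, chunk_size, write_handle, persisted_size, bytes_sent, is_complete) are taken from the diff excerpts in the threads below; the class itself is a simplified stand-in, not the PR's exact definition.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class _WriteState:
    spec: Any                            # AppendObjectSpec (or WriteObjectSpec) for the upload
    chunk_size: int                      # bytes read from the user buffer per request
    write_handle: Optional[Any] = None   # server-issued handle used to resume the stream
    persisted_size: int = 0              # bytes the server has durably persisted
    bytes_sent: int = 0                  # bytes this client has sent so far
    is_complete: bool = False            # whether the upload has finished (object finalized)
```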
Activity
google/cloud/storage/_experimental/asyncio/retry/writes_resumption_strategy.py
Attributes:
    spec (AppendObjectSpec): The specification for the object to write.
    chunk_size (int): The size of chunks to read from the buffer.
to read?
done
write_state.write_handle = response.write_handle

if response.resource:
    write_state.is_complete = True
resource is obtained only when the object is finalized. Finalization != closing the stream (or finishing uploading for a particular session).
As per the docstring, is_complete (bool): Whether the upload has finished.
What do you mean here when you set is_complete to True?
It means the upload is complete and the object is finalized.
Why don't we use the same terminology as the backend? is_finalized?
Otherwise it'll create confusion (we already have a lot of terms: finalize, close, etc.).
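For illustration, a minimal sketch of what the reviewer's renaming suggestion could look like when a response is folded into the write state. The helper name and the exact condition are assumptions, not the PR's code, and a later thread below refines how finalization is actually detected.

```python
def _apply_response(response, write_state):
    """Fold a BidiWriteObjectResponse into the tracked write state (sketch)."""
    if response.write_handle:
        write_state.write_handle = response.write_handle
    # "Finalized" mirrors the backend terminology: the object is finalized,
    # which is not the same as this bidi session/stream being closed.
    if response.resource:
        write_state.is_finalized = True
        write_state.persisted_size = response.resource.size
```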
google/cloud/storage/_experimental/asyncio/retry/writes_resumption_strategy.py
self.assertEqual(requests[3].checksummed_data.content, b"89")

self.assertEqual(requests[4].write_offset, 10)
self.assertTrue(requests[4].finish_write)
It shouldn't always be true. It should be true only when the user explicitly provides finalize_on_close=True.
Same comment for all other tests.
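A sketch of what the reviewer is asking for, assuming a finalize_on_close flag is threaded through to the request generator. The flag name comes from the comment above; the helper and its wiring are illustrative, and storage_type is the proto-types alias used elsewhere in this PR.

```python
def _final_request(write_state, finalize_on_close: bool):
    """Build the last request of a write session (sketch).

    finish_write is set only when the caller explicitly asked to finalize the
    object on close; otherwise the object stays appendable.
    """
    return storage_type.BidiWriteObjectRequest(
        write_offset=write_state.bytes_sent,
        finish_write=finalize_on_close,
    )
```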
done
def test_generate_requests_resumption(self):
    """
    Verify request sequence when resuming an upload.
    - First request is AppendObjectSpec with write_handle and state_lookup=True.
Why state_lookup=True? If it's to fetch persisted_size then, AFAIR, when opening a bidi stream to write with a write_handle, persisted_size is obtained in the first BidiWriteObjectResponse message.
I don't think opening the stream with a write_handle guarantees that persisted_size is returned. state_lookup makes sure persisted_size is always returned. Also, state_lookup will be passed while opening the stream with the write_handle, so there won't be any additional request.
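A sketch of the resumption-side opening request being discussed, assuming the AppendObjectSpec already carries the handle and state_lookup is set on that same message, so there is no extra round trip. storage_type is the proto-types alias used elsewhere in this PR; the helper name is illustrative.

```python
def _initial_resume_request(write_state):
    """First request when reopening a stream for an existing appendable object (sketch)."""
    # Requesting a state lookup on the opening message makes the server report
    # persisted_size in its first BidiWriteObjectResponse, without a separate request.
    spec = write_state.spec
    spec.write_handle = write_state.write_handle
    return storage_type.BidiWriteObjectRequest(
        append_object_spec=spec,
        state_lookup=True,
    )
```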
write_state.is_complete = True
write_state.persisted_size = response.resource.size

async def recover_state_on_failure(
(Maybe a little late to notice) why async def? There's no await anywhere here.
Yes, I'll keep it for now. Once the implementation is complete, I'll change it accordingly.
Why? Are you worried about failing tests?
write_state = state["write_state"]
write_state.persisted_size = 2048

response = storage_type.BidiWriteObjectResponse(persisted_size=1024)
When will this scenario happen?
When there's an out-of-order or delayed response from the server. For example, a response confirming 2048 bytes have been persisted might arrive before a delayed response that confirms only 1024 bytes were persisted at an earlier point in time.
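A sketch of the guard being described, so a stale persisted_size never rewinds the tracked state. A later thread questions whether this check is needed at all; this is just the author's reasoning in code form, with an illustrative helper name.

```python
def _update_persisted_size(response, write_state):
    """Only move persisted_size forward (sketch).

    A delayed or out-of-order response reporting an older, smaller persisted_size
    must not roll back state that a newer response already advanced.
    """
    if response.persisted_size is not None:
        write_state.persisted_size = max(
            write_state.persisted_size, response.persisted_size
        )
```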
write_state.is_complete = True
yield storage_type.BidiWriteObjectRequest(
    write_offset=write_state.bytes_sent,
    finish_write=True,
This way the object is always finalized after doing writer.append, right?
We should not finalize. We should keep the object in an unfinalized state unless the user explicitly specifies otherwise.
done
yield storage_type.BidiWriteObjectRequest(
    append_object_spec=write_state.spec, state_lookup=do_state_lookup
)

# Determine if we need to send WriteObjectSpec or AppendObjectSpec
nit: this comment should be on top of L82, right?
def __init__(
    self,
    spec: storage_type.AppendObjectSpec,
As discussed offline, it could be either storage_type.AppendObjectSpec or storage_type.WriteObjectSpec; WriteObjectSpec in the first request.
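A sketch of the type-hint change being suggested, assuming the constructor keeps the same parameter names from the docstring above. The class name here is a placeholder, and storage_type is the proto-types alias used elsewhere in this PR.

```python
from typing import Union


class _WritesResumptionStrategy:  # placeholder name, not the PR's actual class
    def __init__(
        self,
        spec: Union["storage_type.AppendObjectSpec", "storage_type.WriteObjectSpec"],
        chunk_size: int,
    ) -> None:
        # WriteObjectSpec applies when creating the object on the very first request;
        # AppendObjectSpec applies when appending to or resuming an existing object.
        self._spec = spec
        self._chunk_size = chunk_size
```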
# Initial request of the stream must provide the specification.
# If we have a write_handle, we request a state lookup to verify persisted offset.
do_state_lookup = write_state.write_handle is not None
nit: is_state_lookup_required is a good variable name in my opinion.
write_state: _WriteState = state["write_state"]

if response.persisted_size is not None:
    if response.persisted_size > write_state.persisted_size:
this check is not required.
write_state.write_handle = response.write_handle

if response.resource:
    write_state.is_complete = True
Important: resource is also returned on the first response, for both Write & Append object specs (see internal code link, sent via chat).
write_state.write_handle = response.write_handle

if response.resource:
    write_state.is_complete = True
you can check if an object is finalized or not by checking the presence of https://github.com/googleapis/googleapis/blob/9a477cd3c26a704130e2a2fb44a40281d9312e4c/google/storage/v2/storage.proto#L2947
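A sketch of the check the reviewer is pointing to (the proto link is Object.finalize_time). The helper name and the exact field-presence check are assumptions; a real implementation may need HasField or a similar presence test depending on the proto wrapper.

```python
def _is_finalized(response) -> bool:
    """Report whether the object is finalized (sketch, not the PR's code).

    Per the review, `resource` can appear even on the first response, so its
    presence alone is not enough; a finalized object is marked by the
    finalize_time field on the resource.
    """
    resource = response.resource
    return bool(resource) and bool(getattr(resource, "finalize_time", None))
```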
if hasattr(cause, "write_handle") and cause.write_handle:
    write_state.write_handle = cause.write_handle
redirect_handle = getattr(cause, "write_handle", None)
if redirect_handle:
(Note/things to keep in mind while integrating with AppendableWriter) to fetch the write_handle, you may have to do something similar to reads' _handle_redirect.
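For context, a sketch of what unpacking a write redirect could look like when integrating with the writer, loosely mirroring the reads-side _handle_redirect mentioned above. The error attributes and the routing_token handling are assumptions; in practice the redirect details may first need to be extracted from the gRPC error, as on the reads side.

```python
def _recover_from_redirect(cause, write_state):
    """Pull redirect info off a failure into the tracked write state (sketch)."""
    # A redirected write typically carries a fresh write_handle (and possibly a
    # routing_token) that the next attempt must reuse when reopening the stream.
    redirect_handle = getattr(cause, "write_handle", None)
    if redirect_handle:
        write_state.write_handle = redirect_handle
    routing_token = getattr(cause, "routing_token", None)
    if routing_token:
        # Assumes the spec exposes routing_token, as AppendObjectSpec does.
        write_state.spec.routing_token = routing_token
```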
else:
    initial_request.append_object_spec = write_state.spec

yield initial_request
this doesn't handle the 'open' scenario , so please add a comment for that.
Adding a writes resumption strategy which will be used for error handling of bidi write operations.
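For orientation, a rough sketch of how such a strategy might be driven by the generic retry manager referenced earlier in the review. Apart from generate_requests and recover_state_on_failure, which appear in the diff, every name here (run_with_resumption, open_stream, process_response, the state dict) is a placeholder, not this PR's actual API.

```python
async def run_with_resumption(strategy, open_stream, state, max_attempts=5):
    """Drive a bidi write with resumption across attempts (illustrative only)."""
    for _attempt in range(max_attempts):
        try:
            # Reopen the stream from the saved state and replay the remaining requests.
            stream = await open_stream(strategy.generate_requests(state))
            async for response in stream:
                strategy.process_response(response, state)
            return state["write_state"]
        except Exception as exc:  # real code would retry only on retriable errors
            await strategy.recover_state_on_failure(exc, state)
    raise RuntimeError("upload did not complete within the retry budget")
```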