-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(spanner): retry spanner transactions and mutations when RST_STREAM error #6699
Conversation
rahul2393
commented
Sep 20, 2022
- Clients need to retry the Spanner transactions in case of RST_STREAM error returned from the backend
…AM internal errors is returned from backend.
// returned by Cloud Spanner, or if none is returned, the calculated delay with | ||
// a minimum of 10ms and maximum of 32s. There is no delay before the retry if | ||
// the error was Session not found. | ||
func runWithRetryOnAbortedOrSessionNotFound(ctx context.Context, f func(context.Context) error) error { | ||
retryer := onCodes(DefaultRetryBackoff, codes.Aborted) | ||
retryer := onCodes(DefaultRetryBackoff, codes.Aborted, codes.Internal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will make it retry on any Internal
error, also those that are not retryable. This could cause an infinite (or at least very long-lasting) loop if someone tries to execute a transaction that happens to hit a feature that returns an internal error.
// creations/retrivals. | ||
return ts, err | ||
// Make a retryer for Aborted and certain Internal errors. | ||
retryer := onCodes(DefaultRetryBackoff, codes.Aborted, codes.Internal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also could cause an infinite loop.
// retries it if it returns an Aborted or Session not found error. The retry | ||
// is delayed if the error was Aborted. The delay between retries is the delay | ||
// retries it if it returns an Aborted, Session not found error or certain Internal errors. The retry | ||
// is delayed if the error was Aborted or Internal error. The delay between retries is the delay | ||
// returned by Cloud Spanner, or if none is returned, the calculated delay with | ||
// a minimum of 10ms and maximum of 32s. There is no delay before the retry if | ||
// the error was Session not found. | ||
func runWithRetryOnAbortedOrSessionNotFound(ctx context.Context, f func(context.Context) error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have a couple of tests for the changes in this PR that verify that transactions are retried if the RST_STREAM
error is returned, but not for any other random internal error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@olavloite Test "TestClient_ApplyAtLeastOnce_NonRetryableInternalErrors" added,
Please take a look at the code again.
This could cause an infinite
=> This is not true. The following code only says that:
if it is an internal error and its error message is not in the list, then return false
.
If it is an internal error and it error message in the list, then let it pass to the next check r.Retryer.Retry(err)
.
It early returns false
for non-retryable internal errors before the real retry check.
If the user has specified to not retry internal errors
, then r.Retryer.Retry(err)
will return false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I missed that, thanks! The specific error messages were already there, we were just not using them anymore (and I guess you are now going to dig up that I was the one that actually removed it ;-) ).