-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow expansion service to choose pickler. #22999
Conversation
R: @chamikaramj It'd be good to get this in before the release cut--possibly considering changing the default (for Python-from-Java). |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
@@ -68,10 +68,12 @@ def load_session(file_path): | |||
def set_library(selected_library=DEFAULT_PICKLE_LIB): | |||
""" Sets pickle library that will be used. """ | |||
global desired_pickle_lib | |||
if selected_library == USE_DILL and desired_pickle_lib != dill_pickler: | |||
if (selected_library == USE_DILL) != (desired_pickle_lib == dill_pickler): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean "(selected_library == USE_DILL) and" ?
Logic is bit hard to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the new value of (pickler is dill) is different than the old value, we have to set (or unset) the hooks. Added a comment to clarify.
LGTM to merge after verifying the logic. Thanks. |
This needs to be added here as it is a global option, not a pipeline option. This is useful as dill has some issues with objects created from exec/eval which are used for remotely-defined PythonCallable objects. Also refactored the pickler choice logic a bit for clarity and better error checking.
e8c6f5a
to
c054ade
Compare
Codecov Report
@@ Coverage Diff @@
## master #22999 +/- ##
==========================================
- Coverage 73.68% 73.63% -0.06%
==========================================
Files 714 716 +2
Lines 95219 95258 +39
==========================================
- Hits 70165 70140 -25
- Misses 23757 23821 +64
Partials 1297 1297
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Run Python_PVR_Flink PreCommit |
cc: @ryanthompson591 |
@@ -129,6 +131,9 @@ def run(argv): | |||
print('Server started at port', port) | |||
return | |||
|
|||
if options.default_pickler: | |||
pickler.set_library(options.default_pickler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we set the --pickler_library
pipeline option in this case?
Also noting the naming difference for a similar concept - should this be renamed into --pickler_library
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't aware that there was such a pipeline option (and it doesn't seem possible to set on a per-pipeline basis).
If you feel it's worth changing the name, we should do so before the next release, though we'd have to make sure it gets set in both places (setting in stager seems too late).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't aware that there was such a pipeline option
It's used for example here:
pickler.set_library(pickle_library) |
it doesn't seem possible to set on a per-pipeline basis
I wonder if this is the same as: #21615
If you feel it's worth changing the name, we should do so before the next release
I think it would be worthwhile. Would someone working on xlang have bandwidth look into that?
If we are making cloudpickle default, I would consider tightening the bounds on cloudpickle to
cloudpickle>=2.2.0,<2.3.0 , and bumping that bound every beam release until we decide to vendor the library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can take on this change. Should we just re-use --pickle_library
like the pipeline option? (Actually fixing #21615 is out of scope for now.) It'd be another cherry-pick into the release branch.
Tightening the bounds feels like it should be a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tightening the bounds feels like it should be a separate PR.
Sending #23120
This needs to be added here as it is a global option, not a pipeline option.
This is useful as dill has some issues with objects created from exec/eval
which are used for remotely-defined PythonCallable objects.
Also refactored the pickler choice logic a bit for clarity and better
error checking.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.