Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow expansion service to choose pickler. #22999

Merged
merged 3 commits into from
Sep 6, 2022

Conversation

robertwb
Copy link
Contributor

@robertwb robertwb commented Sep 1, 2022

This needs to be added here as it is a global option, not a pipeline option.
This is useful as dill has some issues with objects created from exec/eval
which are used for remotely-defined PythonCallable objects.

Also refactored the pickler choice logic a bit for clarity and better
error checking.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

@robertwb
Copy link
Contributor Author

robertwb commented Sep 1, 2022

R: @chamikaramj It'd be good to get this in before the release cut--possibly considering changing the default (for Python-from-Java).

@github-actions
Copy link
Contributor

github-actions bot commented Sep 1, 2022

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@@ -68,10 +68,12 @@ def load_session(file_path):
def set_library(selected_library=DEFAULT_PICKLE_LIB):
""" Sets pickle library that will be used. """
global desired_pickle_lib
if selected_library == USE_DILL and desired_pickle_lib != dill_pickler:
if (selected_library == USE_DILL) != (desired_pickle_lib == dill_pickler):
Copy link
Contributor

@chamikaramj chamikaramj Sep 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean "(selected_library == USE_DILL) and" ?

Logic is bit hard to follow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the new value of (pickler is dill) is different than the old value, we have to set (or unset) the hooks. Added a comment to clarify.

@chamikaramj
Copy link
Contributor

LGTM to merge after verifying the logic. Thanks.

This needs to be added here as it is a global option, not a pipeline option.
This is useful as dill has some issues with objects created from exec/eval
which are used for remotely-defined PythonCallable objects.

Also refactored the pickler choice logic a bit for clarity and better
error checking.
@codecov
Copy link

codecov bot commented Sep 6, 2022

Codecov Report

Merging #22999 (955922d) into master (ac3766a) will decrease coverage by 0.05%.
The diff coverage is 63.63%.

❗ Current head 955922d differs from pull request most recent head e1101fc. Consider uploading reports for the commit e1101fc to get more accurate results

@@            Coverage Diff             @@
##           master   #22999      +/-   ##
==========================================
- Coverage   73.68%   73.63%   -0.06%     
==========================================
  Files         714      716       +2     
  Lines       95219    95258      +39     
==========================================
- Hits        70165    70140      -25     
- Misses      23757    23821      +64     
  Partials     1297     1297              
Flag Coverage Δ
python 83.42% <63.63%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...beam/runners/portability/local_job_service_main.py 14.85% <25.00%> (+0.41%) ⬆️
sdks/python/apache_beam/internal/pickler.py 92.00% <85.71%> (-3.46%) ⬇️
...ks/python/apache_beam/runners/worker/statecache.py 89.69% <0.00%> (-6.47%) ⬇️
...dks/python/apache_beam/metrics/monitoring_infos.py 92.50% <0.00%> (-4.50%) ⬇️
...python/apache_beam/runners/worker/worker_status.py 76.66% <0.00%> (-3.05%) ⬇️
...n/apache_beam/ml/gcp/recommendations_ai_test_it.py 73.46% <0.00%> (-2.05%) ⬇️
sdks/python/apache_beam/transforms/combiners.py 93.05% <0.00%> (-0.39%) ⬇️
...hon/apache_beam/runners/worker/bundle_processor.py 93.17% <0.00%> (-0.38%) ⬇️
...am/examples/inference/pytorch_language_modeling.py 0.00% <0.00%> (ø)
...examples/inference/pytorch_image_classification.py 0.00% <0.00%> (ø)
... and 10 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@robertwb
Copy link
Contributor Author

robertwb commented Sep 6, 2022

Run Python_PVR_Flink PreCommit

@robertwb robertwb merged commit b9483a3 into apache:master Sep 6, 2022
@tvalentyn
Copy link
Contributor

cc: @ryanthompson591

@@ -129,6 +131,9 @@ def run(argv):
print('Server started at port', port)
return

if options.default_pickler:
pickler.set_library(options.default_pickler)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we set the --pickler_library pipeline option in this case?

Also noting the naming difference for a similar concept - should this be renamed into --pickler_library?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware that there was such a pipeline option (and it doesn't seem possible to set on a per-pipeline basis).

If you feel it's worth changing the name, we should do so before the next release, though we'd have to make sure it gets set in both places (setting in stager seems too late).

Copy link
Contributor

@tvalentyn tvalentyn Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware that there was such a pipeline option

It's used for example here:

pickler.set_library(pickle_library)

it doesn't seem possible to set on a per-pipeline basis

I wonder if this is the same as: #21615

If you feel it's worth changing the name, we should do so before the next release

I think it would be worthwhile. Would someone working on xlang have bandwidth look into that?

If we are making cloudpickle default, I would consider tightening the bounds on cloudpickle to
cloudpickle>=2.2.0,<2.3.0 , and bumping that bound every beam release until we decide to vendor the library.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can take on this change. Should we just re-use --pickle_library like the pipeline option? (Actually fixing #21615 is out of scope for now.) It'd be another cherry-pick into the release branch.

Tightening the bounds feels like it should be a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@tvalentyn tvalentyn Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be another cherry-pick into the release branch.

Cherrypick, assuming #23111 is merged 'as is': #23118

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tightening the bounds feels like it should be a separate PR.

Sending #23120

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants