Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

profiler: add Start function with a context to abort profiling #8632

Open
timofurrer opened this issue Oct 2, 2023 · 4 comments
Open

profiler: add Start function with a context to abort profiling #8632

timofurrer opened this issue Oct 2, 2023 · 4 comments
Assignees
Labels
api: cloudprofiler Issues related to the Cloud Profiler API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@timofurrer
Copy link

timofurrer commented Oct 2, 2023

Is your feature request related to a problem? Please describe.

I want to be able to stop the profiling and restart it - possibly with another config even.
However, the current API with profiler.Start() doesn't support that.

Describe the solution you'd like

Possibly a ctx argument for the Start() function, however, this is likely not possible due to backwards compatibility, therefore a new function could be added:

// Start starts a goroutine to collect and upload profiles. The
// caller must provide the service string in the config. See
// Config for details. Start should only be called once. Any
// additional calls will be ignored.
func Start(cfg Config, options ...option.ClientOption) error {
	startError := startOnce.do(func() error {
		return start(context.Background(), cfg, options...)
	})
	return startError
}

// StartWithContext starts a goroutine to collect and upload profiles. The
// caller must provide the service string in the config. See
// Config for details. StartWithContext should only be called once. Any
// additional calls will be ignored.
// When the context is cancalled the profiling is aborted
func StartWithContext(ctx context.Context, cfg Config, options ...option.ClientOption) error {
	startError := startOnce.do(func() error {
		return start(ctx, cfg, options...)
	})
	return startError
}

func start(ctx context.Context, cfg Config, options ...option.ClientOption) error {
	...
}

I think that pollProfilerService service also needs to change to respect exiting the for-loop on context cancellation.

When that is implemented I think the code and test code can be refactored to remove i < config.numProfiles and profilerDone channel and use a context during tests.

Describe alternatives you've considered

  • use a profiler.Config field with a context - that would avoid adding StartWithContext, but is also kinda weird from an API perspective

Additional context

I haven't looked at the profiler internals and I'm not sure if it's even possible to simply stop and restart the profiling without leaving any traces of it.

I'm also open to contribute this myself if you are willing to accept the change :)

@timofurrer timofurrer added the triage me I really want to be triaged. label Oct 2, 2023
@product-auto-label product-auto-label bot added the api: cloudprofiler Issues related to the Cloud Profiler API. label Oct 2, 2023
timofurrer added a commit to timofurrer/google-cloud-go that referenced this issue Oct 2, 2023
@timofurrer
Copy link
Author

I've created a draft PR at #8633

While I was working on it I realized that there are more "problems" - e.g. some state is globally set, like config, but obviously isn't easily reset when the context is cancelled.
There would be ways around this, like the StartWithContext() could return a stop function that would wrap the context cancellation and reset global state.
On the other hand, it's probably also not really an issue, because when starting again the state is overridden - and starting multiple profilers is anyways not a thing you'd want.

@aabmass since you got assigned to this - WDYT?

timofurrer added a commit to timofurrer/google-cloud-go that referenced this issue Oct 2, 2023
timofurrer added a commit to timofurrer/google-cloud-go that referenced this issue Oct 2, 2023
@aabmass
Copy link
Contributor

aabmass commented Oct 2, 2023

@timofurrer thanks for sending a PR and opening discussion. As you figured out, there is a lot of global state that makes this tricky since the initial assumption was that profiling would be started and never stopped for a process lifetime. We make the same assumption in the Python, NodeJS, and Java agents which also don't provide a stop function, so this would introduce some inconsistency and complexity.

I want to be able to stop the profiling and restart it - possibly with another config even.

Do you mind expanding on the use case a bit?

@timofurrer
Copy link
Author

timofurrer commented Oct 3, 2023

Do you mind expanding on the use case a bit?

Sure, so we have this application running in a customer environment and they / we want to be able to enable profiling from time to time - but not always. In general, this entire application is configurable remotely - and is also changed during runtime remotely, without having to restart it. It would be great if we could follow the same pattern with the profiler.

@aabmass btw. I'd be more than happy to contribute a "proper" PR (or a set of them) to gradually refactor to something "desired" -> for example remove the global state without breaking the public interface. And after that introduce an API which allows to properly start and stop a profiler. If you'd be willing to review and accept such a change then let me know and I'll contribute that.

EDIT: maybe something like this is a better start:

timofurrer added a commit to timofurrer/google-cloud-go that referenced this issue Oct 4, 2023
This change set refactors the profiler to reduce the global state
drastically. It tries to keep the changes minimal on purpose and with
that many opportunities to simplify the code have been left out.

This change set also prepares the profiler to allow for a stop API,
that is once the profiler has been started it can be stopped again and
restarted - even with a different config.

Refs googleapis#8632
@aabmass
Copy link
Contributor

aabmass commented Oct 4, 2023

@timofurrer thanks for prototyping this, I appreciate the effort! Unfortunately we can't accept such a contribution right now because of our support load. We'd like to keep complications to a minimum and not expand our feature set at all. We are happy to accept any bug fixes–just not feature requests right now.

You're of course welcome to release this new feature as a fork that you/your customers can pull in.

@aabmass aabmass added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. priority: p3 Desirable enhancement or fix. May not be included in next release. and removed triage me I really want to be triaged. labels Oct 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: cloudprofiler Issues related to the Cloud Profiler API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants
@timofurrer @aabmass and others