Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add OpenTelemetry tracing to the Publisher and Subscriber #2086

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

michaelpri10
Copy link
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

If you write sample code, please follow the samples format.

@michaelpri10 michaelpri10 requested review from a team as code owners June 24, 2024 22:24
Copy link

snippet-bot bot commented Jun 24, 2024

No region tags are edited in this PR.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. api: pubsub Issues related to the googleapis/java-pubsub API. labels Jun 24, 2024
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: xl Pull request size is extra large. labels Jun 24, 2024
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: l Pull request size is large. labels Jul 2, 2024
@michaelpri10 michaelpri10 changed the title feat: Add Publish-side OpenTelemetry tracing feat: Add OpenTelemetry tracing to the Publisher and Subscriber Jul 2, 2024
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-semconv</artifactId>
<version>1.26.0-alpha</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already specified in share-dependencies

<scope>test</scope>
</dependency>
<dependency>
<groupId>org.assertj</groupId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious that are there any cases Truth and junit can not do?

int ackDeadline,
boolean isReceiptModack,
boolean enableOpenTelemetryTracing) {
if (enableOpenTelemetryTracing && tracer != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we are passing enableOpenTelemetryTracing and tracer all the time, it seems we can at least extract them as fields and make OpenTelemetryUtil as a context aware wrapper of Tracer. In addition, we can have create an interface PubsubTracer that have all the injection points, and we can create an OpenTelemetryPubsubTracer only if enableOpenTelemetryTracing is true. Similar concepts can be applied to PubsubMessageWrapper as well.

We have this architecture in gax. There is a ApiTracer that defines all the injection points, and there is an OpenCensusTracer that is created conditionally, so we don't have to always check if instrumentation is enabled or not. This also has a clearer separation of concern that in case we need to move away from OpenTelemetry, we don't have to write all the code, only the implementation of the interface.

TopicName topicName = TopicName.parse(topic);
Attributes attributes =
createCommonSpanAttributesBuilder(
topicName.getTopic(), topicName.getProject(), "Publisher.publishCall", "publish")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code.function can just be the method name: https://opentelemetry.io/docs/specs/semconv/attributes-registry/code/

We should probably include the class name in code.namespace but it wasn't included in the design and I think it's not strictly necessary.

* Creates, starts, and returns a publish RPC span for the given message batch. Bi-directional
* links with the publisher parent span are created for sampled messages in the batch.
*/
public static final Span startPublishRpcSpan(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method would currently be exposed to users right? Is it possible to expose this method to other parts of the library, but not to users?

Comment on lines +1348 to +1351
SpanData flowControlSpanData = allSpans.get(0);
SpanData batchingSpanData = allSpans.get(1);
SpanData publishRpcSpanData = allSpans.get(2);
SpanData publisherSpanData = allSpans.get(3);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these orders determined? It doesn't seem like it's chronological. Is it possible these change based on some race condition?

Attributes.builder()
.put(SemanticAttributes.MESSAGING_OPERATION, rpcOperation)
.build();
rpcSpan.addLink(message.getSubscriberSpan().getSpanContext(), linkAttributes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above: call addLink as part of span creation when possible.

if (message.getPublisherSpan().getSpanContext().isSampled()) {
Attributes linkAttributes =
Attributes.builder().put(SemanticAttributes.MESSAGING_OPERATION, "publish").build();
publishRpcSpan.addLink(message.getPublisherSpan().getSpanContext(), linkAttributes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do addLink earlier rather than here? It might make the code a bit more complicated, but it's preferred to do so as part of the span builder: https://javadoc.io/doc/io.opentelemetry/opentelemetry-api/latest/io/opentelemetry/api/trace/Span.html#addLink(io.opentelemetry.api.trace.SpanContext)

This is due to sampling decisions not respecting addLink called after span creation.

.setOrderingKey(ORDERING_KEY)
.build();
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any way of testing the performance degradation of this change when EnableOpenTelemetry=false? We want to make sure that users not interested in tracing are not impacted. I suspect the changes are fairly small since we aren't creating any spans, but it would be nice to see the impact of the extra data structures / wrappers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: pubsub Issues related to the googleapis/java-pubsub API. size: xl Pull request size is extra large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants