
[connector/servicegraph] Unit test failure: TestConnectorConsume/test_fix_failed_label_not_work #33998

Closed
crobert-1 opened this issue Jul 9, 2024 · 15 comments · Fixed by #34070 or #34076
Labels: ci-cd (CI, CD, testing, build issues), connector/servicegraph, flaky test (a test is flaky)

Comments

@crobert-1
Member

Component(s)

connector/servicegraph

Describe the issue you're reporting

Failing CI/CD action link

The failure occurred in an unrelated PR, running on Ubuntu.

Failure output:

Running target 'test-with-cover' in module 'connector/servicegraphconnector' as part of group 'connector'
make --no-print-directory -C connector/servicegraphconnector test-with-cover
mkdir -p /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/coverage/unit
/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/.tools/gotestsum --rerun-fails=1 --packages="./..." -- -race -timeout 600s -parallel 4 --tags="" -cover -covermode=atomic -args -test.gocoverdir="/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/coverage/unit"
✖  . (79ms) (coverage: 87.3% of statements)
✓  internal/metadata (1.02s) (coverage: 83.3% of statements)
✓  internal/store (1.121s) (coverage: 92.7% of statements)

DONE 31 tests, 2 failures in 7.887s

✖  . (38ms) (coverage: 87.3% of statements)

=== Failed
make[2]: *** [../../Makefile.Common:131: test-with-cover] Error 1
=== FAIL: . TestConnectorConsume/test_fix_failed_label_not_work (0.01s)
make[1]: *** [Makefile:179: connector/servicegraphconnector] Error 2
    logger.go:146: 2024-07-09T18:45:05.793Z	INFO	Started servicegraphconnector
    logger.go:146: 2024-07-09T18:45:05.793Z	INFO	Shutting down servicegraphconnector
    logger.go:146: 2024-07-09T18:45:05.796Z	DEBUG	edge completed	{"client_service": "foo", "server_service": "bar", "connection_type": "", "trace_id": "b3a53e9c2ac533afc48b511c2f368dc1"}
    logger.go:146: 2024-07-09T18:45:05.796Z	DEBUG	edge completed	{"client_service": "foo", "server_service": "bar", "connection_type": "", "trace_id": "c6aafb2dcf4d71c81ae52e50e8f805ee"}
make: *** [Makefile:131: gotest-with-cover] Error 2
    logger.go:146: 2024-07-09T18:45:05.796Z	DEBUG	edge completed	{"client_service": "foo", "server_service": "bar", "connection_type": "", "trace_id": "7af657de319d6a57967bc241882ea94f"}
    connector_test.go:129: 
        	Error Trace:	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/connector/servicegraphconnector/connector_test.go:129
        	Error:      	Received unexpected error:
        	            	the following errors occurred:
        	            	 -  resource "map[]": scope "traces_service_graph": metric "traces_service_graph_request_client_seconds": missing expected datapoint: map[client:foo connection_type: failed:false server:bar]
        	            	 -  resource "map[]": scope "traces_service_graph": metric "traces_service_graph_request_client_seconds": unexpected datapoint: map[client:foo connection_type: failed:true server:bar]
        	            	 -  resource "map[]": scope "traces_service_graph": metric "traces_service_graph_request_client_seconds": missing expected datapoint: map[client:foo connection_type: failed:true server:bar]
        	            	 -  resource "map[]": scope "traces_service_graph": metric "traces_service_graph_request_client_seconds": unexpected datapoint: map[client:foo connection_type: failed:false server:bar]
        	            	 -  resource "map[]": scope "traces_service_graph": metric "traces_service_graph_request_server_seconds": missing expected datapoint: map[client:foo connection_type: failed:false server:bar]
        	            	 -  resource "map[]": scope "traces_service_graph": metric "traces_service_graph_request_server_seconds": unexpected datapoint: map[client:foo connection_type: failed:true server:bar]
        	            	 -  resource "map[]": scope "traces_service_graph": metric "traces_service_graph_request_server_seconds": missing expected datapoint: map[client:foo connection_type: failed:true server:bar]
        	            	 -  resource "map[]": scope "traces_service_graph": metric "traces_service_graph_request_server_seconds": unexpected datapoint: map[client:foo connection_type: failed:false server:bar]
        	Test:       	TestConnectorConsume/test_fix_failed_label_not_work

=== FAIL: . TestConnectorConsume (0.01s)

=== FAIL: . TestConnectorConsume/test_fix_failed_label_not_work (re-run 1) (0.01s)
    logger.go:146: 2024-07-09T18:45:07.902Z	INFO	Started servicegraphconnector
    logger.go:146: 2024-07-09T18:45:07.902Z	INFO	Shutting down servicegraphconnector
    logger.go:146: 2024-07-09T18:45:07.904Z	DEBUG	edge completed	{"client_service": "foo", "server_service": "bar", "connection_type": "", "trace_id": "b3a53e9c2ac533afc48b511c2f368dc1"}
    logger.go:146: 2024-07-09T18:45:07.905Z	DEBUG	edge completed	{"client_service": "foo", "server_service": "bar", "connection_type": "", "trace_id": "c6aafb2dcf4d71c81ae52e50e8f805ee"}
    logger.go:146: 2024-07-09T18:45:07.905Z	DEBUG	edge completed	{"client_service": "foo", "server_service": "bar", "connection_type": "", "trace_id": "7af657de319d6a57967bc241882ea94f"}
    connector_test.go:129: 
        	Error Trace:	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/connector/servicegraphconnector/connector_test.go:129
        	Error:      	Received unexpected error:
        	            	the following errors occurred:
        	            	 -  resource "map[]": scope "traces_service_graph": metric "traces_service_graph_request_server_seconds": missing expected datapoint: map[client:foo connection_type: failed:false server:bar]
        	            	 -  resource "map[]": scope "traces_service_graph": metric "traces_service_graph_request_server_seconds": unexpected datapoint: map[client:foo connection_type: failed:true server:bar]
        	            	 -  resource "map[]": scope "traces_service_graph": metric "traces_service_graph_request_server_seconds": missing expected datapoint: map[client:foo connection_type: failed:true server:bar]
        	            	 -  resource "map[]": scope "traces_service_graph": metric "traces_service_graph_request_server_seconds": unexpected datapoint: map[client:foo connection_type: failed:false server:bar]
        	Test:       	TestConnectorConsume/test_fix_failed_label_not_work

=== FAIL: . TestConnectorConsume (re-run 1) (0.01s)

DONE 2 runs, 33 tests, 4 failures in 8.913s
make[1]: Leaving directory '/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib'
Error: Process completed with exit code 2.
crobert-1 added the needs triage (New item requiring triage) label on Jul 9, 2024
Contributor

github-actions bot commented Jul 9, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1
Member Author

crobert-1 commented Jul 10, 2024

@jpkrohling
Member

I talked to @t00mas and he'll be looking into this soon.

jpkrohling added the ci-cd (CI, CD, testing, build issues) label and removed the needs triage (New item requiring triage) label on Jul 11, 2024
@evantorrie
Contributor

The problem is that the servicegraph connector unit tests compare against a golden set of metrics using pmetrictest.CompareMetrics() with the IgnoreMetricsOrder() option, but the generated payload contains two instances of a metric with the same name and different datapoint attributes (this seems to mostly, perhaps only, affect comparison of histograms).

pmetrictest.CompareScopeMetrics() considers two metrics from the same scope to be "matched" simply by comparing their names.

When the servicegraph connector test fails, it is because consecutive metrics share the same name, but the datapoint attributes differ between the actual metric and the expected metric from the golden dataset.

Perhaps IgnoreMetricAttributeValue() should be passed during this test?

Alternatively, perhaps pmetrictest.CompareScopeMetrics() with IgnoreMetricsOrder() should consider more than just the name when deciding whether two metrics are the same.
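
For context, here is a minimal sketch of how such a golden-file comparison is typically written; the golden file path, helper name, and exact option set are assumptions for illustration, not the actual servicegraphconnector test code.

```go
// Sketch of a golden-file metrics comparison; path, helper name, and option
// set are assumptions, not the actual servicegraphconnector test code.
package sketch

import (
	"path/filepath"
	"testing"

	"github.com/open-telemetry/opentelemetry-collector-contrib/pkg/golden"
	"github.com/open-telemetry/opentelemetry-collector-contrib/pkg/pdatatest/pmetrictest"
	"github.com/stretchr/testify/require"
	"go.opentelemetry.io/collector/pdata/pmetric"
)

func verifyMetrics(t *testing.T, actual pmetric.Metrics) {
	expected, err := golden.ReadMetrics(filepath.Join("testdata", "expected.yaml"))
	require.NoError(t, err)

	// IgnoreMetricsOrder() pairs expected and actual metrics by name only,
	// which is where the flakiness comes from when two metrics share a name
	// but carry different datapoint attributes.
	require.NoError(t, pmetrictest.CompareMetrics(expected, actual,
		pmetrictest.IgnoreMetricsOrder(),
		pmetrictest.IgnoreTimestamp(),
		pmetrictest.IgnoreStartTimestamp(),
	))
}
```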

@jpkrohling
Member

I think that IgnoreMetricsOrder() should also ignore the order of the attributes.

@evantorrie
Contributor

I don't completely understand the servicegraph metrics implementation, but on further investigation I wonder whether the way the metrics are created in calls like collectServerLatencyMetrics and collectClientLatencyMetrics is correct.

In each case the code ranges over a map of keys and creates a new metric on every iteration, even though the metric name is the same in all cases.

My understanding is that if this were inside an application using the OTel SDK, the SDK would aggregate all of those keys (which I think represent different attribute sets) into a single metric entry with multiple datapoints. The servicegraphconnector code, on the other hand, seems to create multiple entries for the same metric, with just one datapoint per metric entry.

Perhaps this breaks a precondition that pmetrictest.CompareScopeMetrics() expects, which appears to be only one entry per unique metric name within a single ScopeMetrics list of metrics.
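
To make the two shapes concrete, here is a minimal sketch contrasting one metric entry per attribute set with one metric that carries one datapoint per attribute set; the function names and attribute values are hypothetical, only the metric name is taken from the failure output above.

```go
// Sketch contrasting the two metric shapes discussed above; function names
// and attribute values are hypothetical.
package sketch

import "go.opentelemetry.io/collector/pdata/pmetric"

// onePerKey mirrors the pattern being questioned: a new metric entry is
// appended for every key, all entries sharing the same name.
func onePerKey(sm pmetric.ScopeMetrics, clients []string) {
	for _, client := range clients {
		m := sm.Metrics().AppendEmpty()
		m.SetName("traces_service_graph_request_server_seconds")
		dp := m.SetEmptyHistogram().DataPoints().AppendEmpty()
		dp.Attributes().PutStr("client", client)
	}
}

// coalesced is the SDK-style aggregation: one metric entry with one
// datapoint per attribute set.
func coalesced(sm pmetric.ScopeMetrics, clients []string) {
	m := sm.Metrics().AppendEmpty()
	m.SetName("traces_service_graph_request_server_seconds")
	hist := m.SetEmptyHistogram()
	for _, client := range clients {
		dp := hist.DataPoints().AppendEmpty()
		dp.Attributes().PutStr("client", client)
	}
}
```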

@songy23
Member

songy23 commented Jul 16, 2024

+2 freq: #34092 #34094

jpkrohling pushed a commit that referenced this issue Jul 16, 2024
…Order option to CompareMetricsOption (#34076)

By introducing the new option and using it in the `servicegraphconnector` test, the flakiness of the test, and therefore the false positives, is resolved.

Fixes #33998

---------

Signed-off-by: odubajDT <[email protected]>
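
As a rough illustration, the change amounts to passing one more option in the connector test's comparison. The exact identifier added by #34076 is truncated in the commit title above, so the option name below is an assumed stand-in; the surrounding call mirrors the comparison sketch earlier in this thread.

```go
// Sketch only: "IgnoreMetricDataPointsOrder" is an assumed stand-in for the
// option introduced by #34076 (its exact name is truncated above); the other
// options are the ones already discussed in this thread.
func verifyMetricsWithNewOption(t *testing.T, expected, actual pmetric.Metrics) {
	require.NoError(t, pmetrictest.CompareMetrics(expected, actual,
		pmetrictest.IgnoreMetricsOrder(),
		pmetrictest.IgnoreMetricDataPointsOrder(), // assumed name of the new option
		pmetrictest.IgnoreTimestamp(),
		pmetrictest.IgnoreStartTimestamp(),
	))
}
```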
@mwear
Member

mwear commented Jul 16, 2024

@crobert-1
Member Author

+1 freq: #33994

@songy23
Member

songy23 commented Jul 16, 2024

Given the frequency of the flakiness, I would suggest skipping this test for now: #34120

codeboten pushed a commit that referenced this issue Jul 16, 2024
jpkrohling pushed a commit that referenced this issue Jul 17, 2024
…peMetrics metric entry (#34070)

Tests in `connector/servicegraph` were failing because the servicegraph
`buildMetrics()` code was creating multiple metric entries for the same
metric name within a single ScopeMetrics. Although this may not be
forbidden by the OTel specification, I think there is a general
assumption that a metric name does not appear more than once within the
same ScopeMetrics.
Instead, different values (e.g. with different sets of attribute values)
should be recorded as separate datapoints within the same metric.

The `pmetrictest.CompareScopeMetrics` test functionality is not designed
to handle multiple metric entries with the same `Name()`. Instead, it is
assumed that in cases where Order is ignored, the [first entry found in
the actual metrics which matches the name of the expected
metrics](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/ca9a8d14df471ed6a7dda7562ddfe659ec52127d/pkg/pdatatest/pmetrictest/metrics.go#L186-L195)
*must* be the metric to compare.

This fix changes the `buildMetrics()` code to create one metric per name
within a scope, and to add one datapoint to that metric per unique
datapoint attribute set (i.e. all entries in the internal maps
`serviceGraphConnector.req{Total,FailedTotal,ServerDurationSecondsCount}`
are coalesced into a single named metric as appropriate).
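
A rough sketch of the coalescing approach described above; the map layout, function name, and attribute handling are simplified stand-ins for the connector's internal fields, not the actual servicegraph code.

```go
// Rough sketch of coalescing per-key histogram state into a single metric
// with one datapoint per attribute set; names and map layout are simplified
// stand-ins for the connector's internal fields.
package sketch

import (
	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/pmetric"
)

func buildServerLatencyMetric(
	sm pmetric.ScopeMetrics,
	counts map[string]uint64,
	sums map[string]float64,
	buckets map[string][]uint64,
	dimensions map[string]pcommon.Map,
) {
	// A single metric entry per name within the scope...
	m := sm.Metrics().AppendEmpty()
	m.SetName("traces_service_graph_request_server_seconds")
	hist := m.SetEmptyHistogram()
	hist.SetAggregationTemporality(pmetric.AggregationTemporalityCumulative)

	// ...with one datapoint per unique attribute-set key, instead of one
	// metric entry per key.
	for key, count := range counts {
		dp := hist.DataPoints().AppendEmpty()
		dp.SetCount(count)
		dp.SetSum(sums[key])
		dp.BucketCounts().FromRaw(buckets[key])
		dimensions[key].CopyTo(dp.Attributes())
	}
}
```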

_Note_: I don't have any past experience working with
servicegraphconnector; I am just observing that
`collectClientLatencyMetrics()` and `collectServerLatencyMetrics()` both
range over the same map, `p.reqServerDurationSecondsCount`, although
the values collected in `collectClientLatencyMetrics()` come from
`p.reqServerDurationSeconds{Count,Sum,BucketCounts}` and the values
collected in `collectServerLatencyMetrics()` come from
`p.reqClientDurationSeconds{Count,Sum,BucketCounts}`.

This seems a little asymmetrical, but I don't have enough experience to
say whether this is an error or not.

**Description:**  Fixes #33998 

**Testing:** All other unit tests now complete, and the previously
failing unit test now works reliably.

**Documentation:** No documentation added. This is a unit test fix.