Ignore dns_config changes for Autopilot clusters #8654

Merged

Contributor

@dpasiukevich dpasiukevich commented Aug 15, 2023

GKE Autopilot recently changed the default dns_config to:

dns_config {
  cluster_dns        = "CLOUD_DNS"
  cluster_dns_domain = "cluster.local"
  cluster_dns_scope  = "CLUSTER_SCOPE"
}

Customers are not allowed to modify dns_config in GKE Autopilot; it is a pre-configured setting.

But this change still affects Autopilot customers, because Terraform tries to converge back to dns_config = null (the original default value).

To fix this, and to align with the fact that dns_config cannot be modified in Autopilot, this PR makes google_container_cluster ignore dns_config changes.

Issues:
Fixes hashicorp/terraform-provider-google#15484
Fixes hashicorp/terraform-provider-google#15454

If this PR is for Terraform, I acknowledge that I have:

  • Searched through the issue tracker for an open issue that this either resolves or contributes to, commented on it to claim it, and written "fixes {url}" or "part of {url}" in this PR description. If there were no relevant open issues, I opened one and commented that I would like to work on it (not necessary for very small changes).
  • Ensured that all new fields I added that can be set by a user appear in at least one example (for generated resources) or third_party test (for handwritten resources or update tests).
  • Generated Terraform providers, and ran make test and make lint in the generated providers to ensure it passes unit and linter tests.
  • Ran relevant acceptance tests using my own Google Cloud project and credentials (If the acceptance tests do not yet pass or you are unable to run them, please let your reviewer know).
  • Read "Write release notes" before writing my release note below.

Release Note Template for Downstream PRs (will be copied)

container: updated `resource_container_cluster` to ignore `dns_config` diff when `enable_autopilot = true` 

Autopilot does not allow dns_config to be modified.

Recently, the default dns_config in Autopilot changed to:
```
dns_config {
  cluster_dns        = "CLOUD_DNS"
  cluster_dns_domain = "cluster.local"
  cluster_dns_scope  = "CLUSTER_SCOPE"
}
```

This breaks Autopilot customers, as Terraform tries to converge dns_config back to null.
@modular-magician
Collaborator

Hello! I am a robot. It looks like you are a: Community Contributor Googler Core Contributor. Tests will run automatically.

@SarahFrench, a repository maintainer, has been assigned to review your changes. If you have not received review feedback within 2 business days, please leave a comment on this PR asking them to take a look.

You can help make sure that review is quick by doing a self-review and by running impacted tests locally.

@dpasiukevich
Contributor Author

dpasiukevich commented Aug 15, 2023

To the reviewer: may I ask, does this change ensure that dns_config for Autopilot will be ignored in every command/stage for the resource_container_cluster resource?

I'm 95% confident it will be, but I'm double-checking (as I'm not that familiar with the codebase and Terraform commands).

@modular-magician
Collaborator

Hi there, I'm the Modular magician. I've detected the following information about your changes:

Diff report

Your PR generated some diffs in downstreams - here they are.

Terraform GA: Diff ( 1 file changed, 6 insertions(+), 5 deletions(-))
Terraform Beta: Diff ( 1 file changed, 6 insertions(+), 5 deletions(-))

@dpasiukevich
Contributor Author

Ran the testacc

make testacc TEST=./google/services/container TESTARGS='-run=TestAccContainerCluster_withDNSConfig'
TF_ACC=1 TF_SCHEMA_PANIC_ON_ERROR=1 go test ./google/services/container -v -run=TestAccContainerCluster_withDNSConfig -timeout 240m -ldflags="-X=github.com/hashicorp/terraform-provider-google/version.ProviderVersion=acc"
=== RUN   TestAccContainerCluster_withDNSConfig
=== PAUSE TestAccContainerCluster_withDNSConfig
=== CONT  TestAccContainerCluster_withDNSConfig
--- PASS: TestAccContainerCluster_withDNSConfig (633.28s)
PASS
ok      github.com/hashicorp/terraform-provider-google/google/services/container        633.637s

@SarahFrench
Collaborator

I also think that it'll work; the suppressDiffForAutopilot function will return true, causing diffs to be suppressed, if the enable_autopilot field is set to true on the resource.

The best way to confirm it would be with a test. I can see that the acceptance tests that run every night include TestAccContainerCluster_autopilot_minimal, and this test is currently failing due to the issue you highlighted (see below). Hopefully the automated tests running on this PR will show the test passing.

------- Stdout: -------
=== RUN   TestAccContainerCluster_autopilot_minimal
=== PAUSE TestAccContainerCluster_autopilot_minimal
=== CONT  TestAccContainerCluster_autopilot_minimal
    vcr_utils.go:152: Step 1/2 error: After applying this test step, the plan was not empty.
        stdout:
        Terraform used the selected providers to generate the following execution
        plan. Resource actions are indicated with the following symbols:
        -/+ destroy and then create replacement
        Terraform will perform the following actions:
          # google_container_cluster.primary must be replaced
        -/+ resource "google_container_cluster" "primary" {

... omitted a bunch of stuff ...

              - dns_config { # forces replacement
                  - cluster_dns        = "CLOUD_DNS" -> null
                  - cluster_dns_domain = "cluster.local" -> null
                  - cluster_dns_scope  = "CLUSTER_SCOPE" -> null
                }
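
For illustration, the check described at the top of this comment can be sketched in Go roughly as follows. This is a minimal, self-contained sketch under stated assumptions, not the provider's actual code: `fieldGetter`, `mapData`, and `suppressDiffWhenAutopilot` are hypothetical names, and `fieldGetter` stands in for the Terraform SDK's resource data so the example runs without the SDK. The argument order (key, old value, new value, resource data) follows the usual shape of a diff-suppress callback.

```go
package main

import "fmt"

// fieldGetter is a hypothetical stand-in for the single method of the
// Terraform SDK's resource data that this sketch needs.
type fieldGetter interface {
	Get(key string) interface{}
}

// mapData is a hypothetical in-memory fieldGetter used only for this example.
type mapData map[string]interface{}

func (m mapData) Get(key string) interface{} { return m[key] }

// suppressDiffWhenAutopilot reports whether a diff on the given key should be
// ignored. It ignores the key and both values on purpose: the only thing that
// matters is whether enable_autopilot is true on the resource.
func suppressDiffWhenAutopilot(key, oldVal, newVal string, d fieldGetter) bool {
	// The comma-ok assertion yields false when the field is unset or not a bool.
	enabled, _ := d.Get("enable_autopilot").(bool)
	return enabled
}

func main() {
	autopilot := mapData{"enable_autopilot": true}
	standard := mapData{"enable_autopilot": false}

	// Old value comes from the API; new value is empty because the
	// configuration omits dns_config.
	fmt.Println(suppressDiffWhenAutopilot("dns_config.0.cluster_dns", "CLOUD_DNS", "", autopilot)) // true
	fmt.Println(suppressDiffWhenAutopilot("dns_config.0.cluster_dns", "CLOUD_DNS", "", standard))  // false
}
```

With a check of this shape attached to dns_config, the `"CLOUD_DNS" -> null` entries in the plan above would be suppressed for Autopilot clusters, while standard clusters would still surface dns_config diffs.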

@modular-magician
Collaborator

Tests analytics

Total tests: 2951
Passed tests: 2647
Skipped tests: 302
Affected tests: 2

Action taken

Found 2 affected test(s) by replaying old test recordings. Starting RECORDING based on the most recent commit. Affected tests:
TestAccCertificateManagerCertificate_certificateManagerSelfManagedCertificateExample
TestAccCertificateManagerCertificate_certificateManagerGoogleManagedCertificateIssuanceConfigExample

Get to know how VCR tests work

@modular-magician
Collaborator

Tests passed during RECORDING mode:
TestAccCertificateManagerCertificate_certificateManagerSelfManagedCertificateExample [Debug log]
TestAccCertificateManagerCertificate_certificateManagerGoogleManagedCertificateIssuanceConfigExample [Debug log]

Rerun these tests in REPLAYING mode to catch issues

No issues found for passed tests after REPLAYING rerun.

All tests passed!
View the build log or the debug log for each test

@SarahFrench
Collaborator

👆 I think that because the tests are running off recorded API responses, the problem isn't apparent above. I'll manually run a test using this PR's changes soon.

@dpasiukevich
Contributor Author

dpasiukevich commented Aug 16, 2023

Cool, thanks!

I also already verified the change manually yesterday (with an override to the built provider), and it works as expected.

PS: this RECORDING/REPLAYING mode looks interesting. It handled the unrelated flaky test gracefully, IIUC?

@dpasiukevich
Contributor Author

@SarahFrench may I ask for a review, if possible?

I have manually verified Autopilot cluster create/update with resource_container_cluster.

Thanks!

@SarahFrench SarahFrench requested review from slevenick and removed request for SarahFrench August 17, 2023 18:33
@slevenick
Contributor

I'm fine with approving this, but I have concerns about the issue impacting earlier versions of the provider. I'd like to wait on merging until we have a discussion around other potential fixes for this issue.

@dpasiukevich
Contributor Author

> I'm fine with approving this, but I have concerns about the issue impacting earlier versions of the provider. I'd like to wait on merging until we have a discussion around other potential fixes for this issue

This PR is needed anyway. It follows from the fact that consumers of the GKE Autopilot API cannot modify dns_config, so there's no point in Terraform tracking this field.
Ideally, this change would have been made when the enable_autopilot flag was introduced in this resource.

Holding this PR won't make anything better. It's definitely better to split off the discussion about the breaking change in the Autopilot API and move it to the issues.

Contributor

@slevenick slevenick left a comment

This is sort of a special case because of Autopilot, but we need to seriously consider the cost of this sort of API change and how it impacts Terraform users going forward.

@slevenick slevenick merged commit 06fa78a into GoogleCloudPlatform:main Aug 17, 2023
17 checks passed
joelkattapuram pushed a commit to joelkattapuram/magic-modules that referenced this pull request Sep 20, 2023
…#8654)
