Skip to content

Allow host cert renewals even if client auth strictness is false#4852

Merged
yadvr merged 1 commit intoapache:mainfrom
ippathways:sml-master-cert-renewal
Sep 1, 2021
Merged

Allow host cert renewals even if client auth strictness is false#4852
yadvr merged 1 commit intoapache:mainfrom
ippathways:sml-master-cert-renewal

Conversation

@Slair1
Copy link
Contributor

@Slair1 Slair1 commented Mar 22, 2021

Description

This PR allows the management servers to renew certificates issued to hosts, even if ca.plugin.root.auth.strictness is false. There is already a global setting named ca.framework.cert.automatic.renewal that should be controlling the certificate renewals. However, ca.plugin.root.auth.strictness currently circumvents that setting.

More Detail

The ca.plugin.root.auth.strictness setting is checked early in the client certificate checking method and exits the method without adding the certificate to the management server's in-memory certificate map if strictness is false. The certificate renewal process (if enabled with ca.framework.cert.automatic.renewal) reads that in-memory certificate map to determine what certs it could renew. Those two settings shouldn't be so closely related.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • [X ] Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

We've deployed and tested this in our CloudStack environments.

@DaanHoogland DaanHoogland added this to the 4.16.0.0 milestone Mar 23, 2021
@yadvr yadvr requested review from nvazquez and yadvr March 24, 2021 06:46
@Slair1 Slair1 force-pushed the sml-master-cert-renewal branch 4 times, most recently from 0d54a45 to c6ba1bf Compare March 24, 2021 16:26
@Slair1 Slair1 marked this pull request as draft March 24, 2021 16:28
@Slair1 Slair1 force-pushed the sml-master-cert-renewal branch from c6ba1bf to 3a90514 Compare March 24, 2021 18:15
@Slair1 Slair1 force-pushed the sml-master-cert-renewal branch 3 times, most recently from 8a83668 to b029acf Compare April 2, 2021 21:29
@Slair1 Slair1 marked this pull request as ready for review April 4, 2021 23:17
@Slair1
Copy link
Contributor Author

Slair1 commented Apr 4, 2021

Update PR with more tests and now when not in strict mode it will still log issues with certificates, but just accept the cert still.

@yadvr yadvr requested a review from sureshanaparti April 5, 2021 09:31
includes updated tests and logging
@Slair1 Slair1 force-pushed the sml-master-cert-renewal branch from b029acf to a56e844 Compare April 19, 2021 17:23
@Slair1
Copy link
Contributor Author

Slair1 commented May 12, 2021

Let me know if i can do anything else to help this along

@DaanHoogland
Copy link
Contributor

@sureshanaparti do you approve now?

Copy link
Contributor

@sureshanaparti sureshanaparti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM.

@yadvr
Copy link
Member

yadvr commented May 25, 2021

@blueorangutan package

@blueorangutan
Copy link

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 87

@yadvr
Copy link
Member

yadvr commented May 25, 2021

@blueorangutan test

@blueorangutan
Copy link

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian Build Failed (tid-753)

@blueorangutan
Copy link

Trillian test result (tid-764)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 32757 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4852-t764-kvm-centos7.zip
Smoke tests completed. 88 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@yadvr
Copy link
Member

yadvr commented Jun 18, 2021

Needs some manual testing/validation, otherwise lgtm

@nvazquez
Copy link
Contributor

@blueorangutan package

@blueorangutan
Copy link

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 338

@nvazquez
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-1070)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 37786 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4852-t1070-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Smoke tests completed. 87 look OK, 1 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
ContextSuite context=TestKubernetesCluster>:teardown Error 71.22 test_kubernetes_clusters.py

@nvazquez
Copy link
Contributor

@weizhouapache can you please test this PR?

@weizhouapache weizhouapache self-requested a review July 14, 2021 11:45
Copy link
Member

@weizhouapache weizhouapache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Slair1 @nvazquez
tested ok. except two small questions.

(1) change global settings (just for my testing, do not set it on production)
ca.framework.cert.expiry.alert.period , from 15 to 1500 (days)
ca.framework.background.task.delay, from 3600 to 60 (seconds)
ca.plugin.root.auth.strictness , from true to false.

(2) add debug info in server/src/main/java/org/apache/cloudstack/ca/CAManagerImpl.java

             try {
-                if (LOG.isTraceEnabled()) {
-                    LOG.trace("CA background task is running...");
+                if (LOG.isDebugEnabled()) {
+                    LOG.debug("CA background task is running...");
                 }

(3) without this patch
$ egrep "Certificate is going to expire|CA background task is running" /var/log/cloudstack/management/management-server.log

2021-07-15 07:12:03,827 DEBUG [o.a.c.c.CAManagerImpl] (BackgroundTaskPollManager-2:ctx-59f9ec37) (logid:7d7a59b2) CA background task is running...
2021-07-15 07:13:30,314 DEBUG [o.a.c.c.CAManagerImpl] (BackgroundTaskPollManager-5:ctx-e17c3499) (logid:de3f6c2a) CA background task is running...
2021-07-15 07:14:58,516 DEBUG [o.a.c.c.CAManagerImpl] (BackgroundTaskPollManager-4:ctx-13297c4e) (logid:4dc266a0) CA background task is running...
2021-07-15 07:16:26,420 DEBUG [o.a.c.c.CAManagerImpl] (BackgroundTaskPollManager-5:ctx-13d9b8c4) (logid:10203dcf) CA background task is running...

(4) with the patch

$ egrep "Certificate is going to expire|CA background task is running" /var/log/cloudstack/management/management-server.log

2021-07-15 09:12:13,759 DEBUG [o.a.c.c.CAManagerImpl] (BackgroundTaskPollManager-6:ctx-a913172f) (logid:4c23e662) CA background task is running...
2021-07-15 09:12:13,761 WARN [o.a.c.c.CAManagerImpl] (BackgroundTaskPollManager-6:ctx-a913172f) (logid:4c23e662) Certificate is going to expire for host id=5, uuid=060e2261-8214-4385-b0d4-ce90b6fe2786, name=s-79-VM, ip=10.0.36.8, zone id=1
2021-07-15 09:12:23,912 WARN [o.a.c.c.CAManagerImpl] (BackgroundTaskPollManager-6:ctx-a913172f) (logid:4c23e662) Certificate is going to expire for host id=2, uuid=fd8384ff-fe21-41e2-84ea-bc70e6d26b97, name=v-1-VM, ip=10.0.36.3, zone id=1
2021-07-15 09:12:34,978 WARN [o.a.c.c.CAManagerImpl] (BackgroundTaskPollManager-6:ctx-a913172f) (logid:4c23e662) Certificate is going to expire for host id=1, uuid=4ea38782-280c-4122-aea8-63682c16c115, name=ref-trl-1000-k-Mu20-wei-zhou-kvm2, ip=10.0.34.159, zone id=1
2021-07-15 09:12:40,425 WARN [o.a.c.c.CAManagerImpl] (BackgroundTaskPollManager-6:ctx-a913172f) (logid:4c23e662) Certificate is going to expire for host id=4, uuid=d9988a7b-7e02-4e90-9c4a-798c588b1cba, name=ref-trl-1000-k-Mu20-wei-zhou-kvm1, ip=10.0.33.203, zone id=1

final String errorMsg = String.format("Client certificate has expired with serial=%x, subject=%s from address=%s",
primaryClientCertificate.getSerialNumber(), primaryClientCertificate.getSubjectDN(), clientAddress);
LOG.error(errorMsg);
if (!allowExpiredCertificate) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Slair1 what if allowExpiredCertificate is true ? log the error and silently ignore it ?
should it be appended to exceptionMsg ?

@nvazquez
Copy link
Contributor

nvazquez commented Jul 16, 2021

Thanks @weizhouapache, looks good. @Slair1 can you please check the open comment? Then will be good to go

@yadvr
Copy link
Member

yadvr commented Jul 30, 2021

Ping @slavkap thanks for the PR, it seems like an important fix. Can you address the outstanding comment so we can move forward with this PR?

@slavkap
Copy link
Contributor

slavkap commented Jul 30, 2021

@rhtyd, unfortunately, I don't have anything with this PR

@yadvr
Copy link
Member

yadvr commented Aug 2, 2021

My bad @slavkap sorry I wrongly selected the Github ID when I typed `@sla... and selected your handle.

I meant @Slair1 can you address the outstanding comment so we can move forward with this PR?

@yadvr
Copy link
Member

yadvr commented Aug 9, 2021

@Slair1 can you address the outstanding comment so we can move forward with this PR?

@yadvr
Copy link
Member

yadvr commented Aug 19, 2021

@blueorangutan package

@blueorangutan
Copy link

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 929

@yadvr
Copy link
Member

yadvr commented Aug 19, 2021

@blueorangutan test

@blueorangutan
Copy link

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-1722)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 34368 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4852-t1722-kvm-centos7.zip
Smoke tests completed. 89 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@weizhouapache
Copy link
Member

@nvazquez @rhtyd
since we have 2 approvals (including 1 manual test) and integration test passed, can we merge this pr ?

@yadvr yadvr merged commit 76d5ce3 into apache:main Sep 1, 2021
@yadvr
Copy link
Member

yadvr commented Sep 1, 2021

Merging this based on your comment @weizhouapache

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants