Upgrade: check systemvm template before db changes#4582

Merged
yadvr merged 3 commits intoapache:4.15from
ustcweizhou:4.15-check-systemvm-template-before-db
Feb 24, 2021

@ustcweizhou
Contributor

Description

Currently, during an upgrade CloudStack checks the systemvm template only after the DB changes have been applied.
If the user has not registered the systemvm template and upgrades CloudStack (for example, by installing packages of a newer version), the upgrade fails. In this case it is very complicated to roll back the DB if there is no DB backup.

This PR checks the latest (only the last) systemvm template in the upgrade path before any DB changes are made, and exits if the required systemvm template is not registered.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

@ustcweizhou
Contributor Author

@davidjumani @rhtyd

@yadvr
Member

yadvr commented Jan 12, 2021

@ustcweizhou isn't the whole db upgrade path wrapped in a transaction? I suppose the whole transaction would fail without committing anything to the DB?

@weizhouapache
Member

@rhtyd
Unfortunately, no.
In each version upgrade, a transaction is started, committed and finally closed.
https://github.com/apache/cloudstack/blob/master/engine/schema/src/main/java/com/cloud/upgrade/DatabaseUpgradeChecker.java#L240-L310
It is bad, but not the worst.

The new systemvm template is checked in performDataMigration. Before that, the SQL in the upgrade script has already been committed:
(1) when autocommit is true, in https://github.com/apache/cloudstack/blob/master/framework/db/src/main/java/com/cloud/utils/db/ScriptRunner.java#L148-L150
(2) when autocommit is false, in https://github.com/apache/cloudstack/blob/master/framework/db/src/main/java/com/cloud/utils/db/ScriptRunner.java#L181-L183
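To illustrate why such a failure is hard to undo, here is a minimal, self-contained model assuming the behavior described above: each version step commits its schema script before its data migration checks the template. `PerStepCommitDemo` and its `Step` type are hypothetical stand-ins, not CloudStack classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not CloudStack code: each version step commits its
// schema script first and only then checks the systemvm template.
final class PerStepCommitDemo {

    /** Stand-in for one version-to-version upgrade step. */
    static final class Step {
        final String name;
        final boolean templateRegistered;

        Step(String name, boolean templateRegistered) {
            this.name = name;
            this.templateRegistered = templateRegistered;
        }
    }

    /** Returns what ends up committed; stops at the first step whose template is missing. */
    static List<String> runUpgrade(List<Step> path) {
        List<String> committed = new ArrayList<>();
        for (Step step : path) {
            committed.add(step.name + ":schema");   // script already committed
            if (!step.templateRegistered) {
                return committed;                   // fails here; earlier commits stay
            }
            committed.add(step.name + ":data");     // data migration committed
        }
        return committed;
    }

    public static void main(String[] args) {
        List<Step> path = List.of(new Step("4.13->4.14", true), new Step("4.14->4.15", false));
        System.out.println(runUpgrade(path));
    }
}
```

With a two-step path where only the second step's template is missing, the result contains the first step's schema and data changes plus the second step's schema changes, i.e. the DB is left partially upgraded with nothing to roll back to.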

The options are:
(1) check the systemvm template at the beginning of the whole upgrade, or
(2) check the systemvm template at each version upgrade.
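Option (1), checking only the last template-aware step in the path as this PR describes, might look roughly like the sketch below. `PreflightTemplateCheck`, its `Step` type and the template names are hypothetical stand-ins for `DbUpgrade`/`DbUpgradeSystemVmTemplate`; the only point is that the check runs before any step touches the DB.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of option (1); Step and the template names are illustrative.
final class PreflightTemplateCheck {

    /** Stand-in for an upgrade step; requiredTemplate is null when the step needs none. */
    static final class Step {
        final String name;
        final String requiredTemplate;

        Step(String name, String requiredTemplate) {
            this.name = name;
            this.requiredTemplate = requiredTemplate;
        }
    }

    /**
     * Walks the path backwards, verifies only the LAST step that needs a systemvm
     * template, and throws before any DB change if that template is not registered.
     */
    static void checkBeforeUpgrade(List<Step> path, Set<String> registeredTemplates) {
        for (int i = path.size() - 1; i >= 0; i--) {
            String required = path.get(i).requiredTemplate;
            if (required == null) {
                continue; // this step does not need a systemvm template
            }
            if (!registeredTemplates.contains(required)) {
                throw new IllegalStateException("systemvm template " + required
                        + " is not registered; aborting before any DB change");
            }
            return; // earlier template-aware steps are not checked
        }
    }

    public static void main(String[] args) {
        List<Step> path = List.of(
                new Step("4.13->4.14", "systemvm-4.14"),
                new Step("4.14->4.15", "systemvm-4.15"),
                new Step("4.15.0->4.15.1", null));
        checkBeforeUpgrade(path, Set.of("systemvm-4.15"));
        System.out.println("template check passed; upgrade steps may now run");
    }
}
```

Checking only the last template keeps the pre-check cheap, on the assumption that the newest template is the one the upgraded management server will actually use. Option (2) would check each step as it runs, which, given the per-step commits above, could still leave earlier steps applied.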

@ustcweizhou ustcweizhou force-pushed the 4.15-check-systemvm-template-before-db branch from 575cbcd to 7e9babc January 12, 2021 12:33
Contributor

@RodrigoDLopez RodrigoDLopez left a comment

code lgtm

Contributor

@DaanHoogland DaanHoogland left a comment

good functionality, good code.

Comment on lines +241 to +265
for (int i = upgrades.length - 1; i >= 0; i--) {
    DbUpgrade upgrade = upgrades[i];
    if (upgrade instanceof DbUpgradeSystemVmTemplate) {
        TransactionLegacy txn = TransactionLegacy.open("Upgrade");
        txn.start();
        try {
            Connection conn;
            try {
                conn = txn.getConnection();
            } catch (SQLException e) {
                String errorMessage = "Unable to upgrade the database";
                s_logger.error(errorMessage, e);
                throw new CloudRuntimeException(errorMessage, e);
            }
            ((DbUpgradeSystemVmTemplate)upgrade).updateSystemVmTemplates(conn);
            break;
        } catch (CloudRuntimeException e) {
            String errorMessage = "Unable to upgrade the database";
            s_logger.error(errorMessage, e);
            throw new CloudRuntimeException(errorMessage, e);
        } finally {
            txn.close();
        }
    }
}
Contributor

can we put this in a separate method?

Member

@DaanHoogland yes, done

@DaanHoogland
Contributor

test proposal:

  1. don't upload templates,
  2. upgrade,
  3. start,
  4. stop,
  5. downgrade,
  6. start,
  7. upload templates,
  8. stop,
  9. start.

Agree?

@weizhouapache
Member

@DaanHoogland yes, exactly

@weizhouapache
Member

@DaanHoogland @rhtyd
Can you kick off a Trillian test?

@shwstppr shwstppr added this to the 4.15.1.0 milestone Jan 25, 2021
@shwstppr
Contributor

@blueorangutan package

@blueorangutan

@shwstppr a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2583

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@weizhouapache
Member

The Jenkins build failed at the point below. Close and reopen this PR to retry.

[INFO] --< org.apache.cloudstack:cloud-plugin-integrations-cloudian-connector >--
[INFO] Building Apache CloudStack Plugin - Cloudian Connector 4.15.1.0-SNAPSHOT [74/116]
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- maven-checkstyle-plugin:3.1.0:check (cloudstack-checkstyle) @ cloud-plugin-integrations-cloudian-connector ---
[INFO] Starting audit...
Audit done.
[INFO] 
[INFO] --- jacoco-maven-plugin:0.8.3:prepare-agent (prepare-coverage-agent) @ cloud-plugin-integrations-cloudian-connector ---
[INFO] argLine set to -javaagent:/home/jenkins/.m2/repository/org/jacoco/org.jacoco.agent/0.8.3/org.jacoco.agent-0.8.3-runtime.jar=destfile=/home/jenkins/jenkins-agent/workspace/Cloudstack/cloudstack-pr-analysis/plugins/integrations/cloudian/target/jacoco.exec
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.3:process (default) @ cloud-plugin-integrations-cloudian-connector ---
[INFO] 
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ cloud-plugin-integrations-cloudian-connector ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.8.1:compile (default-compile) @ cloud-plugin-integrations-cloudian-connector ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 10 source files to /home/jenkins/jenkins-agent/workspace/Cloudstack/cloudstack-pr-analysis/plugins/integrations/cloudian/target/classes
[INFO] 
[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ cloud-plugin-integrations-cloudian-connector ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/jenkins/jenkins-agent/workspace/Cloudstack/cloudstack-pr-analysis/plugins/integrations/cloudian/src/test/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.8.1:testCompile (default-testCompile) @ cloud-plugin-integrations-cloudian-connector ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 1 source file to /home/jenkins/jenkins-agent/workspace/Cloudstack/cloudstack-pr-analysis/plugins/integrations/cloudian/target/test-classes
[INFO] 
[INFO] --- maven-surefire-plugin:2.22.2:test (default-test) @ cloud-plugin-integrations-cloudian-connector ---
[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.cloudstack.cloudian.CloudianClientTest
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
log4j:WARN No appenders could be found for logger (org.apache.cloudstack.cloudian.client.CloudianClient).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[ERROR] Tests run: 25, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.1 s <<< FAILURE! - in org.apache.cloudstack.cloudian.CloudianClientTest
[ERROR] addUserAccountFail(org.apache.cloudstack.cloudian.CloudianClientTest)  Time elapsed: 5.447 s  <<< ERROR!
org.apache.cloudstack.api.ServerApiException: Operation timed out, please try again.
	at org.apache.cloudstack.cloudian.CloudianClientTest.addUserAccountFail(CloudianClientTest.java:145)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   CloudianClientTest.addUserAccountFail:145 » ServerApi Operation timed out, ple...
[INFO] 
[ERROR] Tests run: 25, Failures: 0, Errors: 1, Skipped: 0

@blueorangutan

Trillian test result (tid-3413)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 37454 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4582-t3413-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Smoke tests completed. 86 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

s_logger.error(errorMessage, e);
throw new CloudRuntimeException(errorMessage, e);
} finally {
txn.close();
Contributor

@Pearl1594 Pearl1594 Feb 1, 2021

@ustcweizhou I believe txn.commit() needs to be called before closing the transaction, as the DB changes made by updateSystemVmTemplates() get rolled back:

Rolling back the transaction: Time = 22 Name =  Upgrade; called by -TransactionLegacy.rollback:888-TransactionLegacy.removeUpTo:831-TransactionLegacy.close:655-DatabaseUpgradeChecker.updateSystemVmTemplates:264-DatabaseUpgradeChecker.upgrade:275-DatabaseUpgradeChecker.check:379-CloudStackExtendedLifeCycle.checkIntegrity:64-CloudStackExtendedLifeCycle.start:54-DefaultLifecycleProcessor.doStart:182-DefaultLifecycleProcessor.access$200:53-DefaultLifecycleProcessor$LifecycleGroup.start:360-DefaultLifecycleProcessor.startBeans:158
2021-02-01 09:50:57,795 DEBUG [c.c.u.DatabaseUpgradeChecker] (main:null) (logid:) Running upgrade Upgrade41500to41510 to upgrade from 4.15.0.0-4.15.1.0 to 4.15.1.0
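The effect can be reproduced with a toy transaction whose `close()` discards uncommitted work, as the `TransactionLegacy.close` -> `rollback` frames in the log above suggest. `FakeTxn` and the configuration row are hypothetical; only the commit-before-close ordering is the point of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; FakeTxn is not CloudStack's TransactionLegacy.
final class CommitBeforeCloseDemo {

    /** close() discards anything not yet committed, like a rollback on close. */
    static final class FakeTxn implements AutoCloseable {
        private final List<String> db;
        private final List<String> pending = new ArrayList<>();

        FakeTxn(List<String> db) {
            this.db = db;
        }

        void write(String row) {
            pending.add(row);
        }

        void commit() {
            db.addAll(pending);
            pending.clear();
        }

        @Override
        public void close() {
            pending.clear(); // uncommitted work is rolled back
        }
    }

    /** Simulates a template update with or without a commit before close. */
    static List<String> updateTemplates(boolean commitBeforeClose) {
        List<String> db = new ArrayList<>();
        try (FakeTxn txn = new FakeTxn(db)) {
            txn.write("router.template.kvm=<new template name>"); // illustrative row
            if (commitBeforeClose) {
                txn.commit();
            }
        }
        return db;
    }

    public static void main(String[] args) {
        System.out.println("without commit: " + updateTemplates(false));
        System.out.println("with commit:    " + updateTemplates(true));
    }
}
```

Without the commit, the write is silently discarded when the transaction closes, which matches the symptom reported below where the management server starts but the template settings are unchanged.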

Member

@Pearl1594 thanks for testing.
Could you tell me how to reproduce the issue?

Contributor

@weizhouapache I tested this by merging this code into master. However, the steps are equivalent to:

  1. Deploy a 4.14 env
  2. Register the template (4.15 template)
  3. upgrade to 4.15
  4. The management server comes up fine, but if we check the cloud.configuration values in the DB for the "router.template." settings, the template name has not been updated to the new one.

The case of not applying the DB changes when template isn't registered, works absolutely fine.

Member

@Pearl1594 yes, you are right!
Pushed a change.
Thanks for reviewing and testing.

@yadvr
Member

yadvr commented Feb 2, 2021

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✖centos7 ✖centos8 ✖debian. JID-2630

@Pearl1594
Contributor

@blueorangutan package

@blueorangutan

@Pearl1594 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✖centos8 ✔debian. JID-2636

Contributor

@Pearl1594 Pearl1594 left a comment

LGTM

@yadvr yadvr closed this Feb 19, 2021
@yadvr yadvr reopened this Feb 19, 2021
@yadvr
Member

yadvr commented Feb 19, 2021

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✖centos7 ✖centos8 ✖debian. JID-2754

@shwstppr
Contributor

@blueorangutan package

@blueorangutan

@shwstppr a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✖centos7 ✔centos8 ✔debian. JID-2807

@apache apache deleted a comment from blueorangutan Feb 23, 2021
@yadvr
Member

yadvr commented Feb 23, 2021

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2815

@DaanHoogland
Contributor

I don't think we need this, as the upgrade was tested manually by @Pearl1594. Better safe, though.
@blueorangutan test

@blueorangutan

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-3608)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 32793 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4582-t3608-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_human_readable_logs.py
Smoke tests completed. 86 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

Contributor

@shwstppr shwstppr left a comment

LGTM


@Override
public void performDataMigration(Connection conn) {
updateSystemVmTemplates(conn);
Member

+1. The important thing to remember now is not to run this again.

@yadvr yadvr merged commit 5a3ae15 into apache:4.15 Feb 24, 2021
@weizhouapache weizhouapache mentioned this pull request Mar 12, 2021