OpenContrail In Service Software Upgrade

This blog post walks through the procedure for an In Service Software Upgrade (ISSU) of OpenContrail.

OpenContrail is an OpenStack neutron plugin in the OpenStack environment. It has two primary components: the Contrail controller services, and the vRouter and associated services on the compute node. Since this is an OpenContrail ISSU, we assume that OpenContrail and OpenStack are installed on separate computational resources, as shown in Figure 1, and can be upgraded independently. In this post, we go over the procedure for an OpenContrail In Service Software Upgrade from Version 1 (v1) to Version 2 (v2).

OpenContrail ISS Upgrade Blog Image 1

As the first step, spawn a parallel v2 Contrail controller cluster and launch the ISSU task, as shown in Figure 2.

OpenContrail ISS Upgrade Blog Image 2

The ISSU task first sets up BGP peering between the v1 and v2 controllers, as shown in Figure 3.

OpenContrail ISS Upgrade Blog Image 3

The ISSU task then freezes the northbound communication of the v1 Contrail controller cluster and starts a run-time ISSU sync, as shown in Figure 4. Note that the datapath is not impacted during this stage.

OpenContrail ISS Upgrade Blog Image 4

It then reopens the northbound communication with OpenStack, as shown in Figure 5.

OpenContrail ISS Upgrade Blog Image 5

Note that the run-time ISSU config sync and the Contrail controller BGP peering ensure that all state generated in the v1 Contrail controller cluster is also available in the v2 Contrail controller cluster. The system is now ready for the compute node upgrade.

Admins can now perform rolling upgrades of the computes, individually or in batches, as shown in Figures 6 and 7. This lets an admin run any testing they want before upgrading all the computes.

OpenContrail ISS Upgrade Blog Image 6

OpenContrail ISS Upgrade Blog Image 7
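
The batch-wise rolling upgrade described above can be sketched roughly as follows. This is only an illustration of the control flow, not the actual fabric scripts: `upgrade`, `healthy`, and `rollback` are hypothetical callbacks that an admin would supply, and the policy of stopping at the first unhealthy batch and rolling back only that batch is an assumption.

```python
from typing import Callable, Iterator, List


def batches(hosts: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` compute hosts."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


def rolling_upgrade(
    hosts: List[str],
    size: int,
    upgrade: Callable[[str], None],    # hypothetical: upgrade one compute to v2
    healthy: Callable[[str], bool],    # hypothetical: admin's post-upgrade test
    rollback: Callable[[str], None],   # hypothetical: return one compute to v1
) -> List[str]:
    """Upgrade computes batch by batch and return the hosts left on v2.

    Assumed policy: stop at the first batch that fails its health check
    and roll back only that batch; earlier batches stay on v2.
    """
    upgraded: List[str] = []
    for batch in batches(hosts, size):
        for host in batch:
            upgrade(host)
        if not all(healthy(host) for host in batch):
            for host in batch:
                rollback(host)
            break
        upgraded.extend(batch)
    return upgraded
```

A batch size of 1 gives the one-at-a-time upgrade, while a larger size trades test granularity for speed.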

The admin completes all compute upgrades, as shown in Figure 8.

OpenContrail ISS Upgrade Blog Image 8

If an issue is detected, the admin can also roll back the upgraded computes, individually or in batches, as shown in Figure 9.

OpenContrail ISS Upgrade Blog Image 9

Once all computes are upgraded, the admin can initiate decommissioning of the v1 Contrail controller cluster. For that, the ISSU task freezes the northbound communication of the v1 Contrail controller with OpenStack and does a final ISSU config sync of state, as shown in Figure 10. Rolling back is not recommended from this step onwards.

OpenContrail ISS Upgrade Blog Image 10

The ISSU task finalizes the upgrade by decommissioning the v1 Contrail controller and setting the v2 Contrail controller as the new neutron plugin of OpenStack, as shown in Figure 11.

OpenContrail ISS Upgrade Blog Image 11

The ISSU task terminates and the upgrade is done, as shown in Figure 12.

OpenContrail ISS Upgrade Blog Image 12
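
The sequence walked through above can be summarized as an ordered list of phases. The sketch below is only a way to keep the ordering and the rollback advice in one place; the phase names are made up for illustration and do not correspond to identifiers in the ISSU scripts.

```python
from enum import Enum


class Phase(Enum):
    """Upgrade phases in the order described in this post (names are illustrative)."""
    SPAWN_V2_CONTROLLER = 1               # Figure 2
    BGP_PEER_V1_V2 = 2                    # Figure 3
    FREEZE_NORTHBOUND_AND_START_SYNC = 3  # Figure 4
    REOPEN_NORTHBOUND = 4                 # Figure 5
    UPGRADE_COMPUTES = 5                  # Figures 6 to 9
    FINAL_FREEZE_AND_SYNC = 6             # Figure 10
    SWITCH_NEUTRON_TO_V2 = 7              # Figure 11
    DONE = 8                              # Figure 12


# The post advises against rolling back from the final sync onwards.
LAST_ROLLBACK_FRIENDLY = Phase.UPGRADE_COMPUTES


def next_phase(current: Phase) -> Phase:
    """Return the phase that follows `current`."""
    if current is Phase.DONE:
        raise ValueError("upgrade already finished")
    return Phase(current.value + 1)


def rollback_advisable(current: Phase) -> bool:
    """True while the procedure is still before the final freeze and sync."""
    return current.value <= LAST_ROLLBACK_FRIENDLY.value
```

For example, `rollback_advisable(Phase.UPGRADE_COMPUTES)` is true, while `rollback_advisable(Phase.FINAL_FREEZE_AND_SYNC)` is false.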

As the procedure shows, the upgrade takes a hybrid approach: the Contrail controller cluster is upgraded side by side, while the computes are upgraded in place. Communication between the older and newer versions goes over standard protocols such as BGP, or through the ISSU task. This facilitates focused testing and makes the upgrade less error prone.

Note that the figures show all the Contrail controller services, including support services such as RabbitMQ and database services such as Zookeeper and Cassandra, running on the same computational node for illustration only. These support and database services could run external to the Contrail controller and can be shared between the old and new versions, because they are logically partitioned and so the two versions don't step on each other.
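
One way to picture the logical partitioning of shared services is to give each controller version its own namespace inside every shared service. The sketch below is purely hypothetical: the resource names, the prefixing scheme, and the `partition` helper are assumptions made for illustration and are not taken from the OpenContrail configuration.

```python
from typing import Dict


def partition(version: str) -> Dict[str, str]:
    """Return per-version namespaces for the shared services.

    All names here are hypothetical; the point is only that each
    controller version reads and writes its own namespace.
    """
    return {
        "cassandra_keyspace": f"{version}_config_db",
        "zookeeper_root": f"/{version}",
        "rabbitmq_vhost": f"/{version}",
    }


def overlapping(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, str]:
    """Return the services where two partitions would collide."""
    return {name: a[name] for name in a if a[name] == b.get(name)}


old = partition("v1")
new = partition("v2")
# An empty result means the two versions never touch the same namespace.
collisions = overlapping(old, new)
```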

In the screencast below, all the support and database services run on the OpenStack node, and HAProxy front-ends neutron. This avoids touching OpenStack components during the upgrade. For convenience, the ISSU task runs on the v2 Contrail controller, and the whole procedure is driven through fabric scripts, which are what this post refers to as the ISSU task.

In summary, a hybrid approach is taken here: the Contrail controller cluster is upgraded side by side, while the computes are upgraded in place, and the older and newer versions communicate over standard protocols such as BGP or through the ISSU task. This facilitates focused testing and makes the upgrade less error prone. It is not a hitless upgrade, but it is a minimal-hit upgrade with a lot of flexibility for admins. For example, connectivity to a VM that was spawned after the upgrade is not impacted even after a rollback!