Operational State in the OpenContrail System: UVEs (User-Visible Entities) through the Analytics API

Every element of the layered, horizontally scalable OpenContrail system has operational state to report. It's important to present this operational state to the user through abstractions that make the system simpler to understand and manage, while still providing enough detail when required.

Consider the Virtual Network. VMs on multiple compute nodes may be attached to the same virtual network, so operational state about this VirtualNetwork exists in the VRouterAgent processes on all of those compute nodes. We want to present a consolidated view of this VirtualNetwork across all these compute nodes.

But wait, there's more. VRouterAgents on compute nodes are not the only processes that report the Operational State of a VirtualNetwork. The SchemaTransformer on the Config Node also reports which other Virtual Networks the given VirtualNetwork is connected to. So we actually need to present a consolidated view of this VirtualNetwork across the whole system: compute nodes and config nodes.

In an earlier blog post, my colleague Megh Bhatt talked about Sandesh, OpenContrail's Analytics Interface. One of the features of Sandesh is the UVE mechanism, which allows processes to report their Operational State in a way that facilitates the aggregation of state across process types and process instances. This helps fulfill the promise of presenting the right abstractions for managing the OpenContrail system.


Figure 1 UVE Data Structure for Virtual Network

Aggregating across Process Instances

Let's go back to the VirtualNetwork example.

Here is part of the Sandesh UVE definition for the Virtual Network, as reported by VRouterAgents:

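 // Per-VRouterAgent view of one Virtual Network
 struct UveVirtualNetworkAgent {
 1: string name(key="ObjectVNTable")                 // key: identifies which Virtual Network this UVE describes
 2: optional bool deleted
 3: optional i32 total_acl_rules                     // no aggtype: all generators should report the same value
 4: optional i32 in_bandwidth_usage (aggtype="sum")  // summed across generators
 5: optional list<string> vm_list (aggtype="union")  // set union across generators
 }
 uve sandesh UveVirtualNetworkAgentTrace {
 1: UveVirtualNetworkAgent data;
 }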

As explained in the Sandesh blog post, all analytics information is recorded at the "Analytics Collector", which can be scaled horizontally. The processes in the OpenContrail system act as "Analytics Generators". Each one maintains a connection to one of the Collectors and sends information to it.

VRouterAgents on multiple compute nodes are Generators that independently send this "UveVirtualNetworkAgent" structure to their Analytics Collectors, yet the Analytics REST API presents a single instance of this structure, aggregated across all generators. The "aggtype" annotation controls this aggregation on a per-attribute basis.
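To make "aggregated across all generators" concrete, here is a minimal Python sketch of a Collector-side store that keeps each generator's contribution to a UVE separately, so the per-generator values are still available when the aggregated view is built. This is an illustration under assumptions, not OpenContrail's implementation: the class and method names (UveStore, update, contributions), the generator names and the UVE keys are all hypothetical, and deferring aggregation until the UVE is read is a design choice of this sketch.

```python
from collections import defaultdict


class UveStore:
    """Hypothetical Collector-side store: one entry per (UVE key, generator)."""

    def __init__(self):
        # uve_key -> generator -> {attribute: value}
        self._data = defaultdict(dict)

    def update(self, generator, uve_key, attrs):
        """Record or overwrite one generator's attributes for one UVE."""
        self._data[uve_key].setdefault(generator, {}).update(attrs)

    def contributions(self, uve_key):
        """Return each generator's raw contribution to a UVE.

        Nothing is merged here; in this sketch aggregation happens only
        when the UVE is read.
        """
        return {gen: dict(attrs) for gen, attrs in self._data.get(uve_key, {}).items()}
```

Two VRouterAgents reporting on the same Virtual Network simply produce two entries under the same key. Keeping them apart lets each attribute be merged according to its own aggtype at read time.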

  1. No aggtype annotation

For these attributes, we expect all generators to send the same value. A certain number of ACL rules is configured on this VirtualNetwork, and every VRouterAgent should be operating with the same rules. If all Generators do send the same value, we simply display that value. Otherwise, we display each unique value along with the list of Generators reporting it.

  2. aggtype="sum"

For these attributes (which must be integers), we add together the values sent by each generator and present the sum.

A VirtualNetwork occupies some input bandwidth on each compute node where it has VirtualMachines. We need to report the aggregate input bandwidth occupied by this VirtualNetwork across the entire system.

  3. aggtype="union"

For these attributes (which must be lists of strings), we treat each list as a set, take the union of the sets, and present the result.

Each VRouterAgent may have multiple VirtualMachines attached to this VirtualNetwork. We want to report a system-wide list of all VirtualMachines on this VirtualNetwork.
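The three rules above can be sketched in Python as a per-attribute dispatch over the generators' contributions to one UVE. This is a minimal illustration, not the Analytics API's code. The AGGTYPES table copies three annotated attributes from UveVirtualNetworkAgent. The function names are hypothetical. The sketch makes three further assumptions: values of unannotated attributes are hashable, the union result is sorted for a stable display order, and a disagreement is shown as a mapping from each unique value to the generators that reported it.

```python
# Hypothetical mirror of the aggtype annotations on UveVirtualNetworkAgent.
AGGTYPES = {
    "total_acl_rules": None,        # no annotation: values expected to match
    "in_bandwidth_usage": "sum",
    "vm_list": "union",
}


def aggregate_attr(aggtype, values_by_gen):
    """Aggregate one attribute; values_by_gen maps generator -> value."""
    if aggtype == "sum":
        return sum(values_by_gen.values())
    if aggtype == "union":
        merged = set()
        for vals in values_by_gen.values():
            merged.update(vals)          # treat each list as a set
        return sorted(merged)
    # No aggtype: a single value if everyone agrees, otherwise each
    # unique value together with the generators reporting it.
    by_value = {}
    for gen, val in values_by_gen.items():
        by_value.setdefault(val, []).append(gen)
    if len(by_value) == 1:
        return next(iter(by_value))
    return {val: sorted(gens) for val, gens in by_value.items()}


def aggregate_uve(contributions, aggtypes=AGGTYPES):
    """contributions maps generator -> {attribute: value} for one UVE key."""
    result = {}
    for attr, aggtype in aggtypes.items():
        values = {gen: attrs[attr] for gen, attrs in contributions.items() if attr in attrs}
        if values:
            result[attr] = aggregate_attr(aggtype, values)
    return result
```

Take two agents that both report 5 ACL rules, 100 and 250 units of input bandwidth, and VM lists ["vm-a", "vm-b"] and ["vm-b", "vm-c"]. They aggregate to 5 rules, 350 units and ["vm-a", "vm-b", "vm-c"].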

UVE Cache in the Generators

Generators report their Operational State to Collectors, which write it to a database. On user request, Operational State is read from the database and presented via the Analytics REST API. This mechanism must tolerate component failures.

We have High-Availability measures in place to handle failures in each of these layers of Analytics. Explaining all of them is beyond the scope of this discussion, but one mechanism in particular applies to UVEs.

Consider a case where the Collector loses contact with a Generator that may have been contributing state to multiple UVEs. The Collector will, in effect, remove this Generator's contribution from all of these UVEs.

Later, this Generator might re-establish contact with the same Collector or with another Collector. The goal is to get this Generator's state information back. For this reason, we maintain a UVE Cache in each Generator. When a Generator connects to a Collector, its first step is to sync its UVE Cache with the Collector, so that the system always reflects the correct, latest state of every connected Generator.
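The recovery behavior described above can be modeled in a short Python sketch. Here a Collector drops all of a Generator's contributions when the connection is lost, and a Generator replays its whole UVE Cache as the first step of any (re)connect. This is an illustrative model under assumptions, not OpenContrail code: the Collector and Generator classes and their methods (receive, disconnect, send_uve, connect, on_disconnect) are hypothetical, and real Sandesh connection handling, the database write and the Analytics REST API are left out.

```python
from collections import defaultdict


class Collector:
    """Hypothetical Collector keeping per-generator UVE state."""

    def __init__(self):
        self.state = defaultdict(dict)   # uve_key -> generator -> attrs

    def receive(self, generator, uve_key, attrs):
        self.state[uve_key].setdefault(generator, {}).update(attrs)

    def disconnect(self, generator):
        """Remove this generator's contribution from every UVE."""
        for key in list(self.state):
            self.state[key].pop(generator, None)
            if not self.state[key]:
                del self.state[key]


class Generator:
    """Hypothetical Generator with a local UVE Cache."""

    def __init__(self, name):
        self.name = name
        self.uve_cache = {}              # uve_key -> latest attrs reported
        self.collector = None

    def send_uve(self, uve_key, attrs):
        # Always update the local cache, even while no Collector is reachable.
        self.uve_cache.setdefault(uve_key, {}).update(attrs)
        if self.collector is not None:
            self.collector.receive(self.name, uve_key, attrs)

    def on_disconnect(self):
        self.collector = None

    def connect(self, collector):
        # First step on (re)connect: replay the entire cache so the
        # Collector reflects this generator's latest state.
        self.collector = collector
        for key, attrs in self.uve_cache.items():
            collector.receive(self.name, key, dict(attrs))
```

The cache is updated even while the Generator is disconnected, so a sync to a new Collector also carries any state that changed during the outage.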

The system-wide UVE state is accessible via the Analytics REST API, while the UVE cache of any individual Generator is accessible via its Introspect interface. For example, the VRouterAgent's Introspect port is 8085.


Figure 2 UVE Cache for Virtual Network on VRouterAgent

Conclusion

All processes in the OpenContrail system act as "Analytics Generators". These generators maintain a robust and resilient connection to one of the Analytics Collectors, and they send their Collector various kinds of information: Operational State, System Logs, and Statistics (such as flow/traffic information).

The UVE (User-Visible Entity) feature is the OpenContrail mechanism that allows processes to report their Operational State in a way that facilitates the aggregation of state across process types and process instances. This mechanism also keeps the view of Operational State consistent and up to date in the face of failures of Analytics Generators, Analytics Collectors and the connections between them.