Sandesh – An SDN Analytics Interface

In a previous blog posting titled “Debugging and System Visibility in a SDN Environment”, OpenContrail Software Engineer and my colleague, Anish Mehta, gave an overview of the challenges and opportunities facing SDN Analytics.
An SDN analytics solution needs the right abstractions, aggregation, and syncing mechanisms to report and present information that humans can use to operate a multi-tenant datacenter. Open APIs for both the northbound and the southbound analytics interfaces of the SDN controller are crucial for operating in a multi-vendor environment. The southbound interface here refers to the interface used to gather information from virtual and physical routers, gateways, and network services appliances. Traditionally, the southbound interface consists of protocols such as syslog, sFlow, NetFlow, and SNMP, which network elements use to report information, while proprietary CLI commands generally display the current operational state of the network elements. The OpenContrail Analytics node uses REST as the northbound interface and Sandesh as the southbound interface, as shown below.

Figure 1: OpenContrail Analytics Node

In this blog post, we present an overview of Sandesh, a southbound interface protocol that the OpenContrail Analytics engine uses to gather information from the OpenContrail virtual routers and from other software modules, such as the control-node and the configuration manager, which run as part of the controller. The name Sandesh comes from Sanskrit and means “message”. Sandesh consists of three components:

1. An Interface Definition Language (IDL) and code generator, based on Apache Thrift, that allows developers to specify messages.
2. An XML-based protocol used between the generators of the information and the collector of the information, which is the analytics engine.
3. A back-end library used by the generators that integrates the generated code into an asynchronous, queue-based sending mechanism.

The languages currently supported are C++, Python, and C. The C language support is limited to serialization and deserialization.
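
To make the third component more concrete, here is a minimal Python sketch of an asynchronous, queue-based sender. It is not the Sandesh back-end library: the class name QueuedSender, its methods, and the message strings are hypothetical, and the transport is just a callable standing in for the connection to the collector.

```python
import queue
import threading


class QueuedSender:
    """Hypothetical stand-in for a queue-based send path (not the real library)."""

    def __init__(self, transport):
        self._queue = queue.Queue()
        self._transport = transport  # callable that performs the actual send
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def send(self, message):
        # Returns immediately; the worker thread does the I/O later.
        self._queue.put(message)

    def _drain(self):
        while True:
            message = self._queue.get()
            if message is None:  # sentinel pushed by close()
                break
            self._transport(message)

    def close(self):
        # Everything queued before the sentinel is sent first (FIFO order).
        self._queue.put(None)
        self._worker.join()


sent = []  # pretend this list is the collector
sender = QueuedSender(sent.append)
for i in range(3):
    sender.send("<msg id='%d'/>" % i)
sender.close()
print(sent)
```

The point of the design is that the caller of send() never blocks on the network; ordering is preserved because a single worker drains a FIFO queue.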

This blog post concentrates on the IDL and explains how it allows the specification of abstractions, aggregation, and syncing mechanisms to be applied by the analytics engine to the messages defined in the IDL. The Juniper/contrail-sandesh GitHub repository contains source code to help you get started. Readers interested in learning more about the XML-based protocol and the back-end library can look under library/cpp/protocol for the C++ implementation and library/python/pysandesh/protocol for the Python implementation.

Sandesh IDL and code generator:

The Sandesh code generator allows developers to define, in a simple definition file, the data types and messages to be sent to the analytics engine (the collector). Taking that file as input, the code generator produces the code used to send the messages to the collector. This frees the developer from writing boilerplate code to serialize and transport objects to the collector. Developers are given a simple API for sending messages, while the generated code and the back-end library handle the low-level serialization/deserialization and send/receive.

The data types supported in the IDL file are bool, byte, i16, i32, i64, string, list, map, struct, sandesh, u16, u32, u64, and const static string. The sandesh data type is the top-level data type and identifies a unique message type sent from the generator to the collector. Developers can define different types of sandesh to convey different types of information to the collector.

Annotations are used in the IDL file to convey additional information, such as the abstraction to index the message against and the aggregation mechanism to apply (for example sum, append, or union). For example, the annotation (key="<Table-Name>") indicates that the message should be stored in a particular indexed table, such as the Virtual Network table, in the analytics database.

The sandesh code generator transforms the Sandesh IDL file (.sandesh) into source code, which is used by the generators. To generate the source from a sandesh file, run:

sandesh --gen <language> <sandesh filename>

For example, for a sandesh file named vns.sandesh, running sandesh --gen cpp vns.sandesh produces the following auto-generated C++ code and supporting files:

vns_types.h, vns_types.cpp, vns_constants.h, vns_constants.cpp, vns_html.cpp, vns_html_template.cpp, vns_request_skeleton.cpp, vns.html, vns.xsl, vns.xml, style.css

Similarly, running sandesh --gen py vns.sandesh produces the gen_py and vns Python packages. The following files are auto-generated in the vns package:

ttypes.py, constants.py, http_request.py, vns.xml, vns.xsl, vns.html, request_skeleton.py, style.css, index.html

The source code for the sandesh code generator can be accessed under the compiler/ directory.
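
If the generator is driven from a build script, the invocation described above can be assembled as in the following sketch. The helper name sandesh_command is hypothetical; the only things taken from this post are the command shape (sandesh --gen <language> <sandesh filename>) and the cpp, py, and html language values. Actually running the command assumes the sandesh compiler is installed and on the PATH.

```python
import shlex

# Language values that appear in this post; others may exist.
KNOWN_LANGUAGES = ("cpp", "py", "html")


def sandesh_command(language, idl_file):
    """Build the argument list for: sandesh --gen <language> <sandesh filename>."""
    if language not in KNOWN_LANGUAGES:
        raise ValueError("language not covered by this sketch: %r" % language)
    if not idl_file.endswith(".sandesh"):
        raise ValueError("expected a .sandesh file, got %r" % idl_file)
    return ["sandesh", "--gen", language, idl_file]


cmd = sandesh_command("cpp", "vns.sandesh")
print(shlex.join(cmd))

# To run it for real (requires the sandesh compiler to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```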

Sandesh Types

Generators need to convey different kinds of information: system logs indicating the occurrence of a system event, object state changes, statistics, lightweight tracing for deep-dive debugging, and the current state of internal data structures. Developers can define different types of sandesh to address each of these use cases.

1. systemlog
Use Case:
Structured log replacement for syslog.
Example:

systemlog sandesh BgpPeerTableMessageLog {
 1: string PeerType;
 2: "Peer";
 3: string Peer;
 4: "in table";
 5: string Table;
 6: ":";
 7: string Message;
 }

Notes:

systemlog can optionally have a (key="<Table-Name>") annotation, which can be used to correlate logs across different network elements. For example, (key="IPTable") can be used to correlate logs pertaining to a specific IP address across the physical routers and the virtual routers / SDN control plane. The const static strings defined in the message above (elements 2, 4, and 6) are used only for display purposes when querying the logs from the analytics database.
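
As an illustration of how the const static strings interleave with the typed fields at display time, the sketch below renders a BgpPeerTableMessageLog-shaped message in Python. The template representation, the render function, the space-separated output format, and the sample values are all assumptions made for this example; they are not taken from the analytics engine.

```python
# Elements 1-7 of BgpPeerTableMessageLog, in IDL order.
# "field" entries carry data; "literal" entries are the const static strings.
TEMPLATE = [
    ("field", "PeerType"),
    ("literal", "Peer"),
    ("field", "Peer"),
    ("literal", "in table"),
    ("field", "Table"),
    ("literal", ":"),
    ("field", "Message"),
]


def render(template, values):
    """Join literals and field values in IDL order (assumed display format)."""
    parts = []
    for kind, item in template:
        parts.append(item if kind == "literal" else str(values[item]))
    return " ".join(parts)


line = render(TEMPLATE, {
    "PeerType": "BGP",          # sample values, invented for illustration
    "Peer": "192.0.2.1",
    "Table": "inet.0",
    "Message": "Peer up",
})
print(line)  # BGP Peer 192.0.2.1 in table inet.0 : Peer up
```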

2. objectlog

Use Case:

Logging state transitions and lifetime events for objects (VirtualMachine, VirtualNetwork). objectlog is useful for performing historical state queries on an object. Objects have an object-id, which is indicated using the annotation (key="<Object-TableName>"). For example, RoutingInstanceInfo below has name as the key.

Example:

 struct RoutingInstanceInfo {
 1: string name (key="ObjectRoutingInstance");
 2: optional string route_distinguisher;
 3: optional string operation;
 4: optional string peer;
 5: optional string family;
 6: optional list<string> add_import_rt;
 7: optional list<string> remove_import_rt;
 8: optional list<string> add_export_rt;
 9: optional list<string> remove_export_rt;
 10: string hostname;
 }
 objectlog sandesh RoutingInstanceCollector {
 1: RoutingInstanceInfo routing_instance;
 }

Notes:

It is best practice to add the optional keyword to elements whose values do not change frequently and thus do not need to be sent with each message.
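
One way to picture a historical state query over objectlog messages is to replay the messages for one object-id up to a point in time, letting each message overwrite only the fields it actually carried. The Python sketch below does exactly that; the log layout, the state_at function, the timestamps, and the sample values are hypothetical and only mirror the field names of RoutingInstanceInfo above. It is not how the analytics engine is implemented.

```python
def state_at(log, object_id, timestamp):
    """Rebuild an object's state by replaying its messages up to `timestamp`.

    `log` is a list of (timestamp, object_id, fields) tuples. Optional fields
    that were not sent are simply absent from `fields`, so earlier values
    persist until a later message overwrites them.
    """
    state = {}
    for ts, name, fields in sorted(log, key=lambda entry: entry[0]):
        if name == object_id and ts <= timestamp:
            state.update(fields)
    return state


log = [
    (100, "blue", {"operation": "create", "route_distinguisher": "192.0.2.1:1"}),
    (150, "red", {"operation": "create"}),
    (200, "blue", {"add_import_rt": ["target:64512:1"]}),
    (300, "blue", {"operation": "delete"}),
]

print(state_at(log, "blue", 250))
print(state_at(log, "blue", 300)["operation"])
```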

3. uve  (User Visible Entities)

Use Case:

UVEs (User Visible Entities) are used to represent the system-wide state of externally visible objects. uve is a special case of objectlog. UVEs are used to display the operational state of an object such as a VirtualMachine or VirtualNetwork by aggregating information from uve sandesh messages across different generator types (configuration manager, virtual router, control node) and across nodes. Like objectlog, uve needs the key annotation.

Details and Example:

For example, consider the VirtualNetwork uve sandesh definition. We specify its state in two “tiers”: configuration manager and virtual router. The configuration manager tier is defined in virtual_network.sandesh in the src/controller/config/uve directory, and the virtual router tier is defined in virtual_network.sandesh in the src/controller/vnsw/agent/uve directory, of the Juniper/contrail-controller GitHub repository. For each tier, we return a single structure, even though a given virtual network might be present on many software modules in that tier. A VirtualNetwork might be present on many virtual routers; these virtual routers are expected to send uve sandesh messages when any attribute of the VirtualNetwork changes state. The virtual router tier of the VirtualNetwork UVE definition looks like:

 struct UveInterVnStats {
 1: string                  other_vn (aggtype="listkey");
 2: i64                     out_tpkts;
 3: i64                     in_tpkts;
 }
 struct UveVirtualNetworkAgent {
 1: string        name (key="ObjectVNTable");
 2: optional bool deleted;
 3: optional i32  total_acl_rules;
 4: optional i32  total_analyzers (aggtype="sum");
 5: optional i64  in_tpkts (aggtype="counter");
 7: optional i64  out_tpkts (aggtype="counter");
 9: optional list<UveInterVnStats> stat (aggtype="append");
 11: optional list<string> vm_list (aggtype="union");
 }
 uve sandesh UveVirtualNetworkAgentTrace {
 1: UveVirtualNetworkAgent         data;
 }

UVEs are a special case of objectlog, and hence each UVE has an object-id, which is denoted using the (key="<Object-TableName>") annotation. The annotation needs to be consistent across all the tiers of the UVE to allow the analytics engine to perform correlation and aggregation.

The aggtype annotation allows the developer to choose how the analytics engine should aggregate the attribute when it is sent by multiple generators and tiers. For example, for an attribute annotated with aggtype="sum", the analytics engine reports an aggregate value that is the sum of the values sent by all generators. In the VirtualNetwork uve sandesh definition, each virtual router tracks the number of analyzer instances attached to a VirtualNetwork using the attribute total_analyzers. The aggregate value reported by the analytics engine should be the sum of the values of this attribute across all virtual routers on which the VirtualNetwork exists.
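
The effect of the aggtype annotation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the functions aggregate and aggregate_uve, the generator names, and the sample values are hypothetical, "union" is assumed to mean a de-duplicated merge of lists, and "append" is assumed to mean concatenation. The "counter" and "listkey" types are left out because their semantics are not described in this post.

```python
# aggtype per attribute, copied from the UveVirtualNetworkAgent definition.
AGGTYPES = {"total_analyzers": "sum", "stat": "append", "vm_list": "union"}


def aggregate(aggtype, values):
    """Combine the values reported by different generators for one attribute."""
    if aggtype == "sum":
        return sum(values)
    if aggtype == "append":          # assumed: plain concatenation
        return [item for value in values for item in value]
    if aggtype == "union":           # assumed: merge lists, dropping duplicates
        merged = []
        for value in values:
            for item in value:
                if item not in merged:
                    merged.append(item)
        return merged
    raise ValueError("aggtype not covered by this sketch: %r" % aggtype)


def aggregate_uve(reports):
    """`reports` maps generator name -> attributes it sent for one UVE key."""
    result = {}
    for attribute, aggtype in AGGTYPES.items():
        values = [r[attribute] for r in reports.values() if attribute in r]
        if values:
            result[attribute] = aggregate(aggtype, values)
    return result


reports = {
    "vrouter-1": {"total_analyzers": 1, "vm_list": ["vm-a", "vm-b"]},
    "vrouter-2": {"total_analyzers": 2, "vm_list": ["vm-b", "vm-c"]},
}
print(aggregate_uve(reports))
# {'total_analyzers': 3, 'vm_list': ['vm-a', 'vm-b', 'vm-c']}
```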

4. trace

Use Case:

Lightweight in-memory buffer logs for frequently occurring events.

Example:

 trace sandesh XmppRxStream {
 1: "Received xmpp message from: ";
 2: string IPaddress;
 3: "Port";
 4: i32 port;
 5: "Size: ";
 6: i32 size;
 7: "Packet: ";
 8: string packet;
 9: "$";
 }

5. traceobject

Use Case:

Lightweight in-memory buffer logs for frequently occurring object state transitions.

Example:

 traceobject sandesh RoutingInstanceCreate {
 1: string name;
 2: list<string> import_rt;
 3: list<string> export_rt;
 4: string virtual_network;
 5: i32 index;
 }

Notes about trace and traceobject:

The developer needs to create a Sandesh trace buffer of a given size, in which the trace and traceobject sandesh are stored. HTTP Introspect (explained along with the request and response sandesh) can be used to view the trace buffer. Tracing to the buffer can be enabled or disabled, and multiple types of trace and traceobject sandesh can be traced into a single trace buffer.
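
The behavior described in these notes (fixed size, enable/disable, several message types sharing one buffer) maps naturally onto a ring buffer. The following Python sketch shows the idea; the TraceBuffer class, its methods, and the buffer name are hypothetical and do not reproduce the Sandesh library's trace API.

```python
from collections import deque


class TraceBuffer:
    """Hypothetical fixed-size in-memory trace buffer (not the Sandesh API)."""

    def __init__(self, name, size):
        self.name = name
        self.enabled = True
        # A deque with maxlen silently drops the oldest entry when full.
        self._entries = deque(maxlen=size)

    def trace(self, message_type, **fields):
        if self.enabled:
            self._entries.append((message_type, fields))

    def dump(self):
        # What an introspect-style request for the buffer might return.
        return list(self._entries)


buf = TraceBuffer("ExampleTrace", size=2)
buf.trace("XmppRxStream", IPaddress="192.0.2.1", port=5269, size=120)
buf.trace("RoutingInstanceCreate", name="blue", index=1)
buf.trace("XmppRxStream", IPaddress="192.0.2.2", port=5269, size=64)

buf.enabled = False
buf.trace("XmppRxStream", IPaddress="192.0.2.3", port=5269, size=10)  # ignored

for message_type, fields in buf.dump():
    print(message_type, fields)
```

Only the two most recent entries survive, and nothing is recorded while tracing is disabled.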

6. request and response

Use Case:

Request is used to send commands from a requestor to a generator. Response carries the reply from the generator back to the requestor. Request and response are used to dump internal data structures and provide the operational information from the software modules that is needed for in-depth debugging.

Example:

 request sandesh SandeshLoggingParamsSet {
 1: bool enable;
 2: string category;
 3: string level;
 }
 response sandesh SandeshLoggingParams {
 1: bool enable;
 2: string category;
 3: string level;
 }

Notes: 

The developer is expected to provide the implementation of the request handling function. For the above example, in C++ this means implementing the SandeshLoggingParamsSet::HandleRequest function, and in Python a bound method named handle_request is expected to be present in SandeshLoggingParamsSet.
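
The Python side of this contract can be sketched as follows. The two classes here are hand-written stand-ins for what the code generator would emit, and the logging_params dictionary, the handler function, and the sample category and level values are invented for illustration; the real generated classes, their base classes, and the way a response is sent back are not shown.

```python
# Hypothetical stand-ins for the generated request and response classes.
class SandeshLoggingParams:            # response sandesh
    def __init__(self, enable, category, level):
        self.enable = enable
        self.category = category
        self.level = level


class SandeshLoggingParamsSet:         # request sandesh
    def __init__(self, enable, category, level):
        self.enable = enable
        self.category = category
        self.level = level


# State owned by the software module (illustrative only).
logging_params = {"enable": False, "category": "", "level": "info"}


def handle_logging_params_set(request):
    """Developer-supplied handler: apply the request, describe the new state."""
    logging_params.update(
        enable=request.enable, category=request.category, level=request.level
    )
    return SandeshLoggingParams(**logging_params)


# Attaching the function to the class makes handle_request a bound method
# on every request instance.
SandeshLoggingParamsSet.handle_request = handle_logging_params_set

request = SandeshLoggingParamsSet(True, "bgp", "debug")
response = request.handle_request()
print(response.enable, response.category, response.level)
```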

HTTP Introspect

Sandesh is also used to implement debugging support in the OpenContrail controller and virtual router software modules. This debugging facility is called HTTP Introspect. When invoked with the --gen html option, the sandesh code generator produces an HTML form for each request sandesh, along with the stylesheets required to render the response sandesh. Each software module contains an embedded web/HTTP server that developers can use to dump the internal state of data structures, view trace messages, and perform other extensive debugging.

For example, to debug the BGP neighbor peering status on the control-node, the developer has defined a request sandesh and its response in bgp_peer.sandesh as:

 request sandesh BgpNeighborReq {
 1: string ip_address;
 2: string domain;
 }
 struct BgpNeighborResp {
 1: string peer;             // Peer name
 2: string peer_address (link="BgpNeighborReq");
 3: u32 peer_asn;
 }
 response sandesh BgpNeighborListResp {
 1: list<BgpNeighborResp> neighbors;
 }

To debug, a developer can access the web page at http://<IP-Address of control-node>:8083/Snh_BgpNeighborReq?ip_address=&domain= using curl or a web browser and get the XML data corresponding to the response sandesh from the control-node.
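
The URL pattern above (port 8083 on the control-node, a path of Snh_ followed by the request sandesh name, and the request fields as query parameters) can be built programmatically. In this Python sketch the helper name introspect_url is hypothetical and 192.0.2.10 is a placeholder address; the fetch itself is left commented out because it needs a reachable control-node.

```python
from urllib.parse import urlencode


def introspect_url(host, request_name, port=8083, **params):
    """Build an HTTP Introspect URL for a request sandesh and its fields."""
    return "http://%s:%d/Snh_%s?%s" % (host, port, request_name, urlencode(params))


# Empty field values, as in the example URL in the text.
url = introspect_url("192.0.2.10", "BgpNeighborReq", ip_address="", domain="")
print(url)

# Fetching the XML response (requires a reachable control-node):
# from urllib.request import urlopen
# with urlopen(url, timeout=5) as reply:
#     xml_text = reply.read().decode("utf-8")
```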

Developers specify the commands supported by the web server (GETs) by defining request sandesh. The data is returned in a response sandesh. The collector can also send a request sandesh, in which case the response sandesh sent to the collector is stored in the analytics database. Request and response sandesh thus provide a RESTful API for debugging the software modules.

Wrapping Up

Monitoring and running a multi-tenant data center requires strong SDN analytics. Exposing the right network abstractions and having appropriate aggregation and syncing mechanisms are crucial for analytics to scale in massively distributed systems. The importance of open APIs cannot be overstated in achieving the goal of SDN analytics: enabling humans to cost-effectively monitor and operate a multi-tenant, multi-vendor data center.

Most modern network operating systems support C++ and Python, so by using Sandesh as the southbound interface, the OpenContrail Analytics solution can provide a consolidated view and deep insight across virtual and physical networks and events, even in a multi-vendor environment. As this blog post illustrates, Sandesh is an open southbound interface and protocol that enables developers to expose the right abstractions and aggregation mechanisms required for SDN analytics. Integrating Sandesh on existing physical routers and switches in a multi-vendor environment is a matter of defining the right abstractions and aggregation mechanisms in the IDL and using the back-end library. In future blog entries we will discuss the OpenContrail Analytics Engine aggregation and syncing mechanisms, as well as the statistics collection mechanisms, in detail.