Anomaly Detection for Network Flows in Contrail Analytics

Virtualization and cloud computing are accelerating the shift in system monitoring from manual, to reactive, to proactive. Knowing when a host or service is down is certainly important, but the ready availability of additional metrics and events from both the application and the infrastructure lets us get ahead of the curve and manage performance and availability more effectively. One important aspect of being proactive is Anomaly Detection. Once an event notification infrastructure is in place, we can raise alerts based on static thresholds. But it’s often unclear what’s anomalous and what isn’t for a given deployment. Anomaly Detection via machine learning techniques can help address this.

Contrail Networking provides Alerts based on any aspect of the system’s Operational State. (In the OpenContrail system, Operational State is exposed as UVEs, or User Visible Entities, through the Analytics API.) We also provide an Anomaly Detection model based on time-series analysis of any metric. Based on past time-series information, we learn what to expect in the future. If a given metric reports values that are far from this expectation, we raise an Alert. Of course, this must all be configurable: the parameters and algorithms used for learning the expectation, as well as the threshold that causes the alert.

In this example, we will use statistical process control: real-time stream processing computes a running average and standard deviation, and each new value is examined against them. The metric being used is the number of flows being added on a Virtual Machine Interface on a host/vRouter. We run a multitier application with some clients: a Redmine webserver with a separate MySQL database. Then, we launch a TCP SYN attack on the webserver, which causes an unusual number of flows. This triggers an Alert.

Let’s dive in.

Setting up the Application

Our system has the “default-domain:demo” project with two Virtual Networks – “fe-front” and “be-vn”.
“fe-front” has the VM “fe-web” running a Redmine WebServer. It has a floating IP 10.84.53.83 for clients to access it.
“be-vn” has the VM “be-db”, running the MySQL database used by the Redmine WebServer.
This is the Network Monitoring page from the Contrail UI.

anomaly-detection-of-flows_blogpost_image1

Configuring the Alert

Every vRouter reports per-VMI (virtual machine interface) flow statistics against the VMI UVE every 30 seconds:

From controller/src/vnsw/agent/uve/vrouter.sandesh:
struct VrouterFlowRate {
    1: optional u32 added_flows;
    2: optional u32 max_flow_adds_per_second;
    3: optional u32 min_flow_adds_per_second;
    4: optional u32 deleted_flows;
    5: optional u32 max_flow_deletes_per_second;
    6: optional u32 min_flow_deletes_per_second;
    7: optional u32 active_flows;
}

The vRouter also calculates the Exponentially Weighted Mean (EWM) and Standard Deviation for added_flows, deleted_flows and active_flows as per the standard formulas:
$\mu_i = (1-\alpha)\,\mu_{i-1} + \alpha\,x_i$
$\sigma_i^2 = S_i = (1-\alpha)\left(S_{i-1} + \alpha\,(x_i - \mu_{i-1})^2\right)$
Here $x_i$ is the observation in the i-th step, $\mu_{i-1}$ is the previous estimate of the EWM, $S_{i-1}$ is the previous estimate of the variance, and $\alpha$ is the smoothing factor, between 0 and 1.

We will be using α (alpha) of 0.1 on added_flows in this example.
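
The two update formulas can be sketched in a few lines of Python. This is a minimal illustration, not the vRouter agent's code: the class name EwmStats, the zero starting values for the mean and variance, and the sample values in the loop are all assumptions made for the example.

```python
import math


class EwmStats:
    """Running exponentially weighted mean and variance for one metric."""

    def __init__(self, alpha=0.1):
        if not 0.0 < alpha < 1.0:
            raise ValueError("alpha must be strictly between 0 and 1")
        self.alpha = alpha
        self.mean = 0.0      # mu_{i-1}; starting at 0 is an assumption
        self.variance = 0.0  # S_{i-1}; starting at 0 is an assumption

    def update(self, x):
        """Fold in observation x and return the updated (mean, stddev)."""
        diff = x - self.mean  # x_i - mu_{i-1}, computed with the previous mean
        self.variance = (1.0 - self.alpha) * (self.variance + self.alpha * diff * diff)
        self.mean = (1.0 - self.alpha) * self.mean + self.alpha * x
        return self.mean, math.sqrt(self.variance)


stats = EwmStats(alpha=0.1)
for added_flows in [15, 17, 16, 18, 14]:  # made-up sample values
    mean, stddev = stats.update(added_flows)
```

Note that the variance must be updated before the mean, because the formula for the variance uses the previous mean. Only two numbers per metric need to be kept between samples, which is what makes this practical to run on every VMI.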

From controller/src/vnsw/agent/uve/interface.sandesh:
struct UveVMInterfaceAgent {
    1: string name (key="ObjectVMITable")

    26: optional vrouter.VrouterFlowRate flow_rate
    41: optional derived_stats_results.AnomalyResult added_flows_ewm
        (stats="flow_rate.added_flows:DSAnomaly:EWM:0.1")
    42: optional derived_stats_results.AnomalyResult deleted_flows_ewm
        (stats="flow_rate.deleted_flows:DSAnomaly:EWM:0.1")
    43: optional derived_stats_results.AnomalyResult active_flows_ewm
        (stats="flow_rate.active_flows:DSAnomaly:EWM:0.1")

}

(It is possible to override alpha by changing the vRouter configuration file at /etc/contrail/contrail-vrouter-agent.conf. The EWM calculation can also be disabled completely.)

The vRouter publishes the value of sigma as it runs the calculation on each VMI. This is a normalized measure of how far the last sample was from the mean, expressed in standard deviations. We can configure an Alert based on this. Let’s use a threshold of 2.5. (For a normal distribution, 98.8% of samples fall within μ ± 2.5σ.)
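
The 98.8% figure can be checked with the error function from the Python standard library. The sketch below also shows a threshold test; the function names are hypothetical, and the one-sided >= comparison is an assumption about how such a rule would be written, not something the Alert configuration is known to require.

```python
import math


def normal_coverage(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


def is_anomalous(sigma, threshold=2.5):
    """True when the normalized deviation reaches the threshold (one-sided, an assumption)."""
    return sigma >= threshold


print(f"{normal_coverage(2.5):.1%}")  # 98.8%
```

Flow counts are not guaranteed to be normally distributed, so the percentage is best read as a rough guide to how often a threshold of 2.5 would fire on ordinary traffic.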

Alerts can be configured at the global level for all objects, or at the project level for objects that are associated with a project, such as Virtual Networks or Virtual Machine Interfaces. Let’s configure this Alert under the default-domain:demo project. In the Contrail UI, go to the Alarm Rules tab under Configure -> Alarms -> Project -> default-domain -> demo. This is what the Alert will look like:

anomaly-detection-of-flows_blogpost_image2

We added an Alert named vmi-anomalous-added-flows. The EWM calculations for the flows added to a VMI every 30 seconds are available in the UVE as UveVMInterfaceAgent.added_flows_ewm. The Alert is raised against the VMI UVE, but it is also useful for the Alert contents to carry the UVE keys of the VM and the VN, so we add them as the variables UveVMInterfaceAgent.vm_name and UveVMInterfaceAgent.virtual_network.
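
To make the roles of the operand, the threshold and the variables concrete, here is a small sketch of how a rule of this shape could be evaluated against a UVE held as a nested dictionary. The rule layout, the helper names get_path and evaluate_rule, and the .sigma sub-attribute path are illustrative assumptions; this is not the Contrail Configuration API schema or the alarm generator's code.

```python
def get_path(uve, dotted_path):
    """Walk a nested dict along a dotted attribute path; return None if any part is missing."""
    node = uve
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def evaluate_rule(uve, rule):
    """Return the alert contents if the rule fires on this UVE, else None."""
    value = get_path(uve, rule["operand"])
    if value is None or value < rule["threshold"]:
        return None
    alert = {"name": rule["name"], rule["operand"]: value}
    for variable in rule["variables"]:
        # Variables do not affect whether the rule fires; they only enrich the alert.
        alert[variable] = get_path(uve, variable)
    return alert


# Hypothetical rule layout, for illustration only.
rule = {
    "name": "vmi-anomalous-added-flows",
    "operand": "UveVMInterfaceAgent.added_flows_ewm.sigma",  # sub-attribute name assumed
    "threshold": 2.5,
    "variables": [
        "UveVMInterfaceAgent.vm_name",
        "UveVMInterfaceAgent.virtual_network",
    ],
}
```

The point of the variables is visible in evaluate_rule: the condition is checked on one attribute, while the other attributes are copied into the alert so that whoever receives it can see which VM and VN are involved.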

The Application and the Anomaly

We will use a script to simulate clients accessing the Redmine Webserver via the floating IP. Each iteration waits a random 1 to 6 seconds and then fetches the page:

# while sleep "$(( RANDOM % 6 + 1 ))s"; do wget 10.84.53.83 -O /dev/null -o /dev/null; done

Client access creates flows between the clients and the Redmine Webserver, and between the Redmine Webserver and the database. This is visible in the VMI UVE, looking at the Analytics API:

anomaly-detection-of-flows_blogpost_image3

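Now, we will use the hping3 utility to launch a TCP SYN attack against the floating IP. The command below sends SYN packets (-S) to port 80 (-p 80) at intervals of 100,000 microseconds (-i u100000), which is about 10 packets per second: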

# hping3 10.84.53.83 -i u100000 -S -p 80
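
As a rough sanity check on what this rate means for the metric, the arithmetic can be written out in Python. The function name is hypothetical, and the idea that every SYN creates one new flow is an assumption; the 30-second window is the vRouter reporting interval mentioned earlier.

```python
def packets_per_window(interval_us, window_s):
    """Number of packets sent in window_s seconds at one packet per interval_us microseconds."""
    packets_per_second = 1_000_000 / interval_us
    return packets_per_second * window_s


print(packets_per_window(interval_us=100_000, window_s=30))  # 300.0
```

So a full reporting interval of this attack can add on the order of 300 flows, far above what the simulated clients generate with one request every few seconds.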

After 30 seconds, the vRouter sees a change in flow metrics:

anomaly-detection-of-flows_blogpost_image4

The VMI UVE now reports added_flows as 268. The exponentially weighted mean has moved from 16.22 to 42.49, and sigma is 2.997, which exceeds our threshold of 2.5. This triggers the Alert.
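
It may help to see why a sigma close to 3 is about as large as this measure can get with α = 0.1. The derivation below assumes that the published sigma is the distance of the sample from the updated mean divided by the updated standard deviation, (x_i − μ_i)/σ_i. The text only describes sigma as a normalized distance from the mean, so this exact definition is an assumption. It uses the EWM update formulas given earlier and holds whenever x_i differs from μ_{i-1}.

```latex
\begin{aligned}
x_i - \mu_i &= x_i - (1-\alpha)\,\mu_{i-1} - \alpha\,x_i = (1-\alpha)\,(x_i - \mu_{i-1}) \\
\sigma_i^2 &= (1-\alpha)\left(S_{i-1} + \alpha\,(x_i - \mu_{i-1})^2\right) \ge \alpha\,(1-\alpha)\,(x_i - \mu_{i-1})^2 \\
\frac{|x_i - \mu_i|}{\sigma_i} &\le \frac{(1-\alpha)\,|x_i - \mu_{i-1}|}{\sqrt{\alpha\,(1-\alpha)}\;|x_i - \mu_{i-1}|} = \sqrt{\frac{1-\alpha}{\alpha}} = 3 \quad \text{for } \alpha = 0.1
\end{aligned}
```

Under that assumption, a single sample can never push sigma above 3 when α is 0.1, and the bound is approached when the previous variance is small compared with the jump. A reported value of 2.997 would fit that picture, and it suggests that any threshold chosen for this α needs to sit below 3 to be reachable.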

In Contrail UI:

anomaly-detection-of-flows_blogpost_image5

In the Analytics streaming API:

anomaly-detection-of-flows_blogpost_image6

The Alert is against the VMI object, but its contents also report the “fe-front” Virtual Network and the “fe-web” VM.

Anomaly Detection Capabilities

We just saw an example of Anomaly Detection against the Networking Flows of a Virtual Machine Interface. Contrail Networking offers anomaly detection on other vRouter traffic metrics as well – Physical Interface Bandwidth and per-Physical Interface flows. In addition, we also do this for some controller metrics, such as BGP Route Updates.

All Alerts are configurable via the Contrail Configuration API.  To achieve scalability, the Anomaly Detection Algorithms run on the individual Contrail processes (contrail-vrouter-agent or contrail-control) that report the metrics. They can be adjusted on a per-process basis as required by changing configuration files.