Making Networks Cool Again through Network Softwarization

Computer networks are complex and can be very difficult to manage. In a typical network, one can find many kinds of equipment, ranging from forwarding elements such as routers and switches to middleboxes, devices that perform a wide range of networking tasks: firewalls, network address translators (NATs), load balancers, intrusion detection/prevention systems, etc.

For the past few decades, network operators have been relying on a handful of equipment vendors that provide proprietary and vertically integrated hardware running complex, closed and proprietary control software. The software implements network protocols that undergo years of standardization and interoperability testing. Because of the lack of network programmability and flexible management interfaces, network administrators typically configure individual network devices using tedious and error-prone manual configuration methods.

This mode of operation has slowed innovation, increased complexity, and inflated both the capital and operational costs of running a network.

The recent trend toward Network Softwarization is driving an unprecedented techno-economic shift in the Telecom and ICT (Information and Communication Technologies) industries. By separating the hardware on which network functions/services run and the software that realizes and controls the network functions/services, Software-Defined Networking (SDN) and Network Function Virtualization (NFV) are creating an open ecosystem that drastically reduces the cost of building networks and changes the way operators manage their networks.

SDN and NFV paradigms enable the design, deployment and management of networking services with much lower costs and higher flexibility than traditional networks. In particular, they are contributing to the deployment of 5G infrastructures, from high data rate fixed-mobile services to the Internet of Things. As a result, new value chains and service models are emerging, creating novel business models and significant socio-economic impact.

SDN and NFV are two sides of the same trend toward network softwarization. SDN involves three principles: separation of the control logic (control plane) from packet forwarding (data plane), centralization of the control logic, and programmability of the data plane through well-defined interfaces between the control and data planes.

Unlike traditional networks, where control is distributed and embedded in network devices (switches and routers), SDN logically centralizes the control plane in one entity called the SDN controller. The SDN controller runs on a single server or a cluster of servers, has a global view of the network, and translates high-level operational policies into switch/flow-level traffic management decisions. This separation allows employing much simpler forwarding hardware (generic switching equipment built using cheap merchant silicon) that provides much faster packet forwarding. OpenFlow is the standard communications interface defined between the control and forwarding planes of an SDN. This programmability enables great flexibility in network management and leads to faster innovation in network traffic engineering, security and efficiency.

On the other hand, NFV softwarizes network functions (NFs) such as load balancers, firewalls and intrusion detection systems that were previously provided by special-purpose, generally closed and proprietary hardware. NFs are now implemented in software that can run on virtual machines hosted on commodity hardware. They can also be provisioned as virtual NFs (VNFs) in a cloud service to leverage the economies of scale of cloud computing, consequently reducing network capital and operational expenditures.

While either SDN or NFV can be used by itself, the two technologies are complementary and there is great synergy in combining them. However, the new features brought by SDN and NFV are also the source of new security challenges, and combining both technologies may increase the impact of the related security threats. Indeed, recent surveys identify security as one of the biggest concerns impacting the broad adoption of SDN and NFV, and it remains a hurdle to the widespread adoption of network softwarization.

Another exciting research question in network softwarization is how to leverage the centralized control of SDN to advance traffic routing, arguably the most fundamental networking task. For decades, innovation in routing did not receive much attention from the industry for various historical, financial and practical reasons. However, the unprecedented growth in network traffic and application requirements witnessed by today's networks is driving a huge need for automation, and network operators are paying attention to innovative routing and traffic engineering solutions. We argue that the combination of network softwarization and the recent breakthroughs in machine learning offers an ideal framework for network automation. For example, thanks to powerful new deep learning techniques, operators are able to model very complex networks and find patterns in large amounts of network data, which offers great opportunities for automating network control.

References:

Nick Feamster, Jennifer Rexford, and Ellen Zegura. "The road to SDN." ACM Queue, vol. 11, no. 12, 2013.

Harvey Freeman and Raouf Boutaba. "Networking industry transformation through softwarization." IEEE Communications Magazine, August 2016.

Predicting Network Traffic Matrix Using LSTM with Keras (part 2)

The research paper can be found here

In this post, we state the traffic matrix prediction problem and propose an LSTM architecture to solve it.

We train a deep LSTM architecture using the backpropagation through time (BPTT) algorithm to learn the traffic characteristics from historical traffic data and predict the future TM.

Problem Statement

Let N be the number of nodes in the network. The N-by-N traffic matrix at time t is denoted by Y_t, where an entry y_{ij} represents the traffic volume flowing from node i to node j. Adding the time dimension yields an N-by-N-by-T tensor S, where an entry s_{tij} represents the volume of traffic flowing from node i to node j at time t, and T is the total number of time-slots. The traffic matrix prediction problem is then to compute a predictor of Y_t (denoted by \hat{Y}_t) from a series of historical measured traffic matrices (Y_{t-1}, Y_{t-2}, Y_{t-3}, …, Y_{t-T}). The main challenge is how to model the inherent relationships among the traffic data so that one can accurately predict Y_t.

Feeding The LSTM RNN

To effectively feed the LSTM RNN, we transform each matrix Y_t into a vector X_t (of size N^2) by concatenating its N rows from top to bottom. X_t is called the traffic vector (TV).
Note that the entries x_n of X_t can be mapped back to the original entries y_{ij} using the relation n = i × N + j. The traffic matrix prediction problem now becomes computing a predictor of X_t (denoted by \hat{X}_t) from a series of historical measured traffic vectors (X_{t-1}, X_{t-2}, X_{t-3}, …, X_{t-T}). One possible way to predict the traffic vector X_t is to predict its components (x_{t,0}, x_{t,1}, …, x_{t,N^2-1}) one at a time, feeding the LSTM RNN the history of a single component at a time. This relies on the assumption that each origin-destination (OD) flow is independent of all other OD flows, which was shown to be wrong in [24]. Hence, considering the previous traffic of all ODs is necessary to obtain a more accurate prediction of the traffic vector.
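The row-by-row flattening and the index mapping n = i × N + j can be sketched in a few lines of NumPy (a toy illustration with a small N, not the paper's code):

```python
import numpy as np

N = 4  # number of nodes (toy value; the GÉANT network used later has N = 23)

# A toy traffic matrix Y_t: entry (i, j) is the volume flowing from node i to node j.
Y_t = np.arange(N * N, dtype=float).reshape(N, N)

# Flatten row by row (top to bottom) to obtain the traffic vector X_t of size N^2.
X_t = Y_t.flatten()

# Map a vector entry back to the original matrix entry: x_n == y_ij with n = i*N + j.
i, j = 2, 3
assert X_t[i * N + j] == Y_t[i, j]
```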

Continuous Prediction Over Time

Real-time prediction of the traffic matrix requires continuous feeding and learning. Over time, the total number of time-slots becomes too large, resulting in high computational complexity. To cope with this problem, we introduce the notion of a learning window (denoted by W), which indicates a fixed number of previous time-slots to learn from in order to predict the current traffic vector X_t (Fig. 8). We construct the W-by-N^2 traffic-over-time matrix (denoted by M) by stacking the W vectors (X_{t-1}, X_{t-2}, X_{t-3}, …, X_{t-W}) ordered in time. Note that T ≥ W (T being the total number of historical matrices) and the number of matrices M is equal to T/W.
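Building the traffic-over-time matrix M from the W previous traffic vectors might look like this (a sketch under the notation above; the variable names are mine, not the paper's):

```python
import numpy as np

T, N, W = 12, 3, 4            # time-slots, nodes, learning window (toy values)
S = np.random.rand(T, N * N)  # one traffic vector of size N^2 per time-slot

def traffic_over_time(S, t, W):
    """W-by-N^2 matrix M stacking the W vectors X_{t-W}, ..., X_{t-1} in time order."""
    return S[t - W:t]

M = traffic_over_time(S, t=6, W=W)
assert M.shape == (W, N * N)
```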

window
Figure 8. The learning window

Performance Metric

To quantitatively assess the overall performance of our LSTM model, the Mean Squared Error (MSE) is used to estimate the prediction accuracy. MSE is a scale-dependent metric that quantifies the difference between the forecasted and actual values of the quantity being predicted by computing the average of the squared errors:

MSE = \frac{1}{N} \sum_{i=1}^N  (y_i-\widehat{y}_i )^2

where y_i is the observed value, \hat{y}_i is the predicted value and N represents the total number of predictions.
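In code, the MSE above is essentially a one-liner (a minimal NumPy sketch):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared prediction errors."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

print(mse([0.0, 0.0], [1.0, 3.0]))  # (1 + 9) / 2 = 5.0
```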

Experiments

We implemented NeuTM as a traffic matrix prediction application on top of the POX controller [19]. NeuTM's LSTM model is implemented using the Keras library [20] on top of Google's TensorFlow machine learning framework [21]. We evaluate the prediction accuracy of our method using real traffic data from the GÉANT backbone network [17], made up of 23 peer nodes interconnected by 38 links (as of 2004). Traffic matrix data is sampled from the GÉANT network at 15-minute intervals [18] for several months.

To evaluate our method on short-term traffic matrix prediction, we consider a set of 309 traffic matrices. As detailed in section IV-B, we transform the matrices into vectors of size 529 each and concatenate the vectors to obtain the traffic-over-time matrix M of size 309 × 529. We split M into two matrices: the training matrix M_train and the validation matrix M_test, of sizes 263 × 529 and 46 × 529 respectively. M_train is used to train the LSTM model and M_test is used to evaluate and validate its accuracy. Finally, we normalize the data by dividing by the maximum value.
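The normalization and chronological split described above can be sketched as follows (shapes taken from the experiment; the data here is a random placeholder, not the GÉANT traces):

```python
import numpy as np

# Placeholder for the 309 x 529 traffic-over-time matrix M built from GÉANT data.
M = np.random.rand(309, 529)

# Normalize by dividing by the maximum value.
M = M / M.max()

# Chronological split: the first 263 rows train the LSTM, the last 46 validate it.
M_train, M_test = M[:263], M[263:]
assert M_train.shape == (263, 529) and M_test.shape == (46, 529)
```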

Figure 5 depicts the MSE obtained with different numbers of hidden layers (depths). The prediction accuracy is better with deeper networks. Figure 6 depicts the variation of the training time with depth. Note that it takes less than 5 minutes to train a 6-layer network for 20 epochs. Finally, Figure 7 compares the prediction error of the different prediction methods presented in this paper and shows the superiority of LSTM.

msebynhiddenlayers
Figure 5. MSE by number of hidden layers

In this work, we have shown that LSTM architectures are well suited for traffic matrix prediction. We have proposed a data pre-processing and RNN feeding technique that achieves high prediction accuracy in a very short training time. The results of our evaluations show that LSTMs outperform traditional linear methods and feed-forward neural networks by many orders of magnitude.

traintime
Figure 6. Training time by network depth

Predicting Network Traffic Matrix Using LSTM with Keras (part 1)

The research paper can be found here

In this article we explain, step by step, how to implement an online traffic matrix predictor using Recurrent Neural Networks, specifically the Long Short-Term Memory (LSTM) architecture. At the end of this post, you will:

  • Understand the importance of Traffic Matrix (TM) prediction
  • Know what LSTM networks are and understand how they work
  • Be able to implement an LSTM in Keras to perform matrix prediction

Note: The techniques explained in this post are not restricted to traffic matrices in any way. They can be trivially generalized to any kind of matrix (e.g., prediction of the next frame in a video).


Why is traffic matrix prediction important in communication networks?

Well, as you probably know, computer networks, and communication networks in general like the Internet, have limited resources in terms of bandwidth, computing power of the forwarding elements (routers, switches, etc.), computing power of network middleboxes, and so on. Thus, operators need to optimize their resource allocation in order to scale and support more users.

Having an accurate and timely network Traffic Matrix (TM) is essential for most network operation/management tasks such as traffic accounting, short-time traffic scheduling or re-routing, network design, long-term capacity planning, and network anomaly
detection. For example, to prevent DDoS attacks or detect them in their early stage, it is necessary to be able to predict/detect high-volume traffic (heavy hitters).

Neural Networks for Traffic Matrix Prediction

Artificial Neural Networks, or simply Neural Networks (NNs), are widely used for modeling and predicting network traffic because they can learn complex non-linear patterns thanks to their strong self-learning and self-adaptive capabilities. That is, an NN can self-learn the patterns of, and hence estimate, any linear or non-linear function, purely from data, even when the underlying data relationships are unknown.

The NN model is a nonlinear, adaptive modeling approach which, unlike traditional prediction techniques such as ARMA, ARAR or Holt-Winters, relies on the observed data rather than on an analytical model. The architecture and the parameters of the NN are determined solely by the dataset. NNs are characterized by their generalization ability, robustness, fault tolerance, adaptability, and parallel processing ability.

Recurrent Neural Networks

There are two classes of neural networks: Feed-Forward Neural Networks (FNNs) and Recurrent Neural Networks (RNNs). FNNs can provide only limited temporal modeling by operating on a fixed-size window of the TM sequence. They can only model the data within the window and are unsuited to handling longer historical dependencies. By contrast, recurrent neural networks, or deep recurrent neural networks (Figure 1), contain cycles that feed the network activations from a previous time step back as inputs to influence predictions at the current time step (Figure 2). These activations are stored in the internal states of the network as temporal contextual information [1]. However, training conventional RNNs with the gradient-based backpropagation through time (BPTT) technique is difficult due to the vanishing gradient and exploding gradient problems: the influence of a given input on the hidden layers, and therefore on the network output, either decays or blows up exponentially while cycling around the network's recurrent connections.

drnn
Figure 1. Deep Recurrent Neural Network (DRNN)

These problems limit the capability of RNNs to model long-range context dependencies to 5-10 discrete time steps between relevant input signals and the output.

drnnovertime
Figure 2. Flow of information over time in a DRNN

Long Short Term Memory Recurrent Neural Networks

The architecture of LSTMs is composed of units called memory blocks. A memory block contains memory cells with self-connections storing (remembering) the temporal state of the network, in addition to special multiplicative units called gates that control the flow of information. Each memory block contains an input gate to control the flow of input activations into the memory cell, an output gate to control the output flow of cell activations into the rest of the network, and a forget gate.

archinode

The forget gate scales the internal state of the cell before adding it back to the cell as input through a self-recurrent connection, thereby adaptively forgetting or resetting the cell's memory. The modern LSTM architecture also contains peephole connections from its internal cells to the gates in the same cell to learn the precise timing of the outputs [2].
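The gating mechanism described above can be sketched as a single LSTM time step in plain NumPy (a toy illustration without peephole connections; the dimensions and weight initialization are arbitrary, not taken from any paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step: input, forget and output gates acting on the cell state."""
    Wf, Wi, Wo, Wc, bf, bi, bo, bc = params
    z = np.concatenate([h_prev, x])              # previous output + current input
    f = sigmoid(Wf @ z + bf)                     # forget gate: how much old state to keep
    i = sigmoid(Wi @ z + bi)                     # input gate: how much new input to write
    o = sigmoid(Wo @ z + bo)                     # output gate: how much state to expose
    c = f * c_prev + i * np.tanh(Wc @ z + bc)    # updated cell state (the "memory")
    h = o * np.tanh(c)                           # block output fed to the next time step
    return h, c

# Toy dimensions: 3 inputs, 2 hidden units.
n_in, n_h = 3, 2
rng = np.random.default_rng(0)
params = tuple(rng.standard_normal((n_h, n_h + n_in)) for _ in range(4)) \
       + tuple(np.zeros(n_h) for _ in range(4))
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), params)
```

Stacking such steps over time, and over layers for a deep LSTM, is what libraries like Keras automate.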

References

Haşim Sak, Andrew W. Senior, and Françoise Beaufays. "Long short-term memory recurrent neural network architectures for large scale acoustic modeling." Interspeech, 2014.

Felix A. Gers, Nicol N. Schraudolph, and Jürgen Schmidhuber. "Learning precise timing with LSTM recurrent networks." Journal of Machine Learning Research, vol. 3, pp. 115–143, Mar. 2003.

From the Cloud to the Fog

A new networking/computing paradigm called "Fog Computing" was officially born on November 19, 2015, when Cisco Systems, ARM Holdings, Dell, Intel, Microsoft, and Princeton University founded the OpenFog Consortium.

So why a new computing paradigm? Well, to recall how the Cloud works, let's consider the example of a smart street-lighting system based on motion sensors. The sensor data is sent to the Cloud (remote servers, usually belonging to a private corporation) to trigger the lights on or off. Each and every data frame travels all the way through the Internet to reach the Cloud servers, where it is aggregated and analyzed to trigger actions or stored for future analysis. This scheme doesn't make a lot of sense when a huge number of Internet of Things (IoT) devices get involved. With the increase in IoT adoption, the Cloud computing paradigm can hardly satisfy IoT devices' requirements of mobility support, location awareness and low latency.

Today’s cloud models are not designed for the volume, variety, and velocity of data that the IoT generates. Billions of new devices are generating more than two exabytes of data each day and an estimated 50 billion “things” will be connected to the Internet by 2020. Moving all data from these devices to the cloud for analysis would require unprecedented amounts of bandwidth [CISCO16].

Introducing Fog Computing

The basic idea of Fog computing is to move the low-latency processing near the edge while latency-tolerant large-scope aggregation is performed on powerful resources in the Cloud.

IoT is generating an unprecedented volume and variety of data, but by the time the data makes its way to the cloud for analysis, the opportunity to act on it might be gone. The most time-sensitive data is analyzed at the network edge, close to where it is generated, instead of being sent to the cloud with the rest of the vast amounts of IoT data. This allows deployment of low-latency applications, like health-care applications and energy-management applications.

 

Analyzing data close to the device that collected the data can make the difference between averting a disaster and a cascading system failure. The rest of the  data can be sent to the cloud for historical analysis and longer term storage.


Analyzing data close to the edge also offloads gigabytes of network traffic from the core network.

Fog architecture offloads the network from unnecessary traffic

When to Consider Fog Computing [CISCO16]

● Data is collected at the extreme edge: vehicles, ships, factory floors, roadways, railways, etc.

● Thousands or millions of things across a large geographic area are generating data.

● It is necessary to analyze and act on the data in less than a second.

 

Security

Although there are security solutions for Cloud computing, they may not be suitable for Fog computing because Fog devices work at the edge of networks [Ivan14]. However, security and privacy issues studied in the context of smart grids [WANG13] and machine-to-machine communications [LU11] might be useful to secure the Fog.

Conclusion

Fog is a non-trivial extension of the Cloud, and it enables a new breed of applications and services. There is a fruitful interplay between the Cloud and the Fog, particularly when it comes to data management and analytics. However, in sharp contrast to the more centralized Cloud, the services and applications targeted by the Fog demand widely distributed deployments. The Fog, for instance, will play an important role in enabling high-quality communication to moving vehicles, through smart gateways and access points positioned along roads and highways.

References

[CISCO16] https://www.cisco.com/c/dam/en_us/solutions/trends/iot/docs/computing-overview.pdf

[WANG13] W. Wang and Z. Lu, "Cyber security in the smart grid: Survey and challenges," Computer Networks, vol. 57, no. 5, pp. 1344–1371, Apr. 2013.

[LU11] R. Lu, X. Li, X. Liang, X. Shen, and X. Lin, "GRS: The green, reliability, and security of emerging machine to machine communications," IEEE Communications Magazine, vol. 49, no. 4, pp. 28–35, Apr. 2011.

Build your own OpenFlow test lab on one single ubuntu machine – Part 1

This tutorial shows you how to build a simple SDN (Software-Defined Networking)/OpenFlow lab to run your OpenFlow research tests on a single machine. We assume the machine is running Ubuntu 14.x or a newer Ubuntu distribution. We start by presenting the tools and their installation/configuration, then we demonstrate some useful tests.

The global architecture of our OpenFlow testing lab

What do we need to build an SDN test lab?

First, we need an OpenFlow-enabled switch. You can find here a list of commercial OpenFlow-enabled physical/virtual switches and here some open source OpenFlow switches, among other OpenFlow products. Since we are working on a single machine, we will use a virtual switch.

1. Open vSwitch: Open vSwitch is an open source virtual switch, i.e., software providing a switching stack (the same functionality as a physical switch) for virtual environments. The simplest way to install Open vSwitch on Ubuntu is to run:

sudo apt-get install openvswitch-common openvswitch-switch

For more details, I recommend reading this tutorial: https://wiki.linaro.org/LNG/Engineering/OVSOnUbuntu. 

Then, we create an Open vSwitch bridge br0:

sudo ovs-vsctl add-br br0

Now, we need to set up virtual interfaces on br0. We use TUN/TAP, a kernel feature that emulates a simple point-to-point or Ethernet device: it sets up an emulated L2 interface that shows up like a physical interface (when we type ifconfig, for example).

sudo ip tuntap add mode tap vnet0   # adds a virtual interface named vnet0
sudo ip link set vnet0 up           # brings vnet0 up
sudo ovs-vsctl add-port br0 vnet0   # attaches vnet0 to br0

NB: It's possible to create many bridges on the same Open vSwitch instance. Each bridge has its own virtual interfaces, and they all share the physical resources/interfaces of the host running Open vSwitch.

Second, we need a hypervisor to build our virtual environment using virtual machines.

2. VirtualBox: a powerful x86 and AMD64/Intel64 open source hypervisor maintained by Oracle. To install VirtualBox, go here, download the latest Ubuntu package and install it. For more details on how to use VirtualBox, you may want to read this complete tutorial: https://www.virtualbox.org/manual/ch01.html

Now we can complete our virtual environment by creating VMs and connecting them to the bridge br0. To do that, open VirtualBox, select the VM you want to connect, go to Settings, then Network, set one network adapter to Bridged Adapter mode, and finally choose vnet0 from the name list.

Connecting a VM to Open vSwitch

Once we have created as many VMs as we want and connected them to the bridge, we move to the controller side. Up to this point, we have built our data plane, which needs to be managed by a decoupled controller. Indeed, the main characteristic of SDN is decoupling the control plane from the underlying network (data plane). There are many OpenFlow controllers on the market, but only a few of them have been actively maintained by the community, such as OpenDaylight (hosted by the Linux Foundation), Floodlight and POX.

3. OpenDaylight controller: In this tutorial we will use the OpenDaylight controller (its pretty GUI may be a good reason to choose it) to manage our data plane. Download one of the OpenDaylight releases here and follow the installation guide, then the user guide, available on the same page. I installed the Hydrogen Virtualization edition for this tutorial.

Now that we have our data plane built and our controller installed, we need to connect them so we can manage the data plane through the controller.

First, we run the OpenDaylight controller: go to the installation folder and type:

sudo ./run.sh

Then we connect the switch to the controller by running this command in a second terminal:

sudo ovs-vsctl set-controller br0 tcp:[controller IP]:[controller listening port]

Since our switch and controller are on the same machine, the command will be:

sudo ovs-vsctl set-controller br0 tcp:127.0.0.1:6633

OpenDaylight uses the OpenFlow default port, 6633.

This line will appear on the first terminal (controller terminal):

connection

To make sure the connection was correctly made, type:

sudo ovs-vsctl show

which displays the bridge interfaces and the connection state:

sure

That's all… Congratulations! Your OpenFlow test lab is now ready to host your novel research scenarios 🙂