How AI/Machine Learning Can Revamp Data Center Operations

Piyush Kumar Chowhan, CIO and Vice President, Arvind Lifestyle Brands Limited | Friday, 01 September 2017, 04:58 IST

Data will drive digital businesses in the coming years, and it is being generated at a very high rate. Data Centers (DC) will be the nerve center of this data-driven economy, and a significant amount of capacity expansion can be expected in the coming years.

The amount of power used by data centers is also increasing at an alarming rate. Data centers worldwide consumed about 3 percent of the total power produced about five years ago, and that share has since increased to about 6-7 percent. This rise in power usage needs to be managed, and the PUE (Power Usage Effectiveness) factor needs to be reduced. Also, critical failures in DC operations can cripple a digital business, so data centers need to provide almost 100 percent availability.

Artificial Intelligence and Machine Learning can play an important role in managing data center operations better. Machine learning is a field that studies how to design algorithms that can learn by observing data. It has traditionally been used to discover new insights in data, to develop systems that can automatically adapt and customize themselves, and to design systems where it is too complex or too expensive to implement all possible circumstances by hand, for example, self-driving cars.

Applications of Machine Learning in Data Center Management:

Delivering Better Power Usage:

Power usage is a major cost driver for a data center, and multiple factors determine the power costs. Power management has predominantly been handled with limited use of software, hence most data centers run at a PUE of about 1.5 or above. The challenge lies in the ability to marry the HVAC data with the IT equipment data and draw actionable insights to manage power usage better.
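To make the PUE figure concrete: PUE is the ratio of the total energy drawn by the facility to the energy delivered to the IT equipment, so a value of 1.0 would mean no overhead at all. The short sketch below computes it from two meter readings. The function names and the sample readings are illustrative assumptions, not taken from any particular monitoring product.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def overhead_share(pue_value: float) -> float:
    """Fraction of total facility energy that does not reach the IT equipment."""
    return 1.0 - 1.0 / pue_value


if __name__ == "__main__":
    # Hypothetical readings: 1500 kWh at the facility meter, 1000 kWh at the IT load.
    value = pue(1500.0, 1000.0)
    print(f"PUE = {value:.2f}, overhead share = {overhead_share(value):.1%}")
```

At a PUE of 1.5, one third of the facility's energy goes to cooling, power distribution and other overhead rather than to the IT equipment.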

Software algorithms based on Artificial Intelligence and Machine Learning can help simulate and manage the elements that drive power requirements. With the advent of IoT sensors, the cooling, chilling and other equipment can generate data in real time and provide important data points for power management. The machine learning algorithms take these inputs and control the cooling requirements, thereby reducing power usage. These algorithms need to keep learning so that feedback is provided in real time to manage power assets better.
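The feedback loop described above can be sketched in a few lines. The example below is a deliberately simple stand-in for a real model: it fits a straight line to hypothetical sensor readings (cooling power per unit of IT power against server inlet temperature) and then picks the lowest cooling ratio whose predicted temperature stays under a limit. The variable names, the single-feature linear model and the sample data are all assumptions for illustration. A production system would use many more inputs and a richer model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def min_cooling_ratio(slope, intercept, temp_limit_c):
    """Lowest cooling ratio whose predicted inlet temperature is at the limit."""
    if slope >= 0:
        raise ValueError("model says more cooling does not lower temperature")
    return max(0.0, (temp_limit_c - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical history: cooling kW per IT kW, and inlet temperature in Celsius.
    ratios = [0.2, 0.3, 0.4, 0.5]
    temps = [30.0, 28.0, 26.0, 24.0]
    slope, intercept = fit_line(ratios, temps)
    target = min_cooling_ratio(slope, intercept, temp_limit_c=27.0)
    print(f"run cooling at about {target:.2f} kW per IT kW")
```

Refitting the line whenever new sensor readings arrive is what turns this into a feedback loop: the setpoint follows the observed behaviour of the room rather than a fixed design assumption.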

Prediction of Critical Outages:

A data center is a critical operation that requires very high availability, and outages can be very costly. IoT-based sensors feed data on critical parameters to help manage uptime and detect deterioration in any DC element. The challenge is to analyze this large data set in real time and get actionable insight from it. The main task is to correlate events, tickets, alerts, and changes using cause-effect relationships: for example, linking a change request to the actual changes in the environment, linking an APM alert to a specific environment, and linking a log error to a particular web service. As we are dealing with various levels of unstructured data, the linking process (or correlation) is not obvious. This is a perfect task for machine learning, as it can create general rules between different data sources, determine how to link them to environments, and decide when it makes sense to do so. These models can help predict DC failures, thereby increasing reliability.
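As a rough illustration of what one such linking rule looks like, the sketch below pairs each alert with the most recent change request in the same environment within a time window. This is a hand-written rule, not a learned model: it shows the kind of relationship a machine learning system would be expected to discover and generalize on its own. The `Event` fields, the source labels and the 15-minute window are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str        # hypothetical labels: "change", "alert", "log"
    environment: str   # e.g. "prod-web"
    timestamp: float   # seconds since some common epoch
    message: str


def link_alerts_to_changes(events, window_s=900.0):
    """Pair each alert with the latest preceding change in the same environment."""
    changes = [e for e in events if e.source == "change"]
    links = []
    for alert in (e for e in events if e.source == "alert"):
        candidates = [
            c for c in changes
            if c.environment == alert.environment
            and 0 <= alert.timestamp - c.timestamp <= window_s
        ]
        if candidates:
            links.append((alert, max(candidates, key=lambda c: c.timestamp)))
    return links


if __name__ == "__main__":
    events = [
        Event("change", "prod-web", 1000.0, "deploy new build"),
        Event("change", "prod-db", 1100.0, "rotate credentials"),
        Event("alert", "prod-web", 1300.0, "latency above threshold"),
        Event("alert", "prod-cache", 1400.0, "eviction rate high"),
    ]
    for alert, change in link_alerts_to_changes(events):
        print(f"{alert.message!r} may be related to {change.message!r}")
```

A fixed window and an exact environment match are the weak points of a rule like this, which is where learning from historical incidents is expected to help.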

Management of DC Capacity:

Data center capacity planning and management has long been a complex subject and has evolved over the years. With the advent of Software Defined Architecture running on webscale hardware, capacity needs to be managed very differently from the way it is in traditional data centers. Modern data centers can manage capacity in real time through auto-provisioning and de-provisioning, using Artificial Intelligence to orchestrate the entire operation.
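A minimal sketch of an auto-provisioning decision, under stated assumptions: demand is forecast with simple exponential smoothing (standing in for whatever learned forecaster a real orchestrator would use), and the node count is sized so that forecast load lands at a target utilization. All names, the smoothing factor and the capacity figures are hypothetical.

```python
import math


def forecast_next(loads, alpha=0.5):
    """Exponentially smoothed estimate of the next load value."""
    if not loads:
        raise ValueError("need at least one observation")
    smoothed = loads[0]
    for value in loads[1:]:
        smoothed = alpha * value + (1 - alpha) * smoothed
    return smoothed


def nodes_needed(forecast_load, per_node_capacity, target_utilization=0.75):
    """Nodes required so that the forecast load runs at the target utilization."""
    return math.ceil(forecast_load / (per_node_capacity * target_utilization))


def scaling_action(current_nodes, required_nodes):
    """Positive result: provision that many nodes. Negative: de-provision."""
    return required_nodes - current_nodes


if __name__ == "__main__":
    # Hypothetical request rates for the last three intervals.
    recent_loads = [100.0, 200.0, 300.0]
    required = nodes_needed(forecast_next(recent_loads), per_node_capacity=50.0)
    print(f"change node count by {scaling_action(4, required):+d}")
```

Running the same calculation on every interval gives both directions of the loop: nodes are added ahead of rising demand and released when the forecast falls.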

New hardware and software that are tightly coupled (e.g. Google TPU and Nvidia machine learning chips) can run these AI / machine learning algorithms and use them to manage capacity. This will help in better management of IT assets, thereby reducing the cost of operations.

The application of ML / AI to data centers is picking up quite fast. A lot of players are building and testing models that can disrupt the data center landscape. It will not be long before a well-managed data center can bring its PUE down to 1.1 and achieve very high availability of almost 100 percent at significantly reduced cost.