How AI/Machine Learning Can Revamp Data Center Operations
Data will drive digital businesses in the coming years, and it is being generated at a very high rate. Data centers (DCs) will be the nerve center of this data-driven economy, and significant capacity expansion is expected in the coming years.
The amount of power used by data centers is also increasing at an alarming rate. About five years ago, data centers worldwide consumed roughly 3 percent of the total power produced; that share has since grown to about 6 to 7 percent. This rising power usage needs to be managed, and the PUE (Power Usage Effectiveness), which is the total power drawn by the facility divided by the power delivered to IT equipment, needs to be reduced. In addition, critical failures in DC operations can cripple a digital business, so data centers need to provide almost 100 percent availability.
Artificial intelligence and machine learning can play an important role in managing data center operations better. Machine learning is the field that studies how to design algorithms that learn by observing data. It has traditionally been used to discover new insights in data, to build systems that automatically adapt and customize themselves, and to design systems for problems where explicitly programming every possible circumstance is too complex or too expensive, self-driving cars being one example.
Applications of Machine Learning in Data Center Management:
Delivering Better Power Usage:
Power usage is a major cost driver for data centers, and multiple factors determine power costs. Power management has so far made only limited use of software, which is why most data centers run at a PUE of about 1.5 or above. The challenge lies in combining HVAC (heating, ventilation and air conditioning) data with IT equipment data and drawing actionable insights from it to manage power usage better.
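Combining HVAC data with IT equipment data often comes down to aligning readings from separate metering systems on a common timestamp. The Python below is a minimal sketch that assumes each system exports average kW per fixed sampling interval; the `Reading` record and the `pue_by_interval` / `overall_pue` helpers are hypothetical names for illustration. It treats the HVAC reading as the entire non-IT load, which is a simplification: a real PUE figure also counts lighting, UPS and power distribution losses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical meter record: average power over one sampling interval."""
    ts: int     # interval start, seconds since epoch
    kw: float   # average power draw in kW


def pue_by_interval(it_readings, hvac_readings):
    """Join IT and HVAC readings on timestamp and return {ts: PUE}.

    Simplification: HVAC is treated as the only non-IT load.
    Intervals missing from either feed, or with zero IT load, are skipped.
    """
    hvac_by_ts = {r.ts: r.kw for r in hvac_readings}
    result = {}
    for r in it_readings:
        if r.ts in hvac_by_ts and r.kw > 0:
            result[r.ts] = (r.kw + hvac_by_ts[r.ts]) / r.kw
    return result


def overall_pue(it_readings, hvac_readings):
    """Energy-weighted PUE over matched intervals: sum of totals / sum of IT load.

    Assumes at least one matched interval with non-zero IT load.
    """
    hvac_by_ts = {r.ts: r.kw for r in hvac_readings}
    matched = [(r.kw, hvac_by_ts[r.ts]) for r in it_readings if r.ts in hvac_by_ts]
    it_total = sum(it for it, _ in matched)
    hvac_total = sum(h for _, h in matched)
    return (it_total + hvac_total) / it_total


if __name__ == "__main__":
    it = [Reading(0, 100.0), Reading(900, 100.0), Reading(1800, 120.0)]
    hvac = [Reading(0, 50.0), Reading(900, 30.0)]  # no HVAC sample for ts=1800
    print(pue_by_interval(it, hvac))  # {0: 1.5, 900: 1.3}
    print(overall_pue(it, hvac))      # 1.4
```

Dividing summed totals by summed IT load, rather than averaging the per-interval ratios, weights each interval by the energy it actually consumed, so a few low-load intervals cannot skew the overall figure.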
Software algorithms based on artificial intelligence and machine learning can simulate and better manage the equipment that drives power requirements. With the advent of IoT sensors, cooling, chiller and other equipment can generate data in real time and provide important data points for power management. Machine learning algorithms take these inputs and control cooling so that power usage is reduced. To manage power assets well, these algorithms need to keep learning from the feedback they receive in real time.
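The learn-then-control loop described above can be sketched in a very simplified form. The Python below swaps in two ordinary least-squares lines, one predicting rack inlet temperature and one predicting cooling power from the cooling setpoint, in place of the richer models a production system would likely use. The observations, the 27 °C inlet limit and the function names (`fit_line`, `choose_setpoint`) are all hypothetical. Re-running the fit as new sensor readings arrive is what would give the loop its real-time feedback.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x. Needs at least two distinct xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def choose_setpoint(history, candidates, inlet_limit_c):
    """Pick the cooling setpoint with the lowest predicted cooling power.

    history: list of (setpoint_c, rack_inlet_c, cooling_kw) observations.
    Only setpoints whose predicted rack inlet temperature stays at or below
    inlet_limit_c are considered.
    """
    setpoints = [h[0] for h in history]
    inlet_a, inlet_b = fit_line(setpoints, [h[1] for h in history])
    power_a, power_b = fit_line(setpoints, [h[2] for h in history])

    feasible = [s for s in candidates if inlet_a + inlet_b * s <= inlet_limit_c]
    if not feasible:
        # No setpoint is predicted to be safe: fall back to the coldest one.
        return min(candidates)
    return min(feasible, key=lambda s: power_a + power_b * s)


if __name__ == "__main__":
    # Hypothetical observations: warmer supply air -> warmer racks, less cooling power.
    history = [(16, 21, 120), (18, 23, 110), (20, 25, 100), (22, 27, 90), (24, 29, 80)]
    candidates = [16 + 0.5 * i for i in range(17)]  # 16.0 .. 24.0 °C
    print(choose_setpoint(history, candidates, inlet_limit_c=27))  # 22.0
```

The constraint check comes before the power minimization on purpose: saving energy is only useful if the racks stay within their thermal limit.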
Prediction of Critical Outages:
A data center is a critical operation that requires very high availability, and outages can be very costly. IoT-based sensors feed in data on critical parameters so that operators can manage uptime and detect deterioration in any DC element. The challenge is to analyze this large data set in real time and extract actionable insights from it. The main task is to correlate events, tickets, alerts, and changes through cause-effect relationships: for example, linking a change request to the actual changes in the environment, linking an APM (application performance monitoring) alert to a specific environment, or linking a log error to a particular web service. Because the data is unstructured to varying degrees, this linking (or correlation) is not straightforward. It is a good fit for machine learning, which can derive general rules across different data sources, determine how to link them to environments, and decide when such a link makes sense. Such models can help predict DC failures and thereby increase reliability.
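The correlation step can be illustrated with a deliberately simple, statistics-based sketch rather than a full machine learning model. It assumes every change and alert has already been reduced to a structured record (type label, environment, timestamp), which in practice is the hard part given the unstructured data involved. It learns from history how often each change type is followed by each alert type in the same environment, then uses those counts to link a new alert to the most likely recent change. The `Event` record, its fields, the 600-second window and the helper functions are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    kind: str   # "change" or "alert"
    label: str  # change type or alert type, e.g. "firmware_update", "high_latency"
    env: str    # environment the event belongs to
    ts: float   # timestamp in seconds


def in_window(change: Event, alert: Event, window_s: float) -> bool:
    """True if the change happened in the same environment up to window_s before the alert."""
    return change.env == alert.env and 0 <= alert.ts - change.ts <= window_s


def learn_pair_counts(history, window_s):
    """Count how often each (change label, alert label) pair co-occurs in past data."""
    changes = [e for e in history if e.kind == "change"]
    alerts = [e for e in history if e.kind == "alert"]
    counts = Counter()
    for a in alerts:
        for c in changes:
            if in_window(c, a, window_s):
                counts[(c.label, a.label)] += 1
    return counts


def link_alert(alert, recent_changes, counts, window_s) -> Optional[Event]:
    """Link an alert to the candidate change with the strongest learned association.

    Ties are broken in favour of the most recent change; returns None if no
    change falls inside the window.
    """
    candidates = [c for c in recent_changes if in_window(c, alert, window_s)]
    if not candidates:
        return None
    return max(candidates, key=lambda c: (counts[(c.label, alert.label)], c.ts))


if __name__ == "__main__":
    history = [
        Event("change", "firmware_update", "A", 0), Event("alert", "high_latency", "A", 100),
        Event("change", "firmware_update", "B", 1000), Event("alert", "high_latency", "B", 1200),
        Event("change", "config_change", "A", 2000), Event("alert", "disk_error", "A", 2100),
    ]
    counts = learn_pair_counts(history, window_s=600)
    new_alert = Event("alert", "high_latency", "C", 5000)
    recent = [Event("change", "firmware_update", "C", 4500),
              Event("change", "config_change", "C", 4800)]
    print(link_alert(new_alert, recent, counts, window_s=600).label)  # firmware_update
```

In the example, the config change is the more recent candidate, but history shows firmware updates being followed by latency alerts, so the learned counts outweigh recency.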
Management of DC Capacity:
Data center capacity planning and management has long been a complex subject and has evolved over the years. Software-defined architectures running on webscale hardware need to be managed very differently from traditional data centers. Modern data centers can manage capacity in real time through automatic provisioning and de-provisioning, using artificial intelligence to orchestrate the entire operation.
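Real-time auto-provisioning of this kind is often framed as forecast-then-scale. As a minimal sketch, the Python below uses simple exponential smoothing as a stand-in for whatever forecasting model an operator might actually train, then converts the forecast into a server count with a safety headroom. The demand figures, the per-server capacity, the 25 percent headroom and the function names are hypothetical.

```python
import math


def forecast_next(loads, alpha=0.5):
    """Simple exponential smoothing: one-step-ahead forecast of demand."""
    level = loads[0]
    for x in loads[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def target_servers(forecast, capacity_per_server, headroom=0.25, min_servers=1):
    """Servers needed to serve the forecast plus a safety headroom."""
    return max(min_servers, math.ceil(forecast * (1 + headroom) / capacity_per_server))


def scaling_action(current, target):
    """Translate the target into a provision / de-provision / hold decision."""
    if target > current:
        return ("provision", target - current)
    if target < current:
        return ("deprovision", current - target)
    return ("hold", 0)


if __name__ == "__main__":
    demand = [100, 200]                                  # hypothetical requests/s per interval
    f = forecast_next(demand, alpha=0.5)                 # 150.0
    target = target_servers(f, capacity_per_server=50)   # ceil(187.5 / 50) = 4
    print(scaling_action(current=6, target=target))      # ('deprovision', 2)
```

Because the forecasting step is isolated in `forecast_next`, a trained model could replace it without changing how targets and scaling decisions are computed.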
New, tightly coupled hardware and software (for example, Google TPUs and Nvidia machine learning chips) can run these AI / machine learning algorithms, and the algorithms can in turn help manage that hardware. This will help manage IT assets better and thereby reduce the cost of operations.
The application of ML / AI to data centers is picking up quickly. Many players are building and testing models that could disrupt the data center landscape. It will not be long before a well-managed data center reaches a PUE of 1.1 and near-100 percent availability at significantly reduced cost.
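To put these PUE figures in perspective, the arithmetic below assumes the IT load P_IT stays fixed while PUE falls from 1.5 to 1.1, using the definition PUE = total facility power / IT equipment power.

```latex
\begin{align*}
\text{PUE} &= \frac{P_{\text{total}}}{P_{\text{IT}}}
  \quad\Rightarrow\quad P_{\text{total}} = \text{PUE}\cdot P_{\text{IT}} \\
\frac{1.5\,P_{\text{IT}} - 1.1\,P_{\text{IT}}}{1.5\,P_{\text{IT}}}
  &= \frac{0.4}{1.5} \approx 26.7\%\ \text{less total facility power} \\
P_{\text{total}} - P_{\text{IT}}:\quad 0.5\,P_{\text{IT}} &\rightarrow 0.1\,P_{\text{IT}},
  \ \text{an } 80\%\ \text{cut in non-IT overhead}
\end{align*}
```

For example, a facility with a hypothetical 1 MW IT load would draw 1.5 MW in total at a PUE of 1.5 and 1.1 MW at a PUE of 1.1.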