Full Datacenter Automation – minus the AI (for now)
By Don MacVittie | May 16, 2016 04:13 AM EDT
In Arthur C. Clarke’s 2001: A Space Odyssey, HAL 9000 was the AI. Everyone knows that. But more relevant to today’s automation efforts in the datacenter, HAL also controlled all of the systems on the spaceship. That, in the long run, is where we’re headed with operations automation and DevOps. We want not just management tools that are automated, but orchestration tools that are too, and the more automation the better. In the end, it will still require a human to replace physical hardware, but everything software-related, from initial installation to the end of the app’s life, would ideally be automated with as little human intervention as possible.
This still makes some operations folks antsy. Frankly, it shouldn’t. The ability to quickly deploy systems, upgrade systems, and fix some common problems – all automated – will cut the hours invested in deploying and configuring tools, but it will not cut total hours. The reason is simple: look at what else we’re doing right now. Cloud is being integrated into the DC, meaning we have double the networks and security systems to maintain; if the cloud is internal, we have another whole layer of application to maintain (today that is easier with VMware than OpenStack, but neither is free in terms of man-hours); and we’re working on continuous integration. That’s all before you do anything specific to your business or market. But enough – I digress.
The thing about server provisioning, no matter how thorough it is, is that application provisioning remains a separate step. If you’ve been reading along with my thoughts on this topic here and at DevOps.com, then you know this already.
By extension, the thing about Full Layered Application Provisioning – FLAP, as presented at DevOps.com – is that it too leaves us short. You have a server, fully configured. You have an application (or part of an application in clustered-services scenarios, or multiple applications), and it is ready to rock the world. Totally configured: everything on that box from the RAID card to the app GUI is installed and set up… But the infrastructure hasn’t been touched.
This is a problem most of the marketplace recognizes. If you look closely at application provisioning tools like Puppet and Chef, you can see they are integrating networking infrastructure configuration into application provisioning through partnerships.
This is a good thing, but it is not at all clear that application provisioning is the right place in the operations automation stack for this type of configuration. While you could make a case that the application owner knows what they need in terms of security, network, and remote disk, you could also make the case that, because these are limited resources placed on the corporate network for shared use, a level higher than the application provisioning tool should be handling these configurations.
Interestingly, in many of these cases the real work is integrating the automation of the tool in question with your overall processes. One of the last projects I worked on at F5 Networks was to call their BIG-IQ product’s APIs and tell it to do what I needed, when I needed it, as part of a larger project. This is pretty standard for the orchestration piece of the automation puzzle, and the existence of these types of APIs explains the move by application provisioning vendors to put this control into their systems.
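That orchestration pattern – build a request, call the management product’s API, move to the next step – can be sketched in a few lines. This is a minimal sketch, not BIG-IQ’s actual API: the endpoint path, payload fields, and auth header below are all hypothetical, so consult your product’s REST documentation for the real ones.

```python
import json
import urllib.request

def build_pool_request(base_url, pool_name, members):
    """Build (url, body) for a hypothetical 'create load-balancing pool' call."""
    url = f"{base_url}/mgmt/example/pools"  # hypothetical endpoint path
    body = json.dumps({
        "name": pool_name,
        "members": [{"address": ip, "port": port} for ip, port in members],
    }).encode("utf-8")
    return url, body

def create_pool(base_url, token, pool_name, members):
    """Issue the call as one step in a larger orchestration workflow."""
    url, body = build_pool_request(base_url, pool_name, members)
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json",
                 "X-Auth-Token": token},  # auth scheme is an assumption
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Separating payload construction from the HTTP call keeps the orchestration step testable without a live management appliance.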
Let’s stop for a moment and talk about what we need to have in place to build a HAL-like control layer. There is a combination of pieces that can be divided numerous ways (and let me tell you, while writing this I worked through graphics and whiteboard drawings reflecting most of those ways). Assume in the following diagrams that we are not simply talking about deployment; we are also talking about upgrades and re-deployment to recover from hardware errors or software instability. That simplifies the drawings enough that I can fit them into a blog, and is useful for our discussion.
Assume further that in this diagram there is also a “Public Cloud” section that has merely the top part of the private cloud stack – with no infrastructure on site or in the realm of operations’ responsibility, it begins with “Instance spin up”, but otherwise the required steps are the same.
In an attempt to keep this image consumable, you will notice that I ignored the differences in configuration between VM and Container provisioning. There are differences, but there are more similarities from a spin-up perspective – server provisioning products like Cobbler and Stacki treat both as servers, for example. Truth be told, from an operations perspective containers lie somewhere between cloud (pick a pre-built image and spin it up) and VM (install an image and the apps that run on it). It should have its own stack in the diagram, but you can see it was getting rather tight, and since it shares traits with the other two, I decided to lump it in with one of them.
Those who are familiar with both cloud and VM will take issue with my use of “OS Provisioning” for both – they use entirely different mechanisms to achieve that step – but both do need configuration done on the OS, so I chose to include the step. A cloud image needs to have its IP, connections, and storage all set up; some pre-built cloud images actually take a lot of post-spin-up configuration, depending upon the purpose of the image and what technologies it incorporates. So while on the VM side provisioning includes OS install and configuration, on the cloud side it involves image spin-up and configuration.
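The point above – the two paths differ only in how the OS arrives, then converge on the same configuration work – can be shown as a toy sketch. The step names are illustrative, not any particular tool’s vocabulary.

```python
# Toy sketch: VM and cloud provisioning paths differ only in their
# first step, then converge on the same OS configuration work.
def provision(target):
    """Return the ordered provisioning steps for a 'vm' or 'cloud' target."""
    steps = []
    if target == "vm":
        steps.append("install OS image")         # PXE/kickstart style
    elif target == "cloud":
        steps.append("spin up pre-built image")  # provider API call
    else:
        raise ValueError(f"unknown target: {target}")
    # Shared post-provisioning configuration, regardless of path:
    steps += ["configure IP and connections",
              "attach and mount storage"]
    return steps
```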
And even this image doesn’t give us the full picture for data center automation. If we shrink this image down to a box, we then can use the following to depict the overall architecture of a total automation solution:
In this diagram, “Server Provisioning” is the entire first diagram of this blog, and the other boxes are external items that need configuration – NAS or SAN disk creation (or cloud storage allocation), application security policy and network security configuration, and the overall network (subnet config, VLAN config, inter-VLAN routing, etc.). These things could be left to manual configuration because they generally don’t change as often as the servers utilizing them, but they can be automated today… The question is whether it’s worth it in your environment, and I don’t have those answers – you do, of course.
We’re moving more and more in this direction, where you as an administrator, ops, or devops person will say “New project. Give me X amount of disk, Y ports on a VLAN, apply these security policies, and allocate Z servers to it, two as web servers, the rest as application servers with this engine.” And that will be it; the environment will spin up. Long term, the environment will spin up in spite of errors, but short term, the error-correction facility will be that subsystem known in some other great sci-fi books as meatware.
What can you do to prepare for this future? Well, the best first step is to get server provisioning down (first with hardware and VMs, because they’re basically the same, and someone will always have to spin up the hardware), then get it down with cloud and Docker. Finally, become an expert on one of the application provisioning tools. In essence, the contents of that first diagram are very real today, while the bits added in the second are evolving rapidly as you read this, so work on what’s real today to increase understanding and speed adoption. It helps that doing so will (after implementation) free up some time.
Of course I have my preferences for what you should learn (I DO work for a hardware/server provisioning vendor after all), but I would refer you to my DevOps.com articles linked above for a more balanced look at what might suit your needs if you’re not already started down this path.
The other thing you can do is start to look at logging and monitoring facilities. They will be an integral part of any solution you begin to look at – you cannot resolve problems on systems that just sprouted up on demand unless you can review the logs and see what went wrong. In an increasingly complex environment, that is more true than ever. I’ve seen minor hardware issues bury an entire cluster, and without log analysis, that would have been hard to track down.
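To make the monitoring point concrete, here is a toy sketch of the kind of analysis an aggregated log stream enables: counting error signatures per host so a failing machine stands out across systems that spun up on demand. The log format is hypothetical; in practice a syslog- or ELK-style aggregator would feed this.

```python
import re
from collections import Counter

# Hypothetical aggregated log format: "<host> <LEVEL> <message>"
LINE = re.compile(r"^(?P<host>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")

def error_counts(lines):
    """Count ERROR-level messages per host in an aggregated log stream."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and m.group("level") == "ERROR":
            counts[m.group("host")] += 1
    return counts
```

Even this trivial roll-up illustrates the argument: without centralized logs, a minor hardware fault on one host in a large cluster is nearly invisible; with them, it surfaces as an outlier in a single query.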
It’s getting to be a fun time in the datacenter. Lots of change, thankfully much of it for the better!