In this lesson we’re doing an overview of DevOps and the culture, practices and tools that are involved with the DevOps philosophies and workflow.
See, DevOps was really created to address a pretty specific gap in communication and efficiency between our developer teams and our operations teams. So let’s jump right on in and take a look at what these philosophies are and how we can use them for network operations and our automation ventures as well.
First up, why DevOps? DevOps was created to do things in a consistent, reliable and controlled manner. It’s a lot about making sure our IT moves faster and is not a bottleneck for our organization: faster, while also being more reliable. A big piece of that is the mindset of doing things in very small, modular changes. That allows us to make very consistent and reliable changes to production without nearly as many issues, and gives us a much more reliable network and system as a whole.
In many teams, what happens is you have your development team, and you have your operations team. Your development team goes ahead and creates some new feature or production change and throws that code base over to operations and says “here you go”, and that’s where the communication stops. You’ve got this wall between them, where not a whole lot of communication and collaboration is happening between the teams. The development engineers might get some trouble tickets escalated, of issues found by the operations team through managing the changes in production, but there’s not usually a whole lot of communication and collaboration there. Increasing that communication and collaboration would ensure that the two teams are in alignment for the changes that they’re making and understand each other’s goals and priorities more.
See, these two teams generally have competing goals, which you can imagine is not very conducive to having an efficient team. DevOps is a philosophy for trying to unify these teams. Development teams and engineers are usually focused on creating new features, deploying new widgets and coding up new changes. Operations, on the other hand, is really focused on the stability of the existing network and the existing systems, and making sure that existing functionality is maintained. That’s why Operations operates the network. Operations does regular maintenance, break-fix and monitoring to ensure that the current system is performing as expected.
So, DevOps is here to try and speed up development, and give us reliable service delivery. As a side effect of this, we end up improving the overall quality of our product as well, be it the network services, infrastructure, or the software or code base that we deliver to our internal or external customers. As another byproduct, we end up improving the security of our overall product. So, that’s why DevOps.
Now, let’s take a closer look at what DevOps culture really is. You can have all of the DevOps workflows and tool sets, everything under the sun; there’s a whole lot of documentation out there because DevOps as a mindset has been around for quite some time and really done wonders for the software development side of the world. However, if you haven’t employed and really fostered the DevOps culture and philosophies that are behind it, then you’re kind of missing the point. So let’s take a look at that here.
DevOps really is less process and more culture. It’s a set of practices and values highlighting that people and communication are really key to our collaborative environment. It’s the idea of unifying development and operations, that these aren’t two separate teams that are trying to compete or butt heads at all, we want to get rid of that competing culture here and really have more of a unified team. We’re all on the same team here, we’re all trying to put out new features and move faster for either our internal customers or our external customers and really bring value to them. Our overall goal is to enable the business to do what the business finds value in.
A lot of businesses generate revenue by performing IT services explicitly; it’s a core competency for them. In that case, development is a primary revenue-generating department of the business. However, that’s just not always the case. You might have an HVAC company, where the IT department is completing projects and developing products for internal customers. If they can’t manage the new CRM software or databases, or can’t get the network infrastructure to a point where the company can start opening branch offices throughout the country, then IT is really the bottleneck to the business at that point. The IT department would not be enabling the business to move at whatever speed the business is capable of, in whichever direction sees the most benefit. We’re giving them some constraints, which is not what IT is supposed to do.
So, DevOps achieves this cultural shift really through the three ways.
First up: flow
Flow refers to the flow of the system as a whole, or some people refer to it as systems thinking. What this really means is to never let local improvements of an individual area hinder the flow of the system as a whole. It might seem like you’re improving an individual team, but if you are not enabling that team to help the system as a whole, then your whole system flow will be hindered.
So, the two main points here are that we want to seek to increase the rate of flow of the system as a whole, and avoid technical debt. You may have heard that term before. Technical debt is what usually happens when you put a band-aid on something. You have a break-fix situation and you have two options: one is the quick fix that will get things back online right away, and the other is the fix done the right way, so you won’t have to come back and fix something later.
Usually, the quick fix is what gets put into place. You might have people breathing down your neck, looking to get this outage resolved as quickly as possible, so you take the quick fix and get that up and running as soon as possible. The issue is, a lot of times these band-aids end up getting forgotten, and even if they don’t you still now have something you need to go back and fix that perhaps you weren’t intending to later, and that is technical debt.
Next up of the three ways: feedback
Quick feedback is really the key to making quick course corrections. If I am doing something incorrectly now, but I don’t find out about it for months then that doesn’t help me very much and I have months of time where I’ve been doing the wrong thing. The faster that I can get feedback, the better it is. This is where DevOps ends up saying that we should allow for quick, continuous feedback and analyze feedback at every stage of development. We should build feedback into our development workflows and analyze that feedback as we are going. This allows us to make smaller, more incremental course corrections, rather than finding out we have gone completely off track much later in the game and have a whole lot of rework to do.
The last of the three ways: continuous experimentation and learning
So, the third way, right, is all about fostering an environment of taking risks and learning from failures. If there’s a better way to do something, you will never really find out about it unless you experiment with it first. A lot of times your engineers, your day-to-day workers, are the ones who are going to be able to most readily see the things that can possibly improve. So this is about fostering an environment that allows them to take some risks, make some changes to their workflow and see if it works. Then it’s about learning from the failures and finding fault with the process and not with the people.
If you’ve ever been in a Blameless Root Cause Analysis before, and if you haven’t then as a network engineer I’m sure you will be at some point, this is all about getting together all the information about an event, usually an outage or some other production-impacting event. In these meetings, we try to gather all the information about what all involved parties were doing and what information we had, so we can improve our response in the future. Trying to find where things went wrong can be a little difficult when you need to raise your hand and start saying what it is that you did, and you know that what you did was actually just not right, but you made a judgment call at the time.
Hindsight is always perfect. It can be really difficult to talk about the fact that you know what you did was wrong but you made a judgment call and you did it. However, it’s about finding fault with the process, not the person. Perhaps in that circumstance, your judgment call should have been peer-reviewed, perhaps you should have had an escalation point where you could talk out the options on how to resolve this.
Finally, along with the continuous experimentation and learning we have to allocate time for the improvement of daily work. The more that you can improve the efficiency and the enjoyment of your daily work, the better your work as a whole will be. That’s where we live a lot, is in our daily work. We have our more repetitive things that need to be done and our everyday tasks that need to be done and the better we can make that, the more efficient we can make it.
Let’s talk a little bit about our continuous practices with DevOps. A lot of DevOps is about continuous thinking, continuous improvement, continuous integration, deployment and delivery. CI / CD stands for continuous integration and continuous delivery / deployment. Delivery and deployment are actually two separate steps: delivering your code base or product, and then deploying it out into production. Continuous integration is the concept of frequently committing your changes and automatically testing them, be it code or infrastructure as code, which we’ll talk about shortly.
Then, our continuous delivery is delivering those integrated changes as often as possible and deploying them to production as often as possible. The smaller that we can make our changes to production, the less likely it is that we will have problems with each change, so the more likely it is we will have a reliable push to production. You can imagine that even if there are problems, there’ll be a much shorter amount of troubleshooting time involved when you only have a small change to troubleshoot. So, it’s about trying to keep that continuous flow of our changes to our production system, be it to a code base or to our network or our system.
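To make that a little more concrete, here is a minimal sketch in Python of the idea: every small change runs through automated checks, and only a change that passes all of them gets pushed on to production. The names here (run_pipeline, CHECKS, the VLAN change dictionaries and the production list) are all made up for illustration; a real pipeline would live in whatever CI tooling your team uses.

```python
def run_pipeline(change, checks, deploy):
    """Run every automated check against a change; deploy only if all pass."""
    results = {name: check(change) for name, check in checks.items()}
    if all(results.values()):
        deploy(change)
        return "deployed", results
    return "rejected", results


# Hypothetical automated tests for a small VLAN change.
CHECKS = {
    "vlan id in valid range": lambda c: 1 <= c["vlan_id"] <= 4094,
    "vlan has a name": lambda c: bool(c["name"].strip()),
}

# Stand-in for the real production system: deploying just records the change.
production = []

good = run_pipeline({"vlan_id": 20, "name": "users"}, CHECKS, production.append)
bad = run_pipeline({"vlan_id": 5000, "name": "typo"}, CHECKS, production.append)

print(good[0])      # status of the valid change
print(bad[0])       # status of the invalid change
print(production)   # only changes that passed every check
```

Because each change is so small, a rejected change points you straight at the one check that failed, which is where that shorter troubleshooting time comes from.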
We’ve been talking a lot about software development and things closely related; it’s hard not to, since these philosophies and practices were born in that industry. Let’s take a closer look at how this really applies to network operations.
So, network operations works a little differently of course. For us, it’s really about taking the network automation concept with a DevOps approach. Where DevOps has really sped up software development, network operations usually still works in much the same way it always has: box-by-box provisioning practices, where we plug each box into the lab, provision it with our baseline template and manually type up different configuration changes, perhaps copy-pasting them from Notepad over onto the box, and things of that sort.
There’s a whole lot of improvement to be made there with automating our provisioning and configuration changes, and automating our testing. That way we don’t have maintenance plans that include individual manual testing steps; instead, we can automate them and have a log stored in a database showing that our testing steps did succeed. This would enable us to have maintenance windows take less than an hour, instead of multiple hours!
Wouldn’t that be nice? To only be up from midnight to 1am doing your after-hours maintenance, instead of midnight to 8am? Network automation with a DevOps approach will increase the speed of changes to our production, of changes to our service deployments, and changes to our service offerings. This results in improvements to the quality and overall reliability of our network, because we are automating. We are moving quicker and spending our time on more value-added problems, like automating the repetitive tasks so that they complete in a more consistent and reliable manner. We’re lowering the cost of our overall network operations, because we’re not doing so much firefighting all the time.
Moving on a bit, let’s see what it means when we talk about our infrastructure being represented as code. That’s really a lot of what we need to do in order to automate more of our network: represent our Infrastructure as Code, or IaC. In a sample IaC workflow, you code up your changes to infrastructure, you commit them to a version control system, they go through a code review process with your team, you integrate that code into the master branch, and then you deploy that code out to production. That would be your infrastructure as code workflow.
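As a rough sketch of what automated testing steps with a database log could look like, here is a small Python example using the standard library’s sqlite3 module. Everything specific in it is an assumption for illustration: the run_test_steps function, the test_log table, and the device_state dictionary, which stands in for data you would really collect from the network.

```python
import sqlite3
from datetime import datetime, timezone


def run_test_steps(conn, steps):
    """Run each test step and record whether it passed in a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS test_log (ts TEXT, step TEXT, passed INTEGER)"
    )
    all_passed = True
    for name, check in steps:
        try:
            passed = bool(check())
        except Exception:
            # A step that blows up is logged as a failure, not lost.
            passed = False
        conn.execute(
            "INSERT INTO test_log VALUES (?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), name, int(passed)),
        )
        all_passed = all_passed and passed
    conn.commit()
    return all_passed


# Hypothetical post-change state; real checks would query the devices.
device_state = {"uplink": "up", "bgp_neighbors": 2}

steps = [
    ("uplink is up", lambda: device_state["uplink"] == "up"),
    ("two BGP neighbors established", lambda: device_state["bgp_neighbors"] == 2),
]

conn = sqlite3.connect(":memory:")  # in-memory database, just for the sketch
maintenance_ok = run_test_steps(conn, steps)
logged = conn.execute("SELECT step, passed FROM test_log ORDER BY rowid").fetchall()
print(maintenance_ok, logged)
```

The point of the design is that the evidence of each test step ends up in a table you can query later, rather than in somebody’s memory of what they checked at 3am.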
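Here is one small way to picture the first step of that workflow, sketched in Python. The infrastructure is described as plain data, and the device configuration is generated from it, so the data file is what you would commit, review and merge. The interface names, VLAN numbers and the IOS-style output lines are illustrative assumptions, not a template for any particular platform.

```python
# Desired state of the infrastructure, kept as data under version control.
INTERFACES = [
    {"name": "GigabitEthernet0/1", "description": "uplink to core", "vlan": 10},
    {"name": "GigabitEthernet0/2", "description": "branch office users", "vlan": 20},
]


def render_interface(intf):
    """Turn one interface definition into configuration lines."""
    return "\n".join(
        [
            f"interface {intf['name']}",
            f" description {intf['description']}",
            f" switchport access vlan {intf['vlan']}",
        ]
    )


def render_config(interfaces):
    """Render the full configuration, one stanza per interface."""
    return "\n!\n".join(render_interface(i) for i in interfaces) + "\n"


config = render_config(INTERFACES)
print(config)
```

Since the same data always renders to the same text, a reviewer can look at a small diff of the data and know exactly what will change on the box.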
Finally, I wanted to show a fun infographic of various DevOps toolsets. This is really to show that there’s no single set of tools that enables these workflows, there are many to choose from that can meet the needs of each team.