Author: Mike Watts
Date: 15th June 2020
Quite rightly, every business is now concerned about the environmental impact of what it does, and nowhere is this more the case than in the Data Centre and Managed Services sectors. When you provide services rather than manufacture a product, it is difficult to gauge output against input. Manufacturing is different: a defined production method produces a defined product. Put in an oversimplified way, you take X amount of raw materials, mix them with Y amount of energy, add a sprinkling of resources and end up with a quantifiable number of products, by-products and waste. These can then be reviewed, processes amended and investments made to increase production and reduce by-products, creating more efficiency and less environmental impact.
When you provide IT infrastructure and services in general, whether technical assets for IaaS or PaaS or infrastructure assets such as power and cooling for Colocation, you know what your energy inputs are and how efficiently that power is used. There are industry-standard measurements for calculating and tracking this, such as PUE (Power Usage Effectiveness) or DCiE (Data Centre infrastructure Efficiency). But how do you measure your outputs?
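Both input metrics are simple ratios of metered energy: PUE divides total facility energy by IT equipment energy, and DCiE is the same relationship expressed as a percentage the other way round (100 / PUE). The sketch below is a minimal illustration, assuming you already have total facility and IT equipment energy for the same period; the function names and meter readings are hypothetical.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def dcie(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Data Centre infrastructure Efficiency: IT energy as a % of total energy (100 / PUE)."""
    if total_facility_kwh <= 0:
        raise ValueError("total facility energy must be positive")
    return 100.0 * it_equipment_kwh / total_facility_kwh


if __name__ == "__main__":
    # Hypothetical monthly meter readings, for illustration only.
    total_kwh = 1_500_000.0
    it_kwh = 1_000_000.0
    print(f"PUE  = {pue(total_kwh, it_kwh):.2f}")
    print(f"DCiE = {dcie(total_kwh, it_kwh):.1f}%")
```

With these readings a PUE of 1.5 corresponds to a DCiE of roughly 67%: two views of the same input efficiency, neither of which says anything about output.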
When the outputs are the technologies your customers deliver, those outputs are measured differently for each customer, platform and service, and so have different KPIs or metrics associated with them. For example, transactional services such as payment gateways could be measured in transactions per second, time to complete a transaction or the percentage of transactions that succeed first time. Storage may be measured in Input/Output Operations Per Second (IOPS) or the effectiveness of compression, and messaging and forums by the number of posts or messages handled, and the latency achieved, at peak concurrent users. Amid this jumble of potential measurement points, one thing is consistent across all of them: availability. If you are providing IT platforms or Colocation, the availability of the services therefore becomes the baseline for measuring the output of a Service Provider.
With availability as the baseline, you then need to look at the resilience and redundancy of the services you provide and how they can be made as resilient as possible, thereby increasing availability (output). The first thought is often “It’s easy, I will just have two of everything” (called N+N or 2N), so that if one fails, develops a fault or needs maintenance, I still have the same again and availability is king. Deploying in this manner can bring its own list of technical issues, but normally the biggest consideration is cost: you have just doubled your initial capital expenditure and your operational costs for maintenance and support contractors, and the environmental impact is now twice what it was.
Planning resilience and redundancy is therefore never simple, because of the knock-on impacts. Setting application-specific requirements to one side (every system, application and platform has its own nuances) and looking at this holistically, there are some baseline questions that apply to infrastructure and systems:
1. Can I load balance? (Using more than one asset at once to provide the functionality of one larger asset.)
2. What are the overheads of load balancing? (If devices are X% efficient, then across every Y devices I lose the equivalent of a whole device to inefficiency.)
   – Where is the balancing point between efficiency and resilience?
   – Are there additional infrastructure requirements for load balancing?
3. What is the mean time between failures (MTBF) of a device?
4. Am I required to perform intrusive maintenance on these assets?
   – If so, how often?
   – What is the impact? Do I need to mitigate it?
5. Can the assets run as Active / Active, or are they Active / Passive?
   – If Active / Passive, what is the failover time? Do I need to mitigate it?
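Questions 2 and 3 lend themselves to quick arithmetic. The sketch below assumes the common steady-state model, availability = MTBF / (MTBF + MTTR), where MTTR (mean time to repair) is an assumption not covered in the questions above, and treats load-balancing overhead as a fixed efficiency lost equally by every device. The function names and figures are hypothetical.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a single repairable device is up: MTBF / (MTBF + MTTR).

    MTTR is an assumed input; the model ignores planned maintenance.
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


def devices_lost_to_overhead(devices: int, efficiency: float) -> float:
    """Capacity lost to load-balancing overhead, in whole-device equivalents.

    Assumes every device in the pool loses the same fraction (1 - efficiency)
    of its capacity to balancing overhead.
    """
    return devices * (1.0 - efficiency)


if __name__ == "__main__":
    # Hypothetical figures: 50,000 h MTBF and 8 h to repair or replace.
    a = steady_state_availability(50_000, 8)
    print(f"Single device availability: {a:.5%}")
    # At 90% efficiency, a pool of 10 devices gives up about one device's worth.
    print(f"Devices lost to overhead: {devices_lost_to_overhead(10, 0.90):.1f}")
```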
With these questions answered, you can start the risk assessment process: the great “what ifs”. What if unit A fails? What if that share sprocket falls off? What if it is raining on a Tuesday and sunny on a Wednesday?
All of this should lead to some simple statements that identify how much downtime you can afford, how quickly you can recover and therefore what level of resilience you need.
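Turning an availability target into a yearly downtime budget is one way to make those statements concrete. A minimal sketch, assuming a 365-day year and a single availability target per service; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years for simplicity


def allowed_downtime_minutes_per_year(availability_target: float) -> float:
    """Minutes of downtime per year permitted by an availability target (e.g. 0.999)."""
    if not 0.0 < availability_target <= 1.0:
        raise ValueError("availability_target must be in (0, 1]")
    return (1.0 - availability_target) * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999, 0.99999):
        minutes = allowed_downtime_minutes_per_year(target)
        print(f"{target:.3%} -> {minutes:8.1f} min/year")
```

Each extra “nine” cuts the budget by a factor of ten, which is what drives the resilience choices that follow.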
This allows you to design infrastructure to suit. Is it N, just what is “Needed” to run the service, because the downtime and recovery times are acceptable to cover a failure or maintenance? Is it N+N, “Need + Need”, where for every unit I have a whole spare one, so utilisation is never greater than 50% (or, looking at it another way, 50% is wasted under normal operations)? Or is it N+X, “Need + 1 or + 2” and so on? This is where it is sometimes better and more cost effective to deploy three instead of two: I need two for operations and have one spare, so the “wasted” resource is now 33% rather than 50%. From here, scale starts to take effect. If I need 9 and have 10, I am still N+1, but the “wasted” resource is now only 10%, so even at N+2 or N+3 your resilience still increases while your efficiency remains better than N+N.
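The idle-capacity percentages above follow from spare / (needed + spare). The sketch below reproduces them and, as an assumption beyond the text, estimates availability with a simple k-out-of-n model in which identical units fail independently with a hypothetical per-unit availability of 99.9%. Real systems share failure modes (power feeds, software, human error), so treat the results as illustrative only; the function names are hypothetical.

```python
from math import comb


def idle_fraction(needed: int, spare: int) -> float:
    """Share of installed capacity held in reserve under normal operations."""
    return spare / (needed + spare)


def k_of_n_availability(needed: int, installed: int, unit_availability: float) -> float:
    """Probability that at least `needed` of `installed` units are up.

    Assumes identical units that fail independently (a simplification).
    """
    a = unit_availability
    return sum(
        comb(installed, up) * a**up * (1 - a) ** (installed - up)
        for up in range(needed, installed + 1)
    )


if __name__ == "__main__":
    unit_a = 0.999  # hypothetical availability of a single unit
    designs = {
        "N   (1 of 1) ": (1, 0),
        "2N  (1 of 2) ": (1, 1),
        "N+1 (2 of 3) ": (2, 1),
        "N+1 (9 of 10)": (9, 1),
        "N+2 (9 of 11)": (9, 2),
    }
    for label, (needed, spare) in designs.items():
        installed = needed + spare
        print(
            f"{label}: idle {idle_fraction(needed, spare):5.1%}, "
            f"availability {k_of_n_availability(needed, installed, unit_a):.7f}"
        )
```

Running it shows the trade-off in numbers: 9 of 10 idles far less capacity than 2N but, with one spare shared across ten units, is slightly less available than a 1-of-2 pair, while a second spare (9 of 11) lifts availability above the pair with about 18% of capacity idle.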
Sometimes it is more appropriate to have a single large asset for systems that can afford downtime; this is also normally the most efficient option, with only a single set of losses to consider. N+X deployment is becoming more standard, with shared resourcing and load balancing avoiding the wasted resource overheads of N+N. Every need, system and infrastructure is different: some systems will always run as N, others at N+X, and others will remain at N+N for the foreseeable future.
BUT when planning infrastructure and adding layers of resilience and redundancy, consider the impacts not only financially but environmentally as well. Layer upon layer of resilience can build up into a massive impact on the environment, when similar results could be delivered by alternative methods.