The reliability of a web infrastructure can be measured by system uptime: the percentage of time that a website is online and fully operational. System uptime is especially important for ecommerce organizations, as a growing number of businesses rely on internet-generated revenue as their primary source of sales.

Many pure-play ecommerce websites rely on service-level guarantees as close to 100% uptime as possible. Any outage or downtime can result in a substantial loss of sales, a weaker brand perception, reduced business productivity, and the possibility of loyal customers looking elsewhere for services, most likely to your competitors.
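
To put those uptime percentages into perspective, the arithmetic below is a rough sketch in Python showing how much downtime each common service-level target actually allows per year. The revenue-per-hour figure is purely illustrative, not a benchmark.

    # Rough downtime-budget arithmetic for common uptime targets.
    MINUTES_PER_YEAR = 365 * 24 * 60

    def downtime_minutes_per_year(uptime_percent: float) -> float:
        """Minutes of downtime allowed per year at a given uptime percentage."""
        return MINUTES_PER_YEAR * (1 - uptime_percent / 100)

    ASSUMED_REVENUE_PER_HOUR = 10_000  # hypothetical figure, for illustration only

    for target in (99.0, 99.9, 99.99):
        minutes = downtime_minutes_per_year(target)
        cost = (minutes / 60) * ASSUMED_REVENUE_PER_HOUR
        print(f"{target}% uptime -> {minutes:,.0f} min/year down, ~${cost:,.0f} at risk")

Even the step from 99% to 99.9% is the difference between roughly three and a half days and under nine hours of allowable downtime per year.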

How Downtime Can Impact an Organization

Downtime is a real possibility if your business does not invest in resilient hosting services. It has struck several major online retailers in recent years, including the worldwide fashion giant ASOS, which went offline for over 20 hours. No official figures were released for that outage, but it is expected to have had a significant financial impact.

The world’s biggest online retailer, Amazon.com, went offline for 13 minutes in July 2018 during its Prime Day discount promotion. Some online sources suggest the outage may have cost up to $90 million. Downtime is particularly common for online retailers during peak sales periods, especially retail events such as Black Friday and Cyber Monday.

Protecting Against Downtime

There are many technical solutions that can help protect web assets from downtime. Three of the most popular are fault tolerance, high availability, and load balancing. These technologies often intertwine and work directly with one another. Beyond these recommended solutions, it is essential that all infrastructure architecture be designed from the ground up with uptime, scalability, and stability in mind.

Private and public managed service providers (MSPs) design and build resilient computing infrastructure from the ground up. Data center facilities are built to be fault-tolerant, which often includes multiple power feeds from the national grid backed up by fossil fuel generators. Cooling systems are usually deployed in pairs for redundancy, and there are typically redundant power feeds running directly to each server rack.

Within the server racks, dual power feeds are provided to the compute infrastructure. At the hardware level there is at least N+1 redundancy to prevent a single component failure from causing an outage; that might mean dual power supplies, bonded dual network connections, or a dual SAN setup. Storage will also be configured in a RAID array to prevent data loss if a disk fails.
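
As a rough illustration of the trade-off RAID involves, the Python sketch below compares usable capacity and guaranteed disk-failure tolerance for a few common RAID levels using the standard textbook formulas; real arrays vary by controller and layout, and the six-disk, 4 TB figures are hypothetical.

    # Usable capacity and guaranteed disk-failure tolerance for common RAID levels.
    def raid_summary(level: str, disks: int, disk_tb: float):
        """Return (usable capacity in TB, disk failures tolerated)."""
        if level == "RAID 0":          # striping only: no redundancy
            return disks * disk_tb, 0
        if level == "RAID 1":          # mirroring: one usable copy of the data
            return disk_tb, disks - 1
        if level == "RAID 5":          # one disk's worth of parity overhead
            return (disks - 1) * disk_tb, 1
        if level == "RAID 6":          # double parity
            return (disks - 2) * disk_tb, 2
        if level == "RAID 10":         # striped 2-way mirrors; 1 is the guaranteed minimum
            return (disks // 2) * disk_tb, 1
        raise ValueError(f"unknown level: {level}")

    for level in ("RAID 0", "RAID 1", "RAID 5", "RAID 6", "RAID 10"):
        usable, tolerated = raid_summary(level, disks=6, disk_tb=4.0)
        print(f"{level}: {usable:.0f} TB usable, survives {tolerated} disk failure(s)")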

Fault Tolerance

Fault-tolerant, software-defined infrastructure is a key component to consider when implementing failover services. Fault tolerance enables a system to continue operating in the event of a failure or unexpected outage. The technology is commonly found within hyper-converged virtual infrastructure services.

The technology designates a fault-tolerant primary node that runs on a specific host. The virtual infrastructure creates a fault-tolerant secondary copy of that node on a different host and keeps the two in sync. In the event of a failure, the hypervisor seamlessly fails over by powering off the failing node and powering up the secondary node. Users will often never notice that such an event has taken place; the worst-case scenario might be a few seconds of micro-stutter.
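
As a conceptual illustration only (not any particular hypervisor's API), the Python sketch below models the primary/secondary behavior described above: every change is mirrored to the secondary node, and when the primary fails the secondary is promoted with identical state.

    # Conceptual model of fault-tolerant failover between two synced nodes.
    class Node:
        def __init__(self, name: str):
            self.name = name
            self.healthy = True
            self.state = {}            # in-memory state kept in lockstep

    class FaultTolerantPair:
        def __init__(self):
            self.primary = Node("primary")
            self.secondary = Node("secondary")

        def write(self, key, value):
            # Every change is applied to both nodes so they stay in sync.
            self.primary.state[key] = value
            self.secondary.state[key] = value

        def read(self, key):
            if not self.primary.healthy:
                self.failover()
            return self.primary.state.get(key)

        def failover(self):
            # Retire the failing node and promote the synced secondary.
            print("Primary failed; promoting secondary with identical state.")
            self.primary, self.secondary = self.secondary, self.primary

    pair = FaultTolerantPair()
    pair.write("cart", ["sku-123"])
    pair.primary.healthy = False       # simulate a host failure
    print(pair.read("cart"))           # served by the promoted secondary, no data lost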

High Availability

High availability (HA) is another technology that can help increase uptime for your web infrastructure. HA is similar to fault tolerance; the key difference is that in an HA setup, at least two instances of the same server run concurrently, usually in an active-active or active-passive configuration. Clustered resources, such as databases, disks, and networking, are typically shared between the HA servers, which are often geographically separated. Should one server fail, all of the load is transferred to the other server seamlessly.
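
The sketch below is a simplified, hypothetical illustration of the active-passive variant: the passive node watches for a heartbeat from the active node and takes over the shared resources when that heartbeat goes stale. Real clustering software is considerably more involved.

    # Simplified active-passive heartbeat check; illustrative only.
    import time

    HEARTBEAT_TIMEOUT = 5.0            # seconds without a heartbeat before failover

    class ClusterNode:
        def __init__(self, name: str, active: bool):
            self.name = name
            self.active = active
            self.last_heartbeat = time.monotonic()

        def send_heartbeat(self):
            self.last_heartbeat = time.monotonic()

    def monitor(active: ClusterNode, passive: ClusterNode):
        """Promote the passive node if the active node's heartbeat goes stale."""
        if time.monotonic() - active.last_heartbeat > HEARTBEAT_TIMEOUT:
            print(f"{active.name} heartbeat lost; {passive.name} taking over shared resources")
            active.active, passive.active = False, True

    web1 = ClusterNode("web1", active=True)
    web2 = ClusterNode("web2", active=False)
    web1.last_heartbeat -= 10          # simulate a stale heartbeat from the active node
    monitor(web1, web2)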

Highly reliable computing resources are a must for any modern business. Critical business functions such as email, inventory management, online presence, scheduling, data protection, and remote collaboration all depend on these resources. Downtime can cost thousands of dollars per hour and seriously hurt your bottom line.

Failure Points for IT Infrastructure

There are, however, many potential points of failure. Hardware failures, power outages, natural disasters, and data corruption can all threaten your computing uptime and seriously damage your business. Until a few years ago, the standard approach to achieving high availability was to purchase and implement two or more of everything: redundant servers, internet uplinks, data storage, and so on. With a single server costing $4,000 or $5,000, this level of redundancy was often prohibitively expensive right from the start. What’s worse, it did not protect against power failures, natural disasters, or security breaches.

In addition, adding complexity to the in-house IT infrastructure introduced technical complications, such as data mirroring and hardware/software compatibility issues, that many companies found difficult to manage and that required specialized, dedicated personnel. The hardware and management costs added up to a very steep price while providing only partial protection against downtime.

Redundancy Increases Cost

Purely from a financial perspective, then, achieving better reliability by doubling your IT infrastructure meant a burdensome up-front cost and a return on investment that was difficult to justify. Until recently, you had a difficult decision to make: absorb these costs or increase the company’s exposure to potentially catastrophic risk.

Fortunately, the advent of cloud hosting has changed the equation. Today, companies looking to get started, grow, or transition can access fully redundant computing infrastructure without paying the up-front premium and ongoing costs of redundant systems. Cloud servers take advantage of advanced data centers, economies of scale, and professional managed VPS hosting teams to deliver the same level of computing infrastructure as a service. With a quality cloud server provider, reliability is built in, and you benefit from limited up-front costs as well as amortization of IT expenses over the life of your contract.

Load Balancing

One of the most popular and commonly used failover services is the load balancer. Load balancers are designed to direct ingress traffic to the available resources on the network (usually compute nodes). Load balancers are intelligent and route traffic to available nodes according to a load-balancing policy. If any node fails, the load balancer removes the failed node from the pool and continues to serve traffic to the remaining healthy nodes.

There are many types of load balancers available: HTTP(S), SSL proxy, TCP proxy, and internal network load balancers. Traffic is routed to the closest available node or instance, and the load balancer typically uses a metric such as CPU utilization or requests per second, or a scheme such as weighted round robin, to determine which node receives the ingress traffic.
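
To make that routing behavior concrete, here is a minimal Python sketch (with hypothetical node names and weights) of a weighted round-robin pool: failed nodes are dropped from the rotation, and healthy nodes receive traffic in proportion to their weights.

    # Minimal weighted round-robin load balancer with failed-node removal.
    import itertools

    class Backend:
        def __init__(self, name: str, weight: int = 1):
            self.name = name
            self.weight = weight
            self.healthy = True

    class LoadBalancer:
        def __init__(self, backends):
            self.backends = backends

        def _rotation(self):
            # Expand each healthy backend by its weight, e.g. weight 2 appears twice.
            pool = [b for b in self.backends if b.healthy]
            if not pool:
                raise RuntimeError("no healthy backends available")
            return [b for b in pool for _ in range(b.weight)]

        def route(self, requests: int):
            rotation = itertools.cycle(self._rotation())
            return [next(rotation).name for _ in range(requests)]

    pool = [Backend("web1", weight=2), Backend("web2"), Backend("web3")]
    lb = LoadBalancer(pool)
    print(lb.route(4))                  # ['web1', 'web1', 'web2', 'web3']

    pool[0].healthy = False             # a health check marks web1 as failed
    print(lb.route(4))                  # web1 is skipped: ['web2', 'web3', 'web2', 'web3']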

How Load Balancers Work

A load balancer consists of a front-end service, which is usually an external IP address, and a back-end service, which is the endpoint (or endpoints) for the traffic, usually a pool of servers. When traffic hits the front-end service, it is routed to the back-end service using a predefined policy.

Load balancers offer greater flexibility and are extremely useful for web assets, as they can be used to push website updates “live” in a seamless manner. Typically, a systems engineer will drain several nodes from the load balancer, patch or update the website code, and then return the servers to the pool. This process can be repeated, and the result for the user is zero downtime and an easy upgrade path. Existing user sessions are gracefully terminated, and new sessions are routed to the updated nodes.
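
That drain-and-update workflow can be sketched in a few lines of Python. The helper functions below (drain, update_code, return_to_pool) are hypothetical stand-ins for real deployment steps, not any particular tool's API.

    # Simplified rolling-update workflow: drain, update, return to pool.
    def drain(node: str, pool: set):
        print(f"draining {node}: existing sessions finish, no new traffic routed")
        pool.discard(node)

    def update_code(node: str, version: str):
        print(f"deploying version {version} to {node}")

    def return_to_pool(node: str, pool: set):
        print(f"health check passed; {node} back in rotation")
        pool.add(node)

    def rolling_update(nodes, version: str):
        pool = set(nodes)              # nodes currently receiving traffic
        for node in nodes:
            drain(node, pool)          # the rest of the pool keeps serving users
            update_code(node, version)
            return_to_pool(node, pool)

    rolling_update(["web1", "web2", "web3"], version="2.1.0")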

Without a load balancer, every visitor to a site is directed to the same server. That server is likely to be inundated with user requests during peak times or as the site grows more popular. When an upswing in traffic occurs, visitors will either experience slow page loads or the server will start denying requests.

Implementing Failover Services

For assistance with implementing failover services as part of a HIPAA-compliant web infrastructure or any other deployment, contact the sales team at Atlantic.Net today!