Continuous Deployment of Load Balancer Configurations

I thought I’d describe some optimizations I’ve made to my load balancers at work, both for the sake of my future self, and in case someone else would benefit from some of my ideas.

Background

The load balancers are based on four software packages that integrate to create a powerful whole:
Keepalived provides a common set of virtual IP addresses and ensures that failover happens to the Backup server if the Master ceases responding.
HAProxy does most of the actual load balancing and mangles network traffic when required.
SNMPD sends SNMP trap events from keepalived whenever a failover occurs.
The Zabbix Agent enumerates current configuration and system state for detailed system monitoring.

Now, all of these components see the occasional configuration change, but HAProxy’s configuration changes at least weekly.
The procedure for updating the configuration must cover the following steps:

  1. Run a pre-check to confirm that both load balancers in the pair work; we don’t want to initiate an automated update that could kill off service availability completely.
    Then, on the Backup load balancer node:
  2. Back up the current configuration.
  3. Deploy the new configuration.
  4. Reload services.
  5. Run a post-op check to confirm that the new config hasn’t broken anything important.
  6. Fail over operations from the Master load balancer node to the Backup node and repeat steps 2-5 on the Master node.
  7. Perform a final check on the load balanced services to confirm functionality hasn’t been lost.

From experience, this procedure is tedious, to say the least. In addition, there’s always the risk of introducing a change on the active load balancer and forgetting to deploy the same change to the backup one; something that may not become obvious until the next major configuration update, when the last change disappears and functionality breaks.

These are just the most obvious arguments for an automated and version controlled deployment procedure. So how do we go about that?

Version control

In my case, I use Git connected to a GitLab server for version control, and Ansible for automation.

Configuration changes are prepared in a development environment, from which the relevant files are committed to a git repository.

Other components in the load balancer config, such as Lua scripts and tools made by our developers, are stored in other repositories and can be pulled by git before a new deployment.

Ansible structure

For each load balancer pair, I’ve built a directory structure containing a playbook directory for the Ansible YAML scripts, and a filesystem directory that mirrors the movable parts of the load balancer, where the relevant parts exist in the etc directory tree.
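As an illustration, such a structure could look like this (the file names are examples, not my actual repository layout):

```
loadbalancer-pair/
├── playbook/                  # Ansible YAML scripts
│   └── deploy-lb.yml
└── filesystem/                # mirrors the movable parts of the nodes
    └── etc/
        ├── haproxy/
        │   └── haproxy.cfg
        └── keepalived/
            ├── keepalived-master.conf
            └── keepalived-backup.conf
```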

Automation

Deployment is initiated by a shell script that git-pulls the latest versions of our dependencies, and then wraps the Ansible run in an ssh-agent environment so the playbooks can authenticate against the remote machines.
The Ansible playbooks themselves are executed from a session script called by ssh-agent.
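In outline, such a wrapper could look like this (repository paths, key file, and playbook name are hypothetical placeholders, not my actual setup):

```shell
#!/bin/sh
# deploy.sh - hypothetical wrapper: refresh dependencies, then run the
# playbooks inside an ssh-agent session so Ansible can reach the nodes.
set -eu

# Pull the latest versions of the dependencies we track as git repositories.
for repo in ../lua-scripts ../devtools; do
    git -C "$repo" pull --ff-only
done

# ssh-agent runs the session script; the agent and the loaded key
# disappear again when the session script exits.
exec ssh-agent sh -c 'ssh-add ~/.ssh/deploy_key && ansible-playbook playbook/deploy-lb.yml'
```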

Ansible-specific tips

The key to ensuring that the production environment doesn’t break lies in the header of the playbook:
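In sketch form, with a made-up play name and host group:

```yaml
---
- name: Deploy load balancer configuration
  hosts: loadbalancers
  become: true
  # Work on one server at a time rather than in parallel.
  serial: 1
  # Abort the whole run as soon as any task fails on any host.
  any_errors_fatal: true
```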

The serial keyword makes the script work on one server at a time rather than executing in parallel.
The any_errors_fatal parameter is combined with relevant service checks interspersed among the deployment tasks, making the script fail fast and loudly if a backend web service stops responding while deployment is underway, so that we don’t break both servers in a pair. Note that this requires some thought on the part of the person running the script: fix the problem before re-running it, or fecal matter will hit the fan quickly enough.

The most basic of tests just ensures I can reach the statistics page of my load balancer:
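For example, using Ansible’s uri module (the stats port and path here are made-up examples; use whatever your stats listener binds to):

```yaml
- name: Check that the HAProxy statistics page responds
  uri:
    # Hypothetical stats listener; match the stats bind in haproxy.cfg.
    url: "http://{{ inventory_hostname }}:9000/stats"
    status_code: 200
```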

A typical file copying task:
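Something along these lines, assuming the repository’s filesystem tree mirrors the target host as described above:

```yaml
- name: Deploy the HAProxy configuration
  copy:
    # The filesystem/ tree in the repository mirrors the target host.
    src: ../filesystem/etc/haproxy/haproxy.cfg
    dest: /etc/haproxy/haproxy.cfg
    owner: root
    group: root
    mode: "0644"
    # Keep a timestamped copy of the old file on the node.
    backup: yes
```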

As a side note: Since I don’t want the script to have to care about which server is which, I’ve created one config file for the keepalived master instance and one for the backup. On the actual servers, a symlink points to the correct configuration for the instance.
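The convention is easy to illustrate with throwaway files (the paths here are a temp directory, not the real /etc/keepalived):

```shell
# Two variants of the config exist side by side; only the symlink differs
# between the nodes.
dir=$(mktemp -d)
printf 'state MASTER' > "$dir/keepalived-master.conf"
printf 'state BACKUP' > "$dir/keepalived-backup.conf"

# On the master node, the canonical name points at the master variant:
ln -sf "$dir/keepalived-master.conf" "$dir/keepalived.conf"

# keepalived itself just reads keepalived.conf and is none the wiser.
cat "$dir/keepalived.conf"
```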

By reloading the HAProxy service, existing sessions are not lost even though the configuration gets updated. As a bonus, in the Ansible service module, the reloaded state request also starts the service if it wasn’t started before.
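As a task, the reload can be expressed like this (a sketch; the service name is assumed to be haproxy):

```yaml
- name: Reload HAProxy to pick up the new configuration
  service:
    name: haproxy
    # "reloaded" keeps existing sessions alive, and also starts the
    # service if it happens not to be running.
    state: reloaded
```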

With way less than a day’s worth of work, I’ve introduced a repeatable deployment workflow that mitigates some of the risks involved in letting humans tamper with production systems.

Securing an Internet accessible server – Part 3

This post is part of a series. Part 1, Part 2.

In the last part I briefly mentioned load balancers and proxies. After thinking about it for a while, I realized I see no reason not to run one, since it simplifies things a bit when setting up secure web services. In this part, we will be setting up an HAProxy server which won’t actually load balance anything, but which will act as a kind of extensible gatekeeper for our web services. In addition, the HAProxy instance will act as the TLS termination point for secure traffic between clients on the Internet and services hosted on our server(s).

This article is written from the perspective of running HAProxy on a separate virtual machine. That’s just for my own convenience, though. If you’re running pfSense for a firewall, you already have HAProxy as a module. It is also possible to run HAProxy directly on your web server, just logically putting it in front of whatever web server software you’re running.

Let’s get started. This post will be a rather long one.

Continue reading “Securing an Internet accessible server – Part 3”

WordPress behind HAProxy with TLS termination

My current project has been to set up a publicly accessible web server with a decent level of security. It has been an interesting exercise in applying “old” knowledge and gathering some new.

This weekend I finished this project for now. The current setup is as follows:
Behind my firewall, where I NAT ports 80 and 443 for http and https traffic, I have set up an HAProxy instance. This allows me to do some interesting magic with incoming traffic.

In addition to the traffic manipulation, I also use the HAProxy server for contacting Let’s Encrypt to renew my TLS certificates, and for terminating TLS traffic. The latter has two reasons: a) I’m frankly too lazy to automate installing updated certificates on the web server, and b) I’m running the entire solution on such limited hardware that I’m a little worried about putting too much of a strain on it should there ever be a bit more traffic on the machine.

The web server is an Nginx running this very WordPress instance.

Let’s Encrypt configuration

I took the best parts from two different solutions to automate the relatively frequent certificate renewals that Let’s Encrypt enforces. I began by installing the HAProxy ACME Domain Validation Lua Plugin into HAProxy, which ensures that there’s a valid listener to show that I own my domain when I trigger the letsencrypt client program. The beauty of running this through HAProxy is that the process requires no downtime.

For the configuration of the letsencrypt client, I basically stole the scripts from Martijn Braam’s blog BrixIT, adapting them to the fact that a listener was already provided through the Lua script. The benefit of doing it this way is that the BrixIT method is considerably more flexible than the Lua script alone when one expects HAProxy to use more than one certificate.

Example config:

/etc/haproxy/haproxy.cfg
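Something like the following sketch: the certificate path, listener names, and plugin path are placeholders rather than my real config, and the backend definition is omitted for brevity.

```
global
    # Load the ACME validation plugin (requires HAProxy with Lua support).
    lua-load /etc/haproxy/acme-http01-webroot.lua

frontend https-in
    bind *:443 ssl crt /etc/haproxy/certs/example.com.pem
    default_backend webserver

frontend http-in
    bind *:80
    # Let the Lua plugin answer ACME challenges, with zero downtime.
    acl url_acme_http01 path_beg /.well-known/acme-challenge/
    http-request use-service lua.acme-http01 if METH_GET url_acme_http01
    redirect scheme https code 301 if !{ ssl_fc }
```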

The last line also shows how to redirect regular http traffic to a https listener.

/opt/letsencrypt-haproxy:
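In outline, the renewal script does three things: ask the letsencrypt client for a fresh certificate (validated through the Lua listener on port 80), bundle chain and key into the single .pem file HAProxy expects, and reload HAProxy. A sketch, with a placeholder domain and flags recalled from the BrixIT write-up; check them against your client version:

```shell
#!/bin/sh
set -eu

DOMAIN=example.com                       # placeholder domain
LE_DIR=/etc/letsencrypt/live/$DOMAIN
CERT_DIR=/etc/haproxy/certs

# The challenge response is served by the Lua plugin in HAProxy,
# so no service needs to be stopped for validation.
letsencrypt certonly --text --agree-tos --renew-by-default \
    --webroot --webroot-path /var/lib/haproxy -d "$DOMAIN"

# HAProxy wants the certificate chain and private key in one file.
cat "$LE_DIR/fullchain.pem" "$LE_DIR/privkey.pem" > "$CERT_DIR/$DOMAIN.pem"

systemctl reload haproxy
```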

TLS termination configuration

The problem with terminating TLS traffic before the web server is that the web server only ever sees plain http traffic from the proxy, and any good web application will treat that as an insecure client connection. Luckily, we can use HAProxy to tell WordPress that the connection was secure up until the load balancer and to trust it the rest of the way. Be aware that this is an extremely bad idea if there is any way to reach the web server other than via your HAProxy:

/usr/share/wordpress/wp-config.php:
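The relevant addition to wp-config.php is the well-known X-Forwarded-Proto check:

```php
/* HAProxy terminates TLS for us; trust its X-Forwarded-Proto header
 * so WordPress knows the client connection really was https.
 * Only safe when the web server is unreachable except via the proxy. */
if ( isset( $_SERVER['HTTP_X_FORWARDED_PROTO'] ) &&
     $_SERVER['HTTP_X_FORWARDED_PROTO'] === 'https' ) {
    $_SERVER['HTTPS'] = 'on';
}
```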

/etc/haproxy/haproxy.cfg:
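On the HAProxy side, the header is set at the TLS frontend; listener names, certificate path, and backend address below are placeholders:

```
frontend https-in
    bind *:443 ssl crt /etc/haproxy/certs/example.com.pem
    # Tell the backend whether the client leg of the connection
    # was encrypted.
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
    default_backend wordpress

backend wordpress
    # Placeholder address for the Nginx/WordPress host.
    server web1 192.0.2.10:80 check
```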

As a final touch, I copied the brute force sandboxing scheme straight from this blog post by Baptiste Assmann over at haproxy.com.


Securing an Internet accessible server – Part 2

In part 1 we made it significantly harder to gain access to our server once it is opened up to the Internet – but we’re not quite ready for that yet. In this post we’re exploring a firewall in Ubuntu, ufw, which stands for “uncomplicated firewall”, and we’ll set up some additional hardening using Fail2Ban to protect ourselves from some common repeated attacks.

Continue reading “Securing an Internet accessible server – Part 2”

Securing an Internet accessible server – Part 1

This article is part of a series. Part 2.

Let’s look at a simple scenario, and see how common tools in the Linux and BSD world can help us:

We want to be able to remote control a server from wherever in the world, but we really don’t want others to be able to log on to it.

In the real world, this is common enough. Understandably, though, anyone who even has a slight understanding of the risks involved will be somewhat nervous about creating a potential hole in the barricades protecting their network. With a little knowledge, we can achieve the relevant access while minimizing the risks.

In this first part, we’re configuring the Secure Shell for asymmetric key logon rather than the generally less secure username/password combination we’re used to.

Continue reading “Securing an Internet accessible server – Part 1”