Test whether a git pull is needed from within a batch script

Just a quick hack I did to avoid syncing a couple of scripts unnecessarily when deploying my load balancers. The underlying idea is stolen from a post by Neil Mayhew on Stack Overflow.

Shell session script:

#!/bin/bash
# Compare the local branch with its upstream to decide whether a pull is needed.
UPSTREAM=${1:-'@{u}'}
LOCAL=$(git rev-parse @)
REMOTE=$(git rev-parse "$UPSTREAM")
BASE=$(git merge-base @ "$UPSTREAM")
if [ "$LOCAL" = "$REMOTE" ]; then
    GIT_STATUS=nochange
elif [ "$LOCAL" = "$BASE" ]; then
    GIT_STATUS=changed
    git pull
else
    # Local commits exist or the branches have diverged; don't pull automatically.
    GIT_STATUS=diverged
fi
# Export the result so the env lookup in the Ansible playbook below can read it.
export GIT_STATUS

Ansible playbook:

---
    vars:
        version_status: "{{ lookup('env', 'GIT_STATUS') }}"

    tasks:
    -   name: Update HAProxy scripts
        copy:
            src: "{{ config_root }}/etc/haproxy/scripts"
            dest: "/etc/haproxy"
        when: version_status == "changed"
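
In my setup, ansible-playbook then runs from the same shell session, so the exported GIT_STATUS variable is what the env lookup picks up. Roughly like this, where both file names are made up for the example (alternatively, the ansible-playbook call can simply live at the end of the script itself):

$ source ./git-pull-check.sh
$ ansible-playbook update-haproxy-scripts.yml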

Environment variables for services in a systemd-based system

My current config deployment automation project has required me to set up a dev/staging environment for my load balancers, since I don’t want to break stuff by deploying untested configurations.
This environment is functionally identical to a single load balancer and can be used along with a hosts file on a client to not only develop configurations and make sure they are syntactically correct, but also to fully test the functionality of the load balancer rules.
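
The hosts file on the test client simply points the production hostnames at the staging listener addresses; something like this, with made-up names and an example address:

192.0.2.50    www.example.com
192.0.2.50    api.example.com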

As part of this, I naturally need to change the listener addresses in the dev/staging HAProxy environment compared to the production environment that I keep version controlled in my git repository.
My first instinct was to use a script to pull the latest versions, modify the necessary lines using sed, and copy the config to the correct location. I didn’t really like this approach, since it would by definition mean that the configs weren’t fully identical between the production environment and git repository on the one hand, and the dev/staging environment on the other.
If the configuration instead referenced environment variables, the version controlled files could be kept fully identical across all instances.
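
HAProxy expands environment variables written inside double quotes in its configuration, which is what makes this possible. A minimal sketch of the idea; the variable, frontend and certificate names are invented for the example:

frontend fe_web
    # LISTEN_ADDR is supplied via systemd (see further down). If it expands to an
    # empty string, HAProxy binds to port 443 on all interfaces.
    bind "${LISTEN_ADDR}:443" ssl crt /etc/haproxy/certs/site.pem
    default_backend bk_web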

The first mistake I made took me a while to grasp. HAProxy parsed what was obviously a fully valid configuration, but intermittently presented a certificate I didn’t expect, and web services intermittently failed to reply to requests.
It turns out Linux daemons don’t inherit even system-wide environment variables.
So how do we check what environment variables a service does see?
First get the PID(s) of the service:

$ pgrep haproxy
1517
1521
1523
$ 

In the case of my HAProxy installation, I got a list of three processes, so I chose the last one and checked out its environment:

# cat /proc/1523/environ

This results in a list of its current environment variables, and naturally the ones I thought I’d added were nowhere to be seen.
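
The entries in environ are separated by NUL characters rather than newlines, so it is easier to read after a quick translation:

# tr '\0' '\n' < /proc/1523/environ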

So why did HAProxy start without complaining? Since the environment variables weren’t defined in this context, they simply expanded to empty strings, and so HAProxy figured I wanted to listen on the assigned ports on all interfaces.

How do we assign environment variables to a service in a modern, systemd-based Linux, then?
At a command prompt, run systemctl edit servicename. This opens your default editor with an override file for the service. A valid config looks like this:

[Service]
Environment=ENVVAR1=value
Environment=ENVVAR2=value

On Ubuntu, this file is stored as /etc/systemd/system/servicename.service.d/override.conf, but it can naturally be renamed to something more descriptive; the systemctl edit command doesn’t do anything magical, it’s just a shortcut for creating that drop-in file.
After the file is in place, run systemctl daemon-reload to make the configuration active, and then the HAProxy service needs to be restarted, not only reloaded, for the changes to apply.
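
For HAProxy, the whole sequence becomes something like this; the last command is a quick way to verify which Environment= values systemd will pass to the service:

# systemctl edit haproxy
# systemctl daemon-reload
# systemctl restart haproxy
# systemctl show haproxy --property=Environment
Environment=ENVVAR1=value ENVVAR2=value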

Of course, I want this config too to be deployable through Ansible.
The relevant lines from my playbook:

---
    -   name: Update environment variables for HAProxy service
        copy:
            src: "{{ config_root }}/etc/systemd/system/haproxy.service.d/10-listeners.conf"
            dest: "/etc/systemd/system/haproxy.service.d/"
        register: ha_envvar_status
    
    -   name: Reload systemd service configuration
        systemd:
            daemon_reload: yes
        when: ha_envvar_status|changed
...
    -   name: Reload HAProxy configuration
        service:
            name: haproxy
            state: reloaded
        when: ha_envvar_status|skipped and haproxy_cfg_status|changed
        register: reloaded_haproxy

    -   name: Restart HAProxy daemon
        service:
            name: haproxy
            state: restarted
        when: ha_envvar_status|changed or (haproxy_cfg_status|changed and not reloaded_haproxy|skipped)

Key lines:
The register keyword stores the result of a task in a variable. Thanks to that, I can use the when keyword to only reload or restart daemons when something has actually changed.

Summary

Linux daemons don’t automatically inherit environment variables, not even system-wide ones.
In systemd-based distros (which today means pretty much any distribution with corporate backing), environment variables can be added using the Environment keyword in the [Service] section of a file in /etc/systemd/system/servicename.service.d/.

Continuous Deployment of Load Balancer Configurations

I thought I’d describe some optimizations I’ve made to my load balancers at work, both for the benefit of my future self and in case someone else can use some of the ideas.

Background

The load balancers are based on four software packages that integrate to create a powerful whole:
Keepalived provides a common set of virtual IP addresses and ensures that failover happens to the Backup server if the Master stops responding.
HAProxy does most of the actual load balancing and mangles network traffic when required.
SNMPD relays SNMP traps from Keepalived whenever a failover occurs.
The Zabbix Agent enumerates current configuration and system state for detailed system monitoring.

Now, all of these components get the occasional configuration change, except for HAProxy, which pretty much sees changes on at least a weekly basis.
The procedure for updating the configuration must cover the following steps:

  1. Run a pre-check to confirm that both load balancers in the pair work; we don’t want to initiate an automated update that could kill off service availability completely.
    On the Backup load balancer node:
  2. Backup the current configuration.
  3. Deploy the new configuration.
  4. Reload services.
  5. Run a post-op check on the secondary node to confirm that the new config hasn’t broken anything important.
  6. Fail over operations from the Master load balancer node to the Backup node and repeat steps 2-5 on the Master node.
  7. Perform a final check on the load balanced services to confirm functionality hasn’t been lost.

From experience, this procedure is tedious to say the least. In addition, there’s always the risk of introducing a change to an active load balancer and forgetting to deploy the same change to the backup one; something that may not become obvious until the next major configuration update, when the earlier change disappears and functionality breaks.

These are just the most obvious arguments for an automated and version controlled deployment procedure. So how do we go about that?

Version control

In my case, I use Git connected to a GitLab server for version control, and Ansible for automation.

Configuration changes are prepared in a development environment, from which the relevant files are committed to a git repository.

Other components in the load balancer config, such as Lua scripts and tools made by our developers, are stored in other repositories and can be pulled by git before a new deployment.

Ansible structure

For each load balancer pair, I’ve built a directory structure containing a playbook directory for the Ansible YAML scripts, and a filesystem directory that mirrors the movable parts of the load balancer, with the relevant files living under an etc directory tree.
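
In outline it looks something like this, with illustrative names; the config_root variable used in the copy tasks points at the filesystem directory:

lb-pair/
    playbooks/
        update-prod-loadbalancers.yml
    filesystem/
        etc/
            haproxy/
            keepalived/
            systemd/system/haproxy.service.d/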

Automation

Deployment is initialized by a shell script that git-pulls the latest versions of dependencies we have and then ensures that the Ansible playbooks can work on remote computers by wrapping them in an ssh-agent environment.
The execution of Ansible playbooks happens from within a session script called by the ssh-agent.
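
Stripped down to its essentials, it looks something like this; the file names are invented and the real scripts do more sanity checking:

#!/bin/bash
# deploy.sh: pull the latest versions of our dependencies, then hand over to
# an ssh-agent that runs the playbook session.
git pull
ssh-agent ./session.sh

#!/bin/bash
# session.sh: runs inside the ssh-agent environment set up above.
ssh-add
ansible-playbook update-prod-loadbalancers.yml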

Ansible-specific tips

The key to ensuring that the production environment doesn’t break lies in the header of the playbook:

---

-   name: Update PRODUCTION load balancer configuration
    hosts: lb_hadmzprod
    serial: 1
    any_errors_fatal: true

Setting serial to 1 makes the play work on one server at a time rather than executing on both in parallel.
The any_errors_fatal parameter is combined with relevant service checks interspersed among the deployment tasks, to ensure the script fails fast and loudly if a backend web service stops responding while deployment is underway, so that we don’t break both servers in a pair. Note that this requires some thought from whoever runs the script: fix the problem before re-running it, or fecal matter will hit the fan quickly enough.

The most basic of tests just ensures I can reach the statistics page of my load balancer:

    -   name: Fail task if lb1 is unavailable
        uri: 
            url: https://lb1.domain.com:1936

A typical file copying task:

    -   name: Update Keepalived configuration
        copy:
            src: "{{ config_root }}/etc/keepalived/{{ item }}"
            dest: "/etc/keepalived/"
            mode: 0600
        with_items:
        -   keepalived-master.conf
        -   keepalived-slave.conf

As a side note: Since I don’t want the script to have to care about which server is which, I’ve created one config file for the keepalived master instance and one for the slave. On the actual servers, a symlink points to the correct configuration for the instance.
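
On the master node, for instance, the link would be created something like this (keepalived reads /etc/keepalived/keepalived.conf by default):

# ln -sf keepalived-master.conf /etc/keepalived/keepalived.conf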

Reloading the HAProxy service, rather than restarting it, means existing sessions are not lost even though the configuration gets updated. As a bonus, in the Ansible service module, the reloaded state request also starts the service if it wasn’t started before.

    -   name: Reload HAProxy configuration
        service:
            name: haproxy
            state: reloaded

With well under a day’s worth of work, I’ve introduced a repeatable workflow for the deployment process that mitigates some of the risks involved in letting humans tamper with production systems.

Load balancing a Citrix StoreFront

This is well-documented by Citrix too; I just thought I’d write a few lines on a gotcha I happened upon:
Load balancing the Citrix StoreFront has two prerequisites:

  1. The StoreFront servers must see the load balancer VIP when looking up the service FQDN (see the example below).
  2. The load balancers must be configured with session affinity, since it doesn’t seem as though StoreFront keeps track of user sessions between StoreFront servers.

In addition, Citrix recommends running a load balancing scheme based on least connections.
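
For the first prerequisite, if adjusting DNS for the StoreFront servers isn’t practical, a hosts file entry on each of them pointing the service FQDN at the VIP does the job; the address and name here are examples:

10.0.0.100    storefront.example.com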

In HAProxy, a valid backend config looks something like this:


backend bk_citrixsf
    balance leastconn
    # Session affinity: remember which server each source IP was sent to
    stick-table type ip size 200k expire 30m
    stick on src
    # Pass the client address along, unless an X-Forwarded-For header already exists
    acl h_xff_exists req.hdr(X-Forwarded-For) -m found
    http-request add-header X-Forwarded-For %[src] unless h_xff_exists

    server ctxsf01 ....