F5 HTTP Response-Code Alerts

503

UPDATE: Added more logic to remove reporting anomalies from the F5 device (inaccurate response code values). The IF statement before the email pipe will cause the report to not send if the value is empty or the top server responded with less than 15% of the total response codes of that type.

I love the F5 analytics feature, specifically for HTTP. The cool thing about the feature, is that you can get a very good idea of an issue based on response code behavior.

Let’s say you have a URL that is monitored by an external HTTP tool. This tool checks the URL to make sure it gets a 200 HTTP response-code back every 60 seconds. I just got an email that the check failed, but this URL has 10 nodes behind it. How do I know which node is failing? 

Well, you can manually log into the F5 and check through the UI, use the Rest API or some other utility that ties into the F5 API.

  • Manually checking is not ideal, we want automation.
  • The Rest API is great, but why invest a ton of time getting that working, if this is our only goal?

My preference would be to add this to the Analytics Profile on the F5, so any administrator could see it and know what’s happening. Unfortunately, F5 does not alert on HTTP response-code in the Analytics Profile (as of 12.1).

Hopefully F5 does this at some point, but until then, I’m going to show you how to do this with a bash script on the F5 device itself.

Prerequisites

  1. Determine which response code do you want to check for
  2. Determine how often you want to check for potential issues
  3. Make sure you have an email server that the F5 can forward the message through.
  4. Make sure you have access to the F5 advanced shell

The Script

I’ve done most of the heavy lifting for you. I intentionally defined any part that I thought may need to be customized based on usage as variables, change them to suit your needs.

Place it wherever you want, probably somewhere under /root is best so someone could find it easily. Before you schedule the script, make sure you chmod 755 it so it can be executed (chmod 755 /root/script.sh).

#!/bin/bash

#=============================
#** Define your environment **
#** specific variables here **
#=============================

#Which response code do you want to look for?
#Add a trailing space to prevent any possible time matches 
#(the year is followed by : in the output)
code="302 "

#How many minutes in the past do you want your script to look?
#Match this up with cron
howoften="30"

#Topx determines when an email is triggered. If you are implementing this 
#in a dev environment with not much going on, you may want to limit your 
#response code trigger to when it is in the top 4 of all response codes. 
#If you are running in a prod environment, maybe top 7 is fine. 
#If you are getting too many emails, make the number lower. If you want to 
#make sure things are working, make the number higher.
topx="8"

#Email configuration
smtpserver="192.168.10.1"
sender="f5script@email.com"
receiver="me@email.com"
subject="$code Error Notification"


#-----------------------------

# Determine number of response codes in the past x minutes. 
# Typically if response codes are in the topx response-code 
# types, something we care about is happening. 

numcode=(`tmsh show analytics report view-by response-code limit ${topx} range now-${howoften}m | grep $code | awk '{print $3}'`)

# Start our logic to send alerts
# Change the number to insert more granularity than your alerting
if [[ "$numcode" -gt 11000 ]]; then
 # What is the hostname of this F5 for the email
 hostname=`uname -n`
 
 # This will break down the top culprit of the
 # queried response code into three values in an array
 # IP|PORT|CODE
 badguy=(`tmsh show analytics http report view-by pool-member drilldown { { entity response-code values { ${code} } } } limit 1 range now-${howoften}m| tail -n+8 | sed 's/:/ /;s/ | / /'`)
 
 badguysname=`nslookup ${badguy[0]} | awk -F "name = " '{print $2}' | sed '/^\s*$/d'`
 if [[ -n "$badguy" && "$((${badguy[2]}*100/$numcode))" -gt 15 ]]; then
 ( echo "${code}Response Code Report";
 echo "------------------------------";
 echo "";
 echo "Overall, there have been *$numcode*, $code messages in the past $howoften minutes for all VIPs running on $hostname.";
 echo "";
 echo "The top offender of these messages is ${badguy[0]} on port ${badguy[1]}.";
 echo "";
 echo "This server has generated $((${badguy[2]}*100/$numcode))% of the total ${code}messages in this time window.";
 echo "";
 echo "nslookup tells us the bad server is: $badguysname ";
 echo "";
 echo "Total $code : $numcode"
 echo "Node $code : ${badguy[2]}"
 echo "Bad Node : $badguysname"; ) | mailx -S smtp="$smtpserver" -r "$sender" -s "$subject" -v "$receiver" 
 fi
else
 echo "${code}response codes are not in the top $topx overall for this device. Script ended, no email sent."
fi

The email portion is a bit clunky, but with the version of software I was running, I wasn’t able to send an HTML email. If you figure that out, or want to edit the email content, go for it.

Scheduling the Script

Now we need to schedule the script. Use crontab -e to schedule your script. Read this article from F5 for some tips if you need more info. Make sure you match up the timing of your cron task with the script defined time window.

Make sure you document that you have done this!! If you leave your job and a new administrator comes in, they would have to do some digging to figure out what you have done.

I would recommend testing this by setting your response code to 200, if you get emails based on the schedule you set, change to what you want to look for.

F5 HTTP Response-Code Alerts