Everybody wants to take a relaxing holiday from time to time, whether you are a veteran DBA standing watch over million-dollar mission-critical database systems or a jack-of-all-trades who maintains all of your small organization’s SQL Servers, Windows Servers, and other corporate IT assets.
Of course, there are those few individuals who like interruptions during a holiday because it makes them feel important. But the vast majority of us don’t want to be interrupted unless there is a serious issue that needs immediate attention. So, what are some steps we can take to help our customers make sure that their issue really is serious before they get us involved?
Sometimes It’s Technology … Except When It’s Not
There are two ways to prepare for a peaceful and uninterrupted holiday: technology and process. The former covers things we can do with our databases to make our lives easier while away from the office, while the latter covers steps we can take in advance to reduce the need for our coworkers to call us. So here we go. Be sure you’ve got these items checked off your list:
- Ensure there is plenty of room on the disks where all of your databases and transaction logs reside so that they can auto-grow as needed while you are out of the office (OOTO). This easily preventable problem crops up annoyingly often.
- Just because you have a backup of a database doesn’t mean it’s fully recoverable. The only way you can be sure a backup is recoverable is to actually recover the darned thing. So, take the time to restore your backups to a test server and confirm they are fully recoverable well in advance of your holiday. Once that is done, make sure there is ample room on the backup drive(s) for all of the backups that will occur while you are away.
- Review your automated jobs. Write a note for your coworkers explaining any special end-of-period (month, quarter, year) processes that might need special babysitting or validation and how to handle those situations. Review any SQL Agent jobs that you often have to manually deal with when they fail and further explain the likely outcomes and solutions for your coworkers.
- Make sure you’ve got Database Mail and SQL Agent alert notifications configured and working properly for high-severity error conditions. That way SQL Server can tell you when there’s a problem, rather than you relying on a phone call to find out. When you define the SQL Agent operator who gets the emails, ensure that they go to a team inbox or list of people. That way, you have coworkers acting as your backup and can rotate who is on-call during a particular holiday. Tim Radney has a great post about how to do this, including a useful, reusable T-SQL script, on SQLPerformance.com.
- Ensure that no code releases or other system updates are planned during your absence. This includes software updates, hardware upgrades, and security patches. Likewise, make sure everyone who relies on you during troubleshooting situations knows in advance that you will be OOTO. Some people think it’s okay to tell only their boss when they’re going on holiday. No. Just no.
- Set an email auto-reply indicating how long you’ll be OOTO and who to contact in the interim. This seems elementary, but not everyone does it.
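The disk-space item at the top of this checklist is easy to sanity-check with a small script. The following is only a rough sketch in Python, not a replacement for proper monitoring: the function names, the daily growth figure, and the 1.5x safety factor are all assumptions made for illustration, and you would substitute your own growth estimates for each data, log, and backup volume.

```python
import shutil


def headroom_ok(free_bytes, daily_growth_bytes, days_away, safety_factor=1.5):
    """Return True if free space covers the expected growth plus a safety margin.

    All inputs are estimates supplied by the caller; safety_factor is an
    arbitrary illustrative default, not a recommended value.
    """
    needed = daily_growth_bytes * days_away * safety_factor
    return free_bytes >= needed


def volume_has_headroom(path, daily_growth_bytes, days_away, safety_factor=1.5):
    """Check the volume that holds `path` (a data, log, or backup directory)."""
    free_bytes = shutil.disk_usage(path).free
    return headroom_ok(free_bytes, daily_growth_bytes, days_away, safety_factor)
```

For example, `volume_has_headroom("D:\\SQLData", 2 * 1024**3, 14)` would ask whether a hypothetical data drive can absorb two weeks of growth at roughly 2 GB per day. Run it against every drive that holds databases, transaction logs, or backups before you leave.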
Setting an email auto-reply is part of a good escalation process. An equally important best practice is to create a Disaster Recovery (DR) runbook, also known as a playbook. Your DR runbook is a set of step-by-step instructions for coworkers who might have to do an emergency recovery or other known critical fix, such as manually failing over a server in an AG cluster. If you don’t have a DR runbook, write a quick summary of “things to do first” while you’re away: your own personal FAQ, if you will. For example, if you have an application that has occasional debilitating blocking chains, you might write up the steps needed to kill a blocking SPID for your coworkers to use in an emergency. Make sure your escalation process spells out who to call when you’re out. If you’re in a big organization, you might even need to list multiple possibilities, using <IF…THEN> situations, such as IF hardware CALL jim_b, IF time_billing CALL evelyn, and so on.
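If your team keeps the escalation list somewhere scripts can read it, the IF…THEN rules above reduce to a simple lookup. Here is a minimal sketch in Python, assuming the two example rules from the paragraph above. The fallback contact `team_inbox` and the function name `who_to_call` are hypothetical, added only so that an unlisted problem type still reaches someone.

```python
# Escalation rules taken from the examples above: problem category -> contact.
ESCALATION = {
    "hardware": "jim_b",
    "time_billing": "evelyn",
}

# Hypothetical fallback for categories nobody listed in advance.
DEFAULT_CONTACT = "team_inbox"


def who_to_call(category):
    """Return the contact for a problem category, ignoring case and stray spaces."""
    key = category.strip().lower()
    return ESCALATION.get(key, DEFAULT_CONTACT)
```

Whether it lives in a script, a wiki page, or a printout taped to a monitor, the point is the same: every category of problem maps to a named person other than you.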
Naturally, these steps can’t prevent a problem from happening. But they can greatly reduce the number of common and easily fixed problems that might make it past your coworkers and end up interrupting your holiday. And if you haven’t already taken these steps in the past, they add value to your organization’s IT best practices all year round. It’s the gift that keeps on giving!
Finally, if you haven’t already set up a monitoring system, now would be a great time to do so. Collect data while you are gone so you can see everything that happened and how it impacted performance while you were away. You can even set up alerts that will notify you or your team of anything critical while you are out of the office.
This content originally appeared on Kevin’s blog on SentryOne.