Recovering from Catastrophic Disk System Failure

Sirius helps a client recover from a simultaneous double-disk failure.

The Client

A college with 240 academic programs serving about 6,000 students.

The Challenge

The client experienced a simultaneous double-disk failure on a disk-storage appliance that hosted the database and primary storage pools of their IBM Spectrum® Protect backup server. The drive failures crashed the client’s server application, corrupted its database, and caused complete data loss in the storage pools. Coincidentally, the client was in the process of procuring a new tapeless data-protection solution at the time of the failure. They were concerned about whether their aging backup environment could be recovered from off-site tape copies and about how long the recovery would take. What’s more, the affected Spectrum Protect storage pools contained data with mandatory retention requirements.

The Solution

After restoring the server database from the most recent full database backup on LTO-4 tape media, Sirius brought the client’s server application back online. The next challenge was recovering the primary storage pools. It became clear that recovering the data would require recalling tapes from the client’s off-site tape vault. At a minimum, the client would also need enough disk storage to restore the data from tape back into the respective Spectrum Protect disk pools. In total, 50 LTO-4 tapes were recalled and required for recovery. Once another disk storage appliance was identified as a replacement for the failed unit, its storage was allocated to the Spectrum Protect server and formatted as file systems on the operating system. The recovery took only one week, despite tape media corruption and tape library gripper calibration issues caused by earthquakes.

The Results

  • 50 LTO-4 tape volumes successfully recalled
  • 18 TB of data restored (99.99%)
  • Recovery in one week

The Benefits

Recovery from tape storage can take weeks, depending on the amount of data, the number of volumes, the number of tape drives, the number of tape slots in the tape library, and the time it takes to retrieve the volumes from off-site vaulting (if applicable). Sirius storage experts recovered the entire environment in one week, recalling 50 volumes from an off-site tape vault. Once the server database restore was complete, Sirius initialized the application and brought it back online. The recovery was successful, with 99.99% of the data restored to the storage pools. Some corruption is not uncommon with tape media, and it is an acceptable risk compared to the extraordinary efficiency and low cost of tape storage solutions.
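
To make the factors above concrete, the following is a minimal back-of-the-envelope sketch, not the method Sirius used. It estimates the pure tape streaming time for a restore and the amount of data implied by a 99.99% restore rate. The 120 MB/s figure is the LTO-4 native (uncompressed) transfer rate; the number of tape drives and the five-minute per-volume mount overhead are assumptions for illustration, since the case study does not state them. The function names are hypothetical.

```python
import math

# LTO-4 native (uncompressed) transfer rate in MB/s.
LTO4_NATIVE_MBPS = 120


def unrestored_gb(restored_tb, restored_fraction):
    """GB of data left unrecovered, given the amount restored and the
    fraction of the total it represents (decimal units: 1 TB = 1000 GB)."""
    total_gb = restored_tb * 1000 / restored_fraction
    return total_gb - restored_tb * 1000


def restore_hours(data_tb, volumes, drives,
                  mbps=LTO4_NATIVE_MBPS, mount_minutes=5.0):
    """Idealized wall-clock hours to stream data_tb from tape.

    Assumes every drive streams at full native speed in parallel and
    each volume costs a fixed mount/position/dismount overhead
    (mount_minutes is an assumed value). Ignores vault retrieval,
    library slot limits, media errors, and hardware faults.
    """
    transfer_seconds = data_tb * 1_000_000 / (mbps * drives)
    mount_seconds = math.ceil(volumes / drives) * mount_minutes * 60
    return (transfer_seconds + mount_seconds) / 3600


if __name__ == "__main__":
    print(f"Unrestored data: {unrestored_gb(18, 0.9999):.2f} GB")
    for drives in (1, 2, 4):  # hypothetical drive counts
        hours = restore_hours(18, 50, drives)
        print(f"{drives} drive(s): {hours:.1f} hours")
```

Under these assumptions, 18 TB across 50 volumes streams in roughly 46 hours on one drive or 23 hours on two, and the unrestored 0.01% amounts to roughly 1.8 GB. The gap between that idealized figure and a real-world recovery reflects everything the sketch leaves out: off-site retrieval, library handling, and media problems.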

The client’s simultaneous double-disk failure underscored the importance of a modern disaster recovery solution and gave the client the justification it needed to invest in a disk-based DR solution. The client engaged Sirius to design and implement a new tapeless data protection solution based on an IBM POWER9 processor-based Power Systems server to replace an aging Windows blade server, as well as a Dell EMC Data Domain system to replace their legacy tape library. With a disk-only solution leveraging asynchronous replication, recovery occurs in minutes to hours, with no need for data handling or retrieval. The data is online and available for access at the DR site as soon as the replication context is synchronized.

The solution leverages container pools, cloud container pools and file-device pools where appropriate. It is designed to be more efficient and higher-performing than the legacy environment, maximizing disk storage with inline data deduplication and LZ4 compression. The management of tape media (check-ins and check-outs) and tape media processes (mounts, dismounts, migrations, backup of storage pools, etc.) will be eliminated. The client will also integrate replication with Data Domain or Spectrum Protect, depending on whether the storage pool is a container pool or file-device pool.

ABOUT SIRIUS

Sirius’ data protection solutions are designed to protect you from hardware or software failure, corruption, malware and disasters. Sirius takes a vendor-agnostic approach, with expertise spanning all leading and emerging data protection technologies. This autonomy and expertise allow Sirius to design and implement new solutions while also supporting existing ones. Call 800-460-1237 today to schedule a conversation about your needs.