Fix Time Machine Sparsebundle Errors

[This is an extract from Fix Time Machine Sparsebundle NAS Based Backup Errors by Garth Gillespie]

This is work in progress! Not all steps are verified yet!

One reference in the comments on the article cited above may be worth reading: http://sansumbrella.com/writing/2012/the-reluctant-sysadmin-nas-time-machine/¹⁾
Inspired by the author's basic idea that the problem might be related to limitations of the ext2/3/4 file systems, I switched the underlying file system to XFS and got a stable backup.

I just realized that the source recommends using raw disk devices, i.e. /dev/rdisk… instead of /dev/disk…

Connect your Backup Volume²⁾
I recommend disabling your backups³⁾, as successive attempts seem to interfere with the recovery process. It may also help to keep the backup volume open by displaying it in the Finder, and to make sure that your network connection is stable if the backup volume is reached over the network.
Start a shell (Terminal Window) with root permissions and recursively clear the user immutable flag:
```
~ # chflags -v -R nouchg /Volumes/Backup_Volume/MyHostname_YYYY-MM-DD-HHMMSS.sparsebundle
```
Attach the sparsebundle as a volume without mounting it⁴⁾:
```
~ # hdiutil attach -nomount /Volumes/Backup_Volume/MyHostname_YYYY-MM-DD-HHMMSS.sparsebundle
/dev/disk1              Apple_partition_scheme
/dev/disk1s1            Apple_partition_map
/dev/disk1s2            Apple_HFSX
~ #
```
Use the partition listed as Apple_HFSX in the following steps (i.e. replace /dev/disk1s2 with your partition). Attaching the image implicitly starts a file system check. According to ps, the command actually run seems to be
/System/Library/Filesystems/hfs.fs/Contents/Resources/../../../../../../sbin/fsck_hfs -y /dev/disk1s2
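Picking the right partition and deriving its raw device name can be sketched with two small helpers. This is only an illustration: the function names `hfs_partition` and `raw_device` are made up for this example, and the sketch assumes the attach output has the two-column layout shown above.

```shell
# Print the device node of the first HFS partition found in
# `hdiutil attach -nomount` output read from stdin.
# Accepts both Apple_HFS and Apple_HFSX in the second column.
hfs_partition() {
    awk '$2 == "Apple_HFS" || $2 == "Apple_HFSX" { print $1; exit }'
}

# Convert a block device path (/dev/diskN...) to its raw counterpart
# (/dev/rdiskN...). Paths that are already raw are left unchanged.
raw_device() {
    printf '%s\n' "$1" | sed 's|^/dev/disk|/dev/rdisk|'
}
```

The attach output can be piped into `hfs_partition`, and the result passed to `raw_device` to get the name to use with fsck_hfs.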

Monitor the progress of the file system check:

~ # tail -f /var/log/fsck_hfs.log
/dev/rdisk1s2: ** /dev/rdisk1s2
/dev/rdisk1s2:    Executing fsck_hfs (version diskdev_cmds-491.6~3).
** Checking Journaled HFS Plus volume.
** Detected a case-sensitive volume.
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking multi-linked directories.
** Checking volume bitmap.
   Volume bitmap needs minor repair for orphaned blocks
** Checking volume information.
   Invalid volume file count
   (It should be 3180308 instead of 3355480)
   Invalid volume directory count
   (It should be 356386 instead of 356055)
   Invalid volume free block count
   (It should be 15844066 instead of 16637291)
   Volume header needs minor repair
(2, 0)
/dev/rdisk1s2: ** Repairing volume.
** Rechecking volume.
** Checking Journaled HFS Plus volume.
** Detected a case-sensitive volume.
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking multi-linked directories.
** Checking volume bitmap.
** Checking volume information.
** The volume Time Machine-Backups was repaired successfully.

(If something goes wrong and you have to start over, you need to clear the user immutable flag again.)
Most failures at this point were due to an accidental unmount of the underlying /Volumes/Backup_Volume. I believe the unmounts were caused by network errors while using Wi-Fi.

Therefore I recommend a wired network connection during the recovery process.
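Whether a second, forced pass is needed can be decided mechanically from the fsck_hfs output. The helper below is a rough sketch: the name `fsck_result` is hypothetical, and it only recognises the two phrases quoted in this article, so anything else is reported as "unknown".

```shell
# Classify fsck_hfs output read from stdin.
# Prints "quickcheck", "repaired", or "unknown".
# Only the two phrases quoted in this article are recognised.
fsck_result() {
    awk '
        /QUICKCHECK ONLY/           { quick = 1 }
        /was repaired successfully/ { repaired = 1 }
        END {
            if (quick)         print "quickcheck"
            else if (repaired) print "repaired"
            else               print "unknown"
        }'
}
```

Since /var/log/fsck_hfs.log may also contain output from earlier runs, feed the helper only the part that belongs to the current check.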

If the check does not finish successfully or reports “QUICKCHECK ONLY”, run the disk repair again⁵⁾:

~ # fsck_hfs -drfy /dev/rdisk1s2
** /dev/rdisk1s2
        Using cacheBlockSize=32K cacheTotalBlock=8192 cacheSize=262144K.
   Executing fsck_hfs (version diskdev_cmds-491.6~3).
        Journal replayed successfully or journal was empty
** Checking Journaled HFS Plus volume.
** Detected a case-sensitive volume.
** Checking extents overflow file.
** Checking catalog file.
** Rebuilding catalog B-tree.
hfs_UNswap_BTNode: invalid node height (1)
** Rechecking volume.
** Checking Journaled HFS Plus volume.
** Detected a case-sensitive volume.
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking multi-linked directories.
        privdir_valence=24850, calc_dirlinks=101703, calc_dirinode=24850
** Checking volume bitmap.
** Checking volume information.
** The volume Time Machine-Backups was repaired successfully.
~ #

Finally, detach the file system:
```
~ # hdiutil detach /dev/disk1s2
```
When complete, you need to edit a plist file within⁶⁾ the sparsebundle that records the state of the backup. Within the sparsebundle, open the file com.apple.TimeMachine.MachineID.plist with the Plist Editor. Remove the entry

RecoveryBackupDeclinedDate Date DD.MM.YYYY HH:MM:SS

Then change

VerificationState Number 2

to

VerificationState Number 0
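If you prefer the command line over the Plist Editor, the same two edits can probably be made with /usr/libexec/PlistBuddy. The sketch below only prints the commands so they can be reviewed first. The function name `plist_reset_commands` is invented for this example, and it assumes that the plist sits at the top level of the sparsebundle and that the path contains no single quotes.

```shell
# Print (do not run) the PlistBuddy commands for the two edits
# described above. $1 is the path to the sparsebundle.
# Assumption: the path contains no single quotes.
plist_reset_commands() {
    plist="$1/com.apple.TimeMachine.MachineID.plist"
    printf '%s\n' \
        "/usr/libexec/PlistBuddy -c 'Delete :RecoveryBackupDeclinedDate' '$plist'" \
        "/usr/libexec/PlistBuddy -c 'Set :VerificationState 0' '$plist'"
}
```

After checking the printed commands, they can be run by piping the output to `sh` in the root shell. If the RecoveryBackupDeclinedDate entry is not present, expect the Delete command to report an error, which is harmless here.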

Now you can eject the network share⁷⁾ and have Time Machine give it another go.

Don't forget to enable backups again⁸⁾.

After the (long) verification step, backups should proceed once again.


¹⁾

Basically, he suggests creating the sparsebundle yourself first and giving it a custom band size:

# creates a sparsebundle disk image with a 128MB band size
MACHINE_NAME=your-machine-name
echo "$MACHINE_NAME"
hdiutil create -size 900g -type SPARSEBUNDLE -nospotlight -volname "Backup of $MACHINE_NAME" -fs "Case-sensitive Journaled HFS+" -imagekey sparse-band-size=262144 -verbose "./$MACHINE_NAME.sparsebundle"
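The value 262144 corresponds to 128 MB because sparse-band-size is given in 512-byte sectors. A small helper (the name `band_sectors` is made up for this sketch) shows the arithmetic for other band sizes:

```shell
# Convert a band size in MiB to the number of 512-byte sectors
# expected by -imagekey sparse-band-size.
band_sectors() {
    echo $(( $1 * 1024 * 1024 / 512 ))
}
```

For example, `band_sectors 128` gives the value used in the command above.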

²⁾

This can be done via the Finder or with
mkdir /Volumes/Backup_Volume
mount_afp -i afp://<user>@<host>/Backup_Volume /Volumes/Backup_Volume

³⁾

Either in the Time Machine system preferences dialog or with tmutil disable

⁴⁾

In my case it was Apple_HFS instead of Apple_HFSX.

⁵⁾

Disk Utility does not use -r, which rebuilds the catalog B-tree. Try again without -r if you run into a “Disk Full” error while rebuilding the catalog B-tree.

⁶⁾

Use right-click → Show Package Contents, not double-click.

⁷⁾

Use hdiutil eject /Volumes/Backup_Volume if you prefer the command line

⁸⁾

This can be done with tmutil enable on the command line.