Project Title: Software to Extract Deleted Virtual Machine Files from a VMFS Volume
Project Start Date: 2016/06
Project Completion Date: 2016/08
Field of Science or Technology: Software Engineering and Technology
Purpose of work
To achieve technological advancement for the purpose of creating new or improving existing materials, devices, products or processes.
Experimental Development:
Overcoming Uncertainties, Work Performed, Achievements, and Outcome
Background:
We received a large multi-volume RAID storage array that used the Virtual Machine File System (VMFS) as the host file system. The file system contained files for multiple virtual machines (VMs) in the client’s environment. The array suffered a multiple-drive failure, and the client and their IT consultants attempted to replace and rebuild the failed drives. The overall storage system contained multiple independent RAID arrays and volumes, and during the rebuild, drives from different arrays were placed in incorrect positions. Because of the extensive damage caused by the incorrect rebuild, the VMFS volumes would no longer mount.
Goal:
We attempted to recreate the original configuration in our labs by removing the drives that were known to be invalid and rebuilding them from parity. The result was a heavily damaged file system, but a large amount of file system metadata and user data remained intact. Very few companies deal with highly damaged VMFS file systems, and the tools we found on the internet were basic and only useful with undamaged or lightly damaged structures.
Our aim was to reverse engineer the VMFS metadata structures to determine whether deleted or missing virtual machine (VM) files could be extracted manually. The publicly available tools claimed to recover deleted files, but did not address file systems that had been reformatted or were missing some of the original file metadata. If we succeeded in extracting the deleted VM, we then needed to determine whether missing or corrupted structures could be repaired, since we strongly suspected damage from the incorrect RAID rebuild. If repair was not possible, we wanted to investigate whether we could create a utility to extract the virtual machine files and patch areas where the metadata traversed corrupted regions. Given the lack of accurate technical documentation and the absence of publicly available utilities for recovering highly damaged VMFS hosts, we were uncertain whether our goal was reachable at all.
The damaged storage array submitted for recovery was well suited to our research because it contained three different VMFS volumes, giving us a cross section of VMFS volumes of various sizes. This helped in the initial stages of reverse engineering the metadata and inode/pointer structures, since the differing structure sizes provided additional valuable information.
Through detailed analysis of directory structures, inodes, and secondary and tertiary indirect table files, we eventually gathered enough detail about the file system structure to navigate it and extract uncorrupted virtual machine (VMDK) files from the VMFS volume.
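The navigation described above can be illustrated with a minimal sketch. It does not reproduce the actual VMFS on-disk layout: the block size, the 32-bit little-endian pointer format, and the names `read_pointers` and `resolve` are all assumptions chosen for illustration. It only shows the general idea of following multiple levels of indirect pointer tables down to the data blocks.

```python
import struct

# Hypothetical, deliberately tiny layout for illustration only.
# These values are NOT taken from the real VMFS format.
BLOCK_SIZE = 8                      # bytes per block (assumed)
PTRS_PER_BLOCK = BLOCK_SIZE // 4    # 32-bit little-endian pointers (assumed)


def read_pointers(volume: bytes, block_no: int) -> list:
    """Decode one pointer block into a list of block numbers."""
    start = block_no * BLOCK_SIZE
    raw = volume[start:start + BLOCK_SIZE]
    return list(struct.unpack(f"<{PTRS_PER_BLOCK}I", raw))


def resolve(volume: bytes, block_no: int, depth: int) -> list:
    """Follow `depth` levels of pointer blocks starting at `block_no`.

    depth == 0 means `block_no` is already a data block.
    Returns the data block numbers in file order.
    """
    if depth == 0:
        return [block_no]
    data_blocks = []
    for ptr in read_pointers(volume, block_no):
        data_blocks.extend(resolve(volume, ptr, depth - 1))
    return data_blocks
```

In this sketch a file whose root pointer block sits two levels above the data would be listed with `resolve(volume, root_block, 2)`. No validation is performed here, so it only applies to undamaged pointer tables.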
While examining the corrupted file structures, we regularly found invalid pointers within the indirect pointer files, which we attributed to the client’s incorrect RAID rebuild. Some indirect pointers that should have pointed to other pointer tables were either zeroed or contained illegal values. We devised strategies in our experimental utilities that allowed us to extract all valid data while segregating the corrupt pointers and voiding the corresponding areas in the extracted virtual machine file.
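As a rough illustration of this strategy, the sketch below copies every block whose pointer looks valid and writes a zero-filled block wherever a pointer is zeroed or out of range, while recording which pointer entries were rejected. The block size, the rule that a zero pointer is invalid, and the function name `extract_with_voids` are assumptions made for this example, not details of our utilities or of the VMFS format.

```python
from typing import List, Tuple

BLOCK_SIZE = 4  # hypothetical block size, kept tiny for illustration


def extract_with_voids(volume: bytes, pointers: List[int]) -> Tuple[bytes, List[int]]:
    """Copy the blocks named by `pointers`, voiding the invalid ones.

    A pointer is treated as invalid when it is zero or points past the
    end of the volume (both rules are assumptions of this sketch).
    Returns the extracted bytes and the indexes of the rejected pointers.
    """
    total_blocks = len(volume) // BLOCK_SIZE
    extracted = bytearray()
    rejected = []
    for index, ptr in enumerate(pointers):
        if ptr == 0 or ptr >= total_blocks:
            rejected.append(index)
            extracted += bytes(BLOCK_SIZE)  # void area: zero-filled block
        else:
            start = ptr * BLOCK_SIZE
            extracted += volume[start:start + BLOCK_SIZE]
    return bytes(extracted), rejected
```

Keeping the voided blocks at their original offsets preserves the layout of the extracted file, so the valid data that follows a bad pointer still lands where a consumer of the file would expect it.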
In our recovery attempt, we repaired the damaged areas by filling them with legal values that pointed to a signature block that we created. Our hope was that this would mark all invalid areas in the extracted VM files with an identifiable signature, so that we would know which areas corresponded to invalid data pointers. We attempted to mount the volume under Linux with vmfs-fuse, but the modifications did not appear to be accepted and the volume would not mount. This was likely due to severe damage in other areas of the data and metadata, and possibly to the fact that many rebuilt pointers pointed to the same location, which would not normally occur in an operational VMFS volume. Following this failure, we modified our extraction utility and used it to extract the virtual machine. We had the utility disregard duplicate pointers as well as some other issues in the volume structures.
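The signature-block idea can be sketched as follows. This is not the repair we performed on the volume metadata; it is a simplified, file-level illustration in which unresolvable blocks are replaced by a marker and the extracted image is later scanned to report the marked byte ranges. The marker value, the block size, and the function names are hypothetical.

```python
BLOCK_SIZE = 8
SIGNATURE = b"BADPTR!!"  # hypothetical marker, exactly one block long


def mark_invalid(blocks: list) -> bytes:
    """Join blocks into one image; `None` entries become the signature block."""
    return b"".join(SIGNATURE if block is None else block for block in blocks)


def find_marked_ranges(image: bytes) -> list:
    """Return (start, end) byte ranges of consecutive signature blocks."""
    ranges = []
    run_start = None
    for offset in range(0, len(image), BLOCK_SIZE):
        marked = image[offset:offset + BLOCK_SIZE] == SIGNATURE
        if marked and run_start is None:
            run_start = offset            # a marked run begins here
        elif not marked and run_start is not None:
            ranges.append((run_start, offset))
            run_start = None              # the run ended at this block
    if run_start is not None:
        ranges.append((run_start, len(image)))
    return ranges
```

A marker of this kind is only useful if it is unlikely to occur in genuine data, so a real implementation would presumably use a longer and more distinctive pattern than the eight bytes assumed here.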
The resulting extracted file contained every VM data block that could be retrieved with the available metadata. However, because some data blocks had themselves been corrupted during the rebuild, corruption that could not be detected earlier became apparent only after the VM file was extracted and examined.
We were unable to extract and rebuild a usable virtual machine file for our client, because the combination of lost or bad data from pointer corruption and data area corruption produced more invalid areas than initially expected. However, we are now able to extract deleted virtual machine files, which was our main goal. We can reliably recover deleted files from a VMFS file system if the client submits their case without any further writing to the volume. With the knowledge gained, we are also able to recover files from VMFS file systems with moderately damaged internal structures.