Discussion:
[dm-crypt] LUKS header recovery attempt from apparently healthy SSD
protagonist
2017-04-21 14:26:30 UTC
Permalink
Hello all,
someone found his way into our local hackerspace looking for help and
advice with recovering his OS partition from a LUKS-encrypted INTEL SSD
(SSDSC2CT240A4), and I've decided to get onto the case. Obviously,
there is no backup, and he's aware of the consequences of this basic
mistake by now.

The disk refused to unlock on boot in the original machine from one day
to the next. Opening it from any of several other machines with
different versions of Ubuntu/Debian, including Debian Stretch with a
recent version of cryptsetup, has been completely unsuccessful,
indicating an MK digest mismatch and therefore a "wrong password". The
password is fairly simple and contains no special characters or
locale-sensitive characters and had been written down. Therefore I
assume it is known correctly and the header must be partially faulty.

After reading the header specification, the FAQs, relevant recovery
threads on here as well as going through the header with a hex editor
and deducing some of its contents by hand, it is obvious to me that
losing any significant portion (more than a few bytes) of the relevant
LUKS header sections, either the critical parts of the meta-area or the
actual key slot, would make the device contents provably irrecoverable,
as even brute forcing becomes exponentially hard with the number of
missing pseudo-randomly distributed bits.

Normally, one would move directly to grief stage number five -
"Acceptance" - if the storage device in question was known to have data
loss.

However, upon closer inspection, I can detect no obvious signs of
multiple-byte data loss. There had been no intentional changes to the
LUKS header, no Linux system upgrade, nor any other (known) relevant
event on the system between it booting one day and refusing to unlock
the day after. I realize that for *some* reason related to anti-forensics,
the LUKS header specification contains no checksum over the actual raw byte
fields at all, making it very hard to detect the presence of minor
defects in the header or to pinpoint their location.

Looking for major defects with the keyslot_checker reveals no obvious
problems:

parameters (commandline and LUKS header):
sector size: 512
threshold: 0.900000

- processing keyslot 0: keyslot not in use
- processing keyslot 1: start: 0x040000 end: 0x07e800
- processing keyslot 2: keyslot not in use
- processing keyslot 3: keyslot not in use
- processing keyslot 4: keyslot not in use
- processing keyslot 5: keyslot not in use
- processing keyslot 6: keyslot not in use
- processing keyslot 7: keyslot not in use

This is also the case if we increase the entropy threshold to -t 0.935:

parameters (commandline and LUKS header):
sector size: 512
threshold: 0.935000

- processing keyslot 0: keyslot not in use
- processing keyslot 1: start: 0x040000 end: 0x07e800
- processing keyslot 2: keyslot not in use
[...]

Going through the sectors reported with -v at a higher -t value, I'm
unable to find any suspicious groupings, for example unusual numbers of
00 00 or FF FF. Multi-byte substitution with a non-randomized pattern
seems unlikely.

------------------

The luksDump header information looks sane as well. The encryption had
been created by the Mint 17.1 installation in the second half of 2014 on
a fairly weak laptop and its password was later changed to a better one,
which accounts for the use of keyslot #1 and fairly low iteration counts.

LUKS header information for /dev/sda5

Version: 1
Cipher name: aes
Cipher mode: xts-plain64
Hash spec: sha1
Payload offset: 4096
MK bits: 512
MK digest: ff 5c 64 48 bc 1f b2 f2 66 23 d3 66 38 41 c9 60 8a 7e
de 0a
MK salt: 04 e3 04 8c 51 fd 07 ee d1 f3 4a 5e c1 8c b9 88
ab 0d cf dc 55 7c fa bc ca 1a b7 02 5a 55 ac 2c
MK iterations: 35125
UUID: 24e05704-f8ed-4391-9a3d-a59330a919d2

Key Slot 0: DISABLED
Key Slot 1: ENABLED
Iterations: 144306
Salt: b8 6f 20 a7 fe 8b 6a 9a 21 58 92 13 ce 1a 43 12 9c
4e a0 bf 7c 51 5e a1 78 47 05 ca b6 32 da a4
Key material offset: 512
AF stripes: 4000
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED

The disabled key slot #0 salt is correctly filled with nulls, making
it unusable for any recovery attempt. All magic bytes of the key slots,
including slots 2 to 7, look good. The UUID is "version: 4 (random data
based)" according to uuid -d output and is therefore not of much help.
------------------

smartctl indicates fairly standard use for a 240GB desktop SSD, with
about 3.7TB written at 2650h runtime, 1 reallocated sector and 0
"Reported Uncorrectable Errors". The firmware version 335u seems to be
the latest available, from what I've read. smartctl tests with "-t
short", "-t offline" and "-t long" show no errors:
# 1  Extended offline   Completed without error   00%   2648   -
# 2  Offline            Completed without error   00%   2646   -
# 3  Short offline      Completed without error   00%   2572   -
The device also shows no issues during idle or read operations that
would hint at physical problems.

Checksumming the 240GB of data read blockwise from the device with dd
and sha512sum led to identical results on three runs, so the device
isn't mixing sectors or returning different content each time we ask
for the data.

All in all, the failure mode is still a mystery to me. I can think of
three main explanations:

I. silent data corruption events that have gone undetected by the
SSD-internal sector-wide checksumming, namely bit/byte level changes on
* MK salt / digest
* key slot #1 iterations count / salt
* key slot #1 AF stripe data

II. actual passphrase mistakes
* "constant" mistake or layout mismatch
This seems quite unlikely, as none of the characters change between a US
layout and the DE layout that was used. There are also no characters
that can be easily confused such as O/0.

III. some failure I've overlooked, like an OS-level bug or devilish
malware causing "intentional" writes to the first 2M of the drive.

Failure case #I is still the most likely, but from my understanding, a
four-digit number of system bootups and associated read events over the
lifetime of the header shouldn't be able to cause any kind of flash
wearout, let alone silent data corruption, unless the firmware is broken
in a subtle way. Assuming it is - what can be done besides brute-forcing the
AF section for bit flips?

I would be delighted to hear any advice or ideas for further tests to
narrow down whatever happened to this header.
Regards,
protagonist
David Christensen
2017-04-21 23:25:08 UTC
Permalink
Post by protagonist
someone found his way into our local hackerspace looking for help and
advice with recovering his OS partition from a LUKS-encrypted INTEL SSD
(SSDSC2CT240A4), and I've decided to get onto the case. Obviously,
there is no backup, and he's aware of the consequences of this basic
mistake by now.
Have you tested the drive with the Intel SSD Toolbox?

http://www.intel.com/content/www/us/en/support/memory-and-storage/ssd-software/intel-ssd-toolbox.html


David
Arno Wagner
2017-04-22 00:25:48 UTC
Permalink
Hi Protagonist,

this is an impressive analysis and I basically agree with
all of it.

Personally, I strongly suspect your option "I". The design
here is 5 years old and MLC. MLC requires the firmware to do
regular scanning, error correction and rewrites in order to
be reliable. 5 years ago, the state of the firmware for that
was more "experimental" than "stable".

For example, I have one old SSD from back then (OCZ trash),
that has silent single bit-errors on average in one of 5 full
reads. If such a bit-error happens on scrubbing or
garbage-collection or regular writes to a partial internal
(very large) sector, parts of the LUKS header may get rewritten
with a permanent bit-error, even if the LUKS header itself was
not written from outside at all.

Such corruption can of course also be due to a failing SSD
controller, bad RAM in the SSD, bus-problems, etc. In
particular, single-bit errors in an MLC-design will not
result from corrupted FLASH, but from other problems.

Now, are there any recovery options?

Assume 1 bit has been corrupted in a random place.
A key-slot is 256kB, i.e. 2Mbit. That means trying it
out (flip one bit, do an unlock attempt) would take
2 million seconds on the original PC, i.e. 23 days.
This can maybe be brought down by a factor of 5 or so
with the fastest available CPU (the iteration count of
150k is pretty low), i.e. still roughly 5 days.

This may be worth a try, but it requires some
serious coding with libcryptsetup and it will only
help on a single bit-error.

It may of course be a more complex error, especially
when ECC in the disk has corrected an error to the
wrong value, because the original was too corrupted.
A sane design prevents this by using a second,
independent checksum on the ECC result, but as I said,
5 years ago SSD design was pretty experimental and
beginner's mistakes were made.

The keyslot checker is no help here; it is intended
to find gross localized corruption, for example a
new MBR being written right into a keyslot. Checksums
on the LUKS level were not implemented because they are
not really needed, as classical HDDs are very good at
detecting read-errors. Unless you go to ZFS or the like,
filesystems do not do this either, for the same reasons.
There is one global "checksum" in LUKS though, exactly
the one that now tells you that there is no matching
keyslot - and on entry of a good passphrase, that means
the keyslot is corrupted.

My take is that apart from making absolutely sure
the passphrase is correct (it sounds very much like it
is though) and running the manufacturer's diagnostic
tools on the SSD, there is not much more you can do.

Regards,
Arno
Post by protagonist
[...]
--
Arno Wagner, Dr. sc. techn., Dipl. Inform., Email: ***@wagner.name
GnuPG: ID: CB5D9718 FP: 12D6 C03B 1B30 33BB 13CF B774 E35C 5FA1 CB5D 9718
----
A good decision is based on knowledge and not on numbers. -- Plato

If it's in the news, don't worry about it. The very definition of
"news" is "something that hardly ever happens." -- Bruce Schneier
Robert Nichols
2017-04-22 13:33:28 UTC
Permalink
Post by Arno Wagner
Assume 1 bit has been corrupted in a random place.
A key-slot is 256kB, i.e. 2Mbit. That means trying it
out (flip one bit, do an unlock attempt) would take
2 million seconds on the original PC, i.e. 23 days.
This can maybe be brought down by a factor of 5 or so
with the fastest available CPU (the iteration count of
150k is pretty low), i.e. still roughly 5 days.
This may be worth giving it a try, but it requires some
serious coding with libcryptsetup and it will only
help on a single bit-error.
It may of course be a more complex error, especially
when ECC in the disk has corrected an error to the
wrong value, because the original was too corrupted.
The drive would almost certainly have detected and corrected a single-bit error.
Post by Arno Wagner
The keyslot checker is no help here, it is intended
to find gross localized corruption,
It is still worth running the keyslot checker to detect gross corruption before spending 5+ days in a (probably futile) search for a single bit flip.
--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.
Arno Wagner
2017-04-22 13:45:58 UTC
Permalink
Post by Robert Nichols
[...]
The drive would almost certainly have detected and corrected a single-bit error.
Only when the error happened in FLASH. It can happen in
RAM or on a bus, and there it would not have been corrected.
It can even be a transient error (a charged cosmic particle
impacting a RAM cell, e.g.); these things happen.
Post by Robert Nichols
Post by Arno Wagner
The keyslot checker is no help here, it is intended
to find gross localized corruption,
It is still worth running the keyslot checker to detect gross corruption
before spending 5+ days in a (probably futile) search for a single bit
flip.
That has already been done. But I agree that
the chances for a single-bit error are not good.

Regards,
Arno
--
Arno Wagner, Dr. sc. techn., Dipl. Inform., Email: ***@wagner.name
GnuPG: ID: CB5D9718 FP: 12D6 C03B 1B30 33BB 13CF B774 E35C 5FA1 CB5D 9718
----
A good decision is based on knowledge and not on numbers. -- Plato

If it's in the news, don't worry about it. The very definition of
"news" is "something that hardly ever happens." -- Bruce Schneier
protagonist
2017-04-22 18:02:27 UTC
Permalink
Post by Arno Wagner
[...]
Only when the error happened in FLASH. It can happen in
RAM or on a bus, and there it would not have been corrected.
It can even be a transient error (a charged cosmic particle
impacting a RAM cell, e.g.); these things happen.
I agree: there is at least the remote possibility that data wasn't
adequately protected "during flight", such as during repair operations.

If the device really is telling the truth about its internal operations
and there has only been a single sector reallocation event on the
physical layer, the chances of this happening within our relevant 2Mbit
area on a 240GB disk are very slim for a start, and the lack of writes
anywhere near this area makes flash wearout extra unlikely, as described
before.
Yet if there was such an error, it would fit our scenario perfectly,
and it's not impossible that many errors have happened unnoticed and
unreported in other, less critical parts of the disk.

One thing to mention, perhaps for later readers with similar recovery
problems: if we knew the exact "logical" sector that was reallocated and
is thought to contain a flaw, we could limit our keyslot-bruteforcing
efforts to this area and reduce the effort by a factor of 500 (for an MK
size of 512 bits) for a given operation. Unfortunately, I don't see a way
to get the positions of reallocated sector(s) out of the drive. There is
no known serial debug header for low-level firmware access, as is
sometimes the case with HDDs. Perhaps there is an unofficial vendor
tool by Intel with verbose debugging output that can accomplish this,
but the available documentation for the Intel SSD Toolbox doesn't
mention anything beyond the SMART values (which only include error
counts), which is why I haven't bothered with this proprietary tool yet.

If I had actual bad blocks, the "long" SMART scan would have shown their
positions, and there might be the option to hammer the drive with the
badblocks utility to find the position of (further) defective spots,
similar to the memtest86+ utility for main memory. Given the current
health readings, I don't really think such a treatment will discover any
"helpful" clues about misbehavior in the first 2MB of the drive.
The March 2015 techreport.com test on SSD endurance suggests this drive
model+capacity can take about 600TB or more of writes before the reserved
sectors for wear leveling run out. I doubt heavy write loads will show
the issue we're looking for.

-----

I've spent some time researching likely bruteforce performance on
available machines and possible shortcuts.

Our bruteforce efforts are generally limited by the PBKDF2-sha1
iteration speed, which is hard to optimize - just as designed.
On an Intel Ivy Bridge i7 and cryptsetup 1.7.3 (Debian Stretch), the
"cryptsetup benchmark" counts about 1.25M iterations/s:
PBKDF2-sha1 1248304 iterations per second for 256-bit key

I've manually compiled https://github.com/mbroz/pbkdf2_cryptsetup_test
as well as cryptsetup itself to find possible improvements with
different crypto backends, gcc optimizations such as -Ofast and
-march=native, but I've been unable to improve on the 1.25M/s
number so far. OpenSSL beats gcrypt, but the default kernel backend
still seems faster.
(As a side note: what's up with the 9-year-old INSTALL file? It's no fun
scraping together the necessary libraries while deciphering
autogen.sh/configure errors!)
There might still be some side channel mitigations in the actual
implementation that could be omitted, as we don't really care about
that, but from what I read, this is much less of a low-hanging fruit for
SHA1 than it would be for AES. Also, the amount of data handled during
these iterations is small, which decreases the impact of optimized
memory access methods (cpu cache level hints) and negates others
(hugepages mmap).

Now, according to my current understanding, bruteforcing any meta-header
value or the actual key slot password requires us to iterate through the
following (a rough code sketch follows below):
* the key slot data (AF stripes) is kept fixed
1) perform 144306 PBKDF2-sha1 iterations
2) "decrypt" the AF sections using the resulting hash
3) do an AFmerge() to get the raw masterKeyCandidate
4) perform 35125 PBKDF2-sha1 iterations (at 160bit 'key' size) to
derive the digest of the masterKey
5) memcmp comparison against the digest to see if we've got a valid
master key.

See
https://gitlab.com/cryptsetup/cryptsetup/blob/master/docs/on-disk-format.pdf
for algorithm pseudocode.
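
To make the above concrete, here is a minimal C sketch of that flow. This
is only my own illustration, not cryptsetup's real internal API: the struct
and the helpers pbkdf2_sha1(), af_decrypt_area() and af_merge() are
hypothetical stand-ins, with the sizes and iteration counts taken from the
luksDump output above.

/* Hedged sketch of the keyslot-open flow (steps 1-5 above).  The
 * struct and the helpers pbkdf2_sha1(), af_decrypt_area() and
 * af_merge() are hypothetical stand-ins for cryptsetup internals,
 * kept here only to show the control flow. */
#include <stddef.h>
#include <string.h>

#define MK_BYTES     64      /* 512-bit master key */
#define DIGEST_BYTES 20      /* SHA-1 digest size  */
#define SALT_BYTES   32
#define AF_STRIPES   4000

struct luks_meta {                        /* relevant header fields only */
    unsigned char mk_digest[DIGEST_BYTES];
    unsigned char mk_salt[SALT_BYTES];
    unsigned int  mk_iterations;          /* 35125 in this header  */
    unsigned char slot1_salt[SALT_BYTES];
    unsigned int  slot1_iterations;       /* 144306 in this header */
};

/* hypothetical helpers standing in for cryptsetup internals */
void pbkdf2_sha1(const void *pw, size_t pwlen,
                 const unsigned char *salt, size_t saltlen,
                 unsigned int iterations,
                 unsigned char *out, size_t outlen);
void af_decrypt_area(const unsigned char *in, unsigned char *out,
                     size_t len, const unsigned char *key, size_t keylen);
void af_merge(const unsigned char *split, unsigned char *mk,
              size_t mklen, unsigned int stripes);

int try_passphrase(const char *pass, size_t passlen,
                   const struct luks_meta *hdr,
                   const unsigned char *keyslot_area) /* AF_STRIPES*MK_BYTES bytes */
{
    unsigned char slot_key[MK_BYTES], mk[MK_BYTES], digest[DIGEST_BYTES];
    static unsigned char split[AF_STRIPES * MK_BYTES];

    /* 1) derive the keyslot key from the passphrase (144306 iterations) */
    pbkdf2_sha1(pass, passlen, hdr->slot1_salt, SALT_BYTES,
                hdr->slot1_iterations, slot_key, MK_BYTES);
    /* 2) decrypt the AF area (aes-xts-plain64) with that key */
    af_decrypt_area(keyslot_area, split, sizeof(split), slot_key, MK_BYTES);
    /* 3) merge the 4000 stripes back into a master-key candidate */
    af_merge(split, mk, MK_BYTES, AF_STRIPES);
    /* 4) hash the candidate with the MK salt and iteration count (35125) */
    pbkdf2_sha1(mk, MK_BYTES, hdr->mk_salt, SALT_BYTES,
                hdr->mk_iterations, digest, DIGEST_BYTES);
    /* 5) compare against the MK digest stored in the header */
    return memcmp(digest, hdr->mk_digest, DIGEST_BYTES) == 0;
}

Note that step 1) depends only on the passphrase and the keyslot salt, so
its result can be computed once and cached when only the AF key material
varies - which is exactly the shortcut discussed next.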

Please correct me if I'm wrong, but while looking at the second case of
bruteforcing some bit changes in the key slot data, I've noticed that
things are likely 5 times faster:
1) the key slot data (AF stripes) is changed in cache/RAM
* no meta-header value has changed, so a pre-computed version of the
144306 PBKDF2-sha1 iterations can be used. The keyslot iteration count is
therefore largely irrelevant here.
2) "decrypt" the AF sections using the resulting hash
3) do an AFmerge to get the raw masterKeyCandidate
4) perform 35125 PBKDF2-sha1 iterations (at 160bit 'key' size) to
derive the digest of the masterKey
5) memcmp comparison against the digest to see if we've got a valid
master key.

This general speedup in the second case is irrelevant for attacks
against the password or salt of a key slot, but it might help significantly
in our case, as performance might increase by a factor of five, to something
in the order of 35 tries per second and core for pure sha1 iteration
performance (if we disregard the cost of all other computations). If this
scales well to all four real cores, or even all logical eight with the
available hyperthreading, then the overall runtime for a single-bit test
might be well within a day.

One could go even further and replace the time-consuming steps 5) and 6)
with a routine that decrypts an encrypted part of the disk with the
masterKeyCandidate and compares it to a known plaintext (or a more
elaborate heuristic, such as several entropy checks looking for
"correctly" decryptable areas on disk leading to low-entropy output),
which might be a lot faster given AES-NI support and AES-xts throughput
speeds, but since we don't know much about actual disk content, this
seems to be too much of a gamble to be worth the effort at this point.
It's easily thinkable one tests against several areas containing a
well-compressed movie or a similarly high-entropy area, leading to a
false-negative and missing the only correct master key.

There is also another shortcut I can think of: checking for multi-byte
errors in the MK digest is as easy as doing one default cryptsetup run
with the correct password + unmodified key slot and comparing the
computed MK digest to the value on disk.

For this, one can patch the LUKS_verify_volume_key() function in
lib/luks1/keymanage.c to print the buffer values before the memcmp() call:
LUKS_verify_volume_key checkHashBuf:
e6 8e 79 bf a5 51 cc fe 7d 12 4c 4c 8d 46 d3 6c ae 30 0d 28
LUKS_verify_volume_key hdr->mkDigest:
ff 5c 64 48 bc 1f b2 f2 66 23 d3 66 38 41 c9 60 8a 7e de 0a

As one can see, hdr->mkDigest is the MK digest as stored on disk, read
out in the same fashion as the luksDump output noted in my last post. The
master key digest of the key candidate derived from the current keyslot
data and the supposedly correct password does not match in even one byte
position (!), ruling out a partially corrupted MK digest field in the
meta-header data.
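
For reference, the debug patch itself can be as small as a hex-dump helper
called right before the comparison. This is only an illustrative sketch -
dump_hex() is my own helper, not upstream code; checkHashBuf, hdr->mkDigest
and LUKS_DIGESTSIZE are the names used in lib/luks1/keymanage.c:

/* Illustrative sketch only: dump_hex() is my own helper, not part of
 * cryptsetup.  The two calls go into LUKS_verify_volume_key() in
 * lib/luks1/keymanage.c, right before its memcmp(); checkHashBuf and
 * hdr->mkDigest are the buffers shown above, LUKS_DIGESTSIZE is the
 * 20-byte SHA-1 digest size. */
#include <stdio.h>

static void dump_hex(const char *label, const unsigned char *buf, size_t len)
{
        size_t i;

        printf("%s:\n", label);
        for (i = 0; i < len; i++)
                printf("%02x ", buf[i]);
        printf("\n");
}

/* ... then, inside LUKS_verify_volume_key(), before the comparison:
 *
 *      dump_hex("LUKS_verify_volume_key checkHashBuf",
 *               (const unsigned char *)checkHashBuf, LUKS_DIGESTSIZE);
 *      dump_hex("LUKS_verify_volume_key hdr->mkDigest",
 *               (const unsigned char *)hdr->mkDigest, LUKS_DIGESTSIZE);
 */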

Regards,
protagonist
protagonist
2017-04-23 20:03:28 UTC
Permalink
Dominic Raferd
2017-04-24 05:50:01 UTC
Permalink
Post by protagonist
I've manually compiled
[...]
This is pretty impressive stuff to someone like me who is new to dm-crypt.
But I wondered if the chances of the passphrase being misrecorded or
misread have been fully considered. In your OP you wrote: 'The password is
fairly simple and contains no special characters or locale-sensitive
characters and had been written down... none of the characters change
between a US layout and the DE layout that was used. There are also no
characters that can be easily confused such as O/0.'

I note the 'written down' but if by this you meant 'recorded in a Word
document', say, then perhaps a capitalisation error has crept in. By far
the most likely error is that the first character is recorded as capitalised
when it isn't (as Word likes to capitalise the letter at the beginning of a
sentence). Other possibilities include an extra space or spaces (at the
beginning or end?), or a period being read as part or not part of the
passphrase. It would also be worth re-reviewing the possibility that some
characters have been confused - if the passphrase was written down by hand
the chances greatly increase. And to be quite sure it isn't a keyboard
issue, can you try with a DE keyboard?

As it happens a single capitalisation error would be picked up by a brute
force method that tests for a single bit flip...
protagonist
2017-04-24 13:26:20 UTC
Permalink
Post by Dominic Raferd
Post by protagonist
I've manually compiled
[...]
This is pretty impressive stuff to someone like me who is new to
dm-crypt.
Thanks.
But I wondered if the chances of the passphrase being
misrecorded or misread have been fully considered.
You make a good point, but as the password has been written down on
paper the old-fashioned way, I have decided to take it as a "known good"
value.
One can speculate about the password being wrong on paper, or some
laptop-specific oddity, but as the owner had been entering it daily for
more than a year, I don't think a simple single-character swap for
neighboring keys or capitalization changes will help. In other
situations, they might, and bruteforce complexity only grows linearly
with the number of changes and password length, respectively, if one
looks for a single error, so it's definitely something to consider for
passwords that can't be remembered perfectly.
As it happens a single capitalisation error would be picked up by a
brute force method that tests for a single bit flip...
This is not the case for any of the bit error tests discussed earlier,
as they concern the necessary "decryption ingredients" on disk where bit
errors may have occurred, which of course don't include the password itself.

Regards,
protagonist
Dominic Raferd
2017-04-24 17:00:27 UTC
Permalink
Post by protagonist
as the password has been written down on
paper the old-fashioned way, I have decided to take it as a "known good"
value.
One can speculate about the password being wrong on paper, or some
laptop-specific oddity, but as the owner had been entering it daily for
more than a year, I don't think a simple single-character swap for
neighboring keys or capitalization changes will help. In other
situations, they might, and bruteforce complexity only grows linearly
with the number of changes and password length, respectively, if one
looks for a single error, so it's definitely something to consider for
passwords that can't be remembered perfectly.
You seem to have considered the options pretty thoroughly. If the original
owner has come to you - so s/he knows they have been typing in the same
passphrase until one day it stopped working - and they have told you that
passphrase, then an error in recording the passphrase can be discounted. If
the situation is otherwise then a wrong passphrase still seems to me more
likely than a corrupted LUKS header, especially when everything you can
test on the disk seems ok.

Is there any possibility that a malicious third party (disgruntled
ex-sysadmin perhaps) gained root access to the machine during its last
session and changed the passphrase? As an aside, of no help to OP I'm
afraid: is a prior backup of the LUKS header a protection against this
scenario (i.e. against a subsequently deleted, or changed and now unknown,
passphrase)?
Michael Kjörling
2017-04-24 17:44:04 UTC
Permalink
Post by Dominic Raferd
Is there any possibility that a malicious third party (disgruntled
ex-sysadmin perhaps) gained root access to the machine during its last
session and changed the passphrase?
Does that not require knowledge of a current passphrase? I believe it
does. Which of course said third party _could_ have.
Post by Dominic Raferd
As an aside, of no help to OP I'm afraid: is a prior backup of the
LUKS header a protection against this scenario (i.e. against a
subsequently deleted, or changed and now unknown, passphrase)?
Yes. A copy of the LUKS header and a passphrase that was valid at the
time the header copy was made will allow access, as long as the master
key is unchanged (no cryptsetup-reencrypt in the interim). The only
way to mitigate this threat AFAIK is to change the master key of the
container.
--
Michael Kjörling • https://michael.kjorling.se • ***@kjorling.se
“People who think they know everything really annoy
those of us who know we don’t.” (Bjarne Stroustrup)
protagonist
2017-04-24 23:49:46 UTC
Permalink
Post by Dominic Raferd
You seem to have considered the options pretty thoroughly. If the
original owner has come to you - so s/he knows they have been typing in
the same passphrase until one day it stopped working - and they have
told you that passphrase, then an error in recording the passphrase can
be discounted.
This is the case. The disk had been used in a privately owned laptop.
Is there any possibility that a malicious third party (disgruntled
ex-sysadmin perhaps) gained root access to the machine during its last
session and changed the passphrase? As an aside, of no help to OP I'm
afraid: is a prior backup of the LUKS header a protection against this
scenario (i.e. against a subsequently deleted, or changed and now
unknown, passphrase)?
If a malicious program had been able to run as root and deliberately
wrote into the LUKS header sectors to corrupt them, it definitely did so
in a very "plausible" fashion in terms of writing pseudo-random values
in the allowed areas. Given the fact that this was a fairly low-value
target, I doubt there was any reason to do this in such a "stealthy"
fashion if making the disk unusable had been the intention of such a
hypothetical malware. It's basically impossible to find out at this
point whether or not that was the case, but it's a scary thought that
should make everyone do header backups.

Regarding a "change" of passphrases:
A program would need access to the master key of the disk to create a
new, working key slot. As far as I know, a valid passphrase would be
needed during the normal cryptsetup procedures to open one of the
existing key slots, extract the master key and build the new keyslot
data containing a new copy of the master key.
However, I assume it is likely that a determined attacker running as
root might be able to extract the master key from RAM if the encrypted
volume in question is still open at the time of attack, so technically,
there would be a way to do this without the password.

I've asked the owner about mnemonics for the password, and they indeed
checked out, so I'd consider the passphrase integrity question as
settled in this case.

Regards,
protagonist
Robert Nichols
2017-04-25 13:14:52 UTC
Permalink
Post by protagonist
However, I assume it is likely that a determined attacker running as
root might be able to extract the master key from RAM if the encrypted
volume in question is still open at the time of attack, so technically,
there would be a way to do this without the password.
It's trivial. Just run "dmsetup table --showkeys" on the device.
--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.
Dominic Raferd
2017-04-25 13:44:50 UTC
Permalink
Post by Robert Nichols
[...]
It's trivial. Just run "dmsetup table --showkeys" on the device.
Wowzer. 'cryptsetup luksDump <device> --dump-master-key' can also provide
this info but it requires a passphrase, which 'dmsetup table --showkeys'
does not. So must we assume that anyone who has ever had root access while
the encrypted device is mounted can thereafter ​break through the
encryption regardless of passphrases? At least until cryptsetup-reencrypt
is run on the device, which is a big step.
Robert Nichols
2017-04-25 14:37:19 UTC
Permalink
Post by Dominic Raferd
Wowzer. 'cryptsetup luksDump <device> --dump-master-key' can also provide this info but it requires a passphrase, which 'dmsetup table --showkeys' does not. So must we assume that anyone who has ever had root access while the encrypted device is mounted can thereafter break through the encryption regardless of passphrases? At least until cryptsetup-reencrypt is run on the device, which is a big step.
It's in the FAQ, section 6.10, so not really a great revelation.

BTW, it's "--showkey", not "--showkeys". Minor typo there, sorry.

Also, anyone who has had access to the device has had the ability to save a copy of the LUKS header, so the ability to revoke passphrases really isn't as great as it's cracked up to be.
--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.
Robert Nichols
2017-04-25 14:43:50 UTC
Permalink
Post by Robert Nichols
BTW, it's "--showkey", not "--showkeys". Minor typo there, sorry.
Upon review, "dmsetup" accepts option abbreviations as long as there is no ambiguity, so just "dmsetup table --sh" is sufficient.
--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.
Ondrej Kozina
2017-04-25 14:45:48 UTC
Permalink
Post by Robert Nichols
BTW, it's "--showkey", not "--showkeys". Minor typo there, sorry.
In fact, it doesn't matter. As long as it's a unique substring of
"--showkeys" (unique wrt other --options known to dmsetup), dmsetup
accepts an even shorter version, e.g. --showk

my 2 cents:)
Sven Eschenberg
2017-04-25 16:16:17 UTC
Permalink
Post by Dominic Raferd
Wowzer. 'cryptsetup luksDump <device> --dump-master-key' can also
provide this info but it requires a passphrase, which 'dmsetup table
--showkeys' does not. So must we assume that anyone who has ever had
root access while the encrypted device is mounted can thereafter break
through the encryption regardless of passphrases? At least until
cryptsetup-reencrypt is run on the device, which is a big step.
Furthermore, everyone who had access to /dev/mem and was able to locate
the keys knows them. On second thought, this certainly holds true for
the 'new central kernel key storage' (I forgot the name) as well,
depending on the overall kernel configuration and userspace, that is.

At the end of the day, dm-crypt (etc.) needs to store the key somewhere
where it can be accessed at all times when an IO request comes in. There
are not that many options for that ;-).
Milan Broz
2017-04-25 16:30:00 UTC
Permalink
Post by Sven Eschenberg
Furthermore, everyone who had access to /dev/mem and was able to locate
the keys knows them. On second thought, this certainly holds true for
the 'new central kernel key storage' (I forgot the name) as well,
depending on the overall kernel configuration and userspace, that is.
At the end of the day, dm-crypt (etc.) needs to store the key somewhere
where it can be accessed at all times when an IO request comes in. There
are not that many options for that ;-).
Crypto API stores the key in memory as well (even the round keys etc), obviously.

We already have support for the kernel keyring in dm-crypt (so the key will
not be directly visible in the dmsetup table); this will be supported in the
next major version of cryptsetup/LUKS.

But as you said, if you have access to the kernel memory, it is there anyway...

Milan
Sven Eschenberg
2017-04-25 17:09:35 UTC
Permalink
Post by Milan Broz
[...]
Crypto API stores the key in memory as well (even the round keys etc), obviously.
We already have support for the kernel keyring in dm-crypt (so the key will
not be directly visible in the dmsetup table); this will be supported in the
next major version of cryptsetup/LUKS.
But as you said, if you have access to the kernel memory, it is there anyway...
Milan
Ah, thanks Milan, kernel keyring it is called. Anyhow, the only solution
would be to store the key in some device and retrieve it for IO ops,
but then again, it would make more sense to pass the IO blocks to that
(secured blackbox) device. That would in turn mean that such a device
needs computational power and massive IO bandwidth.

Maybe crypto acceleration cards with PCIe3 and 8+ lanes would be an
option, if they provide secured keyring storage etc. I am thinking
of something like the Intel QA 8950 with respect to the concept. (The
QA 8950 aims rather at communication streams, AFAIK, I am not sure how
keys are handled, i.e. if they are passed into the adapter during engine
initialization or if an additional permanent secured keyring service is
offered, or if the key needs to be passed in for every block together
with the data)

And yes, I know, it would increase the IO Latency a bit, but offload the
CPU at the same time.

Regards

-Sven
Hendrik Brueckner
2017-04-26 14:45:03 UTC
Permalink
Hi Sven,
Post by Sven Eschenberg
[...]
Ah, thanks Milan, kernel keyring it is called. Anyhow, the only
solution would be to store the key in some device and retrieve it
for IO ops, but then again, it would make more sense to pass the IO
blocks to that (secured blackbox) device. That would in turn mean
that such a device needs computational power and massive
IO bandwidth.
a colleague of mine and I investigated this kind of topic. For strong
security, having the clear key accessible in memory is not an option.
Of course, the alternative is to deal with hardware security modules (HSMs)
that perform the cryptographic operations, with the clear key never leaving
the HSM.

We worked on this area and provided some cryptsetup enhancements to support
wrapped keys for disk encryption to prevent having keys in clear text in the
memory of the operating system. Recently, we submitted this merge request:

https://gitlab.com/cryptsetup/cryptsetup/merge_requests/19

Basically, it seamlessly integrates support for ciphers that can use wrapped
keys instead of clear keys.

For Linux on z Systems (our background), there is a tamper-safe hardware
security module (HSM) that provides "secure keys". Secure keys are BLOBs
containing the clear key wrapped with a master key of the HSM. Of course,
the overhead is typically considerable because each cryptographic operation
comes at the cost of an I/O operation to the HSM. However, the z Systems
firmware provides acceleration for this by re-wrapping a secure key to a
protected key (that is valid for the Linux instance (LPAR) only). Then,
you can use some special instructions to perform, for example, AES with a
protected key at CPU speed. In both cases, the clear key resides in the
HSM/firmware only and is exposed to the OS in a wrapped form only.

The merge request above also introduces this protected-AES (paes) as a
sample wrapped-key cipher. (paes itself is an in-kernel crypto module.)
Post by Sven Eschenberg
Maybe crypto acceleration cards with PCIe3 and 8+ Lanes would be an
option, if they provide a secured keyring storage etc. . I am
thinking of something like the Intel QA 8950 with respects to the
concept. (The QA 8950 aims rather at communication streams, AFAIK, I
am not sure how keys are handled, i.e. if they are passed into the
adapter during engine initialization or if an additional permanent
secured keyring service is offered, or if the key needs to be passed
in for every block together with the data)
I am not familiar with the QA 8950, but a similar approach to what we did
with paes might be possible. Perhaps another kind of wrapped-key cipher
would fit into the concept.

Thanks and kind regards,
Hendrik
--
Hendrik Brueckner
***@linux.vnet.ibm.com | IBM Deutschland Research & Development GmbH
Linux on z Systems Development | Schoenaicher Str. 220, 71032 Boeblingen


IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
Milan Broz
2017-04-26 18:46:38 UTC
Permalink
Post by Hendrik Brueckner
Hi Sven,
[...]
a colleague of mine and I investigated this kind of topic. For strong
security, having the clear key accessible in memory is not an option.
Of course, the alternative is to deal with hardware security modules (HSMs)
that perform the cryptographic operations, with the clear key never leaving
the HSM.
We worked on this area and provided some cryptsetup enhancements to support
wrapped keys for disk encryption to prevent having keys in clear text in the
memory of the operating system. Recently, we submitted this merge request:
https://gitlab.com/cryptsetup/cryptsetup/merge_requests/19
It would be better if you started a new thread about this, because this
thread is really about something else.

Anyway, I will handle that merge request later, but in short - your
approach works only on IBM z Systems (no other system implements this
wrapped encryption, so it is very platform specific).

LUKS1 is a portable format; we cannot bind the format to specific hardware.

So do not expect this to be merged as it is, and specifically not into the
LUKS1 format, which I consider stable and whose major advantage is
portability among all possible Linux distributions and architectures.

Anyway, the discussion could be interesting. But I do not think
the mainframe approach can be applied to the low-end systems where this
kind of FDE is mostly used. The FDE threat model is also usually focused
on offline attacks (a stolen disk), so here we do not need to care
whether the key is in memory while the system is online.

Milan
Post by Hendrik Brueckner
[...]
protagonist
2017-04-28 15:51:25 UTC
Permalink
Good news:
my improvised simulate-AF-bitflip-and-decode bruteforce program based on
cryptsetup core code has been working since yesterday. After finding several
issues in my initial code, including a failing call to the kernel crypto
API asking for an aes "xts-plain64" cipher (instead of the correct aes
"xts" version) that had me scratching my head and looking at strace
output for a while, I've managed to run it successfully against a
specially corrupted LUKS-encrypted disk containing an ext4 filesystem
(the first 512 bytes are guaranteed to be 0x00, as noted before) and
detected the error that was deliberately written to one of the AF
keyslot bits.
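
The outer loop of such a tool can stay very simple. The sketch below is
only my own rendering of the approach, not the actual program:
derive_and_check() is a hypothetical helper that re-runs the AF decryption
and AF_merge() on the modified in-RAM copy (reusing the cached first-stage
PBKDF2 result, since passphrase and salt never change) and then applies the
known-plaintext test.

/* Hedged sketch of the AF bit-flip brute force described above, not
 * the actual program.  keyslot is an in-RAM copy of the ~256kB key
 * material; derive_and_check() is a hypothetical helper that runs
 * AF-decrypt + AF_merge() on the (modified) copy - reusing the cached
 * first-stage PBKDF2 result - and tests the resulting master-key
 * candidate against the known plaintext. */
#include <stddef.h>
#include <stdio.h>

#define KEYSLOT_BYTES (4000 * 64)   /* 4000 AF stripes of a 512-bit key */

int derive_and_check(const unsigned char *keyslot, size_t len);  /* hypothetical */

long find_single_bitflip(unsigned char *keyslot)
{
    for (size_t byte = 0; byte < KEYSLOT_BYTES; byte++) {
        for (int bit = 0; bit < 8; bit++) {
            keyslot[byte] ^= (unsigned char)(1u << bit);   /* simulate the flip */
            if (derive_and_check(keyslot, KEYSLOT_BYTES)) {
                printf("candidate found: byte %zu, bit %d\n", byte, bit);
                return (long)(byte * 8 + bit);
            }
            keyslot[byte] ^= (unsigned char)(1u << bit);   /* undo the flip     */
        }
    }
    return -1;   /* no single-bit repair leads to a valid master key */
}

Splitting the byte range across several threads gives the per-core rates
mentioned below.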

A run against the actual target device was unsuccessful, which could
have several reasons, including the fact that the LVM header might not
start with 512 bytes of 0x00, which is part of the decryption check
currently used. I will add a test against the magic "LABELONE" bytes of
the LVM-specific label header, but given the presence of ECC on most SSDs,
it's very unlikely I'll actually manage to find any undetected simple
error pattern and recover the disk data.

--

The brute force speed yesterday was 250 AF-keyslot changes per second
and core against the target.
Today I've managed to further improve this to about 285 iterations per
second and core by limiting the first-stage decode of the AF-keyslot
sectors to exactly those that were influenced by the bit flip. These
values are for aes-xts plain64, sha1 and a 512 bit masterkey.

Given that a 512-bit masterkey leads to 4000 * 512 bits of AF-keyslot
data, this gives (4000⋅512)/(285⋅60⋅4) = ~30 minutes of total bruteforce
duration on a modern quadcore CPU using four threads for trying a fixed
localized error pattern such as a single bit flip against the disk, as
long as the partial contents of a fixed sector are known, such as
ext4/LVM file header constants in sector #0 and #1.

The main processing time during bruteforce is spent on AF_merge(), which
in turn spends most of its time in the diffuse() function that is
concerned with hashing the AF-key data (with the sha1 hash, in our
case). Note that there is no adjustable number of hashing operations for
this step, as opposed to other steps of the LUKS header decryption. It
is therefore obvious that an adversary with access to hashing
accelerators (GPU, FPGA, ASIC; see bitcoin hardware developments for
specialized hashing) can easily benefit from several orders of magnitude
of performance gain over the speed obtainable on a CPU, without any
parameter to make this harder except changing to a more demanding hash
algorithm. (This has little impact on a CPU, as the speed differences
there are within a factor of two.)

IMHO this approach is only interesting for recovery purposes, but not a
real attack vector in terms of opportunity for significantly improved
offline attacks against a drive, as it only applies to the very rare
cases where an attacker has the complete password as well as almost all
of the AF data in their original form but lacks a few bits or bytes
(depending on his knowledge of the corrupted sections) that he now has
to test against.
If the diffuse() function and therefore the hash have excellent
cryptographic properties (which might not be the case), "reconstructing"
AF-sections on disk that were properly overwritten on the physical level
is exponentially hard in a sense that quickly makes a brute-force
attack against the actual symmetric XTS key the better choice, at least
for this "naive" way of regarding the AF-keyslot as a large symmetric key.

A quick example of this:
* overwriting the first 4 bytes of your AF-keyslot with 0x42 0x42 0x42
0x42 with the intention of making it unusable will allow breaking it within
~43 days ( (256^4)/(285⋅4⋅60⋅60⋅24) ) on a single desktop machine at
the current performance, and half that time on average
* overwriting the first 64 bytes will lead to a similar complexity for
this bruteforce algorithm as trying the masterkey from scratch.
Note: there are likely other shortcuts, as hinted at before, so be sure
to overwrite as much of the LUKS header as possible in practice if you
intend to destroy it.

Looking for further opportunities to optimize this bruteforce process,
it is clear that, given the highly optimized hashing code of the nettle
library mentioned before, we're unlikely to push significantly more
unique hash operations through a given core. But most of the hashing
operations and their results are identical between the different
fault positions we're testing: as long as the processed data hasn't
changed compared to the last iteration (because the simulated fault is
further down in the AF keyslot), they don't all have to be performed
again each time. Maybe I'll manage to squeeze even more performance out
of this...

Regards,
protagonist
Post by protagonist
Post by protagonist
I've manually compiled https://github.com/mbroz/pbkdf2_cryptsetup_test
as well as cryptsetup itself to find possible improvements with
different crypto backends, gcc optimizations such as -Ofast and
-march=native, but I've been unsuccessful to improve on the 1.25M/s
number so far. Openssl beats gcrypt, but the default kernel backend
still seems faster.
Update: the nettle library is the fastest crypto backend by a
significant margin according to my tests. Also, contrary to my previous
remark, the kernel crypto appears to be slower than openssl, at least
under Debian Jessie with 3.16.
nettle > openssl > kernel 3.16 > gcrypt > nss
PBKDF2-sha1 1680410 iterations per second for 256-bit key
This value includes the benefit of switching the CFLAGS from "-O2" to
"-Ofast -march=native", which is mostly insignificant (<1% improvement)
and probably rarely worth the effort.
Switching from libnettle 4.7 to the freshest libnettle 6.3 brings only minor
improvements:
PBKDF2-sha1 1702233 iterations per second for 256-bit key
Given that they provide hand-optimized x86 assembler code for the sha
computation, this is not entirely surprising.
Post by protagonist
One could go even further and replace the time-consuming steps 5) and 6)
with a routine that decrypts an encrypted part of the disk with the
masterKeyCandidate and compares it to a known plaintext (or a more
elaborate heuristic, such as several entropy checks looking for
"correctly" decryptable areas on disk leading to low-entropy output),
which might be a lot faster given AES-NI support and AES-xts throughput
speeds, but since we don't know much about actual disk content, this
seems to be too much of a gamble to be worth the effort at this point.
It's easily thinkable one tests against several areas containing a
well-compressed movie or a similarly high-entropy area, leading to a
false-negative and missing the only correct master key.
Correction: I meant omitting steps 4) and 5).
While thinking about the disk layout, I've realized that the encrypted
storage almost definitely contains a LVM scheme commonly used by
full-disk encryption installer setups to store separate logical volumes
for both root and swap. In the related case of a manual setup of an
external storage device, this would often be plain ext4.
Now, there is more "known plaintext" available than I initially
suspected: both LVM and ext4 don't just contain special headers with
magic numbers, labels and checksums in well defined positions, but they
actually include at least one unused sector right in the beginning.
According to my tests, those bytes are set (or kept) to 0x00.
"The physical volume label is stored in the second sector of the
physical volume." [^1]
Given the SSD sector size of 512 bytes and the fact that they reserve
four sectors for the label, we have at least three sectors à 512 bytes
of known zeros at sector #0, #2 and #3, which should be more than plenty
for some fast & simple decryption check that doesn't assume much about
the specific version and configuration of the logical volumes.
For ext4, the "Group 0 Padding" of 1024 Bytes would serve a similar
purpose. [^2]
Now that this shortcut seemed attractive, I've started cannibalizing the
LUKS_open_key() function in lib/luks1/keymanage.c to host my bruteforce
approach and made some progress already.
Small-scale benchmarking with 10000 rounds of AF-decryption, AF_merge
and (unfinished) block target decryption calls take about 50s combined,
including "normal" cryptsetup program initialization, initial disk reads
and most other necessary computations.
This sets overall performance at something in the order of 150 to 200
individual checks per second and core for the final routine, which is a
good improvement over the naive bruteforce version once it's ready.
Regards,
protagonist
[^1]
https://github.com/libyal/libvslvm/blob/master/documentation/Logical%20Volume%20Manager%20%28LVM%29%20format.asciidoc#2-physical-volume-label
[^2] https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Layout
protagonist
2017-04-30 15:06:57 UTC
Permalink
As a short update, I can confirm that when run with the default options,
pvcreate initializes the first 512 bytes of the LVM header block with
0x00, similarly to ext4, creating excellent known plaintext that is easy
to spot during debugging of decryption routines.

This is documented in the manpage of pvcreate:
"-Z, --zero {y|n}
Whether or not the first 4 sectors (2048 bytes) of the device should be
wiped. If this option is not given, the default is to wipe these sectors
unless either or both of the --restorefile or --uuid options were
specified." https://linux.die.net/man/8/pvcreate

My current memcmp of the first 512 bytes therefore works just as well on
LVM as on ext4 and has managed to find a bit flip on a deliberately
corrupted key slot.
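
For completeness, here is what the known-plaintext check itself can look
like when written against OpenSSL's EVP interface. This is only my own
illustration under the assumptions above (aes-xts-plain64, 512-bit master
key, all-zero first payload sector), not the code actually used here:

/* Hedged sketch of the known-plaintext test: decrypt one 512-byte
 * payload sector with a master-key candidate (aes-xts-plain64,
 * 512-bit key) via OpenSSL and compare it against the all-zero
 * plaintext that pvcreate/mkfs.ext4 leave in the first sector.
 * This is my own illustration, not the code used in this thread. */
#include <stdint.h>
#include <string.h>
#include <openssl/evp.h>

static int candidate_matches_zero_sector(const unsigned char mk[64],
                                         const unsigned char ciphertext[512],
                                         uint64_t sector)  /* 0 = first payload sector */
{
    unsigned char iv[16] = { 0 };           /* plain64: little-endian sector number */
    unsigned char plain[512];
    static const unsigned char zeros[512];  /* expected plaintext */
    int outl = 0, fin = 0, ok = 0;
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

    if (!ctx)
        return 0;
    for (int i = 0; i < 8; i++)
        iv[i] = (unsigned char)(sector >> (8 * i));

    if (EVP_DecryptInit_ex(ctx, EVP_aes_256_xts(), NULL, mk, iv) == 1 &&
        EVP_DecryptUpdate(ctx, plain, &outl, ciphertext, 512) == 1 &&
        EVP_DecryptFinal_ex(ctx, plain + outl, &fin) == 1)
        ok = (outl + fin == 512) && (memcmp(plain, zeros, 512) == 0);

    EVP_CIPHER_CTX_free(ctx);
    return ok;
}

A LABELONE or ext4-magic test can be added on top, but for an all-zero
sector the straight memcmp is already a very strong signal.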

However, this is bad news for my ultimate goal of recovering the
actual master key of the SSD in question, as it means my previous
1-error checks were valid, yet unsuccessful.
Regards,
protagonist
Arno Wagner
2017-04-30 18:39:21 UTC
Permalink
Post by protagonist
[...]
However, this is bad news for my ultimate goal of recovering the
actual master key of the SSD in question, as it means my previous
1-error checks were valid, yet unsuccessful.
Still impressive work. But it was a 10% thing at best.

Regards,
Arno
--
Arno Wagner, Dr. sc. techn., Dipl. Inform., Email: ***@wagner.name
GnuPG: ID: CB5D9718 FP: 12D6 C03B 1B30 33BB 13CF B774 E35C 5FA1 CB5D 9718
----
A good decision is based on knowledge and not on numbers. -- Plato

If it's in the news, don't worry about it. The very definition of
"news" is "something that hardly ever happens." -- Bruce Schneier