Moving SSD pain

The 2TB SSD in my computer got full, so time for the next size up (4TB).

This post is a note-to-self, to remind me of the minefield when it inevitably comes time to move from 4TB to 8TB…

I hoped Clonezilla would copy the old SSD to the new one, after which I could resize the partition, but it didn’t turn out to be that easy.

The first time I ran Clonezilla, it failed with “the disk has bad sectors”.

So I tried it again, with “-rescue” so a read error wouldn’t stop it.

That allowed Clonezilla to complete (1 hr 40 mins, averaging 20GB per min), and I could open my LUKS-encrypted volume (cryptsetup luksOpen), but the btrfs inside it was broken: “bad tree block start .. failed to read tree root”, plus a bad superblock.

What to do?

First I thought I’d just enlarge and reformat the btrfs partition, then copy the data over using https://github.com/mwilck/btrfs-clone

I gave up after 6 hours. It uses btrfs send/receive, which unfortunately seems very slow: it was still working on the first subvolume, and I could see there were others still to go. I also have some areas set up with copy-on-write disabled, and I wondered whether these would be handled correctly.

So then I thought I’d use Clonezilla on the decrypted volumes: run cryptsetup luksOpen for the source and target, then mount them.

A hiccup was that I couldn’t mount the target, because it had a duplicate fsid (being a clone, it carried the same one as the source). And btrfstune couldn’t change it (because the filesystem was broken – see above). So I ran mkfs.btrfs, and then used btrfstune to change the fsid. (I could’ve changed the fsid on the source device instead, but I didn’t want to do that.)
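For next time, the mkfs.btrfs and btrfstune steps might look roughly like the sketch below. The mapper name /dev/mapper/target is a made-up placeholder for the opened target volume (the post doesn't record the real one), and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
set -eu
DRY_RUN=${DRY_RUN:-1}
# Hypothetical mapper name for the opened target volume; adjust to suit.
TARGET=${TARGET:-/dev/mapper/target}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

new_fsid() {
    # Recreate the filesystem on the target (this wipes whatever is on it).
    run sudo mkfs.btrfs -f "$TARGET"
    # Assign a new random fsid so the target no longer clashes with the source.
    run sudo btrfstune -u "$TARGET"
    # Confirm which fsid the target now reports.
    run sudo btrfs filesystem show "$TARGET"
}

new_fsid
```

Worth checking before the btrfstune step: mkfs.btrfs normally generates a fresh random fsid by itself, so the extra change may turn out to be unnecessary.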

With this, I was optimistic that partclone.btrfs (on Clonezilla live) would “just work”.

It didn’t. It failed with “set block .. out of boundary”.

Others with this problem solved it with btrfs balance (https://sourceforge.net/p/clonezilla/discussion/Clonezilla_live/thread/d622f01a/?limit=25 , https://bbs.archlinux.org/viewtopic.php?id=237391), but I worried I wouldn’t have enough disk space for that, and adding extra (i.e. allowing btrfs to temporarily use the second SSD) fell into the “too hard” basket. See also https://www.reddit.com/r/btrfs/comments/eqw81a/btrfs_balance_running_on_a_45gb_partition_for_the/

Instead I decided to use dd.

What value to use for the bs parameter? https://unix.stackexchange.com/a/584912/314086 suggests one should pay attention to the “minimum block erase size” (just for erase?). Also note for next time: “it’s generally beneficial to align partitions according to the erase block size”: https://www.phoronix.com/forums/forum/hardware/general-hardware/1030306-samsung-970-evo-nvme-ssd-benchmarks-on-ubuntu-linux

But I just set bs=1M. That gave me lots of Input/output errors (though it ran at around 300 MB/s, iirc).

That’s no good, so I considered ddrescue, but got cold feet when I read reports of it taking “weeks” to run 🙁 https://superuser.com/questions/413650/is-there-any-way-to-speed-up-ddrescue

So back to dd: I used bs=1K. This was slower (around 100 MB/s), but hopefully with less data loss, since a failed read now affects only a 1K block rather than a 1M one (see https://superuser.com/questions/622541/what-does-dd-conv-sync-noerror-do ). I think some areas that had errored before now also read error-free; see also https://unix.stackexchange.com/questions/329986/does-dd-i-o-read-error-alway-indicate-hardware-failure
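The post doesn't record the exact dd command, so the sketch below is an assumption: it uses conv=noerror,sync as discussed in the superuser question linked above, and it runs against throwaway files (hypothetical stand-ins for the two SSDs) so it is safe to try.

```shell
#!/bin/sh
# Sketch only: throwaway files stand in for the disks. Never point this at a
# real device without double-checking if= and of=.
set -eu

workdir=$(mktemp -d)
src="$workdir/source.img"   # hypothetical stand-in for the old SSD
dst="$workdir/target.img"   # hypothetical stand-in for the new SSD

# Create 4 MiB of random test data.
dd if=/dev/urandom of="$src" bs=1K count=4096 2>/dev/null

# noerror: keep going after a read error instead of stopping.
# sync:    pad each short or failed input block with zeros up to bs, so the
#          data that follows stays at the same offset in the copy.
# A failed read therefore costs up to bs bytes: 1 KiB here, 1 MiB with bs=1M.
dd if="$src" of="$dst" bs=1K conv=noerror,sync 2>/dev/null

# A healthy source should copy bit for bit.
cmp "$src" "$dst" && echo "copy verified"

# Remove "$workdir" when done experimenting.
```

As a sanity check on the timing: 2TB at around 100 MB/s is 2,000,000 MB / 100 MB/s = 20,000 seconds, so a run of several hours is expected at this block size.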

When this finally completed (after 21,000 sec), it was time to:

  1. use gparted to expand the LUKS volume
  2. sudo cryptsetup luksOpen /dev/nvme0n1p12 cryptroot
  3. sudo cryptsetup resize cryptroot -v
  4. sudo mkdir /mnt/btrfs
  5. sudo mount /dev/mapper/cryptroot /mnt/btrfs
  6. sudo btrfs filesystem resize max /mnt/btrfs
  7. sudo btrfs filesystem show /mnt/btrfs
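Steps 2 to 7 can be collected into a small script for the next move (step 1, gparted, stays manual). This is only a sketch: the function name grow_btrfs_in_luks is made up for illustration, the device, mapper name, and mount point are the ones from the list above, and it uses mkdir -p rather than plain mkdir so a rerun doesn't fail. It prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
set -eu
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: grow_btrfs_in_luks PARTITION MAPPER_NAME MOUNTPOINT
grow_btrfs_in_luks() {
    part=$1
    name=$2
    mnt=$3
    run sudo cryptsetup luksOpen "$part" "$name"
    # Grow the LUKS mapping to fill the already enlarged partition.
    run sudo cryptsetup resize "$name" -v
    run sudo mkdir -p "$mnt"
    run sudo mount "/dev/mapper/$name" "$mnt"
    # Grow btrfs to fill the mapping, then show the result.
    run sudo btrfs filesystem resize max "$mnt"
    run sudo btrfs filesystem show "$mnt"
}

grow_btrfs_in_luks /dev/nvme0n1p12 cryptroot /mnt/btrfs
```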

That done, time for the moment of truth.

In the Dell setup (I use rEFInd, not GRUB), I set up a new UEFI boot option (it populates the “file system list”; I just had to navigate to the “file name”: the rEFInd .efi executable).

And ta dah! .. everything works 🙂

If Clonezilla hadn’t complained at the start about bad sectors, would that have been a viable way to clone? Quite possibly not; it still seems that it’s better to luksOpen and then clone. What I do need to do is make sure that (1) the SSD is physically healthy (using SMART in the Dell BIOS, or in Linux: https://wiki.archlinux.org/index.php/S.M.A.R.T. ), and (2) the file system (btrfs) inside LUKS is good too. Time now to run btrfs balance?
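A possible pre-flight check for both points, as a sketch rather than a tested procedure. It assumes smartmontools is installed, that the disk is /dev/nvme0n1 (inferred from the partition name used earlier), and that the filesystem is mounted at /mnt/btrfs; the function name health_checks is made up. It prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
set -eu
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: health_checks DISK MOUNTPOINT
health_checks() {
    disk=$1
    mnt=$2
    # (1) Physical health: SMART data and error log for the drive.
    run sudo smartctl -a "$disk"
    # (2) Filesystem health: read everything back and verify checksums
    #     (-B waits in the foreground until the scrub finishes).
    run sudo btrfs scrub start -B "$mnt"
    # Then list the per-device error counters btrfs has recorded.
    run sudo btrfs device stats "$mnt"
}

health_checks /dev/nvme0n1 /mnt/btrfs
```

Scrub is a different job from balance: scrub reads data and metadata and verifies checksums, whereas balance rewrites and redistributes chunks.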
