linux-bcachefs.vger.kernel.org archive mirror
From: Chris Webb <chris@arachsys.com>
To: Kent Overstreet <kent.overstreet@gmail.com>
Cc: linux-bcachefs@vger.kernel.org
Subject: More eager discard behaviour
Date: Sat, 6 Nov 2021 17:11:56 +0000	[thread overview]
Message-ID: <20211106171156.GM11670@arachsys.com> (raw)

Discards issued to a loopback device punch holes in the underlying files, so I
thought they'd be an easy way to check (and maybe ktest) filesystem discard
behaviour. Here, I make a 1GB filesystem then repeatedly create and delete a
400MB file in it:

  # truncate -s 1G /tmp/fs
  # losetup /dev/loop0 /tmp/fs
  # bcachefs format -q --discard /dev/loop0
  initializing new filesystem
  going read-write
  mounted with opts: (null)
  # mkdir -p /tmp/mnt
  # mount -t bcachefs -o discard /dev/loop0 /tmp/mnt
  # while true; do
  >   sync && sleep 1 && du -h /tmp/fs
  >   dd if=/dev/zero of=/tmp/mnt/file bs=1M count=400 status=none
  >   sync && sleep 1 && du -h /tmp/fs
  >   rm /tmp/mnt/file
  > done
  1.7M  /tmp/fs
  404M  /tmp/fs
  403M  /tmp/fs
  806M  /tmp/fs
  806M  /tmp/fs
  992M  /tmp/fs
  993M  /tmp/fs
  992M  /tmp/fs
  992M  /tmp/fs
  992M  /tmp/fs
  [...]
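(The premise that discards reach the backing file as holes is easy to sanity-check without bcachefs at all; a quick illustration using fallocate(1) to punch a hole by hand, with /tmp/demo as an arbitrary scratch path:

```shell
# Punching a hole in a file immediately reduces its on-disk usage as
# reported by du -- exactly what a passed-through discard on a loop
# device does to the loop device's backing file.
truncate -s 10M /tmp/demo
dd if=/dev/zero of=/tmp/demo bs=1M count=10 conv=notrunc status=none
du -k /tmp/demo            # fully allocated
fallocate --punch-hole --offset 0 --length 5M /tmp/demo
du -k /tmp/demo            # usage drops by the punched extent
rm /tmp/demo
```

so a flat du figure in the loop above really does mean no discards are arriving.)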

Although bcachefs does issue discards (double-checked with printk in
discard_one_bucket), it only does so when the allocator thread wakes to
reclaim buckets once the entire block device is in use, so the practical
behaviour is that the whole device is kept full to the brim despite the
filesystem never being over 40% capacity. (With count=50, you can get the
same effect with an fs that never goes over 5% capacity.)

(Happy to roll the above into a ktest if it's useful, e.g. that capacity
never goes above x% with repeated deletes?)
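(As a rough sketch of what such a test might assert -- plain shell, not real ktest syntax, with the helper name and threshold purely illustrative:

```shell
# Fail if the backing file's real usage ever exceeds a threshold,
# e.g. called after each rm+sync in the create/delete loop above.
check_usage() {
  # $1 = backing file, $2 = max allowed usage in KiB
  used=$(du -k "$1" | cut -f1)
  [ "$used" -le "$2" ] || { echo "usage ${used}K exceeds ${2}K"; exit 1; }
}
```

e.g. check_usage /tmp/fs 512000 to insist the 1GB device stays under ~50%.)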

The equivalent test with ext4 shows discard doing the expected thing:

  # truncate -s 1G /tmp/fs
  # losetup /dev/loop0 /tmp/fs
  # mkfs.ext4 -q /dev/loop0
  # mkdir -p /tmp/mnt
  # mount -t ext4 -o discard /dev/loop0 /tmp/mnt
  # while true; do
  >   sync && sleep 1 && du -h /tmp/fs
  >   dd if=/dev/zero of=/tmp/mnt/file bs=1M count=400 status=none
  >   sync && sleep 1 && du -h /tmp/fs
  >   rm /tmp/mnt/file
  > done
  33M /tmp/fs
  433M  /tmp/fs
  33M /tmp/fs
  433M  /tmp/fs
  33M /tmp/fs
  433M  /tmp/fs
  33M /tmp/fs
  433M  /tmp/fs
  33M /tmp/fs
  [...]

SSDs are happier TRIMmed, but discard is also invaluable for filesystems on
thin provisioning systems like dm-thin. (virtio-block can pass discards up
from guest to host, so this is a common VM configuration.)
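(For concreteness, the host-side knob in QEMU is the discard property on the drive; a minimal fragment, with the image path a placeholder and all other VM options omitted:

```shell
# discard=unmap lets guest-issued discards deallocate blocks in the
# host backing image (guest.img is a placeholder path).
qemu-system-x86_64 \
  -drive file=guest.img,format=raw,if=virtio,discard=unmap
```

so a guest filesystem mounted -o discard can keep a raw or thin-provisioned host image compact.)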

How practical would it be either to wake the allocator thread more eagerly
so it reclaims (and discards) buckets sooner, or to detect buckets that are
eligible for discard earlier in their lifetime?

Best wishes,

Chris.

Thread overview: 5+ messages
2021-11-06 17:11 Chris Webb [this message]
2021-11-06 19:37 ` More eager discard behaviour Kent Overstreet
2021-11-06 21:36   ` Chris Webb
2021-11-07 14:59     ` Kent Overstreet
2021-11-08 21:16       ` Chris Webb
