* [LSF/MM/BPF TOPIC] Cloud storage optimizations
@ 2023-03-01  3:52 Theodore Ts'o
  2023-03-01  4:18 ` Gao Xiang
                   ` (6 more replies)
  0 siblings, 7 replies; 67+ messages in thread
From: Theodore Ts'o @ 2023-03-01  3:52 UTC (permalink / raw)
  To: lsf-pc; +Cc: linux-fsdevel, linux-mm, linux-block

Emulated block devices offered by cloud VMs can provide functionality
to guest kernels and applications that traditionally has not been
available to users of consumer-grade HDDs and SSDs.  For example,
today it’s possible to create a block device in Google’s Persistent
Disk with a 16k physical sector size, which promises that aligned 16k
writes will be atomic.  With NVMe, it is possible for a storage
device to promise this without requiring read-modify-write updates for
sub-16k writes.  All that is necessary are some changes in the block
layer so that the kernel does not inadvertently tear a write request
when splitting a bio because it is too large (perhaps because it got
merged with some other request, and then it gets split at an
inconvenient boundary).
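
To illustrate the kind of block-layer change being talked about, here is
a minimal sketch of a split-point clamp.  The "atomic boundary" limit is
an assumption for illustration only; the stock block layer does not
expose such a queue limit today, and the helper name is made up.

#include <linux/bio.h>
#include <linux/math64.h>

/*
 * Hypothetical sketch: pull a proposed bio split point back so it never
 * lands inside an aligned atomic-write unit (e.g. boundary_sectors == 32
 * for a 16k atomic unit on 512-byte sectors).
 */
static unsigned int clamp_split_to_atomic_boundary(struct bio *bio,
						   unsigned int max_sectors,
						   unsigned int boundary_sectors)
{
	sector_t end = bio->bi_iter.bi_sector + max_sectors;
	u32 tail = do_div(end, boundary_sectors); /* offset past the last boundary */

	/*
	 * tail < max_sectors means the proposed range crosses an atomic
	 * boundary; pull the split point back onto that boundary.  If the
	 * whole range fits inside one unit, leave it alone.
	 */
	if (tail && tail < max_sectors)
		max_sectors -= tail;
	return max_sectors;
}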

There are also more interesting, advanced optimizations that might be
possible.  For example, Jens had observed that passing hints that a
write is a journaling write (whether from file systems or databases)
could potentially be useful.  Unfortunately, most common storage devices
have not supported write hints, and support for write hints was ripped out
last year.  That can be easily reversed, but there are some other
interesting related subjects that are very much suited for LSF/MM.

For example, most cloud storage devices are doing read-ahead to try to
anticipate read requests from the VM.  This can interfere with the
read-ahead being done by the guest kernel.  So it would be useful to be
able to tell the cloud storage device whether or not a particular read
request stems from a read-ahead.  At the moment, as Matthew Wilcox has
pointed out, we use the read-ahead code path even for synchronous
buffered reads.  So plumbing this information so it can be passed through
multiple levels of the mm, fs, and block layers will probably be
needed.
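
As a point of reference for where such a hint would originate, below is
roughly how buffered read I/O gets tagged today (a simplified sketch
modeled on the fs/mpage.c pattern, not verbatim kernel code): the only
thing the device currently sees is REQ_RAHEAD, set when the read came in
through the readahead path rather than through read_folio().

#include <linux/bio.h>
#include <linux/blk_types.h>

/* Simplified sketch: tag reads that were issued by the readahead path. */
static void submit_read_bio(struct bio *bio, bool is_readahead)
{
	bio->bi_opf = REQ_OP_READ;
	if (is_readahead)
		bio->bi_opf |= REQ_RAHEAD; /* drivers may limit retries, deprioritize */
	submit_bio(bio);
}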

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-01  3:52 [LSF/MM/BPF TOPIC] Cloud storage optimizations Theodore Ts'o
@ 2023-03-01  4:18 ` Gao Xiang
  2023-03-01  4:40   ` Matthew Wilcox
  2023-03-01  4:35 ` Matthew Wilcox
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 67+ messages in thread
From: Gao Xiang @ 2023-03-01  4:18 UTC (permalink / raw)
  To: Theodore Ts'o, lsf-pc; +Cc: linux-fsdevel, linux-mm, linux-block



On 2023/3/1 11:52, Theodore Ts'o wrote:
> Emulated block devices offered by cloud VM’s can provide functionality
> to guest kernels and applications that traditionally have not been
> available to users of consumer-grade HDD and SSD’s.  For example,
> today it’s possible to create a block device in Google’s Persistent
> Disk with a 16k physical sector size, which promises that aligned 16k
> writes will be atomically.  With NVMe, it is possible for a storage
> device to promise this without requiring read-modify-write updates for
> sub-16k writes.  All that is necessary are some changes in the block
> layer so that the kernel does not inadvertently tear a write request
> when splitting a bio because it is too large (perhaps because it got
> merged with some other request, and then it gets split at an
> inconvenient boundary).

Yeah, most cloud vendors (including Alibaba Cloud) now use ext4 bigalloc
to avoid MySQL's doublewrite buffer.  In addition to improving
performance, this approach also minimizes unnecessary I/O traffic
between the compute and storage nodes.

Once I hacked together a COW-based in-house approach in XFS, using the
optimized always_cow mode with some tricks to avoid depending on the
storage.  But nowadays AWS and Google Cloud are all using ext4 bigalloc,
so.. ;-)

> 
> There are also more interesting, advanced optimizations that might be
> possible.  For example, Jens had observed the passing hints that
> journaling writes (either from file systems or databases) could be
> potentially useful.  Unfortunately most common storage devices have
> not supported write hints, and support for write hints were ripped out
> last year.  That can be easily reversed, but there are some other
> interesting related subjects that are very much suited for LSF/MM.
> 
> For example, most cloud storage devices are doing read-ahead to try to
> anticipate read requests from the VM.  This can interfere with the
> read-ahead being done by the guest kernel.  So being able to tell
> cloud storage device whether a particular read request is stemming
> from a read-ahead or not.  At the moment, as Matthew Wilcox has
> pointed out, we currently use the read-ahead code path for synchronous
> buffered reads.  So plumbing this information so it can passed through
> multiple levels of the mm, fs, and block layers will probably be
> needed.

That seems useful as well, yet if my understanding is correct, it's
somewhat unclear to me whether we could do more and come up with a
better form than the current REQ_RAHEAD (whose use cases and impact are
currently quite limited.)

Thanks,
Gao Xiang

> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-01  3:52 [LSF/MM/BPF TOPIC] Cloud storage optimizations Theodore Ts'o
  2023-03-01  4:18 ` Gao Xiang
@ 2023-03-01  4:35 ` Matthew Wilcox
  2023-03-01  4:49   ` Gao Xiang
  2023-03-02  3:13 ` Chaitanya Kulkarni
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 67+ messages in thread
From: Matthew Wilcox @ 2023-03-01  4:35 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: lsf-pc, linux-fsdevel, linux-mm, linux-block

On Tue, Feb 28, 2023 at 10:52:15PM -0500, Theodore Ts'o wrote:
> For example, most cloud storage devices are doing read-ahead to try to
> anticipate read requests from the VM.  This can interfere with the
> read-ahead being done by the guest kernel.  So being able to tell
> cloud storage device whether a particular read request is stemming
> from a read-ahead or not.  At the moment, as Matthew Wilcox has
> pointed out, we currently use the read-ahead code path for synchronous
> buffered reads.  So plumbing this information so it can passed through
> multiple levels of the mm, fs, and block layers will probably be
> needed.

This shouldn't be _too_ painful.  For example, the NVMe driver already
does the right thing:

        if (req->cmd_flags & (REQ_FAILFAST_DEV | REQ_RAHEAD))
                control |= NVME_RW_LR;

        if (req->cmd_flags & REQ_RAHEAD)
                dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;

(LR is Limited Retry; FREQ_PREFETCH is "Speculative read. The command
is part of a prefetch operation")

The only problem is that the readahead code doesn't tell the filesystem
whether the request is sync or async.  This should be a simple matter
of adding a new 'bool async' to the readahead_control and then setting
REQ_RAHEAD based on that, rather than on whether the request came in
through readahead() or read_folio() (eg see mpage_readahead()).
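
A rough sketch of what that could look like (the struct here is
simplified and the "async" field is the proposal, not an existing kernel
API):

#include <linux/pagemap.h>
#include <linux/blk_types.h>

/*
 * Proposed shape (sketch): readahead_control grows an async flag, and the
 * filesystem derives REQ_RAHEAD from it instead of from which entry point
 * (readahead() vs read_folio()) was used.
 */
struct readahead_control_sketch {
	struct file *file;
	struct address_space *mapping;
	bool async;		/* true if nobody is currently waiting on this I/O */
};

static blk_opf_t ra_read_opf(const struct readahead_control_sketch *rac)
{
	return REQ_OP_READ | (rac->async ? REQ_RAHEAD : 0);
}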

Another thing to fix is that SCSI doesn't do anything with the REQ_RAHEAD
flag, so I presume T10 has some work to do (maybe they could borrow the
Access Frequency field from NVMe, since that was what the drive vendors
told us they wanted; maybe they changed their minds since).

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-01  4:18 ` Gao Xiang
@ 2023-03-01  4:40   ` Matthew Wilcox
  2023-03-01  4:59     ` Gao Xiang
  0 siblings, 1 reply; 67+ messages in thread
From: Matthew Wilcox @ 2023-03-01  4:40 UTC (permalink / raw)
  To: Gao Xiang; +Cc: Theodore Ts'o, lsf-pc, linux-fsdevel, linux-mm, linux-block

On Wed, Mar 01, 2023 at 12:18:30PM +0800, Gao Xiang wrote:
> > For example, most cloud storage devices are doing read-ahead to try to
> > anticipate read requests from the VM.  This can interfere with the
> > read-ahead being done by the guest kernel.  So being able to tell
> > cloud storage device whether a particular read request is stemming
> > from a read-ahead or not.  At the moment, as Matthew Wilcox has
> > pointed out, we currently use the read-ahead code path for synchronous
> > buffered reads.  So plumbing this information so it can passed through
> > multiple levels of the mm, fs, and block layers will probably be
> > needed.
> 
> It seems that is also useful as well, yet if my understanding is correct,
> it's somewhat unclear for me if we could do more and have a better form
> compared with the current REQ_RAHEAD (currently REQ_RAHEAD use cases and
> impacts are quite limited.)

I'm pretty sure the Linux readahead algorithms could do with some serious
tuning (as opposed to the hacks the Android device vendors are doing).
Outside my current level of enthusiasm / knowledge, alas.  And it's
hard because while we no longer care about performance on floppies,
we do care about performance from CompactFlash to 8GB/s NVMe drives.
I had one person recently complain that 200Gbps ethernet was too slow
for their storage, so there's an even faster use case to care about.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-01  4:35 ` Matthew Wilcox
@ 2023-03-01  4:49   ` Gao Xiang
  2023-03-01  5:01     ` Matthew Wilcox
  0 siblings, 1 reply; 67+ messages in thread
From: Gao Xiang @ 2023-03-01  4:49 UTC (permalink / raw)
  To: Matthew Wilcox, Theodore Ts'o
  Cc: lsf-pc, linux-fsdevel, linux-mm, linux-block

Hi Matthew!

On 2023/3/1 12:35, Matthew Wilcox wrote:
> On Tue, Feb 28, 2023 at 10:52:15PM -0500, Theodore Ts'o wrote:
>> For example, most cloud storage devices are doing read-ahead to try to
>> anticipate read requests from the VM.  This can interfere with the
>> read-ahead being done by the guest kernel.  So being able to tell
>> cloud storage device whether a particular read request is stemming
>> from a read-ahead or not.  At the moment, as Matthew Wilcox has
>> pointed out, we currently use the read-ahead code path for synchronous
>> buffered reads.  So plumbing this information so it can passed through
>> multiple levels of the mm, fs, and block layers will probably be
>> needed.
> 
> This shouldn't be _too_ painful.  For example, the NVMe driver already
> does the right thing:
> 
>          if (req->cmd_flags & (REQ_FAILFAST_DEV | REQ_RAHEAD))
>                  control |= NVME_RW_LR;
> 
>          if (req->cmd_flags & REQ_RAHEAD)
>                  dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;
> 
> (LR is Limited Retry; FREQ_PREFETCH is "Speculative read. The command
> is part of a prefetch operation")
> 
> The only problem is that the readahead code doesn't tell the filesystem
> whether the request is sync or async.  This should be a simple matter
> of adding a new 'bool async' to the readahead_control and then setting
> REQ_RAHEAD based on that, rather than on whether the request came in
> through readahead() or read_folio() (eg see mpage_readahead()).

Great!  In addition to that, just (somewhat) off topic: if we have a
"bool async" now, I think it will immediately have some users (such as
EROFS), since we'd like to do post-processing (such as decompression)
immediately in the same context for sync readahead (due to missing
pages), and leave it to another kworker for async readahead (I think
it's much the same for decryption and verification).

So a "bool async" would be quite useful on my side if it could be
passed to the fs side.  I'd like to raise my hand for it.
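
If that flag existed, the filesystem-side use described above might look
something like this (a sketch assuming the "bool async" field proposed
earlier in the thread; the helpers here are made up and are not actual
EROFS code):

#include <linux/slab.h>
#include <linux/workqueue.h>

struct postproc_work {
	struct work_struct work;
	void *ctx;
};

/* Made-up helper standing in for decompression/decryption/verification. */
static void do_postprocess(void *ctx)
{
}

static void postproc_workfn(struct work_struct *work)
{
	struct postproc_work *pw = container_of(work, struct postproc_work, work);

	do_postprocess(pw->ctx);
	kfree(pw);
}

/* Sketch: run post-processing inline for sync readahead, defer otherwise. */
static void finish_read(void *ctx, bool async)
{
	struct postproc_work *pw;

	if (!async) {
		/* a reader is blocked on these folios: do the work right here */
		do_postprocess(ctx);
		return;
	}

	pw = kmalloc(sizeof(*pw), GFP_NOFS);
	if (!pw) {
		do_postprocess(ctx);	/* fall back to inline on allocation failure */
		return;
	}
	pw->ctx = ctx;
	INIT_WORK(&pw->work, postproc_workfn);
	queue_work(system_unbound_wq, &pw->work);
}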

Thanks,
Gao Xiang

> 
> Another thing to fix is that SCSI doesn't do anything with the REQ_RAHEAD
> flag, so I presume T10 has some work to do (maybe they could borrow the
> Access Frequency field from NVMe, since that was what the drive vendors
> told us they wanted; maybe they changed their minds since).

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-01  4:40   ` Matthew Wilcox
@ 2023-03-01  4:59     ` Gao Xiang
  0 siblings, 0 replies; 67+ messages in thread
From: Gao Xiang @ 2023-03-01  4:59 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Theodore Ts'o, lsf-pc, linux-fsdevel, linux-mm, linux-block



On 2023/3/1 12:40, Matthew Wilcox wrote:
> On Wed, Mar 01, 2023 at 12:18:30PM +0800, Gao Xiang wrote:
>>> For example, most cloud storage devices are doing read-ahead to try to
>>> anticipate read requests from the VM.  This can interfere with the
>>> read-ahead being done by the guest kernel.  So being able to tell
>>> cloud storage device whether a particular read request is stemming
>>> from a read-ahead or not.  At the moment, as Matthew Wilcox has
>>> pointed out, we currently use the read-ahead code path for synchronous
>>> buffered reads.  So plumbing this information so it can passed through
>>> multiple levels of the mm, fs, and block layers will probably be
>>> needed.
>>
>> It seems that is also useful as well, yet if my understanding is correct,
>> it's somewhat unclear for me if we could do more and have a better form
>> compared with the current REQ_RAHEAD (currently REQ_RAHEAD use cases and
>> impacts are quite limited.)
> 
> I'm pretty sure the Linux readahead algorithms could do with some serious
> tuning (as opposed to the hacks the Android device vendors are doing).
> Outside my current level of enthusiasm / knowledge, alas.  And it's
> hard because while we no longer care about performance on floppies,
> we do care about performance from CompactFlash to 8GB/s NVMe drives.
> I had one person recently complain that 200Gbps ethernet was too slow
> for their storage, so there's a faster usecase to care about.

Yes, we might have a chance to revisit the current readahead algorithm
for modern storage devices.  I understand how the current readahead
works, but I don't have enough bandwidth to analyse the workloads and
investigate further; also, heuristics like this tend to have both pros
and cons.

As a public cloud vendor, it becomes vital to improve this, since some
users care a great deal about exactly these corner cases when comparing
us with other competitors.

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-01  4:49   ` Gao Xiang
@ 2023-03-01  5:01     ` Matthew Wilcox
  2023-03-01  5:09       ` Gao Xiang
  0 siblings, 1 reply; 67+ messages in thread
From: Matthew Wilcox @ 2023-03-01  5:01 UTC (permalink / raw)
  To: Gao Xiang; +Cc: Theodore Ts'o, lsf-pc, linux-fsdevel, linux-mm, linux-block

On Wed, Mar 01, 2023 at 12:49:10PM +0800, Gao Xiang wrote:
> > The only problem is that the readahead code doesn't tell the filesystem
> > whether the request is sync or async.  This should be a simple matter
> > of adding a new 'bool async' to the readahead_control and then setting
> > REQ_RAHEAD based on that, rather than on whether the request came in
> > through readahead() or read_folio() (eg see mpage_readahead()).
> 
> Great!  In addition to that, just (somewhat) off topic, if we have a
> "bool async" now, I think it will immediately have some users (such as
> EROFS), since we'd like to do post-processing (such as decompression)
> immediately in the same context with sync readahead (due to missing
> pages) and leave it to another kworker for async readahead (I think
> it's almost same for decryption and verification).
> 
> So "bool async" is quite useful on my side if it could be possible
> passed to fs side.  I'd like to raise my hands to have it.

That's a really interesting use-case; thanks for bringing it up.

Ideally, we'd have the waiting task do the
decompression/decryption/verification for proper accounting of CPU.
Unfortunately, if the folio isn't uptodate, the task doesn't even hold
a reference to the folio while it waits, so there's no way to wake the
task and let it know that it has work to do.  At least not at the moment
... let me think about that a bit (and if you see a way to do it, feel
free to propose it).

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-01  5:01     ` Matthew Wilcox
@ 2023-03-01  5:09       ` Gao Xiang
  2023-03-01  5:19         ` Gao Xiang
  2023-03-01  5:42         ` Matthew Wilcox
  0 siblings, 2 replies; 67+ messages in thread
From: Gao Xiang @ 2023-03-01  5:09 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Theodore Ts'o, lsf-pc, linux-fsdevel, linux-mm, linux-block



On 2023/3/1 13:01, Matthew Wilcox wrote:
> On Wed, Mar 01, 2023 at 12:49:10PM +0800, Gao Xiang wrote:
>>> The only problem is that the readahead code doesn't tell the filesystem
>>> whether the request is sync or async.  This should be a simple matter
>>> of adding a new 'bool async' to the readahead_control and then setting
>>> REQ_RAHEAD based on that, rather than on whether the request came in
>>> through readahead() or read_folio() (eg see mpage_readahead()).
>>
>> Great!  In addition to that, just (somewhat) off topic, if we have a
>> "bool async" now, I think it will immediately have some users (such as
>> EROFS), since we'd like to do post-processing (such as decompression)
>> immediately in the same context with sync readahead (due to missing
>> pages) and leave it to another kworker for async readahead (I think
>> it's almost same for decryption and verification).
>>
>> So "bool async" is quite useful on my side if it could be possible
>> passed to fs side.  I'd like to raise my hands to have it.
> 
> That's a really interesting use-case; thanks for bringing it up.
> 
> Ideally, we'd have the waiting task do the
> decompression/decryption/verification for proper accounting of CPU.
> Unfortunately, if the folio isn't uptodate, the task doesn't even hold
> a reference to the folio while it waits, so there's no way to wake the
> task and let it know that it has work to do.  At least not at the moment
> ... let me think about that a bit (and if you see a way to do it, feel
> free to propose it)

Honestly, I'd like to hold the folio lock until all post-processing is
done, then mark the folio uptodate and unlock it, so that all we need to
do is pass the locked-folio requests to kworkers for the async way, or
handle them synchronously in the original context.

If we unlocked these folios in advance without marking them uptodate, we
would have to lock them again (which could mean more lock contention)
and would need a way to track folios that have completed I/O but not yet
been post-processed, in addition to those with no I/O done yet.

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-01  5:09       ` Gao Xiang
@ 2023-03-01  5:19         ` Gao Xiang
  2023-03-01  5:42         ` Matthew Wilcox
  1 sibling, 0 replies; 67+ messages in thread
From: Gao Xiang @ 2023-03-01  5:19 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Theodore Ts'o, lsf-pc, linux-fsdevel, linux-mm, linux-block



On 2023/3/1 13:09, Gao Xiang wrote:
> 
> 
> On 2023/3/1 13:01, Matthew Wilcox wrote:
>> On Wed, Mar 01, 2023 at 12:49:10PM +0800, Gao Xiang wrote:
>>>> The only problem is that the readahead code doesn't tell the filesystem
>>>> whether the request is sync or async.  This should be a simple matter
>>>> of adding a new 'bool async' to the readahead_control and then setting
>>>> REQ_RAHEAD based on that, rather than on whether the request came in
>>>> through readahead() or read_folio() (eg see mpage_readahead()).
>>>
>>> Great!  In addition to that, just (somewhat) off topic, if we have a
>>> "bool async" now, I think it will immediately have some users (such as
>>> EROFS), since we'd like to do post-processing (such as decompression)
>>> immediately in the same context with sync readahead (due to missing
>>> pages) and leave it to another kworker for async readahead (I think
>>> it's almost same for decryption and verification).
>>>
>>> So "bool async" is quite useful on my side if it could be possible
>>> passed to fs side.  I'd like to raise my hands to have it.
>>
>> That's a really interesting use-case; thanks for bringing it up.
>>
>> Ideally, we'd have the waiting task do the
>> decompression/decryption/verification for proper accounting of CPU.
>> Unfortunately, if the folio isn't uptodate, the task doesn't even hold
>> a reference to the folio while it waits, so there's no way to wake the
>> task and let it know that it has work to do.  At least not at the moment
>> ... let me think about that a bit (and if you see a way to do it, feel
>> free to propose it)
> 
> Honestly, I'd like to take the folio lock until all post-processing is
> done and make it uptodate and unlock so that only we need is to pass
> locked-folios requests to kworkers for async way or sync handling in
> the original context.
> 
> If we unlocked these folios in advance without uptodate, which means
> we have to lock it again (which could have more lock contention) and
> need to have a way to trace I/Oed but not post-processed stuff in
> addition to no I/Oed stuff.

I'm not sure which way is better for proper accounting of CPU, but an
individual fs can know more than the mm about its post-processing, so
perhaps we just need some accounting APIs that fses can call for this.

Currently I think core MM just needs to export the "async" bool in the
rac.  EROFS right now only does sync decompression for <= 4 pages in
z_erofs_readahead(), and I think that can be done better, see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/erofs/zdata.c?h=v6.2#n832
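
For context, the policy being referred to is roughly the following (a
simplified paraphrase, not the actual z_erofs_readahead() code; the
threshold and helper name are illustrative):

/* Decompress synchronously only for small readahead batches, where the
 * caller likely needs the data right away; otherwise defer to a worker. */
#define SYNC_DECOMPRESS_MAX_PAGES	4

static bool want_sync_decompress(unsigned int nr_pages)
{
	return nr_pages <= SYNC_DECOMPRESS_MAX_PAGES;
}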

Thanks,
Gao Xiang

> 
> Thanks,
> Gao Xiang

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-01  5:09       ` Gao Xiang
  2023-03-01  5:19         ` Gao Xiang
@ 2023-03-01  5:42         ` Matthew Wilcox
  2023-03-01  5:51           ` Gao Xiang
  1 sibling, 1 reply; 67+ messages in thread
From: Matthew Wilcox @ 2023-03-01  5:42 UTC (permalink / raw)
  To: Gao Xiang; +Cc: Theodore Ts'o, lsf-pc, linux-fsdevel, linux-mm, linux-block

On Wed, Mar 01, 2023 at 01:09:34PM +0800, Gao Xiang wrote:
> On 2023/3/1 13:01, Matthew Wilcox wrote:
> > On Wed, Mar 01, 2023 at 12:49:10PM +0800, Gao Xiang wrote:
> > > > The only problem is that the readahead code doesn't tell the filesystem
> > > > whether the request is sync or async.  This should be a simple matter
> > > > of adding a new 'bool async' to the readahead_control and then setting
> > > > REQ_RAHEAD based on that, rather than on whether the request came in
> > > > through readahead() or read_folio() (eg see mpage_readahead()).
> > > 
> > > Great!  In addition to that, just (somewhat) off topic, if we have a
> > > "bool async" now, I think it will immediately have some users (such as
> > > EROFS), since we'd like to do post-processing (such as decompression)
> > > immediately in the same context with sync readahead (due to missing
> > > pages) and leave it to another kworker for async readahead (I think
> > > it's almost same for decryption and verification).
> > > 
> > > So "bool async" is quite useful on my side if it could be possible
> > > passed to fs side.  I'd like to raise my hands to have it.
> > 
> > That's a really interesting use-case; thanks for bringing it up.
> > 
> > Ideally, we'd have the waiting task do the
> > decompression/decryption/verification for proper accounting of CPU.
> > Unfortunately, if the folio isn't uptodate, the task doesn't even hold
> > a reference to the folio while it waits, so there's no way to wake the
> > task and let it know that it has work to do.  At least not at the moment
> > ... let me think about that a bit (and if you see a way to do it, feel
> > free to propose it)
> 
> Honestly, I'd like to take the folio lock until all post-processing is
> done and make it uptodate and unlock so that only we need is to pass
> locked-folios requests to kworkers for async way or sync handling in
> the original context.
> 
> If we unlocked these folios in advance without uptodate, which means
> we have to lock it again (which could have more lock contention) and
> need to have a way to trace I/Oed but not post-processed stuff in
> addition to no I/Oed stuff.

Right, look at how it's handled right now ...

sys_read() ends up in filemap_get_pages() which (assuming no folio in
cache) calls page_cache_sync_readahead().  That creates locked, !uptodate
folios and asks the filesystem to fill them.  Unless that completes
incredibly quickly, filemap_get_pages() ends up in filemap_update_page()
which calls folio_put_wait_locked().

If the filesystem BIO completion routine could identify if there was
a task waiting and select one of them, it could wake up the waiter and
pass it a description of what work it needed to do (with the folio still
locked), rather than do the postprocessing itself and unlock the folio.

But that all seems _very_ hard to do with 100% reliability.  Note the
comment in folio_wait_bit_common() which points out that the waiters
bit may be set even when there are no waiters.  The wake_up code
doesn't seem to support this kind of thing (all waiters are
non-exclusive, but only wake up one of them).


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-01  5:42         ` Matthew Wilcox
@ 2023-03-01  5:51           ` Gao Xiang
  2023-03-01  6:00             ` Gao Xiang
  0 siblings, 1 reply; 67+ messages in thread
From: Gao Xiang @ 2023-03-01  5:51 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Theodore Ts'o, lsf-pc, linux-fsdevel, linux-mm, linux-block



On 2023/3/1 13:42, Matthew Wilcox wrote:
> On Wed, Mar 01, 2023 at 01:09:34PM +0800, Gao Xiang wrote:
>> On 2023/3/1 13:01, Matthew Wilcox wrote:
>>> On Wed, Mar 01, 2023 at 12:49:10PM +0800, Gao Xiang wrote:
>>>>> The only problem is that the readahead code doesn't tell the filesystem
>>>>> whether the request is sync or async.  This should be a simple matter
>>>>> of adding a new 'bool async' to the readahead_control and then setting
>>>>> REQ_RAHEAD based on that, rather than on whether the request came in
>>>>> through readahead() or read_folio() (eg see mpage_readahead()).
>>>>
>>>> Great!  In addition to that, just (somewhat) off topic, if we have a
>>>> "bool async" now, I think it will immediately have some users (such as
>>>> EROFS), since we'd like to do post-processing (such as decompression)
>>>> immediately in the same context with sync readahead (due to missing
>>>> pages) and leave it to another kworker for async readahead (I think
>>>> it's almost same for decryption and verification).
>>>>
>>>> So "bool async" is quite useful on my side if it could be possible
>>>> passed to fs side.  I'd like to raise my hands to have it.
>>>
>>> That's a really interesting use-case; thanks for bringing it up.
>>>
>>> Ideally, we'd have the waiting task do the
>>> decompression/decryption/verification for proper accounting of CPU.
>>> Unfortunately, if the folio isn't uptodate, the task doesn't even hold
>>> a reference to the folio while it waits, so there's no way to wake the
>>> task and let it know that it has work to do.  At least not at the moment
>>> ... let me think about that a bit (and if you see a way to do it, feel
>>> free to propose it)
>>
>> Honestly, I'd like to take the folio lock until all post-processing is
>> done and make it uptodate and unlock so that only we need is to pass
>> locked-folios requests to kworkers for async way or sync handling in
>> the original context.
>>
>> If we unlocked these folios in advance without uptodate, which means
>> we have to lock it again (which could have more lock contention) and
>> need to have a way to trace I/Oed but not post-processed stuff in
>> addition to no I/Oed stuff.
> 
> Right, look at how it's handled right now ...
> 
> sys_read() ends up in filemap_get_pages() which (assuming no folio in
> cache) calls page_cache_sync_readahead().  That creates locked, !uptodate
> folios and asks the filesystem to fill them.  Unless that completes
> incredibly quickly, filemap_get_pages() ends up in filemap_update_page()
> which calls folio_put_wait_locked().
> 
> If the filesystem BIO completion routine could identify if there was
> a task waiting and select one of them, it could wake up the waiter and
> pass it a description of what work it needed to do (with the folio still
> locked), rather than do the postprocessing itself and unlock the folio

Currently, EROFS sync decompression waits in .readahead() with the page
cache folios locked, using a "completion" kept alongside the BIO
descriptor (bi_private) in the original context, so that the filesystem
BIO completion handler just needs to complete that completion and wake
up the original context (which will need the page data immediately
anyway, since pages were missing); the original context then carries on
in .readahead() and unlocks the folios.

Does this approach have some flaw?  Or am I missing something?
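
In outline, the completion-based pattern being described is the standard
one (a simplified sketch, not the actual EROFS code):

#include <linux/bio.h>
#include <linux/completion.h>

/* The submitting context keeps the folios locked, waits on a completion
 * stashed in bi_private, and does the decompression itself afterwards. */
static void sync_read_end_io(struct bio *bio)
{
	complete(bio->bi_private);	/* wake the waiting .readahead() context */
	bio_put(bio);
}

static void read_then_postprocess(struct bio *bio)
{
	DECLARE_COMPLETION_ONSTACK(done);

	bio->bi_private = &done;
	bio->bi_end_io = sync_read_end_io;
	submit_bio(bio);
	wait_for_completion(&done);	/* folios are still locked here */

	/* decompress/verify, then mark the folios uptodate and unlock them */
}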

Thanks,
Gao Xiang

> 
> But that all seems _very_ hard to do with 100% reliability.  Note the
> comment in folio_wait_bit_common() which points out that the waiters
> bit may be set even when there are no waiters.  The wake_up code
> doesn't seem to support this kind of thing (all waiters are
> non-exclusive, but only wake up one of them).

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-01  5:51           ` Gao Xiang
@ 2023-03-01  6:00             ` Gao Xiang
  0 siblings, 0 replies; 67+ messages in thread
From: Gao Xiang @ 2023-03-01  6:00 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Theodore Ts'o, lsf-pc, linux-fsdevel, linux-mm, linux-block



On 2023/3/1 13:51, Gao Xiang wrote:
> 
> 
> On 2023/3/1 13:42, Matthew Wilcox wrote:
>> On Wed, Mar 01, 2023 at 01:09:34PM +0800, Gao Xiang wrote:
>>> On 2023/3/1 13:01, Matthew Wilcox wrote:
>>>> On Wed, Mar 01, 2023 at 12:49:10PM +0800, Gao Xiang wrote:
>>>>>> The only problem is that the readahead code doesn't tell the filesystem
>>>>>> whether the request is sync or async.  This should be a simple matter
>>>>>> of adding a new 'bool async' to the readahead_control and then setting
>>>>>> REQ_RAHEAD based on that, rather than on whether the request came in
>>>>>> through readahead() or read_folio() (eg see mpage_readahead()).
>>>>>
>>>>> Great!  In addition to that, just (somewhat) off topic, if we have a
>>>>> "bool async" now, I think it will immediately have some users (such as
>>>>> EROFS), since we'd like to do post-processing (such as decompression)
>>>>> immediately in the same context with sync readahead (due to missing
>>>>> pages) and leave it to another kworker for async readahead (I think
>>>>> it's almost same for decryption and verification).
>>>>>
>>>>> So "bool async" is quite useful on my side if it could be possible
>>>>> passed to fs side.  I'd like to raise my hands to have it.
>>>>
>>>> That's a really interesting use-case; thanks for bringing it up.
>>>>
>>>> Ideally, we'd have the waiting task do the
>>>> decompression/decryption/verification for proper accounting of CPU.
>>>> Unfortunately, if the folio isn't uptodate, the task doesn't even hold
>>>> a reference to the folio while it waits, so there's no way to wake the
>>>> task and let it know that it has work to do.  At least not at the moment
>>>> ... let me think about that a bit (and if you see a way to do it, feel
>>>> free to propose it)
>>>
>>> Honestly, I'd like to take the folio lock until all post-processing is
>>> done and make it uptodate and unlock so that only we need is to pass
>>> locked-folios requests to kworkers for async way or sync handling in
>>> the original context.
>>>
>>> If we unlocked these folios in advance without uptodate, which means
>>> we have to lock it again (which could have more lock contention) and
>>> need to have a way to trace I/Oed but not post-processed stuff in
>>> addition to no I/Oed stuff.
>>
>> Right, look at how it's handled right now ...
>>
>> sys_read() ends up in filemap_get_pages() which (assuming no folio in
>> cache) calls page_cache_sync_readahead().  That creates locked, !uptodate
>> folios and asks the filesystem to fill them.  Unless that completes
>> incredibly quickly, filemap_get_pages() ends up in filemap_update_page()
>> which calls folio_put_wait_locked().
>>
>> If the filesystem BIO completion routine could identify if there was
>> a task waiting and select one of them, it could wake up the waiter and
>> pass it a description of what work it needed to do (with the folio still
>> locked), rather than do the postprocessing itself and unlock the folio
> 
> Currently EROFS sync decompression is waiting in .readahead() with locked
> page cache folios, one "completion" together than BIO descriptor
> (bi_private) in the original context, so that the filesystem BIO completion
> just needs to complete the completion and wakeup the original context
> (due to missing pages, so the original context will need the page data
> immediately as well) to go on .readhead() and unlock folios.
> 
> Does this way have some flew? Or I'm missing something?

In this way, EROFS sync decompression is all handled with a completion
in .readahead(), and the folios are marked uptodate and unlocked before
leaving .readahead(), so the folio_test_uptodate() check will (almost)
always succeed before filemap_update_page() is ever reached:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/erofs/zdata.c?h=v6.2#n1167

So I think core MM just needs to export a "bool async" for fses...

Thanks,
Gao Xiang

> 
> Thanks,
> Gao Xiang
> 
>>
>> But that all seems _very_ hard to do with 100% reliability.  Note the
>> comment in folio_wait_bit_common() which points out that the waiters
>> bit may be set even when there are no waiters.  The wake_up code
>> doesn't seem to support this kind of thing (all waiters are
>> non-exclusive, but only wake up one of them).

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-01  3:52 [LSF/MM/BPF TOPIC] Cloud storage optimizations Theodore Ts'o
  2023-03-01  4:18 ` Gao Xiang
  2023-03-01  4:35 ` Matthew Wilcox
@ 2023-03-02  3:13 ` Chaitanya Kulkarni
  2023-03-02  3:50 ` Darrick J. Wong
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 67+ messages in thread
From: Chaitanya Kulkarni @ 2023-03-02  3:13 UTC (permalink / raw)
  To: Theodore Ts'o, lsf-pc
  Cc: linux-fsdevel, linux-mm, linux-block, linux-nvme

(+linux-nvme)

On 2/28/2023 7:52 PM, Theodore Ts'o wrote:
> Emulated block devices offered by cloud VM’s can provide functionality
> to guest kernels and applications that traditionally have not been
> available to users of consumer-grade HDD and SSD’s.  For example,
> today it’s possible to create a block device in Google’s Persistent
> Disk with a 16k physical sector size, which promises that aligned 16k
> writes will be atomically.  With NVMe, it is possible for a storage
> device to promise this without requiring read-modify-write updates for
> sub-16k writes.  All that is necessary are some changes in the block
> layer so that the kernel does not inadvertently tear a write request
> when splitting a bio because it is too large (perhaps because it got
> merged with some other request, and then it gets split at an
> inconvenient boundary).
> 
> There are also more interesting, advanced optimizations that might be
> possible.  For example, Jens had observed the passing hints that
> journaling writes (either from file systems or databases) could be
> potentially useful.  Unfortunately most common storage devices have
> not supported write hints, and support for write hints were ripped out
> last year.  That can be easily reversed, but there are some other
> interesting related subjects that are very much suited for LSF/MM.
> 
> For example, most cloud storage devices are doing read-ahead to try to
> anticipate read requests from the VM.  This can interfere with the
> read-ahead being done by the guest kernel.  So being able to tell
> cloud storage device whether a particular read request is stemming
> from a read-ahead or not.  At the moment, as Matthew Wilcox has
> pointed out, we currently use the read-ahead code path for synchronous
> buffered reads.  So plumbing this information so it can passed through
> multiple levels of the mm, fs, and block layers will probably be
> needed.
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-01  3:52 [LSF/MM/BPF TOPIC] Cloud storage optimizations Theodore Ts'o
                   ` (2 preceding siblings ...)
  2023-03-02  3:13 ` Chaitanya Kulkarni
@ 2023-03-02  3:50 ` Darrick J. Wong
  2023-03-03  3:03   ` Martin K. Petersen
  2023-03-02 20:30 ` Bart Van Assche
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 67+ messages in thread
From: Darrick J. Wong @ 2023-03-02  3:50 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: lsf-pc, linux-fsdevel, linux-mm, linux-block

On Tue, Feb 28, 2023 at 10:52:15PM -0500, Theodore Ts'o wrote:
> Emulated block devices offered by cloud VM’s can provide functionality
> to guest kernels and applications that traditionally have not been
> available to users of consumer-grade HDD and SSD’s.  For example,
> today it’s possible to create a block device in Google’s Persistent
> Disk with a 16k physical sector size, which promises that aligned 16k
> writes will be atomically.  With NVMe, it is possible for a storage
> device to promise this without requiring read-modify-write updates for
> sub-16k writes.  All that is necessary are some changes in the block
> layer so that the kernel does not inadvertently tear a write request
> when splitting a bio because it is too large (perhaps because it got
> merged with some other request, and then it gets split at an
> inconvenient boundary).

Now that we've flung ourselves into the wild world of Software Defined
Secure Storage as a Service*, I was thinking --

T10 PI gives the kernel a means to associate its own checksums (and a
goofy u16 tag) with LBAs on disk.  There haven't been that many actual
SCSI devices that implement it, but I wonder how hard it would be for
cloud storage backends to export things like that?  The storage nodes
often have a bit more CPU power, too.
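
For reference, the 8-byte protection-information tuple that classic T10
PI attaches to each logical block looks like this (as defined in the
kernel's include/linux/t10-pi.h); the app_tag is the "goofy u16"
mentioned above:

#include <linux/types.h>

struct t10_pi_tuple {
	__be16 guard_tag;	/* CRC of the data block */
	__be16 app_tag;		/* opaque; available to the application/filesystem */
	__be32 ref_tag;		/* typically the low 32 bits of the target LBA */
};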

Though admittedly the advent of customer-managed FDE in the cloud
might make that less useful?

Just my random 2c late at night,

--D

* SDSSAAS: what you get from banging head on keyboard in frustration

> There are also more interesting, advanced optimizations that might be
> possible.  For example, Jens had observed the passing hints that
> journaling writes (either from file systems or databases) could be
> potentially useful.  Unfortunately most common storage devices have
> not supported write hints, and support for write hints were ripped out
> last year.  That can be easily reversed, but there are some other
> interesting related subjects that are very much suited for LSF/MM.
> 
> For example, most cloud storage devices are doing read-ahead to try to
> anticipate read requests from the VM.  This can interfere with the
> read-ahead being done by the guest kernel.  So being able to tell
> cloud storage device whether a particular read request is stemming
> from a read-ahead or not.  At the moment, as Matthew Wilcox has
> pointed out, we currently use the read-ahead code path for synchronous
> buffered reads.  So plumbing this information so it can passed through
> multiple levels of the mm, fs, and block layers will probably be
> needed.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-01  3:52 [LSF/MM/BPF TOPIC] Cloud storage optimizations Theodore Ts'o
                   ` (3 preceding siblings ...)
  2023-03-02  3:50 ` Darrick J. Wong
@ 2023-03-02 20:30 ` Bart Van Assche
  2023-03-03  3:05   ` Martin K. Petersen
  2023-03-03  1:58 ` Keith Busch
  2023-03-03  2:54 ` Martin K. Petersen
  6 siblings, 1 reply; 67+ messages in thread
From: Bart Van Assche @ 2023-03-02 20:30 UTC (permalink / raw)
  To: Theodore Ts'o, lsf-pc; +Cc: linux-fsdevel, linux-mm, linux-block

On 2/28/23 19:52, Theodore Ts'o wrote:
> Unfortunately most common storage devices have
> not supported write hints, and support for write hints were ripped out
> last year.

Work is ongoing in T10 to add write hint support to SBC. We plan to
propose restoring write hint support once there is agreement in T10
about the approach. See also "Constrained SBC-5 Streams"
(http://www.t10.org/cgi-bin/ac.pl?t=d&f=23-024r0.pdf). This proposal was
uploaded yesterday.

Bart.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-01  3:52 [LSF/MM/BPF TOPIC] Cloud storage optimizations Theodore Ts'o
                   ` (4 preceding siblings ...)
  2023-03-02 20:30 ` Bart Van Assche
@ 2023-03-03  1:58 ` Keith Busch
  2023-03-03  3:49   ` Matthew Wilcox
  2023-03-03  2:54 ` Martin K. Petersen
  6 siblings, 1 reply; 67+ messages in thread
From: Keith Busch @ 2023-03-03  1:58 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: lsf-pc, linux-fsdevel, linux-mm, linux-block

On Tue, Feb 28, 2023 at 10:52:15PM -0500, Theodore Ts'o wrote:
> Emulated block devices offered by cloud VM’s can provide functionality
> to guest kernels and applications that traditionally have not been
> available to users of consumer-grade HDD and SSD’s.  For example,
> today it’s possible to create a block device in Google’s Persistent
> Disk with a 16k physical sector size, which promises that aligned 16k
> writes will be atomically.  With NVMe, it is possible for a storage
> device to promise this without requiring read-modify-write updates for
> sub-16k writes. 

I'm not sure it does. The NVMe spec doesn't say AWUN writes are never a RMW
operation. NVMe suggests aligning to NPWA as the best way to avoid RMW, but
doesn't guarantee that, nor does it require that this limit align to atomic
boundaries. NVMe provides a lot of hints, but stops short of promises. Vendors
can promise whatever they want, but that's outside the spec.
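
For anyone wanting to look at the fields in question, they come from the
Identify Namespace / Identify Controller data.  A minimal sketch of
reading them (field names as in the kernel's struct nvme_id_ns /
nvme_id_ctrl; note this computes the advertised atomic write unit, which,
per the point above, is a limit rather than a promise about sub-unit or
unaligned writes):

#include <linux/nvme.h>

static unsigned int atomic_write_unit_blocks(const struct nvme_id_ns *id,
					     const struct nvme_id_ctrl *ctrl)
{
	/* NAWUPF/AWUPF are 0's-based block counts; the namespace value,
	 * when reported, overrides the controller-wide one. */
	if ((id->nsfeat & NVME_NS_FEAT_ATOMICS) && id->nawupf)
		return le16_to_cpu(id->nawupf) + 1;
	return le16_to_cpu(ctrl->awupf) + 1;
}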

> All that is necessary are some changes in the block
> layer so that the kernel does not inadvertently tear a write request
> when splitting a bio because it is too large (perhaps because it got
> merged with some other request, and then it gets split at an
> inconvenient boundary).

All the limits needed to optimally split on physical boundaries exist, so I
hope we're using them correctly via get_max_io_size().

That said, I was hoping you were going to suggest supporting 16k logical block
sizes. Not a problem on some arch's, but still problematic when PAGE_SIZE is
4k. :)

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-01  3:52 [LSF/MM/BPF TOPIC] Cloud storage optimizations Theodore Ts'o
                   ` (5 preceding siblings ...)
  2023-03-03  1:58 ` Keith Busch
@ 2023-03-03  2:54 ` Martin K. Petersen
  2023-03-03  3:29   ` Keith Busch
  2023-03-03  4:20   ` Theodore Ts'o
  6 siblings, 2 replies; 67+ messages in thread
From: Martin K. Petersen @ 2023-03-03  2:54 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: lsf-pc, linux-fsdevel, linux-mm, linux-block


Hi Ted!

> With NVMe, it is possible for a storage device to promise this without
> requiring read-modify-write updates for sub-16k writes.  All that is
> necessary are some changes in the block layer so that the kernel does
> not inadvertently tear a write request when splitting a bio because it
> is too large (perhaps because it got merged with some other request,
> and then it gets split at an inconvenient boundary).

We have been working on support for atomic writes and it is not as simple
as it sounds. Atomic operations in SCSI and NVMe have semantic
differences which are challenging to reconcile. On top of that, both the
SCSI and NVMe specs are buggy in the atomics department. We are working
to get things fixed in both standards and aim to discuss our
implementation at LSF/MM.

> There are also more interesting, advanced optimizations that might be
> possible.  For example, Jens had observed the passing hints that
> journaling writes (either from file systems or databases) could be
> potentially useful.

Yep. We got very impressive results identifying journal writes and the
kernel implementation was completely trivial, but...

> Unfortunately most common storage devices have not supported write
> hints, and support for write hints were ripped out last year.  That
> can be easily reversed, but there are some other interesting related
> subjects that are very much suited for LSF/MM.

Hinting didn't see widespread adoption because we in Linux, as well as
the various interested databases, preferred hints to be per-I/O
properties, whereas $OTHER_OS insisted that hints should be statically
assigned to LBA ranges on media. This left vendors having to choose
between two very different approaches, and consequently they chose not
to support either of them.

However, hints are coming back in various forms for non-enterprise and
cloud storage devices so it's good to revive this discussion.
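
To make the two models concrete: in the per-I/O model the hint travels
with each request.  A sketch of that shape (bi_write_hint is the
since-removed field being alluded to, shown with the enum rw_hint
lifetime values that still exist in include/linux/fs.h; this is not
current mainline code):

#include <linux/bio.h>
#include <linux/fs.h>

static void submit_journal_write(struct bio *bio)
{
	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC;
	bio->bi_write_hint = WRITE_LIFE_SHORT; /* journal blocks are soon overwritten */
	submit_bio(bio);
}

In the LBA-range model, by contrast, the hint would be bound to a region
of the device up front (e.g. "LBAs X..Y hold short-lived data") rather
than to individual writes.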

> For example, most cloud storage devices are doing read-ahead to try to
> anticipate read requests from the VM.  This can interfere with the
> read-ahead being done by the guest kernel.  So being able to tell
> cloud storage device whether a particular read request is stemming
> from a read-ahead or not.

Indeed. In our experience the hints that work best are the ones which
convey to the storage device why the I/O is being performed.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-02  3:50 ` Darrick J. Wong
@ 2023-03-03  3:03   ` Martin K. Petersen
  0 siblings, 0 replies; 67+ messages in thread
From: Martin K. Petersen @ 2023-03-03  3:03 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Theodore Ts'o, lsf-pc, linux-fsdevel, linux-mm, linux-block


Darrick,

> T10 PI gives the kernel a means to associate its own checksums (and a
> goofy u16 tag) with LBAs on disk.  There haven't been that many actual
> SCSI devices that implement it,

Storage arrays have traditionally put their own internal magic in that
tag space and therefore did not allow filesystems to use it.

That has changed with the latest NVMe PI amendments which allow a larger
tag (and CRC). The tag space can be split between storage and
application/filesystem use. There are definitely interesting things that
can be done in this area.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-02 20:30 ` Bart Van Assche
@ 2023-03-03  3:05   ` Martin K. Petersen
  0 siblings, 0 replies; 67+ messages in thread
From: Martin K. Petersen @ 2023-03-03  3:05 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Theodore Ts'o, lsf-pc, linux-fsdevel, linux-mm, linux-block


Bart,

> Work is ongoing in T10 to add write hint support to SBC. We plan to
> propose to restore write hint support after there is agreement in T10
> about the approach. See also "Constrained SBC-5 Streams"
> (http://www.t10.org/cgi-bin/ac.pl?t=d&f=23-024r0.pdf). This proposal
> has been uploaded yesterday.

Why have the streams dependency?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-03  2:54 ` Martin K. Petersen
@ 2023-03-03  3:29   ` Keith Busch
  2023-03-03  4:20   ` Theodore Ts'o
  1 sibling, 0 replies; 67+ messages in thread
From: Keith Busch @ 2023-03-03  3:29 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Theodore Ts'o, lsf-pc, linux-fsdevel, linux-mm, linux-block

On Thu, Mar 02, 2023 at 09:54:59PM -0500, Martin K. Petersen wrote:
> > For example, most cloud storage devices are doing read-ahead to try to
> > anticipate read requests from the VM.  This can interfere with the
> > read-ahead being done by the guest kernel.  So being able to tell
> > cloud storage device whether a particular read request is stemming
> > from a read-ahead or not.
> 
> Indeed. In our experience the hints that work best are the ones which
> convey to the storage device why the I/O is being performed.

This may be a pretty far-out-there idea, but I think SSD BPF injection has
a potentially higher payoff than mere hints. The paper below does it in the
kernel, but imagine doing it on the device!

  https://dl.acm.org/doi/pdf/10.1145/3458336.3465290

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-03  1:58 ` Keith Busch
@ 2023-03-03  3:49   ` Matthew Wilcox
  2023-03-03 11:32     ` Hannes Reinecke
                       ` (2 more replies)
  0 siblings, 3 replies; 67+ messages in thread
From: Matthew Wilcox @ 2023-03-03  3:49 UTC (permalink / raw)
  To: Keith Busch
  Cc: Luis Chamberlain, Theodore Ts'o, lsf-pc, linux-fsdevel,
	linux-mm, linux-block

On Thu, Mar 02, 2023 at 06:58:58PM -0700, Keith Busch wrote:
> That said, I was hoping you were going to suggest supporting 16k logical block
> sizes. Not a problem on some arch's, but still problematic when PAGE_SIZE is
> 4k. :)

I was hoping Luis was going to propose a session on LBA size > PAGE_SIZE.
Funnily, while the pressure is coming from the storage vendors, I don't
think there's any work to be done in the storage layers.  It's purely
a FS+MM problem.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-03  2:54 ` Martin K. Petersen
  2023-03-03  3:29   ` Keith Busch
@ 2023-03-03  4:20   ` Theodore Ts'o
  1 sibling, 0 replies; 67+ messages in thread
From: Theodore Ts'o @ 2023-03-03  4:20 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: lsf-pc, linux-fsdevel, linux-mm, linux-block

On Thu, Mar 02, 2023 at 09:54:59PM -0500, Martin K. Petersen wrote:
> 
> Hi Ted!
> 
> > With NVMe, it is possible for a storage device to promise this without
> > requiring read-modify-write updates for sub-16k writes.  All that is
> > necessary are some changes in the block layer so that the kernel does
> > not inadvertently tear a write request when splitting a bio because it
> > is too large (perhaps because it got merged with some other request,
> > and then it gets split at an inconvenient boundary).
> 
> We have been working on support for atomic writes and it is not a simple
> as it sounds. Atomic operations in SCSI and NVMe have semantic
> differences which are challenging to reconcile. On top of that, both the
> SCSI and NVMe specs are buggy in the atomics department. We are working
> to get things fixed in both standards and aim to discuss our
> implementation at LSF/MM.

I'd be very interested to learn more about what you've found.  I know
more than one cloud provider is thinking about how to use the NVMe
spec to send information about how their emulated block devices work.
This has come up at our weekly ext4 video conference, and given that I
gave a talk about it in 2018[1], there's quite a lot of similarity in
what folks are thinking about.  Basically, MySQL and Postgres use 16k
database pages, and if they can depend on their Cloud Block Devices
Working A Certain Way and thereby avoid their special doublewrite
techniques for preventing torn writes, it can make for very noticeable
performance improvements.

[1] https://www.youtube.com/watch?v=gIeuiGg-_iw

So while the standards might allow standards-compliant physical
devices to do some really weird sh*t, it might be that if all cloud
vendors do things in the same way, I could see various cloud workloads
starting to depend on extra-standard behaviour, much like a lot of
sysadmins assume that low-numbered LBAs are on the outer diameter of
the HDD and are much more performant than sectors on the inner
diameter.  This is completely not guaranteed by the standard specs, but
it has become a de facto standard.

That's not a great place to be, and it would be great if we could find ways
that are much more reliable in terms of querying a standards-compliant
storage device and knowing whether we can depend on a certain behavior
--- but I also know how slowly storage standards bodies move.  :-(

> Hinting didn't see widespread adoption because we in Linux, as well as
> the various interested databases, preferred hints to be per-I/O
> properties. Whereas $OTHER_OS insisted that hints should be statically
> assigned to LBA ranges on media. This left vendors having to choose
> between two very different approaches and consequently they chose not to
> support any of them.

I wasn't aware of that history.  Thanks for filling that bit in.

Fortunately, in 2023, it appears that for many cloud vendors, the
teams involved care a lot more about Linux than $OTHER_OS.  So
hopefully we'll have a lot more success in getting write hints
generally available to hyperscale cloud customers.

From an industry-wide perspective, it would be useful if the write
hints used by Hyperscale Cloud Vendor #1 are very similar to what
write hints are supported by Hyperscale Cloud Vendor #2.  Standards
committees aren't the only way that companies can collaborate in an
anti-trust compliant way.  Open source is another way; and especially
if we can show that a set of hints works well for the Linux kernel and
Linux applications --- then what we ship in the Linux kernel can help
shape the set of "write hints" that cloud storage devices will
support.

					- Ted

P.S.  From a LSF/MM program perspective, I suspect we may want to have
more than one session; one that is focused on standards and atomic
writes, and another that is focused on write hints.  The first might
be mostly block and fs focused, and the second would probably be of
interest to mm folks as well.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-03  3:49   ` Matthew Wilcox
@ 2023-03-03 11:32     ` Hannes Reinecke
  2023-03-03 13:11     ` James Bottomley
  2023-03-03 21:45     ` Luis Chamberlain
  2 siblings, 0 replies; 67+ messages in thread
From: Hannes Reinecke @ 2023-03-03 11:32 UTC (permalink / raw)
  To: Matthew Wilcox, Keith Busch
  Cc: Luis Chamberlain, Theodore Ts'o, lsf-pc, linux-fsdevel,
	linux-mm, linux-block

On 3/3/23 04:49, Matthew Wilcox wrote:
> On Thu, Mar 02, 2023 at 06:58:58PM -0700, Keith Busch wrote:
>> That said, I was hoping you were going to suggest supporting 16k logical block
>> sizes. Not a problem on some arch's, but still problematic when PAGE_SIZE is
>> 4k. :)
> 
> I was hoping Luis was going to propose a session on LBA size > PAGE_SIZE.
> Funnily, while the pressure is coming from the storage vendors, I don't
> think there's any work to be done in the storage layers.  It's purely
> a FS+MM problem.

Would love to have that session, though.
Luis?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Frankenstr. 146, 90461 Nürnberg
Managing Directors: I. Totev, A. Myers, A. McDonald, M. B. Moerman
(HRB 36809, AG Nürnberg)


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-03  3:49   ` Matthew Wilcox
  2023-03-03 11:32     ` Hannes Reinecke
@ 2023-03-03 13:11     ` James Bottomley
  2023-03-04  7:34       ` Matthew Wilcox
  2023-03-03 21:45     ` Luis Chamberlain
  2 siblings, 1 reply; 67+ messages in thread
From: James Bottomley @ 2023-03-03 13:11 UTC (permalink / raw)
  To: Matthew Wilcox, Keith Busch
  Cc: Luis Chamberlain, Theodore Ts'o, lsf-pc, linux-fsdevel,
	linux-mm, linux-block

On Fri, 2023-03-03 at 03:49 +0000, Matthew Wilcox wrote:
> On Thu, Mar 02, 2023 at 06:58:58PM -0700, Keith Busch wrote:
> > That said, I was hoping you were going to suggest supporting 16k
> > logical block sizes. Not a problem on some arch's, but still
> > problematic when PAGE_SIZE is 4k. :)
> 
> I was hoping Luis was going to propose a session on LBA size >
> PAGE_SIZE. Funnily, while the pressure is coming from the storage
> vendors, I don't think there's any work to be done in the storage
> layers.  It's purely a FS+MM problem.

Heh, I can do the "fools rush in" bit, especially if what we're
interested in is the minimum it would take to support this ...

The FS problem could be solved simply by saying FS block size must
equal device block size, then it becomes purely a MM issue.  The MM
issue could be solved by adding a page order attribute to struct
address_space and insisting that pagecache/filemap functions in
mm/filemap.c all have to operate on objects that are an integer
multiple of the address space order.  The base allocator is
filemap_alloc_folio, which already has an apparently always zero order
parameter (hmmm...) and it always seems to be called from sites that
have the address_space, so it could simply be modified to always
operate at the address_space order.

The above would be a bit suboptimal in that blocks are always mapped to
physically contiguous pages, but it should be enough to get the concept
working.
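
In code I'd expect the minimum viable change to be roughly this
(completely untested sketch; min_order is an invented field, it is not
something struct address_space has today):

/* Hypothetical: a per-mapping minimum folio order, set by the filesystem
 * at inode init time to match its block size (e.g. order 2 for 16k
 * blocks with 4k PAGE_SIZE). */
static inline unsigned int mapping_min_order(struct address_space *mapping)
{
	return mapping->min_order;		/* invented field */
}

static inline struct folio *
filemap_alloc_folio_at_order(struct address_space *mapping, gfp_t gfp)
{
	/* filemap_alloc_folio() exists today; callers currently pass 0 */
	return filemap_alloc_folio(gfp, mapping_min_order(mapping));
}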

James


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-03  3:49   ` Matthew Wilcox
  2023-03-03 11:32     ` Hannes Reinecke
  2023-03-03 13:11     ` James Bottomley
@ 2023-03-03 21:45     ` Luis Chamberlain
  2023-03-03 22:07       ` Keith Busch
                         ` (2 more replies)
  2 siblings, 3 replies; 67+ messages in thread
From: Luis Chamberlain @ 2023-03-03 21:45 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Keith Busch, Theodore Ts'o, Pankaj Raghav, Daniel Gomez,
	Javier González, lsf-pc, linux-fsdevel, linux-mm,
	linux-block

On Fri, Mar 03, 2023 at 03:49:29AM +0000, Matthew Wilcox wrote:
> On Thu, Mar 02, 2023 at 06:58:58PM -0700, Keith Busch wrote:
> > That said, I was hoping you were going to suggest supporting 16k logical block
> > sizes. Not a problem on some arch's, but still problematic when PAGE_SIZE is
> > 4k. :)
> 
> I was hoping Luis was going to propose a session on LBA size > PAGE_SIZE.
> Funnily, while the pressure is coming from the storage vendors, I don't
> think there's any work to be done in the storage layers.  It's purely
> a FS+MM problem.

You'd hope most of it is left to FS + MM, but I'm not sure that's
quite it yet. Initial experimentation shows that just enabling NVMe
devices with a physical & logical block size > PAGE_SIZE gets them
brought down to 512 bytes. That seems odd to say the least. Would
changing this be an issue now?

I'm gathering there is generic interest in this topic though. So one
thing we *could* do is review the lay of the land and break down what
we all think could likely be done / is needed. At the very least we'd
come out knowing the unknowns together.

I started to think about some of these things a while ago and with the
help of Willy I tried to break down some of the items I gathered from him
into community OKRs (a super informal itemization of goals and the sub-tasks
which would complete them) and started trying to take a stab at them
with our team, but obviously I think it would be great if we all just
divide & conquer here. So maybe reviewing these and extending them
as a community would be good:

https://kernelnewbies.org/KernelProjects/large-block-size

I'm recently interested in tmpfs so will be taking a stab at higher
order page size support there to see what blows up.

The other stuff like the general IOMAP conversion is pretty well known, and
I think we already have a proposed session on that. But there are also
even smaller fish to fry: *just* doing a baseline with some
filesystems at a 4 KiB block size seems in order.

Hearing filesystem developers' thoughts on support for larger block
sizes in light of a smaller PAGE_SIZE would be good, given one of the
odd situations some distributions / teams find themselves in is trying
to support larger block sizes but with difficult access to higher
PAGE_SIZE systems. Are there ways to simplify this / help us in general?
Without that it's a bit hard to muck around with some of this in terms
of long term support. This also got me thinking about ways to try to
replicate larger IO virtual devices a bit better too. While paying a cloud
provider to test this is one nice option, it'd be great if I could just do
this in house with some hacks too. For virtio-blk-pci at least, for instance,
I wondered whether using just the host page cache suffices, or would a 4K
page cache on the host significantly skew the results for, say, a 16k
emulated IO controller? How do we most effectively virtualize 16k
controllers in-house?

To help with experimenting with large IO and NVMe / virtio-blk-pci I
recently added support to instantiate tons of large IO devices to kdevops
[0]; with it, it should be easy to reproduce odd issues we may come up
with. For instance it should be possible to subsequently extend the
kdevops fstests or blktests automation support with just a few Kconfig files
to use some of these large IO devices to see what blows up.

If we are going to have this session I'd like to encourage & invite Pankaj and
Daniel who have been doing great work on reviewing all this too and can give
some feedback on some of their own findings!

[0] https://github.com/linux-kdevops/kdevops/commit/af33568445111cc114653264f6dbc8684f3b10e8

  Luis

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-03 21:45     ` Luis Chamberlain
@ 2023-03-03 22:07       ` Keith Busch
  2023-03-03 22:14         ` Luis Chamberlain
  2023-03-03 23:51       ` Bart Van Assche
  2023-03-04 11:08       ` Hannes Reinecke
  2 siblings, 1 reply; 67+ messages in thread
From: Keith Busch @ 2023-03-03 22:07 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Matthew Wilcox, Theodore Ts'o, Pankaj Raghav, Daniel Gomez,
	Javier González, lsf-pc, linux-fsdevel, linux-mm,
	linux-block

On Fri, Mar 03, 2023 at 01:45:48PM -0800, Luis Chamberlain wrote:
> 
> You'd hope most of it is left to FS + MM, but I'm not yet sure that's
> quite it yet. Initial experimentation shows just enabling > PAGE_SIZE
> physical & logical block NVMe devices gets brought down to 512 bytes.
> That seems odd to say the least. Would changing this be an issue now?

I think you're talking about removing this part:

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c2730b116dc68..2c528f56c2973 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1828,17 +1828,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	unsigned short bs = 1 << ns->lba_shift;
 	u32 atomic_bs, phys_bs, io_opt = 0;
 
-	/*
-	 * The block layer can't support LBA sizes larger than the page size
-	 * yet, so catch this early and don't allow block I/O.
-	 */
-	if (ns->lba_shift > PAGE_SHIFT) {
-		capacity = 0;
-		bs = (1 << 9);
-	}
-
 	blk_integrity_unregister(disk);
-
 	atomic_bs = phys_bs = bs;
 	if (id->nabo == 0) {
 		/*
--

This is what happens today if the driver were to let the disk create with its
actual size (testing 8k LBA size on x86):

 BUG: kernel NULL pointer dereference, address: 0000000000000008
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP
 CPU: 10 PID: 115 Comm: kworker/u32:2 Not tainted 6.2.0-00032-gdb7183e3c314-dirty #105
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
 Workqueue: nvme-wq nvme_scan_work
 RIP: 0010:create_empty_buffers+0x24/0x240
 Code: 66 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 d4 ba 01 00 00 00 55 53 48 89 fb e8 17 f5 ff ff 48 89 c5 48 89 c2 eb 03 48 89 ca <48> 8b 4a 08 4c 09 22 48 85 c9 75 f1 48 89 6a 08 48 8b 43 18 48 8d
 RSP: 0000:ffffc900004578f0 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffffea0000152580 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffea0000152580
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
 R10: ffff88803ecb6c18 R11: 0000000000000000 R12: 0000000000000000
 R13: ffffea0000152580 R14: 0000000000100cc0 R15: ffff888017030288
 FS:  0000000000000000(0000) GS:ffff88803ec80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000008 CR3: 0000000002c2a001 CR4: 0000000000770ee0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <TASK>
  ? blkdev_readahead+0x20/0x20
  create_page_buffers+0x79/0x90
  block_read_full_folio+0x58/0x410
  ? blkdev_write_begin+0x20/0x20
  ? xas_store+0x56/0x5b0
  ? xas_load+0x8/0x40
  ? xa_get_order+0x51/0xe0
  ? __mod_memcg_lruvec_state+0x41/0x90
  ? blkdev_readahead+0x20/0x20
  ? blkdev_readahead+0x20/0x20
  filemap_read_folio+0x41/0x2a0
  ? scan_shadow_nodes+0x30/0x30
  ? blkdev_readahead+0x20/0x20
  ? folio_add_lru+0x2d/0x40
  ? blkdev_readahead+0x20/0x20
  do_read_cache_folio+0x103/0x420
  ? __switch_to_asm+0x3a/0x60
  ? __switch_to_asm+0x34/0x60
  ? get_page_from_freelist+0x735/0x1070
  read_part_sector+0x2f/0xa0
  read_lba+0xa2/0x150
  efi_partition+0xdb/0x760
  ? snprintf+0x49/0x60
  ? is_gpt_valid.part.5+0x3f0/0x3f0
  bdev_disk_changed+0x1ce/0x560
  blkdev_get_whole+0x73/0x80
  blkdev_get_by_dev+0x199/0x2e0
  disk_scan_partitions+0x63/0xd0
  device_add_disk+0x3c0/0x3d0
  nvme_scan_ns+0x574/0xcc0
  ? nvme_scan_work+0x23a/0x3f0
  nvme_scan_work+0x23a/0x3f0
  process_one_work+0x1da/0x3a0
  worker_thread+0x205/0x3a0
  ? process_one_work+0x3a0/0x3a0
  kthread+0xc0/0xe0
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-03 22:07       ` Keith Busch
@ 2023-03-03 22:14         ` Luis Chamberlain
  2023-03-03 22:32           ` Keith Busch
  0 siblings, 1 reply; 67+ messages in thread
From: Luis Chamberlain @ 2023-03-03 22:14 UTC (permalink / raw)
  To: Keith Busch
  Cc: Matthew Wilcox, Theodore Ts'o, Pankaj Raghav, Daniel Gomez,
	Javier González, lsf-pc, linux-fsdevel, linux-mm,
	linux-block

On Fri, Mar 03, 2023 at 03:07:55PM -0700, Keith Busch wrote:
> On Fri, Mar 03, 2023 at 01:45:48PM -0800, Luis Chamberlain wrote:
> > 
> > You'd hope most of it is left to FS + MM, but I'm not yet sure that's
> > quite it yet. Initial experimentation shows just enabling > PAGE_SIZE
> > physical & logical block NVMe devices gets brought down to 512 bytes.
> > That seems odd to say the least. Would changing this be an issue now?
> 
> I think you're talking about removing this part:
> 
> ---
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index c2730b116dc68..2c528f56c2973 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1828,17 +1828,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
>  	unsigned short bs = 1 << ns->lba_shift;
>  	u32 atomic_bs, phys_bs, io_opt = 0;
>  
> -	/*
> -	 * The block layer can't support LBA sizes larger than the page size
> -	 * yet, so catch this early and don't allow block I/O.
> -	 */
> -	if (ns->lba_shift > PAGE_SHIFT) {
> -		capacity = 0;
> -		bs = (1 << 9);
> -	}
> -
>  	blk_integrity_unregister(disk);
> -
>  	atomic_bs = phys_bs = bs;

Yes, clearly it says *yet*, so that begs the question: what would be
required?

Also, going down to 512 seems a bit dramatic, so why not just match
PAGE_SIZE, so 4k? Would such a compromise for now break some stuff?

  Luis

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-03 22:14         ` Luis Chamberlain
@ 2023-03-03 22:32           ` Keith Busch
  2023-03-03 23:09             ` Luis Chamberlain
  2023-03-16 15:29             ` Pankaj Raghav
  0 siblings, 2 replies; 67+ messages in thread
From: Keith Busch @ 2023-03-03 22:32 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Matthew Wilcox, Theodore Ts'o, Pankaj Raghav, Daniel Gomez,
	Javier González, lsf-pc, linux-fsdevel, linux-mm,
	linux-block

On Fri, Mar 03, 2023 at 02:14:55PM -0800, Luis Chamberlain wrote:
> On Fri, Mar 03, 2023 at 03:07:55PM -0700, Keith Busch wrote:
> > On Fri, Mar 03, 2023 at 01:45:48PM -0800, Luis Chamberlain wrote:
> > > 
> > > You'd hope most of it is left to FS + MM, but I'm not yet sure that's
> > > quite it yet. Initial experimentation shows just enabling > PAGE_SIZE
> > > physical & logical block NVMe devices gets brought down to 512 bytes.
> > > That seems odd to say the least. Would changing this be an issue now?
> > 
> > I think you're talking about removing this part:
> > 
> > ---
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index c2730b116dc68..2c528f56c2973 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -1828,17 +1828,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
> >  	unsigned short bs = 1 << ns->lba_shift;
> >  	u32 atomic_bs, phys_bs, io_opt = 0;
> >  
> > -	/*
> > -	 * The block layer can't support LBA sizes larger than the page size
> > -	 * yet, so catch this early and don't allow block I/O.
> > -	 */
> > -	if (ns->lba_shift > PAGE_SHIFT) {
> > -		capacity = 0;
> > -		bs = (1 << 9);
> > -	}
> > -
> >  	blk_integrity_unregister(disk);
> > -
> >  	atomic_bs = phys_bs = bs;
> 
> Yes, clearly it says *yet* so that begs the question what would be
> required?

Oh, gotcha. I'll work on a list of places it currently crashes.
 
> Also, going down to 512 seems a bit dramatic, so why not just match the
> PAGE_SIZE so 4k? Would such a compromise for now break some stuff?

The capacity set to zero ensures it can't be used through the block stack, so
the logical block size limit is unused. 512 is just a default value. We only
bring up the handle so you can administrate it with passthrough commands.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-03 22:32           ` Keith Busch
@ 2023-03-03 23:09             ` Luis Chamberlain
  2023-03-16 15:29             ` Pankaj Raghav
  1 sibling, 0 replies; 67+ messages in thread
From: Luis Chamberlain @ 2023-03-03 23:09 UTC (permalink / raw)
  To: Keith Busch
  Cc: Matthew Wilcox, Theodore Ts'o, Pankaj Raghav, Daniel Gomez,
	Javier González, lsf-pc, linux-fsdevel, linux-mm,
	linux-block

On Fri, Mar 03, 2023 at 03:32:08PM -0700, Keith Busch wrote:
> On Fri, Mar 03, 2023 at 02:14:55PM -0800, Luis Chamberlain wrote:
> > On Fri, Mar 03, 2023 at 03:07:55PM -0700, Keith Busch wrote:
> > > On Fri, Mar 03, 2023 at 01:45:48PM -0800, Luis Chamberlain wrote:
> > > > 
> > > > You'd hope most of it is left to FS + MM, but I'm not yet sure that's
> > > > quite it yet. Initial experimentation shows just enabling > PAGE_SIZE
> > > > physical & logical block NVMe devices gets brought down to 512 bytes.
> > > > That seems odd to say the least. Would changing this be an issue now?
> > > 
> > > I think you're talking about removing this part:
> > > 
> > > ---
> > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > > index c2730b116dc68..2c528f56c2973 100644
> > > --- a/drivers/nvme/host/core.c
> > > +++ b/drivers/nvme/host/core.c
> > > @@ -1828,17 +1828,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
> > >  	unsigned short bs = 1 << ns->lba_shift;
> > >  	u32 atomic_bs, phys_bs, io_opt = 0;
> > >  
> > > -	/*
> > > -	 * The block layer can't support LBA sizes larger than the page size
> > > -	 * yet, so catch this early and don't allow block I/O.
> > > -	 */
> > > -	if (ns->lba_shift > PAGE_SHIFT) {
> > > -		capacity = 0;
> > > -		bs = (1 << 9);
> > > -	}
> > > -
> > >  	blk_integrity_unregister(disk);
> > > -
> > >  	atomic_bs = phys_bs = bs;
> > 
> > Yes, clearly it says *yet* so that begs the question what would be
> > required?
> 
> Oh, gotcha. I'll work on a list of places it currently crashes.

Awesome, that then is part of our dirty laundry TODO for NVMe for larger IO.

> > Also, going down to 512 seems a bit dramatic, so why not just match the
> > PAGE_SIZE so 4k? Would such a compromise for now break some stuff?
> 
> The capacity set to zero ensures it can't be used through the block stack, so
> the logical block size limit is unused.

Oh OK, so in effect we won't have compat issues if we decide later to
change this. So block devices just won't be capable of working? That
saves me tons of tests.

> 512 is just a default value. We only
> bring up the handle so you can administrate it with passthrough commands.

So we'd use 512 for passthrough, but otherwise it won't work?

  Luis

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-03 21:45     ` Luis Chamberlain
  2023-03-03 22:07       ` Keith Busch
@ 2023-03-03 23:51       ` Bart Van Assche
  2023-03-04 11:08       ` Hannes Reinecke
  2 siblings, 0 replies; 67+ messages in thread
From: Bart Van Assche @ 2023-03-03 23:51 UTC (permalink / raw)
  To: Luis Chamberlain, Matthew Wilcox
  Cc: Keith Busch, Theodore Ts'o, Pankaj Raghav, Daniel Gomez,
	Javier González, lsf-pc, linux-fsdevel, linux-mm,
	linux-block

On 3/3/23 13:45, Luis Chamberlain wrote:
> I'm gathering there is generic interest in this topic though.

Some Android storage vendors are interested in larger block sizes, e.g. 
16 KiB. Android currently uses UFS storage and may switch to NVMe in the 
future.

> This also got me thinking about ways to try to replicate
> larger IO virtual devices a bit better too.

Is null_blk good enough to test large block size support in filesystems 
and the block layer?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-03 13:11     ` James Bottomley
@ 2023-03-04  7:34       ` Matthew Wilcox
  2023-03-04 13:41         ` James Bottomley
  2023-03-04 19:04         ` Luis Chamberlain
  0 siblings, 2 replies; 67+ messages in thread
From: Matthew Wilcox @ 2023-03-04  7:34 UTC (permalink / raw)
  To: James Bottomley
  Cc: Keith Busch, Luis Chamberlain, Theodore Ts'o, lsf-pc,
	linux-fsdevel, linux-mm, linux-block

On Fri, Mar 03, 2023 at 08:11:47AM -0500, James Bottomley wrote:
> On Fri, 2023-03-03 at 03:49 +0000, Matthew Wilcox wrote:
> > On Thu, Mar 02, 2023 at 06:58:58PM -0700, Keith Busch wrote:
> > > That said, I was hoping you were going to suggest supporting 16k
> > > logical block sizes. Not a problem on some arch's, but still
> > > problematic when PAGE_SIZE is 4k. :)
> > 
> > I was hoping Luis was going to propose a session on LBA size >
> > PAGE_SIZE. Funnily, while the pressure is coming from the storage
> > vendors, I don't think there's any work to be done in the storage
> > layers.  It's purely a FS+MM problem.
> 
> Heh, I can do the fools rush in bit, especially if what we're
> interested in the minimum it would take to support this ...
> 
> The FS problem could be solved simply by saying FS block size must
> equal device block size, then it becomes purely a MM issue.

Spoken like somebody who's never converted a filesystem to
supporting large folios.  There are a number of issues:

1. The obvious; use of PAGE_SIZE and/or PAGE_SHIFT
2. Use of kmap-family to access, eg directories.  You can't kmap
   an entire folio, only one page at a time.  And if a dentry is split
   across a page boundary ...
3. buffer_heads do not currently support large folios.  Working on it.

Probably a few other things I forget.  But look through the recent
patches to AFS, CIFS, NFS, XFS, iomap that do folio conversions.
A lot of it is pretty mechanical, but some of it takes hard thought.
And if you have ideas about how to handle ext2 directories, I'm all ears.
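
To make (2) concrete, today you end up walking the folio a page at a
time, something like this (illustrative only, not lifted from any
particular filesystem):

/* Process a (possibly large) folio one page at a time, since
 * kmap_local_folio() only maps PAGE_SIZE bytes at the given offset. */
static void walk_folio_one_page_at_a_time(struct folio *folio)
{
	size_t offset;

	for (offset = 0; offset < folio_size(folio); offset += PAGE_SIZE) {
		char *kaddr = kmap_local_folio(folio, offset);

		/* ... only this PAGE_SIZE chunk is mapped; anything that
		 * straddles offset + PAGE_SIZE is the painful case ... */
		kunmap_local(kaddr);
	}
}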

> The MM
> issue could be solved by adding a page order attribute to struct
> address_space and insisting that pagecache/filemap functions in
> mm/filemap.c all have to operate on objects that are an integer
> multiple of the address space order.  The base allocator is
> filemap_alloc_folio, which already has an apparently always zero order
> parameter (hmmm...) and it always seems to be called from sites that
> have the address_space, so it could simply be modified to always
> operate at the address_space order.

Oh, I have a patch for that.  That's the easy part.  The hard part is
plugging your ears to the screams of the MM people who are convinced
that fragmentation will make it impossible to mount your filesystem.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-03 21:45     ` Luis Chamberlain
  2023-03-03 22:07       ` Keith Busch
  2023-03-03 23:51       ` Bart Van Assche
@ 2023-03-04 11:08       ` Hannes Reinecke
  2023-03-04 13:24         ` Javier González
  2023-03-04 16:47         ` Matthew Wilcox
  2 siblings, 2 replies; 67+ messages in thread
From: Hannes Reinecke @ 2023-03-04 11:08 UTC (permalink / raw)
  To: Luis Chamberlain, Matthew Wilcox
  Cc: Keith Busch, Theodore Ts'o, Pankaj Raghav, Daniel Gomez,
	Javier González, lsf-pc, linux-fsdevel, linux-mm,
	linux-block

On 3/3/23 22:45, Luis Chamberlain wrote:
> On Fri, Mar 03, 2023 at 03:49:29AM +0000, Matthew Wilcox wrote:
>> On Thu, Mar 02, 2023 at 06:58:58PM -0700, Keith Busch wrote:
>>> That said, I was hoping you were going to suggest supporting 16k logical block
>>> sizes. Not a problem on some arch's, but still problematic when PAGE_SIZE is
>>> 4k. :)
>>
>> I was hoping Luis was going to propose a session on LBA size > PAGE_SIZE.
>> Funnily, while the pressure is coming from the storage vendors, I don't
>> think there's any work to be done in the storage layers.  It's purely
>> a FS+MM problem.
> 
> You'd hope most of it is left to FS + MM, but I'm not yet sure that's
> quite it yet. Initial experimentation shows just enabling > PAGE_SIZE
> physical & logical block NVMe devices gets brought down to 512 bytes.
> That seems odd to say the least. Would changing this be an issue now?
> 
> I'm gathering there is generic interest in this topic though. So one
> thing we *could* do is perhaps review lay-of-the-land of interest and
> break down what we all think are things likely could be done / needed.
> At the very least we can come out together knowing the unknowns together.
> 
> I started to think about some of these things a while ago and with the
> help of Willy I tried to break down some of the items I gathered from him
> into community OKRs (super informal itemization of goals and sub tasks which
> would complete such goals) and started trying to take a stab at them
> with our team, but obviously I think it would be great if we all just
> divide & conquer here. So maybe reviewing these and extending them
> as a community would be good:
> 
> https://kernelnewbies.org/KernelProjects/large-block-size
> 
> I'm recently interested in tmpfs so will be taking a stab at higher
> order page size support there to see what blows up.
> 
Cool.

> The other stuff like general IOMAP conversion is pretty well known, and
> we already I think have a proposed session on that. But there is also
> even smaller fish to fry, like *just* doing a baseline with some
> filesystems with 4 KiB block size seems in order.
> 
> Hearing filesystem developer's thoughts on support for larger block
> size in light of lower order PAGE_SIZE would be good, given one of the
> odd situations some distributions / teams find themselves in is trying
> to support larger block sizes but with difficult access to higher
> PAGE_SIZE systems. Are there ways to simplify this / help us in general?
> Without it's a bit hard to muck around with some of this in terms of
> support long term. This also got me thinking about ways to try to replicate
> larger IO virtual devices a bit better too. While paying a cloud
> provider to test this is one nice option, it'd be great if I can just do
> this in house with some hacks too. For virtio-blk-pci at least, for instance,
> I wondered whether using just the host page cache suffices, or would a 4K
> page cache on the host modify say a 16 k emulated io controller results
> significantly? How do we most effectively virtualize 16k controllers
> in-house?
> 
> To help with experimenting with large io and NVMe / virtio-blk-pci I
> recently added support to instantiate tons of large IO devices to kdevops
> [0], with it it should be easy to reproduce odd issues we may come up
> with. For instance it should be possible to subsequently extend the
> kdevops fstests or blktests automation support with just a few Kconfig files
> to use some of these largio devices to see what blows up.
> 
We could implement a (virtual) zoned device, and expose each zone as a 
block. That gives us the required large block characteristics, and with
a bit of luck we might be able to dial up to really large block sizes
like the 256M sizes on current SMR drives.
ublk might be a good starting point.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-04 11:08       ` Hannes Reinecke
@ 2023-03-04 13:24         ` Javier González
  2023-03-04 16:47         ` Matthew Wilcox
  1 sibling, 0 replies; 67+ messages in thread
From: Javier González @ 2023-03-04 13:24 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Luis Chamberlain, Matthew Wilcox, Keith Busch, Theodore Ts'o,
	Pankaj Raghav, Daniel Gomez, lsf-pc, linux-fsdevel, linux-mm,
	linux-block

On 04.03.2023 12:08, Hannes Reinecke wrote:
>On 3/3/23 22:45, Luis Chamberlain wrote:
>>On Fri, Mar 03, 2023 at 03:49:29AM +0000, Matthew Wilcox wrote:
>>>On Thu, Mar 02, 2023 at 06:58:58PM -0700, Keith Busch wrote:
>>>>That said, I was hoping you were going to suggest supporting 16k logical block
>>>>sizes. Not a problem on some arch's, but still problematic when PAGE_SIZE is
>>>>4k. :)
>>>
>>>I was hoping Luis was going to propose a session on LBA size > PAGE_SIZE.
>>>Funnily, while the pressure is coming from the storage vendors, I don't
>>>think there's any work to be done in the storage layers.  It's purely
>>>a FS+MM problem.
>>
>>You'd hope most of it is left to FS + MM, but I'm not yet sure that's
>>quite it yet. Initial experimentation shows just enabling > PAGE_SIZE
>>physical & logical block NVMe devices gets brought down to 512 bytes.
>>That seems odd to say the least. Would changing this be an issue now?
>>
>>I'm gathering there is generic interest in this topic though. So one
>>thing we *could* do is perhaps review lay-of-the-land of interest and
>>break down what we all think are things likely could be done / needed.
>>At the very least we can come out together knowing the unknowns together.
>>
>>I started to think about some of these things a while ago and with the
>>help of Willy I tried to break down some of the items I gathered from him
>>into community OKRs (super informal itemization of goals and sub tasks which
>>would complete such goals) and started trying to take a stab at them
>>with our team, but obviously I think it would be great if we all just
>>divide & conquer here. So maybe reviewing these and extending them
>>as a community would be good:
>>
>>https://kernelnewbies.org/KernelProjects/large-block-size
>>
>>I'm recently interested in tmpfs so will be taking a stab at higher
>>order page size support there to see what blows up.
>>
>Cool.
>
>>The other stuff like general IOMAP conversion is pretty well known, and
>>we already I think have a proposed session on that. But there is also
>>even smaller fish to fry, like *just* doing a baseline with some
>>filesystems with 4 KiB block size seems in order.
>>
>>Hearing filesystem developer's thoughts on support for larger block
>>size in light of lower order PAGE_SIZE would be good, given one of the
>>odd situations some distributions / teams find themselves in is trying
>>to support larger block sizes but with difficult access to higher
>>PAGE_SIZE systems. Are there ways to simplify this / help us in general?
>>Without it's a bit hard to muck around with some of this in terms of
>>support long term. This also got me thinking about ways to try to replicate
>>larger IO virtual devices a bit better too. While paying a cloud
>>provider to test this is one nice option, it'd be great if I can just do
>>this in house with some hacks too. For virtio-blk-pci at least, for instance,
>>I wondered whether using just the host page cache suffices, or would a 4K
>>page cache on the host modify say a 16 k emulated io controller results
>>significantly? How do we most effectively virtualize 16k controllers
>>in-house?
>>
>>To help with experimenting with large io and NVMe / virtio-blk-pci I
>>recently added support to instantiate tons of large IO devices to kdevops
>>[0], with it it should be easy to reproduce odd issues we may come up
>>with. For instance it should be possible to subsequently extend the
>>kdevops fstests or blktests automation support with just a few Kconfig files
>>to use some of these largio devices to see what blows up.
>>
>We could implement a (virtual) zoned device, and expose each zone as a
>block. That gives us the required large block characteristics, and
>with
>a bit of luck we might be able to dial up to really large block sizes
>like the 256M sizes on current SMR drives.

Why would we want to deal with the overhead of the zoned block device
for a generic large block implementation?

I can see how this is useful for block devices, but it seems to me that
they would be users of this instead.

The idea would be for NVMe devices to report an LBA format with an LBA
size > 4KB.

Am I missing something?

>ublk might be a good starting point.

Similarly, I would see ublk as a user of this support, where the
underlying device is > 4KB.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-04  7:34       ` Matthew Wilcox
@ 2023-03-04 13:41         ` James Bottomley
  2023-03-04 16:39           ` Matthew Wilcox
  2023-03-04 19:04         ` Luis Chamberlain
  1 sibling, 1 reply; 67+ messages in thread
From: James Bottomley @ 2023-03-04 13:41 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Keith Busch, Luis Chamberlain, Theodore Ts'o, lsf-pc,
	linux-fsdevel, linux-mm, linux-block

On Sat, 2023-03-04 at 07:34 +0000, Matthew Wilcox wrote:
> On Fri, Mar 03, 2023 at 08:11:47AM -0500, James Bottomley wrote:
> > On Fri, 2023-03-03 at 03:49 +0000, Matthew Wilcox wrote:
> > > On Thu, Mar 02, 2023 at 06:58:58PM -0700, Keith Busch wrote:
> > > > That said, I was hoping you were going to suggest supporting
> > > > 16k logical block sizes. Not a problem on some arch's, but
> > > > still problematic when PAGE_SIZE is 4k. :)
> > > 
> > > I was hoping Luis was going to propose a session on LBA size >
> > > PAGE_SIZE. Funnily, while the pressure is coming from the storage
> > > vendors, I don't think there's any work to be done in the storage
> > > layers.  It's purely a FS+MM problem.
> > 
> > Heh, I can do the fools rush in bit, especially if what we're
> > interested in the minimum it would take to support this ...
> > 
> > The FS problem could be solved simply by saying FS block size must
> > equal device block size, then it becomes purely a MM issue.
> 
> Spoken like somebody who's never converted a filesystem to
> supporting large folios.  There are a number of issues:
> 
> 1. The obvious; use of PAGE_SIZE and/or PAGE_SHIFT

Well, yes, a filesystem has to be aware it's using a block size larger
than page size.

> 2. Use of kmap-family to access, eg directories.  You can't kmap
>    an entire folio, only one page at a time.  And if a dentry is
> split across a page boundary ...

Is kmap relevant?  It's only used for reading user pages in the kernel
and I can't see why a filesystem would use it unless it wants to pack
inodes into pages that also contain user data, which is an optimization,
not a fundamental issue (although I grant that as the block size grows
it becomes more useful), so it doesn't have to be part of the minimum
viable prototype.

> 3. buffer_heads do not currently support large folios.  Working on
> it.

Yes, I always forget filesystems still use the buffer cache.  But
fundamentally the buffer_head structure can cope with buffers that span
pages so most of the logic changes would be around grow_dev_page().  It
seems somewhat messy but not too hard.

> Probably a few other things I forget.  But look through the recent
> patches to AFS, CIFS, NFS, XFS, iomap that do folio conversions.
> A lot of it is pretty mechanical, but some of it takes hard thought.
> And if you have ideas about how to handle ext2 directories, I'm all
> ears.

OK, so I can see you were waiting for someone to touch a nerve, but if
I can go back to the stated goal, I never really thought *every*
filesystem would be suitable for block size > page size, so simply
getting a few of the modern ones working would be good enough for the
minimum viable prototype.

> 
> > The MM issue could be solved by adding a page order attribute to
> > struct address_space and insisting that pagecache/filemap functions
> > in mm/filemap.c all have to operate on objects that are an integer
> > multiple of the address space order.  The base allocator is
> > filemap_alloc_folio, which already has an apparently always zero
> > order parameter (hmmm...) and it always seems to be called from
> > sites that
> > have the address_space, so it could simply be modified to always
> > operate at the address_space order.
> 
> Oh, I have a patch for that.  That's the easy part.  The hard part is
> plugging your ears to the screams of the MM people who are convinced
> that fragmentation will make it impossible to mount your filesystem.

Right, so if the MM issue is solved it's picking a first FS for
conversion and solving the buffer problem.

I fully understand that eventually we'll need to get a single large
buffer to span discontiguous pages ... I noted that in the bit you cut,
but I don't see why the prototype shouldn't start with contiguous
pages.

James


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-04 13:41         ` James Bottomley
@ 2023-03-04 16:39           ` Matthew Wilcox
  2023-03-05  4:15             ` Luis Chamberlain
  2023-03-06  3:50             ` James Bottomley
  0 siblings, 2 replies; 67+ messages in thread
From: Matthew Wilcox @ 2023-03-04 16:39 UTC (permalink / raw)
  To: James Bottomley
  Cc: Keith Busch, Luis Chamberlain, Theodore Ts'o, lsf-pc,
	linux-fsdevel, linux-mm, linux-block

On Sat, Mar 04, 2023 at 08:41:04AM -0500, James Bottomley wrote:
> On Sat, 2023-03-04 at 07:34 +0000, Matthew Wilcox wrote:
> > On Fri, Mar 03, 2023 at 08:11:47AM -0500, James Bottomley wrote:
> > > On Fri, 2023-03-03 at 03:49 +0000, Matthew Wilcox wrote:
> > > > On Thu, Mar 02, 2023 at 06:58:58PM -0700, Keith Busch wrote:
> > > > > That said, I was hoping you were going to suggest supporting
> > > > > 16k logical block sizes. Not a problem on some arch's, but
> > > > > still problematic when PAGE_SIZE is 4k. :)
> > > > 
> > > > I was hoping Luis was going to propose a session on LBA size >
> > > > PAGE_SIZE. Funnily, while the pressure is coming from the storage
> > > > vendors, I don't think there's any work to be done in the storage
> > > > layers.  It's purely a FS+MM problem.
> > > 
> > > Heh, I can do the fools rush in bit, especially if what we're
> > > interested in the minimum it would take to support this ...
> > > 
> > > The FS problem could be solved simply by saying FS block size must
> > > equal device block size, then it becomes purely a MM issue.
> > 
> > Spoken like somebody who's never converted a filesystem to
> > supporting large folios.  There are a number of issues:
> > 
> > 1. The obvious; use of PAGE_SIZE and/or PAGE_SHIFT
> 
> Well, yes, a filesystem has to be aware it's using a block size larger
> than page size.
> 
> > 2. Use of kmap-family to access, eg directories.  You can't kmap
> >    an entire folio, only one page at a time.  And if a dentry is
> > split across a page boundary ...
> 
> Is kmap relevant?  It's only used for reading user pages in the kernel
> and I can't see why a filesystem would use it unless it wants to pack
> inodes into pages that also contain user data, which is an optimization
> not a fundamental issue (although I grant that as the blocksize grows
> it becomes more useful) so it doesn't have to be part of the minimum
> viable prototype.

Filesystems often choose to store their metadata in HIGHMEM.  This wasn't
an entirely crazy idea back in, say, 2005, when you might be running
an ext2 filesystem on a machine with 32GB of RAM, and only 800MB of
address space for it.

Now it's silly.  Buy a real computer.  I'm getting more and more
comfortable with the idea that "Linux doesn't support block sizes >
PAGE_SIZE on 32-bit machines" is an acceptable answer.

> > 3. buffer_heads do not currently support large folios.  Working on
> > it.
> 
> Yes, I always forget filesystems still use the buffer cache.  But
> fundamentally the buffer_head structure can cope with buffers that span
> pages so most of the logic changes would be around grow_dev_page().  It
> seems somewhat messy but not too hard.

I forgot one particularly nasty case; we have filesystems (including the
mpage code used by a number of filesystems) which put an array of block
numbers on the stack.  Not a big deal when that's 8 entries (4kB/512 * 8
bytes = 64 bytes), but it starts to get noticable at 64kB PAGE_SIZE (1kB
is a little large for a stack allocation) and downright unreasonable
if you try to do something to a 2MB allocation (32kB).
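
For reference, the offending pattern is roughly this (see fs/mpage.c):

	sector_t blocks[MAX_BUF_PER_PAGE];	/* PAGE_SIZE / 512 entries */

	/*
	 * 4kB page:  8 entries    * 8 bytes =  64 bytes of stack
	 * 64kB page: 128 entries  * 8 bytes =   1kB
	 * 2MB folio: 4096 entries * 8 bytes =  32kB -- not going to happen
	 */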

> > Probably a few other things I forget.  But look through the recent
> > patches to AFS, CIFS, NFS, XFS, iomap that do folio conversions.
> > A lot of it is pretty mechanical, but some of it takes hard thought.
> > And if you have ideas about how to handle ext2 directories, I'm all
> > ears.
> 
> OK, so I can see you were waiting for someone to touch a nerve, but if
> I can go back to the stated goal, I never really thought *every*
> filesystem would be suitable for block size > page size, so simply
> getting a few of the modern ones working would be good enough for the
> minimum viable prototype.

XFS already works with arbitrary-order folios.  The only needed piece is
specifying to the VFS that there's a minimum order for this particular
inode, and having the VFS honour that everywhere.
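
i.e. something of this shape (helper name invented; the existing
mapping_set_large_folios() only says "large folios are allowed here",
it doesn't impose a floor):

/* Would need to be honoured by readahead, write_begin, truncate,
 * invalidate, ... everywhere the page cache picks an order. */
static inline void mapping_set_min_folio_order(struct address_space *mapping,
					       unsigned int order)
{
	/* e.g. stash it next to the large-folio flag in mapping->flags */
}

static void example_fs_setup_inode(struct inode *inode)
{
	/* 16k fs blocks on a 4k PAGE_SIZE machine -> order 2 */
	mapping_set_min_folio_order(inode->i_mapping,
				    inode->i_blkbits - PAGE_SHIFT);
}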

What "touches a nerve" is people who clearly haven't been paying attention
to the problem making sweeping assertions about what the easy and hard
parts are.

> I fully understand that eventually we'll need to get a single large
> buffer to span discontiguous pages ... I noted that in the bit you cut,
> but I don't see why the prototype shouldn't start with contiguous
> pages.

I disagree that this is a desirable goal.  To solve the scalability
issues we have in the VFS, we need to manage memory in larger chunks
than PAGE_SIZE.  That makes the concerns expressed in previous years moot.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-04 11:08       ` Hannes Reinecke
  2023-03-04 13:24         ` Javier González
@ 2023-03-04 16:47         ` Matthew Wilcox
  2023-03-04 17:17           ` Hannes Reinecke
  1 sibling, 1 reply; 67+ messages in thread
From: Matthew Wilcox @ 2023-03-04 16:47 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Luis Chamberlain, Keith Busch, Theodore Ts'o, Pankaj Raghav,
	Daniel Gomez, Javier González, lsf-pc, linux-fsdevel,
	linux-mm, linux-block

On Sat, Mar 04, 2023 at 12:08:36PM +0100, Hannes Reinecke wrote:
> We could implement a (virtual) zoned device, and expose each zone as a
> block. That gives us the required large block characteristics, and with
> a bit of luck we might be able to dial up to really large block sizes
> like the 256M sizes on current SMR drives.
> ublk might be a good starting point.

Ummmm.  Is supporting 256MB block sizes really a desired goal?  I suggest
that is far past the knee of the curve; if we can only write 256MB chunks
as a single entity, we're looking more at a filesystem redesign than we
are at making filesystems and the MM support 256MB size blocks.

The current work is all going towards tracking memory in larger chunks,
so writing back, eg, 64kB chunks of the file.  But if 256MB is where
we're going, we need to be thinking more like a RAID device and
accumulating writes into a log that we can then blast out in a single
giant write.

fsync() and O_SYNC is going to be painful for that kind of device.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-04 16:47         ` Matthew Wilcox
@ 2023-03-04 17:17           ` Hannes Reinecke
  2023-03-04 17:54             ` Matthew Wilcox
  0 siblings, 1 reply; 67+ messages in thread
From: Hannes Reinecke @ 2023-03-04 17:17 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Luis Chamberlain, Keith Busch, Theodore Ts'o, Pankaj Raghav,
	Daniel Gomez, Javier González, lsf-pc, linux-fsdevel,
	linux-mm, linux-block

On 3/4/23 17:47, Matthew Wilcox wrote:
> On Sat, Mar 04, 2023 at 12:08:36PM +0100, Hannes Reinecke wrote:
>> We could implement a (virtual) zoned device, and expose each zone as a
>> block. That gives us the required large block characteristics, and with
>> a bit of luck we might be able to dial up to really large block sizes
>> like the 256M sizes on current SMR drives.
>> ublk might be a good starting point.
> 
> Ummmm.  Is supporting 256MB block sizes really a desired goal?  I suggest
> that is far past the knee of the curve; if we can only write 256MB chunks
> as a single entity, we're looking more at a filesystem redesign than we
> are at making filesystems and the MM support 256MB size blocks.
> 
Naa, not really. It _would_ be cool as we could get rid of all the
kludges we have nowadays re sequential writes.
And, remember, 256M is just a number someone thought to be a good 
compromise. If we end up with a lower number (16M?) we might be able
to convince the powers that be to change their zone size.
Heck, with 16M block size there wouldn't be a _need_ for zones in
the first place.

But yeah, 256M is excessive. Initially I would shoot for something
like 2M.

> The current work is all going towards tracking memory in larger chunks,
> so writing back, eg, 64kB chunks of the file.  But if 256MB is where
> we're going, we need to be thinking more like a RAID device and
> accumulating writes into a log that we can then blast out in a single
> giant write.
> 
Yeah. I _do_ remember someone hch-ish presenting something two years
back at ALPSS, but I guess that's still on the back-burner.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-04 17:17           ` Hannes Reinecke
@ 2023-03-04 17:54             ` Matthew Wilcox
  2023-03-04 18:53               ` Luis Chamberlain
                                 ` (2 more replies)
  0 siblings, 3 replies; 67+ messages in thread
From: Matthew Wilcox @ 2023-03-04 17:54 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Luis Chamberlain, Keith Busch, Theodore Ts'o, Pankaj Raghav,
	Daniel Gomez, Javier González, lsf-pc, linux-fsdevel,
	linux-mm, linux-block

On Sat, Mar 04, 2023 at 06:17:35PM +0100, Hannes Reinecke wrote:
> On 3/4/23 17:47, Matthew Wilcox wrote:
> > On Sat, Mar 04, 2023 at 12:08:36PM +0100, Hannes Reinecke wrote:
> > > We could implement a (virtual) zoned device, and expose each zone as a
> > > block. That gives us the required large block characteristics, and with
> > > a bit of luck we might be able to dial up to really large block sizes
> > > like the 256M sizes on current SMR drives.
> > > ublk might be a good starting point.
> > 
> > Ummmm.  Is supporting 256MB block sizes really a desired goal?  I suggest
> > that is far past the knee of the curve; if we can only write 256MB chunks
> > as a single entity, we're looking more at a filesystem redesign than we
> > are at making filesystems and the MM support 256MB size blocks.
> > 
> Naa, not really. It _would_ be cool as we could get rid of all the kludges
> we have nowadays re sequential writes.
> And, remember, 256M is just a number someone thought to be a good
> compromise. If we end up with a lower number (16M?) we might be able
> to convince the powers that be to change their zone size.
> Heck, with 16M block size there wouldn't be a _need_ for zones in
> the first place.
> 
> But yeah, 256M is excessive. Initially I would shoot for something
> like 2M.

I think we're talking about different things (probably different storage
vendors want different things, or even different people at the same
storage vendor want different things).

Luis and I are talking about larger LBA sizes.  That is, the minimum
read/write size from the block device is 16kB or 64kB or whatever.
In this scenario, the minimum amount of space occupied by a file goes
up from 512 bytes or 4kB to 64kB.  That's doable, even if somewhat
suboptimal.

Your concern seems to be more around shingled devices (or their equivalent
in SSD terms) where there are large zones which are append-only, but
you can still random-read 512 byte LBAs.  I think there are different
solutions to these problems, and people are working on both of these
problems.

But if storage vendors are really pushing for 256MB LBAs, then that's
going to need a third kind of solution, and I'm not aware of anyone
working on that.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-04 17:54             ` Matthew Wilcox
@ 2023-03-04 18:53               ` Luis Chamberlain
  2023-03-05  3:06               ` Damien Le Moal
  2023-03-05 11:22               ` Hannes Reinecke
  2 siblings, 0 replies; 67+ messages in thread
From: Luis Chamberlain @ 2023-03-04 18:53 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Hannes Reinecke, Keith Busch, Theodore Ts'o, Pankaj Raghav,
	Daniel Gomez, Javier González, Klaus Jensen, lsf-pc,
	linux-fsdevel, linux-mm, linux-block

On Sat, Mar 04, 2023 at 05:54:38PM +0000, Matthew Wilcox wrote:
> On Sat, Mar 04, 2023 at 06:17:35PM +0100, Hannes Reinecke wrote:
> > On 3/4/23 17:47, Matthew Wilcox wrote:
> > > On Sat, Mar 04, 2023 at 12:08:36PM +0100, Hannes Reinecke wrote:
> > > > We could implement a (virtual) zoned device, and expose each zone as a
> > > > block. That gives us the required large block characteristics, and with
> > > > a bit of luck we might be able to dial up to really large block sizes
> > > > like the 256M sizes on current SMR drives.
> > > > ublk might be a good starting point.
> > > 
> > > Ummmm.  Is supporting 256MB block sizes really a desired goal?  I suggest
> > > that is far past the knee of the curve; if we can only write 256MB chunks
> > > as a single entity, we're looking more at a filesystem redesign than we
> > > are at making filesystems and the MM support 256MB size blocks.
> > > 
> > Naa, not really. It _would_ be cool as we could get rid of all the kludges
> > we have nowadays re sequential writes.
> > And, remember, 256M is just a number someone thought to be a good
> > compromise. If we end up with a lower number (16M?) we might be able
> > to convince the powers that be to change their zone size.
> > Heck, with 16M block size there wouldn't be a _need_ for zones in
> > the first place.
> > 
> > But yeah, 256M is excessive. Initially I would shoot for something
> > like 2M.
> 
> I think we're talking about different things (probably different storage
> vendors want different things, or even different people at the same
> storage vendor want different things).
> 
> Luis and I are talking about larger LBA sizes.  That is, the minimum
> read/write size from the block device is 16kB or 64kB or whatever.
> In this scenario, the minimum amount of space occupied by a file goes
> up from 512 bytes or 4kB to 64kB.  That's doable, even if somewhat
> suboptimal.

Yes.

> Your concern seems to be more around shingled devices (or their equivalent
> in SSD terms) where there are large zones which are append-only, but
> you can still random-read 512 byte LBAs.  I think there are different
> solutions to these problems, and people are working on both of these
> problems.
> 
> But if storage vendors are really pushing for 256MB LBAs, then that's
> going to need a third kind of solution, and I'm not aware of anyone
> working on that.

Hannes had replied to my suggestion about a way to *virtualize* a real
storage controller with a larger LBA *optimally*; in that thread I was
hinting at avoiding cache=passthrough on the hypervisor and instead using
something like cache=writeback or even cache=unsafe for experimentation
with virtio-blk-pci. For a more elaborate description of these see [0],
but the skinny is that cache=writeback uses the host storage controller
while the others rely on the host page cache.

The overhead of latencies incurred by anything replicating larger LBAs
should be mitigated, so I don't think using a zoned storage device for it
would be good.

I was asking whether or not experimenting with a different host page cache
PAGE_SIZE might help replicate things a bit more realistically, even if
it was suboptimal for the host for the reasons previously noted as stupid.

If sticking to the host PAGE_SIZE, another idea may be to use tmpfs +
huge pages so as to at least mitigate TLB lookups.

[0] https://github.com/linux-kdevops/kdevops/commit/94844c4684a51997cb327d2fb0ce491fe4429dfc

  Luis

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-04  7:34       ` Matthew Wilcox
  2023-03-04 13:41         ` James Bottomley
@ 2023-03-04 19:04         ` Luis Chamberlain
  1 sibling, 0 replies; 67+ messages in thread
From: Luis Chamberlain @ 2023-03-04 19:04 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: James Bottomley, Keith Busch, Theodore Ts'o,
	Javier González, Pankaj Raghav, Daniel Gomez, lsf-pc,
	linux-fsdevel, linux-mm, linux-block

On Sat, Mar 04, 2023 at 07:34:33AM +0000, Matthew Wilcox wrote:
> The hard part is plugging your ears to the screams of the MM people
> who are convinced that fragmentation will make it impossible to mount
> your filesystem.

One doesn't just need to plug one's ears, one can also be prepared for that,
should it actually end up being true, because frankly we don't have
the evidence yet. And it's something I have slowly started to think about --
because -- why not be ready?

In fact, let's say the inverse is true: having the tooling to prove them
wrong is also a desirable outcome, and that begs the question of proper
tooling to measure this, etc. Something probably more for an MM track.
What would satisfy proof, and what tooling / metrics would be used?

It is *not* something that is only implicated by storage IO controllers,
and so what we're looking at is a generic device issue / concern for memory
fragmentation.

*If* the generalization of huge page use for something like bpf-prog-pack ends
up materializing and we end up using it for even *all* module .text,
*then* I *think* something similar could be a way to address that concern
for devices with huge pages for CMA. This is one area where I think
device hints for large IO might come in handy: we can limit such
dedicated pools to only devices with hints and limit the amount of huge
pages used for this purpose.

But ask me again 2 kernel releases from now.

  Luis

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-04 17:54             ` Matthew Wilcox
  2023-03-04 18:53               ` Luis Chamberlain
@ 2023-03-05  3:06               ` Damien Le Moal
  2023-03-05 11:22               ` Hannes Reinecke
  2 siblings, 0 replies; 67+ messages in thread
From: Damien Le Moal @ 2023-03-05  3:06 UTC (permalink / raw)
  To: Matthew Wilcox, Hannes Reinecke
  Cc: Luis Chamberlain, Keith Busch, Theodore Ts'o, Pankaj Raghav,
	Daniel Gomez, Javier González, lsf-pc, linux-fsdevel,
	linux-mm, linux-block

On 3/5/23 02:54, Matthew Wilcox wrote:
> On Sat, Mar 04, 2023 at 06:17:35PM +0100, Hannes Reinecke wrote:
>> On 3/4/23 17:47, Matthew Wilcox wrote:
>>> On Sat, Mar 04, 2023 at 12:08:36PM +0100, Hannes Reinecke wrote:
>>>> We could implement a (virtual) zoned device, and expose each zone as a
>>>> block. That gives us the required large block characteristics, and with
>>>> a bit of luck we might be able to dial up to really large block sizes
>>>> like the 256M sizes on current SMR drives.
>>>> ublk might be a good starting point.
>>>
>>> Ummmm.  Is supporting 256MB block sizes really a desired goal?  I suggest
>>> that is far past the knee of the curve; if we can only write 256MB chunks
>>> as a single entity, we're looking more at a filesystem redesign than we
>>> are at making filesystems and the MM support 256MB size blocks.
>>>
>> Naa, not really. It _would_ be cool as we could get rid of all the kludges
>> we have nowadays re sequential writes.
>> And, remember, 256M is just a number someone thought to be a good
>> compromise. If we end up with a lower number (16M?) we might be able
>> to convince the powers that be to change their zone size.
>> Heck, with 16M block size there wouldn't be a _need_ for zones in
>> the first place.
>>
>> But yeah, 256M is excessive. Initially I would shoot for something
>> like 2M.
> 
> I think we're talking about different things (probably different storage
> vendors want different things, or even different people at the same
> storage vendor want different things).
> 
> Luis and I are talking about larger LBA sizes.  That is, the minimum
> read/write size from the block device is 16kB or 64kB or whatever.
> In this scenario, the minimum amount of space occupied by a file goes
> up from 512 bytes or 4kB to 64kB.  That's doable, even if somewhat
> suboptimal.

FYI, that is already out there, even though hidden from the host for backward
compatibility reasons. Example: WD SMR drives use 64K distributed sectors, which
are essentially 16 4KB sectors striped together to achieve stronger ECC.

C.f. Distributed sector format (DSEC):
https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/collateral/tech-brief/tech-brief-ultrasmr-technology.pdf

This is hidden from the host though, and the LBA remains 512B or 4KB. This however
does result in a measurable impact on IOPS with small reads, as a sub-64K read
needs to be internally processed as a 64KB read to get the entire DSEC. The drop
in performance is not dramatic: about 5% lower IOPS compared to an equivalent
drive without DSEC. Still, that matters considering HDD IO density issues
(IOPS/TB), but in the case of SMR, that is part of the increased capacity trade-off.

So exposing the DSEC directly as the LBA size is not a stretch for the HDD FW,
as long as the host supports that. There are no plans to do so though, but we
could try experimenting.

For host side experimentation, something like qemu/nvme device emulation or
tcmu-runner for scsi devices should allow emulating large block sizes
fairly easily.

> 
> Your concern seems to be more around shingled devices (or their equivalent
> in SSD terms) where there are large zones which are append-only, but
> you can still random-read 512 byte LBAs.  I think there are different
> solutions to these problems, and people are working on both of these
> problems.

The above example does show that a device can generally implement emulation of
a smaller LBA even with an internally larger read/write size unit. Having that
larger size unit advertised as the optimal IO size alignment (as it should be) and
being more diligent in having FSes & mm use that may be a good approach too.

> 
> But if storage vendors are really pushing for 256MB LBAs, then that's
> going to need a third kind of solution, and I'm not aware of anyone
> working on that.

No, we are not pushing for such crazy numbers :)
And for the SMR case, smaller zone sizes are not desired, as a small zone size leads to
more real estate waste on the HDD platters, so lower total capacity (not desired
given that SMR is all about getting higher capacity "for free").


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-04 16:39           ` Matthew Wilcox
@ 2023-03-05  4:15             ` Luis Chamberlain
  2023-03-05  5:02               ` Matthew Wilcox
  2023-03-06 12:04               ` Hannes Reinecke
  2023-03-06  3:50             ` James Bottomley
  1 sibling, 2 replies; 67+ messages in thread
From: Luis Chamberlain @ 2023-03-05  4:15 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: James Bottomley, Keith Busch, Theodore Ts'o, lsf-pc,
	linux-fsdevel, linux-mm, linux-block

On Sat, Mar 04, 2023 at 04:39:02PM +0000, Matthew Wilcox wrote:
> I'm getting more and more
> comfortable with the idea that "Linux doesn't support block sizes >
> PAGE_SIZE on 32-bit machines" is an acceptable answer.

First of all, filesystems would need to add support for block
sizes > PAGE_SIZE, and that takes effort. It is also a support question.

I think garnering consensus from filesystem developers that we don't want
to support block sizes > PAGE_SIZE on 32-bit systems would be a good
thing to review at LSFMM or even on this list. I highly doubt anyone
is interested in that support.

> XFS already works with arbitrary-order folios. 

But block sizes > PAGE_SIZE is work which is still not merged. It
*can* be with time. That would allow one to muck with larger block
sizes than 4k on x86-64 for instance. Without this, you can't play
ball.

> The only needed piece is
> specifying to the VFS that there's a minimum order for this particular
> inode, and having the VFS honour that everywhere.

Other than the above too, don't we still also need to figure out what
fs APIs would incur larger order folios? And then what about corner cases
with the page cache?

I was hoping some of these nooks and crannies could be explored with tmpfs.

  Luis

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-05  4:15             ` Luis Chamberlain
@ 2023-03-05  5:02               ` Matthew Wilcox
  2023-03-08  6:11                 ` Luis Chamberlain
  2023-03-06 12:04               ` Hannes Reinecke
  1 sibling, 1 reply; 67+ messages in thread
From: Matthew Wilcox @ 2023-03-05  5:02 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: James Bottomley, Keith Busch, Theodore Ts'o, lsf-pc,
	linux-fsdevel, linux-mm, linux-block

On Sat, Mar 04, 2023 at 08:15:50PM -0800, Luis Chamberlain wrote:
> On Sat, Mar 04, 2023 at 04:39:02PM +0000, Matthew Wilcox wrote:
> > I'm getting more and more
> > comfortable with the idea that "Linux doesn't support block sizes >
> > PAGE_SIZE on 32-bit machines" is an acceptable answer.
> 
> First of all, filesystems would need to add support for block
> sizes > PAGE_SIZE, and that takes effort. It is also a support question.
> 
> I think garnering consensus from filesystem developers that we don't want
> to support block sizes > PAGE_SIZE on 32-bit systems would be a good
> thing to review at LSFMM or even on this list. I highly doubt anyone
> is interested in that support.

Agreed.

> > XFS already works with arbitrary-order folios. 
> 
> But block sizes > PAGE_SIZE is work which is still not merged. It
> *can* be with time. That would allow one to muck with larger block
> sizes than 4k on x86-64 for instance. Without this, you can't play
> ball.

Do you mean that XFS is checking that fs block size <= PAGE_SIZE and
that check needs to be dropped?  If so, I don't see where that happens.

Or do you mean that the blockdev "filesystem" needs to be enhanced to
support large folios?  That's going to be kind of a pain because it
uses buffer_heads.  And ext4 depends on it using buffer_heads.  So,
yup, more work needed than I remembered (but as I said, it's FS side,
not block layer or driver work).

Or were you referring to the NVMe PAGE_SIZE sanity check that Keith
mentioned upthread?

> > The only needed piece is
> > specifying to the VFS that there's a minimum order for this particular
> > inode, and having the VFS honour that everywhere.
> 
> Other than the above too, don't we still also need to figure out what
> fs APIs would incur larger order folios? And then what about corner cases
> with the page cache?
> 
> I was hoping some of these nooks and crannies could be explored with tmpfs.

I think we're exploring all those with XFS.  Or at least, many of
them.  A lot of the folio conversion patches you see flowing past
are pure efficiency gains -- no need to convert between pages and
folios implicitly; do the explicit conversions and save instructions.
Most of the correctness issues were found & fixed a long time ago when
PMD support was added to tmpfs.  One notable exception would be the
writeback path since tmpfs doesn't writeback, it has that special thing
it does with swap.

tmpfs is a rather special case as far as its use of the filesystem APIs
go, but I suspect I've done most of the needed work to have it work with
arbitrary order folios instead of just PTE and PMD sizes.  There's
probably some left-over assumptions that I didn't find yet.  Maybe in
the swap path, for example ;-)

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-04 17:54             ` Matthew Wilcox
  2023-03-04 18:53               ` Luis Chamberlain
  2023-03-05  3:06               ` Damien Le Moal
@ 2023-03-05 11:22               ` Hannes Reinecke
  2023-03-06  8:23                 ` Matthew Wilcox
                                   ` (2 more replies)
  2 siblings, 3 replies; 67+ messages in thread
From: Hannes Reinecke @ 2023-03-05 11:22 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Luis Chamberlain, Keith Busch, Theodore Ts'o, Pankaj Raghav,
	Daniel Gomez, Javier González, lsf-pc, linux-fsdevel,
	linux-mm, linux-block

On 3/4/23 18:54, Matthew Wilcox wrote:
> On Sat, Mar 04, 2023 at 06:17:35PM +0100, Hannes Reinecke wrote:
>> On 3/4/23 17:47, Matthew Wilcox wrote:
>>> On Sat, Mar 04, 2023 at 12:08:36PM +0100, Hannes Reinecke wrote:
>>>> We could implement a (virtual) zoned device, and expose each zone as a
>>>> block. That gives us the required large block characteristics, and with
>>>> a bit of luck we might be able to dial up to really large block sizes
>>>> like the 256M sizes on current SMR drives.
>>>> ublk might be a good starting point.
>>>
>>> Ummmm.  Is supporting 256MB block sizes really a desired goal?  I suggest
>>> that is far past the knee of the curve; if we can only write 256MB chunks
>>> as a single entity, we're looking more at a filesystem redesign than we
>>> are at making filesystems and the MM support 256MB size blocks.
>>>
>> Naa, not really. It _would_ be cool as we could get rid of all the kludges
>> we have nowadays regarding sequential writes.
>> And, remember, 256M is just a number someone thought to be a good
>> compromise. If we end up with a lower number (16M?) we might be able
>> to convince the powers that be to change their zone size.
>> Heck, with 16M block size there wouldn't be a _need_ for zones in
>> the first place.
>>
>> But yeah, 256M is excessive. Initially I would shoot for something
>> like 2M.
> 
> I think we're talking about different things (probably different storage
> vendors want different things, or even different people at the same
> storage vendor want different things).
> 
> Luis and I are talking about larger LBA sizes.  That is, the minimum
> read/write size from the block device is 16kB or 64kB or whatever.
> In this scenario, the minimum amount of space occupied by a file goes
> up from 512 bytes or 4kB to 64kB.  That's doable, even if somewhat
> suboptimal.
> 
And so do I. One can view zones as really large LBAs.

Indeed it might be suboptimal from the OS point of view.
But from the device point of view it won't.
And, in fact, with devices becoming faster and faster the question is
whether sticking with relatively small sectors won't become a limiting 
factor eventually.

> Your concern seems to be more around shingled devices (or their equivalent
> in SSD terms) where there are large zones which are append-only, but
> you can still random-read 512 byte LBAs.  I think there are different
> solutions to these problems, and people are working on both of these
> problems.
> 
My point being that zones are just there because the I/O stack can only 
deal with sectors up to 4k. If the I/O stack would be capable of dealing
with larger LBAs one could identify a zone with an LBA, and the entire 
issue of append-only and sequential writes would be moot.
Even the entire concept of zones becomes irrelevant as the OS would 
trivially only write entire zones.

> But if storage vendors are really pushing for 256MB LBAs, then that's
> going to need a third kind of solution, and I'm not aware of anyone
> working on that.

What I was saying is that 256M is not set in stone. It's just a 
compromise vendors used. Even if in the course of development we arrive
at a lower number of max LBA we can handle (say, 2MB) I am pretty
sure vendors will be quite interested in that.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-04 16:39           ` Matthew Wilcox
  2023-03-05  4:15             ` Luis Chamberlain
@ 2023-03-06  3:50             ` James Bottomley
  1 sibling, 0 replies; 67+ messages in thread
From: James Bottomley @ 2023-03-06  3:50 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Keith Busch, Luis Chamberlain, Theodore Ts'o, lsf-pc,
	linux-fsdevel, linux-mm, linux-block

On Sat, 2023-03-04 at 16:39 +0000, Matthew Wilcox wrote:
> > I fully understand that eventually we'll need to get a single large
> > buffer to span discontiguous pages ... I noted that in the bit you
> > cut, but I don't see why the prototype shouldn't start with
> > contiguous pages.
> 
> I disagree that this is a desirable goal.  To solve the scalability
> issues we have in the VFS, we need to manage memory in larger chunks
> than PAGE_SIZE.  That makes the concerns expressed in previous years
> moot.

Well, what is or isn't desirable in this regard can be left to a later
exploration.  Most of the cloud storage problems seem to be solved with
a 16k block size, for which I think we'll find current compaction is
good enough.  I actually think we might not have a current cloud use
case beyond 64k sectors.

James


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-05 11:22               ` Hannes Reinecke
@ 2023-03-06  8:23                 ` Matthew Wilcox
  2023-03-06 10:05                   ` Hannes Reinecke
  2023-03-06 16:12                   ` Theodore Ts'o
  2023-03-08 19:35                 ` Luis Chamberlain
  2023-03-08 19:55                 ` Bart Van Assche
  2 siblings, 2 replies; 67+ messages in thread
From: Matthew Wilcox @ 2023-03-06  8:23 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Luis Chamberlain, Keith Busch, Theodore Ts'o, Pankaj Raghav,
	Daniel Gomez, Javier González, lsf-pc, linux-fsdevel,
	linux-mm, linux-block

On Sun, Mar 05, 2023 at 12:22:15PM +0100, Hannes Reinecke wrote:
> On 3/4/23 18:54, Matthew Wilcox wrote:
> > I think we're talking about different things (probably different storage
> > vendors want different things, or even different people at the same
> > storage vendor want different things).
> > 
> > Luis and I are talking about larger LBA sizes.  That is, the minimum
> > read/write size from the block device is 16kB or 64kB or whatever.
> > In this scenario, the minimum amount of space occupied by a file goes
> > up from 512 bytes or 4kB to 64kB.  That's doable, even if somewhat
> > suboptimal.
> > 
> And so do I. One can view zones as really large LBAs.
> 
> Indeed it might be suboptimal from the OS point of view.
> But from the device point of view it won't.
> And, in fact, with devices becoming faster and faster the question is
> whether sticking with relatively small sectors won't become a limiting
> factor eventually.
> 
> > Your concern seems to be more around shingled devices (or their equivalent
> > in SSD terms) where there are large zones which are append-only, but
> > you can still random-read 512 byte LBAs.  I think there are different
> > solutions to these problems, and people are working on both of these
> > problems.
> > 
> My point being that zones are just there because the I/O stack can only deal
> with sectors up to 4k. If the I/O stack would be capable of dealing
> with larger LBAs one could identify a zone with an LBA, and the entire issue
> of append-only and sequential writes would be moot.
> Even the entire concept of zones becomes irrelevant as the OS would
> trivially only write entire zones.

All current filesystems that I'm aware of require their fs block size
to be >= LBA size.  That is, you can't take a 512-byte blocksize ext2
filesystem and put it on a 4kB LBA storage device.
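
A quick illustration of that constraint (device name illustrative; mke2fs is
expected to refuse a block size below the device's logical sector size):

$ cat /sys/block/nvme0n1/queue/logical_block_size
4096
$ mkfs.ext2 -b 512 /dev/nvme0n1    # refused: fs block size < logical sector size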

That means that files can only grow/shrink in 256MB increments.  I
don't think that amount of wasted space is going to be acceptable.
So if we're serious about going down this path, we need to tell
filesystem people to start working out how to support fs block
size < LBA size.

That's a big ask, so let's be sure storage vendors actually want
this.  Both supporting zoned devices & supporting 16k/64k block
sizes are easier asks.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-06  8:23                 ` Matthew Wilcox
@ 2023-03-06 10:05                   ` Hannes Reinecke
  2023-03-06 16:12                   ` Theodore Ts'o
  1 sibling, 0 replies; 67+ messages in thread
From: Hannes Reinecke @ 2023-03-06 10:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Luis Chamberlain, Keith Busch, Theodore Ts'o, Pankaj Raghav,
	Daniel Gomez, Javier González, lsf-pc, linux-fsdevel,
	linux-mm, linux-block

On 3/6/23 09:23, Matthew Wilcox wrote:
> On Sun, Mar 05, 2023 at 12:22:15PM +0100, Hannes Reinecke wrote:
>> On 3/4/23 18:54, Matthew Wilcox wrote:
>>> I think we're talking about different things (probably different storage
>>> vendors want different things, or even different people at the same
>>> storage vendor want different things).
>>>
>>> Luis and I are talking about larger LBA sizes.  That is, the minimum
>>> read/write size from the block device is 16kB or 64kB or whatever.
>>> In this scenario, the minimum amount of space occupied by a file goes
>>> up from 512 bytes or 4kB to 64kB.  That's doable, even if somewhat
>>> suboptimal.
>>>
>> And so do I. One can view zones as really large LBAs.
>>
>> Indeed it might be suboptimal from the OS point of view.
>> But from the device point of view it won't.
>> And, in fact, with devices becoming faster and faster the question is
>> whether sticking with relatively small sectors won't become a limiting
>> factor eventually.
>>
>>> Your concern seems to be more around shingled devices (or their equivalent
>>> in SSD terms) where there are large zones which are append-only, but
>>> you can still random-read 512 byte LBAs.  I think there are different
>>> solutions to these problems, and people are working on both of these
>>> problems.
>>>
>> My point being that zones are just there because the I/O stack can only deal
>> with sectors up to 4k. If the I/O stack would be capable of dealing
>> with larger LBAs one could identify a zone with an LBA, and the entire issue
>> of append-only and sequential writes would be moot.
>> Even the entire concept of zones becomes irrelevant as the OS would
>> trivially only write entire zones.
> 
> All current filesystems that I'm aware of require their fs block size
> to be >= LBA size.  That is, you can't take a 512-byte blocksize ext2
> filesystem and put it on a 4kB LBA storage device.
> 
> That means that files can only grow/shrink in 256MB increments.  I
> don't think that amount of wasted space is going to be acceptable.
> So if we're serious about going down this path, we need to tell
> filesystem people to start working out how to support fs block
> size < LBA size.
> 
> That's a big ask, so let's be sure storage vendors actually want
> this.  Both supporting zoned devices & supporting 16k/64k block
> sizes are easier asks.

Why, I know. And this really is a future goal.
(Possibly a very _distant_ future goal.)

Indeed we should concentrate on getting 16k/64k blocks initially.
Or maybe 128k blocks to help our RAIDed friends.

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-05  4:15             ` Luis Chamberlain
  2023-03-05  5:02               ` Matthew Wilcox
@ 2023-03-06 12:04               ` Hannes Reinecke
  1 sibling, 0 replies; 67+ messages in thread
From: Hannes Reinecke @ 2023-03-06 12:04 UTC (permalink / raw)
  To: Luis Chamberlain, Matthew Wilcox
  Cc: James Bottomley, Keith Busch, Theodore Ts'o, lsf-pc,
	linux-fsdevel, linux-mm, linux-block

On 3/5/23 05:15, Luis Chamberlain wrote:
> On Sat, Mar 04, 2023 at 04:39:02PM +0000, Matthew Wilcox wrote:
>> I'm getting more and more
>> comfortable with the idea that "Linux doesn't support block sizes >
>> PAGE_SIZE on 32-bit machines" is an acceptable answer.
> 
> First of all, filesystems would need to add support for block
> sizes > PAGE_SIZE, and that takes effort. It is also a support question.
> 
> I think garnering consensus from filesystem developers that we don't want
> to support block sizes > PAGE_SIZE on 32-bit systems would be a good
> thing to review at LSFMM or even on this list. I highly doubt anyone
> is interested in that support.
> 
>> XFS already works with arbitrary-order folios.
> 
> But block sizes > PAGE_SIZE is work which is still not merged. It
> *can* be with time. That would allow one to muck with larger block
> sizes than 4k on x86-64 for instance. Without this, you can't play
> ball.
> 
>> The only needed piece is
>> specifying to the VFS that there's a minimum order for this particular
>> inode, and having the VFS honour that everywhere.
> 
> Other than the above too, don't we still also need to figure out what
> fs APIs would incur larger order folios? And then what about corner cases
> with the page cache?
> 
> I was hoping some of these nooks and crannies could be explored with tmpfs.
> 
I have just posted a patchset for 'brd' to linux-block for supporting
arbitrary block sizes, both physical and logical. That should give
us a good starting point for experimenting.
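
Usage would then be something along these lines (illustrative only; rd_nr and
rd_size are existing brd module parameters, while rd_blksize is a placeholder
name for whatever the posted patches actually call it):

# Hypothetical sketch: one 1GiB ram disk with a 16k block size, then check
# what the request queue reports (rd_blksize is a placeholder name).
modprobe brd rd_nr=1 rd_size=1048576 rd_blksize=16384
cat /sys/block/ram0/queue/logical_block_size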

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-06  8:23                 ` Matthew Wilcox
  2023-03-06 10:05                   ` Hannes Reinecke
@ 2023-03-06 16:12                   ` Theodore Ts'o
  2023-03-08 17:53                     ` Matthew Wilcox
  1 sibling, 1 reply; 67+ messages in thread
From: Theodore Ts'o @ 2023-03-06 16:12 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Hannes Reinecke, Luis Chamberlain, Keith Busch, Pankaj Raghav,
	Daniel Gomez, Javier González, lsf-pc, linux-fsdevel,
	linux-mm, linux-block

On Mon, Mar 06, 2023 at 08:23:00AM +0000, Matthew Wilcox wrote:
> 
> All current filesystems that I'm aware of require their fs block size
> to be >= LBA size.  That is, you can't take a 512-byte blocksize ext2
> filesystem and put it on a 4kB LBA storage device.
> 
> That means that files can only grow/shrink in 256MB increments.  I
> don't think that amount of wasted space is going to be acceptable.
> So if we're serious about going down this path, we need to tell
> filesystem people to start working out how to support fs block
> size < LBA size.
> 
> That's a big ask, so let's be sure storage vendors actually want
> this.  Both supporting zoned devices & supporting 16k/64k block
> sizes are easier asks.

What HDD vendors want is to be able to have 32k or even 64k *physical*
sector sizes.  This allows for much more efficient erasure codes, so
it will increase their byte capacity now that it's no longer easier to
get capacity boosts by squeezing the tracks closer and closer, and
their have been various engineering tradeoffs with SMR, HAMR, and
MAMR.  HDD vendors have been asking for this at LSF/MM, and in other
venues for ***years***.

This doesn't necessarily mean that the *logical* sector size needs to
be larger.  What I could imagine HDD vendors doing is creating drives
with, say, a 4k logical block size and a 32k physical sector size.  This
means that 4k random writes will require read/modify/write cycles, which
isn't great from a performance perspective.  However, for those
customers who are using raw block
devices for their cluster file system, and for those customers who are
willing to, say, use ext4 with a 4k block size and a 32k cluster size
(using the bigalloc feature), all of the data blocks would be 32k
aligned, and this would work without any modifications.
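
Concretely, that is just something like the following mke2fs invocation
(device name illustrative):

# ext4 with 4k blocks but 32k clusters via bigalloc: allocations happen in
# 32k-sized, 32k-aligned clusters, so data IO lines up with a 32k physical
# sector.
mkfs.ext4 -b 4096 -O bigalloc -C 32768 /dev/sdX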

I suspect that if these drives were made available, this would allow
for a gradual transition to support larger block sizes.  The file
system level changes aren't *that* hard.  There is a chicken and egg
situation here; until these drives are generally available, the
incentive to do the work is minimal.  But with a 4k logical, 32k or
64k physical sector size, we can gradually improve our support for
these file systems with block size > page size, with cluster size >
page size being an intermediate step that would work today.

Cheers,

					- Ted

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-05  5:02               ` Matthew Wilcox
@ 2023-03-08  6:11                 ` Luis Chamberlain
  2023-03-08  7:59                   ` Dave Chinner
  0 siblings, 1 reply; 67+ messages in thread
From: Luis Chamberlain @ 2023-03-08  6:11 UTC (permalink / raw)
  To: Matthew Wilcox, Darrick J. Wong, Dave Chinner
  Cc: James Bottomley, Keith Busch, Theodore Ts'o, Pankaj Raghav,
	Daniel Gomez, lsf-pc, linux-fsdevel, linux-mm, linux-block

On Sun, Mar 05, 2023 at 05:02:43AM +0000, Matthew Wilcox wrote:
> On Sat, Mar 04, 2023 at 08:15:50PM -0800, Luis Chamberlain wrote:
> > On Sat, Mar 04, 2023 at 04:39:02PM +0000, Matthew Wilcox wrote:
> > > XFS already works with arbitrary-order folios. 
> > 
> > But block sizes > PAGE_SIZE is work which is still not merged. It
> > *can* be with time. That would allow one to muck with larger block
> > sizes than 4k on x86-64 for instance. Without this, you can't play
> > ball.
> 
> Do you mean that XFS is checking that fs block size <= PAGE_SIZE and
> that check needs to be dropped?  If so, I don't see where that happens.

None of that. Back in 2018 Chinner had prototyped XFS support with
larger block size > PAGE_SIZE:

https://lwn.net/ml/linux-fsdevel/20181107063127.3902-1-david@fromorbit.com/

I just did a quick attempt to rebase it and most of the leftover work
is actually in IOMAP for writeback and zeroing / writes requiring a new
zero-around functionality. All bugs in the rebase are my own; it is only
compile tested so far, and I'm not happy with some of the changes I had
to make, so it could likely use tons more love:

https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20230307-larger-bs-then-ps-xfs

But it should give you an idea of what type of things filesystems need to do.

And so, each fs would need to decide if it wants to support this sort
of work. It is important from a support perspective, since otherwise it's
hard to procure systems with a PAGE_SIZE larger than 4k.

  Luis

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-08  6:11                 ` Luis Chamberlain
@ 2023-03-08  7:59                   ` Dave Chinner
  0 siblings, 0 replies; 67+ messages in thread
From: Dave Chinner @ 2023-03-08  7:59 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Matthew Wilcox, Darrick J. Wong, James Bottomley, Keith Busch,
	Theodore Ts'o, Pankaj Raghav, Daniel Gomez, lsf-pc,
	linux-fsdevel, linux-mm, linux-block

On Tue, Mar 07, 2023 at 10:11:43PM -0800, Luis Chamberlain wrote:
> On Sun, Mar 05, 2023 at 05:02:43AM +0000, Matthew Wilcox wrote:
> > On Sat, Mar 04, 2023 at 08:15:50PM -0800, Luis Chamberlain wrote:
> > > On Sat, Mar 04, 2023 at 04:39:02PM +0000, Matthew Wilcox wrote:
> > > > XFS already works with arbitrary-order folios. 
> > > 
> > > But block sizes > PAGE_SIZE is work which is still not merged. It
> > > *can* be with time. That would allow one to muck with larger block
> > > sizes than 4k on x86-64 for instance. Without this, you can't play
> > > ball.
> > 
> > Do you mean that XFS is checking that fs block size <= PAGE_SIZE and
> > that check needs to be dropped?  If so, I don't see where that happens.
> 
> None of that. Back in 2018 Chinner had prototyped XFS support with
> larger block size > PAGE_SIZE:
> 
> https://lwn.net/ml/linux-fsdevel/20181107063127.3902-1-david@fromorbit.com/

Having a working BS > PS implementation on XFS based on variable
page order support in the page cache goes back over a
decade before that.

Christoph Lameter did the page cache work, and I added support for
XFS back in 2007. THe total change to XFS required can be seen in
this simple patch:

https://lore.kernel.org/linux-mm/20070423093152.GI32602149@melbourne.sgi.com/

That was when the howls of anguish about high order allocations
Willy mentioned started....

> I just did a quick attempt to rebase it and most of the leftover work
> is actually in IOMAP for writeback and zeroing / writes requiring a new
> zero-around functionality. All bugs in the rebase are my own; it is only
> compile tested so far, and I'm not happy with some of the changes I had
> to make, so it could likely use tons more love:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20230307-larger-bs-then-ps-xfs

On a current kernel, that patchset is fundamentally broken as we
have multi-page folio support in XFS and iomap - the patchset is
inherently PAGE_SIZE based and it will do the wrong thing with
PAGE_SIZE based zero-around.

IOWs, IOMAP_F_ZERO_AROUND does not need to exist any more, nor
should any of the custom hooks it triggered in different operations
for zero-around.  That's because we should now be using the same
approach to BS > PS as we first used back in 2007. We already
support multi-page folios in the page cache, so all the zero-around
and partial folio uptodate tracking we need is already in place.

Hence, like Willy said, all we need to do is have
filemap_get_folio(FGP_CREAT) always allocate at least filesystem
block sized and aligned folio and insert them into the mapping tree.
Multi-page folios will always need to be sized as an integer
multiple of the filesystem block size, but once we ensure size and
alignment of folios in the page cache, we get everything else for
free.

/me cues the howls of anguish over memory fragmentation....

> But it should give you an idea of what type of things filesystems need to do.

Not really. It gives you an idea of what filesystems needed to do 5
years ago to support BS > PS. We're living in the age of folios now,
not pages.  Willy starting work on folios was why I dropped that
patch set, firstly because it was going to make the iomap conversion
to folios harder, and secondly, we realised that none of it was
necessary if folios supported multi-page constructs in the page
cache natively.

IOWs, multi-page folios in the page cache should make BS > PS mostly
trivial to support for any filesystem or block device that doesn't
have some other dependency on PAGE_SIZE objects in the page cache
(e.g. bufferheads).

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-06 16:12                   ` Theodore Ts'o
@ 2023-03-08 17:53                     ` Matthew Wilcox
  2023-03-08 18:13                       ` James Bottomley
  0 siblings, 1 reply; 67+ messages in thread
From: Matthew Wilcox @ 2023-03-08 17:53 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Hannes Reinecke, Luis Chamberlain, Keith Busch, Pankaj Raghav,
	Daniel Gomez, Javier González, lsf-pc, linux-fsdevel,
	linux-mm, linux-block

On Mon, Mar 06, 2023 at 11:12:14AM -0500, Theodore Ts'o wrote:
> What HDD vendors want is to be able to have 32k or even 64k *physical*
> sector sizes.  This allows for much more efficient erasure codes, so
> it will increase their byte capacity now that it's no longer easier to
> get capacity boosts by squeezing the tracks closer and closer, and
> their have been various engineering tradeoffs with SMR, HAMR, and
> MAMR.  HDD vendors have been asking for this at LSF/MM, and in other
> venues for ***years***.

I've been reminded by a friend who works on the drive side that a
motivation for the SSD vendors is (essentially) the size of sector_t.
Once the drive needs to support more than 2/4 billion sectors, they
need to move to a 64-bit sector size, so the amount of memory consumed
by the FTL doubles, the CPU data cache becomes half as effective, etc.
That significantly increases the BOM for the drive, and so they have
to charge more.  With a 512-byte LBA, that's 2TB; with a 4096-byte LBA,
it's at 16TB and with a 64k LBA, they can keep using 32-bit LBA numbers
all the way up to 256TB.
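
(That is just 2^32 addressable blocks times the LBA size:)

$ for lba in 512 4096 65536; do echo "$lba: $(( (1 << 32) * lba >> 40 )) TiB"; done
512: 2 TiB
4096: 16 TiB
65536: 256 TiB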

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-08 17:53                     ` Matthew Wilcox
@ 2023-03-08 18:13                       ` James Bottomley
  2023-03-09  8:04                         ` Javier González
  0 siblings, 1 reply; 67+ messages in thread
From: James Bottomley @ 2023-03-08 18:13 UTC (permalink / raw)
  To: Matthew Wilcox, Theodore Ts'o
  Cc: Hannes Reinecke, Luis Chamberlain, Keith Busch, Pankaj Raghav,
	Daniel Gomez, Javier González, lsf-pc, linux-fsdevel,
	linux-mm, linux-block

On Wed, 2023-03-08 at 17:53 +0000, Matthew Wilcox wrote:
> On Mon, Mar 06, 2023 at 11:12:14AM -0500, Theodore Ts'o wrote:
> > What HDD vendors want is to be able to have 32k or even 64k
> > *physical* sector sizes.  This allows for much more efficient
> > erasure codes, so it will increase their byte capacity now that
> > it's no longer easier to get capacity boosts by squeezing the
> > tracks closer and closer, and their have been various engineering
> > tradeoffs with SMR, HAMR, and MAMR.  HDD vendors have been asking
> > for this at LSF/MM, and in othervenues for ***years***.
> 
> I've been reminded by a friend who works on the drive side that a
> motivation for the SSD vendors is (essentially) the size of sector_t.
> Once the drive needs to support more than 2/4 billion sectors, they
> need to move to a 64-bit sector size, so the amount of memory
> consumed by the FTL doubles, the CPU data cache becomes half as
> effective, etc. That significantly increases the BOM for the drive,
> and so they have to charge more.  With a 512-byte LBA, that's 2TB;
> with a 4096-byte LBA, it's at 16TB and with a 64k LBA, they can keep
> using 32-bit LBA numbers all the way up to 256TB.

I thought the FTL operated on physical sectors and the logical to
physical was done as a RMW through the FTL?  In which case sector_t
shouldn't matter to the SSD vendors for FTL management because they can
keep the logical sector size while increasing the physical one. 
Obviously if physical size goes above the FS block size, the drives
will behave suboptimally with RMWs, which is why 4k physical is the max
currently.

James


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-05 11:22               ` Hannes Reinecke
  2023-03-06  8:23                 ` Matthew Wilcox
@ 2023-03-08 19:35                 ` Luis Chamberlain
  2023-03-08 19:55                 ` Bart Van Assche
  2 siblings, 0 replies; 67+ messages in thread
From: Luis Chamberlain @ 2023-03-08 19:35 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Matthew Wilcox, Keith Busch, Theodore Ts'o, Pankaj Raghav,
	Daniel Gomez, Javier González, lsf-pc, linux-fsdevel,
	linux-mm, linux-block

On Sun, Mar 05, 2023 at 12:22:15PM +0100, Hannes Reinecke wrote:
> One can view zones as really large LBAs.
> 
> Indeed it might be suboptimal from the OS point of view.
> But from the device point of view it won't.
> And, in fact, with devices becoming faster and faster the question is
> whether sticking with relatively small sectors won't become a limiting
> factor eventually.
> 
> My point being that zones are just there because the I/O stack can only deal
> with sectors up to 4k. If the I/O stack would be capable of dealing
> with larger LBAs one could identify a zone with an LBA, and the entire issue
> of append-only and sequential writes would be moot.
> Even the entire concept of zones becomes irrelevant as the OS would
> trivially only write entire zones.
> 
> What I was saying is that 256M is not set in stone. It's just a compromise
> vendors used. Even if in the course of development we arrive
> at a lower number of max LBA we can handle (say, 2MB) I am pretty
> sure vendors will be quite interested in that.

So I'm re-reading this again and I see what you're suggesting now, Hannes.

You are not suggesting that the reason why we may want larger block
sizes is zone storage support.  Rather, you are suggesting that *if* we
support larger block sizes, they effectively could be used as a
replacement for smaller zone sizes.  Your comments about 256 MiB zones
are just a target max assumption for existing known zones.

So in that sense, you seem to suggest that users of smaller zone sizes
could potentially look at using larger block sizes instead, as there
would be no other new "feature" needed beyond the existing efforts to
ensure higher-order folio support is in place and / or buffer heads are
addressed.

But this misses the gains of zone storage on the FTL. The strong semantics
of sequential writes and a write pointer differ from how an existing storage
controller may deal with writing to *one* block. You are not forbidden to
just modify a bit in non-zoned storage; behind the scenes the FTL would do
whatever it thinks it has to, very likely a read-modify-write, and it may
just splash the write into one fresh block for you, so the write appears to
happen in a flash but in reality it used a bit of the over-provisioning
blocks. But with zone storage you get a considerable reduction in
over-provisioning, which we don't get with simple larger block size support
for non-zoned drives.

  Luis

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-05 11:22               ` Hannes Reinecke
  2023-03-06  8:23                 ` Matthew Wilcox
  2023-03-08 19:35                 ` Luis Chamberlain
@ 2023-03-08 19:55                 ` Bart Van Assche
  2 siblings, 0 replies; 67+ messages in thread
From: Bart Van Assche @ 2023-03-08 19:55 UTC (permalink / raw)
  To: Hannes Reinecke, Matthew Wilcox
  Cc: Luis Chamberlain, Keith Busch, Theodore Ts'o, Pankaj Raghav,
	Daniel Gomez, Javier González, lsf-pc, linux-fsdevel,
	linux-mm, linux-block

On 3/5/23 03:22, Hannes Reinecke wrote:
> My point being that zones are just there because the I/O stack can only 
> deal with sectors up to 4k. If the I/O stack would be capable of dealing
> with larger LBAs one could identify a zone with an LBA, and the entire 
> issue of append-only and sequential writes would be moot.
> Even the entire concept of zones becomes irrelevant as the OS would 
> trivially only write entire zones.

That's not correct. Even if the block layer core supported logical 
block sizes of 1 GiB or higher, a logical block size of 16 KiB will 
yield better performance than logical block size = zone size. The write 
amplification factor (WAF) would be huge for databases if the logical 
block size were much larger than the typical amount of data written 
during a database update (16 KiB?).

Bart.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-08 18:13                       ` James Bottomley
@ 2023-03-09  8:04                         ` Javier González
  2023-03-09 13:11                           ` James Bottomley
  0 siblings, 1 reply; 67+ messages in thread
From: Javier González @ 2023-03-09  8:04 UTC (permalink / raw)
  To: James Bottomley
  Cc: Matthew Wilcox, Theodore Ts'o, Hannes Reinecke,
	Luis Chamberlain, Keith Busch, Pankaj Raghav, Daniel Gomez,
	lsf-pc, linux-fsdevel, linux-mm, linux-block

On 08.03.2023 13:13, James Bottomley wrote:
>On Wed, 2023-03-08 at 17:53 +0000, Matthew Wilcox wrote:
>> On Mon, Mar 06, 2023 at 11:12:14AM -0500, Theodore Ts'o wrote:
>> > What HDD vendors want is to be able to have 32k or even 64k
>> > *physical* sector sizes.  This allows for much more efficient
>> > erasure codes, so it will increase their byte capacity now that
>> > it's no longer easier to get capacity boosts by squeezing the
>> > tracks closer and closer, and their have been various engineering
>> > tradeoffs with SMR, HAMR, and MAMR.  HDD vendors have been asking
>> > for this at LSF/MM, and in othervenues for ***years***.
>>
>> I've been reminded by a friend who works on the drive side that a
>> motivation for the SSD vendors is (essentially) the size of sector_t.
>> Once the drive needs to support more than 2/4 billion sectors, they
>> need to move to a 64-bit sector size, so the amount of memory
>> consumed by the FTL doubles, the CPU data cache becomes half as
>> effective, etc. That significantly increases the BOM for the drive,
>> and so they have to charge more.  With a 512-byte LBA, that's 2TB;
>> with a 4096-byte LBA, it's at 16TB and with a 64k LBA, they can keep
>> using 32-bit LBA numbers all the way up to 256TB.
>
>I thought the FTL operated on physical sectors and the logical to
>physical was done as a RMW through the FTL?  In which case sector_t
>shouldn't matter to the SSD vendors for FTL management because they can
>keep the logical sector size while increasing the physical one.
>Obviously if physical size goes above the FS block size, the drives
>will behave suboptimally with RMWs, which is why 4k physical is the max
>currently.
>

FTL designs are complex. We have ways to maintain sector sizes under 64
bits, but this is a common industry problem.

The media itself does not normally operate at 4K. Page sizes can be 16K,
32K, etc. Increasing the block size would allow for better host/device
cooperation. As Ted mentions, this has been a requirement for HDD and
SSD vendors for years. It seems to us that the time is right now and that
we have the mechanisms in Linux to do the plumbing. Folios are obviously a
big part of this.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-09  8:04                         ` Javier González
@ 2023-03-09 13:11                           ` James Bottomley
  2023-03-09 14:05                             ` Keith Busch
                                               ` (2 more replies)
  0 siblings, 3 replies; 67+ messages in thread
From: James Bottomley @ 2023-03-09 13:11 UTC (permalink / raw)
  To: Javier González
  Cc: Matthew Wilcox, Theodore Ts'o, Hannes Reinecke,
	Luis Chamberlain, Keith Busch, Pankaj Raghav, Daniel Gomez,
	lsf-pc, linux-fsdevel, linux-mm, linux-block

On Thu, 2023-03-09 at 09:04 +0100, Javier González wrote:
> On 08.03.2023 13:13, James Bottomley wrote:
> > On Wed, 2023-03-08 at 17:53 +0000, Matthew Wilcox wrote:
> > > On Mon, Mar 06, 2023 at 11:12:14AM -0500, Theodore Ts'o wrote:
> > > > What HDD vendors want is to be able to have 32k or even 64k
> > > > *physical* sector sizes.  This allows for much more efficient
> > > > erasure codes, so it will increase their byte capacity now that
> > > > it's no longer easier to get capacity boosts by squeezing the
> > > > tracks closer and closer, and their have been various
> > > > engineering tradeoffs with SMR, HAMR, and MAMR.  HDD vendors
> > > > have been asking for this at LSF/MM, and in othervenues for
> > > > ***years***.
> > > 
> > > I've been reminded by a friend who works on the drive side that a
> > > motivation for the SSD vendors is (essentially) the size of
> > > sector_t. Once the drive needs to support more than 2/4 billion
> > > sectors, they need to move to a 64-bit sector size, so the amount
> > > of memory consumed by the FTL doubles, the CPU data cache becomes
> > > half as effective, etc. That significantly increases the BOM for
> > > the drive, and so they have to charge more.  With a 512-byte LBA,
> > > that's 2TB; with a 4096-byte LBA, it's at 16TB and with a 64k
> > > LBA, they can keep using 32-bit LBA numbers all the way up to
> > > 256TB.
> > 
> > I thought the FTL operated on physical sectors and the logical to
> > physical was done as a RMW through the FTL?  In which case sector_t
> > shouldn't matter to the SSD vendors for FTL management because they
> > can keep the logical sector size while increasing the physical one.
> > Obviously if physical size goes above the FS block size, the drives
> > will behave suboptimally with RMWs, which is why 4k physical is the
> > max currently.
> > 
> 
> FTL designs are complex. We have ways to maintain sector sizes under
> 64 bits, but this is a common industry problem.
> 
> The media itself does not normally operate at 4K. Page sizes can be
> 16K, 32K, etc.

Right, and we've always said if we knew what this size was we could
make better block write decisions.  However, today if you look what
most NVMe devices are reporting, it's a bit sub-optimal:

jejb@lingrow:/sys/block/nvme1n1/queue> cat logical_block_size 
512
jejb@lingrow:/sys/block/nvme1n1/queue> cat physical_block_size 
512
jejb@lingrow:/sys/block/nvme1n1/queue> cat optimal_io_size 
0

If we do get Linux to support large block sizes, are we actually going
to get better information out of the devices?

>  Increasing the block size would allow for better host/device
> cooperation. As Ted mentions, this has been a requirement for HDD and
> SSD vendors for years. It seems to us that the time is right now and
> that we have the mechanisms in Linux to do the plumbing. Folios are
> obviously a big part of this.

Well a decade ago we did a lot of work to support 4k sector devices.
Ultimately the industry went with 512 logical/4k physical devices
because of problems with non-Linux proprietary OSs but you could still
use 4k today if you wanted (I've actually still got a working 4k SCSI
drive), so why is no NVMe device doing that?

This is not to say I think larger block sizes is in any way a bad idea
... I just think that given the history, it will be driven by
application needs rather than what the manufacturers tell us.

James


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-09 13:11                           ` James Bottomley
@ 2023-03-09 14:05                             ` Keith Busch
  2023-03-09 15:23                             ` Martin K. Petersen
  2023-03-10  7:59                             ` Javier González
  2 siblings, 0 replies; 67+ messages in thread
From: Keith Busch @ 2023-03-09 14:05 UTC (permalink / raw)
  To: James Bottomley
  Cc: Javier González, Matthew Wilcox, Theodore Ts'o,
	Hannes Reinecke, Luis Chamberlain, Pankaj Raghav, Daniel Gomez,
	lsf-pc, linux-fsdevel, linux-mm, linux-block

On Thu, Mar 09, 2023 at 08:11:35AM -0500, James Bottomley wrote:
> On Thu, 2023-03-09 at 09:04 +0100, Javier González wrote:
> > FTL designs are complex. We have ways to maintain sector sizes under
> > 64 bits, but this is a common industry problem.
> > 
> > The media itself does not normally oeprate at 4K. Page siges can be
> > 16K, 32K, etc.
> 
> Right, and we've always said if we knew what this size was we could
> make better block write decisions.  However, today if you look what
> most NVMe devices are reporting, it's a bit sub-optimal:

Your sample size may be off if your impression is that "most" NVMe drives
report themselves this way. :)
 
> jejb@lingrow:/sys/block/nvme1n1/queue> cat logical_block_size 
> 512
> jejb@lingrow:/sys/block/nvme1n1/queue> cat physical_block_size 
> 512
> jejb@lingrow:/sys/block/nvme1n1/queue> cat optimal_io_size 
> 0
> 
> If we do get Linux to support large block sizes, are we actually going
> to get better information out of the devices?
> 
> >  Increasing the block size would allow for better host/device
> > cooperation. As Ted mentions, this has been a requirement for HDD and
> > SSD vendor for years. It seems to us that the time is right now and
> > that we have mechanisms in Linux to do the plumbing. Folios is
> > ovbiously a big part of this.
> 
> Well a decade ago we did a lot of work to support 4k sector devices.
> Ultimately the industry went with 512 logical/4k physical devices
> because of problems with non-Linux proprietary OSs but you could still
> use 4k today if you wanted (I've actually still got a working 4k SCSI
> drive), so why is no NVMe device doing that?

In my experience, all but the cheapest consumer grade nvme devices report 4k
logical. They all support an option to emulate 512b if you really want it,
but the more optimal 4k is the most common default for server grade nvme.
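
Both can be inspected and switched with nvme-cli (the format index below is
illustrative and device specific; note that a format operation erases the
namespace's data):

# List the namespace's supported LBA formats, then reformat to the 4k one.
nvme id-ns /dev/nvme0n1 --human-readable
nvme format /dev/nvme0n1 --lbaf=1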

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-09 13:11                           ` James Bottomley
  2023-03-09 14:05                             ` Keith Busch
@ 2023-03-09 15:23                             ` Martin K. Petersen
  2023-03-09 20:49                               ` James Bottomley
  2023-03-10  7:59                             ` Javier González
  2 siblings, 1 reply; 67+ messages in thread
From: Martin K. Petersen @ 2023-03-09 15:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: Javier González, Matthew Wilcox, Theodore Ts'o,
	Hannes Reinecke, Luis Chamberlain, Keith Busch, Pankaj Raghav,
	Daniel Gomez, lsf-pc, linux-fsdevel, linux-mm, linux-block


James,

> Well a decade ago we did a lot of work to support 4k sector devices.
> Ultimately the industry went with 512 logical/4k physical devices
> because of problems with non-Linux proprietary OSs but you could still
> use 4k today if you wanted (I've actually still got a working 4k SCSI
> drive), so why is no NVMe device doing that?

FWIW, I have SATA, SAS, and NVMe devices that report 4KB logical.

The reason the industry converged on 512e is that the performance
problems were solved by ensuring correct alignment and transfer length.

Almost every I/O we submit is a multiple of 4KB. So if things are
properly aligned wrt. the device's physical block size, it is irrelevant
whether we express CDB fields in units of 512 bytes or 4KB. We're still
transferring the same number of bytes.

In addition 512e had two additional advantages that 4Kn didn't:

1. Legacy applications doing direct I/O and expecting 512-byte blocks
   kept working (albeit with a penalty for writes smaller than a
   physical block).

2. For things like PI where the 16-bit CRC is underwhelming wrt.
   detecting errors in 4096 bytes of data, leaving the protection
   interval at 512 bytes was also a benefit. So while 4Kn adoption
   looked strong inside enterprise disk arrays initially, several
   vendors ended up with 512e for PI reasons.

Once I/Os from the OS were properly aligned, there was just no
compelling reason for anyone to go with 4Kn and having to deal with
multiple SKUs, etc.

For NVMe 4Kn was prevalent for a while but drives have started
gravitating towards 512n/512e. Perhaps because of (1) above. Plus
whatever problems there may be on other platforms as you mentioned...

> This is not to say I think larger block sizes is in any way a bad idea
> ... I just think that given the history, it will be driven by
> application needs rather than what the manufacturers tell us.

I think it would be beneficial for Linux to support filesystem blocks
larger than the page size. Based on experience outlined above, I am not
convinced larger logical block sizes will get much traction. But that
doesn't prevent devices from advertising larger physical/minimum/optimal
I/O sizes and for us to handle those more gracefully than we currently
do.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-09 15:23                             ` Martin K. Petersen
@ 2023-03-09 20:49                               ` James Bottomley
  2023-03-09 21:13                                 ` Luis Chamberlain
  0 siblings, 1 reply; 67+ messages in thread
From: James Bottomley @ 2023-03-09 20:49 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Javier González, Matthew Wilcox, Theodore Ts'o,
	Hannes Reinecke, Luis Chamberlain, Keith Busch, Pankaj Raghav,
	Daniel Gomez, lsf-pc, linux-fsdevel, linux-mm, linux-block

On Thu, 2023-03-09 at 10:23 -0500, Martin K. Petersen wrote:
> > This is not to say I think larger block sizes is in any way a bad
> > idea ... I just think that given the history, it will be driven by
> > application needs rather than what the manufacturers tell us.
> 
> I think it would be beneficial for Linux to support filesystem blocks
> larger than the page size. Based on experience outlined above, I am
> not convinced larger logical block sizes will get much traction. But
> that doesn't prevent devices from advertising larger
> physical/minimum/optimal I/O sizes and for us to handle those more
> gracefully than we currently do.

Right, I was wondering if we could try to persuade the manufacturers to
advertise a more meaningful optimal I/O size ...  But as you say, the
pressure is coming from applications and filesystems for larger block
sizes and that will create I/O patterns that are more beneficial to the
underlying device hardware regardless of whether it actually tells us
anything about what it would like.

James


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-09 20:49                               ` James Bottomley
@ 2023-03-09 21:13                                 ` Luis Chamberlain
  2023-03-09 21:28                                   ` Martin K. Petersen
  0 siblings, 1 reply; 67+ messages in thread
From: Luis Chamberlain @ 2023-03-09 21:13 UTC (permalink / raw)
  To: James Bottomley, Dan Helmick
  Cc: Martin K. Petersen, Javier González, Matthew Wilcox,
	Theodore Ts'o, Hannes Reinecke, Keith Busch, Pankaj Raghav,
	Daniel Gomez, lsf-pc, linux-fsdevel, linux-mm, linux-block

On Thu, Mar 09, 2023 at 03:49:50PM -0500, James Bottomley wrote:
> On Thu, 2023-03-09 at 10:23 -0500, Martin K. Petersen wrote:
> > > This is not to say I think larger block sizes is in any way a bad
> > > idea ... I just think that given the history, it will be driven by
> > > application needs rather than what the manufacturers tell us.
> > 
> > I think it would be beneficial for Linux to support filesystem blocks
> > larger than the page size. Based on experience outlined above, I am
> > not convinced larger logical block sizes will get much traction. But
> > that doesn't prevent devices from advertising larger
> > physical/minimum/optimal I/O sizes and for us to handle those more
> > gracefully than we currently do.
> 
> Right, I was wondering if we could try to persuade the Manufacturers to
> advertise a more meaningful optimal I/O size ...

Advocacy for using meaningful values is a real thing; Dan Helmick talked
about this at the last SDC in 2022, at least for NVMe:

https://www.youtube.com/watch?v=3_M92RlVgIQ&ab_channel=SNIAVideo

A big future question is of course how / when to use these for filesystems.
Should there be, for instance, a 'mkfs --optimal-bs' or something which
may look at whatever hints the media provides? Or do we just leave the magic
incantations to the admins?

  Luis

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-09 21:13                                 ` Luis Chamberlain
@ 2023-03-09 21:28                                   ` Martin K. Petersen
  2023-03-10  1:16                                     ` Dan Helmick
  0 siblings, 1 reply; 67+ messages in thread
From: Martin K. Petersen @ 2023-03-09 21:28 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: James Bottomley, Dan Helmick, Martin K. Petersen,
	Javier González, Matthew Wilcox, Theodore Ts'o,
	Hannes Reinecke, Keith Busch, Pankaj Raghav, Daniel Gomez,
	lsf-pc, linux-fsdevel, linux-mm, linux-block


Luis,

> A big future question is of course how / when to use these for
> filesystems.  Should there be, for instance, a 'mkfs --optimal-bs' or
> something which may look at whatever hints the media provides? Or do we
> just leave the magic incantations to the admins?

mkfs already considers the reported queue limits (for the filesystems
most people use, anyway).

The problem is mainly that the devices don't report them. At least not
very often in the NVMe space. For SCSI devices, reporting these
parameters is quite common.
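
For example (values illustrative), mkfs.xfs derives its stripe unit/width
defaults from the reported limits when they are non-zero:

cat /sys/block/sdX/queue/minimum_io_size   # e.g. 65536
cat /sys/block/sdX/queue/optimal_io_size   # e.g. 524288
mkfs.xfs /dev/sdX                          # picks su/sw from the above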

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 67+ messages in thread

* RE: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-09 21:28                                   ` Martin K. Petersen
@ 2023-03-10  1:16                                     ` Dan Helmick
  0 siblings, 0 replies; 67+ messages in thread
From: Dan Helmick @ 2023-03-10  1:16 UTC (permalink / raw)
  To: Martin K. Petersen, Luis Chamberlain
  Cc: James Bottomley, Javier González, Matthew Wilcox,
	Theodore Ts'o, Hannes Reinecke, Keith Busch, Pankaj Raghav,
	Daniel Gomez, lsf-pc, linux-fsdevel, linux-mm, linux-block

> -----Original Message-----
> From: Martin K. Petersen [mailto:martin.petersen@oracle.com]
> Sent: Thursday, March 9, 2023 2:28 PM
> To: Luis Chamberlain <mcgrof@kernel.org>
> Cc: James Bottomley <James.Bottomley@hansenpartnership.com>; Dan
> Helmick <dan.helmick@samsung.com>; Martin K. Petersen
> <martin.petersen@oracle.com>; Javier González
> <javier.gonz@samsung.com>; Matthew Wilcox <willy@infradead.org>;
> Theodore Ts'o <tytso@mit.edu>; Hannes Reinecke <hare@suse.de>; Keith
> Busch <kbusch@kernel.org>; Pankaj Raghav <p.raghav@samsung.com>;
> Daniel Gomez <da.gomez@samsung.com>; lsf-pc@lists.linux-foundation.org;
> linux-fsdevel@vger.kernel.org; linux-mm@kvack.org; linux-
> block@vger.kernel.org
> Subject: Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
> 
> 
> Luis,
> 
> > A big future question is of course how / when to use these for
> > filesystems.  Should there be, for instance a 'mkfs --optimal-bs' or
> > something which may look whatever hints the media uses ? Or do we just
> > leaves the magic incantations to the admins?
> 
> mkfs already considers the reported queue limits (for the filesystems most
> people use, anyway).
> 
> The problem is mainly that the devices don't report them. At least not very
> often in the NVMe space. For SCSI devices, reporting these parameters is
> quite common.
> 
> --
> Martin K. Petersen	Oracle Linux Engineering

Support for the NVMe Optimal Performance parameters is increasing in the vendor ecosystem.  Customers are requiring this more and more from the vendors.  For example, the OCP DC NVMe SSD spec has NVMe-AD-2 and NVMe-OPT-7 [1].  Momentum is continuing as Optimal Read parameters were recently added to NVMe too.  More companies adding these parameters as a requirement to drive vendors would definitely help the momentum further.

I think there has been confusion among the vendors in the past on how to set the various values for the best Host behavior.  There are multiple (sometimes minor) inflection points in the performance of a drive.  Sure, 4KB is too small for the drive to report, but shall we report our 16KB, our 128KB, or some other inflection point?  How big a value can we push this to?  We would always favor the bigger number.

There are benefits for both Host and Drive (HDD and SSD) in having larger IOs.  Even if you have a drive reporting incorrect optimal parameters today, one can incubate the SW changes with larger IOs.  If nothing else, you'll instantly save on the overhead of communicating the higher number of commands.  Further, doing an IO sized as a multiple of the optimal parameters is also optimal.  Enabling anything in the range 16KB - 64KB would likely be a great start.

[1] https://www.opencompute.org/documents/datacenter-nvme-ssd-specification-v2-0r21-pdf


Dan

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-09 13:11                           ` James Bottomley
  2023-03-09 14:05                             ` Keith Busch
  2023-03-09 15:23                             ` Martin K. Petersen
@ 2023-03-10  7:59                             ` Javier González
  2 siblings, 0 replies; 67+ messages in thread
From: Javier González @ 2023-03-10  7:59 UTC (permalink / raw)
  To: James Bottomley
  Cc: Matthew Wilcox, Theodore Ts'o, Hannes Reinecke,
	Luis Chamberlain, Keith Busch, Pankaj Raghav, Daniel Gomez,
	lsf-pc, linux-fsdevel, linux-mm, linux-block

On 09.03.2023 08:11, James Bottomley wrote:
>On Thu, 2023-03-09 at 09:04 +0100, Javier González wrote:
>> On 08.03.2023 13:13, James Bottomley wrote:
>> > On Wed, 2023-03-08 at 17:53 +0000, Matthew Wilcox wrote:
>> > > On Mon, Mar 06, 2023 at 11:12:14AM -0500, Theodore Ts'o wrote:
>> > > > What HDD vendors want is to be able to have 32k or even 64k
>> > > > *physical* sector sizes.  This allows for much more efficient
>> > > > erasure codes, so it will increase their byte capacity now that
>> > > > it's no longer easier to get capacity boosts by squeezing the
>> > > > tracks closer and closer, and there have been various
>> > > > engineering tradeoffs with SMR, HAMR, and MAMR.  HDD vendors
>> > > > have been asking for this at LSF/MM, and in other venues, for
>> > > > ***years***.
>> > >
>> > > I've been reminded by a friend who works on the drive side that a
>> > > motivation for the SSD vendors is (essentially) the size of
>> > > sector_t. Once the drive needs to support more than 2/4 billion
>> > > sectors, they need to move to 64-bit sector numbers, so the amount
>> > > of memory consumed by the FTL doubles, the CPU data cache becomes
>> > > half as effective, etc. That significantly increases the BOM for
>> > > the drive, and so they have to charge more.  With a 512-byte LBA,
>> > > that's 2TB; with a 4096-byte LBA, it's 16TB, and with a 64k
>> > > LBA, they can keep using 32-bit LBA numbers all the way up to
>> > > 256TB.
>> >
>> > I thought the FTL operated on physical sectors and the logical to
>> > physical was done as a RMW through the FTL?  In which case sector_t
>> > shouldn't matter to the SSD vendors for FTL management because they
>> > can keep the logical sector size while increasing the physical one.
>> > Obviously if physical size goes above the FS block size, the drives
>> > will behave suboptimally with RMWs, which is why 4k physical is the
>> > max currently.
>> >
>>
>> FTL designs are complex. We have ways to keep sector addressing under
>> 64 bits, but this is a common industry problem.
>>
>> The media itself does not normally operate at 4K. Page sizes can be
>> 16K, 32K, etc.
>
>Right, and we've always said if we knew what this size was we could
>make better block write decisions.  However, today if you look at what
>most NVMe devices are reporting, it's a bit sub-optimal:
>
>jejb@lingrow:/sys/block/nvme1n1/queue> cat logical_block_size
>512
>jejb@lingrow:/sys/block/nvme1n1/queue> cat physical_block_size
>512
>jejb@lingrow:/sys/block/nvme1n1/queue> cat optimal_io_size
>0
>
>If we do get Linux to support large block sizes, are we actually going
>to get better information out of the devices?

We already have this through the NVMe Optimal Performance parameters
(see Dan's response for this). Note that support for these values is
already plumbed into the kernel. If I recall correctly, Bart was the one
who did this work.
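
For reference, here is a simplified sketch of how namespace fields like
NPWG (Namespace Preferred Write Granularity) and NOWS (Namespace Optimal
Write Size) can be mapped onto the block layer queue limits.  This is a
paraphrase for illustration only, not the actual nvme driver code, and the
helper name is made up:

#include <linux/blkdev.h>
#include <linux/types.h>

/*
 * NPWG and NOWS are 0's based multiples of the logical block size, so a
 * value of 0 means "one logical block".  Map them onto the physical block
 * size and optimal I/O size that the rest of the stack already understands.
 */
static void map_nvme_perf_hints(struct request_queue *q,
				unsigned int lba_size, u16 npwg, u16 nows)
{
	blk_queue_physical_block_size(q, lba_size * (1 + npwg));
	blk_queue_io_opt(q, lba_size * (1 + nows));
}

These are the values that then show up under /sys/block/<dev>/queue/ as
physical_block_size and optimal_io_size.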

Moreover, from the vendor side, it is a challenge to expose larger LBAs
without wide support in OSs. I am confident that if we push for this work
and it fits existing FSs, we will see vendors exposing new LBA formats
alongside the existing ones at first (the same way we have 512b and 4K in
the same drive today), and eventually focusing only on larger LBA sizes.

>
>>  Increasing the block size would allow for better host/device
>> cooperation. As Ted mentions, this has been a requirement for HDD and
>> SSD vendors for years. It seems to us that the time is right now and
>> that we have mechanisms in Linux to do the plumbing. Folios are
>> obviously a big part of this.
>
>Well a decade ago we did a lot of work to support 4k sector devices.
>Ultimately the industry went with 512 logical/4k physical devices
>because of problems with non-Linux proprietary OSs, but you could still
>use 4k today if you wanted (I've actually still got a working 4k SCSI
>drive), so why is no NVMe device doing that?

Most NVMe devices report 4K today. Actually 512b is mostly an
optimization targeted at read-heavy workloads.

>
>This is not to say I think larger block sizes are in any way a bad idea
>... I just think that given the history, it will be driven by
>application needs rather than what the manufacturers tell us.

I see more and more that this deserves a session at LSF/MM.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-03 22:32           ` Keith Busch
  2023-03-03 23:09             ` Luis Chamberlain
@ 2023-03-16 15:29             ` Pankaj Raghav
  2023-03-16 15:41               ` Pankaj Raghav
  1 sibling, 1 reply; 67+ messages in thread
From: Pankaj Raghav @ 2023-03-16 15:29 UTC (permalink / raw)
  To: Keith Busch, Luis Chamberlain
  Cc: Matthew Wilcox, Theodore Ts'o, Daniel Gomez,
	Javier González, lsf-pc, linux-fsdevel, linux-mm,
	linux-block, Dave Chinner, Christoph Hellwig

Hi Keith,

On 2023-03-03 23:32, Keith Busch wrote:
>> Yes, clearly it says *yet* so that begs the question what would be
>> required?
> 
> Oh, gotcha. I'll work on a list of places it currently crashes.
>  
I started looking into this to see why it crashes when we increase the LBA
size of a block device beyond the page size. These are my primary
findings:

- Block device aops (address_space_operations) are all based on buffer
heads, which limits us to working on PAGE_SIZE chunks only.

For an 8k LBA size, the stack trace you posted ultimately fails inside
alloc_page_buffers as the size will be > PAGE_SIZE.

struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
		bool retry)
{
	struct buffer_head *bh, *head;
....
	head = NULL;
	offset = PAGE_SIZE;
	while ((offset -= size) >= 0) {
		// we will not go into this loop as offset will be negative
	...
	...
	}
	return head;
}

- As Dave Chinner pointed out later in the thread, we allocate pages in the
page cache with order 0, instead of the block size of the device or the
filesystem. Letting filemap_get_folio(FGP_CREAT) allocate folios of the LBA
size for a block device should solve that problem, I guess.
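
To illustrate the idea (a conceptual sketch only, not a working patch; the
helper name is made up, and index alignment to the folio order is simply
assumed), allocating a folio that matches the LBA size and inserting it
into the block device's page cache could look roughly like this:

#include <linux/mm.h>
#include <linux/pagemap.h>

/*
 * Allocate a folio large enough to hold one logical block and add it to
 * the page cache, instead of the default order-0 page.  For an 8k LBA on
 * a 4k PAGE_SIZE system, get_order() gives order 1.
 */
static struct folio *bdev_grab_block_folio(struct address_space *mapping,
					   pgoff_t index, unsigned int lba_size)
{
	unsigned int order = get_order(lba_size);
	struct folio *folio;

	folio = folio_alloc(GFP_KERNEL, order);
	if (!folio)
		return NULL;

	if (filemap_add_folio(mapping, folio, index, GFP_KERNEL)) {
		folio_put(folio);
		return NULL;
	}
	return folio;
}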

Is it a crazy idea to convert the block device aops (block/fops.c) to use
iomap, which supports higher-order folios, instead of mpage and the other
buffer-head-based helpers?
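
As a very rough sketch of the shape such a conversion could take (not a
working patch: a block device maps 1:1 from file offset to disk offset, so
the iomap_begin callback is nearly trivial; i_size clamping, the write path
and error handling are all glossed over, and the function names are only
illustrative):

#include <linux/blkdev.h>
#include <linux/iomap.h>

static int bdev_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
			    unsigned int flags, struct iomap *iomap,
			    struct iomap *srcmap)
{
	iomap->bdev = I_BDEV(inode);
	iomap->offset = pos;
	iomap->length = length;
	iomap->addr = pos;		/* 1:1 logical-to-physical mapping */
	iomap->type = IOMAP_MAPPED;
	return 0;
}

static const struct iomap_ops bdev_iomap_ops = {
	.iomap_begin	= bdev_iomap_begin,
};

/* A read_folio aop built on iomap, which handles large folios natively. */
static int bdev_iomap_read_folio(struct file *file, struct folio *folio)
{
	return iomap_read_folio(folio, &bdev_iomap_ops);
}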

Let me know your thoughts.
--
Pankaj

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
  2023-03-16 15:29             ` Pankaj Raghav
@ 2023-03-16 15:41               ` Pankaj Raghav
  0 siblings, 0 replies; 67+ messages in thread
From: Pankaj Raghav @ 2023-03-16 15:41 UTC (permalink / raw)
  To: Keith Busch, Luis Chamberlain
  Cc: Matthew Wilcox, Theodore Ts'o, Daniel Gomez,
	Javier González, lsf-pc, linux-fsdevel, linux-mm,
	linux-block, Dave Chinner, Christoph Hellwig

On 2023-03-16 16:29, Pankaj Raghav wrote:
> Hi Keith,
> 
> On 2023-03-03 23:32, Keith Busch wrote:
>>> Yes, clearly it says *yet* so that begs the question what would be
>>> required?
>>
>> Oh, gotcha. I'll work on a list of places it currently crashes.
>>  
> I started looking into this to see why it crashes when we increase the LBA
> size of a block device beyond the page size. These are my primary
> findings:
> 
> - Block device aops (address_space_operations) are all based on buffer
> heads, which limits us to working on PAGE_SIZE chunks only.
> 
> For an 8k LBA size, the stack trace you posted ultimately fails inside
> alloc_page_buffers as the size will be > PAGE_SIZE.
> 
> struct buffer_head *alloc_page_buffers(struct page *page, unsigned long
> size, bool retry)
> 
Aghh. Sorry for the ugly formatting:

struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
		bool retry)
{
	struct buffer_head *bh, *head;
	gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
	long offset;
	struct mem_cgroup *memcg, *old_memcg;

	if (retry)
		gfp |= __GFP_NOFAIL;

	/* The page lock pins the memcg */
	memcg = page_memcg(page);
	old_memcg = set_active_memcg(memcg);

	head = NULL;
	offset = PAGE_SIZE;
	while ((offset -= size) >= 0) {
		// we will not go into this loop as offset will be negative
		...
		...
	}
	...
	return head;	// we return NULL for LBA size > 4k
}

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
@ 2023-07-16  4:09 BELINDA Goodpaster kelly
  0 siblings, 0 replies; 67+ messages in thread
From: BELINDA Goodpaster kelly @ 2023-07-16  4:09 UTC (permalink / raw)
  To: tytso; +Cc: linux-block, linux-fsdevel, linux-mm, lsf-pc

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]



[-- Attachment #2: Type: text/html, Size: 23 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2023-07-18  4:06 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-01  3:52 [LSF/MM/BPF TOPIC] Cloud storage optimizations Theodore Ts'o
2023-03-01  4:18 ` Gao Xiang
2023-03-01  4:40   ` Matthew Wilcox
2023-03-01  4:59     ` Gao Xiang
2023-03-01  4:35 ` Matthew Wilcox
2023-03-01  4:49   ` Gao Xiang
2023-03-01  5:01     ` Matthew Wilcox
2023-03-01  5:09       ` Gao Xiang
2023-03-01  5:19         ` Gao Xiang
2023-03-01  5:42         ` Matthew Wilcox
2023-03-01  5:51           ` Gao Xiang
2023-03-01  6:00             ` Gao Xiang
2023-03-02  3:13 ` Chaitanya Kulkarni
2023-03-02  3:50 ` Darrick J. Wong
2023-03-03  3:03   ` Martin K. Petersen
2023-03-02 20:30 ` Bart Van Assche
2023-03-03  3:05   ` Martin K. Petersen
2023-03-03  1:58 ` Keith Busch
2023-03-03  3:49   ` Matthew Wilcox
2023-03-03 11:32     ` Hannes Reinecke
2023-03-03 13:11     ` James Bottomley
2023-03-04  7:34       ` Matthew Wilcox
2023-03-04 13:41         ` James Bottomley
2023-03-04 16:39           ` Matthew Wilcox
2023-03-05  4:15             ` Luis Chamberlain
2023-03-05  5:02               ` Matthew Wilcox
2023-03-08  6:11                 ` Luis Chamberlain
2023-03-08  7:59                   ` Dave Chinner
2023-03-06 12:04               ` Hannes Reinecke
2023-03-06  3:50             ` James Bottomley
2023-03-04 19:04         ` Luis Chamberlain
2023-03-03 21:45     ` Luis Chamberlain
2023-03-03 22:07       ` Keith Busch
2023-03-03 22:14         ` Luis Chamberlain
2023-03-03 22:32           ` Keith Busch
2023-03-03 23:09             ` Luis Chamberlain
2023-03-16 15:29             ` Pankaj Raghav
2023-03-16 15:41               ` Pankaj Raghav
2023-03-03 23:51       ` Bart Van Assche
2023-03-04 11:08       ` Hannes Reinecke
2023-03-04 13:24         ` Javier González
2023-03-04 16:47         ` Matthew Wilcox
2023-03-04 17:17           ` Hannes Reinecke
2023-03-04 17:54             ` Matthew Wilcox
2023-03-04 18:53               ` Luis Chamberlain
2023-03-05  3:06               ` Damien Le Moal
2023-03-05 11:22               ` Hannes Reinecke
2023-03-06  8:23                 ` Matthew Wilcox
2023-03-06 10:05                   ` Hannes Reinecke
2023-03-06 16:12                   ` Theodore Ts'o
2023-03-08 17:53                     ` Matthew Wilcox
2023-03-08 18:13                       ` James Bottomley
2023-03-09  8:04                         ` Javier González
2023-03-09 13:11                           ` James Bottomley
2023-03-09 14:05                             ` Keith Busch
2023-03-09 15:23                             ` Martin K. Petersen
2023-03-09 20:49                               ` James Bottomley
2023-03-09 21:13                                 ` Luis Chamberlain
2023-03-09 21:28                                   ` Martin K. Petersen
2023-03-10  1:16                                     ` Dan Helmick
2023-03-10  7:59                             ` Javier González
2023-03-08 19:35                 ` Luis Chamberlain
2023-03-08 19:55                 ` Bart Van Assche
2023-03-03  2:54 ` Martin K. Petersen
2023-03-03  3:29   ` Keith Busch
2023-03-03  4:20   ` Theodore Ts'o
2023-07-16  4:09 BELINDA Goodpaster kelly
