From: Christian Brauner <brauner@kernel.org>
To: lsf-pc@lists.linux-foundation.org
Cc: Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-btrfs@vger.kernel.org, linux-block@vger.kernel.org,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	Christoph Hellwig <hch@infradead.org>
Subject: [LSF/MM/BPF TOPIC] Dropping page cache of individual fs
Date: Tue, 16 Jan 2024 11:50:32 +0100
Message-ID: <20240116-tagelang-zugnummer-349edd1b5792@brauner>

Hey,

I'm not sure this even needs a full LSFMM discussion but since I
currently don't have time to work on the patch I may as well submit it.

Gnome recently got awarded 1M Euro by the Sovereign Tech Fund (STF). The
STF was created by the German government to fund public infrastructure:

"The Sovereign Tech Fund supports the development, improvement and
 maintenance of open digital infrastructure. Our goal is to sustainably
 strengthen the open source ecosystem. We focus on security, resilience,
 technological diversity, and the people behind the code." (cf. [1])

Gnome has proposed various specific projects including integrating
systemd-homed with Gnome. Systemd-homed provides various features and if
you're interested in details then you might find it useful to read [2].
It makes use of various new VFS and fs specific developments over the
last years.

One feature is encrypting the home directory via LUKS. An appropriate
image or device must contain a GPT partition table. Currently there's
only one partition which is a LUKS2 volume. Inside that LUKS2 volume is
a Linux filesystem. Currently supported are btrfs (see [4] though),
ext4, and xfs.

The following issue isn't specific to systemd-homed. Gnome wants to be
able to support locking encrypted home directories. For example, when
the laptop is suspended. To do this the luksSuspend command can be used.

The luksSuspend call is nothing else than a device mapper ioctl to
suspend the block device and its owning superblock/filesystem. Which in
turn is nothing but a freeze initiated from the block layer:

dm_suspend()
-> __dm_suspend()
   -> lock_fs()
      -> bdev_freeze()

So when we say luksSuspend we really mean block layer initiated freeze.
The overall goal or expectation of userspace is that after a luksSuspend
call all sensitive material has been evicted from relevant caches to
harden against various attacks. And luksSuspend does wipe the encryption
key and suspend the block device. However, the encryption key can still
be available in clear text in the page cache. To illustrate this problem
more simply:

truncate -s 500M /tmp/img
echo password | cryptsetup luksFormat /tmp/img --force-password
echo password | cryptsetup open /tmp/img test
mkfs.xfs /dev/mapper/test
mount /dev/mapper/test /mnt
echo "secrets" > /mnt/data
cryptsetup luksSuspend test
cat /mnt/data

This will still happily print the contents of /mnt/data even though the
block device and the owning filesystem are frozen because the data is
still in the page cache.

To my knowledge, the only current way to get the contents of /mnt/data
or the encryption key out of the page cache is via
/proc/sys/vm/drop_caches which is a big hammer.
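
For reference, a minimal sketch of what using that big hammer looks like
from userspace. The helper name and the `path` parameter are made up for
illustration; the real file can only be written by root, and the write
affects every filesystem on the system, not just the suspended one.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_caches(mode=1, path=DROP_CACHES):
    """Ask the kernel to drop clean caches system-wide (needs root).

    mode 1: page cache, 2: reclaimable slab (dentries, inodes), 3: both.
    `path` is a parameter only so the helper can be exercised without root.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    # Dirty pages are not dropped, so write everything back first.
    os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")
```

Calling drop_caches(1) as root would evict the clean page cache of all
mounted filesystems, which is exactly the lack of granularity at issue.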

My initial reaction is to give userspace an API to drop the page cache
of a specific filesystem which may have additional uses. I initially had
started drafting an ioctl() and then got swayed towards a
posix_fadvise() flag. I found out that this was already proposed a few
years ago but got rejected as it was suspected this might just be
someone toying around without a real world use-case. I think this here
might qualify as a real-world use-case.
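
To make the gap concrete: posix_fadvise() with POSIX_FADV_DONTNEED
already exists, but it works per file descriptor. The sketch below is
only an approximation of "drop the page cache of one filesystem" built
from that existing call. The function names are hypothetical, and the
filesystem-wide flag discussed above does not exist, so it is not used
here.

```python
import os
import stat


def drop_file_cache(path):
    """Advise the kernel to drop the cached pages of one regular file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # DONTNEED does not evict dirty pages, so flush the file first.
        os.fsync(fd)
        # offset=0, len=0 means "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)


def _on_dev(path, dev):
    """True if path still exists and lives on device `dev`."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def drop_tree_cache(root):
    """Best-effort walk of one filesystem; returns the number of files advised."""
    dev = os.lstat(root).st_dev
    advised = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into other filesystems mounted below root.
        dirnames[:] = [d for d in dirnames
                       if _on_dev(os.path.join(dirpath, d), dev)]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
                if stat.S_ISREG(st.st_mode) and st.st_dev == dev:
                    drop_file_cache(path)
                    advised += 1
            except OSError:
                continue  # unreadable or vanished file: skip it
    return advised
```

Such a walk is advisory, only reaches files the caller can open, and has
to open every file to do so, which is part of why a single
filesystem-wide operation looks attractive.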

This may at least help secure users with a regular dm-crypt setup
where dm-crypt is the top layer. Users that stack additional layers on
top of dm-crypt may still leak plaintext of course if they introduce
additional caching. But that's on them.

Of course other ideas welcome.

[1]: https://www.sovereigntechfund.de/en
[2]: https://systemd.io/HOME_DIRECTORY
[3]: https://lore.kernel.org/linux-btrfs/20230908-merklich-bebauen-11914a630db4@brauner/
[4]: A bdev_freeze() call ideally does the following:

     (1) Freeze the block device @bdev
     (2) Find the owning superblock of the block device @bdev and freeze the
         filesystem as well.

     Especially (2) wasn't true for a long time. Filesystems would only be
     able to freeze the filesystems on the main block device. For example, an
     xfs filesystem using an external log device would not be able to be
     frozen if the block layer request came via the external log device. This
     has been fixed since v6.8 for all filesystems using appropriate holder
     operations.

     Except for btrfs where block device initiated freezes don't work at all;
     not even for the main block device. I pointed this out months ago in [3].

     Which is why we currently can't use btrfs with LUKS2 encryption as a
     luksSuspend call will leave the filesystem unfrozen.
[5]: https://gitlab.com/cryptsetup/cryptsetup/-/issues/855
     https://gitlab.gnome.org/Teams/STF/homed/-/issues/23
