linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Thumshirn <jth@kernel.org>
To: David Sterba <dsterba@suse.cz>
Cc: linux-fsdevel@vger.kernel.org, linux-btrfs@vger.kernel.org,
	Eric Biggers <ebiggers@google.com>,
	Richard Weinberger <richard@nod.at>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>,
	Jonathan Corbet <corbet@lwn.net>
Subject: [PATCH v3 3/3] btrfs: document btrfs authentication
Date: Thu, 14 May 2020 11:24:15 +0200	[thread overview]
Message-ID: <20200514092415.5389-4-jth@kernel.org> (raw)
In-Reply-To: <20200514092415.5389-1-jth@kernel.org>

From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Document the design, guarantees and limitations of an authenticated BTRFS
file-system.

Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 .../filesystems/btrfs-authentication.rst      | 168 ++++++++++++++++++
 1 file changed, 168 insertions(+)
 create mode 100644 Documentation/filesystems/btrfs-authentication.rst

diff --git a/Documentation/filesystems/btrfs-authentication.rst b/Documentation/filesystems/btrfs-authentication.rst
new file mode 100644
index 000000000000..f13cab248fc0
--- /dev/null
+++ b/Documentation/filesystems/btrfs-authentication.rst
@@ -0,0 +1,168 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+:orphan:
+
+.. BTRFS Authentication
+.. Western Digital or it's affiliates
+.. 2020
+
+Introduction
+============
+
+This document describes an approach get file contents _and_ full meta-data
+authentication for BTRFS.
+
+This is possible because BTRFS uses checksums embedded in its on-disk
+meta-data structures as well as checksums for each individual extent of data.
+The primary intent of these checksums was to detect bit-flips of data at rest.
+But this mechanism can be extended to provide authentication of all on-disk
+data when the checksum algorithm is replaced by a cryptographically secure
+keyed hash.
+
+BTRFS Data Structures
+---------------------
+
+BTRFS utilizes a special copy-on-write b-tree to store all contents of the
+file-system on disk.
+
+Meta-Data
+~~~~~~~~~
+
+On-disk meta-data in BTRFS is stored in copy-on-write b-trees. These b-trees
+are build using two data structures, ``struct btrfs_node`` and ``struct
+btrfs_leaf``. Both of these structures start with a ``struct btrfs_header``.
+This structure has amongst other fields a checksum field, protecting it's
+contents. As the checksum is the first entry in the structure, the whole
+structure is protected by this checksum. The superblock (``struct
+btrfs_super_block``) is the first on-disk structure which is read on mount and
+it as well starts with a checksum field protecting the rest of the structure.
+The super block is also needed to read the addresses of the other file system
+b-trees, so their location on disk is protected by the checksum.
+
+::
+
+          BTRFS Header
+          +------+------+--------+-------+-----------------+----+-------+---------+
+          | csum | fsid | bytenr | flags | chunk_tree_uuid | gen| owner | nritems |
+          +------+------+--------+-------+-----------------+----+-------+---------+
+          BTRFS Node
+          +--------+-------------+-------------+-----+
+          | Header | key pointer | key pointer | ... |
+          +--------+-------------+-------------+-----+
+          BTRFS Leaf
+          +--------+------+------+-----+
+          | Header | item | item | ... |
+          +--------+------+------+-----+
+
+            Figure 1: BTRFS Header, Node and Leaf data structures
+
+User-Data
+~~~~~~~~~
+
+User data in BRTFS is also protected by checksums, but this checksum is not
+stored alongside the data, as it is with meta-data, but stored in a separate
+b-tree, the checksum tree. The leafs of this tree store the checksums of the
+user-data.  The tree nodes and leafs are of ``struct btrfs_node`` or ``struct
+btrfs_leaf``, so integrity of this tree is protected as well.
+
+BTRFS Authentication
+====================
+
+This chapter introduces BTRFS authentication which enables BTRFS to verify
+the authenticity and integrity of metadata and file contents stored on disk.
+
+Threat Model
+------------
+
+BTRFS authentication enables detection of offline data modification. While it
+does not prevent it, it enables (trusted) code to check the integrity and
+authenticity of on-disk file contents and filesystem metadata. This covers
+attacks where file contents are swapped.
+
+BTRFS authentication will not protect against a rollback of full disk
+contents. Ie. an attacker can still dump the disk and restore it at a later
+time without detection. It will also not protect against a rollback of one
+transaction. That means an attacker is able to partially undo changes. This is
+possible, because BTRFS does not immediately overwrite obsolete versions of
+its meta-data but keeps older generations until they get garbage collected.
+
+BTRFS authentication does not cover attacks where an attacker is able to
+execute code on the device after the authentication key was provided.
+Additional measures like secure boot and trusted boot have to be taken to
+ensure that only trusted code is executed on a device.
+
+As the file-system authentication key is also needed to update data structures
+on disk, the key has to be in the kernel's keyring for the whole time the
+file-system is mounted. An attacker that is able to compromise the kernel can
+be able to extract the key from the kernel's keyring and thus can gain the
+ability to modify the file-system later on.
+
+Authentication
+--------------
+
+To be able to fully trust data read from disk, all BTRFS data structures
+stored on disk are authenticated. That is:
+
+- The super blocks
+- The file-system b-trees
+- The user-data
+
+
+Super-block
+~~~~~~~~~~~
+
+In order to be able to authenticate the file-system's super-block, the
+checksum stored in the checksum field at the beginning of ``struct
+btrfs_super_block`` protecting its contents is replaced by a
+cryptographically secure keyed hash. In order to generate a valid super-block
+or to validate the super-block, one has to provide a key as an additional
+input for the hash function. The super-block is the starting point to read all
+on disk tree structures, so if we cannot trust the authenticity of the
+super-block anymore, we cannot trust the whole file-system.
+
+B-Trees
+~~~~~~~
+
+Starting from the super-block's root-tree root, the root tree holds the b-tree
+roots of all other on disk b-trees. All other file-system meta-data can be
+derived from the trees stored in this tree. As all b-trees in BTRFS are built
+using ``struct btrfs_node`` and ``struct btrfs_leaf`` each building block of
+each tree is checksummed. These checksums are replaced with the cryptographically
+secure keyed hash algorithm and the authentication key used to verify the
+super-block in the mount phase. Without this key it is impossible to alter any
+of the file-system structure without generating invalid hashes.
+
+User-data
+~~~~~~~~~
+
+The checksums for the user or file-data are stored in a separate b-tree, the
+checksum tree. As this tree in itself is authenticated, only the data stored
+in it needs to be authenticated. This is done by replacing the checksums
+stored on disk by the cryptographically secure keyed hash algorithm used for
+the super-block and other meta-data. So each written file block will get
+checksummed with the authentication key and without supplying the correct key
+it is impossible to write data on disk, which can be read back without
+failing the authentication test. If this test is failed, an I/O error is
+reported back to the user.
+
+Key Management
+--------------
+
+For simplicity, BTRFS authentication uses a single key to compute the keyed
+hashes of the super-block, b-tree nodes and leafs as well as file-blocks. This
+key has to be available on creation of the file-system (`mkfs.btrfs`) to
+authenticate all b-tree elements and the super-blocks. Further, it has to be
+available on mount of the file-system to verify the meta-data and user-data
+stored in the file-system.
+
+Limitations
+-----------
+
+As some optional features of BTRFS disable the generation of checksums, these
+features are incompatible with an authenticated BTRFS.
+These features are:
+- nodatacow
+- nodatasum
+
+As well as any offline modifications to the file-system, like setting an FS
+label while the FS is unmounted.
-- 
2.26.1


  parent reply	other threads:[~2020-05-14  9:24 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-14  9:24 [PATCH v3 0/3] Add file-system authentication to BTRFS Johannes Thumshirn
2020-05-14  9:24 ` [PATCH v3 1/3] btrfs: rename btrfs_parse_device_options back to btrfs_parse_early_options Johannes Thumshirn
2020-05-14  9:24 ` [PATCH v3 2/3] btrfs: add authentication support Johannes Thumshirn
2020-05-27 13:24   ` David Sterba
2020-05-27 13:54     ` Johannes Thumshirn
2020-05-27 14:01       ` Johannes Thumshirn
2020-05-27 18:04     ` Johannes Thumshirn
2020-06-01 14:30       ` David Sterba
2020-06-01 14:35       ` David Sterba
2020-05-14  9:24 ` Johannes Thumshirn [this message]
2020-05-14 12:26   ` [PATCH v3 3/3] btrfs: document btrfs authentication Jonathan Corbet
2020-05-14 14:54     ` Johannes Thumshirn
2020-05-14 15:14       ` Richard Weinberger
2020-05-14 16:00         ` Jonathan Corbet
2020-05-14 16:05           ` Richard Weinberger
2020-05-24 19:55   ` David Sterba
2020-05-25 10:57     ` Johannes Thumshirn
2020-05-25 11:26       ` David Sterba
2020-05-25 11:44         ` Johannes Thumshirn
2020-05-25 13:10 ` [PATCH v3 0/3] Add file-system authentication to BTRFS David Sterba
2020-05-26  7:50   ` Johannes Thumshirn
2020-05-26 11:53     ` David Sterba
2020-05-26 12:44       ` Johannes Thumshirn
2020-06-01 14:59         ` David Sterba
2020-05-27  2:08 ` Qu Wenruo
2020-05-27 11:27   ` David Sterba
2020-05-27 11:58     ` Qu Wenruo
2020-05-27 13:11   ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200514092415.5389-4-jth@kernel.org \
    --to=jth@kernel.org \
    --cc=corbet@lwn.net \
    --cc=dsterba@suse.cz \
    --cc=ebiggers@google.com \
    --cc=johannes.thumshirn@wdc.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=richard@nod.at \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).