linux-fsdevel.vger.kernel.org archive mirror
From: Alexander Larsson <alexl@redhat.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, gscrivan@redhat.com,
	david@fromorbit.com, brauner@kernel.org, viro@zeniv.linux.org.uk,
	Alexander Larsson <alexl@redhat.com>,
	linux-doc@vger.kernel.org
Subject: [PATCH v3 5/6] composefs: Add documentation
Date: Fri, 20 Jan 2023 16:23:33 +0100
Message-ID: <20baca7da01c285b2a77c815c9d4b3080ce4b279.1674227308.git.alexl@redhat.com>
In-Reply-To: <cover.1674227308.git.alexl@redhat.com>

Add documentation about the composefs filesystem and how to use it.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
---
 Documentation/filesystems/composefs.rst | 159 ++++++++++++++++++++++++
 Documentation/filesystems/index.rst     |   1 +
 2 files changed, 160 insertions(+)
 create mode 100644 Documentation/filesystems/composefs.rst

diff --git a/Documentation/filesystems/composefs.rst b/Documentation/filesystems/composefs.rst
new file mode 100644
index 000000000000..f270a66f4204
--- /dev/null
+++ b/Documentation/filesystems/composefs.rst
@@ -0,0 +1,159 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Composefs Filesystem
+====================
+
+Introduction
+============
+
+Composefs is a read-only file system that is backed by regular files
+(rather than a block device). It is designed to make it easy to share
+content between different directory trees, such as container images in
+a local store or ostree checkouts. It also supports efficient
+integrity validation of file content and directory metadata, using
+fs-verity.
+
+The filesystem mount source is a binary blob called the descriptor. It
+contains all the inode and directory entry data for the entire
+filesystem. However, instead of storing the file content, each regular
+file inode stores a relative path name, and composefs gets the file
+content from the backing filesystem by looking up that path in a set
+of base directories.
+
+Given such a descriptor called "image.cfs" and a directory of backing
+files called "/dir", you can mount it like this::
+
+  mount -t composefs image.cfs -o basedir=/dir /mnt
+
+Content sharing
+===============
+
+Suppose you have a single basedir where the files are content
+addressed (i.e. named by content digest), and a set of composefs
+descriptors using this basedir. Any file that happens to be shared
+between two images (same content, so same digest) will now only be
+stored once on the disk.
+
+Such sharing is possible even if the metadata for the file in the
+image differs (common reasons for metadata difference are mtime,
+permissions, xattrs, etc). The sharing is also anonymous in the sense
+that you can't tell the difference on the mounted files from a
+non-shared file (for example by looking at the link count for a
+hardlinked file).
+
+In addition, any shared files that are actively in use will share
+page cache, because the page cache for the file contents is
+addressed by the backing file in the basedir. This means (for example)
+that libraries shared between images will only be loaded into memory
+once across all mounts.
+
+Integrity validation
+====================
+
+Composefs uses :doc:`fs-verity <fsverity>` for integrity validation,
+and extends it by making the validation also apply to the directory
+metadata. This happens on two levels: validation of the descriptor
+and validation of the backing files.
+
+For descriptor validation, the idea is that you enable fs-verity on
+the descriptor file, which seals it against changes that would affect
+the directory metadata. Additionally, you can pass a "digest" mount
+option, which composefs verifies against the fs-verity digest of the
+descriptor. Such an option could be embedded in a trusted source (like
+a signed kernel command line) and be used as a root of trust if using
+composefs for the root filesystem.
+
+For file validation, the descriptor can contain digests for each
+backing file, and you can enable fs-verity on them too. Composefs will
+validate the digest before using the backing files. This means any
+(accidental or malicious) modification of the basedir will be detected
+at the time the file is used.
+
+Expected use-cases
+==================
+
+Container Image Storage
+```````````````````````
+
+Typically a container image is stored as a set of "layer" directories,
+merged into one mount by using overlayfs. The lower layers are the
+read-only image and the upper layer is the writable directory of a
+running container. Multiple uses of the same layer can be shared this
+way, but it is hard to share individual files between unrelated layers.
+
+Using composefs, we can instead use a shared, content-addressed
+store for all the images in the system, and use composefs
+for the read-only image of each container, pointing into the
+shared store. Then for a running container we use an overlayfs
+with the lower dir being the composefs and the upper dir being
+the writable directory.
+
+
+Ostree root filesystem validation
+`````````````````````````````````
+
+Ostree uses a content-addressed on-disk store for file content,
+allowing efficient updates and sharing of content. However, to actually
+use this content as a root filesystem it needs to create a real
+"chroot-style" directory, containing hard links into the store. The
+store itself is validated when created, but once the hard-link
+directory is created, nothing validates the directory structure for
+post-creation changes.
+
+Instead of a chroot we can use composefs. A composefs image pointing
+to the object store is created, then fs-verity is enabled for
+everything and the descriptor digest is encoded in the kernel
+command line. This allows booting a trusted system where all
+directory metadata and file content are validated lazily at use.
+
+
+Mount options
+=============
+
+basedir
+    A colon-separated list of directories to use as a base when resolving
+    relative content paths.
+
+verity_check=[0,1,2]
+    When to verify backing file fs-verity:
+
+    * 0: never verify
+    * 1: verify if the digest is specified in the image
+    * 2: always verify the file (and require digests in the image)
+
+digest
+    An fs-verity SHA-256 digest that the descriptor file must match. If set,
+    "verity_check" defaults to 2.
+
+
+Filesystem format
+=================
+
+The descriptor consists of three sections: superblock, inodes and
+variable data. All data in the file is stored in little-endian
+form.
+
+The superblock starts at the beginning of the file and contains
+version, magic value, and offsets to the variable data section.
+
+The inode table starts at a fixed location right after the
+superblock. It is an array of fixed-size inode data. The first inode
+is the root inode, and inode numbers are indexes into this array.
+
+The variable data section is stored after the inode section, and you
+can find it from the offset in the superblock. It contains paths,
+digests, dirents and xattr data. The xattrs are referred to by offset
+and size in the xattr attribute in the inode data. Each block of
+xattr data can be used by many inodes in the filesystem.
+
+For more details, see cfs.h.
+
+Tools
+=====
+
+Tools for composefs can be found at https://github.com/containers/composefs
+
+There is a mkcomposefs tool which can be used to create images on the
+CLI, and a library that applications can use to create composefs
+images.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index bee63d42e5ec..9b7cf136755d 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -75,6 +75,7 @@ Documentation for filesystem implementations.
    cifs/index
    ceph
    coda
+   composefs
    configfs
    cramfs
    dax
-- 
2.39.0
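
As an illustration of the "Content sharing" section in the documentation
above, here is a minimal Python sketch of a content-addressed basedir and
of a lookup across a colon-separated list of base directories. It is a
userspace model only, not the kernel implementation: the helper names
(store_object, resolve, count_objects), the two-level "ab/cdef..." object
layout, and the use of a plain SHA-256 of the file content as the object
name are all assumptions made for this example.

```python
import hashlib
import os
import tempfile


def store_object(basedir, data):
    """Store data under a name derived from its SHA-256 digest.

    Returns the path relative to basedir. The two-level layout is an
    assumption of this sketch, not something the patch mandates.
    """
    digest = hashlib.sha256(data).hexdigest()
    relpath = os.path.join(digest[:2], digest[2:])
    fullpath = os.path.join(basedir, relpath)
    if not os.path.exists(fullpath):
        # Only the first writer of a given content creates the file.
        os.makedirs(os.path.dirname(fullpath), exist_ok=True)
        with open(fullpath, "wb") as f:
            f.write(data)
    return relpath


def resolve(basedirs, relpath):
    """Look up relpath in a colon-separated list of base directories.

    Returns the first existing regular file, or None if no base has it.
    """
    for base in basedirs.split(":"):
        candidate = os.path.join(base, relpath)
        if os.path.isfile(candidate):
            return candidate
    return None


def count_objects(basedir):
    """Count the regular files stored below basedir."""
    return sum(len(files) for _, _, files in os.walk(basedir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as basedir:
        # Two hypothetical images that both ship the same library bytes.
        image_a = {"usr/lib/libfoo.so": store_object(basedir, b"library")}
        image_b = {
            "opt/lib/libfoo.so": store_object(basedir, b"library"),
            "etc/motd": store_object(basedir, b"hello"),
        }
        print("objects on disk:", count_objects(basedir))
        print("backing file:", resolve(basedir, image_b["opt/lib/libfoo.so"]))
```

Storing the same bytes for two images yields the same relative path, so
the basedir holds a single copy. Per-image metadata (mtime, permissions,
xattrs) would live in each descriptor rather than in the store.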

