From: ezemtsov@google.com
To: linux-fsdevel@vger.kernel.org
Cc: tytso@mit.edu, Eugene Zemtsov <ezemtsov@google.com>
Subject: [PATCH 1/6] incfs: Add first files of incrementalfs
Date: Wed,  1 May 2019 21:03:26 -0700
Message-ID: <20190502040331.81196-2-ezemtsov@google.com>
In-Reply-To: <20190502040331.81196-1-ezemtsov@google.com>

From: Eugene Zemtsov <ezemtsov@google.com>

- fs/incfs dir
- Kconfig (CONFIG_INCREMENTAL_FS)
- Makefile
- Module and file system initialization and clean up code
- New MAINTAINERS entry
- Add incrementalfs.h UAPI header
- Register ioctl range in ioctl-numbers.txt
- Documentation

Signed-off-by: Eugene Zemtsov <ezemtsov@google.com>
---
 Documentation/filesystems/incrementalfs.rst | 452 ++++++++++++++++++++
 Documentation/ioctl/ioctl-number.txt        |   1 +
 MAINTAINERS                                 |   7 +
 fs/Kconfig                                  |   1 +
 fs/Makefile                                 |   1 +
 fs/incfs/Kconfig                            |  10 +
 fs/incfs/Makefile                           |   4 +
 fs/incfs/main.c                             |  85 ++++
 fs/incfs/vfs.c                              |  37 ++
 include/uapi/linux/incrementalfs.h          | 189 ++++++++
 10 files changed, 787 insertions(+)
 create mode 100644 Documentation/filesystems/incrementalfs.rst
 create mode 100644 fs/incfs/Kconfig
 create mode 100644 fs/incfs/Makefile
 create mode 100644 fs/incfs/main.c
 create mode 100644 fs/incfs/vfs.c
 create mode 100644 include/uapi/linux/incrementalfs.h

diff --git a/Documentation/filesystems/incrementalfs.rst b/Documentation/filesystems/incrementalfs.rst
new file mode 100644
index 000000000000..682e3dcb6b5a
--- /dev/null
+++ b/Documentation/filesystems/incrementalfs.rst
@@ -0,0 +1,452 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+Incremental File System
+=======================
+
+Overview
+========
+Incremental FS is a special-purpose Linux virtual file system that allows
+execution of a program while its binary and resource files are still being
+lazily downloaded over the network, USB etc. It is focused on incremental
+delivery for a small number (under 100) of big files (more than 10 megabytes).
+Incremental FS doesn’t allow direct writes into files and, once loaded, file
+content never changes. Incremental FS doesn’t use a block device; instead it
+saves data into a backing file located on a regular file system.
+
+But why?
+--------
+To allow running **big** Android apps before their binaries and resources are
+fully downloaded to an Android device. If an app reads something not loaded yet,
+it needs to wait for the data block to be fetched, but in most cases hot blocks
+can be loaded in advance.
+
+Workflow
+--------
+A userspace process, called a data loader, mounts an instance of incremental-fs
+giving it a file descriptor on an underlying file system (like ext4 or f2fs).
+Incremental-fs reads content (if any) of this backing file and interprets it as
+a file system image with files, directories and data blocks. At this point
+the data loader can declare new files to be shown by incremental-fs.
+
+A process is started from a binary located on incremental-fs.
+All reads are served directly from the backing file
+without roundtrips into userspace. If the process accesses a data block that was
+not originally present in the backing file, the read operation waits.
+
+Meanwhile the data loader can feed new data blocks to incremental-fs by calling
+write() on a special .cmd pseudo-file. The data loader can request information
+about pending reads by calling poll() and read() on the .cmd pseudo-file.
+This mechanism allows the data loader to serve most urgently needed data first.
+Once a data block is given to incremental-fs, it saves it to the backing file
+and unblocks all the reads waiting for this block.
+
+Eventually all data for all files is uploaded by the data loader, and saved by
+incremental-fs into the backing file. At that moment the data loader is not
+needed any longer. The backing file will play the role of a complete
+filesystem image for all future runs of the program.
+
+Non-goals
+---------
+* Allowing direct writes by the executing processes into files on incremental-fs.
+* Allowing the data loader to change file size or content after it was loaded.
+* Having more than a couple hundred files and directories.
+
+
+Features
+========
+
+Read-only, but not unchanging
+-----------------------------
+On the surface a mount directory of incremental-fs looks similar to
+a read-only instance of a network file system: files and directories can be
+listed and read, but can’t be directly created or modified via creat() or
+write(). At the same time the data loader can change the directory
+structure via external ioctls, i.e. link and unlink files and directories
+(if they are empty). Data can't be changed this way: once a file block is
+loaded there is no way to change it.
+
+Filesystem image in a backing file
+----------------------------------
+Instead of using a block device, all data and metadata is stored in a
+backing file provided as a mount parameter. The backing file is located on
+an underlying file system (like ext4 or f2fs). This approach is very similar
+to what might be achieved by using a loopback device with a traditional file
+system, but it avoids extra set-up steps and indirections. It also allows
+the incremental-fs image to grow dynamically as new files and data arrive,
+without any extra resizing steps.
+
+If the backing file contains data at the moment when incremental-fs is mounted,
+the content of the backing file is interpreted as a filesystem image.
+New files and data can still be added through the external interface,
+and they will be saved to the backing file.
+
+Data compression
+----------------
+Incremental-fs can store compressed data. In this case each 4KB data block is
+compressed separately. Data blocks can be provided to incremental-fs by
+the data loader in a compressed form. Incremental-fs decompresses a block
+each time an executing process reads it (modulo page cache). Compression also
+takes care of blocks composed of all zero bytes, removing the need to handle
+this case separately.
+
+Partially present files
+-----------------------
+Data in the files consists of 4KB blocks; each block can be present or absent.
+Unlike in sparse files, reading an absent block doesn’t return all zeros.
+It waits for the data block to be loaded via the .cmd file interface
+(respecting a timeout). Once a data block is loaded it never disappears
+and can’t be changed or erased from a file. This ability to frictionlessly
+wait for temporarily missing data is the main feature of incremental-fs.
+
+Hard links. Multiple names for the same file
+--------------------------------------------
+Like all traditional UNIX file systems, incremental-fs supports hard links,
+i.e. different file names in different directories can refer to the same file.
+As mentioned above new hard links can be created and removed via
+the ioctl interface, but actual data files are immutable, modulo partial
+data loading. Each directory can only have at most one name referencing it.
+
+Inspection of incremental-fs internal state
+-------------------------------------------
+poll() and read() on the .cmd pseudo-file allow data loaders to get a list of
+read operations stalled due to lack of a data block (pending reads).
+
+
+Application Programming Interface
+=================================
+
+Regular file system interface
+-----------------------------
+Executing processes access files and directories via the regular Linux file
+interface: open, read, close etc. All the intricacies of data loading and file
+representation are hidden from them.
+
+External .cmd file interface
+----------------------------
+When incremental-fs is mounted, the mount directory contains a pseudo-file
+called '.cmd'. The data loader opens this file and calls read(), write(),
+poll() and ioctl() on it to inspect and change the state of incremental-fs.
+
+poll() and read() are used by the data loader to wait for pending reads to
+appear and obtain an array of ``struct incfs_pending_read_info``.
+
+write() is used by the data loader to feed new data blocks to incremental-fs.
+A data buffer given to write() is interpreted as an array of
+``struct incfs_new_data_block``. Structs in the array describe locations and
+properties of data blocks loaded with this write() call.
+
+``ioctl(INCFS_IOC_PROCESS_INSTRUCTION)`` is used to change the structure of
+incremental-fs. It receives a pointer to ``struct incfs_instruction``
+where the type field can be one of the following values.
+
+**INCFS_INSTRUCTION_NEW_FILE**
+Creates an inode (a file or a directory) without a name.
+It assumes ``incfs_instruction.file`` is populated with details.
+
+**INCFS_INSTRUCTION_ADD_DIR_ENTRY**
+Creates a name (aka hardlink) for an inode in a directory.
+A directory can't have more than one hardlink pointing to it, but files can be
+linked from different directories.
+It assumes ``incfs_instruction.dir_entry`` is populated with details.
+
+**INCFS_INSTRUCTION_REMOVE_DIR_ENTRY**
+Removes a name (aka hardlink) for a file from a directory.
+Only empty directories can be unlinked.
+It assumes ``incfs_instruction.dir_entry`` is populated with details.
+
+For more details see uapi/linux/incrementalfs.h and samples below.
+
+Supported mount options
+-----------------------
+See ``fs/incfs/options.c`` for more details.
+
+    * ``backing_fd=<unsigned int>``
+        Required. A file descriptor of a backing file opened by the process
+        calling mount(2). This descriptor can be closed after mount returns.
+
+    * ``read_timeout_msc=<unsigned int>``
+        Default: 1000. Timeout in milliseconds before a read operation fails
+        if no data is found in the backing file or provided by the data loader.
+
+Sysfs files
+-----------
+``/sys/fs/incremental-fs/version`` - the current version of the filesystem.
+One ASCII-encoded positive integer followed by a newline.
+
+
+Examples
+--------
+See ``sample_data_loader.c`` for a complete implementation of a data loader.
+
+Mount incremental-fs
+~~~~~~~~~~~~~~~~~~~~
+
+::
+
+    int mount_fs(char *mount_dir, char *backing_file, int timeout_msc)
+    {
+        static const char fs_name[] = INCFS_NAME;
+        char mount_options[512];
+        int backing_fd;
+        int result;
+
+        backing_fd = open(backing_file, O_RDWR);
+        if (backing_fd == -1) {
+            perror("Error in opening backing file");
+            return 1;
+        }
+
+        snprintf(mount_options, ARRAY_SIZE(mount_options),
+            "backing_fd=%u,read_timeout_msc=%u", backing_fd, timeout_msc);
+
+        result = mount(fs_name, mount_dir, fs_name, 0, mount_options);
+        if (result != 0)
+            perror("Error mounting fs.");
+        return result;
+    }
+
+Open .cmd file
+~~~~~~~~~~~~~~
+
+::
+
+    int open_commands_file(char *mount_dir)
+    {
+        char cmd_file[255];
+        int cmd_fd;
+
+        snprintf(cmd_file, ARRAY_SIZE(cmd_file), "%s/.cmd", mount_dir);
+        cmd_fd = open(cmd_file, O_RDWR);
+        if (cmd_fd < 0)
+            perror("Can't open commands file");
+        return cmd_fd;
+    }
+
+Add a file to the file system
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+    int create_file(int cmd_fd, char *filename, __u64 *ino_out, size_t size)
+    {
+        int ret = 0;
+        __u64 ino = 0;
+        struct incfs_instruction inst = {
+                .version = INCFS_HEADER_VER,
+                .type = INCFS_INSTRUCTION_NEW_FILE,
+                .file = {
+                    .size = size,
+                    .mode = S_IFREG | 0555,
+                }
+        };
+
+        ret = ioctl(cmd_fd, INCFS_IOC_PROCESS_INSTRUCTION, &inst);
+        if (ret)
+            return -errno;
+
+        ino = inst.file.ino_out;
+        inst = (struct incfs_instruction){
+                .version = INCFS_HEADER_VER,
+                .type = INCFS_INSTRUCTION_ADD_DIR_ENTRY,
+                .dir_entry = {
+                    .dir_ino = INCFS_ROOT_INODE,
+                    .child_ino = ino,
+                    .name = ptr_to_u64(filename),
+                    .name_len = strlen(filename)
+                }
+            };
+        ret = ioctl(cmd_fd, INCFS_IOC_PROCESS_INSTRUCTION, &inst);
+        if (ret)
+            return -errno;
+        *ino_out = ino;
+        return 0;
+    }
+
+Load data into a file
+~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+    int cmd_fd = open_commands_file(path_to_mount_dir);
+    char *data = get_some_data();
+    struct incfs_new_data_block block = {0};
+    int err;
+
+    block.file_ino = file_ino;
+    block.block_index = 0;
+    block.compression = COMPRESSION_NONE;
+    block.data = ptr_to_u64(data);
+    block.data_len = INCFS_DATA_FILE_BLOCK_SIZE;
+
+    err = write(cmd_fd, &block, sizeof(block));
+
+
+Get an array of pending reads
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+    int poll_res = 0;
+    struct incfs_pending_read_info reads[10];
+    int cmd_fd = open_commands_file(path_to_mount_dir);
+    struct pollfd pollfd = {
+        .fd = cmd_fd,
+        .events = POLLIN
+    };
+
+    poll_res = poll(&pollfd, 1, timeout);
+    if (poll_res > 0 && (pollfd.revents & POLLIN)) {
+        ssize_t read_res = read(cmd_fd, reads, sizeof(reads));
+        if (read_res > 0)
+            printf("Waiting reads %zu\n", read_res / sizeof(reads[0]));
+    }
+
+
+
+Ondisk format
+=============
+
+General principles
+------------------
+* The backbone of the incremental-fs ondisk format is an append only linked
+  list of metadata blocks. Each metadata block contains an offset of the next
+  one. These blocks describe files and directories on the
+  file system. They also represent actions of adding and removing file names
+  (hard links).
+  Every time an incremental-fs instance is mounted, it reads through this list
+  to recreate the filesystem's state in memory. The offset of the first record in the
+  metadata list is stored in the superblock at the beginning of the backing
+  file.
+
+* Most of the backing file is taken by data areas and blockmaps.
+  Since data blocks can be compressed and have different sizes,
+  a single per-file data area can't be pre-allocated. That's why blockmaps are
+  needed in order to find the location and size of each data block in
+  the backing file. Each time a file is created, a corresponding blockmap is
+  allocated to store future offsets of data blocks.
+
+  Whenever a data block is given by the data loader to incremental-fs:
+    - A data area with the given block is appended to the end of
+      the backing file.
+    - A record in the blockmap for the given block index is updated to reflect
+      its location, size, and compression algorithm.
+
+Important format details
+------------------------
+Ondisk structures are defined in the ``format.h`` file. They are all packed
+and use little-endian order.
+A backing file must start with ``incfs_super_block`` whose ``s_magic`` field
+equals 0x5346434e49 (the bytes "INCFS" when stored in little-endian order).
+
+Metadata records:
+
+* ``incfs_inode`` - metadata record to declare a file or a directory.
+                    ``incfs_inode.i_mode`` determines if it is a file
+                    or a directory.
+* ``incfs_blockmap`` - metadata record that specifies size and location
+                            of a blockmap area for a given file. This area
+                            contains an array of ``incfs_blockmap_entry``-s.
+* ``incfs_dir_action`` - metadata record that specifies changes made
+                    to a directory structure, e.g. add or remove a hardlink.
+* ``incfs_md_header`` - header of a metadata record. It's always a part
+                    of other structures and serves the purpose of metadata
+                    bookkeeping.
+
+Other ondisk structures:
+
+* ``incfs_super_block`` - backing file header
+* ``incfs_blockmap_entry`` - a record in a blockmap area that describes size
+                        and location of a data block.
+* Data blocks don't have any particular structure; they are written to the
+  backing file in a raw form as they come from a data loader.
+
+
+Backing file layout
+-------------------
+::
+
+              +-------------------------------------------+
+              |            incfs_super_block              |]---+
+              +-------------------------------------------+    |
+              |                 metadata                  |<---+
+              |                incfs_inode                |]---+
+              +-------------------------------------------+    |
+                        .........................              |
+              +-------------------------------------------+    |   metadata
+     +------->|               blockmap area               |    |  list links
+     |        |          [incfs_blockmap_entry]           |    |
+     |        |          [incfs_blockmap_entry]           |    |
+     |        |          [incfs_blockmap_entry]           |    |
+     |    +--[|          [incfs_blockmap_entry]           |    |
+     |    |   |          [incfs_blockmap_entry]           |    |
+     |    |   |          [incfs_blockmap_entry]           |    |
+     |    |   +-------------------------------------------+    |
+     |    |             .........................              |
+     |    |   +-------------------------------------------+    |
+     |    |   |                 metadata                  |<---+
+     +----|--[|               incfs_blockmap              |]---+
+          |   +-------------------------------------------+    |
+          |             .........................              |
+          |   +-------------------------------------------+    |
+          +-->|                 data block                |    |
+              +-------------------------------------------+    |
+                        .........................              |
+              +-------------------------------------------+    |
+              |                 metadata                  |<---+
+              |             incfs_dir_action              |
+              +-------------------------------------------+
+
+Unreferenced files and absence of garbage collection
+----------------------------------------------------
+The described file format can produce files that don't have a name in
+any directory. Incremental-fs takes no steps to prevent such situations or
+reclaim space occupied by such files in the backing file. If garbage collection
+is needed, it has to be implemented as a separate userspace tool.
+
+
+Design alternatives
+===================
+
+Why isn't incremental-fs implemented via FUSE?
+----------------------------------------------
+TLDR: FUSE-based filesystems add 20-80% performance overhead for target
+scenarios, and increase power use on mobile beyond acceptable limits
+for widespread deployment. A custom kernel filesystem is the way to overcome
+these limitations.
+
+From the theoretical side of things, a FUSE filesystem adds some overhead to
+each filesystem operation that’s not handled by the OS page cache:
+
+    * When an IO request arrives at the FUSE driver (D), it puts it into a queue
+      that runs on a separate kernel thread.
+    * Then another separate user-mode handler process (H) has to run,
+      potentially after a context switch, to read the request from the queue.
+      Reading the request adds a kernel-user mode transition to the handling.
+    * (H) sends the IO request to kernel to handle it on some underlying storage
+      filesystem. This adds a user-kernel and kernel-user mode transition
+      pair to the handling.
+    * (H) then responds to the FUSE request via a write(2) call.
+      Writing the response is another user-kernel mode transition.
+    * (D) needs to read the response from (H) when its kernel thread runs
+      and forward it to the user
+
+Together, the scenario adds 2 extra user-kernel-user mode transition pairs,
+and potentially has up to 3 additional context switches for the FUSE kernel
+thread and the user-mode handler to start running for each IO request on the
+filesystem.
+This overhead can vary from unnoticeable to unmanageable, depending on the
+target scenario. But it will always burn extra power by keeping the CPU longer
+in a non-idle state, handling context switches and mode transitions.
+One important goal for the new filesystem is to be able to handle each page
+read separately on demand, because we don't want to wait and download more data
+than absolutely necessary. Thus readahead would need to be disabled completely.
+This increases the number of separate IO requests and the FUSE-related overhead
+by almost 32x (128KB readahead limit vs 4KB individual block operations).
+
+For more info see a 2017 USENIX research paper:
+To FUSE or Not to FUSE: Performance of User-Space File Systems
+Bharath Kumar Reddy Vangoor, Stony Brook University;
+Vasily Tarasov, IBM Research-Almaden;
+Erez Zadok, Stony Brook University
+https://www.usenix.org/system/files/conference/fast17/fast17-vangoor.pdf
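
The 4KB block model in the documentation above, and the "almost 32x" figure
in the FUSE comparison, come down to simple arithmetic. The following is a
minimal userspace sketch and is not part of the patch: the helper names
incfs_block_count(), incfs_block_index() and incfs_blocks_touched() are
hypothetical, and only the 4096-byte block size is taken from
INCFS_DATA_FILE_BLOCK_SIZE in the UAPI header.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors INCFS_DATA_FILE_BLOCK_SIZE from include/uapi/linux/incrementalfs.h. */
#define BLOCK_SIZE 4096u

/* Hypothetical helper: number of blocks needed to hold a file of `size` bytes. */
static uint64_t incfs_block_count(uint64_t size)
{
	return size / BLOCK_SIZE + (size % BLOCK_SIZE != 0);
}

/* Hypothetical helper: index of the block that contains byte `offset`. */
static uint32_t incfs_block_index(uint64_t offset)
{
	return (uint32_t)(offset / BLOCK_SIZE);
}

/*
 * Hypothetical helper: number of distinct blocks covered by the byte range
 * [offset, offset + len). Each of them may become a separate pending read.
 */
static uint64_t incfs_blocks_touched(uint64_t offset, uint64_t len)
{
	if (len == 0)
		return 0;
	return (offset + len - 1) / BLOCK_SIZE - offset / BLOCK_SIZE + 1;
}
```

For instance, a 128KB range that a single readahead request could cover maps
to incfs_blocks_touched(0, 128 * 1024) == 32 individual block requests.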
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index c9558146ac58..a5f8e0eaff91 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -227,6 +227,7 @@ Code  Seq#(hex)	Include File		Comments
 'f'	00-0F	fs/ocfs2/ocfs2_fs.h	conflict!
 'g'	00-0F	linux/usb/gadgetfs.h
 'g'	20-2F	linux/usb/g_printer.h
+'g'	30-3F	include/uapi/linux/incrementalfs.h
 'h'	00-7F				conflict! Charon filesystem
 					<mailto:zapman@interlan.net>
 'h'	00-1F	linux/hpet.h		conflict!
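
The Seq# column of ioctl-number.txt is hexadecimal, so when reserving a range
it can help to decode a request number and compare it against the table. A
minimal userspace sketch, assuming the kernel's <linux/ioctl.h> is available
to userspace; struct instruction_standin and ioc_nr_in_range() are
hypothetical names, and the stand-in only reproduces the 72-byte size of
struct incfs_instruction (8-byte header plus 64-byte union).

```c
#include <assert.h>
#include <linux/ioctl.h>

/* Stand-in with the same size as struct incfs_instruction. */
struct instruction_standin {
	unsigned char bytes[72];
};

/* Same expression as INCFS_IOC_PROCESS_INSTRUCTION, applied to the stand-in. */
#define EXAMPLE_IOC_PROCESS_INSTRUCTION \
	_IOWR('g', 30, struct instruction_standin)

/* Returns 1 if the request's sequence number lies in [first, last], else 0. */
static int ioc_nr_in_range(unsigned long request, unsigned int first,
			   unsigned int last)
{
	unsigned int nr = _IOC_NR(request);

	return nr >= first && nr <= last;
}
```

With the decimal literal 30 used in the header, _IOC_NR() yields 0x1e, while
a literal 0x30 would fall inside the 30-3F range listed in the table.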
diff --git a/MAINTAINERS b/MAINTAINERS
index 5c38f21aee78..c92ad89ee5e5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7630,6 +7630,13 @@ F:	Documentation/hwmon/ina2xx
 F:	drivers/hwmon/ina2xx.c
 F:	include/linux/platform_data/ina2xx.h

+INCREMENTAL FILESYSTEM
+M:	Eugene Zemtsov <ezemtsov@google.com>
+S:	Supported
+F:	fs/incfs/
+F:	include/uapi/linux/incrementalfs.h
+F:	Documentation/filesystems/incrementalfs.rst
+
 INDUSTRY PACK SUBSYSTEM (IPACK)
 M:	Samuel Iglesias Gonsalvez <siglesias@igalia.com>
 M:	Jens Taprogge <jens.taprogge@taprogge.org>
diff --git a/fs/Kconfig b/fs/Kconfig
index 3e6d3101f3ff..19f89c936209 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -119,6 +119,7 @@ source "fs/quota/Kconfig"
 source "fs/autofs/Kconfig"
 source "fs/fuse/Kconfig"
 source "fs/overlayfs/Kconfig"
+source "fs/incfs/Kconfig"

 menu "Caches"

diff --git a/fs/Makefile b/fs/Makefile
index 427fec226fae..08c6b827df1a 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_AUTOFS_FS)		+= autofs/
 obj-$(CONFIG_ADFS_FS)		+= adfs/
 obj-$(CONFIG_FUSE_FS)		+= fuse/
 obj-$(CONFIG_OVERLAY_FS)	+= overlayfs/
+obj-$(CONFIG_INCREMENTAL_FS)	+= incfs/
 obj-$(CONFIG_ORANGEFS_FS)       += orangefs/
 obj-$(CONFIG_UDF_FS)		+= udf/
 obj-$(CONFIG_SUN_OPENPROMFS)	+= openpromfs/
diff --git a/fs/incfs/Kconfig b/fs/incfs/Kconfig
new file mode 100644
index 000000000000..a810131deed0
--- /dev/null
+++ b/fs/incfs/Kconfig
@@ -0,0 +1,10 @@
+config INCREMENTAL_FS
+	tristate "Incremental file system support"
+	depends on BLOCK && CRC32
+	help
+	  Incremental FS is a read-only virtual file system that facilitates execution
+	  of programs while their binaries are still being lazily downloaded over the
+	  network, USB or pigeon post.
+
+	  To compile this file system support as a module, choose M here: the
+	  module will be called incrementalfs.
\ No newline at end of file
diff --git a/fs/incfs/Makefile b/fs/incfs/Makefile
new file mode 100644
index 000000000000..7892196c634f
--- /dev/null
+++ b/fs/incfs/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_INCREMENTAL_FS)	+= incrementalfs.o
+
+incrementalfs-y := main.o vfs.o
\ No newline at end of file
diff --git a/fs/incfs/main.c b/fs/incfs/main.c
new file mode 100644
index 000000000000..07e1952ede9e
--- /dev/null
+++ b/fs/incfs/main.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018 Google LLC
+ */
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/module.h>
+
+#include <uapi/linux/incrementalfs.h>
+
+#define INCFS_CORE_VERSION 1
+
+extern struct file_system_type incfs_fs_type;
+
+static struct kobject *sysfs_root;
+
+static ssize_t version_show(struct kobject *kobj,
+			    struct kobj_attribute *attr, char *buff)
+{
+	return snprintf(buff, PAGE_SIZE, "%d\n", INCFS_CORE_VERSION);
+}
+
+static struct kobj_attribute version_attr = __ATTR_RO(version);
+
+static struct attribute *attributes[] = {
+	&version_attr.attr,
+	NULL,
+};
+
+static const struct attribute_group attr_group = {
+	.attrs = attributes,
+};
+
+static int __init init_sysfs(void)
+{
+	int res = 0;
+
+	sysfs_root = kobject_create_and_add(INCFS_NAME, fs_kobj);
+	if (!sysfs_root)
+		return -ENOMEM;
+
+	res = sysfs_create_group(sysfs_root, &attr_group);
+	if (res) {
+		kobject_put(sysfs_root);
+		sysfs_root = NULL;
+	}
+	return res;
+}
+
+static void cleanup_sysfs(void)
+{
+	if (sysfs_root) {
+		sysfs_remove_group(sysfs_root, &attr_group);
+		kobject_put(sysfs_root);
+		sysfs_root = NULL;
+	}
+}
+
+static int __init init_incfs_module(void)
+{
+	int err = 0;
+
+	err = init_sysfs();
+	if (err)
+		return err;
+
+	err = register_filesystem(&incfs_fs_type);
+	if (err)
+		cleanup_sysfs();
+
+	return err;
+}
+
+static void __exit cleanup_incfs_module(void)
+{
+	cleanup_sysfs();
+	unregister_filesystem(&incfs_fs_type);
+}
+
+module_init(init_incfs_module);
+module_exit(cleanup_incfs_module);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Eugene Zemtsov <ezemtsov@google.com>");
+MODULE_DESCRIPTION("Incremental File System");
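
version_show() above emits the version as a decimal integer followed by a
newline. A data loader that wants to check it could parse the attribute text
roughly as follows. This is a sketch only; parse_incfs_version() is a
hypothetical helper, not something this patch provides.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the text read from
 * /sys/fs/incremental-fs/version. Expects one positive decimal integer
 * followed by exactly one newline; returns -1 for anything else.
 */
static long parse_incfs_version(const char *text)
{
	char *end;
	long value;

	errno = 0;
	value = strtol(text, &end, 10);
	if (errno != 0 || end == text || value <= 0)
		return -1;
	if (end[0] != '\n' || end[1] != '\0')
		return -1;
	return value;
}
```

A caller would read the sysfs file into a NUL-terminated buffer (for example
with fopen() and fgets()) and pass that buffer to the helper.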
diff --git a/fs/incfs/vfs.c b/fs/incfs/vfs.c
new file mode 100644
index 000000000000..2e71f0edf8a1
--- /dev/null
+++ b/fs/incfs/vfs.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018 Google LLC
+ */
+#include <linux/blkdev.h>
+#include <linux/fs.h>
+
+#include <uapi/linux/incrementalfs.h>
+
+static struct dentry *mount_fs(struct file_system_type *type, int flags,
+			       const char *dev_name, void *data);
+static void kill_sb(struct super_block *sb);
+
+struct file_system_type incfs_fs_type = {
+	.owner = THIS_MODULE,
+	.name = INCFS_NAME,
+	.mount = mount_fs,
+	.kill_sb = kill_sb,
+	.fs_flags = 0
+};
+
+static int fill_super_block(struct super_block *sb, void *data, int silent)
+{
+	return 0;
+}
+
+static struct dentry *mount_fs(struct file_system_type *type, int flags,
+			       const char *dev_name, void *data)
+{
+	return mount_nodev(type, flags, data, fill_super_block);
+}
+
+static void kill_sb(struct super_block *sb)
+{
+	generic_shutdown_super(sb);
+}
+
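
mount_fs() above passes the raw option string (the void *data argument) on
to the filesystem; option parsing itself is not part of this patch. On the
userspace side, the two documented options can be formatted with a
truncation check, as in this sketch. incfs_format_mount_options() is a
hypothetical helper name; the option names come from the documentation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the option string documented for incremental-fs.
 * Returns 0 on success, -1 if the buffer is too small or formatting fails.
 */
static int incfs_format_mount_options(char *buf, size_t size,
				      unsigned int backing_fd,
				      unsigned int read_timeout_msc)
{
	int n = snprintf(buf, size, "backing_fd=%u,read_timeout_msc=%u",
			 backing_fd, read_timeout_msc);

	/* snprintf() returns the length it needed; >= size means truncation. */
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

The resulting buffer is what a data loader would hand to mount(2) as its last
argument.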
diff --git a/include/uapi/linux/incrementalfs.h b/include/uapi/linux/incrementalfs.h
new file mode 100644
index 000000000000..5bcf66ac852b
--- /dev/null
+++ b/include/uapi/linux/incrementalfs.h
@@ -0,0 +1,189 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Userspace interface for Incremental FS.
+ *
+ * Incremental FS is a special-purpose Linux virtual file system that allows
+ * execution of a program while its binary and resource files are still being
+ * lazily downloaded over the network, USB etc.
+ *
+ * Copyright 2019 Google LLC
+ */
+#ifndef _UAPI_LINUX_INCREMENTALFS_H
+#define _UAPI_LINUX_INCREMENTALFS_H
+
+#include <linux/limits.h>
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+/* ===== constants ===== */
+#define INCFS_NAME "incremental-fs"
+#define INCFS_MAGIC_NUMBER (0x5346434e49ul)
+#define INCFS_DATA_FILE_BLOCK_SIZE 4096
+#define INCFS_HEADER_VER 1
+
+#define INCFS_MAX_FILES 1000
+#define INCFS_COMMAND_INODE 1
+#define INCFS_ROOT_INODE 2
+
+#define INCFS_IOCTL_BASE_CODE 'g'
+
+/* ===== ioctl requests on command file ===== */
+
+/* Make changes to the file system via incfs instructions. */
+#define INCFS_IOC_PROCESS_INSTRUCTION \
+	_IOWR(INCFS_IOCTL_BASE_CODE, 30, struct incfs_instruction)
+
+enum incfs_compression_alg { COMPRESSION_NONE = 0, COMPRESSION_LZ4 = 1 };
+
+/*
+ * Description of a pending read. A pending read is a read call by
+ * a userspace program for which the filesystem currently doesn't have data.
+ *
+ * These structs can be read from the .cmd file to obtain the set of reads
+ * which are currently pending.
+ */
+struct incfs_pending_read_info {
+	/* Inode number of a file that is being read from. */
+	__aligned_u64 file_ino;
+
+	/* Index of a file block that is being read. */
+	__u32 block_index;
+
+	/* A serial number of this pending read. */
+	__u32 serial_number;
+};
+
+/*
+ * A struct to be written into a .cmd file to provide a data block for a file.
+ */
+struct incfs_new_data_block {
+	/* Inode number of a file this block belongs to. */
+	__aligned_u64 file_ino;
+
+	/* Index of a data block. */
+	__u32 block_index;
+
+	/* Length of data */
+	__u32 data_len;
+
+	/*
+	 * A pointer to the actual data for the block.
+	 *
+	 * Equivalent to: __u8 *data;
+	 */
+	__aligned_u64 data;
+
+	/*
+	 * Compression algorithm used to compress the data block.
+	 * Values from enum incfs_compression_alg.
+	 */
+	__u32 compression;
+
+	__u32 reserved1;
+
+	__aligned_u64 reserved2;
+};
+
+enum incfs_instruction_type {
+	INCFS_INSTRUCTION_NOOP = 0,
+	INCFS_INSTRUCTION_NEW_FILE = 1,
+	INCFS_INSTRUCTION_ADD_DIR_ENTRY = 3,
+	INCFS_INSTRUCTION_REMOVE_DIR_ENTRY = 4,
+};
+
+/*
+ * Create a new file or directory.
+ * Corresponds to INCFS_INSTRUCTION_NEW_FILE
+ */
+struct incfs_new_file_instruction {
+	/*
+	 * [Out param. Populated by the kernel after ioctl.]
+	 * Inode number of a newly created file.
+	 */
+	__aligned_u64 ino_out;
+
+	/*
+	 * Total size of the new file. Ignored if S_ISDIR(mode).
+	 */
+	__aligned_u64 size;
+
+	/*
+	 * File mode. Permissions and dir flag.
+	 */
+	__u16 mode;
+
+	__u16 reserved1;
+
+	__u32 reserved2;
+
+	__aligned_u64 reserved3;
+
+	__aligned_u64 reserved4;
+
+	__aligned_u64 reserved5;
+
+	__aligned_u64 reserved6;
+
+	__aligned_u64 reserved7;
+};
+
+/*
+ * Create or remove a name (aka hardlink) for a file in a directory.
+ * Corresponds to
+ * INCFS_INSTRUCTION_ADD_DIR_ENTRY,
+ * INCFS_INSTRUCTION_REMOVE_DIR_ENTRY
+ */
+struct incfs_dir_entry_instruction {
+	/* Inode number of a directory to add/remove a file to/from. */
+	__aligned_u64 dir_ino;
+
+	/* File to add/remove. */
+	__aligned_u64 child_ino;
+
+	/* Length of name field */
+	__u32 name_len;
+
+	__u32 reserved1;
+
+	/*
+	 * A pointer to the name characters of a file to add/remove
+	 *
+	 * Equivalent to: char *name;
+	 */
+	__aligned_u64 name;
+
+	__aligned_u64 reserved2;
+
+	__aligned_u64 reserved3;
+
+	__aligned_u64 reserved4;
+
+	__aligned_u64 reserved5;
+};
+
+/*
+ * An Incremental FS instruction is the way for userspace
+ * to
+ *   - create files and directories
+ *   - show and hide files in the directory structure
+ */
+struct incfs_instruction {
+	/* Populate with INCFS_HEADER_VER */
+	__u32 version;
+
+	/*
+	 * Type - what this instruction actually does.
+	 * Values from enum incfs_instruction_type.
+	 */
+	__u32 type;
+
+	union {
+		struct incfs_new_file_instruction file;
+		struct incfs_dir_entry_instruction dir_entry;
+
+		/* Hard limit on the instruction body size in the future. */
+		__u8 reserved[64];
+	};
+};
+
+#endif /* _UAPI_LINUX_INCREMENTALFS_H */
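
The header above passes userspace pointers as __aligned_u64 fields, and the
documentation sample relies on a ptr_to_u64() helper that is not defined in
this patch. The sketch below shows one plausible definition and uses it to
fill struct incfs_new_data_block. The struct is copied from the header so the
example builds without it; ptr_to_u64() as written here and make_block() are
assumptions, not code from the series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/types.h>

/* Copy of struct incfs_new_data_block from the header above. */
struct incfs_new_data_block {
	__aligned_u64 file_ino;
	__u32 block_index;
	__u32 data_len;
	__aligned_u64 data;
	__u32 compression;
	__u32 reserved1;
	__aligned_u64 reserved2;
};

/* The layout has no implicit padding: 8 + 4 + 4 + 8 + 4 + 4 + 8 bytes. */
_Static_assert(sizeof(struct incfs_new_data_block) == 40, "unexpected padding");

/* Assumed definition of the helper used by the documentation sample. */
static __u64 ptr_to_u64(const void *ptr)
{
	return (__u64)(uintptr_t)ptr;
}

/* Hypothetical helper: describe one uncompressed block, zeroing reserved fields. */
static struct incfs_new_data_block make_block(__u64 file_ino, __u32 block_index,
					      const void *data, __u32 data_len)
{
	struct incfs_new_data_block block;

	memset(&block, 0, sizeof(block));
	block.file_ino = file_ino;
	block.block_index = block_index;
	block.data = ptr_to_u64(data);
	block.data_len = data_len;
	block.compression = 0; /* COMPRESSION_NONE */
	return block;
}
```

Going through uintptr_t keeps the cast well defined on 32-bit builds, where a
direct pointer-to-__u64 cast triggers a compiler warning.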
--
2.21.0.593.g511ec345e18-goog


