linux-fsdevel.vger.kernel.org archive mirror
From: Mark Fasheh <mfasheh@suse.de>
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-btrfs@vger.kernel.org
Subject: [RFC][PATCH 0/76] vfs: 'views' for filesystems with more than one root
Date: Tue,  8 May 2018 11:03:20 -0700
Message-ID: <20180508180436.716-1-mfasheh@suse.de>

Hi,

The VFS's super_block covers a variety of filesystem functionality. In
particular we have a single structure representing both I/O and
namespace domains.

There are requirements to de-couple this functionality. For example,
filesystems with more than one root (such as btrfs subvolumes) can
have multiple inode namespaces. This starts to confuse userspace when
it notices multiple inodes with the same inode/device tuple on a
filesystem.
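
To illustrate why this matters to userspace: many tools treat the
(st_dev, st_ino) pair returned by stat(2) as a file's identity, for
example when detecting hard links. Below is a minimal sketch of such a
check. The function name same_file_identity() is made up for this
example and is not part of the patches.

```c
#include <assert.h>
#include <sys/stat.h>

/*
 * Returns 1 if both paths report the same (st_dev, st_ino) pair,
 * 0 if the pairs differ, and -1 if either stat() call fails.
 */
static inline int same_file_identity(const char *a, const char *b)
{
	struct stat sa, sb;

	assert(a && b);		/* callers must pass real path strings */

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;

	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

If two distinct files report the same pair, a check like this one
wrongly concludes that they are the same file.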

In addition, it's currently impossible for a filesystem subvolume to
have a different security context from its parent. If we could allow
for subvolumes to optionally specify their own security context, we
could use them as containers directly instead of having to go through
an overlay.


I ran into this particular problem with respect to Btrfs some years
ago and sent out a very naive set of patches which were (rightfully)
not incorporated:

https://marc.info/?l=linux-btrfs&m=130074451403261&w=2
https://marc.info/?l=linux-btrfs&m=130532890824992&w=2

During the discussion, one question did come up - why can't
filesystems like Btrfs use a superblock per subvolume? There are a
few problems with that:

- It's common for a single Btrfs filesystem to have thousands of
  subvolumes. So keeping a superblock for each subvol in memory would
  get prohibitively expensive - imagine having 8000 copies of struct
  super_block for a file system just because we wanted some separation
  of say, s_dev.

- Writeback would also have to walk all of these superblocks -
  again not very good for system performance.

- Anyone wanting to lock down I/O on a filesystem would have to freeze
  all the superblocks. This goes for most things related to I/O really
  - we simply can't afford to have the kernel walking thousands of
  superblocks to sync a single fs.

It's far more efficient, then, to pull those fields we need for a
subvolume namespace into their own structure.


The following patches attempt to fix this issue by introducing a
structure, fs_view, which can be used to represent a 'view' into a
filesystem. We can migrate super_block fields to this structure one at
a time. Struct super_block gets a default view embedded into
it. Inodes get a new field, i_view, which can be dereferenced to get
the view that an inode belongs to. By default, we point i_view to the
view on struct super_block. That way existing filesystems don't have
to do anything different.

The patches are careful not to grow the size of struct inode.
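
As a rough userspace mock-up of the layout described above (not the
code from the patches): fs_view, i_view and inode_sb() are names used
in this series, while v_sb, s_view, sb_init_view() and
inode_init_view() are names assumed here for illustration. The point
of the sketch is that i_view takes the place of i_sb, so struct inode
still carries a single pointer.

```c
#include <assert.h>

struct super_block;

/* A 'view' into a filesystem; v_sb is an assumed field name. */
struct fs_view {
	struct super_block *v_sb;	/* back pointer to the owning superblock */
};

struct super_block {
	struct fs_view s_view;		/* default view (assumed name), embedded */
};

struct inode {
	struct fs_view *i_view;		/* one pointer, in place of i_sb */
};

/* Helper to get from an inode to its super_block. */
static inline struct super_block *inode_sb(const struct inode *inode)
{
	return inode->i_view->v_sb;
}

/* Set up the default view so that it points back at its superblock. */
static inline void sb_init_view(struct super_block *sb)
{
	sb->s_view.v_sb = sb;
}

/* Default case: an inode uses the view embedded in its superblock. */
static inline void inode_init_view(struct inode *inode, struct super_block *sb)
{
	assert(sb->s_view.v_sb == sb);	/* sb_init_view() must have run first */
	inode->i_view = &sb->s_view;
}
```

A filesystem that never touches i_view ends up with every inode
pointing at the default view, so inode_sb(inode) returns the same
super_block that inode->i_sb did.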

For the first patch series, we migrate s_dev over from struct
super_block to struct fs_view. This fixes a long-standing bug in how
the kernel reports inode devices to userspace.

The series is ordered as follows:

- We first introduce the fs_view structure and embed it into struct
  super_block. As discussed, struct inode gets a pointer to the
  fs_view, i_view. The only member on fs_view at this point is a
  super_block * so that we can replace i_sb. A helper function is
  provided to get to the super_block from a struct inode.

- Convert the kernel to using our helper function to get to i_sb. This
  is done in per-filesystem patches. The other parts of the kernel
  referencing i_sb get their changes batched up in logical groupings.

- Move s_dev from struct super_block to struct fs_view.

- Convert the kernel from inode->i_sb->s_dev to the device from our
  fs_view. In the end, these lines will look like inode_view(inode)->v_dev.

- Add an fs_view struct to each Btrfs root, point inodes to that view
  when we initialize them.
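
The end state of these steps can be sketched in userspace as follows.
This is a mock-up under assumptions, not the Btrfs code from the
series: struct mock_root stands in for a Btrfs root, mock_dev_t for
dev_t, and v_sb, s_view, r_view and the mock_*() helpers are names
invented for the example. Only fs_view, i_view, inode_sb(),
inode_view() and v_dev come from the description above.

```c
#include <assert.h>

typedef unsigned int mock_dev_t;	/* stand-in for the kernel's dev_t */

struct super_block;

struct fs_view {
	struct super_block *v_sb;	/* assumed field name */
	mock_dev_t v_dev;		/* moved here from super_block's s_dev */
};

struct super_block {
	struct fs_view s_view;		/* default view; assumed name */
};

struct inode {
	struct fs_view *i_view;
};

/* Hypothetical stand-in for a Btrfs root (subvolume) with its own view. */
struct mock_root {
	struct fs_view r_view;
};

static inline struct fs_view *inode_view(const struct inode *inode)
{
	return inode->i_view;
}

static inline struct super_block *inode_sb(const struct inode *inode)
{
	return inode_view(inode)->v_sb;
}

/* Give a root its own view: shared superblock, separate device number. */
static inline void mock_root_init(struct mock_root *root,
				  struct super_block *sb, mock_dev_t dev)
{
	assert(sb);
	root->r_view.v_sb = sb;
	root->r_view.v_dev = dev;
}

/* Point an inode at the view of the root it lives in. */
static inline void mock_inode_init(struct inode *inode, struct mock_root *root)
{
	inode->i_view = &root->r_view;
}
```

Inodes in two different roots then share one super_block through
inode_sb(), while inode_view(inode)->v_dev differs per root.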


The patches are available via git and are based off Linux
v4.16. There are two branches, with identical code.

- With the inode_sb() changeover patch broken out (as is sent here):

https://github.com/markfasheh/linux fs_view-broken-out

- With the inode_sb() changeover patch in one big change:

https://github.com/markfasheh/linux fs_view


Comments are appreciated.

Thanks,
  --Mark


Thread overview: 88+ messages
2018-05-08 18:03 Mark Fasheh [this message]
2018-05-08 18:03 ` [PATCH 01/76] vfs: Introduce struct fs_view Mark Fasheh
2018-05-08 18:03 ` [PATCH 02/76] arch: Use inode_sb() helper instead of inode->i_sb Mark Fasheh
2018-05-08 18:03 ` [PATCH 03/76] drivers: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 04/76] fs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 05/76] include: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 06/76] ipc: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 07/76] kernel: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 08/76] mm: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 09/76] net: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 10/76] security: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 11/76] fs/9p: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 12/76] fs/adfs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 13/76] fs/affs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 14/76] fs/afs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 15/76] fs/autofs4: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 16/76] fs/befs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 17/76] fs/bfs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 18/76] fs/btrfs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 19/76] fs/ceph: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 20/76] fs/cifs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 21/76] fs/coda: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 22/76] fs/configfs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 23/76] fs/cramfs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 24/76] fs/crypto: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 25/76] fs/ecryptfs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 26/76] fs/efivarfs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 27/76] fs/efs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 28/76] fs/exofs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 29/76] fs/exportfs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 30/76] fs/ext2: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 31/76] fs/ext4: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 32/76] fs/f2fs: " Mark Fasheh
2018-05-10 10:10   ` Chao Yu
2018-05-08 18:03 ` [PATCH 33/76] fs/fat: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 34/76] fs/freevxfs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 35/76] fs/fuse: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 36/76] fs/gfs2: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 37/76] fs/hfs: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 38/76] fs/hfsplus: " Mark Fasheh
2018-05-08 18:03 ` [PATCH 39/76] fs/hostfs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 40/76] fs/hpfs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 41/76] fs/hugetlbfs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 42/76] fs/isofs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 43/76] fs/jbd2: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 44/76] fs/jffs2: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 45/76] fs/jfs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 46/76] fs/kernfs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 47/76] fs/lockd: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 48/76] fs/minix: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 49/76] fs/nfsd: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 50/76] fs/nfs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 51/76] fs/nilfs2: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 52/76] fs/notify: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 53/76] fs/ntfs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 54/76] fs/ocfs2: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 55/76] fs/omfs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 56/76] fs/openpromfs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 57/76] fs/orangefs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 58/76] fs/overlayfs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 59/76] fs/proc: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 60/76] fs/qnx4: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 61/76] fs/qnx6: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 62/76] fs/quota: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 63/76] fs/ramfs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 64/76] fs/read: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 65/76] fs/reiserfs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 66/76] fs/romfs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 67/76] fs/squashfs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 68/76] fs/sysv: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 69/76] fs/ubifs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 70/76] fs/udf: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 71/76] fs/ufs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 72/76] fs/xfs: " Mark Fasheh
2018-05-08 18:04 ` [PATCH 73/76] vfs: Move s_dev to to struct fs_view Mark Fasheh
2018-05-08 18:04 ` [PATCH 74/76] fs: Use fs_view device from struct inode Mark Fasheh
2018-05-08 18:04 ` [PATCH 75/76] fs: Use fs view device from struct super_block Mark Fasheh
2018-05-08 18:04 ` [PATCH 76/76] btrfs: Use fs_view in roots, point inodes to it Mark Fasheh
2018-05-08 23:38 ` [RFC][PATCH 0/76] vfs: 'views' for filesystems with more than one root Dave Chinner
2018-05-09  2:06   ` Jeff Mahoney
2018-05-09  6:41     ` Dave Chinner
2018-06-05 20:17       ` Jeff Mahoney
2018-06-06  9:49         ` Amir Goldstein
2018-06-06 20:42           ` Mark Fasheh
2018-06-07  6:06             ` Amir Goldstein
2018-06-07 20:44               ` Mark Fasheh
2018-06-06 21:19           ` Jeff Mahoney
2018-06-07  6:17             ` Amir Goldstein
