linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC v3 0/2] vfs / btrfs: add support for ustat()
@ 2014-08-15  2:58 Luis R. Rodriguez
  2014-08-15  2:58 ` [RFC v3 1/2] fs/super.c: add new super block sub devices super_block_dev Luis R. Rodriguez
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Luis R. Rodriguez @ 2014-08-15  2:58 UTC (permalink / raw)
  To: viro, clm, jbacik, hch
  Cc: linux-fsdevel, linux-btrfs, linux-kernel, jeffm, fdmanana,
	Luis R. Rodriguez

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This v3 has this small fix identified by Filipe Manana on the
btrfs specific patch. The v2 series was briefly discussed but
upon providing a use case and reasoning for the way things
were changed I haven't gotten any more further advice or
feedback.

Christoph had noted that this seemed associated to the problem
that the btrfs uses different assignments for st_dev than s_dev,
but much as I'd like to see that changed based on discussions so
far its unclear if this is going to be possible unless strong
commitment is reached. What this tries to do was to take the
other way around the problem, by slowly shifting out junk. I
think this approach might be more feasible over time. I see this
as an extension to Al's original commit 0ee5dc676 but more in line
with how they are really are used and exposes more information to
the VFS. As it stands now other filesystems can pop up and do
similar things, this at least extends the original API to fit
the use case a bit more closely to how its used and allows more
room to grow.

Let's consider this userspace case:

        struct stat buf;
        struct ustat ubuf;

        /* Find a valid device number */
        if (stat("/", &buf)) {
                fprintf(stderr, "Stat failed: %s\n", strerror(errno));
                return 1;
        }

        /* Call ustat on it */
        if (ustat(buf.st_dev, &ubuf)) {
                fprintf(stderr, "Ustat failed: %s\n", strerror(errno));
                return 1;
        }

In the btrfs case it has an inode op for getattr, that is used and we set
the dev to anonymous dev_t. Later ustat will use user_get_super() which
will only be able to work with a userblock if the super block's only
dev_t is assigned to it. Since we have many anonymous to dev_t mapping
to super block though we can't complete the search for btfs and ustat()
fails with -EINVAL. The series expands the number of dev_t's that a super
block can have and allows this search to complete.

Luis R. Rodriguez (2):
  fs/super.c: add new super block sub devices super_block_dev
  btrfs: use the new VFS super_block_dev

 fs/btrfs/ctree.h   |  7 ++----
 fs/btrfs/disk-io.c |  7 +++---
 fs/btrfs/inode.c   |  2 +-
 fs/super.c         | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/fs.h | 10 ++++++++
 5 files changed, 82 insertions(+), 12 deletions(-)

-- 
2.0.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC v3 1/2] fs/super.c: add new super block sub devices super_block_dev
  2014-08-15  2:58 [RFC v3 0/2] vfs / btrfs: add support for ustat() Luis R. Rodriguez
@ 2014-08-15  2:58 ` Luis R. Rodriguez
  2014-08-15  2:58 ` [RFC v3 2/2] btrfs: use the new VFS super_block_dev Luis R. Rodriguez
  2014-08-15  9:29 ` [RFC v3 0/2] vfs / btrfs: add support for ustat() Al Viro
  2 siblings, 0 replies; 10+ messages in thread
From: Luis R. Rodriguez @ 2014-08-15  2:58 UTC (permalink / raw)
  To: viro, clm, jbacik, hch
  Cc: linux-fsdevel, linux-btrfs, linux-kernel, jeffm, fdmanana,
	Luis R. Rodriguez

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Modern filesystems are using the get_anon_bdev() for internal
notions of volumes, snapshots for a single super block but never
exposing them directly to the VFS layer. While this works its
leaves the VFS layer growing dumb over what filesystems are doing.
This creates a new super block subdevice which we can use to start
stuffing in information about the underlying bdev's and its
associated super block to start off with. This at least now lets
us implement proper support for ustat() once filesystems are
modified to use this data structure and respective helpers.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 fs/super.c         | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/fs.h | 10 ++++++++
 2 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index d20d5b1..d871892 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -133,6 +133,68 @@ static unsigned long super_cache_count(struct shrinker *shrink,
 	return total_objects;
 }
 
+static bool super_dev_match(struct super_block *sb, dev_t dev)
+{
+	struct super_block_dev *sbdev;
+
+	if (sb->s_dev == dev)
+		return true;
+
+	if (list_empty(&sb->s_sbdevs))
+		return false;
+
+	list_for_each_entry(sbdev, &sb->s_sbdevs, entry)
+		if (sbdev->anon_dev ==  dev)
+			return true;
+
+	return false;
+}
+
+int insert_anon_sbdev(struct super_block *sb, struct super_block_dev *sbdev)
+{
+	int ret;
+
+	ret = get_anon_bdev(&sbdev->anon_dev);
+	if (ret)
+		return ret;
+
+	sbdev->sb = sb;
+
+	spin_lock(&sb_lock);
+	list_add_tail(&sbdev->entry, &sb->s_sbdevs);
+	spin_unlock(&sb_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(insert_anon_sbdev);
+
+void remove_anon_sbdev(struct super_block_dev *sbdev)
+{
+	struct super_block *sb;
+	struct super_block_dev *sbdev_i, *tmp;
+
+	if (!sbdev)
+		return;
+
+	sb = sbdev->sb;
+
+	spin_lock(&sb_lock);
+
+	WARN_ON(list_empty(&sb->s_sbdevs));
+
+	list_for_each_entry_safe(sbdev_i, tmp, &sb->s_sbdevs, entry) {
+		if (sbdev == sbdev_i) {
+			list_del_init(&sbdev_i->entry);
+			break;
+		}
+	}
+
+	spin_unlock(&sb_lock);
+
+	free_anon_bdev(sbdev->anon_dev);
+}
+EXPORT_SYMBOL_GPL(remove_anon_sbdev);
+
 /**
  *	destroy_super	-	frees a superblock
  *	@s: superblock to free
@@ -148,6 +210,7 @@ static void destroy_super(struct super_block *s)
 		percpu_counter_destroy(&s->s_writers.counter[i]);
 	security_sb_free(s);
 	WARN_ON(!list_empty(&s->s_mounts));
+	WARN_ON(!list_empty(&s->s_sbdevs));
 	kfree(s->s_subtype);
 	kfree(s->s_options);
 	kfree_rcu(s, rcu);
@@ -188,6 +251,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 	INIT_HLIST_NODE(&s->s_instances);
 	INIT_HLIST_BL_HEAD(&s->s_anon);
 	INIT_LIST_HEAD(&s->s_inodes);
+	INIT_LIST_HEAD(&s->s_sbdevs);
 
 	if (list_lru_init(&s->s_dentry_lru))
 		goto fail;
@@ -652,7 +716,7 @@ restart:
 	spin_unlock(&sb_lock);
 	return NULL;
 }
- 
+
 struct super_block *user_get_super(dev_t dev)
 {
 	struct super_block *sb;
@@ -662,7 +726,7 @@ rescan:
 	list_for_each_entry(sb, &super_blocks, s_list) {
 		if (hlist_unhashed(&sb->s_instances))
 			continue;
-		if (sb->s_dev ==  dev) {
+		if (super_dev_match(sb, dev)) {
 			sb->s_count++;
 			spin_unlock(&sb_lock);
 			down_read(&sb->s_umount);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f0890e4..c9152ac 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1197,6 +1197,13 @@ struct sb_writers {
 #endif
 };
 
+/* we can expand this to help the VFS layer with modern filesystems */
+struct super_block_dev {
+	struct super_block	*sb;
+	struct list_head	entry;		/* For struct sb->s_sbdevs */
+	dev_t			anon_dev;
+};
+
 struct super_block {
 	struct list_head	s_list;		/* Keep this first */
 	dev_t			s_dev;		/* search index; _not_ kdev_t */
@@ -1221,6 +1228,7 @@ struct super_block {
 
 	struct list_head	s_inodes;	/* all inodes */
 	struct hlist_bl_head	s_anon;		/* anonymous dentries for (nfs) exporting */
+	struct list_head	s_sbdevs;	/* internal fs dev_t */
 	struct list_head	s_mounts;	/* list of mounts; _not_ for fs use */
 	struct block_device	*s_bdev;
 	struct backing_dev_info *s_bdi;
@@ -1821,6 +1829,8 @@ void deactivate_locked_super(struct super_block *sb);
 int set_anon_super(struct super_block *s, void *data);
 int get_anon_bdev(dev_t *);
 void free_anon_bdev(dev_t);
+int insert_anon_sbdev(struct super_block *sb, struct super_block_dev *sbdev);
+void remove_anon_sbdev(struct super_block_dev *sbdev);
 struct super_block *sget(struct file_system_type *type,
 			int (*test)(struct super_block *,void *),
 			int (*set)(struct super_block *,void *),
-- 
2.0.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC v3 2/2] btrfs: use the new VFS super_block_dev
  2014-08-15  2:58 [RFC v3 0/2] vfs / btrfs: add support for ustat() Luis R. Rodriguez
  2014-08-15  2:58 ` [RFC v3 1/2] fs/super.c: add new super block sub devices super_block_dev Luis R. Rodriguez
@ 2014-08-15  2:58 ` Luis R. Rodriguez
  2014-08-15  9:29 ` [RFC v3 0/2] vfs / btrfs: add support for ustat() Al Viro
  2 siblings, 0 replies; 10+ messages in thread
From: Luis R. Rodriguez @ 2014-08-15  2:58 UTC (permalink / raw)
  To: viro, clm, jbacik, hch
  Cc: linux-fsdevel, linux-btrfs, linux-kernel, jeffm, fdmanana,
	Luis R. Rodriguez

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Use the new VFS layer struct super_block_dev instead of carrying
the anonymous bdev's on our own. This makes the VFS layer aware of
all of our anonymous dev's on the super block.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
fdmanana: fix for running qgroup sanity tests
---
 fs/btrfs/ctree.h   | 7 ++-----
 fs/btrfs/disk-io.c | 7 +++----
 fs/btrfs/inode.c   | 2 +-
 3 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index be91397..0ece396 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1846,11 +1846,8 @@ struct btrfs_root {
 	 * protected by inode_lock
 	 */
 	struct radix_tree_root delayed_nodes_tree;
-	/*
-	 * right now this just gets used so that a root has its own devid
-	 * for stat.  It may be used for more later
-	 */
-	dev_t anon_dev;
+
+	struct super_block_dev sbdev;
 
 	spinlock_t root_item_lock;
 	atomic_t refs;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 08e65e9..7c65307 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1270,7 +1270,6 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize,
 		root->defrag_trans_start = 0;
 	init_completion(&root->kobj_unregister);
 	root->root_key.objectid = objectid;
-	root->anon_dev = 0;
 
 	spin_lock_init(&root->root_item_lock);
 }
@@ -1573,7 +1572,7 @@ int btrfs_init_fs_root(struct btrfs_root *root)
 	spin_lock_init(&root->cache_lock);
 	init_waitqueue_head(&root->cache_wait);
 
-	ret = get_anon_bdev(&root->anon_dev);
+	ret = insert_anon_sbdev(root->fs_info->sb, &root->sbdev);
 	if (ret)
 		goto free_writers;
 	return 0;
@@ -3532,8 +3531,8 @@ static void free_fs_root(struct btrfs_root *root)
 	WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
 	btrfs_free_block_rsv(root, root->orphan_block_rsv);
 	root->orphan_block_rsv = NULL;
-	if (root->anon_dev)
-		free_anon_bdev(root->anon_dev);
+	if (likely(!test_bit(BTRFS_ROOT_DUMMY_ROOT, &root->state)))
+		remove_anon_sbdev(&root->sbdev);
 	if (root->subv_writers)
 		btrfs_free_subvolume_writers(root->subv_writers);
 	free_extent_buffer(root->node);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3668048..0e8f604 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8277,7 +8277,7 @@ static int btrfs_getattr(struct vfsmount *mnt,
 	u32 blocksize = inode->i_sb->s_blocksize;
 
 	generic_fillattr(inode, stat);
-	stat->dev = BTRFS_I(inode)->root->anon_dev;
+	stat->dev = BTRFS_I(inode)->root->sbdev.anon_dev;
 	stat->blksize = PAGE_CACHE_SIZE;
 
 	spin_lock(&BTRFS_I(inode)->lock);
-- 
2.0.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC v3 0/2] vfs / btrfs: add support for ustat()
  2014-08-15  2:58 [RFC v3 0/2] vfs / btrfs: add support for ustat() Luis R. Rodriguez
  2014-08-15  2:58 ` [RFC v3 1/2] fs/super.c: add new super block sub devices super_block_dev Luis R. Rodriguez
  2014-08-15  2:58 ` [RFC v3 2/2] btrfs: use the new VFS super_block_dev Luis R. Rodriguez
@ 2014-08-15  9:29 ` Al Viro
  2014-08-17 23:41   ` Luis R. Rodriguez
  2017-08-23 22:31   ` Jeff Mahoney
  2 siblings, 2 replies; 10+ messages in thread
From: Al Viro @ 2014-08-15  9:29 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: clm, jbacik, hch, linux-fsdevel, linux-btrfs, linux-kernel,
	jeffm, fdmanana, Luis R. Rodriguez

On Thu, Aug 14, 2014 at 07:58:56PM -0700, Luis R. Rodriguez wrote:

> Christoph had noted that this seemed associated to the problem
> that the btrfs uses different assignments for st_dev than s_dev,
> but much as I'd like to see that changed based on discussions so
> far its unclear if this is going to be possible unless strong
> commitment is reached.

Explain, please.  Whose commitment and commitment to what, exactly?
Having different ->st_dev values for different files on the same
fs is a bloody bad idea; why does btrfs do that at all?  If nothing else,
it breaks the usual "are those two files on the same fs?" tests...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC v3 0/2] vfs / btrfs: add support for ustat()
  2014-08-15  9:29 ` [RFC v3 0/2] vfs / btrfs: add support for ustat() Al Viro
@ 2014-08-17 23:41   ` Luis R. Rodriguez
  2017-08-23 22:31   ` Jeff Mahoney
  1 sibling, 0 replies; 10+ messages in thread
From: Luis R. Rodriguez @ 2014-08-17 23:41 UTC (permalink / raw)
  To: Al Viro
  Cc: Luis R. Rodriguez, clm, jbacik, hch, linux-fsdevel, linux-btrfs,
	linux-kernel, jeffm, fdmanana

On Fri, Aug 15, 2014 at 10:29:50AM +0100, Al Viro wrote:
> On Thu, Aug 14, 2014 at 07:58:56PM -0700, Luis R. Rodriguez wrote:
> 
> > Christoph had noted that this seemed associated to the problem
> > that the btrfs uses different assignments for st_dev than s_dev,
> > but much as I'd like to see that changed based on discussions so
> > far its unclear if this is going to be possible unless strong
> > commitment is reached.
> 
> Explain, please.  Whose commitment and commitment to what, exactly?

There are two folks, one is the btrfs developers, and the others are
the VFS maintainers to provide proper guidance.

> Having different ->st_dev values for different files on the same
> fs is a bloody bad idea; why does btrfs do that at all?

With the disclosure of stating that I'm new to btrfs as I see its been
done to help cope with the copy on write mechanism, but I welcome btrfs
folks to chime in if there other reasons this was done from an
architectural point of view.

Provided all reasons why this was done are clarified what we'd need
then is proper guidance on what *would* be a much more reasonable
strategy to do what was desired, and finally commitmen from btrfs
folks to change btrfs to switch to this new agreed upon strategy.

> If nothing else,
> it breaks the usual "are those two files on the same fs?" tests...

It would seem that those tests need more context now with copy
on write, even the notion of disk space is all fucked up now, we
need to think of it in terms of different possibilities that the
new filesystems allow us to share data and different outcomes that
could be possible.

  Luis

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC v3 0/2] vfs / btrfs: add support for ustat()
  2014-08-15  9:29 ` [RFC v3 0/2] vfs / btrfs: add support for ustat() Al Viro
  2014-08-17 23:41   ` Luis R. Rodriguez
@ 2017-08-23 22:31   ` Jeff Mahoney
  2021-04-15 17:53     ` Luis Chamberlain
  1 sibling, 1 reply; 10+ messages in thread
From: Jeff Mahoney @ 2017-08-23 22:31 UTC (permalink / raw)
  To: Al Viro, Luis R. Rodriguez
  Cc: clm, jbacik, hch, linux-fsdevel, linux-btrfs, linux-kernel,
	fdmanana, Luis R. Rodriguez


[-- Attachment #1.1: Type: text/plain, Size: 2835 bytes --]

On 8/15/14 5:29 AM, Al Viro wrote:
> On Thu, Aug 14, 2014 at 07:58:56PM -0700, Luis R. Rodriguez wrote:
> 
>> Christoph had noted that this seemed associated to the problem
>> that the btrfs uses different assignments for st_dev than s_dev,
>> but much as I'd like to see that changed based on discussions so
>> far its unclear if this is going to be possible unless strong
>> commitment is reached.

Resurrecting a dead thread since we've been carrying this patch anyway
since then.

> Explain, please.  Whose commitment and commitment to what, exactly?
> Having different ->st_dev values for different files on the same
> fs is a bloody bad idea; why does btrfs do that at all?  If nothing else,
> it breaks the usual "are those two files on the same fs?" tests...

It's because btrfs snapshots would have inode number collisions.
Changing the inode numbers for snapshots would negate a big benefit of
btrfs snapshots: the quick creation and lightweight on-disk
representation due to metadata sharing.

The thing is that ustat() used to work.  Your commit 0ee5dc676a5f8
(btrfs: kill magical embedded struct superblock) had a regression:
Since it replaced the superblock with a simple dev_t, it rendered the
device no longer discoverable by user_get_super.  We need a list_head to
attach for searching.

There's an argument that this is hacky.  It's valid.  The only other
feedback I've heard is to use a real superblock for subvolumes to do
this instead.  That doesn't work either, due to things like freeze/thaw
and inode writeback.  Ultimately, what we need is a single file system
with multiple namespaces.  Years ago we just needed different inode
namespaces, but as people have started adopting btrfs for containers, we
need more than that.  I've heard requests for per-subvolume security
contexts.  I'd imagine user namespaces are on someone's wish list.  A
working df can be done with ->d_automount, but the way btrfs handles
having a "canonical" subvolume location has always been a way to avoid
directory loops.  I'd like to just automount subvolumes everywhere
they're referenced.  One solution, for which I have no code yet, is to
have something like a superblock-light that we can hang things like a
security context, a user namespace, and an anonymous dev.  Most file
systems would have just one.  Btrfs would have one per subvolume.

That's a big project with a bunch of discussion.  So for now, I'd like
to move this patch forward while we (I) work on the bigger issue.

BTW, in this same thread, Christoph said:> Again, NAK.  Make btrfs
report the proper anon dev_t in stat and
> everything will just work.

We do.  We did then too.  But what doesn't work is a user doing stat()
and then using the dev_t to call ustat().

-Jeff

-- 
Jeff Mahoney
SUSE Labs


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC v3 0/2] vfs / btrfs: add support for ustat()
  2017-08-23 22:31   ` Jeff Mahoney
@ 2021-04-15 17:53     ` Luis Chamberlain
  2021-04-15 18:17       ` Josef Bacik
  0 siblings, 1 reply; 10+ messages in thread
From: Luis Chamberlain @ 2021-04-15 17:53 UTC (permalink / raw)
  To: Filipe Manana, David Sterba
  Cc: Al Viro, Chris Mason, Josef Bacik, Christoph Hellwig,
	Linux FS Devel, Btrfs BTRFS, linux-kernel, Jeff Mahoney,
	Luis Chamberlain

On Wed, Aug 23, 2017 at 3:31 PM Jeff Mahoney <jeffm@suse.com> wrote:
>
> On 8/15/14 5:29 AM, Al Viro wrote:
> > On Thu, Aug 14, 2014 at 07:58:56PM -0700, Luis R. Rodriguez wrote:
> >
> >> Christoph had noted that this seemed associated to the problem
> >> that the btrfs uses different assignments for st_dev than s_dev,
> >> but much as I'd like to see that changed based on discussions so
> >> far its unclear if this is going to be possible unless strong
> >> commitment is reached.
>
> Resurrecting a dead thread since we've been carrying this patch anyway
> since then.
>
> > Explain, please.  Whose commitment and commitment to what, exactly?
> > Having different ->st_dev values for different files on the same
> > fs is a bloody bad idea; why does btrfs do that at all?  If nothing else,
> > it breaks the usual "are those two files on the same fs?" tests...
>
> It's because btrfs snapshots would have inode number collisions.
> Changing the inode numbers for snapshots would negate a big benefit of
> btrfs snapshots: the quick creation and lightweight on-disk
> representation due to metadata sharing.
>
> The thing is that ustat() used to work.  Your commit 0ee5dc676a5f8
> (btrfs: kill magical embedded struct superblock) had a regression:
> Since it replaced the superblock with a simple dev_t, it rendered the
> device no longer discoverable by user_get_super.  We need a list_head to
> attach for searching.
>
> There's an argument that this is hacky.  It's valid.  The only other
> feedback I've heard is to use a real superblock for subvolumes to do
> this instead.  That doesn't work either, due to things like freeze/thaw
> and inode writeback.  Ultimately, what we need is a single file system
> with multiple namespaces.  Years ago we just needed different inode
> namespaces, but as people have started adopting btrfs for containers, we
> need more than that.  I've heard requests for per-subvolume security
> contexts.  I'd imagine user namespaces are on someone's wish list.  A
> working df can be done with ->d_automount, but the way btrfs handles
> having a "canonical" subvolume location has always been a way to avoid
> directory loops.  I'd like to just automount subvolumes everywhere
> they're referenced.  One solution, for which I have no code yet, is to
> have something like a superblock-light that we can hang things like a
> security context, a user namespace, and an anonymous dev.  Most file
> systems would have just one.  Btrfs would have one per subvolume.
>
> That's a big project with a bunch of discussion.

4 years have gone by and this patch is still being carried around for
btrfs. Other than resolving this ustat() issue for btrfs are there new
reasons to support this effort done to be done properly? Are there
other filesystems that would benefit? I'd like to get an idea of the
stakeholder here before considering taking this on or not.

 Luis

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC v3 0/2] vfs / btrfs: add support for ustat()
  2021-04-15 17:53     ` Luis Chamberlain
@ 2021-04-15 18:17       ` Josef Bacik
  2021-04-15 18:29         ` Luis Chamberlain
  0 siblings, 1 reply; 10+ messages in thread
From: Josef Bacik @ 2021-04-15 18:17 UTC (permalink / raw)
  To: Luis Chamberlain, Filipe Manana, David Sterba
  Cc: Al Viro, Chris Mason, Josef Bacik, Christoph Hellwig,
	Linux FS Devel, Btrfs BTRFS, linux-kernel, Jeff Mahoney

On 4/15/21 1:53 PM, Luis Chamberlain wrote:
> On Wed, Aug 23, 2017 at 3:31 PM Jeff Mahoney <jeffm@suse.com> wrote:
>>
>> On 8/15/14 5:29 AM, Al Viro wrote:
>>> On Thu, Aug 14, 2014 at 07:58:56PM -0700, Luis R. Rodriguez wrote:
>>>
>>>> Christoph had noted that this seemed associated to the problem
>>>> that the btrfs uses different assignments for st_dev than s_dev,
>>>> but much as I'd like to see that changed based on discussions so
>>>> far its unclear if this is going to be possible unless strong
>>>> commitment is reached.
>>
>> Resurrecting a dead thread since we've been carrying this patch anyway
>> since then.
>>
>>> Explain, please.  Whose commitment and commitment to what, exactly?
>>> Having different ->st_dev values for different files on the same
>>> fs is a bloody bad idea; why does btrfs do that at all?  If nothing else,
>>> it breaks the usual "are those two files on the same fs?" tests...
>>
>> It's because btrfs snapshots would have inode number collisions.
>> Changing the inode numbers for snapshots would negate a big benefit of
>> btrfs snapshots: the quick creation and lightweight on-disk
>> representation due to metadata sharing.
>>
>> The thing is that ustat() used to work.  Your commit 0ee5dc676a5f8
>> (btrfs: kill magical embedded struct superblock) had a regression:
>> Since it replaced the superblock with a simple dev_t, it rendered the
>> device no longer discoverable by user_get_super.  We need a list_head to
>> attach for searching.
>>
>> There's an argument that this is hacky.  It's valid.  The only other
>> feedback I've heard is to use a real superblock for subvolumes to do
>> this instead.  That doesn't work either, due to things like freeze/thaw
>> and inode writeback.  Ultimately, what we need is a single file system
>> with multiple namespaces.  Years ago we just needed different inode
>> namespaces, but as people have started adopting btrfs for containers, we
>> need more than that.  I've heard requests for per-subvolume security
>> contexts.  I'd imagine user namespaces are on someone's wish list.  A
>> working df can be done with ->d_automount, but the way btrfs handles
>> having a "canonical" subvolume location has always been a way to avoid
>> directory loops.  I'd like to just automount subvolumes everywhere
>> they're referenced.  One solution, for which I have no code yet, is to
>> have something like a superblock-light that we can hang things like a
>> security context, a user namespace, and an anonymous dev.  Most file
>> systems would have just one.  Btrfs would have one per subvolume.
>>
>> That's a big project with a bunch of discussion.
> 
> 4 years have gone by and this patch is still being carried around for
> btrfs. Other than resolving this ustat() issue for btrfs are there new
> reasons to support this effort done to be done properly? Are there
> other filesystems that would benefit? I'd like to get an idea of the
> stakeholder here before considering taking this on or not.
> 

Not really sure why this needs to be addressed, we have statfs(), and what we 
have has worked forever now.  There's a lot of larger things that need to be 
addressed in general to support the volume approach inside file systems that is 
going to require a lot of work inside of VFS.  If you feel like tackling that 
work and then wiring up btrfs by all means have at it, but I'm not seeing a 
urgent need to address this.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC v3 0/2] vfs / btrfs: add support for ustat()
  2021-04-15 18:17       ` Josef Bacik
@ 2021-04-15 18:29         ` Luis Chamberlain
  2021-04-16 17:20           ` Neal Gompa
  0 siblings, 1 reply; 10+ messages in thread
From: Luis Chamberlain @ 2021-04-15 18:29 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Filipe Manana, David Sterba, Al Viro, Chris Mason, Josef Bacik,
	Christoph Hellwig, Linux FS Devel, Btrfs BTRFS, linux-kernel,
	Jeff Mahoney

On Thu, Apr 15, 2021 at 02:17:58PM -0400, Josef Bacik wrote:
> There's a lot of larger things that need to
> be addressed in general to support the volume approach inside file systems
> that is going to require a lot of work inside of VFS.  If you feel like
> tackling that work and then wiring up btrfs by all means have at it, but I'm
> not seeing a urgent need to address this.  Thanks,

That's precisely what I what I want to hear me about. Things like this.
Would btrfs be the ony user of volumes inside filesystem? Jeff had
mentioned before this could also allow namespaces per volumes, and this
might be a desirable feature.

What else?

 Luis

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC v3 0/2] vfs / btrfs: add support for ustat()
  2021-04-15 18:29         ` Luis Chamberlain
@ 2021-04-16 17:20           ` Neal Gompa
  0 siblings, 0 replies; 10+ messages in thread
From: Neal Gompa @ 2021-04-16 17:20 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Josef Bacik, Filipe Manana, David Sterba, Al Viro, Chris Mason,
	Josef Bacik, Christoph Hellwig, Linux FS Devel, Btrfs BTRFS,
	linux-kernel, Jeff Mahoney

On Thu, Apr 15, 2021 at 2:29 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Thu, Apr 15, 2021 at 02:17:58PM -0400, Josef Bacik wrote:
> > There's a lot of larger things that need to
> > be addressed in general to support the volume approach inside file systems
> > that is going to require a lot of work inside of VFS.  If you feel like
> > tackling that work and then wiring up btrfs by all means have at it, but I'm
> > not seeing a urgent need to address this.  Thanks,
>
> That's precisely what I what I want to hear me about. Things like this.
> Would btrfs be the ony user of volumes inside filesystem? Jeff had
> mentioned before this could also allow namespaces per volumes, and this
> might be a desirable feature.
>
> What else?

Wouldn't this be useful for union filesystems like OverlayFS? Or other
filesystems that support nested filesystems like bcachefs?



-- 
真実はいつも一つ!/ Always, there's only one truth!

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-04-16 17:21 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-15  2:58 [RFC v3 0/2] vfs / btrfs: add support for ustat() Luis R. Rodriguez
2014-08-15  2:58 ` [RFC v3 1/2] fs/super.c: add new super block sub devices super_block_dev Luis R. Rodriguez
2014-08-15  2:58 ` [RFC v3 2/2] btrfs: use the new VFS super_block_dev Luis R. Rodriguez
2014-08-15  9:29 ` [RFC v3 0/2] vfs / btrfs: add support for ustat() Al Viro
2014-08-17 23:41   ` Luis R. Rodriguez
2017-08-23 22:31   ` Jeff Mahoney
2021-04-15 17:53     ` Luis Chamberlain
2021-04-15 18:17       ` Josef Bacik
2021-04-15 18:29         ` Luis Chamberlain
2021-04-16 17:20           ` Neal Gompa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).