All of lore.kernel.org
 help / color / mirror / Atom feed
From: Trond Myklebust <trondmy@hammerspace.com>
To: "hsiangkao@linux.alibaba.com" <hsiangkao@linux.alibaba.com>,
	"tytso@mit.edu" <tytso@mit.edu>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"joseph.qi@linux.alibaba.com" <joseph.qi@linux.alibaba.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"anna.schumaker@netapp.com" <anna.schumaker@netapp.com>
Subject: Re: [PATCH] nfs: set block size according to pnfs_blksize first
Date: Wed, 16 Jun 2021 18:51:38 +0000	[thread overview]
Message-ID: <f35d25eb088df4d541672d39c1c4dbb3bf2ee749.camel@hammerspace.com> (raw)
In-Reply-To: <YMo6CKAaNcZlqzNC@B-P7TQMD6M-0146.local>

On Thu, 2021-06-17 at 01:51 +0800, Gao Xiang wrote:
> Hi Ted,
> 
> On Wed, Jun 16, 2021 at 12:14:44PM -0400, Theodore Ts'o wrote:
> > On Wed, Jun 16, 2021 at 10:41:34PM +0800, Gao Xiang wrote:
> > > > > Yet my question is how to deal with generic/486, should we
> > > > > just skip
> > > > > the case directly? I cannot find some proper way to get
> > > > > underlayfs
> > > > > block size or real xattr value limit.
> > 
> > Note that the block size does not necessarily have thing to do with
> > the xattr value limit.  For example, in ext4, if you create the
> > file
> > system with the ea_inode feature enabled, you can create extended
> > attributes up to the maximum value of 64k defined by the xattr
> > interface --- unless, of course, there isn't enough space in the
> > file
> > system.
> > 
> > (The ea_inode feature will also use shared space for large xattrs,
> > so
> > if you are storing hundreds of thousands of files that all have an
> > identical 20 kilbyte Windows security id or ACL, ea_inode is going
> > to
> > be much more efficient way of supporting that particular use case.)
> 
> Thanks for your detailed explanation too.
> 
> Yeah, yet it's not enabled in the issue setup I was assigned :(
> 
> > 
> > > > As noted above, the NFS server should really be returning
> > > > NFS4ERR_XATTR2BIG in this case, which the client, again, should
> > > > be
> > > > transforming into -E2BIG. Where does ENOSPC come from?
> > > 
> > > Thanks for the detailed explanation...
> > > 
> > > I think that is due to ext4 returning ENOSPC since I tested
> > 
> > It's not just ext4.  From inspection, under some circumstances f2fs
> > and btrfs can return ENOSPC.
> 
> I did some test on ext4 only earlier since it's our test environment.
> I didn't mean the ENOSPC behavior was ext4 only ( :-) very sorry
> about
> that if some misunderstanding here )
> 
> > 
> > The distinction is that E2BIG is (from the attr_set man page):
> > 
> >        [E2BIG]          The value of the given attribute is too
> > large,
> >                         it exceeds the maximum allowable size of an
> >                         attribute value.
> > 
> > The maximum allowable size (enforced by the VFS) is XATTR_MAX_SIZE,
> > which is 65536 bytes.  Some file systems may impose a smaller max
> > allowable size.
> > 
> > In contrast ENOSPC means something different:
> > 
> >        ENOSPC           No space left on device.
> > 
> > This isn't necessarily just block space, BTW; it might some other
> > file
> > system space --- thre might not be a free inode in the case of
> > ext4's
> > ea_inode feature.  Or it be the f2fs file system not being able to
> > allocate a node id via f2fs_alloc_nid().
> > 
> > Note that generic/486 is testing a very specific case, which is
> > replacing a small xattr (16 bytes) with an xattr with a large
> > value.
> > This is would be a really interesting test for ext4 ea_inode, when
> > we
> > are replacing an xattr stored inline in the inode, or in a single
> > 4k
> > block, with an xattr stored in a separate inode.  But not the way
> > src/attr_replace_test.c (which does all of the heavy lifting for
> > the
> > generic/486 test) is currently written.
> > 
> 
> Yeah, as for the original case, it tried to test when it turned into
> the XFS extents format, but I'm not sure if it's just the regression
> test for such report only (similiar to ext4 inline xattr to non-
> inline
> xattr case.) rather than test the XFS_DINODE_FMT_BTREE case or
> ea_inode
> case...
> 
> > So what I would suggest is to have attr_replace_test.c try to
> > determine the maximum attr value size using binary search.  Start
> > with
> > min=16, and max=65536, and try creating an xattr of a particular
> > size,
> > and then delete the xattr, and then retry creating the xattr with
> > the
> > next binary search size.  This will allow you to create a function
> > which determines the maximum size attr for a particular file
> > system,
> > especially in those cases where it is dependent on how the file
> > system
> > is configured.
> 
> Considering the original XFS regression report [1], I think
> underlayfs blksize may still be needed. And binary search to get the
> maximum attr value may be another new case for this as well. Or
> alternatively just add by block size to do a trip test.
> 
> Although I have no idea if we can just skip the case when testing
> with
> NFS. If getting underlayfs blksize is unfeasible, I think we might
> skip such case for now since nfs blksize is not useful for
> generic/486.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=199119

I'm still not seeing why we care about block sizes, and I'm certainly
not seeing why we should care about them in the NFS case.

xattrs are a form of key-value storage with atomic semantics (i.e. when
you attempt to get/set/update/remove them, then the operation either
succeeds or it fails without any side-effects).
There is no interface for doing any form of block I/O to an xattr. 
There is no requirement in the documentation that the user know
anything about block size in order to use them.

In other words, if this xfstest 'generic/486' is depending on knowledge
of the block size, then it is fundamentally a filesystem
implementation-specific test. It doesn't belong in the 'generic' test
category, because it is not testing anything that is generic to xattrs.

> 
> Thanks,
> Gao Xiang
> 
> > 
> > > should we transform it to E2BIG instead (at least in NFS
> > > protocol)? but I'm still not sure that E2BIG is a valid return
> > > code for
> > > setxattr()...
> > 
> > E2BIG is defined in the attr_set(3) man page.  ENOSPC isn't
> > mentioned
> > in the attr_set man page, but given that multiple file systems
> > return
> > ENOSPC, and ENOSPC has a well-defined meaning in POSIX.1 which very
> > much makes sense when creating extended attributes, we should fix
> > that
> > by adding ENOSPC to the attr_set(3) man page (which is shipped as
> > part
> > of the libattr library sources).
> > 
> > Cheers,
> > 
> >                                                 - Ted

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



  reply	other threads:[~2021-06-16 18:51 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-16 12:44 Gao Xiang
2021-06-16 13:47 ` Trond Myklebust
2021-06-16 14:06   ` Gao Xiang
2021-06-16 14:20     ` Trond Myklebust
2021-06-16 14:41       ` Gao Xiang
2021-06-16 15:14         ` Trond Myklebust
2021-06-16 17:08           ` Theodore Ts'o
2021-06-16 17:17           ` Frank van der Linden
2021-06-16 22:51             ` Theodore Ts'o
2021-06-16 16:14         ` Theodore Ts'o
2021-06-16 17:51           ` Gao Xiang
2021-06-16 18:51             ` Trond Myklebust [this message]
2021-06-16 22:55             ` Theodore Ts'o
2021-06-17  2:39               ` Gao Xiang
2021-06-17 13:08                 ` Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f35d25eb088df4d541672d39c1c4dbb3bf2ee749.camel@hammerspace.com \
    --to=trondmy@hammerspace.com \
    --cc=anna.schumaker@netapp.com \
    --cc=hsiangkao@linux.alibaba.com \
    --cc=joseph.qi@linux.alibaba.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=tytso@mit.edu \
    --subject='Re: [PATCH] nfs: set block size according to pnfs_blksize first' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.