From: Luis Henriques <lhenriques@suse.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Jeff Layton <jlayton@kernel.org>,
	Ilya Dryomov <idryomov@gmail.com>,
	ceph-devel@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	fstests <fstests@vger.kernel.org>,
	Dave Chinner <dchinner@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [PATCH] ceph: don't return -ESTALE if there's still an open file
Date: Fri, 15 May 2020 12:15:48 +0100
Message-ID: <20200515111548.GA54598@suse.com>
In-Reply-To: <CAOQ4uxhireZBRvcPQzTS8yOoO4gQt78M0ktZo-9yQ-zcaLZbow@mail.gmail.com>

On Fri, May 15, 2020 at 09:42:24AM +0300, Amir Goldstein wrote:
> +CC: fstests
> 
> On Thu, May 14, 2020 at 4:15 PM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > On Thu, 2020-05-14 at 13:48 +0100, Luis Henriques wrote:
> > > On Thu, May 14, 2020 at 08:10:09AM -0400, Jeff Layton wrote:
> > > > On Thu, 2020-05-14 at 12:14 +0100, Luis Henriques wrote:
> > > > > Similarly to commit 03f219041fdb ("ceph: check i_nlink while converting
> > > > > a file handle to dentry"), this fixes another corner case with
> > > > > name_to_handle_at/open_by_handle_at.  The issue has been detected by
> > > > > xfstest generic/467, when doing:
> > > > >
> > > > >  - name_to_handle_at("/cephfs/myfile")
> > > > >  - open("/cephfs/myfile")
> > > > >  - unlink("/cephfs/myfile")
> > > > >  - open_by_handle_at()
> > > > >
> > > > > The call to open_by_handle_at should not fail because the file still
> > > > > exists and we do have a valid handle to it.
> > > > >
> > > > > Signed-off-by: Luis Henriques <lhenriques@suse.com>
> > > > > ---
> > > > >  fs/ceph/export.c | 13 +++++++++++--
> > > > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/fs/ceph/export.c b/fs/ceph/export.c
> > > > > index 79dc06881e78..8556df9d94d0 100644
> > > > > --- a/fs/ceph/export.c
> > > > > +++ b/fs/ceph/export.c
> > > > > @@ -171,12 +171,21 @@ struct inode *ceph_lookup_inode(struct super_block *sb, u64 ino)
> > > > >
> > > > >  static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino)
> > > > >  {
> > > > > + struct ceph_inode_info *ci;
> > > > >   struct inode *inode = __lookup_inode(sb, ino);
> > > > > +
> > > > >   if (IS_ERR(inode))
> > > > >           return ERR_CAST(inode);
> > > > >   if (inode->i_nlink == 0) {
> > > > > -         iput(inode);
> > > > > -         return ERR_PTR(-ESTALE);
> > > > > +         bool is_open;
> > > > > +         ci = ceph_inode(inode);
> > > > > +         spin_lock(&ci->i_ceph_lock);
> > > > > +         is_open = __ceph_is_file_opened(ci);
> > > > > +         spin_unlock(&ci->i_ceph_lock);
> > > > > +         if (!is_open) {
> > > > > +                 iput(inode);
> > > > > +                 return ERR_PTR(-ESTALE);
> > > > > +         }
> > > > >   }
> > > > >   return d_obtain_alias(inode);
> > > > >  }
> > > >
> > > > > Thanks Luis. Out of curiosity, is there any reason we shouldn't ignore
> > > > the i_nlink value here? Does anything obviously break if we do?
> > >
> > > Yes, the scenario described in commit 03f219041fdb is still valid, which
> > > is basically the same but without the extra open(2):
> > >
> > >   - name_to_handle_at("/cephfs/myfile")
> > >   - unlink("/cephfs/myfile")
> > >   - open_by_handle_at()
> > >
> >
> > Ok, I guess we end up doing some delayed cleanup, and that allows the
> > inode to be found in that situation.
> >
> > > The open_by_handle_at man page isn't really clear about these 2 scenarios,
> > > but generic/426 will fail if -ESTALE isn't returned.  Want me to add a
> > > comment to the code, describing these 2 scenarios?
> > >
> >
> > (cc'ing Amir since he added this test)
> >
> > I don't think there is any hard requirement that open_by_handle_at
> > should fail in that situation. It generally does for most filesystems
> > > due to the way they handle cleaning up unlinked inodes, but I don't
> > think it's technically illegal to allow the inode to still be found. If
> > the caller cares about whether it has been unlinked it can always test
> > i_nlink itself.
> >
> > Amir, is this required for some reason that I'm not aware of?
> 
> Hi Jeff,
> 
> The origin of this test is in fstests commit:
> 794798fa xfsqa: test open_by_handle() on unlinked and freed inode clusters
> 
> It was introduced to catch an xfs bug, so this behavior is the expectation
> of xfs filesystem, but note that it is not a general expectation to fail
> open_by_handle() after unlink(), it is an expectation to fail open_by_handle()
> after unlink() + sync() + drop_caches.

Yes, sorry, I should have mentioned the sync+drop_caches in the
description.

> I have later converted the test to generic, because I needed to check the
> same expectation for overlayfs use case, which is:
> The original inode is always there (in lower layer), unlink creates a whiteout
> mark and open_by_handle should treat that as ESTALE, otherwise the
> unlinked files would be accessible to nfs clients forever.
> 
> In overlayfs, we handle the open file case by returning a dentry only
> in case the inode with deletion mark in question is already in inode cache,
> but we take care not to populate inode cache with the check.
> It is easier, because we do not need to get inode into cache for checking
> the delete marker.
> 
> Maybe you could instead check in __fh_to_dentry():
> 
>     if (inode->i_nlink == 0 && atomic_read(&inode->i_count) == 1) {
>         iput(inode);
>         return ERR_PTR(-ESTALE);
>     }
> 
> The above is untested, so I don't know if it's enough to pass generic/426.

Yes, I can confirm that this also fixes the issue -- both tests pass.
__ceph_is_file_opened() uses some internal per-inode counters, incremented
each time a file is opened in a specific mode.  The problem is that these
counters require some extra locking (maybe they should be atomic_t?), so
your suggestion is probably better.
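
To make the difference between the two checks concrete, here is a minimal
userspace sketch (not kernel code, and not the actual patch).  struct
model_inode and both functions are hypothetical stand-ins: nr_opens models
what __ceph_is_file_opened() reports, and refcount models
atomic_read(&inode->i_count).  It assumes that the reference taken by the
lookup itself accounts for a count of 1, and that an open file keeps at
least one more reference pinned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model of the inode fields that matter here. */
struct model_inode {
	unsigned int nlink;	/* models inode->i_nlink */
	int refcount;		/* models atomic_read(&inode->i_count) */
	int nr_opens;		/* models the ceph per-mode open counters */
};

/*
 * v1 approach: look at the ceph open-file counters.  In the kernel this
 * needs i_ceph_lock around the read; the lock is not modelled here.
 */
int fh_check_open_counters(const struct model_inode *inode)
{
	assert(inode != NULL);
	if (inode->nlink == 0 && inode->nr_opens == 0)
		return -ESTALE;
	return 0;
}

/*
 * Suggested approach: an unlinked inode whose only reference is the one
 * taken by the handle lookup (assumed to be refcount == 1) is stale.
 */
int fh_check_refcount(const struct model_inode *inode)
{
	assert(inode != NULL);
	if (inode->nlink == 0 && inode->refcount == 1)
		return -ESTALE;
	return 0;
}
```

In this model both functions agree on the two scenarios discussed in this
thread: an unlinked file that is still open is found, and an unlinked file
with no other user is reported as -ESTALE.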

> Note that generic/467 also checks the same behavior for rmdir().

Yeah, but the only test-case failing with cephfs is the one described
above (i.e. "open_by_handle -dkr ...").

> If you decide that ceph does not need to comply with this behavior,
> then we probably need to whitelist/blocklist the filesystems that
> want to test this behavior, which will be a shame.

Unless Jeff has any objection, I'm happy to send v2, simplifying the patch
to use your simpler solution (and mentioning sync+drop_caches in the
commit message).

Cheers,
--
Luis
