ceph-devel.vger.kernel.org archive mirror
* Help understanding xfstest generic/467 failure
@ 2020-05-12 15:13 Luis Henriques
  2020-05-12 15:33 ` Jeff Layton
  0 siblings, 1 reply; 3+ messages in thread
From: Luis Henriques @ 2020-05-12 15:13 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel

Hi Jeff,

I've been looking at the xfstests generic/467 failure on cephfs, and I simply
cannot decide whether it's a genuine bug in the ceph kernel code.  Since you've
recently been touching the ceph_unlink code, maybe you could help me
understand what's going on.

generic/467 runs a couple of tests using src/open_by_handle, but the one
failing can be summarized with the following:

- get a handle to /cephfs/myfile using name_to_handle_at(2)
- open(2) file /cephfs/myfile
- unlink(2) /cephfs/myfile
- drop caches
- open_by_handle_at(2) => returns -ESTALE
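
The sequence above can be sketched in userspace roughly as follows.  This is
only an illustration, not the code from src/open_by_handle: the helper names
reopen_unlinked_by_handle() and drop_caches_best_effort() are hypothetical,
the file is created by the helper itself, and the cache drop is best effort
because writing to /proc/sys/vm/drop_caches needs root.  open_by_handle_at(2)
also needs CAP_DAC_READ_SEARCH, so an unprivileged run reports -EPERM rather
than -ESTALE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to drop caches; failures are ignored. */
static void drop_caches_best_effort(void)
{
	int fd = open("/proc/sys/vm/drop_caches", O_WRONLY);

	if (fd < 0)
		return;		/* not root, or no procfs: skip this step */
	if (write(fd, "3\n", 2) != 2) {
		/* best effort only, nothing to do */
	}
	close(fd);
}

/*
 * Hypothetical helper: create <dir>/<name>, then run the five steps.
 * Returns 0 if the handle can still be opened after the unlink, or a
 * negative errno value from the step that failed.
 */
static int reopen_unlinked_by_handle(const char *dir, const char *name)
{
	char path[PATH_MAX];
	struct file_handle *fh = NULL;
	int mount_id, mount_fd = -1, fd = -1, hfd, ret = 0;

	if (snprintf(path, sizeof(path), "%s/%s", dir, name) >= (int)sizeof(path))
		return -ENAMETOOLONG;

	/* Setup (not one of the listed steps): create the file. */
	fd = open(path, O_CREAT | O_WRONLY, 0644);
	if (fd < 0)
		return -errno;
	close(fd);
	fd = -1;

	/* Any fd on the same mount works as mount_fd for open_by_handle_at(). */
	mount_fd = open(dir, O_RDONLY | O_DIRECTORY);
	if (mount_fd < 0) {
		ret = -errno;
		goto out_unlink;
	}

	fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
	if (!fh) {
		ret = -ENOMEM;
		goto out_unlink;
	}
	fh->handle_bytes = MAX_HANDLE_SZ;

	/* 1. get a handle to the file */
	if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0) < 0) {
		ret = -errno;
		goto out_unlink;
	}
	assert(fh->handle_bytes <= MAX_HANDLE_SZ);

	/* 2. open the file and keep it open */
	fd = open(path, O_RDONLY);
	if (fd < 0) {
		ret = -errno;
		goto out_unlink;
	}

	/* 3. unlink it while it is still open */
	if (unlink(path) < 0) {
		ret = -errno;
		goto out;
	}

	/* 4. drop caches (needs root, otherwise silently skipped) */
	drop_caches_best_effort();

	/* 5. try to reopen through the handle */
	hfd = open_by_handle_at(mount_fd, fh, O_RDONLY);
	if (hfd < 0)
		ret = -errno;
	else
		close(hfd);
	goto out;

out_unlink:
	unlink(path);
out:
	if (fd >= 0)
		close(fd);
	if (mount_fd >= 0)
		close(mount_fd);
	free(fh);
	return ret;
}
```

Calling reopen_unlinked_by_handle("/cephfs", "myfile") as root would be the
closest match to the failing case; a return value of -ESTALE corresponds to
the failure described here.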

This test succeeds in opening the handle on other (local) filesystems
(maybe I should also run it on another networked filesystem such as NFS).

The -ESTALE is easy to trace to __fh_to_dentry, where inode->i_nlink is
checked against 0.  My question is: should we really be testing the
i_nlink here?  We dropped the name, but the file may still be there (as in
this case).
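
To illustrate the point about the link count: on a typical local filesystem,
a file that is unlinked while still open reports a link count of zero and
yet remains fully usable through the open descriptor.  The sketch below
shows this from userspace; nlink_of_open_unlinked() is a hypothetical helper
and says nothing about how cephfs itself tracks i_nlink.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a temporary file in <dir>, unlink it while
 * it is open, write to it, and return the st_nlink reported by fstat(2).
 * Returns a negative errno value on failure.
 */
static long nlink_of_open_unlinked(const char *dir)
{
	char path[PATH_MAX];
	struct stat st;
	long ret;
	int fd;

	if (snprintf(path, sizeof(path), "%s/nlink-demo-XXXXXX", dir) >= (int)sizeof(path))
		return -ENAMETOOLONG;

	fd = mkstemp(path);		/* creates and opens a unique file */
	if (fd < 0)
		return -errno;

	if (unlink(path) < 0) {		/* drop the only name */
		ret = -errno;
		goto out;
	}

	/* The inode is still alive: writing through the fd must work. */
	if (write(fd, "x", 1) != 1) {
		ret = -EIO;
		goto out;
	}

	if (fstat(fd, &st) < 0) {
		ret = -errno;
		goto out;
	}
	ret = (long)st.st_nlink;	/* typically 0 on local filesystems */
out:
	close(fd);
	return ret;
}
```

A link count of zero therefore does not by itself mean the inode is gone,
which is why a handle lookup that rejects i_nlink == 0 also rejects files
that are still open.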

I guess I'm missing something, but hopefully you'll be able to shed some
light on this.  Thanks in advance for any help you may provide!

Cheers,
-- 
Luis

* Re: Help understanding xfstest generic/467 failure
  2020-05-12 15:13 Help understanding xfstest generic/467 failure Luis Henriques
@ 2020-05-12 15:33 ` Jeff Layton
  2020-05-12 16:11   ` Luis Henriques
  0 siblings, 1 reply; 3+ messages in thread
From: Jeff Layton @ 2020-05-12 15:33 UTC (permalink / raw)
  To: Luis Henriques; +Cc: ceph-devel

On Tue, 2020-05-12 at 16:13 +0100, Luis Henriques wrote:
> The -ESTALE is easy to trace to __fh_to_dentry, where inode->i_nlink is
> checked against 0.  My question is: should we really be testing the
> i_nlink here?  We dropped the name, but the file may still be there (as in
> this case).
> 
> I guess I'm missing something, but hopefully you'll be able to shed some
> light on this.  Thanks in advance for any help you may provide!

Yeah, I took a brief look at this a while back and never got back to
looking at it again. I think cephfs's behavior is wrong here. We should
be able to look up an open-but-unlinked file by filehandle.

That said, those checks went in via commit 570df4e9c23f8, and it looks
like they were deliberately added to __fh_to_dentry. I'm unclear as to why.
It may be interesting to remove the i_nlink checks and see whether it
breaks anything?

-- 
Jeff Layton <jlayton@kernel.org>

* Re: Help understanding xfstest generic/467 failure
  2020-05-12 15:33 ` Jeff Layton
@ 2020-05-12 16:11   ` Luis Henriques
  0 siblings, 0 replies; 3+ messages in thread
From: Luis Henriques @ 2020-05-12 16:11 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel

On Tue, May 12, 2020 at 11:33:13AM -0400, Jeff Layton wrote:
> Yeah, I took a brief look at this a while back and never got back to
> looking at it again. I think cephfs's behavior is wrong here. We should
> be able to look up an open-but-unlinked file by filehandle.
> 
> That said, those checks went in via commit 570df4e9c23f8, and it looks
> like they were deliberately added to __fh_to_dentry. I'm unclear as to why.

Right, I saw that commit too.  The check was carried over from the 'old'
version of ceph_lookup_inode into the 'new' version of ceph_lookup_inode and
into __fh_to_dentry.

> It may be interesting to remove the i_nlink checks and see whether it
> breaks anything?

I've done that already, and a very quick test didn't show any breakage.  But
it may break things in very subtle ways.  I'll see if it survives a full run
of xfstests.

Thanks for your hints, Jeff.  I'll see if I can progress a bit further on
this.

Cheers,
--
Luis
