* Re: [GIT PULL] Ceph fixes for 5.1-rc7
       [not found]             ` <20190428043801.GE2217@ZenIV.linux.org.uk>
@ 2019-04-28 13:27               ` Jeff Layton
  2019-04-28 14:48                 ` Al Viro
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff Layton @ 2019-04-28 13:27 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ilya Dryomov, ceph-devel,
	Linux List Kernel Mailing, linux-cifs

On Sun, 2019-04-28 at 05:38 +0100, Al Viro wrote:
> On Fri, Apr 26, 2019 at 01:30:53PM -0400, Jeff Layton wrote:
> 
> > > I _probably_ would take the allocation out of the loop (e.g. make it
> > > __getname(), called unconditionally) and turn it into the
> > > d_path.c-style read_seqbegin_or_lock()/need_seqretry()/done_seqretry()
> > > loop, so that the first pass would go under rcu_read_lock(), while
> > > the second (if needed) would just hold rename_lock exclusive (without
> > > bumping the refcount).  But that's a matter of (theoretical) livelock
> > > avoidance, not the locking correctness for ->d_name accesses.
> > > 
> > 
> > Yeah, that does sound better. I want to think about this code a bit
> 
> FWIW, is there any reason to insist that the pathname is put into the
> beginning of the buffer?  I mean, instead of path + pathlen we might
> return path + offset, with the pathname going from path + offset to
> path + PATH_MAX - 1 inclusive, with path being the thing eventually
> freed.
> 
> It's easier to build the string backwards, seeing that we are walking
> from leaf to root...

(cc'ing linux-cifs)

I don't see a problem doing what you suggest. An offset + fixed length
buffer would be fine there.

Is there a real benefit to using __getname though? It sucks when we have
to reallocate but I doubt that it happens with any frequency. Most of
these paths will end up being much shorter than PATH_MAX and that slims
down the memory footprint a bit.
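
For concreteness, here is a rough sketch of the approach being discussed --
unconditional __getname(), a d_path.c-style read_seqbegin_or_lock() retry
loop, the name built from the tail of the buffer, and the offset handed
back to the caller. Illustration only: the helper name is made up, and the
ceph snapdir/".snap" special cases are left out.

static char *sketch_build_path(struct dentry *dentry, int *poffset)
{
	char *buf = __getname();	/* PATH_MAX-sized name buffer */
	struct dentry *d;
	int seq = 0;
	int pos;

	if (!buf)
		return ERR_PTR(-ENOMEM);

	rcu_read_lock();
retry:
	d = dentry;
	pos = PATH_MAX - 1;
	buf[pos] = '\0';

	read_seqbegin_or_lock(&rename_lock, &seq);
	while (!IS_ROOT(d)) {
		int len;

		spin_lock(&d->d_lock);		/* keeps ->d_name stable */
		len = d->d_name.len;
		if (pos < len + 1) {
			spin_unlock(&d->d_lock);
			done_seqretry(&rename_lock, seq);
			rcu_read_unlock();
			__putname(buf);
			return ERR_PTR(-ENAMETOOLONG);
		}
		pos -= len;
		memcpy(buf + pos, d->d_name.name, len);
		spin_unlock(&d->d_lock);
		buf[--pos] = '/';
		d = d->d_parent;
	}
	if (need_seqretry(&rename_lock, seq)) {
		seq = 1;	/* second pass: rename_lock held exclusive */
		goto retry;
	}
	done_seqretry(&rename_lock, seq);
	rcu_read_unlock();

	*poffset = pos;		/* string lives at buf + pos .. PATH_MAX - 1 */
	return buf;		/* caller eventually does __putname(buf) */
}

If a rename races with the lockless first pass, need_seqretry() sends us
around again with rename_lock held exclusive, so the walk cannot livelock.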

Also, FWIW -- this code was originally copied from cifs'
build_path_from_dentry(). Should we aim to put something in common
infrastructure that both can call?

There are some significant logic differences in the two functions though
so we might need some sort of callback function or something to know
when to stop walking.

-- 
Jeff Layton <jlayton@kernel.org>



* Re: [GIT PULL] Ceph fixes for 5.1-rc7
  2019-04-28 13:27               ` [GIT PULL] Ceph fixes for 5.1-rc7 Jeff Layton
@ 2019-04-28 14:48                 ` Al Viro
  2019-04-28 15:47                   ` Jeff Layton
  0 siblings, 1 reply; 6+ messages in thread
From: Al Viro @ 2019-04-28 14:48 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Linus Torvalds, Ilya Dryomov, ceph-devel,
	Linux List Kernel Mailing, linux-cifs

On Sun, Apr 28, 2019 at 09:27:20AM -0400, Jeff Layton wrote:

> I don't see a problem doing what you suggest. An offset + fixed length
> buffer would be fine there.
> 
> Is there a real benefit to using __getname though? It sucks when we have
> to reallocate but I doubt that it happens with any frequency. Most of
> these paths will end up being much shorter than PATH_MAX and that slims
> down the memory footprint a bit.

AFAICS, they are all short-lived; don't forget that slabs have cache,
so in that situation allocations are cheap.

> Also, FWIW -- this code was originally copied from cifs'
> build_path_from_dentry(). Should we aim to put something in common
> infrastructure that both can call?
> 
> There are some significant logic differences in the two functions though
> so we might need some sort of callback function or something to know
> when to stop walking.

Not if you want it fast...  Indirect calls are not cheap; the cost of
those callbacks would be considerable.  Besides, you want more than
"where do I stop", right?  It's also "what output do I use for this
dentry", both for you and for cifs (there it's "which separator to use",
in ceph it's "these we want represented as //")...

Can it be called on detached subtree, during e.g. open_by_handle()?
There it can get really fishy; you end up with base being at a
random point on the way towards root.  How does that work, and if
it *does* work, why do we need the whole path in the first place?

BTW, for cifs there's no need to play with ->d_lock as we go.  For
ceph, the only need comes from looking at d_inode(), and I wonder if
it would be better to duplicate that information ("is that a
snapdir/nosnap") into dentry iself - would certainly be cheaper.
OTOH, we are getting short on spare bits in ->d_flags...


* Re: [GIT PULL] Ceph fixes for 5.1-rc7
  2019-04-28 14:48                 ` Al Viro
@ 2019-04-28 15:47                   ` Jeff Layton
  2019-04-28 15:52                     ` Al Viro
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff Layton @ 2019-04-28 15:47 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ilya Dryomov, ceph-devel,
	Linux List Kernel Mailing, linux-cifs

On Sun, 2019-04-28 at 15:48 +0100, Al Viro wrote:
> On Sun, Apr 28, 2019 at 09:27:20AM -0400, Jeff Layton wrote:
> 
> > I don't see a problem doing what you suggest. An offset + fixed length
> > buffer would be fine there.
> > 
> > Is there a real benefit to using __getname though? It sucks when we have
> > to reallocate but I doubt that it happens with any frequency. Most of
> > these paths will end up being much shorter than PATH_MAX and that slims
> > down the memory footprint a bit.
> 
> AFAICS, they are all short-lived; don't forget that slabs have cache,
> so in that situation allocations are cheap.
> 

Fair enough. Al also pointed out on IRC that the __getname/__putname
caches are likely to be hot, so using that may be less costly cpu-wise.

> > Also, FWIW -- this code was originally copied from cifs'
> > build_path_from_dentry(). Should we aim to put something in common
> > infrastructure that both can call?
> > 
> > There are some significant logic differences in the two functions though
> > so we might need some sort of callback function or something to know
> > when to stop walking.
> 
> Not if you want it fast...  Indirect calls are not cheap; the cost of
> those callbacks would be considerable.  Besides, you want more than
> "where do I stop", right?  It's also "what output do I use for this
> dentry", both for you and for cifs (there it's "which separator to use",
> in ceph it's "these we want represented as //")...
> 
> Can it be called on detached subtree, during e.g. open_by_handle()?
> There it can get really fishy; you end up with base being at a
> random point on the way towards root.  How does that work, and if
> it *does* work, why do we need the whole path in the first place?
> 

This I'm not sure of. commit 79b33c8874334e (ceph: snapshot nfs
re-export) explains this a bit, but I'm not sure it really covers this
case.

Zheng/Sage, feel free to correct me here:

My understanding is that for snapshots you need the base inode number,
snapid, and the full path from there to the dentry for a ceph MDS call.

There is a filehandle type for a snapshotted inode:

struct ceph_nfs_snapfh {
        u64 ino;        /* inode number */
        u64 snapid;     /* snapshot id */
        u64 parent_ino; /* parent inode number */
        u32 hash;       /* dentry name hash */
} __attribute__ ((packed));

So I guess it is possible. You could do name_to_handle_at for an inode
deep down in a snapshotted tree, and then try to open_by_handle_at after
the dcache gets cleaned out for some other reason.

What I'm not clear on is why we need to build paths at all for
snapshots. Why is a parent inode number (inside the snapshot) + a snapid
+ dentry name not sufficient?

> BTW, for cifs there's no need to play with ->d_lock as we go.  For
> ceph, the only need comes from looking at d_inode(), and I wonder if
> it would be better to duplicate that information ("is that a
> snapdir/nosnap") into dentry iself - would certainly be cheaper.
> OTOH, we are getting short on spare bits in ->d_flags...

We could stick that in ceph_dentry_info (->d_fsdata). We have a flags
field in there already.
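
Something like the following, say (the flag names, bit values and helper
are invented here, and would have to dodge the bits ceph already uses in
->flags):

/* Invented bits -- must not collide with the flags ceph already keeps here. */
#define CEPH_DENTRY_SNAPDIR	(1 << 8)	/* d_inode is the .snap dir */
#define CEPH_DENTRY_NOSNAP	(1 << 9)	/* d_inode is in the live tree */

/*
 * Record the snap type once, where the dentry gets bound to its inode
 * (callers would need ->d_lock, or to do this before the dentry is visible).
 */
static void ceph_note_snap_type(struct dentry *dentry, struct inode *inode)
{
	struct ceph_dentry_info *di = ceph_dentry(dentry);

	if (ceph_snap(inode) == CEPH_SNAPDIR)
		di->flags |= CEPH_DENTRY_SNAPDIR;
	else if (ceph_snap(inode) == CEPH_NOSNAP)
		di->flags |= CEPH_DENTRY_NOSNAP;
}

The path walk could then test di->flags instead of taking ->d_lock just to
look at d_inode().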
-- 
Jeff Layton <jlayton@kernel.org>



* Re: [GIT PULL] Ceph fixes for 5.1-rc7
  2019-04-28 15:47                   ` Jeff Layton
@ 2019-04-28 15:52                     ` Al Viro
  2019-04-28 16:18                       ` Jeff Layton
  2019-04-28 16:40                       ` Al Viro
  0 siblings, 2 replies; 6+ messages in thread
From: Al Viro @ 2019-04-28 15:52 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Linus Torvalds, Ilya Dryomov, ceph-devel,
	Linux List Kernel Mailing, linux-cifs

On Sun, Apr 28, 2019 at 11:47:58AM -0400, Jeff Layton wrote:

> We could stick that in ceph_dentry_info (->d_fsdata). We have a flags
> field in there already.

Yes, but...  You have it freed in ->d_release(), AFAICS, and without
any delays.  So lockless accesses will be trouble.


* Re: [GIT PULL] Ceph fixes for 5.1-rc7
  2019-04-28 15:52                     ` Al Viro
@ 2019-04-28 16:18                       ` Jeff Layton
  2019-04-28 16:40                       ` Al Viro
  1 sibling, 0 replies; 6+ messages in thread
From: Jeff Layton @ 2019-04-28 16:18 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ilya Dryomov, ceph-devel,
	Linux List Kernel Mailing, linux-cifs

On Sun, 2019-04-28 at 16:52 +0100, Al Viro wrote:
> On Sun, Apr 28, 2019 at 11:47:58AM -0400, Jeff Layton wrote:
> 
> > We could stick that in ceph_dentry_info (->d_fsdata). We have a flags
> > field in there already.
> 
> Yes, but...  You have it freed in ->d_release(), AFAICS, and without
> any delays.  So lockless accesses will be trouble.

That's easy enough to fix -- we could add a rcu_head to it and call_rcu
in ceph_d_release instead of just freeing it immediately.

I guess if we find that d_fsdata is NULL, we can just treat it like we
currently do a negative dentry?
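
A sketch of that (untested; it assumes a new 'struct rcu_head rcu_head'
member in ceph_dentry_info, and the existing teardown in ceph_d_release()
is elided):

static void ceph_dentry_info_rcu_free(struct rcu_head *head)
{
	struct ceph_dentry_info *di =
		container_of(head, struct ceph_dentry_info, rcu_head);

	kmem_cache_free(ceph_dentry_cachep, di);
}

static void ceph_d_release(struct dentry *dentry)
{
	struct ceph_dentry_info *di = ceph_dentry(dentry);

	/* ... existing synchronous teardown (lease, lists, etc.) ... */

	spin_lock(&dentry->d_lock);
	dentry->d_fsdata = NULL;
	spin_unlock(&dentry->d_lock);

	call_rcu(&di->rcu_head, ceph_dentry_info_rcu_free);
}

A reader under rcu_read_lock() that sees a non-NULL ->d_fsdata then knows
the structure stays around at least until it drops the RCU read lock.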
-- 
Jeff Layton <jlayton@kernel.org>



* Re: [GIT PULL] Ceph fixes for 5.1-rc7
  2019-04-28 15:52                     ` Al Viro
  2019-04-28 16:18                       ` Jeff Layton
@ 2019-04-28 16:40                       ` Al Viro
  1 sibling, 0 replies; 6+ messages in thread
From: Al Viro @ 2019-04-28 16:40 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Linus Torvalds, Ilya Dryomov, ceph-devel,
	Linux List Kernel Mailing, linux-cifs

On Sun, Apr 28, 2019 at 04:52:16PM +0100, Al Viro wrote:
> On Sun, Apr 28, 2019 at 11:47:58AM -0400, Jeff Layton wrote:
> 
> > We could stick that in ceph_dentry_info (->d_fsdata). We have a flags
> > field in there already.
> 
> Yes, but...  You have it freed in ->d_release(), AFAICS, and without
> any delays.  So lockless accesses will be trouble.

You could RCU-delay the actual kmem_cache_free(ceph_dentry_cachep, di)
in there, but I've no idea whether the overhead would be painful -
on massive eviction (e.g. on memory pressure) it might be.  Another
variant is to introduce ->d_free(), to be called from __d_free()
and __d_free_external().  That, however, would need another ->d_flags
bit for presence of that method, so that we don't get extra overhead
from looking into ->d_op...
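
Roughly like this (nothing of the sort exists today -- both the method and
the DCACHE_OP_FREE bit are hypothetical):

/* struct dentry_operations would grow: void (*d_free)(struct dentry *); */

static void __d_free(struct rcu_head *head)
{
	struct dentry *dentry = container_of(head, struct dentry, d_u.d_rcu);

	if (dentry->d_flags & DCACHE_OP_FREE)	/* hypothetical flag */
		dentry->d_op->d_free(dentry);	/* hypothetical method */
	kmem_cache_free(dentry_cache, dentry);
}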

Looking through ->d_release() instances, we have

afs: empty, might as well have not been there

autofs: does some sync stuff (eviction from ->active_list/->expire_list)
plus kfree_rcu

ceph: some sync stuff + immediate kmem_cache_free()

debugfs: kfree(), might or might not be worth RCU-delaying

ecryptfs: sync stuff (path_put for ->lower) + RCU-delayed part

fuse: kfree_rcu()

nfs: kfree()

overlayfs: a bunch of dput() (obviously sync) + kfree_rcu()

9p: sync

So it actually might make sense to move the RCU-delayed bits to
separate method.  Some ->d_release() instances would be simply
gone, as for the rest...  I wonder which of the sync parts can
be moved over to ->d_prune().  Not guaranteed to be doable
(or a good idea), but...  E.g. for autofs it almost certainly
would be the right place for the sync parts - we are,
essentially, telling the filesystem to forget its private
(non-refcounted) references to the victim.

