linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] several ceph dentry leaks
@ 2015-02-02  0:31 Al Viro
  2015-02-02  3:23 ` Yan, Zheng
  0 siblings, 1 reply; 4+ messages in thread
From: Al Viro @ 2015-02-02  0:31 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel, linux-fsdevel

	What do you expect to happen when if () is taken in
int ceph_handle_notrace_create(struct inode *dir, struct dentry *dentry)
{
        struct dentry *result = ceph_lookup(dir, dentry, 0);

        if (result && !IS_ERR(result)) {

If result is non-NULL, it means that we have just acquired a new reference
to preexisting dentry (in ceph_finish_lookup()); where do you expect that
reference to be dropped?

Another thing: in ceph_readdir_prepopulate()
                if (!dn->d_inode) {  
                        dn = splice_dentry(dn, in, NULL); 
                        if (IS_ERR(dn)) {
                                err = PTR_ERR(dn);
                                dn = NULL;
                                goto next_item;
                        }
                }
you leak dn if that IS_ERR() ever gets hit - d_splice_alias(d, i) does *not*
drop reference to d in any cases, so splice_dentry() leaves the sum total
of all dentry refcounts unchanged.  And in case when return value is
ERR_PTR(...), this assignment results in a leak.  That one is trival to
fix, but ceph_handle_notrace_create() looks very confusing - if nothing else,
we should _never_ create multiple dentries pointing to directory inode, so
d_instantiate() in there isn't mitigating anything - it's actively breaking
things as far as the rest of the kernel is concerned...  What are you
trying to do there?

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] several ceph dentry leaks
  2015-02-02  0:31 [RFC] several ceph dentry leaks Al Viro
@ 2015-02-02  3:23 ` Yan, Zheng
  2015-02-02  4:41   ` Al Viro
  0 siblings, 1 reply; 4+ messages in thread
From: Yan, Zheng @ 2015-02-02  3:23 UTC (permalink / raw)
  To: Al Viro; +Cc: Sage Weil, ceph-devel, linux-fsdevel

On Mon, Feb 2, 2015 at 8:31 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>         What do you expect to happen when if () is taken in
> int ceph_handle_notrace_create(struct inode *dir, struct dentry *dentry)
> {
>         struct dentry *result = ceph_lookup(dir, dentry, 0);
>
>         if (result && !IS_ERR(result)) {
>
> If result is non-NULL, it means that we have just acquired a new reference
> to preexisting dentry (in ceph_finish_lookup()); where do you expect that
> reference to be dropped?
>
> Another thing: in ceph_readdir_prepopulate()
>                 if (!dn->d_inode) {
>                         dn = splice_dentry(dn, in, NULL);
>                         if (IS_ERR(dn)) {
>                                 err = PTR_ERR(dn);
>                                 dn = NULL;
>                                 goto next_item;
>                         }
>                 }
> you leak dn if that IS_ERR() ever gets hit - d_splice_alias(d, i) does *not*
> drop reference to d in any cases, so splice_dentry() leaves the sum total
> of all dentry refcounts unchanged.  And in case when return value is
> ERR_PTR(...), this assignment results in a leak.  That one is trival to
> fix, but ceph_handle_notrace_create() looks very confusing - if nothing else,
> we should _never_ create multiple dentries pointing to directory inode, so
> d_instantiate() in there isn't mitigating anything - it's actively breaking
> things as far as the rest of the kernel is concerned...  What are you
> trying to do there?

ceph_handle_notrace_create() is for case: Ceph Metadata server restarted,
client re-sent a request. Ceph MDS found the request has already completed,
so it sent a traceless reply to client. (traceless reply contains no information
for the dentry and the newly created inode). It's hard to handle this case
elegantly, because MDS may have done other operation, which moved the
newly created file/directory to other place.

For multiple dentries pointing to directory inode, maybe we can return an error
(-ENOENT or -ESTALE).

Regards
Yan, Zheng


> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] several ceph dentry leaks
  2015-02-02  3:23 ` Yan, Zheng
@ 2015-02-02  4:41   ` Al Viro
  2015-02-02  6:19     ` Yan, Zheng
  0 siblings, 1 reply; 4+ messages in thread
From: Al Viro @ 2015-02-02  4:41 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: Sage Weil, ceph-devel, linux-fsdevel

On Mon, Feb 02, 2015 at 11:23:58AM +0800, Yan, Zheng wrote:
> > we should _never_ create multiple dentries pointing to directory inode, so
> > d_instantiate() in there isn't mitigating anything - it's actively breaking
> > things as far as the rest of the kernel is concerned...  What are you
> > trying to do there?
> 
> ceph_handle_notrace_create() is for case: Ceph Metadata server restarted,
> client re-sent a request. Ceph MDS found the request has already completed,
> so it sent a traceless reply to client. (traceless reply contains no information
> for the dentry and the newly created inode). It's hard to handle this case
> elegantly, because MDS may have done other operation, which moved the
> newly created file/directory to other place.
> 
> For multiple dentries pointing to directory inode, maybe we can return an error
> (-ENOENT or -ESTALE).

IDGI...  Suppose we got non-NULL from ceph_lookup() there.  So we'd sent either
CEPH_MDS_OP_LOOKUP or CEPH_MDS_OP_LOOKUPSNAP and got ->r_dentry changed
(presumably in ceph_fill_trace(), after it got the sucker from splice_dentry(),
i.e. ultimately d_splice_alias()).  Right?  Now, we have acquired a reference
to it (in ceph_finish_lookup()).  So far, so good; for one thing, we definitely
do *NOT* want to forget about that reference.  For another, we've got
a good and valid dentry; so why are we playing with the originally passed
one?  Just unhash the original and be done with that...

Looking at the code downstream from the calls of ceph_handle_notrace_create(),
ceph_mkdir() looks very dubious - we do ceph_init_inode_acls(dentry->d_inode,
&acls); there, after having set ->d_inode to that of a preexisting alias.
Is that really correct in the case when such an alias used to exist?

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] several ceph dentry leaks
  2015-02-02  4:41   ` Al Viro
@ 2015-02-02  6:19     ` Yan, Zheng
  0 siblings, 0 replies; 4+ messages in thread
From: Yan, Zheng @ 2015-02-02  6:19 UTC (permalink / raw)
  To: Al Viro; +Cc: Sage Weil, ceph-devel, linux-fsdevel

On Mon, Feb 2, 2015 at 12:41 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Feb 02, 2015 at 11:23:58AM +0800, Yan, Zheng wrote:
> > > we should _never_ create multiple dentries pointing to directory inode, so
> > > d_instantiate() in there isn't mitigating anything - it's actively breaking
> > > things as far as the rest of the kernel is concerned...  What are you
> > > trying to do there?
> >
> > ceph_handle_notrace_create() is for case: Ceph Metadata server restarted,
> > client re-sent a request. Ceph MDS found the request has already completed,
> > so it sent a traceless reply to client. (traceless reply contains no information
> > for the dentry and the newly created inode). It's hard to handle this case
> > elegantly, because MDS may have done other operation, which moved the
> > newly created file/directory to other place.
> >
> > For multiple dentries pointing to directory inode, maybe we can return an error
> > (-ENOENT or -ESTALE).
>
> IDGI...  Suppose we got non-NULL from ceph_lookup() there.  So we'd sent either
> CEPH_MDS_OP_LOOKUP or CEPH_MDS_OP_LOOKUPSNAP and got ->r_dentry changed
> (presumably in ceph_fill_trace(), after it got the sucker from splice_dentry(),
> i.e. ultimately d_splice_alias()).  Right?

In theory, yes. But I think it never happens in reality. Because
parent directory of the newly created
file/directory is locked, client has no other way to lookup the newly
created file/directory.


Now, we have acquired a reference
> to it (in ceph_finish_lookup()).  So far, so good; for one thing, we definitely
> do *NOT* want to forget about that reference.  For another, we've got
> a good and valid dentry; so why are we playing with the originally passed
> one?  Just unhash the original and be done with that...
>

I think the reason is that VFS still uses the original dentry. (for
example, after calling
filesystem's mkdir callback, vfs_mkdir() calls fsnotify_mkdir() and
passing the original
dentry to it)


> Looking at the code downstream from the calls of ceph_handle_notrace_create(),
> ceph_mkdir() looks very dubious - we do ceph_init_inode_acls(dentry->d_inode,
> &acls); there, after having set ->d_inode to that of a preexisting alias.
> Is that really correct in the case when such an alias used to exist?

you are right, it's incorrect.

how about make ceph_handle_notrace_create()  return an error when it
gets an non-NULL
from ceph_lookup() ? this method is not perfect but should work in most case.

Regards
Yan, Zheng

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-02-02  6:19 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-02  0:31 [RFC] several ceph dentry leaks Al Viro
2015-02-02  3:23 ` Yan, Zheng
2015-02-02  4:41   ` Al Viro
2015-02-02  6:19     ` Yan, Zheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).