From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mail-lb0-f181.google.com ([209.85.217.181]:36723 "EHLO
	mail-lb0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S965454AbcBQWY6 (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Wed, 17 Feb 2016 17:24:58 -0500
Received: by mail-lb0-f181.google.com with SMTP id x1so17992919lbj.3
        for <linux-fsdevel@vger.kernel.org>; Wed, 17 Feb 2016 14:24:57 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <20160217211754.GO17997@ZenIV.linux.org.uk>
References: <20160214234312.GX17997@ZenIV.linux.org.uk>
	<CAOg9mSSsztHyRsQ=BAA9FoEMo7XEdgTwYHosqvh4ZjykDapj2A@mail.gmail.com>
	<20160215184554.GY17997@ZenIV.linux.org.uk>
	<CA+D=wkgGZ9A8Qa5C6q3cROrr+Gp=jsgowvcbOs-22UU=aVT7Wg@mail.gmail.com>
	<20160215230434.GZ17997@ZenIV.linux.org.uk>
	<CAOg9mSSdsVL8tRiAojg=UCNPQ6iPcthtdowH9kWyiWnXUvTEHg@mail.gmail.com>
	<20160216233609.GE17997@ZenIV.linux.org.uk>
	<20160216235441.GF17997@ZenIV.linux.org.uk>
	<CAOg9mSR0QZOy-JAesN32o51omxdOQyfUh-Zj8k7ZohKLaa3GxQ@mail.gmail.com>
	<20160217201120.GN17997@ZenIV.linux.org.uk>
	<20160217211754.GO17997@ZenIV.linux.org.uk>
Date: Wed, 17 Feb 2016 17:24:56 -0500
Message-ID: <CAOg9mSRtGvMtr2b8xjFUdgJDgS7hb+dYtjEBBMKkatZJsbLuJg@mail.gmail.com>
Subject: Re: Orangefs ABI documentation
From: Mike Marshall <hubcap@omnibond.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Brandenburg <martin@omnibond.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>
Content-Type: text/plain; charset=UTF-8
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

> with reinit_completion() added?

In the right place, even...

> Am I right assuming that you are seeing zero from service_operation()
> for ORANGEFS_VFS_OP_CREATE with zeroed ->downcall.resp.create.refn?
> AFAICS, that can only happen with wait_for_matching_downcall() returning
> 0, which mean op_state_serviced(op).

Yes sir:

Feb 17 12:25:15 be1 kernel: [  857.901225] service_operation:
wait_for_matching_downcall returned 0 for ffff880015698000
Feb 17 12:25:15 be1 kernel: [  857.901228] orangefs: service_operation
orangefs_create returning: 0 for ffff880015698000.
Feb 17 12:25:15 be1 kernel: [  857.901230] orangefs_create:
MAILBOX2.CPT: handle:00000000-0000-0000-0000-000000000000: fsid:0:
new_op:ffff880015698000: ret:0:

I'll get those WARN_ONs in there and do some more
tests.

Martin thinks he might be on to a solution... he'll
probably send a response soon... I can hear him typing
at 500 characters a minute a few cubes away...

-Mike

On Wed, Feb 17, 2016 at 4:17 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Wed, Feb 17, 2016 at 08:11:20PM +0000, Al Viro wrote:
>> With reinit_completion() added?
>>
>> > Maybe this is relevant:
>> >
>> > Alloced OP ffff880015698000 <- doomed op for orangefs_create MAILBOX2.CPT
>> > service_operation: orangefs_create op ffff880015698000
>> > ffff880015698000 got past is_daemon_in_service
>> >
>> > ... lots of stuff ...
>> >
>> > w_f_m_d returned -11 for ffff880015698000 <- first op to get EAGAIN
>> >
>> > first client core is NOT in service
>> > second op to get EAGAIN
>> >           ...
>> > last client core is NOT in service
>> >
>> > ... lots of stuff ...
>> >
>> > service_operation returns to orangef_create with handle 0 fsid 0 ret 0
>> > for MAILBOX2.CPT
>> >
>> > I'm guessing you want me to wait to do the switching of my branch
>> > until we fix this (last?) thing, let me know...
>>
>> What I'd like to check is the value of op->waitq.done at retry_servicing.
>> If we get there with a non-zero value, we've a problem.  BTW, do you
>> hit any of gossip_err() in orangefs_clean_up_interrupted_operation()?
>
> Am I right assuming that you are seeing zero from service_operation()
> for ORANGEFS_VFS_OP_CREATE with zeroed ->downcall.resp.create.refn?
> AFAICS, that can only happen with wait_for_matching_downcall() returning
> 0, which mean op_state_serviced(op).  And prior to that we had
> set_op_state_waiting(op), which means set_op_state_serviced(op) done between
> those two...
>
> So unless you have something really nasty going on (buggered op refcounting,
> memory corruption, etc.), we had orangefs_devreq_write_iter() pick that
> op and copy the entire op->downcall from userland.  With op->downcall.status
> not set to -EFAULT, or we would've...  Oh, shit - it's fed through
> orangefs_normalize_to_errno().  OTOH, it would've hit
> gossip_err("orangefs: orangefs_normalize_to_errno: got error code which is not
> from ORANGEFS.\n") and presumably you would've noticed that.  And it still
> would've returned that -EFAULT intact, actually.  In any case, it's a bug -
> we need
>         ret = -(ORANGEFS_ERROR_BIT|9);
>         goto Broken;
> instead of those
>         ret = -EFAULT;
>         goto Broken;
> and
>         ret = -(ORANGEFS_ERROR_BIT|8);
>         goto Broken;
> instead of
>         ret = -ENOMEM;
>         goto Broken;
> in orangefs_devreq_write_iter().
>
> Anyway, it looks like we didn't go through Broken: in there, so copy_from_user()
> must've been successful.
>
> Hmm...  Could the daemon have really given that reply?  Relevant test would
> be something like
>         WARN_ON(op->upcall.type == ORANGEFS_VFS_OP_CREATE &&
>             !op->downcall.resp.create.refn.fsid);
> right after
> wakeup:
>         /*
>          * tell the vfs op waiting on a waitqueue
>          * that this op is done
>          */
>         spin_lock(&op->lock);
>         if (unlikely(op_state_given_up(op))) {
>                 spin_unlock(&op->lock);
>                 goto out;
>         }
>         set_op_state_serviced(op);
>
> and plain WARN_ON(1) right after Broken:
>
> If that op goes through either of those set_op_state_serviced() with that
> zero refn.fsid, we'll see a WARN_ON triggered...