From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from zeniv.linux.org.uk ([195.92.253.2]:58478 "EHLO
	ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932145AbcA3S1e (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Sat, 30 Jan 2016 13:27:34 -0500
Date: Sat, 30 Jan 2016 18:27:31 +0000
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Martin Brandenburg <martin@omnibond.com>
Cc: Mike Marshall <hubcap@omnibond.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: Orangefs ABI documentation
Message-ID: <20160130182731.GF17997@ZenIV.linux.org.uk>
References: <20160123012808.GK17997@ZenIV.linux.org.uk>
 <CAOg9mST8rpzSs4FGB=7yiGJ2PZBqw2s=H1ozQBiWjDF232ZnhQ@mail.gmail.com>
 <20160123191055.GN17997@ZenIV.linux.org.uk>
 <CAOg9mSR667SdVvR323MOaqEs0DVGKcxr8e0tyx=Bu_=ENops0g@mail.gmail.com>
 <20160123214006.GO17997@ZenIV.linux.org.uk>
 <CAOg9mSRwNcXZ2W31Ua-zReYWLS1oQcXL+=5kJuXjwHWUQHQRhg@mail.gmail.com>
 <20160124001615.GT17997@ZenIV.linux.org.uk>
 <20160124040529.GX17997@ZenIV.linux.org.uk>
 <CA+D=wkjuAq-krCKPMFJ6M4iik4Cz=WLS1KE=pQQGSWpz07q6Rw@mail.gmail.com>
 <20160130173413.GE17997@ZenIV.linux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160130173413.GE17997@ZenIV.linux.org.uk>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Sat, Jan 30, 2016 at 05:34:13PM +0000, Al Viro wrote:

> And again, how could
>                 /* op uses shared memory */
>                 if (orangefs_get_bufmap_init() == 0) {
> in service_operation() possibly be true, when we have
> 	* op->uses_shared_memory just checked to be set
> 	* all callers that set it (orangefs_readdir() and wait_for_direct_io()
> having allocated a slot before calling service_operation() and not releasing
> it until service_operation() returns
> 	* __orangefs_bufmap not becoming NULL until all slots are freed and
> 	* orangefs_get_bufmap_init() returning 1 unless __orangefs_bufmap is
> NULL?
> 
> AFAICS, that code (waiting for daemon to be restarted) is provably never
> executed.  What am I missing?

While we are at it, what happens to original code (without refcount changes,
etc. from my pile - the problem remains with those, but let's look at the
code at merge from v4.4) if something does read() and gets a signal
just as the daemon gets to
        n = copy_from_iter(&op->downcall, downcall_size, iter);
in ->write_iter(), reporting that is has finished that read?  We were in
wait_for_matching_downcall(), interruptibly sleeping.  We got woken up by
signal delivery.  Checked op->op_state; it's still not marked as serviced.
We check signal_pending() and find it true.  We hit
orangefs_clean_up_interrupted_operation(), which doesn't do anything to
op->op_state, and we return -EINTR to service_operation().  Which returns
without waiting for anything to wait_for_direct_io() and we go into
sending a cancel.  Now the daemon regains the timeslice.  Marks op as
serviced.  And proceeds to wait on op->io_completion_waitq.

Who's going to wake it up?  orangefs_cancel_op_in_progress() sure as hell
won't - it has no way to find op, nevermind doing wakeups.  The rest of
wait_for_direct_io() also doesn't wake the daemon up.  How is that supposed
to work?  Moreover, we issue a cancel; when is it supposed to be processed
and how do we tell if it's already been processed?

Who should be waiting for what in case of cancel being issued just as the
daemon gets around to reporting success of original operation?  On the
protocol level, that is.