From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from zeniv.linux.org.uk ([195.92.253.2]:35304 "EHLO ZenIV.linux.org.uk"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1424285AbcBQXJD (ORCPT );
	Wed, 17 Feb 2016 18:09:03 -0500
Date: Wed, 17 Feb 2016 23:09:00 +0000
From: Al Viro 
To: Martin Brandenburg 
Cc: Mike Marshall , Linus Torvalds ,
	linux-fsdevel , Stephen Rothwell 
Subject: Re: Orangefs ABI documentation
Message-ID: <20160217230900.GP17997@ZenIV.linux.org.uk>
References: <20160214234312.GX17997@ZenIV.linux.org.uk>
	<20160215184554.GY17997@ZenIV.linux.org.uk>
	<20160215230434.GZ17997@ZenIV.linux.org.uk>
	<20160216233609.GE17997@ZenIV.linux.org.uk>
	<20160216235441.GF17997@ZenIV.linux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: 
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: 

On Wed, Feb 17, 2016 at 05:40:08PM -0500, Martin Brandenburg wrote:
> In orangefs_clean_up_interrupted_operation
> 
> 	if (op_state_waiting(op)) {
> 		/*
> 		 * upcall hasn't been read; remove op from upcall request
> 		 * list.
> 		 */
> 		spin_unlock(&op->lock);
> 
> 		/* HERE */
> 
> 		spin_lock(&orangefs_request_list_lock);
> 		list_del(&op->list);
> 		spin_unlock(&orangefs_request_list_lock);

Hmm...  We'd already marked it as given up, though.  Before dropping
op->lock.

> 		gossip_debug(GOSSIP_WAIT_DEBUG,
> 			     "Interrupted: Removed op %p from request_list\n",
> 			     op);
> 	} else if (op_state_in_progress(op)) {
> 
> and orangefs_devreq_read
> 
> restart:
> 	/* Get next op (if any) from top of list. */
> 	spin_lock(&orangefs_request_list_lock);
> 	list_for_each_entry_safe(op, temp, &orangefs_request_list, list) {
> 		__s32 fsid;
> 		/* This lock is held past the end of the loop when we break. */
> 
> 		/* HERE */
> 
> 		spin_lock(&op->lock);
> 		if (unlikely(op_state_purged(op))) {
> 			spin_unlock(&op->lock);
> 			continue;
> 		}
> 
> I think both processes can end up working on the same
> op.

It can be picked up.
And then we'll run into
	if (unlikely(op_state_given_up(cur_op))) {
		spin_unlock(&cur_op->lock);
		spin_unlock(&htable_ops_in_progress_lock);
		op_release(cur_op);
		goto restart;

Oh, I see...  OK, yes - by the time we get to that check the sucker has
already been through resubmitting it into the list, so the "given up"
flag is lost.  Hrm...  The obvious approach is to at least avoid taking
it off the list if it's given up - i.e. turn
		__s32 fsid;
		/* This lock is held past the end of the loop when we break. */
		spin_lock(&op->lock);
		if (unlikely(op_state_purged(op))) {
into
		__s32 fsid;
		/* This lock is held past the end of the loop when we break. */
		spin_lock(&op->lock);
		if (unlikely(op_state_purged(op) || op_state_given_up(op))) {
a bit before that point.  However, that doesn't prevent all
unpleasantness here - giving up just as it's being copied to userland
and going into restart.  Ho-hum...  How about the following:
	* move increment of op->attempts into the same place where we
	  set "given up"
	* in addition to the check for "given up" in the request-picking
	  loop (as above), fetch op->attempts before dropping op->lock
	* after having retaken op->lock (after copy_to_user()) recheck
	  op->attempts instead of checking for "given up".

IOW, something like this:

diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
index b27ed1c..1938d55 100644
--- a/fs/orangefs/devorangefs-req.c
+++ b/fs/orangefs/devorangefs-req.c
@@ -109,6 +109,7 @@ static ssize_t orangefs_devreq_read(struct file *file,
 	static __s32 magic = ORANGEFS_DEVREQ_MAGIC;
 	struct orangefs_kernel_op_s *cur_op = NULL;
 	unsigned long ret;
+	int attempts;
 
 	/* We do not support blocking IO. */
 	if (!(file->f_flags & O_NONBLOCK)) {
@@ -133,7 +134,7 @@ restart:
 		__s32 fsid;
 		/* This lock is held past the end of the loop when we break. */
 		spin_lock(&op->lock);
-		if (unlikely(op_state_purged(op))) {
+		if (unlikely(op_state_purged(op) || op_state_given_up(op))) {
 			spin_unlock(&op->lock);
 			continue;
 		}
@@ -207,6 +208,7 @@ restart:
 	list_del_init(&cur_op->list);
 	get_op(op);
 	spin_unlock(&orangefs_request_list_lock);
+	attempts = op->attempts;
 
 	spin_unlock(&cur_op->lock);
@@ -227,7 +229,8 @@ restart:
 
 	spin_lock(&htable_ops_in_progress_lock);
 	spin_lock(&cur_op->lock);
-	if (unlikely(op_state_given_up(cur_op))) {
+	if (unlikely(cur_op->attempts != attempts)) {
+		/* given up just as we copied to userland */
 		spin_unlock(&cur_op->lock);
 		spin_unlock(&htable_ops_in_progress_lock);
 		op_release(cur_op);
diff --git a/fs/orangefs/waitqueue.c b/fs/orangefs/waitqueue.c
index d980240..cc43ac8 100644
--- a/fs/orangefs/waitqueue.c
+++ b/fs/orangefs/waitqueue.c
@@ -139,7 +139,6 @@ retry_servicing:
 		op->downcall.status = ret;
 		/* retry if operation has not been serviced and if requested */
 		if (ret == -EAGAIN) {
-			op->attempts++;
 			timeout = op_timeout_secs * HZ;
 			gossip_debug(GOSSIP_WAIT_DEBUG,
 				     "orangefs: tag %llu (%s)"
@@ -208,6 +207,7 @@ static void orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s
 	 * Called with op->lock held.
 	 */
 	op->op_state |= OP_VFS_STATE_GIVEN_UP;
+	op->attempts++;
 	if (op_state_waiting(op)) {
 		/*