From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from zeniv.linux.org.uk ([195.92.253.2]:46766 "EHLO ZenIV.linux.org.uk"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752603AbcBJQol
	(ORCPT ); Wed, 10 Feb 2016 11:44:41 -0500
Date: Wed, 10 Feb 2016 16:44:36 +0000
From: Al Viro
To: Mike Marshall
Cc: Linus Torvalds, linux-fsdevel, Stephen Rothwell
Subject: Re: Orangefs ABI documentation
Message-ID: <20160210164435.GA4950@ZenIV.linux.org.uk>
References: <20160207035331.GZ17997@ZenIV.linux.org.uk>
 <20160208233535.GC17997@ZenIV.linux.org.uk>
 <20160209033203.GE17997@ZenIV.linux.org.uk>
 <20160209174049.GG17997@ZenIV.linux.org.uk>
 <20160209221623.GI17997@ZenIV.linux.org.uk>
 <20160209224050.GJ17997@ZenIV.linux.org.uk>
 <20160209231328.GK17997@ZenIV.linux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160209231328.GK17997@ZenIV.linux.org.uk>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Tue, Feb 09, 2016 at 11:13:28PM +0000, Al Viro wrote:
> On Tue, Feb 09, 2016 at 10:40:50PM +0000, Al Viro wrote:
>
> > And the version in orangefs-2.9.3.tar.gz (your Frankenstein module?) is
> > vulnerable to the same race.  2.8.1 isn't - it ignores signals on the
> > cancel, but that means waiting for the cancel to be processed (or timed
> > out) on any interrupted read() before we return to userland.  We can
> > return to that behaviour, of course, but I suspect that offloading it to
> > something async (along with freeing the slot used by the original
> > operation) would be better from a QoI point of view.
>
> That breakage had been introduced between 2.8.5 and 2.8.6 (at some point
> during the spring of 2012).  AFAICS, all versions starting with 2.8.6 are
> vulnerable...

BTW, what about kill -9 delivered to a readdir in progress?  There's no
cancel for those (and AFAICS the daemon will reject a cancel on anything
other than FILE_IO), so what's to stop another thread from picking the same
readdir slot and getting (daemon-side) two of them spewing into the same
area of shared memory?

Is it simply that daemon-side the shared memory on readdir is touched only
upon request completion, in the completely serialized process_vfs_requests()?
That doesn't seem to be enough - suppose the second readdir request completes
(daemon-side) first, its results get packed into the shared memory slot and
it is reported to the kernel, which proceeds to repack and copy that data to
userland.  In the meanwhile the daemon completes the _earlier_ readdir and
proceeds to pack its results into the same slot of shared memory.  Sure, the
kernel won't take that response (the op with the matching tag is already
gone), but the data is stored into shared memory *before* the writev() on
the control device that would pass the response to the kernel, so it still
gets overwritten.  Right under the readdir decoding...

Or is there something in the daemon that would guarantee readdir responses
to happen in the same order in which it had picked the requests?  I'm not
familiar enough with that beast (and the overall control flow in there is,
er, not the most transparent one I've seen), so I might be missing something,
but I don't see anything obvious that would guarantee such ordering.
Please clarify.
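
PS: to make the interleaving I'm worried about concrete, here's a throwaway
userspace sketch (pthreads; every name in it is made up, it's emphatically
*not* the orangefs code).  One thread plays the daemon responding to the
second readdir, the other plays the daemon finishing the earlier,
already-killed readdir into the same slot.  Note that serializing the two
stores (the mutex standing in for the serialization in
process_vfs_requests()) doesn't help - the clobbering store still lands
before the "kernel" gets around to decoding:

	/* sketch only: shows a late completion of an old request
	 * overwriting a reused shared-memory slot before the new
	 * request's data has been consumed. */
	#include <pthread.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	static char slot[64];	/* one "shared memory" readdir slot */
	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

	static void *daemon_new_readdir(void *arg)
	{
		(void)arg;
		/* second readdir: pack results, then "respond" to kernel */
		pthread_mutex_lock(&lock);
		strcpy(slot, "results of readdir #2");
		pthread_mutex_unlock(&lock);
		return NULL;
	}

	static void *daemon_old_readdir(void *arg)
	{
		(void)arg;
		/* earlier (killed) readdir completing late: packs into the
		 * *same* slot.  The kernel will reject its response (stale
		 * tag), but the bytes in the slot are already clobbered. */
		sleep(1);
		pthread_mutex_lock(&lock);
		strcpy(slot, "results of readdir #1");
		pthread_mutex_unlock(&lock);
		return NULL;
	}

	int main(void)
	{
		pthread_t t_new, t_old;

		pthread_create(&t_new, NULL, daemon_new_readdir, NULL);
		pthread_create(&t_old, NULL, daemon_old_readdir, NULL);
		pthread_join(t_new, NULL);

		/* "kernel" side: decoding readdir #2 out of the slot later */
		sleep(2);
		printf("kernel decodes: %s\n", slot);

		pthread_join(t_old, NULL);
		return 0;
	}

Build with -pthread and it prints readdir #1's data, i.e. whatever the
kernel would be decoding for the second readdir has already been stomped on.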