From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from zeniv.linux.org.uk ([195.92.253.2]:49929 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750801AbcBKAof (ORCPT ); Wed, 10 Feb 2016 19:44:35 -0500 Date: Thu, 11 Feb 2016 00:44:32 +0000 From: Al Viro To: Mike Marshall Cc: Linus Torvalds , linux-fsdevel , Stephen Rothwell Subject: Re: Orangefs ABI documentation Message-ID: <20160211004432.GM17997@ZenIV.linux.org.uk> References: <20160207035331.GZ17997@ZenIV.linux.org.uk> <20160208233535.GC17997@ZenIV.linux.org.uk> <20160209033203.GE17997@ZenIV.linux.org.uk> <20160209174049.GG17997@ZenIV.linux.org.uk> <20160209221623.GI17997@ZenIV.linux.org.uk> <20160209224050.GJ17997@ZenIV.linux.org.uk> <20160209231328.GK17997@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160209231328.GK17997@ZenIV.linux.org.uk> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Tue, Feb 09, 2016 at 11:13:28PM +0000, Al Viro wrote: > On Tue, Feb 09, 2016 at 10:40:50PM +0000, Al Viro wrote: > > > And the version in orangefs-2.9.3.tar.gz (your Frankenstein module?) is > > vulnerable to the same race. 2.8.1 isn't - it ignores signals on the > > cancel, but that means waiting for cancel to be processed (or timed out) > > on any interrupted read() before we return to userland. We can return > > to that behaviour, of course, but I suspect that offloading it to something > > async (along with freeing the slot used by original operation) would be > > better from QoI point of view. > > That breakage had been introduced between 2.8.5 and 2.8.6 (at some point > during the spring of 2012). AFAICS, all versions starting with 2.8.6 are > vulnerable... Matter of fact, older versions are _also_ broken, but it's much harder to trigger; you need the daemon to stall long enough for read() to time out, then everything as before. After 2.8.5 all it takes on the read() side is a SIGKILL to read() caller just after the daemon has picked your request... TBH, it's tempting to kill the "wait for cancel to be finished" logics completely, mark cancel requests with a flag and stash the slot number into them. Then, if flag is present, let set_op_state_purged() and set_op_state_serviced() callers release the slot and drop the request. And have wait_for_direct_io() treat the "need to cancel" case as "fire and forget" - no waiting for anything, no releasing the slots, just free the original op and bugger off. The set_op_state_purged() part is where it gets delicate - we want to remove the sucker from list/hash before dropping it. AFAICS, it's doable - there's nothing in the "release slot and drop the request" that would not be allowed under the list/hash spinlocks... In principle, readdir problem could also be handled in a similar way, but there we'd need to have the interrupted readdir itself marked with that new flag and left in in-progress hash, so that when the response finally arrives it would be found and slot freeing would be handled ;-/ Doable, but not fun... If there is (or at least supposed to be) something that prevents completions of readdir requests (on unrelated directories, by different processes, etc.) out of order, PLEASE SAY SO. I would really prefer not to have to fight the readdir side of that mess; cancels are already bad enough ;-/