From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from zeniv.linux.org.uk ([195.92.253.2]:36779 "EHLO
	ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1757753AbcBIRky (ORCPT );
	Tue, 9 Feb 2016 12:40:54 -0500
Date: Tue, 9 Feb 2016 17:40:49 +0000
From: Al Viro
To: Mike Marshall
Cc: Linus Torvalds , linux-fsdevel , Stephen Rothwell
Subject: Re: Orangefs ABI documentation
Message-ID: <20160209174049.GG17997@ZenIV.linux.org.uk>
References: <20160130173413.GE17997@ZenIV.linux.org.uk>
	<20160130182731.GF17997@ZenIV.linux.org.uk>
	<20160206194210.GX17997@ZenIV.linux.org.uk>
	<20160207013835.GY17997@ZenIV.linux.org.uk>
	<20160207035331.GZ17997@ZenIV.linux.org.uk>
	<20160208233535.GC17997@ZenIV.linux.org.uk>
	<20160209033203.GE17997@ZenIV.linux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Tue, Feb 09, 2016 at 09:34:12AM -0500, Mike Marshall wrote:
> > Objections?
>
> Heck no... I've been trying to keep from changing the protocol so as to
> avoid making a whole nother project out of keeping the out-of-tree
> Frankenstein version of the kernel module going, but getting this version
> of the kernel module upstream and getting it infused with ideas from you
> depth-of-knowledge folks is the real goal here.
>
> You're talking about changing orangefs_kernel_op_s (pvfs2_kernel_op_t
> out of tree) and it doesn't cross the boundary into userspace... even if
> it did, that "completion" structure looks like it has been been around
> as long as any of the Linux versions we try to run on...

OK.  While we are at it...  Remember the question about the need for
devreq ->write_iter() to wait for wait_for_direct_io() to finish copying
the data from slots to the final destination?  You said that removing
that wait ends up with the daemon somehow stomping on those slots, and
I wonder if that was another effect of that double-free bug.

Could you try, on top of those fixes, commenting out the entire

	if (op->downcall.type == ORANGEFS_VFS_OP_FILE_IO) {
		long n = wait_for_completion_interruptible_timeout(&op->done,
							op_timeout_secs * HZ);
		if (unlikely(n < 0)) {
			gossip_debug(GOSSIP_DEV_DEBUG,
				"%s: signal on I/O wait, aborting\n",
				__func__);
		} else if (unlikely(n == 0)) {
			gossip_debug(GOSSIP_DEV_DEBUG,
				"%s: timed out.\n",
				__func__);
		}
	}

in orangefs_devreq_write_iter() and see if the corruption happens?
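
For reference, here is a minimal userspace sketch of the return convention
that the block in question relies on.
wait_for_completion_interruptible_timeout() returns a negative value
(-ERESTARTSYS) if the wait was interrupted by a signal, 0 if it timed out,
and a positive number of remaining jiffies if the completion fired.  The
names wait_outcome, classify_wait() and self_check() are made up for
illustration and do not exist in the orangefs code; ERESTARTSYS is defined
locally on the assumption that userspace <errno.h> does not provide it.

```c
#include <assert.h>
#include <errno.h>

/* ERESTARTSYS is kernel-internal; 512 is its value in the kernel headers.
 * Defined here only so that this userspace sketch compiles. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Hypothetical names, used only in this illustration. */
enum wait_outcome {
	WAIT_SIGNALLED,		/* n < 0: interrupted by a signal */
	WAIT_TIMED_OUT,		/* n == 0: timeout expired */
	WAIT_COMPLETED		/* n > 0: completed, n jiffies were left */
};

/* Map the long returned by the wait onto the three cases the
 * devreq code distinguishes. */
static enum wait_outcome classify_wait(long n)
{
	if (n < 0)
		return WAIT_SIGNALLED;
	if (n == 0)
		return WAIT_TIMED_OUT;
	return WAIT_COMPLETED;
}

/* Exercise each branch once. */
static void self_check(void)
{
	assert(classify_wait(-ERESTARTSYS) == WAIT_SIGNALLED);
	assert(classify_wait(0) == WAIT_TIMED_OUT);
	assert(classify_wait(1) == WAIT_COMPLETED);
}
```

Note that the quoted kernel code only logs in the first two cases and does
nothing special in the third, so with the block commented out the only
thing lost is the wait itself.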