From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from zeniv.linux.org.uk ([195.92.253.2]:58405 "EHLO
	ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754760AbcAWWqf (ORCPT );
	Sat, 23 Jan 2016 17:46:35 -0500
Date: Sat, 23 Jan 2016 22:46:32 +0000
From: Al Viro
To: Mike Marshall
Cc: Linus Torvalds, linux-fsdevel
Subject: write() semantics (Re: Orangefs ABI documentation)
Message-ID: <20160123224632.GQ17997@ZenIV.linux.org.uk>
References: <20160122200442.GF17997@ZenIV.linux.org.uk>
	<20160123001202.GJ17997@ZenIV.linux.org.uk>
	<20160123012808.GK17997@ZenIV.linux.org.uk>
	<20160123191055.GN17997@ZenIV.linux.org.uk>
	<20160123214006.GO17997@ZenIV.linux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160123214006.GO17997@ZenIV.linux.org.uk>
Sender: linux-fsdevel-owner@vger.kernel.org

On Sat, Jan 23, 2016 at 09:40:06PM +0000, Al Viro wrote:

> Yes...  BTW, speaking of that codepath - how can the second caller of
> handle_io_error() ever get !op_state_serviced(new_op)?  That failure,
> after all, had been in postcopy_buffers(), so the daemon is sitting
> in its write_iter() waiting until we finish copying the data out of
> bufmap; it's too late for sending cancel anyway, is it not?  IOW, would
> the following do the right thing?  That would've left us with only
> one caller of handle_io_error()...

FWIW, I'm not sure I like the correctness implications of the cancel thing.
Look: we do a large write(), it sends a couple of chunks successfully,
gets to submitting the third one, copies its data to bufmap, tells the
daemon to start writing, then gets a signal, sends cancel and buggers off.

What should we get?  -EINTR, despite having written some data?  That's what
the code does now, but I'm not sure it's what the userland expects.  Two
chunks' worth of data we'd written?
That's what one would expect if the third one
had hit an unmapped page, but in the scenario with a signal hitting us the
daemon might very well have overwritten more of the file by the time it had
seen the cancel.

AFAICS, POSIX flat-out prohibits the current behaviour - what it says for
write(2) is

[EINTR]
	The write operation was terminated due to the receipt of a signal,
	and no data was transferred.
	^^^^^^^^^^^^^^^^^^^^^^^^^^^

but I'm not sure if "return a short write and to hell with having some data
beyond the returned amount actually written" would be better from the
userland POV.  It would be closer to what e.g. NFS is doing, though...

Linus?
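
To make the short-write alternative concrete, here is a minimal userspace
sketch of the return-value rule being weighed above.  It is not the orangefs
code: chunked_write(), the submit callbacks and enum chunk_status are all
hypothetical names made up for illustration.  The rule it encodes is the POSIX
one: -EINTR only if no data was transferred, otherwise the byte count of the
chunks that completed.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical outcome of handing one chunk to the daemon. */
enum chunk_status { CHUNK_OK, CHUNK_FAULT, CHUNK_INTERRUPTED };

/* Hypothetical per-chunk submit hook: chunk index and length in bytes. */
typedef enum chunk_status (*submit_fn)(size_t index, size_t len);

/*
 * Split a write of 'total' bytes into chunks of at most 'chunk' bytes.
 * On failure, report a short write if any earlier chunk completed;
 * report an error only when nothing at all was transferred.
 */
static ssize_t chunked_write(size_t total, size_t chunk, submit_fn submit)
{
	size_t done = 0;
	size_t index = 0;

	if (chunk == 0)
		return -EINVAL;

	while (done < total) {
		size_t len = total - done < chunk ? total - done : chunk;
		enum chunk_status st = submit(index++, len);

		if (st != CHUNK_OK) {
			if (done > 0)
				return (ssize_t)done;	/* short write */
			return st == CHUNK_INTERRUPTED ? -EINTR : -EFAULT;
		}
		done += len;
	}
	return (ssize_t)done;
}

/* Example hooks: every chunk succeeds, or a signal hits a given chunk. */
static enum chunk_status submit_all_ok(size_t index, size_t len)
{
	(void)index; (void)len;
	return CHUNK_OK;
}

static enum chunk_status submit_signal_on_third(size_t index, size_t len)
{
	(void)len;
	return index == 2 ? CHUNK_INTERRUPTED : CHUNK_OK;
}

static enum chunk_status submit_signal_on_first(size_t index, size_t len)
{
	(void)len;
	return index == 0 ? CHUNK_INTERRUPTED : CHUNK_OK;
}
```

With 1024-byte chunks and a 4096-byte request, a signal on the third chunk
yields 2048 and a signal on the first yields -EINTR.  The sketch says nothing
about the harder part of the question: in the signal case the daemon may have
written part of the third chunk before it saw the cancel, so 2048 is only a
lower bound on what actually reached the file.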