From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from zeniv.linux.org.uk ([195.92.253.2]:60393 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751035AbcBLAzI (ORCPT ); Thu, 11 Feb 2016 19:55:08 -0500 Date: Fri, 12 Feb 2016 00:55:04 +0000 From: Al Viro To: Mike Marshall Cc: Linus Torvalds , linux-fsdevel , Stephen Rothwell Subject: Re: Orangefs ABI documentation Message-ID: <20160212005504.GO17997@ZenIV.linux.org.uk> References: <20160208233535.GC17997@ZenIV.linux.org.uk> <20160209033203.GE17997@ZenIV.linux.org.uk> <20160209174049.GG17997@ZenIV.linux.org.uk> <20160209221623.GI17997@ZenIV.linux.org.uk> <20160209224050.GJ17997@ZenIV.linux.org.uk> <20160209231328.GK17997@ZenIV.linux.org.uk> <20160210164435.GA4950@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Thu, Feb 11, 2016 at 06:54:43PM -0500, Mike Marshall wrote: > The readdir buffer isn't a shared buffer like the IO buffer is. > The readdir buffer is preallocated when the client-core starts up > though. The kernel module picks which readdir buffer slot that > the client-core fills, but gets back a copy of that buffer - the > trailer. Unless the kernel module isn't managing the buffer slots > properly, the client core shouldn't have more than one upcall > on-hand that specifies any particular buffer slot. The "kill -9" > on a ls (or whatever) might lead to such mis-management, > but since readdir decoding is happening on a discrete copy > of the buffer slot that was filled by the client-core, it doesn't > seem to me like it could be overwritten during a decode... > > I believe there's nothing in userspace that guarantees that > readdirs are replied to in the same order they are received... So... why the hell does readdir need to be aware of those slots? After rereading this code I have to agree - it *does* copy that stuff, and no sharing is happening at all. But the only thing that buf_index is used for is "which buffer do we memcpy() to before passing it to writev()?" in the daemon, and it looks like it would've done just as well using plain and simple malloc() in /* get a buffer for xfer of dirents */ vfs_request->out_downcall.trailer_buf = PINT_dev_get_mapped_buffer(BM_READDIR, s_io_desc, vfs_request->in_upcall.req.readdir.buf_index); in the copy_dirents_to_downcall()... Why is that even a part of protocol? Were you planning to do zero-copy there at some point, but hadn't gotten around to that yet? If you do (and decide to do it via shared buffers rather than e.g. splice), please add cancel for readdir before going there...