From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from zeniv.linux.org.uk ([195.92.253.2]:60393 "EHLO
	ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751035AbcBLAzI (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Thu, 11 Feb 2016 19:55:08 -0500
Date: Fri, 12 Feb 2016 00:55:04 +0000
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Mike Marshall <hubcap@omnibond.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>
Subject: Re: Orangefs ABI documentation
Message-ID: <20160212005504.GO17997@ZenIV.linux.org.uk>
References: <20160208233535.GC17997@ZenIV.linux.org.uk>
 <20160209033203.GE17997@ZenIV.linux.org.uk>
 <CAOg9mSTFivsJDL-Ppruivd8gp_iThdq6N6+bWg0TfLXpV=rs8g@mail.gmail.com>
 <20160209174049.GG17997@ZenIV.linux.org.uk>
 <CAOg9mSTzE7KmrVC1zWSgC+vo20HfKLrsM3VPkxnYLN9roi+ZOw@mail.gmail.com>
 <20160209221623.GI17997@ZenIV.linux.org.uk>
 <20160209224050.GJ17997@ZenIV.linux.org.uk>
 <20160209231328.GK17997@ZenIV.linux.org.uk>
 <20160210164435.GA4950@ZenIV.linux.org.uk>
 <CAOg9mSTM9cuZxZeXTb=9DbE4oRV54sG+TxkOCdUC-hVF3x_BRA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAOg9mSTM9cuZxZeXTb=9DbE4oRV54sG+TxkOCdUC-hVF3x_BRA@mail.gmail.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Thu, Feb 11, 2016 at 06:54:43PM -0500, Mike Marshall wrote:

> The readdir buffer isn't a shared buffer like the IO buffer is.
> The readdir buffer is preallocated when the client-core starts up
> though. The kernel module picks which readdir buffer slot that
> the client-core fills, but gets back a copy of that buffer - the
> trailer. Unless the kernel module isn't managing the buffer slots
> properly, the client core shouldn't have more than one upcall
> on-hand that specifies any particular buffer slot. The "kill -9"
> on a ls (or whatever) might lead to such mis-management,
> but since readdir decoding is happening on a discrete copy
> of the buffer slot that was filled by the client-core, it doesn't
> seem to me like it could be overwritten during a decode...
> 
> I believe there's nothing in userspace that guarantees that
> readdirs are replied to in the same order they are received...

So... why the hell does readdir need to be aware of those slots?  After
rereading this code I have to agree - it *does* copy that stuff, and no
sharing is happening at all.  But the only thing that buf_index is used
for is "which buffer do we memcpy() to before passing it to writev()?"
in the daemon, and it looks like it would've done just as well using
plain and simple malloc() in
    /* get a buffer for xfer of dirents */
    vfs_request->out_downcall.trailer_buf =
        PINT_dev_get_mapped_buffer(BM_READDIR, s_io_desc,
            vfs_request->in_upcall.req.readdir.buf_index);
in the copy_dirents_to_downcall()...

Why is that even a part of protocol?  Were you planning to do zero-copy
there at some point, but hadn't gotten around to that yet?  If you do
(and decide to do it via shared buffers rather than e.g. splice), please
add cancel for readdir before going there...