From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753362AbcI0HfA (ORCPT ); Tue, 27 Sep 2016 03:35:00 -0400 Received: from mail-lf0-f45.google.com ([209.85.215.45]:32784 "EHLO mail-lf0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753628AbcI0Hes (ORCPT ); Tue, 27 Sep 2016 03:34:48 -0400 MIME-Version: 1.0 In-Reply-To: <20160927034035.GG19539@ZenIV.linux.org.uk> References: <1473842236-28655-1-git-send-email-mszeredi@redhat.com> <1473842236-28655-7-git-send-email-mszeredi@redhat.com> <20160927034035.GG19539@ZenIV.linux.org.uk> From: Miklos Szeredi Date: Tue, 27 Sep 2016 09:34:46 +0200 Message-ID: Subject: Re: [PATCH 06/11] pipe: no need to confirm page cache buf To: Al Viro Cc: linux-fsdevel , lkml , Jens Axboe , Linus Torvalds Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Sep 27, 2016 at 5:40 AM, Al Viro wrote: > On Wed, Sep 14, 2016 at 10:37:11AM +0200, Miklos Szeredi wrote: > >> Things could happen to that page that make it not uptodate while sitting in >> the pipe, but it's questionable whether we should care about that. >> Checking for being uptodate in the face of such page state change is always >> going to be racy. > > I'm not sure it's the right thing to do here; that area looks like a victim > of serious bitrot - once upon a time it was ->map(), which used to lock > page, verity that it's valid, and kmap it. ->unmap() did kunmap + unlock. > > Then the validate part got split off (->pin(), later renamed to ->confirm()), > with lock _not_ held over the kmap/kunmap. That's the point when it got racy, > AFAICS. OTOH, I would really hate to hold a page locked over e.g. copying to > userland - too easy to get deadlocks that way. > > Jens, could you comment? Pages definitely shouldn't be getting into pipe > without having been uptodate; the question is what (if anything) should be > done about having a page go non-uptodate (on truncate, hole-punching, etc.) > while a reference to it is sitting in a pipe buffer. Truncate/holepunch is interesting. It doesn't make the page go non-uptodate, just clears page->mapping. Works like a charm, old data can be read from the pipe just fine. Partial truncate is even more interesting, because it will result in partially cleared data (race is there with plain read() as well, AFAICS, but very narrow). Page invalidation by filesystems is similar to truncate. Old data can be read from the pipe. And in fact this probably the way we want it to work, since redoing the page lookup in ->confirm() would be really messy. Also just modifying the data sitting in the pipe will also result in undefined behavior; either the old or the new data can be read out from the other end of the pipe. And while not explicitly documented, all the above cases are fine and implicit in the non-copy behavior of splice. Perhaps the man page should note that modifying data after splice but before reading from the other end of the pipe results in undefined behavior. I haven't looked at filesystems, but generic code calls ClearPageUptodate() from only a few places: end_swap_bio_read(): does it on a non-uptodate page page_endio(): AFAICS part of the page reading chain, again doing it on a non-uptodate page me_swapcache_dirty(): memory error on dirty swapcache page, this is the only one that would make sense to trigger EIO on reading the pipe buffer. But then why only the dirty swapcache case? Thanks, Miklos