From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756384AbaLHS5e (ORCPT ); Mon, 8 Dec 2014 13:57:34 -0500 Received: from mail-qg0-f44.google.com ([209.85.192.44]:51234 "EHLO mail-qg0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756316AbaLHS5c (ORCPT ); Mon, 8 Dec 2014 13:57:32 -0500 MIME-Version: 1.0 In-Reply-To: <20141208184632.GG22149@ZenIV.linux.org.uk> References: <20141204202011.GO29748@ZenIV.linux.org.uk> <20141208164650.GB29028@node.dhcp.inet.fi> <20141208175805.GB22149@ZenIV.linux.org.uk> <20141208180824.GC22149@ZenIV.linux.org.uk> <20141208182012.GE22149@ZenIV.linux.org.uk> <20141208184632.GG22149@ZenIV.linux.org.uk> Date: Mon, 8 Dec 2014 10:57:31 -0800 X-Google-Sender-Auth: EmfqXeUg6T3ttVOoCmXDqW6TNkw Message-ID: Subject: Re: [RFC][PATCHES] iov_iter.c rewrite From: Linus Torvalds To: Al Viro Cc: "Kirill A. Shutemov" , Linux Kernel Mailing List , linux-fsdevel , Network Development Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Dec 8, 2014 at 10:46 AM, Al Viro wrote: > On Mon, Dec 08, 2014 at 10:37:51AM -0800, Linus Torvalds wrote: > >> How about we make "kernel_read()" just clear O_DIRECT? Does that fix >> it to just use copies? > > Umm... clearing O_DIRECT on struct file that might have other references > to it isn't nice, to put it mildly... Yeah. > Frankly, stopping iov_iter_get_pages() on the first page in vmalloc/module > space looks like the sanest strategy anyway. We'd get the same behaviour > as we used to, and as for finit_module(2), well... put "fails if given > an O_DIRECT descriptor" and be done with that. If it used to fail, then by all means, just make this a failure in the new model. I really don't want to make core infrastructure silently just call vmalloc_to_page() to make things "work". And if it used to do "get_user_pages_fast()" then the old code really didn't work on vmalloc ranges anyway, since that one checks for not just _PAGE_PRESENT but also _PAGE_USER, which won't be set on a vmalloc() page. In fact, it should have failed on *all* kernel pages. > Alternatively, we can really go for > page = is_vmalloc_or_module_addr(addr) ? vmalloc_to_page(addr) > : virt_to_page(addr) > *pages++ = get_page(page); actually, no we cannot. Thinking some more about it, that "get_page(page)" is wrong in _all_ cases. It actually works better for vmalloc pages than for normal 1:1 pages, since it's actually seriously and *horrendously* wrong for the case of random kernel addresses which may not even be refcounted to begin with. So the whole "get_page()" thing is broken. Iterating over pages in a KVEC is simply wrong, wrong, wrong. It needs to fail. Iterating over a KVEC to *copy* data is ok. But no page lookup stuff or page reference things. The old code that apparently used "get_user_pages_fast()" was ok almost by mistake, because it fails on all kernel pages. Linus