Date: Mon, 30 Nov 2009 17:12:05 +0100
From: Andrea Arcangeli
To: Nick Piggin
Cc: Mark Veltzer, linux-kernel@vger.kernel.org, Hugh Dickins, Andi Kleen, KOSAKI Motohiro, Michael Kerrisk
Subject: Re: get_user_pages question
Message-ID: <20091130161205.GE30235@random.random>
References: <200911090850.26724.mark.veltzer@gmail.com> <20091128185052.GB30235@random.random> <200911290022.17568.mark.veltzer@gmail.com> <20091130120145.GB21639@wotan.suse.de>
In-Reply-To: <20091130120145.GB21639@wotan.suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 30, 2009 at 01:01:45PM +0100, Nick Piggin wrote:
> If you can wean O_DIRECT off get_user_pages, you'd have most of the
> battle won. I don't think it's really possible though.

Agreed. It's not just O_DIRECT: virtualization requires it too. The kvm page fault calls get_user_pages, and practically anything that uses mmu notifier also uses get_user_pages. There are things you simply can't do without it.

In general, if the memory doesn't need to be stored persistently on disk to survive the task being killed, there's not much point in using on-disk MAP_SHARED pagecache instead of anonymous memory. This is why anonymous memory backs malloc, and there's no reason to prevent people from issuing zero-copy disk I/O on anonymous memory (or tmpfs) if they know they access the data only once and want to manage the cache in some logical form rather than in the physical on-disk format (or if a more efficient cache is already kept elsewhere and the pagecache would only double it, as in the KVM guest case).

OTOH, if you use the I/O data in its physical format in your userland memory, then using pagecache by mmapping the file and not using O_DIRECT is surely preferred and more efficient (if nothing else, because it also provides caching just in case).

For drivers (Mark's case) it depends, but if you can avoid using get_user_pages without slowing anything down, you should: that usually makes the code simpler... and it won't risk suffering from these race conditions either ;).
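
To make the two userland options above concrete, here is a minimal C sketch (not taken from the thread). read_direct() and sum_mapped() are hypothetical helper names, and the 4096-byte alignment is an assumption: the real O_DIRECT alignment requirement depends on the filesystem and the device. The first helper does zero-copy I/O into anonymous memory, the second goes through the pagecache with mmap.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed alignment for the example; the real value is fs/device specific. */
#define DIO_ALIGN 4096

/*
 * Option 1: O_DIRECT read into anonymous (heap) memory.
 * Returns the number of bytes read, or -1 on error (for instance when
 * the filesystem rejects O_DIRECT). On success the caller frees *out.
 */
ssize_t read_direct(const char *path, size_t len, void **out)
{
    void *buf = NULL;
    ssize_t n;
    int fd;

    assert(path != NULL && out != NULL);
    *out = NULL;

    /* With O_DIRECT the buffer, the length and the file offset are
     * all expected to be aligned. */
    if (len == 0 || len % DIO_ALIGN != 0)
        return -1;
    if (posix_memalign(&buf, DIO_ALIGN, len) != 0)
        return -1;

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    n = read(fd, buf, len);     /* data goes to buf, bypassing the pagecache */
    close(fd);
    if (n < 0) {
        free(buf);
        return -1;
    }

    *out = buf;
    return n;
}

/*
 * Option 2: mmap the file MAP_SHARED and use the data in place from
 * the pagecache. Stores the sum of all bytes in *sum.
 * Returns 0 on success, -1 on error (including an empty file).
 */
int sum_mapped(const char *path, uint64_t *sum)
{
    struct stat st;
    unsigned char *p;
    uint64_t s = 0;
    int fd;

    assert(path != NULL && sum != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    for (off_t i = 0; i < st.st_size; i++)
        s += p[i];

    munmap(p, (size_t)st.st_size);
    *sum = s;
    return 0;
}
```

The O_DIRECT path is the one where the kernel has to pin the user pages for the duration of the I/O, while the mmap path only ever touches pagecache pages. Whether read_direct() succeeds at all depends on the filesystem, so a caller should be prepared to fall back to buffered I/O when it returns -1.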