From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752553Ab3BOXmi (ORCPT ); Fri, 15 Feb 2013 18:42:38 -0500
Received: from mail.linuxfoundation.org ([140.211.169.12]:49774 "EHLO
        mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750733Ab3BOXmh (ORCPT ); Fri, 15 Feb 2013 18:42:37 -0500
Date: Fri, 15 Feb 2013 15:42:35 -0800
From: Andrew Morton
To: Johannes Weiner
Cc: Rusty Russell, LKML, Nick Piggin, Stewart Smith,
        linux-mm@kvack.org, linux-arch@vger.kernel.org
Subject: Re: [patch 1/2] mm: fincore()
Message-Id: <20130215154235.0fb36f53.akpm@linux-foundation.org>
In-Reply-To: <20130215231304.GB23930@cmpxchg.org>
References: <87a9rbh7b4.fsf@rustcorp.com.au>
        <20130211162701.GB13218@cmpxchg.org>
        <20130211141239.f4decf03.akpm@linux-foundation.org>
        <20130215063450.GA24047@cmpxchg.org>
        <20130215132738.c85c9eda.akpm@linux-foundation.org>
        <20130215231304.GB23930@cmpxchg.org>
X-Mailer: Sylpheed 3.0.2 (GTK+ 2.20.1; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 15 Feb 2013 18:13:04 -0500
Johannes Weiner wrote:

> On Fri, Feb 15, 2013 at 01:27:38PM -0800, Andrew Morton wrote:
> > On Fri, 15 Feb 2013 01:34:50 -0500
> > Johannes Weiner wrote:
> >
> > > + * The status is returned in a vector of bytes. The least significant
> > > + * bit of each byte is 1 if the referenced page is in memory, otherwise
> > > + * it is zero.
> >
> > Also, this is going to be dreadfully inefficient for some obvious cases.
> >
> > We could address that by returning the info in some more efficient
> > representation. That will be run-length encoded in some fashion.
> >
> > The obvious way would be to populate an array of
> >
> > struct page_status {
> >         u32 present:1;
> >         u32 count:31;
> > };
> >
> > or whatever.
>
> I'm having a hard time seeing how this could be extended to more
> status bits without stifling the optimization too much.

See other email: add a syscall arg which specifies the boolean status
which we're searching for.

> If we just
> add more status bits to one page_status, the likelihood of long runs
> where all bits are in agreement decreases. But as the optimization
> becomes less and less effective, we are stuck with an interface that
> is more PITA than just using mmap and mincore again.
>
> The user has to supply a worst-case-sized vector with one struct
> page_status per page in the range, but the per-page item will be
> bigger than with the byte vector because of the additional run length
> variable.

Yes, we'd need to tell the kernel how much storage is available for the
structures.

> However, one struct page_status per run leaves you with a worst case
> of one syscall per page in the range.

Yes.

> I dunno. The byte vector might not be optimal but its worst cases
> seem more attractive, is just as extensible, and dead simple to use.

But I think "which pages from this 4TB file are in core" will not be an
uncommon usage, and writing a gig of memory to find three pages is just
awful.

I wonder what the most common usage would be (one should know this
before merging the syscall :)). I guess "is this relatively-small
range of the file in core" and/or "which pages from this
relatively-small range of the file will I need to read", etc.

The syscall should handle the common usages very well. But it
shouldn't handle uncommon usages very badly!
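
The trade-off argued over in this thread can be made concrete with a
small userspace sketch. It is not code from the fincore() patch: the
helper names byte_vector_size() and rle_encode() are hypothetical,
uint32_t stands in for the kernel's u32, and the "gig of memory" figure
assumes 4 KiB pages and reads "4TB" as 2^42 bytes. The encoder
compresses a mincore-style byte vector into the struct page_status
records suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Run-length record as floated in the thread; uint32_t stands in for the
 * kernel's u32, and the exact layout is an assumption. */
struct page_status {
    uint32_t present:1;  /* 1 if every page in the run is in core */
    uint32_t count:31;   /* number of consecutive pages in the run */
};

/* Largest run length that fits in the 31-bit count field. */
#define PAGE_STATUS_MAX_RUN ((UINT32_C(1) << 31) - 1)

/* Hypothetical helper: bytes needed by a one-byte-per-page status vector. */
static uint64_t byte_vector_size(uint64_t file_bytes, uint64_t page_size)
{
    return (file_bytes + page_size - 1) / page_size;
}

/*
 * Hypothetical helper: compress a byte vector (bit 0 set means "in core")
 * into runs. Returns the number of records written, or -1 if max_runs
 * records are not enough to describe the whole range.
 */
static long rle_encode(const unsigned char *vec, size_t npages,
                       struct page_status *out, size_t max_runs)
{
    size_t nruns = 0;
    size_t i = 0;

    assert(vec != NULL && out != NULL);

    while (i < npages) {
        unsigned int present = vec[i] & 1u;
        uint32_t count = 0;

        if (nruns == max_runs)
            return -1;  /* caller's storage is exhausted */

        /* Extend the run while the residency bit stays the same. */
        while (i < npages && (vec[i] & 1u) == present &&
               count < PAGE_STATUS_MAX_RUN) {
            count++;
            i++;
        }
        out[nruns].present = present;
        out[nruns].count = count;
        nruns++;
    }
    return (long)nruns;
}
```

Under those assumptions, byte_vector_size(1ULL << 42, 4096) is 2^30,
i.e. a 1 GiB vector for the 4TB file, whereas three resident pages at
indices 3, 4 and 9 of a 16-page range encode to five records. An
alternating resident/non-resident pattern is the bad case Johannes
describes: one record per page, each larger than a byte on common ABIs.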