From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35]) by kanga.kvack.org (Postfix) with SMTP id 76A8E6B01EE for ; Mon, 12 Apr 2010 04:14:38 -0400 (EDT) Date: Mon, 12 Apr 2010 18:14:31 +1000 From: Nick Piggin Subject: Re: hugepages will matter more in the future Message-ID: <20100412081431.GT5683@laptop> References: <4BC19916.20100@redhat.com> <20100411110015.GA10149@elte.hu> <4BC1B034.4050302@redhat.com> <20100411115229.GB10952@elte.hu> <4BC1EE13.7080702@redhat.com> <4BC1F31E.2050009@redhat.com> <20100412074557.GA18485@elte.hu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100412074557.GA18485@elte.hu> Sender: owner-linux-mm@kvack.org To: Ingo Molnar Cc: Avi Kivity , Linus Torvalds , Jason Garrett-Glaser , Mike Galbraith , Andrea Arcangeli , Pekka Enberg , Andrew Morton , linux-mm@kvack.org, Marcelo Tosatti , Adam Litke , Izik Eidus , Hugh Dickins , Rik van Riel , Mel Gorman , Dave Hansen , Benjamin Herrenschmidt , Mike Travis , KAMEZAWA Hiroyuki , Christoph Lameter , Chris Wright , bpicco@redhat.com, KOSAKI Motohiro , Balbir Singh , Arnd Bergmann , "Michael S. Tsirkin" , Peter Zijlstra , Johannes Weiner , Daisuke Nishimura , Arjan van de Ven List-ID: On Mon, Apr 12, 2010 at 09:45:57AM +0200, Ingo Molnar wrote: > > * Avi Kivity wrote: > > > On 04/11/2010 06:52 PM, Linus Torvalds wrote: > > > > > >On Sun, 11 Apr 2010, Avi Kivity wrote: > > >> > > >> And yet Oracle and java have options to use large pages, and we know > > >> google and HPC like 'em. Maybe they just haven't noticed the fundamental > > >> brokenness yet. > > ( Add Firefox to the mix too - it too allocates in 1MB/2MB chunks. Perhaps > Xorg as well. ) > > > > The thing is, what you are advocating is what traditional UNIX did. > > > Prioritizing the special cases rather than the generic workloads. > > > > > > And I'm telling you, it's wrong. Traditional Unix is dead, and it's dead > > > exactly _because_ it prioritized those kinds of loads. > > > > This is not a specialized workload. Plenty of sites are running java, > > plenty of sites are running Oracle (though that won't benefit from anonymous > > hugepages), and plenty of sites are running virtualization. Not everyone > > does two kernel builds before breakfast. > > Java/virtualization/DBs, and, to a certain sense Firefox have basically become > meta-kernels: they offer their own intermediate APIs to their own style of > apps - and those apps generally have no direct access to the native Linux > kernel. > > And just like the native kernel has been enjoying the benefits of 2MB pages > for more than a decade, do these other entities want to enjoy similar benefits > as well. Fair is fair. > > Like it or not, combined end-user attention/work spent in these meta-kernels > is rising steadily, while apps written in raw C are becoming the exception. > > So IMHO we really have roughly three logical choices: I don't see how these are the logical choices. I don't really see how they are even logical in some ways. Let's say that Andrea's patches offer 5% improvement in best-cases (that are not stupid microbenchmarks) and 0% in worst cases, and X% "on average" (whatever that means). Then it is simply a set of things to weigh against the added complexity (both in terms of code and performance characteristics of the system) that it is introduced. I don't really see how it is fundamentally different to any other patch that speeds things up. > 1) either we accept that the situation is the fault of our technology and > subsequently we reform and modernize the Linux syscall ABIs to be more > friendly to apps (offer built-in GC and perhaps JIT concepts, perhaps > offer a compiler, offer a wider range of libraries with better > integration, etc.) I don't see how this would bring transparent hugepages to userspace. We may offload some services to the kernel, but the *memory mappings* that get used by userspace obviously still go through TLBs. > 2) or we accept the fact that the application space is shifting to the > meta-kernels - and then we should agressively optimize Linux for those > meta-kernels and not pretend that they are 'specialized'. They literally > represent tens of thousands of applications apiece. And if meta-kernels (or whatever you want to call a common or important workload) see some speedup that is deemed to be worth the cost of the patch, then it will probably get merged. Same as anything else. > 3) or we should continue to muddle through somewhere in the middle, hoping > that the 'pure C apps' win in the end (despite 10 years of a decline) and > pretend that the meta-kernels are just 'specialized' workloads. 'pure C apps' (I don't know what you mean by this, but just non-GC memory?) can still see benefits from using hugepages. And I wouldn't say we're muddling through. Linux has been one of the if not the most successful OS kernel of the last 10 years not because of muddling. IMO in large part it is because we haven't been forced to tick boxes for marketing idiots or be pressured by special interests to the detriment of the common cases. > Right now we are doing 3) and i think it's delusive and a mistake. I think we > should be doing 1) - but failing that we have to be honest and do 2). Nothing wrong with carefully evaluating a performance improvement, but there is nothing urgent or huge fundamental reason we need to lose our heads and be irrational about it. If the world was coming to an end without hugepages, then we'd see more than 5% improvement I would have thought. Fact is that computing is based on locality of reference, and performance has continued to scale long past the big bad "memory wall" because real working set sizes (on the scale of CPU instructions, not on the scale of page reclaim) have not grown linearly with RAM sizes. Probably logarithmically or something. Sure there are some pointer chasing apps that will always (and ~have always) suck. We are also irriversibly getting into explicit parallelism (like multi core and multi threading) to work around all sorts of fundamental limits to single thread performance, not just TLB filling. So let's not be melodramatic about this :) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org