From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx173.postini.com [74.125.245.173]) by kanga.kvack.org (Postfix) with SMTP id 6AD696B0069 for ; Mon, 2 Jul 2012 04:17:51 -0400 (EDT) Message-ID: <4FF1590D.6020805@redhat.com> Date: Mon, 02 Jul 2012 04:17:17 -0400 From: Rik van Riel MIME-Version: 1.0 Subject: Re: [PATCH 13/40] autonuma: CPU follow memory algorithm References: <1340888180-15355-1-git-send-email-aarcange@redhat.com> <1340888180-15355-14-git-send-email-aarcange@redhat.com> <1340895238.28750.49.camel@twins> <20120629125517.GD32637@gmail.com> <4FEDDD0C.60609@redhat.com> <1340995986.28750.114.camel@twins> <20120630012338.GY6676@redhat.com> <20120630124816.GZ6676@redhat.com> <4FEF1703.1070506@gmail.com> <4FF14F62.2040702@redhat.com> <4FF15417.8020609@gmail.com> In-Reply-To: <4FF15417.8020609@gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: nai.xia@gmail.com Cc: Andrea Arcangeli , Peter Zijlstra , dlaor@redhat.com, Ingo Molnar , Hillf Danton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Dan Smith , Linus Torvalds , Andrew Morton , Thomas Gleixner , Ingo Molnar , Paul Turner , Suresh Siddha , Mike Galbraith , "Paul E. McKenney" , Lai Jiangshan , Bharata B Rao , Lee Schermerhorn , Johannes Weiner , Srivatsa Vaddagiri , Christoph Lameter , Alex Shi , Mauricio Faria de Oliveira , Konrad Rzeszutek Wilk , Don Morris , Benjamin Herrenschmidt On 07/02/2012 03:56 AM, Nai Xia wrote: > > > On 2012a1'07ae??02ae?JPY 15:36, Rik van Riel wrote: >> On 06/30/2012 11:10 AM, Nai Xia wrote: >> >>> Yes, pte_numa or pte_young works the same way and they both can >>> answer the problem of "which pages were accessed since last scan". >>> For LRU, it's OK, it's quite enough. But for numa balancing it's NOT. >> >> Getting LRU right may be much more important than getting >> NUMA balancing right. >> >> Retrieving wrongly evicted data from disk can be a million >> of times slower than fetching data from RAM, while the >> penalty for accessing a remote NUMA node is only 20% or so. >> >>> We also should care about the hotness of the page sets, since if the >>> workloads are complex we should NOT be expecting that "if this page >>> is accessed once, then it's always in my CPU cache during the whole >>> last scan interval". >>> >>> The difference between LRU and the problem you are trying to deal >>> with looks so obvious to me, I am so worried that you are still >>> messing them up :( >> >> For autonuma, it may be fine to have a lower likelyhood of >> obtaining an optimum result, because the penalty for getting >> it wrong is so much lower. > > I said, I am actually want to see some detailed analysis > showing that this sampling is really playing an important role > in benchmarks as it claims to be. Not a quick > "lower likelyhood than optimum" conclusion..... > > Please, Rik, I know your points, you don't have to explain > anymore. But I just cannot follow without research data. What kind of data are you looking for? I have seen a lot of generic comments in your emails, and one gut feeling about Andrea's sampling algorithm, but I seem to have missed the details of exactly what you are looking for. Btw, I share your feeling that Andrea's sampling algorithm will probably not be able to distinguish between NUMA nodes that are very frequent users of a page, and NUMA nodes that use the same page much less frequently. However, I suspect that the penalty of getting it wrong will be fairly low, while the overhead of getting access frequency information will be prohibitively high. There is a reason nobody uses LRU nowadays, but a clock style algorithm instead. -- All rights reversed -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org