Date: Fri, 23 Mar 2012 17:01:29 +0100
From: Andrea Arcangeli
To: Andrew Theurer
Cc: Dan Smith, Peter Zijlstra, Linus Torvalds, Andrew Morton, Thomas Gleixner, Ingo Molnar, Paul Turner, Suresh Siddha, Mike Galbraith, "Paul E. McKenney", Lai Jiangshan, Bharata B Rao, Lee Schermerhorn, Rik van Riel, Johannes Weiner, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC] AutoNUMA alpha6
Message-ID: <20120323160129.GG6661@redhat.com>
In-Reply-To: <4F6C857A.3070307@linux.vnet.ibm.com>

On Fri, Mar 23, 2012 at 09:15:22AM -0500, Andrew Theurer wrote:
> We are working on the "more interesting benchmarks", starting with KVM
> workloads. However, I must warn you all, more interesting = a lot more
> time to run. These are a lot more complex in that they have real I/O,
> and they can be a lot more challenging because there are response time
> requirements (so fairness is an absolute requirement). We are getting a

Awesome effort!

The reason I wanted to get THP native migration done ASAP was exactly to avoid having to repeat the complex "long" benchmarks later just to get a more reliable figure of what is achievable in the long term.

For both KVM and even for the AutoNUMA internals, it's very beneficial to run with THP on, so please keep it on at all times. Very important: you should also make sure /sys/kernel/debug/kvm/largepages is increasing along with the numbers from `grep Anon /proc/meminfo` while KVM allocates anonymous memory (the official qemu binary is still not patched to align the guest physical address space, I'm afraid).

I changed plans and I'm doing the cleanups and documentation first, because that now seems to be the bigger obstacle, as Dan also pointed out. I'll submit a better documented and split-up version of AutoNUMA (autonuma-dev branch) by early next week.

> baseline right now and re-running with our user-space VM-to-numa-node
> placement program, which in the past achieved manual binding performance
> or just slightly lower. We can then compare to these two solutions. If
> there's something specific to collect (perhaps you have a lot of stats
> or data in debugfs, etc) please let me know.

If you get bad performance you can log debug info with:

    echo 1 >/sys/kernel/mm/autonuma/debug

Other than that, the only tweak I would suggest for virt usage is:

    echo 15000 >/sys/kernel/mm/autonuma/knuma_scand/scan_sleep_pass_millisecs

and if you notice that the THP numbers in `grep Anon /proc/meminfo` are too low during the benchmark, you can use:

    echo 10 >/sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs

With the current autonuma and autonuma-dev branches, I already set the latter to 100 on NUMA hardware (the upstream default was an unconditional 10000), but 10 would make khugepaged even faster at rebuilding THP. I'm not sure going as low as 10 is needed; I mention it because 10 was used during specjbb and worked great. I would try 100 first and lower it to 10 only as a last resort.
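
For repeated benchmark runs, the knobs and counters above could be wrapped in a few shell helpers. This is only a rough sketch, not part of the AutoNUMA tree: the function names and the SYSFS_ROOT / PROC_ROOT overrides are made up for illustration (they allow a dry run against a scratch directory), and the autonuma paths only exist on a kernel built from the autonuma branches, so missing knobs are skipped with a message.

```shell
#!/bin/sh
# Sketch only. SYSFS_ROOT and PROC_ROOT are hypothetical overrides for dry
# runs against a scratch directory; leave them unset on a real host.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
PROC_ROOT="${PROC_ROOT:-/proc}"

# Write value $2 to knob $1 (path relative to SYSFS_ROOT).
# Returns 1 and prints a note if the knob is missing or not writable.
set_knob() {
    knob="$SYSFS_ROOT/$1"
    if [ -w "$knob" ]; then
        echo "$2" > "$knob"
    else
        echo "skip: $knob missing or not writable" >&2
        return 1
    fi
}

# Print the AnonHugePages value (in kB) from meminfo.
anon_huge_kb() {
    awk '/^AnonHugePages:/ { print $2 }' "$PROC_ROOT/meminfo"
}

# Print the KVM large page counter from debugfs.
kvm_largepages() {
    cat "$SYSFS_ROOT/kernel/debug/kvm/largepages"
}

# Apply the virt tuning: knuma_scand pass sleep of 15000 ms, and a
# khugepaged scan sleep given as $1 (defaults to 100, the first value to try).
apply_virt_tuning() {
    set_knob kernel/mm/autonuma/knuma_scand/scan_sleep_pass_millisecs 15000
    set_knob kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs "${1:-100}"
}
```

A run would then call `apply_virt_tuning 100` as root before starting the guests, and sample `anon_huge_kb` and `kvm_largepages` every few seconds while they allocate memory. Both numbers should keep growing; if AnonHugePages stays low, `apply_virt_tuning 10` is the fallback.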

Workload changes under virt should not happen as fast as with normal host workloads, so a value of 100 should be enough. Once we get THP native migration this value can return to 10000.

Thanks,
Andrea