Message-ID: <1332158367.18960.308.camel@twins>
Subject: Re: [RFC][PATCH 00/26] sched/numa
From: Peter Zijlstra
To: Avi Kivity
Cc: Linus Torvalds, Andrew Morton, Thomas Gleixner, Ingo Molnar, Paul Turner, Suresh Siddha, Mike Galbraith, "Paul E. McKenney", Lai Jiangshan, Dan Smith, Bharata B Rao, Lee Schermerhorn, Andrea Arcangeli, Rik van Riel, Johannes Weiner, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Date: Mon, 19 Mar 2012 12:59:27 +0100
In-Reply-To: <4F671B90.3010209@redhat.com>
References: <20120316144028.036474157@chello.nl> <4F670325.7080700@redhat.com> <1332155527.18960.292.camel@twins> <4F671B90.3010209@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2012-03-19 at 13:42 +0200, Avi Kivity wrote:
> > Now if you want to be able to scan per-thread, you need per-thread
> > page-tables and I really don't want to ever see that. That will blow
> > memory overhead and context switch times.
>
> I thought of only duplicating down to the PDE level, that gets rid of
> almost all of the overhead.

You still get the significant CR3 cost for thread switches.

[ /me grabs the SDM to find that PDE is what we in Linux call the pmd ]

That'll cut the memory overhead down but also severely impact the
accuracy.

Also, I still don't see how such a scheme would correctly identify
per-cpu memory in guest kernels.
While less frequent, it's still very common to do remote access to
per-cpu data. So even if you did page granularity you'd get a fair
amount of pages that are accessed by all threads (vcpus) in the scan
interval, even though they're primarily accessed by just one.

If you go to pmd level you get even less information.
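
To put the granularity argument in concrete terms, here is a minimal sketch. It assumes x86-64 with 4 KiB pages and 2 MiB per pmd entry, and it models a scan as one bitmask of accessing threads per page. The helper names (mask_weight, count_shared_pages, pmd_access_mask) are made up for illustration and are not taken from the patch series or from any kernel API.

```c
#include <stdint.h>

#define PAGE_SHIFT    12  /* assumption: 4 KiB base pages (x86-64) */
#define PMD_SHIFT     21  /* assumption: one pmd entry maps 2 MiB */
#define PAGES_PER_PMD (1u << (PMD_SHIFT - PAGE_SHIFT))

/* Hypothetical helper: number of distinct threads set in an access mask. */
static unsigned int mask_weight(uint32_t mask)
{
	unsigned int n = 0;

	/* Clear the lowest set bit until none remain. */
	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Hypothetical helper: with per-page information, count the pages that
 * more than one thread touched during the scan interval.
 */
static unsigned int count_shared_pages(const uint32_t *page_mask,
				       unsigned int npages)
{
	unsigned int i, shared = 0;

	for (i = 0; i < npages; i++)
		if (mask_weight(page_mask[i]) > 1)
			shared++;
	return shared;
}

/*
 * Hypothetical helper: what a pmd-level scan would see for the same
 * range. Every page under the pmd entry collapses into a single mask,
 * so one access by any thread marks the whole 2 MiB for that thread.
 */
static uint32_t pmd_access_mask(const uint32_t *page_mask,
				unsigned int npages)
{
	unsigned int i;
	uint32_t mask = 0;

	for (i = 0; i < npages && i < PAGES_PER_PMD; i++)
		mask |= page_mask[i];
	return mask;
}
```

Take four vcpus whose per-cpu pages sit next to each other, with vcpu 0 doing a single remote read of vcpu 2's page: the masks are { 0x1, 0x2, 0x5, 0x8 }. At page granularity one page already looks shared even though it has one primary user, and at pmd granularity the folded mask is 0xf, so the whole 2 MiB range looks like it is used by all four.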