From: Andrea Arcangeli <aarcange@redhat.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <pzijlstr@redhat.com>, Ingo Molnar <mingo@elte.hu>,
	Hugh Dickins <hughd@google.com>, Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Hillf Danton <dhillf@gmail.com>,
	Andrew Jones <drjones@redhat.com>, Dan Smith <danms@us.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Paul Turner <pjt@google.com>, Christoph Lameter <cl@linux.com>,
	Suresh Siddha <suresh.b.siddha@intel.com>,
	Mike Galbraith <efault@gmx.de>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCH 04/33] autonuma: define _PAGE_NUMA
Date: Thu, 11 Oct 2012 18:43:00 +0200
Message-ID: <20121011164300.GN1818@redhat.com>
In-Reply-To: <20121011110137.GQ3317@csn.ul.ie>

On Thu, Oct 11, 2012 at 12:01:37PM +0100, Mel Gorman wrote:
> On Thu, Oct 04, 2012 at 01:50:46AM +0200, Andrea Arcangeli wrote:
> > The objective of _PAGE_NUMA is to be able to trigger NUMA hinting page
> > faults to identify the per NUMA node working set of the thread at
> > runtime.
> > 
> > Arming the NUMA hinting page fault mechanism works similarly to
> > setting up a mprotect(PROT_NONE) virtual range: the present bit is
> > cleared at the same time that _PAGE_NUMA is set, so when the fault
> > triggers we can identify it as a NUMA hinting page fault.
> > 
> 
> That implies that there is an atomic update requirement or at least
> an ordering requirement -- present bit must be cleared before setting
> NUMA bit. No doubt it'll be clear later in the series how this is
> accomplished. What you propose seems ok but it all depends how it's
> implemented so I'm leaving my ack off this particular patch for now.

Correct. The switch is done atomically (_PAGE_PRESENT is cleared at
the same time _PAGE_NUMA is set). The TLB flush is deferred: it's
batched to avoid firing an IPI for every pte_numa/pmd_numa we establish.
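
A minimal userspace sketch of the "clear present and set NUMA in one
step" idea described above. Everything in it is an assumption made for
illustration: the sk_ names, the bit positions, and the use of a C11
compare-and-exchange loop. It is not how the series implements the
switch; it only shows that no observer can see a state with both bits
set or both bits clear.

```c
/* Illustrative sketch only: names and bit positions are assumptions,
 * not taken from the AutoNUMA patches. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PAGE_PRESENT ((uint64_t)1 << 0) /* assumed bit position */
#define SK_PAGE_NUMA    ((uint64_t)1 << 8) /* assumed bit position */

typedef _Atomic uint64_t sk_pte_t;

/* A value is a NUMA hinting entry when NUMA is set and PRESENT is clear. */
static bool sk_pte_numa(uint64_t val)
{
    return (val & (SK_PAGE_NUMA | SK_PAGE_PRESENT)) == SK_PAGE_NUMA;
}

/* Arm the hinting fault: clear PRESENT and set NUMA in a single atomic
 * update. Returns false if the entry was not present to begin with. */
static bool sk_arm_numa_hint(sk_pte_t *pte)
{
    uint64_t old = atomic_load(pte);
    uint64_t next;

    do {
        if (!(old & SK_PAGE_PRESENT))
            return false; /* not mapped, or already armed */
        next = (old & ~SK_PAGE_PRESENT) | SK_PAGE_NUMA;
    } while (!atomic_compare_exchange_weak(pte, &old, next));

    assert(sk_pte_numa(next));
    return true;
}

/* Fault side: set PRESENT and clear NUMA, again in one atomic update.
 * Returns false if the entry was not a NUMA hinting entry. */
static bool sk_handle_numa_hint(sk_pte_t *pte)
{
    uint64_t old = atomic_load(pte);
    uint64_t next;

    do {
        if (!sk_pte_numa(old))
            return false;
        next = (old & ~SK_PAGE_NUMA) | SK_PAGE_PRESENT;
    } while (!atomic_compare_exchange_weak(pte, &old, next));

    return true;
}
```

The sketch leaves the deferred TLB flush out entirely; it models only
the bit flip on the entry itself.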

It's still similar to setting a range PROT_NONE, except that
_PAGE_PROTNONE and _PAGE_NUMA work in opposite ways and are mutually
exclusive, so they can easily share the same pte/pmd bitflag. The
other difference is that PROT_NONE must be set synchronously, while
_PAGE_NUMA is set lazily.

The NUMA hinting page fault also won't require any TLB flush ever.

So the whole process (establish/teardown) has an incredibly low TLB
flushing cost.

The only fixed cost is in knuma_scand, plus the kernel entry/exit for
every non-shared page every 10 sec (or whatever duration you set for
a knuma_scand pass in sysfs).

Furthermore, if the pmd_scan mode is activated, I guarantee there's at
most 1 NUMA hinting page fault per 2M virtual region (even if some
accuracy is lost). You can try to set scan_pmd = 0 in sysfs and also
disable THP (echo never >enabled) to measure the exact cost per 4k
page. It's hardly measurable here. With THP there is also 1 fault per
2M virtual region, but no accuracy is lost in that case (or more
precisely, there's no way to get more accuracy than that, as we deal
with a pmd).
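
To put numbers on the 4k versus 2M comparison, here is a small
worked-arithmetic sketch. The helper name and the constants are
assumptions for illustration only, and the result is just an upper
bound for one scan pass in which every granule of the region is
touched and eligible for a hinting fault.

```c
/* Illustrative arithmetic only: names are assumptions, not kernel code. */
#include <assert.h>
#include <stdint.h>

#define SK_PTE_SIZE (4ULL << 10) /* 4k base page */
#define SK_PMD_SIZE (2ULL << 20) /* 2M region covered by one pmd */

/* Upper bound on hinting faults for one scan pass over `bytes` of
 * virtual memory when at most one fault is taken per `granule`. */
static uint64_t sk_max_hint_faults(uint64_t bytes, uint64_t granule)
{
    assert(granule != 0);
    return (bytes + granule - 1) / granule; /* round up partial granules */
}
```

For a 1 GiB region, sk_max_hint_faults(1ULL << 30, SK_PTE_SIZE) gives
262144 while sk_max_hint_faults(1ULL << 30, SK_PMD_SIZE) gives 512, so
per-pmd faulting bounds the number of faults at 1/512 of the per-4k case.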
