All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dieter an Mey <anmey@rz.rwth-aachen.de>
To: Stefan Lankes <lankes@lfbs.rwth-aachen.de>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/4]: affinity-on-next-touch
Date: Mon, 11 May 2009 10:48:06 +0200	[thread overview]
Message-ID: <4A07E646.7090100@rz.rwth-aachen.de> (raw)
In-Reply-To: <000c01c9d212$4c244720$e46cd560$@rwth-aachen.de>

[-- Attachment #1: Type: text/plain, Size: 5336 bytes --]

Hello,

I am supporting Stefan's activity from the parallel programmer's 
perspective and I would be happy to provide further input, if needed.

best regards
Dieter

Stefan Lankes schrieb:
> Hello,
> 
> I wrote a patch to support the adaptive data distribution strategy
> "affinity-on-next-touch" for NUMA architectures. The patch is in an early
> state and I am interested in your comments.
> 
> The basic idea of "affinity-on-next-touch" is this: Via some runtime
> mechanism, a user-level process activates "affinity-on-next-touch" for a
> certain region of its virtual memory space. Afterwards, each page in this
> region will be migrated to that node which next tries to access it.
> Noordergraaf and van der Pas [1] have proposed to extend the OpenMP standard
> to support this strategy. Since version 9, the “affinity-on-next-touch”
> mechanism is available in Solaris and can be triggered via the madvise
> system call. Löf and Homgren [2] and Terboven et al. [3] have described
> their encouraging experiences with this implementation.
> 
> Linux does not yet support "affinity-on-next-touch". Terboven et al. [3]
> have presented a user-level implementation of this strategy for Linux. To
> realize "affinity-on-next-touch" in user space, they protect a specific
> memory area from read and write accesses and install a signal handler to
> catch access violations. If a thread accesses a page in the protected memory
> area, the signal handler migrates the page to the node which handled the
> access violation. Afterwards, the signal handler clears the page protection
> and the interrupted thread is resumed. 
> 
> Unfortunately, the overhead of this solution is very high. For instance, to
> distribute 512 MByte via "affinity-on-next-touch" the user-level solution
> needs 2518ms on our dual-socket, quad-core Opteron 2376 system with the
> current kernel (2.6.30-rc4). I evaluated this overhead with the following
> OpenMP code:
> 
>    madvise(array, sizeof(int) * SIZE, MADV_ACCESS_LWP);
>    start = omp_get_wtime();
>    #pragma omp parallel for
>    for (j = 0; j < SIZE; j += pagesize/sizeof(int))
>       array[j]++;
>    end = omp_get_wtime();
>    printf("time: %lf ms\n", (end - start) * 1000.0);
> 
> The benchmark uses 8 threads and each thread is bound to one core.
> 
> I divide my patch into the following 4 parts:
> 
> [Patch 1/4]: Extend the system call madvise with a new parameter
> MADV_ACCESS_LWP (the same as used in Solaris). The specified memory area
> then uses "affinity-on-next-touch".  In this case, madvise_access_lwp
> protects the memory area from read and write access. To avoid unnecessary
> list operations, the patch changes the permissions only in the page table
> entries and does not update the list of VMAs. Beside this, the system call
> madvise set also the new “untouched bit” of the "page" record.
> 
> [Patch 2/4]: The pte fault handler detects, via a new "untouched bit" inside
> of the "page" record, that the page which the thread tried to access uses
> “affinity-on-next-touch”. Afterwards, the kernel reads the original
> permissions from vm_area_struct, restores them in the page tables and
> migrates the page to the current node. To accelerate page migration, the
> patch avoids unnecessary calls to migrate_prep().
> 
> [Patch 3/4]: If the "untouched" bit is set, mprotect isn’t permitted to
> change the permission in the page table entry. By using of
> "affinity-on-next-touch", the access permission will be set by the pte fault
> handler.
> 
> [Patch 4/4]: This part of the patch adds some counters to detect migration
> errors and publishes these counters via /proc/vmstat. Besides this, the
> Kconfig file is extend with the parameter CONFIG_AFFINITY_ON_NEXT_TOUCH.
> 
> With this patch, the kernel reduces the overhead of page distribution via
> "affinity-on-next-touch" from 2518ms to 366ms compared to the user-level
> approach. Currently, I'm evaluating the performance of the patch with some
> other benchmarks and test applications (stream benchmark, Jacobi solver, PDE
> solver,...).
> 
> I am very interested in your comments!
> 
> Stefan
> 
> [1] Noordergraaf, L., van der Pas, R.: Performance Experiences on Suns
> WildFire Prototype. In: Proceedings of the 1999 ACM/IEEE conference on
> Supercomputing,Portland, Oregon, USA (November 1999)
> 
> [2] Löf, H., Holmgren, S.: affinity-on-next-touch: Increasing the
> Performance of an Industrial PDE Solver on a cc-NUMA System. In: Proceedings
> of the 19th Annual International Conference on Supercomputing, Cambridge,
> Massachusetts, USA (June 2005) 387–392
> 
> [3]. Terboven, C., an Mey, D., Schmidl, D., Jin, H., Reichstein, T.: Data
> and Thread Affinity in OpenMP Programs. In: Proceedings of the 2008 Workshop
> on Memory Access on future Processors: A solved problem?, ACM International
> Conference on Computing Frontiers, Ischia, Italy (May 2008) 377–384
> 
> 
> 

-- 
Dipl.-Math. Dieter an Mey, HPC Team Lead
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)
Phone: + 49 241 80 24377 - Fax/UMS: + 49 241 80 624377
mailto:anmey@rz.rwth-aachen.de http://www.rz.rwth-aachen.de


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/x-pkcs7-signature, Size: 5773 bytes --]

  reply	other threads:[~2009-05-11  8:48 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-11  8:27 [RFC PATCH 0/4]: affinity-on-next-touch Stefan Lankes
2009-05-11  8:48 ` Dieter an Mey [this message]
2009-05-11 13:22 ` Andi Kleen
2009-05-11 13:32   ` Brice Goglin
2009-05-11 14:54   ` Stefan Lankes
2009-05-11 14:54     ` Stefan Lankes
2009-05-11 16:37     ` Andi Kleen
2009-05-11 17:22       ` Stefan Lankes
2009-06-11 18:45   ` Stefan Lankes
2009-06-12 10:32     ` Andi Kleen
2009-06-12 11:46       ` Stefan Lankes
2009-06-12 12:30         ` Brice Goglin
2009-06-12 13:21           ` Stefan Lankes
2009-06-12 13:48           ` Stefan Lankes
2009-06-16  2:39         ` Lee Schermerhorn
2009-06-16 13:58           ` Stefan Lankes
2009-06-16 14:59             ` Lee Schermerhorn
2009-06-17  1:22               ` KAMEZAWA Hiroyuki
2009-06-17 12:02                 ` Lee Schermerhorn
2009-06-17  7:45               ` Stefan Lankes
2009-06-18  4:37                 ` Lee Schermerhorn
2009-06-18 19:04                   ` Lee Schermerhorn
2009-06-19 15:26                     ` Lee Schermerhorn
2009-06-19 15:41                       ` Balbir Singh
2009-06-19 15:59                         ` Lee Schermerhorn
2009-06-19 21:19                       ` Stefan Lankes
2009-06-22 12:34                   ` Brice Goglin
2009-06-22 14:24                     ` Lee Schermerhorn
2009-06-22 15:28                       ` Brice Goglin
2009-06-22 16:55                         ` Lee Schermerhorn
2009-06-22 17:06                           ` Brice Goglin
2009-06-22 17:59                             ` Stefan Lankes
2009-06-22 19:10                               ` Brice Goglin
2009-06-22 20:16                                 ` Stefan Lankes
2009-06-22 20:34                                   ` Brice Goglin
2009-06-22 14:32                     ` Stefan Lankes
2009-06-22 14:56                       ` Lee Schermerhorn
2009-06-22 15:42                         ` Stefan Lankes
2009-06-22 16:38                           ` Lee Schermerhorn
2009-06-16  2:25       ` Lee Schermerhorn
2009-06-20  7:24         ` Brice Goglin
2009-06-22 13:49           ` Lee Schermerhorn
2009-06-16  2:21     ` Lee Schermerhorn
2009-05-11 14:31 Samuel Thibault

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A07E646.7090100@rz.rwth-aachen.de \
    --to=anmey@rz.rwth-aachen.de \
    --cc=lankes@lfbs.rwth-aachen.de \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.