I am resending these because, for some unknown reason, they don't seem
to have made it into either the MARC archives or those at
www.cs.helsinki.fi.

----------  Resent Message  ----------

Subject: [PATCH] node affine NUMA scheduler 1/5
Date: Fri, 11 Oct 2002 19:54:30 +0200

Hi,

here comes the complete set of patches for the node affine NUMA
scheduler. It is made of several building blocks, and one can make
several flavors of NUMA schedulers out of the patches.

The patches are:

01-numa_sched_core-2.5.39-10.patch :
       Provides basic NUMA functionality. It implements CPU pools with
       all the mess needed to initialize them. It also has a node-aware
       find_busiest_queue() which first scans its own node for more
       loaded CPUs. If no steal candidate is found on its own node, it
       finds the most loaded node and tries to steal a task from it.
       By delaying steals from remote nodes, it tries to equalize the
       load across nodes. These delays can be extended to cope with
       multi-level node hierarchies (that patch is not included).

02-numa_sched_ilb-2.5.39-10.patch :
       Provides simple initial load balancing during exec(). It is
       node aware and selects the least loaded node. It also does a
       round-robin initial node selection to distribute the load
       better across the nodes.

03-node_affine-2.5.39-10.patch :
       The heart of the node affine part of the patch. Tasks are
       assigned a homenode during initial load balancing and are
       attracted to that homenode.

04-alloc_on_homenode.patch :
       Coupling with the memory allocator: for user tasks, allocate
       memory from the homenode, no matter on which node the task is
       scheduled.

05-dynamic_homenode-2.5.39-10.patch :
       Dynamic homenode selection. Page allocations and frees are
       tracked per task. The homenode is recalculated dynamically and
       set to the node where most of the task's memory is allocated.
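
As a rough illustration of the idea behind patch 05 (this is not code from the patch), the homenode recalculation can be pictured as a per-task, per-node page counter with an argmax over nodes. The names struct task_numa, track_page(), recalc_homenode() and MAX_NODES are hypothetical and chosen only for this sketch. It also assumes the homenode is recomputed on every tracked event, which the real patch need not do.

```c
#include <assert.h>

#define MAX_NODES 8   /* hypothetical upper bound on NUMA nodes */

/* Hypothetical per-task bookkeeping; not the structure used in the patch. */
struct task_numa {
    long pages[MAX_NODES];  /* pages currently allocated on each node */
    int  homenode;          /* node the scheduler attracts the task to */
};

/*
 * Pick the node that holds most of the task's pages.
 * On a tie with the current homenode, the current homenode is kept,
 * so the task does not bounce between equally good nodes.
 */
static int recalc_homenode(const struct task_numa *t)
{
    int best = t->homenode;

    for (int n = 0; n < MAX_NODES; n++)
        if (t->pages[n] > t->pages[best])
            best = n;
    return best;
}

/* Call with delta = +1 when a page is allocated, -1 when it is freed. */
static void track_page(struct task_numa *t, int node, long delta)
{
    assert(node >= 0 && node < MAX_NODES);
    t->pages[node] += delta;
    assert(t->pages[node] >= 0);   /* cannot free more than was allocated */
    t->homenode = recalc_homenode(t);
}
```

A task that starts on node 0 and then allocates most of its memory on node 2 would end up with homenode 2 in this sketch. Once those pages are freed again, the homenode falls back to whichever node still holds the most pages.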

Meaningful combinations of patches are:

A : NUMA scheduler : 01 + 02
       Node-aware NUMA scheduler with initial load balancing.
B : node affine scheduler : 01 + 02 + 03 (+ 04)
C : node affine scheduler with dynamic homenode selection :
       01 + 02 + 03 + 05 (exclude 04!)

C should give the best results, as it incorporates most of the
features.

The patches should run on ia32 NUMAQ and ia64 Azusa (with the topology
patches applied). Other architectures just need a build_node() call
similar to the one in arch/i386/kernel/smpboot.c. The issues with NUMAQ
(uninitialized platform-specific stuff) should be solved.

Comments, flames, etc... welcome ;-)

Best regards,
Erich