Date: Mon, 14 Apr 2003 17:29:47 +0200
From: Antonio Vargas
To: "Martin J. Bligh"
Cc: Timothy Miller, linux-kernel@vger.kernel.org, nicoya@apia.dhs.org
Subject: Re: Quick question about hyper-threading (also some NUMA stuff)
Message-ID: <20030414152947.GB14552@wind.cocodriloo.com>
References: <001301c3028a$25374f30$6801a8c0@epimetheus> <10760000.1050332136@[10.10.2.4]>
In-Reply-To: <10760000.1050332136@[10.10.2.4]>

On Mon, Apr 14, 2003 at 07:55:37AM -0700, Martin J. Bligh wrote:
> > This sounds like the most sensible approach. I like considering the
> > extremes of performance, but sometimes the time for the math required by
> > some optimization can be worse than any benefit you get out of it. Your
> > suggestion is simple. It increases the likelihood (10% better for little
> > extra effort is better than 10% worse) of related processes being run on
> > the same node, while not impacting the system's ability to balance load.
> > This, as you say, is also very important for NUMA.
>
> See my earlier email - rebalance_node() does this, and it's very cheap, as
> we just SMP balance *within* the node - the cross-node rebalancer is a
> separate tunable background process.
>
> > Does the NUMA support migrate pages to the node which is running a
> > process? Or would processes jump nodes often enough to make that not
> > worth the effort?
>
> No, we don't do page migration as yet. Andi is playing with a homenode
> concept that makes pages allocate from a predefined "home node" always,
> instead of their current node. Last time I benchmarked that concept it
> sucked, but the advent of the per-cpu, per-zone hot/cold page cache, and
> the fact that he's using hardware with totally different NUMA
> characteristics may well change that conclusion.
>
> We don't normally migrate stuff around much on the higher-ratio NUMA
> machines. With AMD Hammer or whatever, that may change.
>
> > In order for page migration to be worth it, node affinity would have to
> > be fairly strong. It's particularly important when a process maps pages
> > which belong to another node. Is there any logic there to duplicate
> > pages in cases where there is enough free memory for it? We'd have to
> > tag the pages as duplicates so the VM could reclaim them.
>
> Right - we're looking at read-only text replication, first for the kernel
> (which ia64 has already), then for shared libs and program text. It's a
> good concept, provided you have plenty of RAM (which big NUMA boxes tend
> to). Probably needs hooking into the address space structure, and to be
> thrown away just like anything else that's unused under memory pressure
> from the per-node LRU lists. Though it'd be nice to mark them as
> particularly cheap to retrieve, to have a reference count (a node
> bitmap?), and to retrieve them from another node, not from disk.
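
Just to picture that last bit, a replicated text page could carry a node
bitmap along these lines. This is only a rough sketch: struct text_replica,
replica_get() and copy_page_to_node() are made-up names, not anything in an
existing tree.

/*
 * Sketch only: bookkeeping for a replicated read-only text page.
 * The bitmap records which nodes hold a copy, so one replica can be
 * dropped under memory pressure and later refilled from a sibling
 * node instead of from disk.
 */
struct text_replica {
        struct page *copies[MAX_NUMNODES];      /* per-node replicas */
        DECLARE_BITMAP(present, MAX_NUMNODES);  /* nodes holding a copy */
        atomic_t refcount;                      /* users across all nodes */
};

/* On a fault from @node, prefer copying from a sibling node over disk I/O. */
static struct page *replica_get(struct text_replica *r, int node)
{
        int src;

        if (!test_bit(node, r->present)) {
                src = find_first_bit(r->present, MAX_NUMNODES);
                if (src >= MAX_NUMNODES)
                        return NULL;    /* no copy left anywhere: hit disk */
                /* copy_page_to_node() is made up for this sketch */
                r->copies[node] = copy_page_to_node(r->copies[src], node);
                set_bit(node, r->present);
        }
        return r->copies[node];
}
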
Perhaps it would be good to un-COW pages:

1. fork the process
2. if the current node is not loaded, continue as usual
3. if the current node is loaded:
   3a. pick an unloaded node
   3b. don't do COW for the data pages, but simply copy them to node-local
       memory

This way, read-write sharings would be replicated for each node.

Also, keeping a per-node active-page list and then forcefully copying a page
to a node-local page frame when accessing a page which is active on another
node could be good. Hmm, the un-COW system could be implemented in terms of
that second one, couldn't it?

Greets, Antonio.
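
P.S. To make the fork path above concrete, here is a rough, completely
untested sketch. Every helper in it (fork_with_uncow(), task_node(),
node_is_loaded(), pick_unloaded_node(), set_task_node(), dup_mmap_cow(),
copy_vma_to_node()) is invented for illustration and does not exist in the
tree:

/*
 * Sketch only: eager copy instead of COW when forking away from a
 * loaded node, so the child ends up with node-local data pages.
 */
static int fork_with_uncow(struct task_struct *parent,
                           struct task_struct *child)
{
        struct vm_area_struct *vma;
        int home = task_node(parent);
        int target;

        if (!node_is_loaded(home))
                /* node has spare capacity: normal COW fork, child stays */
                return dup_mmap_cow(parent, child);

        /* node is loaded: move the child and copy its data pages eagerly */
        target = pick_unloaded_node();
        set_task_node(child, target);

        for (vma = child->mm->mmap; vma; vma = vma->vm_next) {
                if (vma->vm_flags & VM_WRITE)
                        /* copy now, into target-node memory, instead of COW */
                        copy_vma_to_node(vma, target);
                /* read-only mappings keep sharing the parent's pages */
        }
        return 0;
}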