From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53684) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1X1gHe-0004QL-0u for qemu-devel@nongnu.org; Mon, 30 Jun 2014 14:26:39 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1X1gHU-0003bI-FE for qemu-devel@nongnu.org; Mon, 30 Jun 2014 14:26:29 -0400 Received: from e39.co.us.ibm.com ([32.97.110.160]:45939) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1X1gHU-0003bA-7l for qemu-devel@nongnu.org; Mon, 30 Jun 2014 14:26:20 -0400 Received: from /spool/local by e39.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 30 Jun 2014 12:26:18 -0600 Received: from b03cxnp08027.gho.boulder.ibm.com (b03cxnp08027.gho.boulder.ibm.com [9.17.130.19]) by d03dlp03.boulder.ibm.com (Postfix) with ESMTP id EFC9B19D8040 for ; Mon, 30 Jun 2014 12:26:06 -0600 (MDT) Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167]) by b03cxnp08027.gho.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id s5UIP9K77930114 for ; Mon, 30 Jun 2014 20:25:09 +0200 Received: from d03av01.boulder.ibm.com (localhost [127.0.0.1]) by d03av01.boulder.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id s5UIQFJ0005786 for ; Mon, 30 Jun 2014 12:26:15 -0600 Date: Mon, 30 Jun 2014 11:26:10 -0700 From: Nishanth Aravamudan Message-ID: <20140630182610.GA15697@linux.vnet.ibm.com> References: <20140623193310.GD4323@linux.vnet.ibm.com> <20140624004825.GE4323@linux.vnet.ibm.com> <20140624174038.GK4323@linux.vnet.ibm.com> <20140626193705.GJ3222@otherpad.lan.raisama.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140626193705.GJ3222@otherpad.lan.raisama.net> Subject: Re: [Qemu-devel] [RFC PATCH v3] numa: enable sparse node numbering List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eduardo Habkost Cc: "Michael S. Tsirkin" , Alexey Kardashevskiy , Hu Tao , qemu-devel@nongnu.org, Anton Blanchard , David Rientjes , Igor Mammedov On 26.06.2014 [16:37:05 -0300], Eduardo Habkost wrote: > On Tue, Jun 24, 2014 at 10:40:38AM -0700, Nishanth Aravamudan wrote: > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h > > index 277230d..b90bf66 100644 > > --- a/include/sysemu/sysemu.h > > +++ b/include/sysemu/sysemu.h > > @@ -145,11 +145,13 @@ extern int mem_prealloc; > > */ > > #define MAX_CPUMASK_BITS 255 > > > > -extern int nb_numa_nodes; > > +extern int nb_numa_nodes; /* Number of NUMA nodes */ > > I would rename it to num_numa_nodes, so we can easily ensure all code > using nb_numa_nodes will be converted appropriately. > > > +extern int max_numa_node; /* Highest specified NUMA node ID */ > > I would rename it max_numa_nodeid, to make it clear it is the maximum > ID, not the maximum number of nodes. Thanks, I'm rebasing onto your series now. > > int numa_init_func(QemuOpts *opts, void *opaque) > > @@ -155,7 +162,7 @@ void set_numa_nodes(void) > > { > > if (nb_numa_nodes > 0) { > > uint64_t numa_total; > > - int i; > > + int i, j = -1; > > Can you please initialize j closer to the loop where it is used? Yep. > > /* On Linux, the each node's border has to be 8MB aligned, > > * the final node gets the rest. > > */ > > - for (i = 0; i < nb_numa_nodes - 1; i++) { > > - numa_info[i].node_mem = (ram_size / nb_numa_nodes) & > > - ~((1 << 23UL) - 1); > > - usedmem += numa_info[i].node_mem; > > + for (i = 0; i < max_numa_node - 1; i++) { > > + if (numa_info[i].present) { > > + numa_info[i].node_mem = (ram_size / nb_numa_nodes) & > > + ~((1 << 23UL) - 1); > > + usedmem += numa_info[i].node_mem; > > + } > > } > > This part is tricky: the following line works only because > numa_info[max_numa_node-1] is always present: > > > numa_info[i].node_mem = ram_size - usedmem; > > So, what about adding assert(numa_info[i].present) here? Yep. > > @@ -203,9 +213,12 @@ void set_numa_nodes(void) > > * must cope with this anyway, because there are BIOSes out there in > > * real machines which also use this scheme. > > */ > > - if (i == nb_numa_nodes) { > > + if (i == max_numa_node) { > > for (i = 0; i < max_cpus; i++) { > > - set_bit(i, numa_info[i % nb_numa_nodes].node_cpu); > > + do { > > + j = (j + 1) % max_numa_node; > > + } while (!numa_info[j].present); > > If you change it from "do { } while" to "while { }", you don't need to > initialize j to -1. I don't think that's quite as simple as you make it out to be. If you use a while() loop, we won't always increment j, which means once we've found a present node, we'll always use that node? j here basically represents the *last* used nodeid, which we don't want to use again when we re-enter the for-loop, we want to use the next present nodeid. It seems like the do {} while() does this fine? I could use a while() if I added another increment outside the loop, as follows: if (i == max_numa_nodeid) { for (i = 0, j = 0; i < max_cpus; i++) { while (!numa_info[j].present) { j = (j + 1) % (max_numa_nodeid); } set_bit(i, numa_info[j].node_cpu); j = (j + 1) % (max_numa_nodeid); } } If you think that is cleaner, I'll use that version. Thanks, Nish