From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:50137) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Wwbz5-0007Nr-8W for qemu-devel@nongnu.org; Mon, 16 Jun 2014 14:50:31 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Wwbyq-0003mY-8E for qemu-devel@nongnu.org; Mon, 16 Jun 2014 14:50:23 -0400 Received: from e39.co.us.ibm.com ([32.97.110.160]:34842) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Wwbyq-0003mH-1A for qemu-devel@nongnu.org; Mon, 16 Jun 2014 14:50:08 -0400 Received: from /spool/local by e39.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 16 Jun 2014 12:50:06 -0600 Date: Mon, 16 Jun 2014 11:49:46 -0700 From: Nishanth Aravamudan Message-ID: <20140616184945.GI16644@linux.vnet.ibm.com> References: <1402905233-26510-1-git-send-email-aik@ozlabs.ru> <1402905233-26510-8-git-send-email-aik@ozlabs.ru> <20140616161500.GD3222@otherpad.lan.raisama.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140616161500.GD3222@otherpad.lan.raisama.net> Subject: Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eduardo Habkost Cc: Alexey Kardashevskiy , qemu-ppc@nongnu.org, qemu-devel@nongnu.org, Alexander Graf On 16.06.2014 [13:15:00 -0300], Eduardo Habkost wrote: > On Mon, Jun 16, 2014 at 05:53:53PM +1000, Alexey Kardashevskiy wrote: > > Currently NUMA nodes must go consequently and QEMU ignores nodes > > with @nodeid bigger than the number of NUMA nodes in the command line. > > Why would somebody need a NUMA node with nodeid bigger than the number > of NUMA nodes? NUMA node IDs must be in the [0, numa_nodes-1] range. That is not how the code works currently. vl.c::numa_add() ... if (get_param_value(option, 128, "nodeid", optarg) == 0) { nodenr = nb_numa_nodes; } else { if (parse_uint_full(option, &nodenr, 10) < 0) { fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option); exit(1); } } ... if (get_param_value(option, 128, "mem", optarg) == 0) { node_mem[nodenr] = 0; } else { int64_t sval; sval = strtosz(option, &endptr); if (sval < 0 || *endptr) { fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg); exit(1); } node_mem[nodenr] = sval; } ... nb_numa_nodes++; ... So if a user passes nodeid= to the NUMA node definition, that entry in node_mem is set to the appropriate value, but nb_numa_nodes, which is used to bound the iteration of that array is not bumped appropriately. So we end up looking at arbitrary indices in the node_mem array, which are often 0. Note also that means that we can't generically differentiate here between a user-defined memoryless node and one that happens to be 0 because the particular nodeid was not specified on the command-line. Alexey, how do you differentiate between these two cases after your patches? In patch 3, I see you check (and skip in the loop) explicitly if !node_mem[nodeid], but I'm not sure how that check can differentiate between the statically 0 (from main's intialization loop) and when a user says a node's memory is 0. Probably something obvious I'm missing (it is Monday after all)... > > This prevents us from creating memory-less nodes which is possible > > situation in POWERPC under pHyp or Sapphire. > > Why? If I recall correctly, nodes without any CPUs or any memory are > already allowed. They are allowed, but it seems like the code throughout qemu (where it's relevant) assumes that NUMA nodes are sequential and continuous, but that's not required (nor is it enforced on the command-line). > How exactly would this patch help you? How do you expect the > command-line to look like for your use case? Alexey has replied with that, it looks like. > > This makes nb_numa_nodes a total number of nodes or the biggest node > > number + 1 whichever is greater. > > > > Signed-off-by: Alexey Kardashevskiy > > --- > > vl.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/vl.c b/vl.c > > index ac0e3d7..f1b75cb 100644 > > --- a/vl.c > > +++ b/vl.c > > @@ -1356,7 +1356,7 @@ static void numa_add(const char *optarg) > > if (get_param_value(option, 128, "cpus", optarg) != 0) { > > numa_node_parse_cpus(nodenr, option); > > } > > - nb_numa_nodes++; > > + nb_numa_nodes = MAX(nb_numa_nodes + 1, nodenr + 1); > > I would instead suggest that if any node in the [0, max_node_id] range > is not present on the command-line, QEMU should instead reject the > command-line. We already check two things: Too many nodes (meaning we've filled the array): if (nb_numa_nodes >= MAX_NODES) { fprintf(stderr, "qemu: too many NUMA nodes\n"); exit(1); } Node ID itself is out of range (due to the use of an array): if (nodenr >= MAX_NODES) { fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr); exit(1); } But that doesn't prevent one from having *sparse* NUMA node IDs. And, as far as I can tell, this is allowed by the spec, but isn't properly supported by qemu. Thanks, Nish