From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37634) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WwdG8-00013w-RO for qemu-devel@nongnu.org; Mon, 16 Jun 2014 16:12:10 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WwdG3-0006TF-Mm for qemu-devel@nongnu.org; Mon, 16 Jun 2014 16:12:04 -0400 Date: Mon, 16 Jun 2014 17:11:40 -0300 From: Eduardo Habkost Message-ID: <20140616201140.GB8629@otherpad.lan.raisama.net> References: <1402905233-26510-1-git-send-email-aik@ozlabs.ru> <1402905233-26510-8-git-send-email-aik@ozlabs.ru> <20140616161500.GD3222@otherpad.lan.raisama.net> <20140616184945.GI16644@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140616184945.GI16644@linux.vnet.ibm.com> Subject: Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Nishanth Aravamudan Cc: Alexey Kardashevskiy , qemu-ppc@nongnu.org, qemu-devel@nongnu.org, Alexander Graf On Mon, Jun 16, 2014 at 11:49:46AM -0700, Nishanth Aravamudan wrote: > On 16.06.2014 [13:15:00 -0300], Eduardo Habkost wrote: > > On Mon, Jun 16, 2014 at 05:53:53PM +1000, Alexey Kardashevskiy wrote: > > > Currently NUMA nodes must go consequently and QEMU ignores nodes > > > with @nodeid bigger than the number of NUMA nodes in the command line. > > > > Why would somebody need a NUMA node with nodeid bigger than the number > > of NUMA nodes? NUMA node IDs must be in the [0, numa_nodes-1] range. > > That is not how the code works currently. > > vl.c::numa_add() > ... > if (get_param_value(option, 128, "nodeid", optarg) == 0) { > nodenr = nb_numa_nodes; > } else { > if (parse_uint_full(option, &nodenr, 10) < 0) { > fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option); > exit(1); > } > } > ... > if (get_param_value(option, 128, "mem", optarg) == 0) { > node_mem[nodenr] = 0; > } else { > int64_t sval; > sval = strtosz(option, &endptr); > if (sval < 0 || *endptr) { > fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg); > exit(1); > } > node_mem[nodenr] = sval; > } > ... > nb_numa_nodes++; > ... > > So if a user passes nodeid= to the NUMA node definition, that entry in > node_mem is set to the appropriate value, but nb_numa_nodes, which is > used to bound the iteration of that array is not bumped appropriately. > So we end up looking at arbitrary indices in the node_mem array, which > are often 0. Yes. What I claim is: the bug is that the code is not validating nodeids, which must be all in the [0, num_nodes-1] range. Alexey, why do you believe you need non-contiguous nodeids to declare memory-less nodes at all? Just use "-numa node" with no arguments, and you will get a node with node_mem==0. In other words, if you are trying to do: -numa node,id=1,mem=X -numa node,id=3,mem=Y (so you get nodes 0 and 2 with no memory). You just need to do this instead: -numa node,id=0 -numa node,id=1,mem=X -numa node,id=2 -numa node,id=3,mem=Y > > Note also that means that we can't generically differentiate here > between a user-defined memoryless node and one that happens to be 0 > because the particular nodeid was not specified on the command-line. That's because all nodeids must be specified in the command-line. Accepting omitted nodes is a bug which should be fixed. > Alexey, how do you differentiate between these two cases after your > patches? In patch 3, I see you check (and skip in the loop) explicitly > if !node_mem[nodeid], but I'm not sure how that check can differentiate > between the statically 0 (from main's intialization loop) and when a > user says a node's memory is 0. Probably something obvious I'm missing > (it is Monday after all)... > > > > This prevents us from creating memory-less nodes which is possible > > > situation in POWERPC under pHyp or Sapphire. > > > > Why? If I recall correctly, nodes without any CPUs or any memory are > > already allowed. > > They are allowed, but it seems like the code throughout qemu (where it's > relevant) assumes that NUMA nodes are sequential and continuous, but > that's not required (nor is it enforced on the command-line). It is not being enforced, but you will get weird bugs if you don't do it. What I suggest is that we start enforcing it instead of magically crating new nodes when a wrong (too high) ID is provided. > > > How exactly would this patch help you? How do you expect the > > command-line to look like for your use case? > > Alexey has replied with that, it looks like. Where? I don't see a reply. > > > > This makes nb_numa_nodes a total number of nodes or the biggest node > > > number + 1 whichever is greater. > > > > > > Signed-off-by: Alexey Kardashevskiy > > > --- > > > vl.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/vl.c b/vl.c > > > index ac0e3d7..f1b75cb 100644 > > > --- a/vl.c > > > +++ b/vl.c > > > @@ -1356,7 +1356,7 @@ static void numa_add(const char *optarg) > > > if (get_param_value(option, 128, "cpus", optarg) != 0) { > > > numa_node_parse_cpus(nodenr, option); > > > } > > > - nb_numa_nodes++; > > > + nb_numa_nodes = MAX(nb_numa_nodes + 1, nodenr + 1); > > > > I would instead suggest that if any node in the [0, max_node_id] range > > is not present on the command-line, QEMU should instead reject the > > command-line. > > We already check two things: > > Too many nodes (meaning we've filled the array): > if (nb_numa_nodes >= MAX_NODES) { > fprintf(stderr, "qemu: too many NUMA nodes\n"); > exit(1); > } > > Node ID itself is out of range (due to the use of an array): > if (nodenr >= MAX_NODES) { > fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr); > exit(1); > } > > But that doesn't prevent one from having *sparse* NUMA node IDs. And, as > far as I can tell, this is allowed by the spec, but isn't properly > supported by qemu. Wait, is the node ID visible to the guest at all? I believe it is a QEMU-internal thing, just to allow the NUMA nodes to be ordered in the command-line. I would even claim that the parameter is useless and shouldn't have been introduced in the first place. What I don't se is: why you need the command-line to look like: -numa node,id=1,mem=X when you can simply write it as: -numa node,id=0 -numa node,id=1,mem=X -- Eduardo