From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:35515) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WzH6g-0004Ws-9Q for qemu-devel@nongnu.org; Mon, 23 Jun 2014 23:09:22 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WzH6X-0007MW-26 for qemu-devel@nongnu.org; Mon, 23 Jun 2014 23:09:14 -0400 Received: from e8.ny.us.ibm.com ([32.97.182.138]:56638) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WzH6W-0007MM-Tt for qemu-devel@nongnu.org; Mon, 23 Jun 2014 23:09:04 -0400 Received: from /spool/local by e8.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 23 Jun 2014 23:09:04 -0400 Date: Mon, 23 Jun 2014 20:08:56 -0700 From: Nishanth Aravamudan Message-ID: <20140624030856.GA29537@linux.vnet.ibm.com> References: <1402905233-26510-1-git-send-email-aik@ozlabs.ru> <1402905233-26510-4-git-send-email-aik@ozlabs.ru> <20140620225528.GA2008@linux.vnet.ibm.com> <53A4F6CD.3040600@ozlabs.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <53A4F6CD.3040600@ozlabs.ru> Subject: Re: [Qemu-devel] [PATCH 3/7] spapr: Refactor spapr_populate_memory() List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexey Kardashevskiy Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, Alexander Graf On 21.06.2014 [13:06:53 +1000], Alexey Kardashevskiy wrote: > On 06/21/2014 08:55 AM, Nishanth Aravamudan wrote: > > On 16.06.2014 [17:53:49 +1000], Alexey Kardashevskiy wrote: > >> Current QEMU does not support memoryless NUMA nodes. > >> This prepares SPAPR for that. > >> > >> This moves 2 calls of spapr_populate_memory_node() into > >> the existing loop which handles nodes other than than > >> the first one. > >> > >> Signed-off-by: Alexey Kardashevskiy > >> --- > >> hw/ppc/spapr.c | 31 +++++++++++-------------------- > >> 1 file changed, 11 insertions(+), 20 deletions(-) > >> > >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > >> index cb3a10a..666b676 100644 > >> --- a/hw/ppc/spapr.c > >> +++ b/hw/ppc/spapr.c > >> @@ -689,28 +689,13 @@ static void spapr_populate_memory_node(void *fdt, int nodeid, hwaddr start, > >> > >> static int spapr_populate_memory(sPAPREnvironment *spapr, void *fdt) > >> { > >> - hwaddr node0_size, mem_start, node_size; > >> + hwaddr mem_start, node_size; > >> int i; > >> > >> - /* memory node(s) */ > >> - if (nb_numa_nodes > 1 && node_mem[0] < ram_size) { > >> - node0_size = node_mem[0]; > >> - } else { > >> - node0_size = ram_size; > >> - } > >> - > >> - /* RMA */ > >> - spapr_populate_memory_node(fdt, 0, 0, spapr->rma_size); > >> - > >> - /* RAM: Node 0 */ > >> - if (node0_size > spapr->rma_size) { > >> - spapr_populate_memory_node(fdt, 0, spapr->rma_size, > >> - node0_size - spapr->rma_size); > >> - } > >> - > >> - /* RAM: Node 1 and beyond */ > >> - mem_start = node0_size; > >> - for (i = 1; i < nb_numa_nodes; i++) { > >> + for (i = 0, mem_start = 0; i < nb_numa_nodes; ++i) { > >> + if (!node_mem[i]) { > >> + continue; > >> + } > > > > Doesn't this skip memoryless nodes? What actually puts the memoryless > > node in the device-tree? > > It does skip. > > > And if you were to put them in, wouldn't spapr_populate_memory_node() > > fail because we'd be creating two nodes with memory@XXX where XXX is the > > same (starting address) for both? > > I cannot do this now - there is no way to tell from the command line where > I want NUMA node memory start from so I'll end up with multiple nodes with > the same name and QEMU won't start. When NUMA fixes reach upstream, I'll > try to work out something on top of that. Ah I got something here. With the patches I just sent to enable sparse NUMA nodes, plus your series rebased on top, here's what I see in a Linux LPAR: qemu-system-ppc64 -machine pseries,accel=kvm,usb=off -m 4096 -realtime mlock=off -numa node,nodeid=3,mem=4096,cpus=2-3 -numa node,nodeid=2,mem=0,cpus=0-1 -smp 4 info numa 2 nodes node 2 cpus: 0 1 node 2 size: 0 MB node 3 cpus: 2 3 node 3 size: 4096 MB numactl --hardware available: 3 nodes (0,2-3) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 2 cpus: 0 1 node 2 size: 0 MB node 2 free: 0 MB node 3 cpus: 2 3 node 3 size: 4073 MB node 3 free: 3830 MB node distances: node 0 2 3 0: 10 40 40 2: 40 10 40 3: 40 40 10 The trick, it seems, is if you have a memoryless node, it needs to have CPUs assigned to it. The CPU's "ibm,associativity" property will make Linux set up the proper NUMA topology. Thoughts? Should there be a check that every "present" NUMA node at least either has CPUs or memory. It seems like if neither are present, we can just hotplug them later? Does qemu support topology for PCI devices? Thanks, Nish