From: Stefano Stabellini
Subject: [Hackathon minutes] PV frontends/backends and NUMA machines
Date: Mon, 20 May 2013 14:44:20 +0100
To: xen-devel@lists.xensource.com

Hi all,
these are my notes from the discussion that we had at the Hackathon regarding PV frontends and backends running on NUMA machines.

---

The problem: how can we make sure that frontends and backends run in the same NUMA node?

We would need to run one backend kthread per NUMA node: we already have one kthread per netback vif (one per guest), so we could pin each of them to a NUMA node, the same one its frontend is running on. But that means that dom0 would be running on several NUMA nodes at once. How much of a performance penalty would that be?

We would need to export NUMA information to dom0, so that dom0 can make smart decisions on memory allocations, and we would also need to allocate memory for dom0 from multiple nodes.

We need a way to automatically allocate the initial dom0 memory in Xen in a NUMA-aware way, and we need Xen to automatically create one dom0 vcpu per NUMA node.

After dom0 boots, the toolstack is going to decide where to place any new guests: it allocates the memory from the NUMA node it wants to run the guest on, and it is going to ask dom0 to allocate the kthread from that node too. (Maybe by writing the NUMA node to xenstore.)

We need to make sure that the interrupts/MSIs coming from the NIC arrive on the same pcpu that is running the vcpu that needs to receive them. We need to do irqbalancing in dom0; Xen will then automatically make the physical MSIs follow the vcpu.
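
To make the dom0 side of the interrupt placement a bit more concrete, here is a rough sketch in Python of how a dom0 userspace helper might look up the CPUs of a NUMA node and point an IRQ at them. It assumes a Linux dom0 that exposes the usual /sys/devices/system/node/nodeN/cpulist and /proc/irq/N/smp_affinity_list files, and it assumes dom0 actually sees the NUMA topology, which is one of the open items above. The function names are made up for illustration; this is not something that was agreed at the Hackathon.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8-11" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist means the node has no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def format_cpulist(cpus):
    """Format CPU ids as a comma-separated list accepted by smp_affinity_list."""
    return ",".join(str(c) for c in sorted(cpus))


def node_cpus(node, sysfs_root="/sys/devices/system/node"):
    """Return the CPUs that dom0 sees as belonging to the given NUMA node."""
    with open(os.path.join(sysfs_root, f"node{node}", "cpulist")) as f:
        return parse_cpulist(f.read())


def set_irq_affinity(irq, cpus, proc_root="/proc/irq"):
    """Ask the dom0 kernel to deliver the given IRQ to the given CPUs (needs root)."""
    with open(os.path.join(proc_root, str(irq), "smp_affinity_list"), "w") as f:
        f.write(format_cpulist(cpus))
```

A helper like this would be called as set_irq_affinity(irq, node_cpus(node)) once the toolstack has told dom0 which node the guest lives on; how that information reaches dom0 is still undecided.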
If the card is multiqueue, we need to make sure that we use the multiple queues so that we can have different sources of interrupts/MSIs for each vif. This allows us to notify each dom0 vcpu independently.
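
As an illustration of the multiqueue idea, the following Python sketch builds a plan that gives each vif its own NIC queue and routes that queue's interrupt to the dom0 vcpu sitting on the vif's NUMA node. Everything here is hypothetical: the function name, the vif names, and the assumption that there is exactly one dom0 vcpu per node and at least one queue per vif. It only shows the bookkeeping, not how a real driver would be configured.

```python
def assign_queues(vif_nodes, vcpu_for_node, num_queues):
    """Build a hypothetical vif -> (queue, dom0 vcpu) plan.

    vif_nodes:     dict mapping vif name -> NUMA node of its frontend
    vcpu_for_node: dict mapping NUMA node -> dom0 vcpu on that node
    num_queues:    number of hardware queues the NIC offers
    """
    if len(vif_nodes) > num_queues:
        raise ValueError("not enough queues for one queue per vif")
    plan = {}
    # Hand out queues in a stable order so the plan is reproducible.
    for queue, vif in enumerate(sorted(vif_nodes)):
        node = vif_nodes[vif]
        plan[vif] = (queue, vcpu_for_node[node])
    return plan


if __name__ == "__main__":
    # Two guests on different nodes, one dom0 vcpu per node, a 4-queue NIC.
    print(assign_queues({"vif1.0": 0, "vif2.0": 1}, {0: 0, 1: 1}, 4))
```

With fewer queues than vifs some vifs would have to share an interrupt source, which is the case the notes above are trying to avoid; the sketch simply refuses that case.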