From: Paolo Bonzini
Date: Thu, 13 Jun 2013 18:32:49 -0400
Subject: Re: [Qemu-devel] [PATCH 2/2] Add monitor command mem-nodes
To: Eduardo Habkost
Cc: andre.przywara@amd.com, aliguori@us.ibm.com, qemu-devel@nongnu.org, Wanlong Gao
Message-ID: <51BA4891.6020108@redhat.com>
In-Reply-To: <20130613125019.GI2895@otherpad.lan.raisama.net>
References: <1370404705-4620-1-git-send-email-gaowanlong@cn.fujitsu.com>
 <1370404705-4620-2-git-send-email-gaowanlong@cn.fujitsu.com>
 <20130605134505.GS2580@otherpad.lan.raisama.net>
 <51B6D025.3040606@cn.fujitsu.com>
 <20130611134017.GC2895@otherpad.lan.raisama.net>
 <51B922FE.8090109@cn.fujitsu.com>
 <20130613125019.GI2895@otherpad.lan.raisama.net>

On 13/06/2013 08:50, Eduardo Habkost wrote:
> I believe an interface based on guest physical memory addresses is more
> flexible (and even simpler!) than one that only allows binding of whole
> virtual NUMA nodes.

And "-numa node" is already one. What about just adding "mem-path=/foo"
or "host_node=NN" suboptions?
Then "-mem-path /foo" would be a shortcut for "-numa node,mem-path=/foo".

I even had patches to convert -numa to QemuOpts; I can dig them out if
you're interested.

Paolo

> (And I still don't understand why you are exposing QEMU virtual memory
> addresses in the new command, if they are useless).
>
>
>>>
>>>
>>>>> * The correspondence between guest physical address ranges and ranges
>>>>>   inside the mapped files (so external tools could set the policy on
>>>>>   those files instead of requiring QEMU to set it directly)
>>>>>
>>>>> I understand that your use case may require additional information and
>>>>> additional interfaces. But if we provide the information above we will
>>>>> allow external components set the policy on the hugetlbfs files before
>>>>> we add new interfaces required for your use case.
>>>>
>>>> But the file backed memory is not good for the host which has many
>>>> virtual machines, in this situation, we can't handle anon THP yet.
>>>
>>> I don't understand what you mean, here. What prevents someone from using
>>> file-backed memory with multiple virtual machines?
>>
>> While if we use hugetlbfs backed memory, we should know how many virtual machines,
>> how much memory each vm will use, then reserve these pages for them. And even
>> should reserve more pages for external tools(numactl) to set memory policies.
>> Even the memory reservation also has its own memory policies. It's very hard
>> to control it to what we want to set.
>
> Well, it's hard because we don't even have tools to help on that, yet.
>
> Anyway, I understand that you want to make it work with THP as well. But
> if THP works with tmpfs (does it?), people then could use exactly the
> same file-based mechanisms with tmpfs and keep THP working.
>
> (Right now I am doing some experiments to understand how the system
> behaves when using numactl on hugetlbfs and tmpfs, before and after
> getting the files mapped).
>
>
>>>
>>>>
>>>> And as I mentioned, the cross numa node access performance regression
>>>> is caused by pci-passthrough, it's a very long time bug, we should
>>>> back port the host memory pinning patch to old QEMU to resolve this performance
>>>> problem, too.
>>>
>>> If it's a regression, what's the last version of QEMU where the bug
>>> wasn't present?
>>>
>>
>> As QEMU doesn't support host memory binding, I think
>> this was present since we support guest NUMA, and the pci-passthrough made
>> it even worse.
>
> If the problem was always present, it is not a regression, is it?
>
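
To make the suboption idea above concrete, here is a rough sketch of how
"-mem-path /foo" could be treated as shorthand for
"-numa node,mem-path=/foo". It is only an illustration in Python, not QEMU
code: the helper names parse_numa_opts and expand_mem_path are made up, the
"mem-path" and "host_node" keys are just the suboptions proposed in this
mail, and the parser ignores any escaping of commas inside values.

```python
def parse_numa_opts(optarg):
    """Split a '-numa' argument such as 'node,mem-path=/foo,host_node=1'.

    Hypothetical helper: the first comma-separated field is taken as the
    type ('node'), the remaining fields as key=value suboptions.
    """
    kind, _, rest = optarg.partition(",")
    opts = {"type": kind}
    for item in filter(None, rest.split(",")):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"suboption {item!r} needs a value")
        opts[key] = value
    return opts


def expand_mem_path(argv):
    """Rewrite a global '-mem-path PATH' as '-numa node,mem-path=PATH'.

    Hypothetical helper showing the proposed shortcut; every other
    argument is passed through unchanged.
    """
    out = []
    i = 0
    while i < len(argv):
        if argv[i] == "-mem-path" and i + 1 < len(argv):
            out += ["-numa", "node,mem-path=" + argv[i + 1]]
            i += 2
        else:
            out.append(argv[i])
            i += 1
    return out


if __name__ == "__main__":
    args = expand_mem_path(["-m", "1024", "-mem-path", "/foo"])
    print(args)
    print(parse_numa_opts(args[-1]))
```

With this shape, per-node backing and host binding would both be ordinary
suboptions of the same "-numa node" entry, and the old global option would
need no separate code path after expansion.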