From: Wanlong Gao <gaowanlong@cn.fujitsu.com>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: andre.przywara@amd.com, aliguori@us.ibm.com,
	qemu-devel@nongnu.org, pbonzini@redhat.com,
	Wanlong Gao <gaowanlong@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [PATCH 2/2] Add monitor command mem-nodes
Date: Thu, 13 Jun 2013 09:40:14 +0800
Message-ID: <51B922FE.8090109@cn.fujitsu.com>
In-Reply-To: <20130611134017.GC2895@otherpad.lan.raisama.net>

On 06/11/2013 09:40 PM, Eduardo Habkost wrote:
> On Tue, Jun 11, 2013 at 03:22:13PM +0800, Wanlong Gao wrote:
>> On 06/05/2013 09:46 PM, Eduardo Habkost wrote:
>>> On Wed, Jun 05, 2013 at 11:58:25AM +0800, Wanlong Gao wrote:
>>>> Add monitor command mem-nodes to show the huge mapped
>>>> memory nodes locations.
>>>>
>>>
>>> This is for machine consumption, so we need a QMP command.
>>>
>>>> (qemu) info mem-nodes
>>>> /proc/14132/fd/13: 00002aaaaac00000-00002aaaeac00000: node0
>>>> /proc/14132/fd/13: 00002aaaeac00000-00002aab2ac00000: node1
>>>> /proc/14132/fd/14: 00002aab2ac00000-00002aab2b000000: node0
>>>> /proc/14132/fd/14: 00002aab2b000000-00002aab2b400000: node1
>>>
>>> Are node0/node1 _host_ nodes?
>>>
>>> How do I know what's the _guest_ address/node corresponding to each
>>> file/range above?
>>>
>>> What I am really looking for is:
>>>
>>>  * The correspondence between guest (virtual) NUMA nodes and guest
>>>    physical address ranges (it could be provided by the QMP version of
>>>    "info numa")
>>
>> AFAIK, the guest NUMA nodes and guest physical address ranges are set
>> by seabios, we can't get this information from QEMU,
> 
> QEMU _has_ to know about it, otherwise we would never be able to know
> which virtual addresses inside the QEMU process (or offsets inside the
> backing files) belong to which virtual NUMA node.

No. If I'm right, the layout is actually linear, except that there are holes
in the physical address space. So we can tell which node a guest virtual
address belongs to just from the size of each NUMA node. It is enough for us
to provide a QMP interface in QEMU that lets external tools like libvirt
set the host memory binding policies, and we can also provide a QEMU
command line option to set the host bindings before the QEMU process
starts.

> 
> (After all, the NUMA wiring is a hardware feature, not something that
> the BIOS can decide)

But this lives in the ACPI tables, which are written by SeaBIOS now. AFAIK,
there is no consensus yet on whether to move this part into QEMU (removing
the QEMU interfaces for SeaBIOS) or to leave it where it is.


> 
> 
>> and I think this
>> information is useless for pinning memory range to host.
> 
> Well, we have to somehow identify each region of guest memory when
> deciding how to pin it. How would you identify it without using guest
> physical addresses? Guest physical addresses are more meaningful than
> the QEMU virtual addresses your patch exposes (that are meaningless
> outside QEMU).

As I mentioned above, we can work this out just from the guest node memory
sizes, and can set the host bindings by treating these sizes as offsets.
And I think we only need to set the host memory binding policies for each
guest NUMA node. It's unnecessary to set policies for each region, as you
suggested.
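
To illustrate what I mean by treating the sizes as offsets, here is a
minimal sketch. It assumes the guest RAM is laid out linearly in node order
and it ignores the holes in the physical address space mentioned above. The
function name node_offset_ranges is hypothetical, not an existing QEMU or
libvirt interface, and the two 1 GiB node sizes are only example values.

```python
from itertools import accumulate


def node_offset_ranges(node_sizes):
    """Return one (start, end) offset pair per guest NUMA node.

    Offsets are relative to the start of guest RAM; `end` is exclusive.
    Assumption: nodes are laid out back to back, with no holes.
    """
    ends = list(accumulate(node_sizes))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


GIB = 1 << 30

# Example: a guest with two 1 GiB NUMA nodes.
for node, (start, end) in enumerate(node_offset_ranges([GIB, GIB])):
    print(f"node{node}: {start:#x}-{end:#x}")
```

An external tool could then apply one host binding policy per returned
range, which gives one policy per guest node rather than one per region.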

> 
> 
> 
>>>  * The correspondence between guest physical address ranges and ranges
>>>    inside the mapped files (so external tools could set the policy on
>>>    those files instead of requiring QEMU to set it directly)
>>>
>>> I understand that your use case may require additional information and
>>> additional interfaces. But if we provide the information above we will
>>> allow external components set the policy on the hugetlbfs files before
>>> we add new interfaces required for your use case.
>>
>> But the file backed memory is not good for the host which has many
>> virtual machines, in this situation, we can't handle anon THP yet.
> 
> I don't understand what you mean, here. What prevents someone from using
> file-backed memory with multiple virtual machines?

If we use hugetlbfs-backed memory, we have to know how many virtual machines
there will be and how much memory each VM will use, and then reserve those
pages for them. We would even have to reserve extra pages so that external
tools (numactl) can set memory policies. The memory reservation itself also
has its own memory policies, so it is very hard to get exactly the placement
we want.
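
As a rough sketch of the bookkeeping this implies, the number of huge pages
to reserve up front could be estimated as below. The helper name
hugepages_to_reserve is hypothetical, the 2 MiB huge page size is an
assumption, and the sketch does not account for the extra pages needed for
policy changes or for how the reservation is spread across host nodes.

```python
def hugepages_to_reserve(vm_memory_bytes, hugepage_size=2 * 1024 * 1024):
    """Return the total number of huge pages needed for all VMs.

    Each VM's memory is rounded up to a whole number of huge pages.
    Assumption: every VM uses the same huge page size.
    """
    # -(-a // b) is integer ceiling division.
    return sum(-(-mem // hugepage_size) for mem in vm_memory_bytes)


GIB = 1 << 30

# Example: one 4 GiB guest and one 2 GiB guest.
print(hugepages_to_reserve([4 * GIB, 2 * GIB]))
```

Every VM that is added or resized changes this total, which is part of why
the reservation is hard to manage on a host with many guests.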


> 
>>
>> And as I mentioned, the cross numa node access performance regression
>> is caused by pci-passthrough, it's a very long time bug, we should
>> back port the host memory pinning patch to old QEMU to resolve this performance
>> problem, too.
> 
> If it's a regression, what's the last version of QEMU where the bug
> wasn't present?
> 

As QEMU doesn't support host memory binding, I think this has been present
ever since we added guest NUMA support, and PCI passthrough made it even
worse.


Thanks,
Wanlong Gao

Thread overview: 19+ messages
2013-06-05  3:58 [Qemu-devel] [PATCH 1/2] Add Linux libnuma detection Wanlong Gao
2013-06-05  3:58 ` [Qemu-devel] [PATCH 2/2] Add monitor command mem-nodes Wanlong Gao
2013-06-05 12:39   ` Eric Blake
2013-06-05 12:57   ` Anthony Liguori
2013-06-05 15:54     ` Eduardo Habkost
2013-06-06  9:30       ` Wanlong Gao
2013-06-06 16:15         ` Eduardo Habkost
2013-06-14  1:04       ` Anthony Liguori
2013-06-14 13:56         ` Eduardo Habkost
2013-06-05 13:46   ` Eduardo Habkost
2013-06-11  7:22     ` Wanlong Gao
2013-06-11 13:40       ` Eduardo Habkost
2013-06-13  1:40         ` Wanlong Gao [this message]
2013-06-13 12:50           ` Eduardo Habkost
2013-06-13 22:32             ` Paolo Bonzini
2013-06-14  1:05               ` Anthony Liguori
2013-06-14  1:16                 ` Wanlong Gao
2013-06-15 17:23                   ` Paolo Bonzini
2013-06-05 10:02 ` [Qemu-devel] [PATCH 1/2] Add Linux libnuma detection Andreas Färber
