From: Wanlong Gao <gaowanlong@cn.fujitsu.com>
To: Anthony Liguori <aliguori@us.ibm.com>
Cc: qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
	Eduardo Habkost <ehabkost@redhat.com>,
	Wanlong Gao <gaowanlong@cn.fujitsu.com>,
	andre.przywara@amd.com
Subject: Re: [Qemu-devel] [PATCH 2/2] Add monitor command mem-nodes
Date: Fri, 14 Jun 2013 09:16:26 +0800
Message-ID: <51BA6EEA.20901@cn.fujitsu.com>
In-Reply-To: <87mwqttsbl.fsf@codemonkey.ws>

On 06/14/2013 09:05 AM, Anthony Liguori wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> On 13/06/2013 08:50, Eduardo Habkost wrote:
>>> I believe an interface based on guest physical memory addresses is more
>>> flexible (and even simpler!) than one that only allows binding of whole
>>> virtual NUMA nodes.
>>
>> And "-numa node" is already one, what about just adding "mem-path=/foo"
>> or "host_node=NN" suboptions?  Then "-mem-path /foo" would be a shortcut
>> for "-numa node,mem-path=/foo".
>>
>> I even had patches to convert -numa to QemuOpts, I can dig them out if
>> you're interested.
> 
> Ack.  This is a very reasonable thing to add.

How about adding options like "-numa node,membind=0", and also providing a
QMP interface "numa_set guest_node_id mempolicy"? That way we can set
the mempolicy not only for file-backed memory but also for anonymously
mapped guest NUMA nodes. This would be the full NUMA support in QEMU that
you described. I'm working on the patches now.
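
To make the proposal concrete, here is a minimal sketch of how a
"-numa node,membind=0,mem-path=/foo" argument might be split into its
suboptions. It is only an illustration under stated assumptions: the
struct numa_node_opts type and parse_numa_node_opts() are hypothetical
names, not existing QEMU code, and real code would more likely go through
QemuOpts as Paolo suggests. Only the "membind" and "mem-path" suboptions
mentioned in this thread are handled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the suboptions discussed in this thread. */
struct numa_node_opts {
    int membind;         /* host node to bind to, -1 if not given */
    char mem_path[256];  /* backing file path, empty if not given */
};

/* Return 1 if the key p[0..len) equals the NUL-terminated name. */
static int key_is(const char *p, size_t len, const char *name)
{
    return strlen(name) == len && memcmp(p, name, len) == 0;
}

/*
 * Parse a string such as "node,membind=0,mem-path=/foo".
 * Returns 0 on success, -1 on malformed or unknown input.
 */
static int parse_numa_node_opts(const char *arg, struct numa_node_opts *out)
{
    const char *p = arg;
    size_t n = strcspn(p, ",");

    out->membind = -1;
    out->mem_path[0] = '\0';

    /* Only the "node" form is handled in this sketch. */
    if (!key_is(p, n, "node")) {
        return -1;
    }
    p += n;

    while (*p == ',') {
        p++;
        n = strcspn(p, ",");

        /* Every suboption must look like key=value. */
        const char *eq = memchr(p, '=', n);
        if (eq == NULL) {
            return -1;
        }
        size_t klen = (size_t)(eq - p);
        const char *val = eq + 1;
        size_t vlen = n - klen - 1;

        if (key_is(p, klen, "membind")) {
            /* Accept a short non-negative decimal host node number. */
            if (vlen == 0 || vlen > 4) {
                return -1;
            }
            int node = 0;
            for (size_t i = 0; i < vlen; i++) {
                if (val[i] < '0' || val[i] > '9') {
                    return -1;
                }
                node = node * 10 + (val[i] - '0');
            }
            out->membind = node;
        } else if (key_is(p, klen, "mem-path")) {
            if (vlen >= sizeof(out->mem_path)) {
                return -1;
            }
            memcpy(out->mem_path, val, vlen);
            out->mem_path[vlen] = '\0';
        } else {
            return -1;  /* unknown suboption */
        }
        p += n;
    }
    return 0;
}
```

With this sketch, "node,membind=0,mem-path=/foo" yields membind 0 and
mem_path "/foo", while a plain "node" leaves both at their "not given"
defaults. Applying the policy to the guest memory is a separate step that
is not shown here.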


Thanks,
Wanlong Gao

> 
> Regards,
> 
> Anthony Liguori
> 
>>
>> Paolo
>>
>>> (And I still don't understand why you are exposing QEMU virtual memory
>>> addresses in the new command, if they are useless).
>>>
>>>
>>>>>
>>>>>
>>>>>>>  * The correspondence between guest physical address ranges and ranges
>>>>>>>    inside the mapped files (so external tools could set the policy on
>>>>>>>    those files instead of requiring QEMU to set it directly)
>>>>>>>
>>>>>>> I understand that your use case may require additional information and
>>>>>>> additional interfaces. But if we provide the information above we will
>>>>>>> allow external components to set the policy on the hugetlbfs files before
>>>>>>> we add new interfaces required for your use case.
>>>>>>
>>>>>> But file-backed memory is not a good fit for a host that runs many
>>>>>> virtual machines, and in that situation we can't handle anon THP yet.
>>>>>
>>>>> I don't understand what you mean, here. What prevents someone from using
>>>>> file-backed memory with multiple virtual machines?
>>>>
>>>> If we use hugetlbfs-backed memory, we need to know how many virtual machines
>>>> there are and how much memory each VM will use, then reserve those pages for
>>>> them. We should even reserve more pages for external tools (numactl) to set
>>>> memory policies. The memory reservation also has its own memory policies,
>>>> so it's very hard to control it the way we want.
>>>
>>> Well, it's hard because we don't even have tools to help on that, yet.
>>>
>>> Anyway, I understand that you want to make it work with THP as well. But
>>> if THP works with tmpfs (does it?), people then could use exactly the
>>> same file-based mechanisms with tmpfs and keep THP working.
>>>
>>> (Right now I am doing some experiments to understand how the system
>>> behaves when using numactl on hugetlbfs and tmpfs, before and after
>>> getting the files mapped).
>>>
>>>
>>>>>
>>>>>>
>>>>>> And as I mentioned, the cross-NUMA-node access performance regression
>>>>>> is caused by pci-passthrough. It's a long-standing bug, so we should
>>>>>> also backport the host memory pinning patch to old QEMU to resolve this
>>>>>> performance problem.
>>>>>
>>>>> If it's a regression, what's the last version of QEMU where the bug
>>>>> wasn't present?
>>>>>
>>>>
>>>> As QEMU doesn't support host memory binding, I think this has been
>>>> present ever since we added guest NUMA support, and pci-passthrough made
>>>> it even worse.
>>>
>>> If the problem was always present, it is not a regression, is it?
>>>
> 
> 
