From: David Hildenbrand <david@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: qemu-devel@nongnu.org, "Paolo Bonzini" <pbonzini@redhat.com>,
	"Eduardo Habkost" <ehabkost@redhat.com>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Ani Sinha" <ani@anisinha.ca>, "Peter Xu" <peterx@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
	"Hui Zhu" <teawater@gmail.com>,
	"Sebastien Boeuf" <sebastien.boeuf@intel.com>,
	kvm@vger.kernel.org
Subject: Re: [PATCH v1 00/12] virtio-mem: Expose device memory via multiple memslots
Date: Sun, 7 Nov 2021 11:53:34 +0100	[thread overview]
Message-ID: <41f72294-b449-2a42-d8b8-cf3de9314066@redhat.com> (raw)
In-Reply-To: <20211107051832-mutt-send-email-mst@kernel.org>

On 07.11.21 11:21, Michael S. Tsirkin wrote:
> On Sun, Nov 07, 2021 at 10:21:33AM +0100, David Hildenbrand wrote:
>> Let's not focus on b); a) is the primary goal of this series:
>>
>> "
>> a) Reduce the metadata overhead, including bitmap sizes inside KVM but
>> also inside QEMU KVM code where possible.
>> "
>>
>> Because:
>>
>> "
>> For example, when starting a VM with a 1 TiB virtio-mem device that only
>> exposes little device memory (e.g., 1 GiB) towards the VM initially,
>> in order to hotplug more memory later, we waste a lot of memory on
>> metadata for KVM memory slots (> 2 GiB!) and accompanying bitmaps.
>> "
>>
>> Partially tackling b) is just a nice side effect of this series. In the
>> long term, we'll want userfaultfd-based protection, and I'll do a
>> performance evaluation then of how userfaultfd vs. !userfaultfd compares
>> (boot time, run time, THP consumption).
>>
>> I'll adjust the cover letter for the next version to make this clearer.
> 
> So given this is short-term, and long term we'll use uffd possibly with
> some extension (a syscall to populate 1G in one go?) isn't there some
> way to hide this from management? It's a one-way street: once we get
> management involved in playing with memory slots we no longer can go
> back and control them ourselves. Not to mention it's a lot of
> complexity to push out to management.

For b) userfaultfd + optimizations is the way to go long term.
For a) userfaultfd does not help in any way, and that's what I currently
care about most.
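
(As a rough back-of-envelope for the "> 2 GiB" figure above -- assuming
x86-64 KVM allocates an 8-byte rmap entry per 4 KiB base page for each
memslot, which is my reading and not spelled out in the cover letter:

    1 TiB / 4 KiB                 = 268,435,456 pages
    268,435,456 pages * 8 B rmap  = 2 GiB of rmap metadata
    dirty bitmap at 1 bit / page  = 32 MiB per bitmap on top

all of it allocated for the full 1 TiB region up front, even while only
1 GiB is actually exposed to the VM.)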

1) For the management layer it will be as simple as providing a
"memslots" parameter to the user. I don't expect management to do manual
memslot detection+calculation -- the management layer is the wrong place
because it has limited insight. Either QEMU will do it automatically or
the user will do it manually. For QEMU to do it reliably, we'll have to
teach the management layer to specify any vhost* devices before
virtio-mem* devices on the QEMU cmdline -- that is the only real
complexity I see.
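
For illustration, a minimal sketch of such a cmdline -- assuming the
"memslots" property lands as proposed in this series; IDs, sizes and the
socket path are made up:

    qemu-system-x86_64 -m 4G,maxmem=1028G \
        -chardev socket,id=chr0,path=/tmp/vhost-user.sock \
        -netdev vhost-user,id=net0,chardev=chr0 \
        -device virtio-net-pci,netdev=net0 \
        -object memory-backend-memfd,id=mem0,size=1T,reserve=off \
        -device virtio-mem-pci,id=vmem0,memdev=mem0,requested-size=1G,memslots=0

The vhost-user device comes first, so QEMU already knows its memslot
limit by the time the virtio-mem device auto-detects how many memslots
it can safely use.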

2) "control them ourselves" will essentially be enabled via "memslots=0"
(auto-detect mode". The user has to opt in.

"memslots" is a pure optimization mechanism. While I'd love to hide this
complexity from user space and always use the auto-detect mode,
especially hotplug of vhost devices is a real problem and requires users
to opt-in.

I assume once we have "memslots=0" (auto-detect) mode, most people will:
* Set "memslots=0" to enable the optimization and essentially let QEMU
  control it. This will work in most cases, and we can clearly document
  where it won't. We'll always fail gracefully.
* Leave "memslots=1" if they don't care about the optimization or run a
  problematic setup.
* Set "memslots=X if they run a problemantic setup in still care about
  the optimization.
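
(For that last case I'd expect something like the following on the
device, again assuming the property shape proposed here, with the cap
picked by the user for their setup:

    -device virtio-mem-pci,id=vmem0,memdev=mem0,requested-size=1G,memslots=8

i.e., an explicit upper bound instead of auto-detection.)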


Alternatively, we could have a "memslots-optimization=true|false" toggle
instead. IMHO that could be limiting for those corner-case setups where
auto-detection is problematic and users still want to optimize --
especially when eventually hotplugging vhost devices. But as I assume
99.9999% of all setups will enable auto-detect mode, I don't have a
strong opinion.

-- 
Thanks,

David / dhildenb

