From: David Hildenbrand <david@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>,
	Markus Armbruster <armbru@redhat.com>,
	qemu-devel@nongnu.org,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>
Subject: Re: [PULL 13/41] virtio-mem: Paravirtualized memory hot(un)plug
Date: Fri, 3 Jul 2020 12:24:06 +0200
Message-ID: <c7fdc0ec-c5df-7712-287c-86c8d022cbc0@redhat.com>
In-Reply-To: <20200703062243-mutt-send-email-mst@kernel.org>

On 03.07.20 12:23, Michael S. Tsirkin wrote:
> On Fri, Jul 03, 2020 at 11:18:42AM +0200, David Hildenbrand wrote:
>> On 03.07.20 11:04, Michael S. Tsirkin wrote:
>>> From: David Hildenbrand <david@redhat.com>
>>>
>>> This is the very basic/initial version of virtio-mem. An introduction to
>>> virtio-mem can be found in the Linux kernel driver [1]. While it can be
>>> used in the current state for hotplug of a smaller amount of memory, it
>>> will heavily benefit from resizeable memory regions in the future.
>>>
>>> Each virtio-mem device manages a memory region (provided via a memory
>>> backend). Once the hypervisor requests a size ("requested-size"), the
>>> guest can try to plug/unplug blocks of memory within that region, in order
>>> to reach the requested size. Initially, and after a reboot, all memory is
>>> unplugged (except in special cases - reboot during postcopy).
>>>
>>> The guest may only try to plug/unplug blocks of memory within the usable
>>> region size. The usable region size is a little bigger than the
>>> requested size, to give the device driver some flexibility. The usable
>>> region size will only grow, except on reboots or when all memory is
>>> requested to get unplugged. The guest can never plug more memory than
>>> requested. Unplugged memory will get zapped/discarded, similar to a
>>> balloon device.
>>>
>>> The block size is variable; however, it is always chosen in a way such that
>>> THP splits are avoided (e.g., 2MB). The state of each block
>>> (plugged/unplugged) is tracked in a bitmap.
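The per-block bitmap described above can be sketched roughly as follows. This is a minimal standalone illustration, not the code from the patch: BlockTracker and the tracker_*() helpers are hypothetical names, and the rule that a request is refused unless every block in the range is currently in the opposite state is an assumption of this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MiB (1024ULL * 1024ULL)
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical tracker (not the QEMU structure): one bit per block. */
typedef struct {
    uint64_t region_size;   /* bytes covered by the bitmap */
    uint64_t block_size;    /* bytes per block, e.g. 2 MiB */
    uint64_t plugged_size;  /* bytes currently plugged */
    unsigned long *bitmap;  /* bit set = block is plugged */
} BlockTracker;

static uint64_t tracker_nb_blocks(const BlockTracker *t)
{
    return t->region_size / t->block_size;
}

/* All blocks start out unplugged, as after a reboot. */
static bool tracker_init(BlockTracker *t, uint64_t region_size,
                         uint64_t block_size)
{
    if (region_size == 0 || block_size == 0 || region_size % block_size) {
        return false;
    }
    size_t words = (region_size / block_size + BITS_PER_WORD - 1) /
                   BITS_PER_WORD;
    t->region_size = region_size;
    t->block_size = block_size;
    t->plugged_size = 0;
    t->bitmap = calloc(words, sizeof(unsigned long));
    return t->bitmap != NULL;
}

static bool tracker_test(const BlockTracker *t, uint64_t block)
{
    return (t->bitmap[block / BITS_PER_WORD] >> (block % BITS_PER_WORD)) & 1UL;
}

/*
 * Plug or unplug [offset, offset + size). Misaligned or out-of-range
 * requests are refused. Assumption of this sketch: the request is also
 * refused unless every block is currently in the opposite state.
 */
static bool tracker_set_range(BlockTracker *t, uint64_t offset,
                              uint64_t size, bool plug)
{
    if (size == 0 || offset % t->block_size || size % t->block_size) {
        return false;
    }
    if (offset > t->region_size || size > t->region_size - offset) {
        return false;
    }
    uint64_t first = offset / t->block_size;
    uint64_t count = size / t->block_size;

    for (uint64_t i = 0; i < count; i++) {
        if (tracker_test(t, first + i) == plug) {
            return false;
        }
    }
    for (uint64_t i = 0; i < count; i++) {
        uint64_t block = first + i;
        unsigned long mask = 1UL << (block % BITS_PER_WORD);

        if (plug) {
            t->bitmap[block / BITS_PER_WORD] |= mask;
        } else {
            t->bitmap[block / BITS_PER_WORD] &= ~mask;
        }
    }
    t->plugged_size += plug ? size : -size;
    return true;
}
```

With a 2 MiB block size, an 8 GiB region needs 4096 bits, and plugging 500 MiB sets 250 of them.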
>>>
>>> As virtio-mem devices (e.g., virtio-mem-pci) will be memory devices, we now
>>> expose "VirtioMEMDeviceInfo" via "query-memory-devices".
>>>
>>> --------------------------------------------------------------------------
>>>
>>> There are two important follow-up items that are in the works:
>>> 1. Resizeable memory regions: Use resizeable allocations/RAM blocks to
>>>    grow/shrink along with the usable region size. This avoids creating
>>>    initially very big VMAs, RAM blocks, and KVM slots.
>>> 2. Protection of unplugged memory: Make sure the guest cannot actually
>>>    make use of unplugged memory.
>>>
>>> Other follow-up items that are in the works:
>>> 1. Exclude unplugged memory during migration (via precopy notifier).
>>> 2. Handle remapping of memory.
>>> 3. Support for other architectures.
>>>
>>> --------------------------------------------------------------------------
>>>
>>> Example usage (virtio-mem-pci is introduced in follow-up patches):
>>>
>>> Start QEMU with two virtio-mem devices (one per NUMA node):
>>>  $ qemu-system-x86_64 -m 4G,maxmem=20G \
>>>   -smp sockets=2,cores=2 \
>>>   -numa node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 \
>>>   [...]
>>>   -object memory-backend-ram,id=mem0,size=8G \
>>>   -device virtio-mem-pci,id=vm0,memdev=mem0,node=0,requested-size=0M \
>>>   -object memory-backend-ram,id=mem1,size=8G \
>>>   -device virtio-mem-pci,id=vm1,memdev=mem1,node=1,requested-size=1G
>>>
>>> Query the configuration:
>>>  (qemu) info memory-devices
>>>  Memory device [virtio-mem]: "vm0"
>>>    memaddr: 0x140000000
>>>    node: 0
>>>    requested-size: 0
>>>    size: 0
>>>    max-size: 8589934592
>>>    block-size: 2097152
>>>    memdev: /objects/mem0
>>>  Memory device [virtio-mem]: "vm1"
>>>    memaddr: 0x340000000
>>>    node: 1
>>>    requested-size: 1073741824
>>>    size: 1073741824
>>>    max-size: 8589934592
>>>    block-size: 2097152
>>>    memdev: /objects/mem1
>>>
>>> Add some memory to node 0:
>>>  (qemu) qom-set vm0 requested-size 500M
>>>
>>> Remove some memory from node 1:
>>>  (qemu) qom-set vm1 requested-size 200M
>>>
>>> Query the configuration again:
>>>  (qemu) info memory-devices
>>>  Memory device [virtio-mem]: "vm0"
>>>    memaddr: 0x140000000
>>>    node: 0
>>>    requested-size: 524288000
>>>    size: 524288000
>>>    max-size: 8589934592
>>>    block-size: 2097152
>>>    memdev: /objects/mem0
>>>  Memory device [virtio-mem]: "vm1"
>>>    memaddr: 0x340000000
>>>    node: 1
>>>    requested-size: 209715200
>>>    size: 209715200
>>>    max-size: 8589934592
>>>    block-size: 2097152
>>>    memdev: /objects/mem1
>>>
>>> [1] https://lkml.kernel.org/r/20200311171422.10484-1-david@redhat.com
>>>
>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>>> Cc: Eric Blake <eblake@redhat.com>
>>> Cc: Markus Armbruster <armbru@redhat.com>
>>> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>> Cc: Igor Mammedov <imammedo@redhat.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> Message-Id: <20200626072248.78761-11-david@redhat.com>
>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>> ---
>>>  qapi/misc.json                 |  39 +-
>>>  include/hw/virtio/virtio-mem.h |  78 ++++
>>>  hw/virtio/virtio-mem.c         | 724 +++++++++++++++++++++++++++++++++
>>>  hw/virtio/Kconfig              |  11 +
>>>  hw/virtio/Makefile.objs        |   1 +
>>>  5 files changed, 852 insertions(+), 1 deletion(-)
>>>  create mode 100644 include/hw/virtio/virtio-mem.h
>>>  create mode 100644 hw/virtio/virtio-mem.c
>>>
>>> diff --git a/qapi/misc.json b/qapi/misc.json
>>> index a5a0beb902..65ca3edf32 100644
>>> --- a/qapi/misc.json
>>> +++ b/qapi/misc.json
>>> @@ -1356,19 +1356,56 @@
>>>            }
>>>  }
>>>  
>>> +##
>>> +# @VirtioMEMDeviceInfo:
>>> +#
>>> +# VirtioMEMDevice state information
>>> +#
>>> +# @id: device's ID
>>> +#
>>> +# @memaddr: physical address in memory, where device is mapped
>>> +#
>>> +# @requested-size: the user requested size of the device
>>> +#
>>> +# @size: the (current) size of memory that the device provides
>>> +#
>>> +# @max-size: the maximum size of memory that the device can provide
>>> +#
>>> +# @block-size: the block size of memory that the device provides
>>> +#
>>> +# @node: NUMA node number where device is assigned to
>>> +#
>>> +# @memdev: memory backend linked with the region
>>> +#
>>> +# Since: 5.1
>>> +##
>>> +{ 'struct': 'VirtioMEMDeviceInfo',
>>> +  'data': { '*id': 'str',
>>> +            'memaddr': 'size',
>>> +            'requested-size': 'size',
>>> +            'size': 'size',
>>> +            'max-size': 'size',
>>> +            'block-size': 'size',
>>> +            'node': 'int',
>>> +            'memdev': 'str'
>>> +          }
>>> +}
>>> +
>>>  ##
>>>  # @MemoryDeviceInfo:
>>>  #
>>>  # Union containing information about a memory device
>>>  #
>>>  # nvdimm is included since 2.12. virtio-pmem is included since 4.1.
>>> +# virtio-mem is included since 5.1.
>>>  #
>>>  # Since: 2.1
>>>  ##
>>>  { 'union': 'MemoryDeviceInfo',
>>>    'data': { 'dimm': 'PCDIMMDeviceInfo',
>>>              'nvdimm': 'PCDIMMDeviceInfo',
>>> -            'virtio-pmem': 'VirtioPMEMDeviceInfo'
>>> +            'virtio-pmem': 'VirtioPMEMDeviceInfo',
>>> +            'virtio-mem': 'VirtioMEMDeviceInfo'
>>>            }
>>>  }
>>>  
>>> diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
>>> new file mode 100644
>>> index 0000000000..6981096f7c
>>> --- /dev/null
>>> +++ b/include/hw/virtio/virtio-mem.h
>>> @@ -0,0 +1,78 @@
>>> +/*
>>> + * Virtio MEM device
>>> + *
>>> + * Copyright (C) 2020 Red Hat, Inc.
>>> + *
>>> + * Authors:
>>> + *  David Hildenbrand <david@redhat.com>
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2.
>>> + * See the COPYING file in the top-level directory.
>>> + */
>>> +
>>> +#ifndef HW_VIRTIO_MEM_H
>>> +#define HW_VIRTIO_MEM_H
>>> +
>>> +#include "standard-headers/linux/virtio_mem.h"
>>> +#include "hw/virtio/virtio.h"
>>> +#include "qapi/qapi-types-misc.h"
>>> +#include "sysemu/hostmem.h"
>>> +
>>> +#define TYPE_VIRTIO_MEM "virtio-mem"
>>> +
>>> +#define VIRTIO_MEM(obj) \
>>> +        OBJECT_CHECK(VirtIOMEM, (obj), TYPE_VIRTIO_MEM)
>>> +#define VIRTIO_MEM_CLASS(oc) \
>>> +        OBJECT_CLASS_CHECK(VirtIOMEMClass, (oc), TYPE_VIRTIO_MEM)
>>> +#define VIRTIO_MEM_GET_CLASS(obj) \
>>> +        OBJECT_GET_CLASS(VirtIOMEMClass, (obj), TYPE_VIRTIO_MEM)
>>> +
>>> +#define VIRTIO_MEM_MEMDEV_PROP "memdev"
>>> +#define VIRTIO_MEM_NODE_PROP "node"
>>> +#define VIRTIO_MEM_SIZE_PROP "size"
>>> +#define VIRTIO_MEM_REQUESTED_SIZE_PROP "requested-size"
>>> +#define VIRTIO_MEM_BLOCK_SIZE_PROP "block-size"
>>> +#define VIRTIO_MEM_ADDR_PROP "memaddr"
>>> +
>>> +typedef struct VirtIOMEM {
>>> +    VirtIODevice parent_obj;
>>> +
>>> +    /* guest -> host request queue */
>>> +    VirtQueue *vq;
>>> +
>>> +    /* bitmap used to track unplugged memory */
>>> +    int32_t bitmap_size;
>>> +    unsigned long *bitmap;
>>> +
>>> +    /* assigned memory backend and memory region */
>>> +    HostMemoryBackend *memdev;
>>> +
>>> +    /* NUMA node */
>>> +    uint32_t node;
>>> +
>>> +    /* assigned address of the region in guest physical memory */
>>> +    uint64_t addr;
>>> +
>>> +    /* usable region size (<= region_size) */
>>> +    uint64_t usable_region_size;
>>> +
>>> +    /* actual size (how much the guest plugged) */
>>> +    uint64_t size;
>>> +
>>> +    /* requested size */
>>> +    uint64_t requested_size;
>>> +
>>> +    /* block size and alignment */
>>> +    uint64_t block_size;
>>> +} VirtIOMEM;
>>> +
>>> +typedef struct VirtIOMEMClass {
>>> +    /* private */
>>> +    VirtIODevice parent;
>>> +
>>> +    /* public */
>>> +    void (*fill_device_info)(const VirtIOMEM *vmem, VirtioMEMDeviceInfo *vi);
>>> +    MemoryRegion *(*get_memory_region)(VirtIOMEM *vmem, Error **errp);
>>> +} VirtIOMEMClass;
>>> +
>>> +#endif
>>> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
>>> new file mode 100644
>>> index 0000000000..d8a0c974d3
>>> --- /dev/null
>>> +++ b/hw/virtio/virtio-mem.c
>>> @@ -0,0 +1,724 @@
>>> +/*
>>> + * Virtio MEM device
>>> + *
>>> + * Copyright (C) 2020 Red Hat, Inc.
>>> + *
>>> + * Authors:
>>> + *  David Hildenbrand <david@redhat.com>
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2.
>>> + * See the COPYING file in the top-level directory.
>>> + */
>>> +
>>> +#include "qemu/osdep.h"
>>> +#include "qemu-common.h"
>>> +#include "qemu/iov.h"
>>> +#include "qemu/cutils.h"
>>> +#include "qemu/error-report.h"
>>> +#include "qemu/units.h"
>>> +#include "sysemu/numa.h"
>>> +#include "sysemu/sysemu.h"
>>> +#include "sysemu/reset.h"
>>> +#include "hw/virtio/virtio.h"
>>> +#include "hw/virtio/virtio-bus.h"
>>> +#include "hw/virtio/virtio-access.h"
>>> +#include "hw/virtio/virtio-mem.h"
>>> +#include "qapi/error.h"
>>> +#include "qapi/visitor.h"
>>> +#include "exec/ram_addr.h"
>>> +#include "migration/misc.h"
>>> +#include "hw/boards.h"
>>> +#include "hw/qdev-properties.h"
>>> +#include "config-devices.h"
>>> +
>>> +/*
>>> + * Use QEMU_VMALLOC_ALIGN, so no THP will have to be split when unplugging
>>> + * memory (e.g., 2MB on x86_64).
>>> + */
>>> +#define VIRTIO_MEM_MIN_BLOCK_SIZE QEMU_VMALLOC_ALIGN
>>> +/*
>>> + * Size the usable region bigger than the requested size if possible. Esp.
>>> + * Linux guests will only add (aligned) memory blocks in case they fully
>>> + * fit into the usable region, but plug+online only a subset of the pages.
>>> + * The memory block size corresponds mostly to the section size.
>>> + *
>>> + * This allows e.g., to add 20MB with a section size of 128MB on x86_64, and
>>> + * a section size of 1GB on arm64 (as long as the start address is properly
>>> + * aligned, similar to ordinary DIMMs).
>>> + *
>>> + * We can change this at any time and maybe even make it configurable if
>>> + * necessary (as the section size can change). But it's more likely that the
>>> + * section size will rather get smaller and not bigger over time.
>>> + */
>>> +#if defined(__x86_64__)
>>> +#define VIRTIO_MEM_USABLE_EXTENT (2 * (128 * MiB))
>>
>> I just did a cross-compile on s390x and noticed that this should be
>> guarded by defined(TARGET_X86_64) (it's target dependent).
>>
>> Sorry for the noise.
>>
>> -- 
>> Thanks,
>>
>> David / dhildenb
> 
> 
> OK - can you post a fixup patch pls?

Yep, thanks!
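
For reference, a rough sketch of what the guard could look like once it keys on the target rather than the host. __x86_64__ is set by the host compiler, so it is not defined when building on s390x even for an x86-64 target, which is how the cross-compile exposed the problem. This is a standalone illustration under stated assumptions, not the fixup patch itself: TARGET_X86_64 is defined by hand here so the snippet compiles on its own, the #error fallback is an assumption, and usable_region_size() is a hypothetical helper that follows the rules from the commit message (slightly bigger than the requested size, only grows, collapses when everything is requested to be unplugged).

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/*
 * Assumption for this standalone sketch: in a real QEMU build the
 * per-target configuration provides TARGET_X86_64; define it by hand here.
 */
#ifndef TARGET_X86_64
#define TARGET_X86_64 1
#endif

/* Key on the emulated target, not on the host compiler's __x86_64__. */
#if defined(TARGET_X86_64)
#define VIRTIO_MEM_USABLE_EXTENT (2 * (128 * MiB))
#else
#error VIRTIO_MEM_USABLE_EXTENT not defined for this target
#endif

static_assert(VIRTIO_MEM_USABLE_EXTENT == 256 * MiB,
              "two 128 MiB sections on x86-64");

/*
 * Hypothetical helper: pick the usable region size for a new requested
 * size. It is capped at the region size, never shrinks below the current
 * value, and drops to 0 when all memory is requested to be unplugged.
 */
static uint64_t usable_region_size(uint64_t region_size,
                                   uint64_t requested_size,
                                   uint64_t current_usable)
{
    uint64_t want = region_size;

    if (requested_size == 0) {
        return 0;
    }
    /* Written this way so requested_size + extent cannot overflow. */
    if (requested_size < region_size &&
        region_size - requested_size > VIRTIO_MEM_USABLE_EXTENT) {
        want = requested_size + VIRTIO_MEM_USABLE_EXTENT;
    }
    return want > current_usable ? want : current_usable;
}
```

With an 8 GiB region and a requested size of 1 GiB, this yields a usable region of 1280 MiB. Lowering the request to 200 MiB afterwards leaves it at 1280 MiB.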

-- 
Thanks,

David / dhildenb


