* [PATCH v6 00/12] virtio-mem: vfio support
@ 2021-02-22 11:56 David Hildenbrand
  2021-02-22 11:56 ` [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions David Hildenbrand
                   ` (11 more replies)
  0 siblings, 12 replies; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 11:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Wei Yang, David Hildenbrand, Michael S. Tsirkin,
	Dr. David Alan Gilbert, Peter Xu, Pankaj Gupta, Auger Eric,
	Alex Williamson, teawater, Paolo Bonzini, Igor Mammedov,
	Marek Kedzierski

@Paolo, I have acks for most things - what's missing seems to be an ack
from you on patch #1. As I want to make use of RamDiscardMgr also to handle
virtio-mem+guest-memory-dump/tpm [1] and properly handle all kinds of
migration (precopy, background snapshots, postcopy), could you have a look
at patch #1? Thanks!

I think it would be best to take this via @MST.

[1] https://lkml.kernel.org/r/20210210171537.32932-1-david@redhat.com

---

A virtio-mem device manages a memory region in guest physical address
space, represented as a single (currently large) memory region in QEMU,
mapped into system memory address space. Before the guest is allowed to use
memory blocks, it must coordinate with the hypervisor (plug blocks). After
a reboot, all memory is usually unplugged - when the guest comes up, it
detects the virtio-mem device and selects memory blocks to plug (based on
resize requests from the hypervisor).

Memory hot(un)plug consists of (un)plugging memory blocks via a virtio-mem
device (triggered by the guest). When unplugging blocks, we discard the
memory - similar to memory balloon inflation. In contrast to memory
ballooning, we always know which memory blocks a guest may actually use -
especially during a reboot, after a crash, or after kexec (and during
hibernation as well). The guest agreed to not access unplugged memory
again, especially not via DMA.

The issue with vfio is that it cannot deal with random discards - for this
reason, virtio-mem and vfio are currently mutually exclusive. In particular,
vfio would currently map the whole memory region (with possibly only few or
no plugged blocks), resulting in all pages getting pinned and therefore in
higher memory consumption than expected (making virtio-mem basically useless
in these environments).

To make vfio work nicely with virtio-mem, we have to map only the plugged
blocks, and map/unmap properly when plugging/unplugging blocks (including
discarding of RAM when unplugging). We achieve that by using a new notifier
mechanism that communicates changes.

It's important to map memory at the granularity at which we could later see
unmaps again (-> the virtio-mem block size) - so when, e.g., plugging 100 MB
of consecutive memory with a block size of 2 MB, we need 50 mappings. When
unmapping, we can use a single vfio_unmap call for the applicable range.
We expect that the block size of virtio-mem devices will be fairly large
in the future (to not run out of mappings and to improve hot(un)plug
performance) and configured by the user when used with vfio (e.g., 128 MB,
1 GB, ...), but it will depend on the setup.
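
To illustrate, here is a minimal sketch of that mapping scheme (not part of
this series, and simplified from the vfio_ram_discard_notify_* callbacks added
in patch #5). It assumes it lives in hw/vfio/common.c, so it can use the
internal vfio_dma_map()/vfio_dma_unmap() helpers, that the range is aligned to
the block size, and that the mapping is writable:

static int map_plugged_range(VFIOContainer *container, hwaddr iova,
                             void *vaddr, uint64_t size, uint64_t block_size)
{
    uint64_t offset;
    int ret;

    /* One mapping per block, e.g., 100 MB / 2 MB -> 50 mappings. */
    for (offset = 0; offset < size; offset += block_size) {
        ret = vfio_dma_map(container, iova + offset, block_size,
                           (char *)vaddr + offset, false);
        if (ret) {
            if (offset) {
                /* Roll back what was already mapped - one call suffices. */
                vfio_dma_unmap(container, iova, offset, NULL);
            }
            return ret;
        }
    }
    return 0;
}

static void unmap_plugged_range(VFIOContainer *container, hwaddr iova,
                                uint64_t size)
{
    /* Unmapping can cover the whole previously mapped range at once. */
    vfio_dma_unmap(container, iova, size, NULL);
}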

More info regarding virtio-mem can be found at:
    https://virtio-mem.gitlab.io/

v6 is located at:
  git@github.com:davidhildenbrand/qemu.git virtio-mem-vfio-v6

v5 -> v6:
- "memory: Introduce RamDiscardMgr for RAM memory regions"
-- Fix variable names in one prototype.
- "virtio-mem: Don't report errors when ram_block_discard_range() fails"
-- Added
- "virtio-mem: Implement RamDiscardMgr interface"
-- Don't report an error if discarding fails
- Rebased and retested

v4 -> v5:
- "vfio: Support for RamDiscardMgr in the !vIOMMU case"
-- Added more assertions for granularity vs. iommu supported pagesize
- "vfio: Sanity check maximum number of DMA mappings with RamDiscardMgr"
-- Fix accounting of mappings
- "vfio: Disable only uncoordinated discards for VFIO_TYPE1 iommus"
-- Fence off SPAPR and add some comments regarding future support.
-- Tweak patch description
- Rebase and retest

v3 -> v4:
- "vfio: Query and store the maximum number of DMA mappings
-- Limit the patch to querying and storing only
-- Renamed to "vfio: Query and store the maximum number of possible DMA
   mappings"
- "vfio: Support for RamDiscardMgr in the !vIOMMU case"
-- Remove sanity checks / warning the user
- "vfio: Sanity check maximum number of DMA mappings with RamDiscardMgr"
-- Perform sanity checks by looking at the number of memslots and all
   registered RamDiscardMgr sections
- Rebase and retest
- Reshuffled the patches slightly

v2 -> v3:
- Rebased + retested
- Fixed some typos
- Added RB's

v1 -> v2:
- "memory: Introduce RamDiscardMgr for RAM memory regions"
-- Fix some errors in the documentation
-- Make register_listener() notify about populated parts and
   unregister_listener() notify about discarding populated parts, to
   simplify future locking inside virtio-mem, when handling requests via a
   separate thread.
- "vfio: Query and store the maximum number of DMA mappings"
-- Query number of mappings and track mappings (except for vIOMMU)
- "vfio: Support for RamDiscardMgr in the !vIOMMU case"
-- Adapt to RamDiscardMgr changes and warn via generic DMA reservation
- "vfio: Support for RamDiscardMgr in the vIOMMU case"
-- Use vmstate priority to handle migration dependencies

RFC -> v1:
- VFIO migration code. Due to missing kernel support, I cannot really test
  if that part works.
- Understand/test/document vIOMMU implications, also regarding migration
- Nicer ram_block_discard_disable/require handling.
- s/SparseRAMHandler/RamDiscardMgr/, refactorings, cleanups, documentation,
  testing, ...

David Hildenbrand (12):
  memory: Introduce RamDiscardMgr for RAM memory regions
  virtio-mem: Factor out traversing unplugged ranges
  virtio-mem: Don't report errors when ram_block_discard_range() fails
  virtio-mem: Implement RamDiscardMgr interface
  vfio: Support for RamDiscardMgr in the !vIOMMU case
  vfio: Query and store the maximum number of possible DMA mappings
  vfio: Sanity check maximum number of DMA mappings with RamDiscardMgr
  vfio: Support for RamDiscardMgr in the vIOMMU case
  softmmu/physmem: Don't use atomic operations in
    ram_block_discard_(disable|require)
  softmmu/physmem: Extend ram_block_discard_(require|disable) by two
    discard types
  virtio-mem: Require only coordinated discards
  vfio: Disable only uncoordinated discards for VFIO_TYPE1 iommus

 hw/vfio/common.c               | 348 ++++++++++++++++++++++++++++++--
 hw/virtio/virtio-mem.c         | 350 ++++++++++++++++++++++++++++-----
 include/exec/memory.h          | 249 ++++++++++++++++++++++-
 include/hw/vfio/vfio-common.h  |  13 ++
 include/hw/virtio/virtio-mem.h |   3 +
 include/migration/vmstate.h    |   1 +
 softmmu/memory.c               |  22 +++
 softmmu/physmem.c              | 108 +++++++---
 8 files changed, 1000 insertions(+), 94 deletions(-)

-- 
2.29.2




* [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions
  2021-02-22 11:56 [PATCH v6 00/12] virtio-mem: vfio support David Hildenbrand
@ 2021-02-22 11:56 ` David Hildenbrand
  2021-02-22 13:27   ` Paolo Bonzini
  2021-02-22 11:56 ` [PATCH v6 02/12] virtio-mem: Factor out traversing unplugged ranges David Hildenbrand
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 11:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, David Hildenbrand,
	Alex Williamson, Peter Xu, Dr . David Alan Gilbert, Auger Eric,
	Pankaj Gupta, teawater, Igor Mammedov, Paolo Bonzini,
	Marek Kedzierski

We have some special RAM memory regions (managed by virtio-mem), where the
guest agreed to only use selected memory ranges. "Unused" parts are
discarded so they won't consume memory - this is how memory ranges get
logically unplugged. Before the VM is allowed to use such logically unplugged
memory again, coordination with the hypervisor is required.

This results in "sparse" mmaps/RAMBlocks/memory regions, where only
coordinated parts are valid to be used/accessed by the VM.

In most cases, we don't care about that - e.g., in KVM, we simply have a
single KVM memory slot. However, in the case of vfio, registering the
whole region with the kernel results in all pages getting pinned, and
therefore in unexpectedly high memory consumption - discarding of RAM in
that context is broken.

Let's introduce a way to coordinate discarding/populating memory within a
RAM memory region with such special consumers of RAM memory regions: they
can register as listeners and get updates on memory getting discarded and
populated. Using this machinery, vfio will be able to map only the
currently populated parts, resulting in discarded parts not getting pinned
and not consuming memory.

A RamDiscardMgr has to be set for a memory region before it gets mapped,
and cannot change while the memory region is mapped.

Note: At some point, we might want to let RAMBlock users (esp. vfio used
for nvme://) consume this interface as well. We'll need RAMBlock notifier
calls when a RAMBlock is getting mapped/unmapped (via the corresponding
memory region), so we can properly register a listener there as well.
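
For illustration, a minimal sketch of how a consumer of this interface could
hook into such a memory region (the my_* names are hypothetical and not part
of this patch; error handling is omitted):

static int my_notify_populate(RamDiscardListener *rdl, const MemoryRegion *mr,
                              ram_addr_t offset, ram_addr_t size)
{
    /* e.g., map the now-populated range; return != 0 to reject it. */
    return 0;
}

static void my_notify_discard(RamDiscardListener *rdl, const MemoryRegion *mr,
                              ram_addr_t offset, ram_addr_t size)
{
    /* e.g., unmap the discarded range and drop all references to it. */
}

static void my_register(MemoryRegion *mr, RamDiscardListener *rdl)
{
    RamDiscardMgr *rdm = memory_region_get_ram_discard_mgr(mr);
    RamDiscardMgrClass *rdmc;

    if (!rdm) {
        return; /* nothing to coordinate with */
    }
    rdmc = RAM_DISCARD_MGR_GET_CLASS(rdm);
    ram_discard_listener_init(rdl, my_notify_populate, my_notify_discard,
                              NULL /* no notify_discard_all */);
    /* Immediately notifies about all currently populated parts. */
    rdmc->register_listener(rdm, mr, rdl);
}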

Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/exec/memory.h | 231 ++++++++++++++++++++++++++++++++++++++++++
 softmmu/memory.c      |  22 ++++
 2 files changed, 253 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index c6fb714e49..6132910767 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -42,6 +42,12 @@ typedef struct IOMMUMemoryRegionClass IOMMUMemoryRegionClass;
 DECLARE_OBJ_CHECKERS(IOMMUMemoryRegion, IOMMUMemoryRegionClass,
                      IOMMU_MEMORY_REGION, TYPE_IOMMU_MEMORY_REGION)
 
+#define TYPE_RAM_DISCARD_MGR "qemu:ram-discard-mgr"
+typedef struct RamDiscardMgrClass RamDiscardMgrClass;
+typedef struct RamDiscardMgr RamDiscardMgr;
+DECLARE_OBJ_CHECKERS(RamDiscardMgr, RamDiscardMgrClass, RAM_DISCARD_MGR,
+                     TYPE_RAM_DISCARD_MGR);
+
 #ifdef CONFIG_FUZZ
 void fuzz_dma_read_cb(size_t addr,
                       size_t len,
@@ -124,6 +130,66 @@ typedef struct IOMMUTLBEvent {
     IOMMUTLBEntry entry;
 } IOMMUTLBEvent;
 
+struct RamDiscardListener;
+typedef int (*NotifyRamPopulate)(struct RamDiscardListener *rdl,
+                                 const MemoryRegion *mr, ram_addr_t offset,
+                                 ram_addr_t size);
+typedef void (*NotifyRamDiscard)(struct RamDiscardListener *rdl,
+                                 const MemoryRegion *mr, ram_addr_t offset,
+                                 ram_addr_t size);
+typedef void (*NotifyRamDiscardAll)(struct RamDiscardListener *rdl,
+                                    const MemoryRegion *mr);
+
+typedef struct RamDiscardListener {
+    /*
+     * @notify_populate:
+     *
+     * Notification that previously discarded memory is about to get populated.
+     * Listeners are able to object. If any listener objects, already
+     * successfully notified listeners are notified about a discard again.
+     *
+     * @rdl: the #RamDiscardListener getting notified
+     * @mr: the relevant #MemoryRegion
+     * @offset: offset into the #MemoryRegion, aligned to minimum granularity of
+     *          the #RamDiscardMgr
+     * @size: the size, aligned to minimum granularity of the #RamDiscardMgr
+     *
+     * Returns 0 on success. If the notification is rejected by the listener,
+     * an error is returned.
+     */
+    NotifyRamPopulate notify_populate;
+
+    /*
+     * @notify_discard:
+     *
+     * Notification that previously populated memory was discarded successfully
+     * and listeners should drop all references to such memory and prevent
+     * new population (e.g., unmap).
+     *
+     * @rdl: the #RamDiscardListener getting notified
+     * @mr: the relevant #MemoryRegion
+     * @offset: offset into the #MemoryRegion, aligned to minimum granularity of
+     *          the #RamDiscardMgr
+     * @size: the size, aligned to minimum granularity of the #RamDiscardMgr
+     */
+    NotifyRamDiscard notify_discard;
+
+    /*
+     * @notify_discard_all:
+     *
+     * Notification that all previously populated memory was discarded
+     * successfully.
+     *
+     * Note: this callback is optional. If not set, individual notify_discard()
+     * notifications are triggered.
+     *
+     * @rdl: the #RamDiscardListener getting notified
+     * @mr: the relevant #MemoryRegion
+     */
+    NotifyRamDiscardAll notify_discard_all;
+    QLIST_ENTRY(RamDiscardListener) next;
+} RamDiscardListener;
+
 /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
 #define RAM_PREALLOC   (1 << 0)
 
@@ -167,6 +233,16 @@ static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
     n->iommu_idx = iommu_idx;
 }
 
+static inline void ram_discard_listener_init(RamDiscardListener *rdl,
+                                             NotifyRamPopulate populate_fn,
+                                             NotifyRamDiscard discard_fn,
+                                             NotifyRamDiscardAll discard_all_fn)
+{
+    rdl->notify_populate = populate_fn;
+    rdl->notify_discard = discard_fn;
+    rdl->notify_discard_all = discard_all_fn;
+}
+
 /*
  * Memory region callbacks
  */
@@ -441,6 +517,126 @@ struct IOMMUMemoryRegionClass {
                                      Error **errp);
 };
 
+/*
+ * RamDiscardMgrClass:
+ *
+ * A #RamDiscardMgr coordinates which parts of specific RAM #MemoryRegion
+ * regions are currently populated to be used/accessed by the VM, notifying
+ * after parts were discarded (freeing up memory) and before parts will be
+ * populated (consuming memory), to be used/accessed by the VM.
+ *
+ * A #RamDiscardMgr can only be set for a RAM #MemoryRegion while the
+ * #MemoryRegion isn't mapped yet; it cannot change while the #MemoryRegion is
+ * mapped.
+ *
+ * The #RamDiscardMgr is intended to be used by technologies that are
+ * incompatible with discarding of RAM (e.g., VFIO, which may pin all
+ * memory inside a #MemoryRegion), and require proper coordination to only
+ * map the currently populated parts, to hinder parts that are expected to
+ * remain discarded from silently getting populated and consuming memory.
+ * Technologies that support discarding of RAM don't have to bother and can
+ * simply map the whole #MemoryRegion.
+ *
+ * An example #RamDiscardMgr is virtio-mem, which logically (un)plugs
+ * memory within an assigned RAM #MemoryRegion, coordinated with the VM.
+ * Logically unplugging memory consists of discarding RAM. The VM agreed to not
+ * access unplugged (discarded) memory - especially via DMA. virtio-mem will
+ * properly coordinate with listeners before memory is plugged (populated),
+ * and after memory is unplugged (discarded).
+ *
+ * Listeners are called in multiples of the minimum granularity and changes are
+ * aligned to the minimum granularity within the #MemoryRegion. Listeners have
+ * to prepare for memory becoming discarded in a different granularity than it
+ * was populated and the other way around.
+ */
+struct RamDiscardMgrClass {
+    /* private */
+    InterfaceClass parent_class;
+
+    /* public */
+
+    /**
+     * @get_min_granularity:
+     *
+     * Get the minimum granularity in which listeners will get notified
+     * about changes within the #MemoryRegion via the #RamDiscardMgr.
+     *
+     * @rdm: the #RamDiscardMgr
+     * @mr: the #MemoryRegion
+     *
+     * Returns the minimum granularity.
+     */
+    uint64_t (*get_min_granularity)(const RamDiscardMgr *rdm,
+                                    const MemoryRegion *mr);
+
+    /**
+     * @is_populated:
+     *
+     * Check whether the given range within the #MemoryRegion is completely
+     * populated (i.e., no parts are currently discarded). There are no
+     * alignment requirements for the range.
+     *
+     * @rdm: the #RamDiscardMgr
+     * @mr: the #MemoryRegion
+     * @offset: offset into the #MemoryRegion
+     * @size: size in the #MemoryRegion
+     *
+     * Returns whether the given range is completely populated.
+     */
+    bool (*is_populated)(const RamDiscardMgr *rdm, const MemoryRegion *mr,
+                         ram_addr_t offset, ram_addr_t size);
+
+    /**
+     * @register_listener:
+     *
+     * Register a #RamDiscardListener for a #MemoryRegion via the
+     * #RamDiscardMgr and immediately notify the #RamDiscardListener about all
+     * populated parts within the #MemoryRegion via the #RamDiscardMgr.
+     *
+     * In case any notification fails, no further notifications are triggered
+     * and an error is logged.
+     *
+     * @rdm: the #RamDiscardMgr
+     * @mr: the #MemoryRegion
+     * @rdl: the #RamDiscardListener
+     */
+    void (*register_listener)(RamDiscardMgr *rdm, const MemoryRegion *mr,
+                              RamDiscardListener *rdl);
+
+    /**
+     * @unregister_listener:
+     *
+     * Unregister a previously registered #RamDiscardListener for a
+     * #MemoryRegion via the #RamDiscardMgr after notifying the
+     * #RamDiscardListener about all populated parts becoming unpopulated
+     * within the #MemoryRegion via the #RamDiscardMgr.
+     *
+     * @rdm: the #RamDiscardMgr
+     * @mr: the #MemoryRegion
+     * @rdl: the #RamDiscardListener
+     */
+    void (*unregister_listener)(RamDiscardMgr *rdm, const MemoryRegion *mr,
+                                RamDiscardListener *rdl);
+
+    /**
+     * @replay_populated:
+     *
+     * Notify the #RamDiscardListener about all populated parts within the
+     * #MemoryRegion via the #RamDiscardMgr.
+     *
+     * In case any notification fails, no further notifications are triggered.
+     * The listener is not required to be registered.
+     *
+     * @rdm: the #RamDiscardMgr
+     * @mr: the #MemoryRegion
+     * @rdl: the #RamDiscardListener
+     *
+     * Returns 0 on success, or a negative error if any notification failed.
+     */
+    int (*replay_populated)(const RamDiscardMgr *rdm, const MemoryRegion *mr,
+                            RamDiscardListener *rdl);
+};
+
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
 typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
 
@@ -487,6 +683,7 @@ struct MemoryRegion {
     const char *name;
     unsigned ioeventfd_nb;
     MemoryRegionIoeventfd *ioeventfds;
+    RamDiscardMgr *rdm; /* Only for RAM */
 };
 
 struct IOMMUMemoryRegion {
@@ -1979,6 +2176,40 @@ bool memory_region_present(MemoryRegion *container, hwaddr addr);
  */
 bool memory_region_is_mapped(MemoryRegion *mr);
 
+/**
+ * memory_region_get_ram_discard_mgr: get the #RamDiscardMgr for a
+ * #MemoryRegion
+ *
+ * The #RamDiscardMgr cannot change while a memory region is mapped.
+ *
+ * @mr: the #MemoryRegion
+ */
+RamDiscardMgr *memory_region_get_ram_discard_mgr(MemoryRegion *mr);
+
+/**
+ * memory_region_has_ram_discard_mgr: check whether a #MemoryRegion has a
+ * #RamDiscardMgr assigned
+ *
+ * @mr: the #MemoryRegion
+ */
+static inline bool memory_region_has_ram_discard_mgr(MemoryRegion *mr)
+{
+    return !!memory_region_get_ram_discard_mgr(mr);
+}
+
+/**
+ * memory_region_set_ram_discard_mgr: set the #RamDiscardMgr for a
+ * #MemoryRegion
+ *
+ * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion
+ * that does not cover RAM, or a #MemoryRegion that already has a
+ * #RamDiscardMgr assigned.
+ *
+ * @mr: the #MemoryRegion
+ * @rdm: #RamDiscardMgr to set
+ */
+void memory_region_set_ram_discard_mgr(MemoryRegion *mr, RamDiscardMgr *rdm);
+
 /**
  * memory_region_find: translate an address/size relative to a
  * MemoryRegion into a #MemoryRegionSection.
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 874a8fccde..5e7bea7661 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2029,6 +2029,21 @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr)
     return imrc->num_indexes(iommu_mr);
 }
 
+RamDiscardMgr *memory_region_get_ram_discard_mgr(MemoryRegion *mr)
+{
+    if (!memory_region_is_mapped(mr) || !memory_region_is_ram(mr)) {
+        return NULL;
+    }
+    return mr->rdm;
+}
+
+void memory_region_set_ram_discard_mgr(MemoryRegion *mr, RamDiscardMgr *rdm)
+{
+    g_assert(memory_region_is_ram(mr) && !memory_region_is_mapped(mr));
+    g_assert(!rdm || !mr->rdm);
+    mr->rdm = rdm;
+}
+
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 {
     uint8_t mask = 1 << client;
@@ -3309,10 +3324,17 @@ static const TypeInfo iommu_memory_region_info = {
     .abstract           = true,
 };
 
+static const TypeInfo ram_discard_mgr_info = {
+    .parent             = TYPE_INTERFACE,
+    .name               = TYPE_RAM_DISCARD_MGR,
+    .class_size         = sizeof(RamDiscardMgrClass),
+};
+
 static void memory_register_types(void)
 {
     type_register_static(&memory_region_info);
     type_register_static(&iommu_memory_region_info);
+    type_register_static(&ram_discard_mgr_info);
 }
 
 type_init(memory_register_types)
-- 
2.29.2




* [PATCH v6 02/12] virtio-mem: Factor out traversing unplugged ranges
  2021-02-22 11:56 [PATCH v6 00/12] virtio-mem: vfio support David Hildenbrand
  2021-02-22 11:56 ` [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions David Hildenbrand
@ 2021-02-22 11:56 ` David Hildenbrand
  2021-02-22 11:56 ` [PATCH v6 03/12] virtio-mem: Don't report errors when ram_block_discard_range() fails David Hildenbrand
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 11:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, David Hildenbrand,
	Alex Williamson, Peter Xu, Dr . David Alan Gilbert, Auger Eric,
	Pankaj Gupta, teawater, Igor Mammedov, Paolo Bonzini,
	Marek Kedzierski

Let's factor out the core logic, to be reused soon.

Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-mem.c | 86 ++++++++++++++++++++++++------------------
 1 file changed, 49 insertions(+), 37 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 655824ff81..471e464171 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -145,6 +145,33 @@ static bool virtio_mem_is_busy(void)
     return migration_in_incoming_postcopy() || !migration_is_idle();
 }
 
+typedef int (*virtio_mem_range_cb)(const VirtIOMEM *vmem, void *arg,
+                                   uint64_t offset, uint64_t size);
+
+static int virtio_mem_for_each_unplugged_range(const VirtIOMEM *vmem, void *arg,
+                                               virtio_mem_range_cb cb)
+{
+    unsigned long first_zero_bit, last_zero_bit;
+    uint64_t offset, size;
+    int ret = 0;
+
+    first_zero_bit = find_first_zero_bit(vmem->bitmap, vmem->bitmap_size);
+    while (first_zero_bit < vmem->bitmap_size) {
+        offset = first_zero_bit * vmem->block_size;
+        last_zero_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size,
+                                      first_zero_bit + 1) - 1;
+        size = (last_zero_bit - first_zero_bit + 1) * vmem->block_size;
+
+        ret = cb(vmem, arg, offset, size);
+        if (ret) {
+            break;
+        }
+        first_zero_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size,
+                                            last_zero_bit + 2);
+    }
+    return ret;
+}
+
 static bool virtio_mem_test_bitmap(VirtIOMEM *vmem, uint64_t start_gpa,
                                    uint64_t size, bool plugged)
 {
@@ -594,33 +621,27 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     ram_block_discard_require(false);
 }
 
-static int virtio_mem_restore_unplugged(VirtIOMEM *vmem)
+static int virtio_mem_discard_range_cb(const VirtIOMEM *vmem, void *arg,
+                                       uint64_t offset, uint64_t size)
 {
     RAMBlock *rb = vmem->memdev->mr.ram_block;
-    unsigned long first_zero_bit, last_zero_bit;
-    uint64_t offset, length;
     int ret;
 
-    /* Find consecutive unplugged blocks and discard the consecutive range. */
-    first_zero_bit = find_first_zero_bit(vmem->bitmap, vmem->bitmap_size);
-    while (first_zero_bit < vmem->bitmap_size) {
-        offset = first_zero_bit * vmem->block_size;
-        last_zero_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size,
-                                      first_zero_bit + 1) - 1;
-        length = (last_zero_bit - first_zero_bit + 1) * vmem->block_size;
-
-        ret = ram_block_discard_range(rb, offset, length);
-        if (ret) {
-            error_report("Unexpected error discarding RAM: %s",
-                         strerror(-ret));
-            return -EINVAL;
-        }
-        first_zero_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size,
-                                            last_zero_bit + 2);
+    ret = ram_block_discard_range(rb, offset, size);
+    if (ret) {
+        error_report("Unexpected error discarding RAM: %s", strerror(-ret));
+        return -EINVAL;
     }
     return 0;
 }
 
+static int virtio_mem_restore_unplugged(VirtIOMEM *vmem)
+{
+    /* Make sure all memory is really discarded after migration. */
+    return virtio_mem_for_each_unplugged_range(vmem, NULL,
+                                               virtio_mem_discard_range_cb);
+}
+
 static int virtio_mem_post_load(void *opaque, int version_id)
 {
     if (migration_in_incoming_postcopy()) {
@@ -872,28 +893,19 @@ static void virtio_mem_set_block_size(Object *obj, Visitor *v, const char *name,
     vmem->block_size = value;
 }
 
-static void virtio_mem_precopy_exclude_unplugged(VirtIOMEM *vmem)
+static int virtio_mem_precopy_exclude_range_cb(const VirtIOMEM *vmem, void *arg,
+                                               uint64_t offset, uint64_t size)
 {
     void * const host = qemu_ram_get_host_addr(vmem->memdev->mr.ram_block);
-    unsigned long first_zero_bit, last_zero_bit;
-    uint64_t offset, length;
 
-    /*
-     * Find consecutive unplugged blocks and exclude them from migration.
-     *
-     * Note: Blocks cannot get (un)plugged during precopy, no locking needed.
-     */
-    first_zero_bit = find_first_zero_bit(vmem->bitmap, vmem->bitmap_size);
-    while (first_zero_bit < vmem->bitmap_size) {
-        offset = first_zero_bit * vmem->block_size;
-        last_zero_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size,
-                                      first_zero_bit + 1) - 1;
-        length = (last_zero_bit - first_zero_bit + 1) * vmem->block_size;
+    qemu_guest_free_page_hint(host + offset, size);
+    return 0;
+}
 
-        qemu_guest_free_page_hint(host + offset, length);
-        first_zero_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size,
-                                            last_zero_bit + 2);
-    }
+static void virtio_mem_precopy_exclude_unplugged(VirtIOMEM *vmem)
+{
+    virtio_mem_for_each_unplugged_range(vmem, NULL,
+                                        virtio_mem_precopy_exclude_range_cb);
 }
 
 static int virtio_mem_precopy_notify(NotifierWithReturn *n, void *data)
-- 
2.29.2




* [PATCH v6 03/12] virtio-mem: Don't report errors when ram_block_discard_range() fails
  2021-02-22 11:56 [PATCH v6 00/12] virtio-mem: vfio support David Hildenbrand
  2021-02-22 11:56 ` [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions David Hildenbrand
  2021-02-22 11:56 ` [PATCH v6 02/12] virtio-mem: Factor out traversing unplugged ranges David Hildenbrand
@ 2021-02-22 11:56 ` David Hildenbrand
  2021-02-22 11:57 ` [PATCH v6 04/12] virtio-mem: Implement RamDiscardMgr interface David Hildenbrand
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 11:56 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S. Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Alex Williamson,
	teawater, Igor Mammedov, Paolo Bonzini, Marek Kedzierski

Any errors are unexpected and ram_block_discard_range() already properly
prints errors. Let's stop manually reporting errors.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-mem.c | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 471e464171..bbe42ad83b 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -246,17 +246,14 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa,
                                       uint64_t size, bool plug)
 {
     const uint64_t offset = start_gpa - vmem->addr;
-    int ret;
+    RAMBlock *rb = vmem->memdev->mr.ram_block;
 
     if (virtio_mem_is_busy()) {
         return -EBUSY;
     }
 
     if (!plug) {
-        ret = ram_block_discard_range(vmem->memdev->mr.ram_block, offset, size);
-        if (ret) {
-            error_report("Unexpected error discarding RAM: %s",
-                         strerror(-ret));
+        if (ram_block_discard_range(rb, offset, size)) {
             return -EBUSY;
         }
     }
@@ -345,15 +342,12 @@ static void virtio_mem_resize_usable_region(VirtIOMEM *vmem,
 static int virtio_mem_unplug_all(VirtIOMEM *vmem)
 {
     RAMBlock *rb = vmem->memdev->mr.ram_block;
-    int ret;
 
     if (virtio_mem_is_busy()) {
         return -EBUSY;
     }
 
-    ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb));
-    if (ret) {
-        error_report("Unexpected error discarding RAM: %s", strerror(-ret));
+    if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) {
         return -EBUSY;
     }
     bitmap_clear(vmem->bitmap, 0, vmem->bitmap_size);
@@ -625,14 +619,8 @@ static int virtio_mem_discard_range_cb(const VirtIOMEM *vmem, void *arg,
                                        uint64_t offset, uint64_t size)
 {
     RAMBlock *rb = vmem->memdev->mr.ram_block;
-    int ret;
 
-    ret = ram_block_discard_range(rb, offset, size);
-    if (ret) {
-        error_report("Unexpected error discarding RAM: %s", strerror(-ret));
-        return -EINVAL;
-    }
-    return 0;
+    return ram_block_discard_range(rb, offset, size) ? -EINVAL : 0;
 }
 
 static int virtio_mem_restore_unplugged(VirtIOMEM *vmem)
-- 
2.29.2




* [PATCH v6 04/12] virtio-mem: Implement RamDiscardMgr interface
  2021-02-22 11:56 [PATCH v6 00/12] virtio-mem: vfio support David Hildenbrand
                   ` (2 preceding siblings ...)
  2021-02-22 11:56 ` [PATCH v6 03/12] virtio-mem: Don't report errors when ram_block_discard_range() fails David Hildenbrand
@ 2021-02-22 11:57 ` David Hildenbrand
  2021-02-22 11:57 ` [PATCH v6 05/12] vfio: Support for RamDiscardMgr in the !vIOMMU case David Hildenbrand
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 11:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Alex Williamson,
	teawater, Igor Mammedov, Paolo Bonzini, Marek Kedzierski

Let's properly notify when (un)plugging blocks, after discarding memory
and before allowing the guest to consume memory. Handle errors from
notifiers gracefully (e.g., no remaining VFIO mappings) when plugging,
rolling back the change and telling the guest that the VM is busy.

One special case to take care of is replaying all notifications after
restoring the vmstate. The device starts out with all memory discarded,
so after loading the vmstate, we have to notify about all plugged
blocks.
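
Condensed, the plug-path error handling described above looks like this (a
restatement of virtio_mem_notify_plug() from the hunk below with a
hypothetical name, not additional code):

static int virtio_mem_notify_plug_condensed(VirtIOMEM *vmem, uint64_t offset,
                                            uint64_t size)
{
    RamDiscardListener *rdl, *rdl2;
    int ret = 0;

    QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
        ret = rdl->notify_populate(rdl, &vmem->memdev->mr, offset, size);
        if (ret) {
            break;
        }
    }

    if (ret) {
        /* A failed mapping attempt may have populated memory: discard again. */
        ram_block_discard_range(vmem->memdev->mr.ram_block, offset, size);

        /* Listeners notified so far have to drop the range again. */
        QLIST_FOREACH(rdl2, &vmem->rdl_list, next) {
            if (rdl2 == rdl) {
                break;
            }
            rdl2->notify_discard(rdl2, &vmem->memdev->mr, offset, size);
        }
    }
    /* A non-zero result makes the caller answer the guest request as busy. */
    return ret;
}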

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-mem.c         | 253 ++++++++++++++++++++++++++++++++-
 include/hw/virtio/virtio-mem.h |   3 +
 2 files changed, 253 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index bbe42ad83b..3132e4b2b1 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -172,7 +172,105 @@ static int virtio_mem_for_each_unplugged_range(const VirtIOMEM *vmem, void *arg,
     return ret;
 }
 
-static bool virtio_mem_test_bitmap(VirtIOMEM *vmem, uint64_t start_gpa,
+static int virtio_mem_for_each_plugged_range(const VirtIOMEM *vmem, void *arg,
+                                             virtio_mem_range_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    int ret = 0;
+
+    first_bit = find_first_bit(vmem->bitmap, vmem->bitmap_size);
+    while (first_bit < vmem->bitmap_size) {
+        offset = first_bit * vmem->block_size;
+        last_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size,
+                                      first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * vmem->block_size;
+
+        ret = cb(vmem, arg, offset, size);
+        if (ret) {
+            break;
+        }
+        first_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size,
+                                  last_bit + 2);
+    }
+    return ret;
+}
+
+static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset,
+                                     uint64_t size)
+{
+    RamDiscardListener *rdl;
+
+    QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
+        rdl->notify_discard(rdl, &vmem->memdev->mr, offset, size);
+    }
+}
+
+static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
+                                  uint64_t size)
+{
+    RamDiscardListener *rdl, *rdl2;
+    int ret = 0;
+
+    QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
+        ret = rdl->notify_populate(rdl, &vmem->memdev->mr, offset, size);
+        if (ret) {
+            break;
+        }
+    }
+
+    if (ret) {
+        /* Could be a mapping attempt resulted in memory getting populated. */
+        ram_block_discard_range(vmem->memdev->mr.ram_block, offset, size);
+
+        /* Notify all already-notified listeners. */
+        QLIST_FOREACH(rdl2, &vmem->rdl_list, next) {
+            if (rdl2 == rdl) {
+                break;
+            }
+            rdl2->notify_discard(rdl2, &vmem->memdev->mr, offset, size);
+        }
+    }
+    return ret;
+}
+
+static int virtio_mem_notify_discard_range_cb(const VirtIOMEM *vmem, void *arg,
+                                              uint64_t offset, uint64_t size)
+{
+    RamDiscardListener *rdl;
+
+    QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
+        if (!rdl->notify_discard_all) {
+            rdl->notify_discard(rdl, &vmem->memdev->mr, offset, size);
+        }
+    }
+    return 0;
+}
+
+static void virtio_mem_notify_unplug_all(VirtIOMEM *vmem)
+{
+    bool individual_calls = false;
+    RamDiscardListener *rdl;
+
+    if (!vmem->size) {
+        return;
+    }
+
+    QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
+        if (rdl->notify_discard_all) {
+            rdl->notify_discard_all(rdl, &vmem->memdev->mr);
+        } else {
+            individual_calls = true;
+        }
+    }
+
+    if (individual_calls) {
+        virtio_mem_for_each_unplugged_range(vmem, NULL,
+                                            virtio_mem_notify_discard_range_cb);
+    }
+}
+
+static bool virtio_mem_test_bitmap(const VirtIOMEM *vmem, uint64_t start_gpa,
                                    uint64_t size, bool plugged)
 {
     const unsigned long first_bit = (start_gpa - vmem->addr) / vmem->block_size;
@@ -225,7 +323,8 @@ static void virtio_mem_send_response_simple(VirtIOMEM *vmem,
     virtio_mem_send_response(vmem, elem, &resp);
 }
 
-static bool virtio_mem_valid_range(VirtIOMEM *vmem, uint64_t gpa, uint64_t size)
+static bool virtio_mem_valid_range(const VirtIOMEM *vmem, uint64_t gpa,
+                                   uint64_t size)
 {
     if (!QEMU_IS_ALIGNED(gpa, vmem->block_size)) {
         return false;
@@ -256,6 +355,9 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa,
         if (ram_block_discard_range(rb, offset, size)) {
             return -EBUSY;
         }
+        virtio_mem_notify_unplug(vmem, offset, size);
+    } else if (virtio_mem_notify_plug(vmem, offset, size)) {
+        return -EBUSY;
     }
     virtio_mem_set_bitmap(vmem, start_gpa, size, plug);
     return 0;
@@ -350,6 +452,8 @@ static int virtio_mem_unplug_all(VirtIOMEM *vmem)
     if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) {
         return -EBUSY;
     }
+    virtio_mem_notify_unplug_all(vmem);
+
     bitmap_clear(vmem->bitmap, 0, vmem->bitmap_size);
     if (vmem->size) {
         vmem->size = 0;
@@ -598,6 +702,12 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     vmstate_register_ram(&vmem->memdev->mr, DEVICE(vmem));
     qemu_register_reset(virtio_mem_system_reset, vmem);
     precopy_add_notifier(&vmem->precopy_notifier);
+
+    /*
+     * Set ourselves as RamDiscardMgr before the plug handler maps the memory
+     * region and exposes it via an address space.
+     */
+    memory_region_set_ram_discard_mgr(&vmem->memdev->mr, RAM_DISCARD_MGR(vmem));
 }
 
 static void virtio_mem_device_unrealize(DeviceState *dev)
@@ -605,6 +715,11 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOMEM *vmem = VIRTIO_MEM(dev);
 
+    /*
+     * The unplug handler unmapped the memory region, it cannot be
+     * found via an address space anymore. Unset ourselves.
+     */
+    memory_region_set_ram_discard_mgr(&vmem->memdev->mr, NULL);
     precopy_remove_notifier(&vmem->precopy_notifier);
     qemu_unregister_reset(virtio_mem_system_reset, vmem);
     vmstate_unregister_ram(&vmem->memdev->mr, DEVICE(vmem));
@@ -630,13 +745,41 @@ static int virtio_mem_restore_unplugged(VirtIOMEM *vmem)
                                                virtio_mem_discard_range_cb);
 }
 
+static int virtio_mem_post_load_replay_cb(const VirtIOMEM *vmem, void *arg,
+                                          uint64_t offset, uint64_t size)
+{
+    RamDiscardListener *rdl;
+    int ret = 0;
+
+    QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
+        ret = rdl->notify_populate(rdl, &vmem->memdev->mr, offset, size);
+        if (ret) {
+            break;
+        }
+    }
+    return ret;
+}
+
 static int virtio_mem_post_load(void *opaque, int version_id)
 {
+    VirtIOMEM *vmem = VIRTIO_MEM(opaque);
+    int ret;
+
+    /*
+     * We started out with all memory discarded and our memory region is mapped
+     * into an address space. Replay, now that we updated the bitmap.
+     */
+    ret = virtio_mem_for_each_plugged_range(vmem, NULL,
+                                            virtio_mem_post_load_replay_cb);
+    if (ret) {
+        return ret;
+    }
+
     if (migration_in_incoming_postcopy()) {
         return 0;
     }
 
-    return virtio_mem_restore_unplugged(VIRTIO_MEM(opaque));
+    return virtio_mem_restore_unplugged(vmem);
 }
 
 typedef struct VirtIOMEMMigSanityChecks {
@@ -921,6 +1064,7 @@ static void virtio_mem_instance_init(Object *obj)
 
     notifier_list_init(&vmem->size_change_notifiers);
     vmem->precopy_notifier.notify = virtio_mem_precopy_notify;
+    QLIST_INIT(&vmem->rdl_list);
 
     object_property_add(obj, VIRTIO_MEM_SIZE_PROP, "size", virtio_mem_get_size,
                         NULL, NULL, NULL);
@@ -940,11 +1084,104 @@ static Property virtio_mem_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static uint64_t virtio_mem_rdm_get_min_granularity(const RamDiscardMgr *rdm,
+                                                   const MemoryRegion *mr)
+{
+    const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+
+    g_assert(mr == &vmem->memdev->mr);
+    return vmem->block_size;
+}
+
+static bool virtio_mem_rdm_is_populated(const RamDiscardMgr *rdm,
+                                        const MemoryRegion *mr,
+                                        ram_addr_t offset, ram_addr_t size)
+{
+    const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+    uint64_t start_gpa = QEMU_ALIGN_DOWN(vmem->addr + offset, vmem->block_size);
+    uint64_t end_gpa = QEMU_ALIGN_UP(vmem->addr + offset + size,
+                                     vmem->block_size);
+
+    g_assert(mr == &vmem->memdev->mr);
+    if (!virtio_mem_valid_range(vmem, start_gpa, end_gpa - start_gpa)) {
+        return false;
+    }
+
+    return virtio_mem_test_bitmap(vmem, start_gpa, end_gpa - start_gpa, true);
+}
+
+static int virtio_mem_notify_populate_range_single_cb(const VirtIOMEM *vmem,
+                                                      void *arg,
+                                                      uint64_t offset,
+                                                      uint64_t size)
+{
+    RamDiscardListener *rdl = arg;
+
+    return rdl->notify_populate(rdl, &vmem->memdev->mr, offset, size);
+}
+
+static int virtio_mem_notify_discard_range_single_cb(const VirtIOMEM *vmem,
+                                                     void *arg,
+                                                     uint64_t offset,
+                                                     uint64_t size)
+{
+    RamDiscardListener *rdl = arg;
+
+    rdl->notify_discard(rdl, &vmem->memdev->mr, offset, size);
+    return 0;
+}
+
+static void virtio_mem_rdm_register_listener(RamDiscardMgr *rdm,
+                                             const MemoryRegion *mr,
+                                             RamDiscardListener *rdl)
+{
+    VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+    int ret;
+
+    g_assert(mr == &vmem->memdev->mr);
+    QLIST_INSERT_HEAD(&vmem->rdl_list, rdl, next);
+    ret = virtio_mem_for_each_plugged_range(vmem, rdl,
+                                    virtio_mem_notify_populate_range_single_cb);
+    if (ret) {
+        error_report("%s: Replaying plugged ranges failed: %s", __func__,
+                     strerror(-ret));
+    }
+}
+
+static void virtio_mem_rdm_unregister_listener(RamDiscardMgr *rdm,
+                                               const MemoryRegion *mr,
+                                               RamDiscardListener *rdl)
+{
+    VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+
+    g_assert(mr == &vmem->memdev->mr);
+    if (rdl->notify_discard_all) {
+        rdl->notify_discard_all(rdl, &vmem->memdev->mr);
+    } else {
+        virtio_mem_for_each_plugged_range(vmem, rdl,
+                                     virtio_mem_notify_discard_range_single_cb);
+
+    }
+    QLIST_REMOVE(rdl, next);
+}
+
+static int virtio_mem_rdm_replay_populated(const RamDiscardMgr *rdm,
+                                           const MemoryRegion *mr,
+                                           RamDiscardListener *rdl)
+{
+    const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
+
+    g_assert(mr == &vmem->memdev->mr);
+    return virtio_mem_for_each_plugged_range(vmem, rdl,
+                                    virtio_mem_notify_populate_range_single_cb);
+}
+
 static void virtio_mem_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
     VirtIOMEMClass *vmc = VIRTIO_MEM_CLASS(klass);
+    RamDiscardMgrClass *rdmc = RAM_DISCARD_MGR_CLASS(klass);
 
     device_class_set_props(dc, virtio_mem_properties);
     dc->vmsd = &vmstate_virtio_mem;
@@ -960,6 +1197,12 @@ static void virtio_mem_class_init(ObjectClass *klass, void *data)
     vmc->get_memory_region = virtio_mem_get_memory_region;
     vmc->add_size_change_notifier = virtio_mem_add_size_change_notifier;
     vmc->remove_size_change_notifier = virtio_mem_remove_size_change_notifier;
+
+    rdmc->get_min_granularity = virtio_mem_rdm_get_min_granularity;
+    rdmc->is_populated = virtio_mem_rdm_is_populated;
+    rdmc->register_listener = virtio_mem_rdm_register_listener;
+    rdmc->unregister_listener = virtio_mem_rdm_unregister_listener;
+    rdmc->replay_populated = virtio_mem_rdm_replay_populated;
 }
 
 static const TypeInfo virtio_mem_info = {
@@ -969,6 +1212,10 @@ static const TypeInfo virtio_mem_info = {
     .instance_init = virtio_mem_instance_init,
     .class_init = virtio_mem_class_init,
     .class_size = sizeof(VirtIOMEMClass),
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_RAM_DISCARD_MGR },
+        { }
+    },
 };
 
 static void virtio_register_types(void)
diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
index 4eeb82d5dd..9a6e348fa2 100644
--- a/include/hw/virtio/virtio-mem.h
+++ b/include/hw/virtio/virtio-mem.h
@@ -67,6 +67,9 @@ struct VirtIOMEM {
 
     /* don't migrate unplugged memory */
     NotifierWithReturn precopy_notifier;
+
+    /* listeners to notify on plug/unplug activity. */
+    QLIST_HEAD(, RamDiscardListener) rdl_list;
 };
 
 struct VirtIOMEMClass {
-- 
2.29.2




* [PATCH v6 05/12] vfio: Support for RamDiscardMgr in the !vIOMMU case
  2021-02-22 11:56 [PATCH v6 00/12] virtio-mem: vfio support David Hildenbrand
                   ` (3 preceding siblings ...)
  2021-02-22 11:57 ` [PATCH v6 04/12] virtio-mem: Implement RamDiscardMgr interface David Hildenbrand
@ 2021-02-22 11:57 ` David Hildenbrand
  2021-02-22 13:20   ` Paolo Bonzini
  2021-02-22 11:57 ` [PATCH v6 06/12] vfio: Query and store the maximum number of possible DMA mappings David Hildenbrand
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 11:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Alex Williamson,
	teawater, Igor Mammedov, Paolo Bonzini, Marek Kedzierski

Implement support for RamDiscardMgr, to prepare for virtio-mem
support. Instead of mapping the whole memory section, we only map
"populated" parts and update the mapping when notified about
discarding/population of memory via the RamDiscardListener. Similarly, when
syncing the dirty bitmaps, sync only the actually mapped (populated) parts
by replaying via the notifier.

Using virtio-mem with vfio is still blocked via
ram_block_discard_disable()/ram_block_discard_require() after this patch.
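
The dirty-bitmap sync mentioned above works by replaying the populate
notifications into a temporary listener whose populate callback queries the
dirty bitmap instead of mapping. Condensed from
vfio_sync_ram_discard_listener_dirty_bitmap() in the hunk below (hypothetical
name, not additional code):

static int vfio_sync_dirty_condensed(MemoryRegionSection *section,
                                     VFIORamDiscardListener *vrdl)
{
    RamDiscardMgr *rdm = memory_region_get_ram_discard_mgr(section->mr);
    RamDiscardMgrClass *rdmc = RAM_DISCARD_MGR_GET_CLASS(rdm);
    VFIORamDiscardListener tmp_vrdl = *vrdl;

    /* Only currently populated (i.e., mapped) parts get walked and synced. */
    ram_discard_listener_init(&tmp_vrdl.listener,
                              vfio_ram_discard_notify_dirty_bitmap, NULL, NULL);
    return rdmc->replay_populated(rdm, section->mr, &tmp_vrdl.listener);
}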

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/vfio/common.c              | 203 ++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio-common.h |  12 ++
 2 files changed, 215 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6ff1daa763..f68370de6c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -654,6 +654,139 @@ out:
     rcu_read_unlock();
 }
 
+static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
+                                            const MemoryRegion *mr,
+                                            ram_addr_t offset, ram_addr_t size)
+{
+    VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
+                                                listener);
+    const hwaddr mr_start = MAX(offset, vrdl->offset_within_region);
+    const hwaddr mr_end = MIN(offset + size,
+                              vrdl->offset_within_region + vrdl->size);
+    const hwaddr iova = mr_start - vrdl->offset_within_region +
+                        vrdl->offset_within_address_space;
+    int ret;
+
+    if (mr_start >= mr_end) {
+        return;
+    }
+
+    /* Unmap with a single call. */
+    ret = vfio_dma_unmap(vrdl->container, iova, mr_end - mr_start, NULL);
+    if (ret) {
+        error_report("%s: vfio_dma_unmap() failed: %s", __func__,
+                     strerror(-ret));
+    }
+}
+
+static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
+                                            const MemoryRegion *mr,
+                                            ram_addr_t offset, ram_addr_t size)
+{
+    VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
+                                                listener);
+    const hwaddr mr_end = MIN(offset + size,
+                              vrdl->offset_within_region + vrdl->size);
+    hwaddr mr_start = MAX(offset, vrdl->offset_within_region);
+    hwaddr mr_next, iova;
+    void *vaddr;
+    int ret;
+
+    /*
+     * Map in (aligned within memory region) minimum granularity, so we can
+     * unmap in minimum granularity later.
+     */
+    for (; mr_start < mr_end; mr_start = mr_next) {
+        mr_next = ROUND_UP(mr_start + 1, vrdl->granularity);
+        mr_next = MIN(mr_next, mr_end);
+
+        iova = mr_start - vrdl->offset_within_region +
+               vrdl->offset_within_address_space;
+        vaddr = memory_region_get_ram_ptr(vrdl->mr) + mr_start;
+
+        ret = vfio_dma_map(vrdl->container, iova, mr_next - mr_start,
+                           vaddr, mr->readonly);
+        if (ret) {
+            /* Rollback */
+            vfio_ram_discard_notify_discard(rdl, mr, offset, size);
+            return ret;
+        }
+    }
+    return 0;
+}
+
+static void vfio_ram_discard_notify_discard_all(RamDiscardListener *rdl,
+                                                const MemoryRegion *mr)
+{
+    VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
+                                                listener);
+    int ret;
+
+    /* Unmap with a single call. */
+    ret = vfio_dma_unmap(vrdl->container, vrdl->offset_within_address_space,
+                         vrdl->size, NULL);
+    if (ret) {
+        error_report("%s: vfio_dma_unmap() failed: %s", __func__,
+                     strerror(-ret));
+    }
+}
+
+static void vfio_register_ram_discard_notifier(VFIOContainer *container,
+                                               MemoryRegionSection *section)
+{
+    RamDiscardMgr *rdm = memory_region_get_ram_discard_mgr(section->mr);
+    RamDiscardMgrClass *rdmc = RAM_DISCARD_MGR_GET_CLASS(rdm);
+    VFIORamDiscardListener *vrdl;
+
+    vrdl = g_new0(VFIORamDiscardListener, 1);
+    vrdl->container = container;
+    vrdl->mr = section->mr;
+    vrdl->offset_within_region = section->offset_within_region;
+    vrdl->offset_within_address_space = section->offset_within_address_space;
+    vrdl->size = int128_get64(section->size);
+    vrdl->granularity = rdmc->get_min_granularity(rdm, section->mr);
+
+    g_assert(vrdl->granularity && is_power_of_2(vrdl->granularity));
+    g_assert(vrdl->granularity >= 1 << ctz64(container->pgsizes));
+
+    /* Ignore some corner cases not relevant in practice. */
+    g_assert(QEMU_IS_ALIGNED(vrdl->offset_within_region, TARGET_PAGE_SIZE));
+    g_assert(QEMU_IS_ALIGNED(vrdl->offset_within_address_space,
+                             TARGET_PAGE_SIZE));
+    g_assert(QEMU_IS_ALIGNED(vrdl->size, TARGET_PAGE_SIZE));
+
+    ram_discard_listener_init(&vrdl->listener,
+                              vfio_ram_discard_notify_populate,
+                              vfio_ram_discard_notify_discard,
+                              vfio_ram_discard_notify_discard_all);
+    rdmc->register_listener(rdm, section->mr, &vrdl->listener);
+    QLIST_INSERT_HEAD(&container->vrdl_list, vrdl, next);
+}
+
+static void vfio_unregister_ram_discard_listener(VFIOContainer *container,
+                                                 MemoryRegionSection *section)
+{
+    RamDiscardMgr *rdm = memory_region_get_ram_discard_mgr(section->mr);
+    RamDiscardMgrClass *rdmc = RAM_DISCARD_MGR_GET_CLASS(rdm);
+    VFIORamDiscardListener *vrdl = NULL;
+
+    QLIST_FOREACH(vrdl, &container->vrdl_list, next) {
+        if (vrdl->mr == section->mr &&
+            vrdl->offset_within_region == section->offset_within_region) {
+            break;
+        }
+    }
+
+    if (!vrdl) {
+        hw_error("vfio: Trying to unregister missing RAM discard listener");
+    }
+
+    rdmc->unregister_listener(rdm, section->mr, &vrdl->listener);
+    QLIST_REMOVE(vrdl, next);
+
+    g_free(vrdl);
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
@@ -814,6 +947,16 @@ static void vfio_listener_region_add(MemoryListener *listener,
 
     /* Here we assume that memory_region_is_ram(section->mr)==true */
 
+    /*
+     * For RAM memory regions with a RamDiscardMgr, we only want to map the
+     * actually populated parts - and update the mapping whenever we're notified
+     * about changes.
+     */
+    if (memory_region_has_ram_discard_mgr(section->mr)) {
+        vfio_register_ram_discard_notifier(container, section);
+        return;
+    }
+
     vaddr = memory_region_get_ram_ptr(section->mr) +
             section->offset_within_region +
             (iova - section->offset_within_address_space);
@@ -950,6 +1093,10 @@ static void vfio_listener_region_del(MemoryListener *listener,
 
         pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
         try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
+    } else if (memory_region_has_ram_discard_mgr(section->mr)) {
+        vfio_unregister_ram_discard_listener(container, section);
+        /* Unregistering will trigger an unmap. */
+        try_unmap = false;
     }
 
     if (try_unmap) {
@@ -1077,6 +1224,59 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     rcu_read_unlock();
 }
 
+static int vfio_ram_discard_notify_dirty_bitmap(RamDiscardListener *rdl,
+                                                const MemoryRegion *mr,
+                                                ram_addr_t offset,
+                                                ram_addr_t size)
+{
+    VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
+                                                listener);
+    const hwaddr mr_start = MAX(offset, vrdl->offset_within_region);
+    const hwaddr mr_end = MIN(offset + size,
+                              vrdl->offset_within_region + vrdl->size);
+    const hwaddr iova = mr_start - vrdl->offset_within_region +
+                        vrdl->offset_within_address_space;
+    ram_addr_t ram_addr;
+    int ret;
+
+    if (mr_start >= mr_end) {
+        return 0;
+    }
+
+    /*
+     * Sync the whole mapped region (spanning multiple individual mappings)
+     * in one go.
+     */
+    ram_addr = memory_region_get_ram_addr(vrdl->mr) + mr_start;
+    ret = vfio_get_dirty_bitmap(vrdl->container, iova, mr_end - mr_start,
+                                ram_addr);
+    return ret;
+}
+
+static int vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainer *container,
+                                                   MemoryRegionSection *section)
+{
+    RamDiscardMgr *rdm = memory_region_get_ram_discard_mgr(section->mr);
+    RamDiscardMgrClass *rdmc = RAM_DISCARD_MGR_GET_CLASS(rdm);
+    VFIORamDiscardListener tmp_vrdl, *vrdl = NULL;
+
+    QLIST_FOREACH(vrdl, &container->vrdl_list, next) {
+        if (vrdl->mr == section->mr &&
+            vrdl->offset_within_region == section->offset_within_region) {
+            break;
+        }
+    }
+
+    if (!vrdl) {
+        hw_error("vfio: Trying to sync missing RAM discard listener");
+    }
+
+    tmp_vrdl = *vrdl;
+    ram_discard_listener_init(&tmp_vrdl.listener,
+                              vfio_ram_discard_notify_dirty_bitmap, NULL, NULL);
+    return rdmc->replay_populated(rdm, section->mr, &tmp_vrdl.listener);
+}
+
 static int vfio_sync_dirty_bitmap(VFIOContainer *container,
                                   MemoryRegionSection *section)
 {
@@ -1108,6 +1308,8 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container,
             }
         }
         return 0;
+    } else if (memory_region_has_ram_discard_mgr(section->mr)) {
+        return vfio_sync_ram_discard_listener_dirty_bitmap(container, section);
     }
 
     ram_addr = memory_region_get_ram_addr(section->mr) +
@@ -1737,6 +1939,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container->dirty_pages_supported = false;
     QLIST_INIT(&container->giommu_list);
     QLIST_INIT(&container->hostwin_list);
+    QLIST_INIT(&container->vrdl_list);
 
     ret = vfio_init_container(container, group->fd, errp);
     if (ret) {
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 6141162d7a..af6f8d1b22 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -91,6 +91,7 @@ typedef struct VFIOContainer {
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
+    QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
     QLIST_ENTRY(VFIOContainer) next;
 } VFIOContainer;
 
@@ -102,6 +103,17 @@ typedef struct VFIOGuestIOMMU {
     QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
 } VFIOGuestIOMMU;
 
+typedef struct VFIORamDiscardListener {
+    VFIOContainer *container;
+    MemoryRegion *mr;
+    hwaddr offset_within_region;
+    hwaddr offset_within_address_space;
+    hwaddr size;
+    uint64_t granularity;
+    RamDiscardListener listener;
+    QLIST_ENTRY(VFIORamDiscardListener) next;
+} VFIORamDiscardListener;
+
 typedef struct VFIOHostDMAWindow {
     hwaddr min_iova;
     hwaddr max_iova;
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v6 06/12] vfio: Query and store the maximum number of possible DMA mappings
  2021-02-22 11:56 [PATCH v6 00/12] virtio-mem: vfio support David Hildenbrand
                   ` (4 preceding siblings ...)
  2021-02-22 11:57 ` [PATCH v6 05/12] vfio: Support for RamDiscardMgr in the !vIOMMU case David Hildenbrand
@ 2021-02-22 11:57 ` David Hildenbrand
  2021-02-22 11:57 ` [PATCH v6 07/12] vfio: Sanity check maximum number of DMA mappings with RamDiscardMgr David Hildenbrand
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 11:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Alex Williamson,
	teawater, Igor Mammedov, Paolo Bonzini, Marek Kedzierski

Let's query the maximum number of possible DMA mappings by querying the
available mappings when creating the container (before any mappings are
created). We'll use this information soon to perform some sanity checks
and warn the user.
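
For context, the kernel reports this value via a capability chained to the
VFIO_IOMMU_GET_INFO reply (VFIO_IOMMU_TYPE1_INFO_CAP_DMA_AVAIL). Purely as an
illustration - the names below are UAPI definitions from linux/vfio.h, and
this is only a rough sketch of what a helper like the
vfio_get_info_dma_avail() call used in the hunk below does, not the actual
implementation:

static bool get_dma_avail_sketch(struct vfio_iommu_type1_info *info,
                                 unsigned int *avail)
{
    struct vfio_info_cap_header *hdr;

    /* Capabilities are only present when the kernel sets this flag. */
    if (!(info->flags & VFIO_IOMMU_INFO_CAPS) || !info->cap_offset) {
        return false;
    }
    /* Walk the cap chain; 'next' is an offset into 'info', 0 terminates. */
    for (hdr = (void *)info + info->cap_offset; ;
         hdr = (void *)info + hdr->next) {
        if (hdr->id == VFIO_IOMMU_TYPE1_INFO_CAP_DMA_AVAIL) {
            struct vfio_iommu_type1_info_dma_avail *cap = (void *)hdr;

            *avail = cap->avail;
            return true;
        }
        if (!hdr->next) {
            return false;
        }
    }
}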

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/vfio/common.c              | 4 ++++
 include/hw/vfio/vfio-common.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f68370de6c..78be813a53 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1937,6 +1937,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container->fd = fd;
     container->error = NULL;
     container->dirty_pages_supported = false;
+    container->dma_max_mappings = 0;
     QLIST_INIT(&container->giommu_list);
     QLIST_INIT(&container->hostwin_list);
     QLIST_INIT(&container->vrdl_list);
@@ -1968,7 +1969,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         vfio_host_win_add(container, 0, (hwaddr)-1, info->iova_pgsizes);
         container->pgsizes = info->iova_pgsizes;
 
+        /* The default in the kernel ("dma_entry_limit") is 65535. */
+        container->dma_max_mappings = 65535;
         if (!ret) {
+            vfio_get_info_dma_avail(info, &container->dma_max_mappings);
             vfio_get_iommu_info_migration(container, info);
         }
         g_free(info);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index af6f8d1b22..4b28c6e8ac 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -88,6 +88,7 @@ typedef struct VFIOContainer {
     uint64_t dirty_pgsizes;
     uint64_t max_dirty_bitmap_size;
     unsigned long pgsizes;
+    unsigned int dma_max_mappings;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v6 07/12] vfio: Sanity check maximum number of DMA mappings with RamDiscardMgr
  2021-02-22 11:56 [PATCH v6 00/12] virtio-mem: vfio support David Hildenbrand
                   ` (5 preceding siblings ...)
  2021-02-22 11:57 ` [PATCH v6 06/12] vfio: Query and store the maximum number of possible DMA mappings David Hildenbrand
@ 2021-02-22 11:57 ` David Hildenbrand
  2021-02-22 11:57 ` [PATCH v6 08/12] vfio: Support for RamDiscardMgr in the vIOMMU case David Hildenbrand
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 11:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Alex Williamson,
	teawater, Igor Mammedov, Paolo Bonzini, Marek Kedzierski

Although RamDiscardMgr can handle running into the maximum number of
DMA mappings by propagating errors when creating a DMA mapping, we want
to sanity-check and warn the user early that there is a theoretical setup
issue and that virtio-mem might not be able to provide as much memory
to a VM as desired.

As suggested by Alex, let's use the number of KVM memory slots to guess
how many other mappings we might see over time.
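
To give a rough feeling for the numbers (illustrative only, using the
kernel default dma_entry_limit of 65535 and the fallback guess of 512
memslots): a single 64 GiB virtio-mem device with a 2 MiB block size
contributes 64 GiB / 2 MiB = 32768 potential mappings and stays well below
the limit, while a 256 GiB device with the same block size would contribute
131072 potential mappings and trigger the warning; bumping its block size
to 8 MiB brings that back down to 32768.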

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/vfio/common.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 78be813a53..166ec6ec62 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -761,6 +761,49 @@ static void vfio_register_ram_discard_notifier(VFIOContainer *container,
                               vfio_ram_discard_notify_discard_all);
     rdmc->register_listener(rdm, section->mr, &vrdl->listener);
     QLIST_INSERT_HEAD(&container->vrdl_list, vrdl, next);
+
+    /*
+     * Sanity-check if we have a theoretically problematic setup where we could
+     * exceed the maximum number of possible DMA mappings over time. We assume
+     * that each mapped section in the same address space as a RamDiscardMgr
+     * section consumes exactly one DMA mapping, with the exception of
+     * RamDiscardMgr sections; i.e., we don't expect to have gIOMMU sections in
+     * the same address space as RamDiscardMgr sections.
+     *
+     * We assume that each section in the address space consumes one memslot.
+     * We take the number of KVM memory slots as a best guess for the maximum
+     * number of sections in the address space we could have over time,
+     * also consuming DMA mappings.
+     */
+    if (container->dma_max_mappings) {
+        unsigned int vrdl_count = 0, vrdl_mappings = 0, max_memslots = 512;
+
+#ifdef CONFIG_KVM
+        if (kvm_enabled()) {
+            max_memslots = kvm_get_max_memslots();
+        }
+#endif
+
+        QLIST_FOREACH(vrdl, &container->vrdl_list, next) {
+            hwaddr start, end;
+
+            start = QEMU_ALIGN_DOWN(vrdl->offset_within_address_space,
+                                    vrdl->granularity);
+            end = ROUND_UP(vrdl->offset_within_address_space + vrdl->size,
+                           vrdl->granularity);
+            vrdl_mappings += (end - start) / vrdl->granularity;
+            vrdl_count++;
+        }
+
+        if (vrdl_mappings + max_memslots - vrdl_count >
+            container->dma_max_mappings) {
+            warn_report("%s: possibly running out of DMA mappings. E.g., try"
+                        " increasing the 'block-size' of virtio-mem devices."
+                        " Maximum possible DMA mappings: %d, Maximum possible"
+                        " memslots: %d", __func__, container->dma_max_mappings,
+                        max_memslots);
+        }
+    }
 }
 
 static void vfio_unregister_ram_discard_listener(VFIOContainer *container,
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v6 08/12] vfio: Support for RamDiscardMgr in the vIOMMU case
  2021-02-22 11:56 [PATCH v6 00/12] virtio-mem: vfio support David Hildenbrand
                   ` (6 preceding siblings ...)
  2021-02-22 11:57 ` [PATCH v6 07/12] vfio: Sanity check maximum number of DMA mappings with RamDiscardMgr David Hildenbrand
@ 2021-02-22 11:57 ` David Hildenbrand
  2021-02-22 11:57 ` [PATCH v6 09/12] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require) David Hildenbrand
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 11:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Alex Williamson,
	teawater, Igor Mammedov, Paolo Bonzini, Marek Kedzierski

vIOMMU support works already with RamDiscardMgr as long as guests only
map populated memory. Both populated and discarded memory are mapped
into &address_space_memory, where vfio_get_xlat_addr() will find that
memory when creating the vfio mapping.

Sane guests will never map discarded memory (e.g., unplugged memory
blocks in virtio-mem) into an IOMMU - or keep it mapped into an IOMMU while
memory is getting discarded. However, there are two cases where a malicious
guest could trigger pinning of more memory than intended.

One case is easy to handle: the guest trying to map discarded memory
into an IOMMU.

The other case is harder to handle: the guest keeping memory mapped in
the IOMMU while it is getting discarded. We would have to walk over all
mappings when discarding memory and identify if any mapping would be a
violation. Let's keep it simple for now and print a warning, indicating
that setting RLIMIT_MEMLOCK can mitigate such attacks.

We have to take care of incoming migration: at the point the
IOMMUs get restored and start creating mappings in vfio, RamDiscardMgr
implementations might not be back up and running yet. Let's add vmstate
priorities to enforce the order when restoring.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/vfio/common.c            | 35 +++++++++++++++++++++++++++++++++++
 hw/virtio/virtio-mem.c      |  1 +
 include/migration/vmstate.h |  1 +
 3 files changed, 37 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 166ec6ec62..15ecd05a4b 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -36,6 +36,7 @@
 #include "qemu/range.h"
 #include "sysemu/kvm.h"
 #include "sysemu/reset.h"
+#include "sysemu/runstate.h"
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/migration.h"
@@ -574,6 +575,40 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
         error_report("iommu map to non memory area %"HWADDR_PRIx"",
                      xlat);
         return false;
+    } else if (memory_region_has_ram_discard_mgr(mr)) {
+        RamDiscardMgr *rdm = memory_region_get_ram_discard_mgr(mr);
+        RamDiscardMgrClass *rdmc = RAM_DISCARD_MGR_GET_CLASS(rdm);
+
+        /*
+         * Malicious VMs can map memory into the IOMMU, which is expected
+         * to remain discarded. vfio will pin all pages, populating memory.
+         * Disallow that. vmstate priorities make sure any RamDiscardMgr was
+         * already restored before IOMMUs are restored.
+         */
+        if (!rdmc->is_populated(rdm, mr, xlat, len)) {
+            error_report("iommu map to discarded memory (e.g., unplugged via"
+                         " virtio-mem): %"HWADDR_PRIx"",
+                         iotlb->translated_addr);
+            return false;
+        }
+
+        /*
+         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
+         * pages will remain pinned inside vfio until unmapped, resulting in a
+         * higher memory consumption than expected. If memory would get
+         * populated again later, there would be an inconsistency between pages
+         * pinned by vfio and pages seen by QEMU. This is the case until
+         * unmapped from the IOMMU (e.g., during device reset).
+         *
+         * With malicious guests, we really only care about pinning more memory
+         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
+         * exceeded and can be used to mitigate this problem.
+         */
+        warn_report_once("Using vfio with vIOMMUs and coordinated discarding of"
+                         " RAM (e.g., virtio-mem) works, however, malicious"
+                         " guests can trigger pinning of more memory than"
+                         " intended via an IOMMU. It's possible to mitigate"
+                         " by setting/adjusting RLIMIT_MEMLOCK.");
     }
 
     /*
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 3132e4b2b1..194fb56a9a 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -854,6 +854,7 @@ static const VMStateDescription vmstate_virtio_mem_device = {
     .name = "virtio-mem-device",
     .minimum_version_id = 1,
     .version_id = 1,
+    .priority = MIG_PRI_VIRTIO_MEM,
     .post_load = virtio_mem_post_load,
     .fields = (VMStateField[]) {
         VMSTATE_WITH_TMP(VirtIOMEM, VirtIOMEMMigSanityChecks,
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 075ee80096..3bf58ff043 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -153,6 +153,7 @@ typedef enum {
     MIG_PRI_DEFAULT = 0,
     MIG_PRI_IOMMU,              /* Must happen before PCI devices */
     MIG_PRI_PCI_BUS,            /* Must happen before IOMMU */
+    MIG_PRI_VIRTIO_MEM,         /* Must happen before IOMMU */
     MIG_PRI_GICV3_ITS,          /* Must happen before PCI devices */
     MIG_PRI_GICV3,              /* Must happen before the ITS */
     MIG_PRI_MAX,
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v6 09/12] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require)
  2021-02-22 11:56 [PATCH v6 00/12] virtio-mem: vfio support David Hildenbrand
                   ` (7 preceding siblings ...)
  2021-02-22 11:57 ` [PATCH v6 08/12] vfio: Support for RamDiscardMgr in the vIOMMU case David Hildenbrand
@ 2021-02-22 11:57 ` David Hildenbrand
  2021-02-22 13:14   ` Paolo Bonzini
  2021-02-22 11:57 ` [PATCH v6 10/12] softmmu/physmem: Extend ram_block_discard_(require|disable) by two discard types David Hildenbrand
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 11:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, David Hildenbrand,
	Alex Williamson, Peter Xu, Dr . David Alan Gilbert, Auger Eric,
	Pankaj Gupta, teawater, Igor Mammedov, Paolo Bonzini,
	Marek Kedzierski

We have users in migration context that don't hold the BQL (when
finishing migration). To prepare for further changes, use a dedicated mutex
instead of atomic operations. Keep using qatomic_read ("READ_ONCE") for the
functions that only extract the current state (e.g., used by
virtio-balloon); locking isn't necessary there.

While at it, split up the counter into two variables to make it easier
to understand.

Suggested-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 softmmu/physmem.c | 70 ++++++++++++++++++++++++++---------------------
 1 file changed, 39 insertions(+), 31 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 19e0aa9836..6550217c26 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3677,56 +3677,64 @@ void mtree_print_dispatch(AddressSpaceDispatch *d, MemoryRegion *root)
     }
 }
 
-/*
- * If positive, discarding RAM is disabled. If negative, discarding RAM is
- * required to work and cannot be disabled.
- */
-static int ram_block_discard_disabled;
+static unsigned int ram_block_discard_requirers;
+static unsigned int ram_block_discard_disablers;
+static QemuMutex ram_block_discard_disable_mutex;
+
+static void ram_block_discard_disable_mutex_lock(void)
+{
+    static gsize initialized;
+
+    if (g_once_init_enter(&initialized)) {
+        qemu_mutex_init(&ram_block_discard_disable_mutex);
+        g_once_init_leave(&initialized, 1);
+    }
+    qemu_mutex_lock(&ram_block_discard_disable_mutex);
+}
+
+static void ram_block_discard_disable_mutex_unlock(void)
+{
+    qemu_mutex_unlock(&ram_block_discard_disable_mutex);
+}
 
 int ram_block_discard_disable(bool state)
 {
-    int old;
+    int ret = 0;
 
+    ram_block_discard_disable_mutex_lock();
     if (!state) {
-        qatomic_dec(&ram_block_discard_disabled);
-        return 0;
+        ram_block_discard_disablers--;
+    } else if (!ram_block_discard_requirers) {
+        ram_block_discard_disablers++;
+    } else {
+        ret = -EBUSY;
     }
-
-    do {
-        old = qatomic_read(&ram_block_discard_disabled);
-        if (old < 0) {
-            return -EBUSY;
-        }
-    } while (qatomic_cmpxchg(&ram_block_discard_disabled,
-                             old, old + 1) != old);
-    return 0;
+    ram_block_discard_disable_mutex_unlock();
+    return ret;
 }
 
 int ram_block_discard_require(bool state)
 {
-    int old;
+    int ret = 0;
 
+    ram_block_discard_disable_mutex_lock();
     if (!state) {
-        qatomic_inc(&ram_block_discard_disabled);
-        return 0;
+        ram_block_discard_requirers--;
+    } else if (!ram_block_discard_disablers) {
+        ram_block_discard_requirers++;
+    } else {
+        ret = -EBUSY;
     }
-
-    do {
-        old = qatomic_read(&ram_block_discard_disabled);
-        if (old > 0) {
-            return -EBUSY;
-        }
-    } while (qatomic_cmpxchg(&ram_block_discard_disabled,
-                             old, old - 1) != old);
-    return 0;
+    ram_block_discard_disable_mutex_unlock();
+    return ret;
 }
 
 bool ram_block_discard_is_disabled(void)
 {
-    return qatomic_read(&ram_block_discard_disabled) > 0;
+    return qatomic_read(&ram_block_discard_disablers);
 }
 
 bool ram_block_discard_is_required(void)
 {
-    return qatomic_read(&ram_block_discard_disabled) < 0;
+    return qatomic_read(&ram_block_discard_requirers);
 }
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v6 10/12] softmmu/physmem: Extend ram_block_discard_(require|disable) by two discard types
  2021-02-22 11:56 [PATCH v6 00/12] virtio-mem: vfio support David Hildenbrand
                   ` (8 preceding siblings ...)
  2021-02-22 11:57 ` [PATCH v6 09/12] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require) David Hildenbrand
@ 2021-02-22 11:57 ` David Hildenbrand
  2021-02-22 11:57 ` [PATCH v6 11/12] virtio-mem: Require only coordinated discards David Hildenbrand
  2021-02-22 11:57 ` [PATCH v6 12/12] vfio: Disable only uncoordinated discards for VFIO_TYPE1 iommus David Hildenbrand
  11 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 11:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, David Hildenbrand,
	Alex Williamson, Peter Xu, Dr . David Alan Gilbert, Auger Eric,
	Pankaj Gupta, teawater, Igor Mammedov, Paolo Bonzini,
	Marek Kedzierski

We want to separate the two cases in which we discard RAM:
- uncoordinated: e.g., virtio-balloon
- coordinated: e.g., virtio-mem, coordinated via the RamDiscardMgr

Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/exec/memory.h | 18 +++++++++++++--
 softmmu/physmem.c     | 54 ++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 64 insertions(+), 8 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 6132910767..fa41c1aee8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2818,6 +2818,12 @@ static inline MemOp devend_memop(enum device_endian end)
  */
 int ram_block_discard_disable(bool state);
 
+/*
+ * See ram_block_discard_disable(): only disable uncoordinated discards,
+ * keeping coordinated discards (via the RamDiscardMgr) enabled.
+ */
+int ram_block_uncoordinated_discard_disable(bool state);
+
 /*
  * Inhibit technologies that disable discarding of pages in RAM blocks.
  *
@@ -2827,12 +2833,20 @@ int ram_block_discard_disable(bool state);
 int ram_block_discard_require(bool state);
 
 /*
- * Test if discarding of memory in ram blocks is disabled.
+ * See ram_block_discard_require(): only inhibit technologies that disable
+ * uncoordinated discarding of pages in RAM blocks, allowing co-existence with
+ * technologies that only inhibit uncoordinated discards (via the
+ * RamDiscardMgr).
+ */
+int ram_block_coordinated_discard_require(bool state);
+
+/*
+ * Test if any discarding of memory in ram blocks is disabled.
  */
 bool ram_block_discard_is_disabled(void);
 
 /*
- * Test if discarding of memory in ram blocks is required to work reliably.
+ * Test if any discarding of memory in ram blocks is required to work reliably.
  */
 bool ram_block_discard_is_required(void);
 
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 6550217c26..19f19ad3a8 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3677,8 +3677,14 @@ void mtree_print_dispatch(AddressSpaceDispatch *d, MemoryRegion *root)
     }
 }
 
+/* Require any discards to work. */
 static unsigned int ram_block_discard_requirers;
+/* Require only coordinated discards to work. */
+static unsigned int ram_block_coordinated_discard_requirers;
+/* Disable any discards. */
 static unsigned int ram_block_discard_disablers;
+/* Disable only uncoordinated discards. */
+static unsigned int ram_block_uncoordinated_discard_disablers;
 static QemuMutex ram_block_discard_disable_mutex;
 
 static void ram_block_discard_disable_mutex_lock(void)
@@ -3704,10 +3710,27 @@ int ram_block_discard_disable(bool state)
     ram_block_discard_disable_mutex_lock();
     if (!state) {
         ram_block_discard_disablers--;
-    } else if (!ram_block_discard_requirers) {
-        ram_block_discard_disablers++;
+    } else if (ram_block_discard_requirers ||
+               ram_block_coordinated_discard_requirers) {
+        ret = -EBUSY;
     } else {
+        ram_block_discard_disablers++;
+    }
+    ram_block_discard_disable_mutex_unlock();
+    return ret;
+}
+
+int ram_block_uncoordinated_discard_disable(bool state)
+{
+    int ret = 0;
+
+    ram_block_discard_disable_mutex_lock();
+    if (!state) {
+        ram_block_uncoordinated_discard_disablers--;
+    } else if (ram_block_discard_requirers) {
         ret = -EBUSY;
+    } else {
+        ram_block_uncoordinated_discard_disablers++;
     }
     ram_block_discard_disable_mutex_unlock();
     return ret;
@@ -3720,10 +3743,27 @@ int ram_block_discard_require(bool state)
     ram_block_discard_disable_mutex_lock();
     if (!state) {
         ram_block_discard_requirers--;
-    } else if (!ram_block_discard_disablers) {
-        ram_block_discard_requirers++;
+    } else if (ram_block_discard_disablers ||
+               ram_block_uncoordinated_discard_disablers) {
+        ret = -EBUSY;
     } else {
+        ram_block_discard_requirers++;
+    }
+    ram_block_discard_disable_mutex_unlock();
+    return ret;
+}
+
+int ram_block_coordinated_discard_require(bool state)
+{
+    int ret = 0;
+
+    ram_block_discard_disable_mutex_lock();
+    if (!state) {
+        ram_block_coordinated_discard_requirers--;
+    } else if (ram_block_discard_disablers) {
         ret = -EBUSY;
+    } else {
+        ram_block_coordinated_discard_requirers++;
     }
     ram_block_discard_disable_mutex_unlock();
     return ret;
@@ -3731,10 +3771,12 @@ int ram_block_discard_require(bool state)
 
 bool ram_block_discard_is_disabled(void)
 {
-    return qatomic_read(&ram_block_discard_disablers);
+    return qatomic_read(&ram_block_discard_disablers) ||
+           qatomic_read(&ram_block_uncoordinated_discard_disablers);
 }
 
 bool ram_block_discard_is_required(void)
 {
-    return qatomic_read(&ram_block_discard_requirers);
+    return qatomic_read(&ram_block_discard_requirers) ||
+           qatomic_read(&ram_block_coordinated_discard_requirers);
 }
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v6 11/12] virtio-mem: Require only coordinated discards
  2021-02-22 11:56 [PATCH v6 00/12] virtio-mem: vfio support David Hildenbrand
                   ` (9 preceding siblings ...)
  2021-02-22 11:57 ` [PATCH v6 10/12] softmmu/physmem: Extend ram_block_discard_(require|disable) by two discard types David Hildenbrand
@ 2021-02-22 11:57 ` David Hildenbrand
  2021-02-22 11:57 ` [PATCH v6 12/12] vfio: Disable only uncoordinated discards for VFIO_TYPE1 iommus David Hildenbrand
  11 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 11:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, David Hildenbrand,
	Alex Williamson, Peter Xu, Dr . David Alan Gilbert, Auger Eric,
	Pankaj Gupta, teawater, Igor Mammedov, Paolo Bonzini,
	Marek Kedzierski

We implement the RamDiscardMgr interface and only require coordinated
discarding of RAM to work.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-mem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 194fb56a9a..9c36e7f96d 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -676,7 +676,7 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
         return;
     }
 
-    if (ram_block_discard_require(true)) {
+    if (ram_block_coordinated_discard_require(true)) {
         error_setg(errp, "Discarding RAM is disabled");
         return;
     }
@@ -684,7 +684,7 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb));
     if (ret) {
         error_setg_errno(errp, -ret, "Unexpected error discarding RAM");
-        ram_block_discard_require(false);
+        ram_block_coordinated_discard_require(false);
         return;
     }
 
@@ -727,7 +727,7 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     virtio_del_queue(vdev, 0);
     virtio_cleanup(vdev);
     g_free(vmem->bitmap);
-    ram_block_discard_require(false);
+    ram_block_coordinated_discard_require(false);
 }
 
 static int virtio_mem_discard_range_cb(const VirtIOMEM *vmem, void *arg,
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v6 12/12] vfio: Disable only uncoordinated discards for VFIO_TYPE1 iommus
  2021-02-22 11:56 [PATCH v6 00/12] virtio-mem: vfio support David Hildenbrand
                   ` (10 preceding siblings ...)
  2021-02-22 11:57 ` [PATCH v6 11/12] virtio-mem: Require only coordinated discards David Hildenbrand
@ 2021-02-22 11:57 ` David Hildenbrand
  11 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 11:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, David Hildenbrand,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Alex Williamson,
	teawater, Igor Mammedov, Paolo Bonzini, Marek Kedzierski

We support coordinated discarding of RAM using the RamDiscardMgr for
the VFIO_TYPE1 iommus. Let's unlock support for coordinated discards,
keeping uncoordinated discards (e.g., via virtio-balloon) disabled if
possible.

This unlocks virtio-mem + vfio on x86-64. Note that vfio used via "nvme://"
by the block layer has to be implemented/unlocked separately. For now,
virtio-mem only supports x86-64; we don't restrict RamDiscardMgr to x86-64,
though: arm64 and s390x are supposed to work as well, and we'll test them
once virtio-mem support is unlocked there. The spapr IOMMUs will need
special care, to be tackled later, e.g., once they support virtio-mem.

Note: The block size of a virtio-mem device has to be set to a sane size,
depending on the maximum hotplug size - to not run out of vfio mappings.
The default virtio-mem block size is usually in the range of a couple of
MBs. The maximum number of mappings is 64k, shared with other users.
Assume you want to hotplug 256GB using virtio-mem - the block size would
have to be set to at least 8 MiB (resulting in 32768 separate mappings).
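
For illustration only - a sketch of such a setup using the usual
memory-backend-ram and virtio-mem-pci properties (nothing here is
introduced by this patch; ids and sizes are made up):

  qemu-system-x86_64 ... \
    -m 4G,maxmem=260G \
    -object memory-backend-ram,id=mem0,size=256G \
    -device virtio-mem-pci,id=vmem0,memdev=mem0,block-size=8M,requested-size=0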

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/vfio/common.c | 63 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 51 insertions(+), 12 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 15ecd05a4b..d879b8ab92 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -135,6 +135,27 @@ static const char *index_to_str(VFIODevice *vbasedev, int index)
     }
 }
 
+static int vfio_ram_block_discard_disable(VFIOContainer *container, bool state)
+{
+    switch (container->iommu_type) {
+    case VFIO_TYPE1v2_IOMMU:
+    case VFIO_TYPE1_IOMMU:
+        /* We support coordinated discarding of RAM via the RamDiscardMgr. */
+        return ram_block_uncoordinated_discard_disable(state);
+    default:
+        /*
+         * VFIO_SPAPR_TCE_IOMMU most probably works just fine with
+         * RamDiscardMgr, however, it is completely untested.
+         *
+         * VFIO_SPAPR_TCE_v2_IOMMU with "DMA memory preregistering" does
+         * completely the opposite of managing mapping/pinning dynamically as
+         * required by RamDiscardMgr. We would have to special-case sections
+         * with a RamDiscardMgr.
+         */
+        return ram_block_discard_disable(state);
+    }
+}
+
 int vfio_set_irq_signaling(VFIODevice *vbasedev, int index, int subindex,
                            int action, int fd, Error **errp)
 {
@@ -1979,15 +2000,25 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
      * new memory, it will not yet set ram_block_discard_set_required() and
      * therefore, neither stops us here or deals with the sudden memory
      * consumption of inflated memory.
+     *
+     * We do support discarding of memory coordinated via the RamDiscardMgr
+     * with some IOMMU types. vfio_ram_block_discard_disable() handles the
+     * details once we know which type of IOMMU we are using.
      */
-    ret = ram_block_discard_disable(true);
-    if (ret) {
-        error_setg_errno(errp, -ret, "Cannot set discarding of RAM broken");
-        return ret;
-    }
 
     QLIST_FOREACH(container, &space->containers, next) {
         if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
+            ret = vfio_ram_block_discard_disable(container, true);
+            if (ret) {
+                error_setg_errno(errp, -ret,
+                                 "Cannot set discarding of RAM broken");
+                if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER,
+                          &container->fd)) {
+                    error_report("vfio: error disconnecting group %d from"
+                                 " container", group->groupid);
+                }
+                return ret;
+            }
             group->container = container;
             QLIST_INSERT_HEAD(&container->group_list, group, container_next);
             vfio_kvm_device_add_group(group);
@@ -2025,6 +2056,12 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         goto free_container_exit;
     }
 
+    ret = vfio_ram_block_discard_disable(container, true);
+    if (ret) {
+        error_setg_errno(errp, -ret, "Cannot set discarding of RAM broken");
+        goto free_container_exit;
+    }
+
     switch (container->iommu_type) {
     case VFIO_TYPE1v2_IOMMU:
     case VFIO_TYPE1_IOMMU:
@@ -2072,7 +2109,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
             if (ret) {
                 error_setg_errno(errp, errno, "failed to enable container");
                 ret = -errno;
-                goto free_container_exit;
+                goto enable_discards_exit;
             }
         } else {
             container->prereg_listener = vfio_prereg_listener;
@@ -2084,7 +2121,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
                 ret = -1;
                 error_propagate_prepend(errp, container->error,
                     "RAM memory listener initialization failed: ");
-                goto free_container_exit;
+                goto enable_discards_exit;
             }
         }
 
@@ -2097,7 +2134,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
             if (v2) {
                 memory_listener_unregister(&container->prereg_listener);
             }
-            goto free_container_exit;
+            goto enable_discards_exit;
         }
 
         if (v2) {
@@ -2112,7 +2149,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
             if (ret) {
                 error_setg_errno(errp, -ret,
                                  "failed to remove existing window");
-                goto free_container_exit;
+                goto enable_discards_exit;
             }
         } else {
             /* The default table uses 4K pages */
@@ -2153,6 +2190,9 @@ listener_release_exit:
     vfio_kvm_device_del_group(group);
     vfio_listener_release(container);
 
+enable_discards_exit:
+    vfio_ram_block_discard_disable(container, false);
+
 free_container_exit:
     g_free(container);
 
@@ -2160,7 +2200,6 @@ close_fd_exit:
     close(fd);
 
 put_space_exit:
-    ram_block_discard_disable(false);
     vfio_put_address_space(space);
 
     return ret;
@@ -2282,7 +2321,7 @@ void vfio_put_group(VFIOGroup *group)
     }
 
     if (!group->ram_block_discard_allowed) {
-        ram_block_discard_disable(false);
+        vfio_ram_block_discard_disable(group->container, false);
     }
     vfio_kvm_device_del_group(group);
     vfio_disconnect_container(group);
@@ -2336,7 +2375,7 @@ int vfio_get_device(VFIOGroup *group, const char *name,
 
         if (!group->ram_block_discard_allowed) {
             group->ram_block_discard_allowed = true;
-            ram_block_discard_disable(false);
+            vfio_ram_block_discard_disable(group->container, false);
         }
     }
 
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 09/12] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require)
  2021-02-22 11:57 ` [PATCH v6 09/12] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require) David Hildenbrand
@ 2021-02-22 13:14   ` Paolo Bonzini
  2021-02-22 13:33     ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2021-02-22 13:14 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Pankaj Gupta, Alex Williamson, Wei Yang, Michael S . Tsirkin,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski

On 22/02/21 12:57, David Hildenbrand wrote:
> 
>  
> -/*
> - * If positive, discarding RAM is disabled. If negative, discarding RAM is
> - * required to work and cannot be disabled.
> - */
> -static int ram_block_discard_disabled;
> +static unsigned int ram_block_discard_requirers;
> +static unsigned int ram_block_discard_disablers;

Requirer is not an English word, so perhaps use required_cnt and 
disabled_cnt?

Also, uncoordinated require is unused, and therefore uncoordinated 
disable is also never going to block anything.  Does it make sense to 
keep it in the API?

Paolo



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 05/12] vfio: Support for RamDiscardMgr in the !vIOMMU case
  2021-02-22 11:57 ` [PATCH v6 05/12] vfio: Support for RamDiscardMgr in the !vIOMMU case David Hildenbrand
@ 2021-02-22 13:20   ` Paolo Bonzini
  2021-02-22 14:43     ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2021-02-22 13:20 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Alex Williamson,
	teawater, Igor Mammedov, Marek Kedzierski

On 22/02/21 12:57, David Hildenbrand wrote:
> 
> +static int vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainer *container,
> +                                                   MemoryRegionSection *section)
> +{
> +    RamDiscardMgr *rdm = memory_region_get_ram_discard_mgr(section->mr);
> +    RamDiscardMgrClass *rdmc = RAM_DISCARD_MGR_GET_CLASS(rdm);
> +    VFIORamDiscardListener tmp_vrdl, *vrdl = NULL;
> +
> +    QLIST_FOREACH(vrdl, &container->vrdl_list, next) {
> +        if (vrdl->mr == section->mr &&
> +            vrdl->offset_within_region == section->offset_within_region) {
> +            break;
> +        }
> +    }
> +
> +    if (!vrdl) {
> +        hw_error("vfio: Trying to sync missing RAM discard listener");
> +    }
> +
> +    tmp_vrdl = *vrdl;
> +    ram_discard_listener_init(&tmp_vrdl.listener,
> +                              vfio_ram_discard_notify_dirty_bitmap, NULL, NULL);
> +    return rdmc->replay_populated(rdm, section->mr, &tmp_vrdl.listener);
> +}
> +

Can you explain why this is related to the sync_dirty_bitmap call?  This 
needs a comment in vfio_sync_dirty_bitmap.

Also, why can't you just pass vrdl to the call?

Paolo



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions
  2021-02-22 11:56 ` [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions David Hildenbrand
@ 2021-02-22 13:27   ` Paolo Bonzini
  2021-02-22 14:03     ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2021-02-22 13:27 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, Alex Williamson,
	Peter Xu, Dr . David Alan Gilbert, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski

On 22/02/21 12:56, David Hildenbrand wrote:
> 
> +    /**
> +     * @get_min_granularity:
> +     *
> +     * Get the minimum granularity in which listeners will get notified
> +     * about changes within the #MemoryRegion via the #RamDiscardMgr.
> +     *
> +     * @rdm: the #RamDiscardMgr
> +     * @mr: the #MemoryRegion
> +     *
> +     * Returns the minimum granularity.
> +     */
> +    uint64_t (*get_min_granularity)(const RamDiscardMgr *rdm,
> +                                    const MemoryRegion *mr);
> +
> +    /**
> +     * @is_populated:
> +     *
> +     * Check whether the given range within the #MemoryRegion is completely
> +     * populated (i.e., no parts are currently discarded). There are no
> +     * alignment requirements for the range.
> +     *
> +     * @rdm: the #RamDiscardMgr
> +     * @mr: the #MemoryRegion
> +     * @offset: offset into the #MemoryRegion
> +     * @size: size in the #MemoryRegion
> +     *
> +     * Returns whether the given range is completely populated.
> +     */
> +    bool (*is_populated)(const RamDiscardMgr *rdm, const MemoryRegion *mr,
> +                         ram_addr_t offset, ram_addr_t size);
> +
> +    /**
> +     * @register_listener:
> +     *
> +     * Register a #RamDiscardListener for a #MemoryRegion via the
> +     * #RamDiscardMgr and immediately notify the #RamDiscardListener about all
> +     * populated parts within the #MemoryRegion via the #RamDiscardMgr.
> +     *
> +     * In case any notification fails, no further notifications are triggered
> +     * and an error is logged.
> +     *
> +     * @rdm: the #RamDiscardMgr
> +     * @mr: the #MemoryRegion
> +     * @rdl: the #RamDiscardListener
> +     */
> +    void (*register_listener)(RamDiscardMgr *rdm, const MemoryRegion *mr,
> +                              RamDiscardListener *rdl);
> +
> +    /**
> +     * @unregister_listener:
> +     *
> +     * Unregister a previously registered #RamDiscardListener for a
> +     * #MemoryRegion via the #RamDiscardMgr after notifying the
> +     * #RamDiscardListener about all populated parts becoming unpopulated
> +     * within the #MemoryRegion via the #RamDiscardMgr.
> +     *
> +     * @rdm: the #RamDiscardMgr
> +     * @mr: the #MemoryRegion
> +     * @rdl: the #RamDiscardListener
> +     */
> +    void (*unregister_listener)(RamDiscardMgr *rdm, const MemoryRegion *mr,
> +                                RamDiscardListener *rdl);
> +
> +    /**
> +     * @replay_populated:
> +     *
> +     * Notify the #RamDiscardListener about all populated parts within the
> +     * #MemoryRegion via the #RamDiscardMgr.
> +     *
> +     * In case any notification fails, no further notifications are triggered.
> +     * The listener is not required to be registered.
> +     *
> +     * @rdm: the #RamDiscardMgr
> +     * @mr: the #MemoryRegion
> +     * @rdl: the #RamDiscardListener
> +     *
> +     * Returns 0 on success, or a negative error if any notification failed.
> +     */
> +    int (*replay_populated)(const RamDiscardMgr *rdm, const MemoryRegion *mr,
> +                            RamDiscardListener *rdl);

If this function is only going to use a single callback, just pass it 
(together with a void *opaque) as the argument.
> +};
> +
>  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
>  typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
>  
> @@ -487,6 +683,7 @@ struct MemoryRegion {
>      const char *name;
>      unsigned ioeventfd_nb;
>      MemoryRegionIoeventfd *ioeventfds;
> +    RamDiscardMgr *rdm; /* Only for RAM */
>  };


The idea of sending discard notifications is obviously good.  I have a 
couple of questions on the design that you used for the interface; I'm 
not saying that it should be done differently, I would only like to 
understand the trade offs that you chose:

1) can the RamDiscardManager (no abbreviations :)) be just the owner of 
the memory region (obj->parent)?  Alternatively, if you want to make it 
separate from the owner, does it make sense for it to be a separate 
reusable object, sitting between virtio-mem and the MemoryRegion, so 
that the implementation can be reused?

2) why have the new RamDiscardListener instead of just extending 
MemoryListener? Conveniently, vfio already has a MemoryListener that can 
be extended, and you wouldn't need the list of RamDiscardListeners. 
There is already a precedent of replaying the current state when a 
listener is added (see listener_add_address_space), so this is not 
something different between ML and RDL.

Also, if you add a new interface, you should have "method call" wrappers 
similar to user_creatable_complete or user_creatable_can_be_deleted.

Thanks,

Paolo



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 09/12] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require)
  2021-02-22 13:14   ` Paolo Bonzini
@ 2021-02-22 13:33     ` David Hildenbrand
  2021-02-22 14:02       ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 13:33 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Pankaj Gupta, Alex Williamson, Wei Yang, Michael S . Tsirkin,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski

On 22.02.21 14:14, Paolo Bonzini wrote:
> On 22/02/21 12:57, David Hildenbrand wrote:
>>
>>   
>> -/*
>> - * If positive, discarding RAM is disabled. If negative, discarding RAM is
>> - * required to work and cannot be disabled.
>> - */
>> -static int ram_block_discard_disabled;
>> +static unsigned int ram_block_discard_requirers;
>> +static unsigned int ram_block_discard_disablers;
> 
> Requirer is not an English word, so perhaps use required_cnt and
> disabled_cnt?

I did an internet search back then and was not completely sure if it's 
okay. See

https://en.wiktionary.org/wiki/requirer

for example (not 100% trustworthy of course).

No strong opinion on the name (required_cnt/disabled_cnt is clearer).

> 
> Also, uncoordinated require is unused, and therefore uncoordinated
> disable is also never going to block anything.  Does it make sense to
> keep it in the API?

Right, "ram_block_discard_require()" is not used yet. I am planning on 
using it in virtio-balloon context at some point, but can remove it for 
now to simplify.

ram_block_uncoordinated_discard_disable(), however, will block 
virtio-balloon already via ram_block_discard_is_disabled(). (yes, 
virtio-balloon is ugly)

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 09/12] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require)
  2021-02-22 13:33     ` David Hildenbrand
@ 2021-02-22 14:02       ` Paolo Bonzini
  2021-02-22 15:38         ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2021-02-22 14:02 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Pankaj Gupta, Alex Williamson, Wei Yang, Michael S . Tsirkin,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski

On 22/02/21 14:33, David Hildenbrand wrote:
>> Also, uncoordinated require is unused, and therefore uncoordinated
>> disable is also never going to block anything.  Does it make sense to
>> keep it in the API?
> 
> Right, "ram_block_discard_require()" is not used yet. I am planning on 
> using it in virtio-balloon context at some point, but can remove it for 
> now to simplify.
> 
> ram_block_uncoordinated_discard_disable(), however, will block 
> virtio-balloon already via ram_block_discard_is_disabled(). (yes, 
> virtio-balloon is ugly)

Oops, I missed that API.

Does it make sense to turn the API inside out, with the 
coordinated/uncoordinated choice as an argument and the start/finish 
choice in the name?

enum {
     RAM_DISCARD_ALLOW_COORDINATED = 1,
};

bool ram_discard_disable(int flags, Error **errp);
void ram_discard_enable(int flags);
int ram_discard_start(bool coordinated, Error **errp);
void ram_discard_finish(bool coordinated);

Paolo



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions
  2021-02-22 13:27   ` Paolo Bonzini
@ 2021-02-22 14:03     ` David Hildenbrand
  2021-02-22 14:18       ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 14:03 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, Alex Williamson,
	Peter Xu, Dr . David Alan Gilbert, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski


>> +    /**
>> +     * @replay_populated:
>> +     *
>> +     * Notify the #RamDiscardListener about all populated parts within the
>> +     * #MemoryRegion via the #RamDiscardMgr.
>> +     *
>> +     * In case any notification fails, no further notifications are triggered.
>> +     * The listener is not required to be registered.
>> +     *
>> +     * @rdm: the #RamDiscardMgr
>> +     * @mr: the #MemoryRegion
>> +     * @rdl: the #RamDiscardListener
>> +     *
>> +     * Returns 0 on success, or a negative error if any notification failed.
>> +     */
>> +    int (*replay_populated)(const RamDiscardMgr *rdm, const MemoryRegion *mr,
>> +                            RamDiscardListener *rdl);
> 
> If this function is only going to use a single callback, just pass it
> (together with a void *opaque) as the argument.
>> +};
>> +
>>   typedef struct CoalescedMemoryRange CoalescedMemoryRange;
>>   typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
>>   
>> @@ -487,6 +683,7 @@ struct MemoryRegion {
>>       const char *name;
>>       unsigned ioeventfd_nb;
>>       MemoryRegionIoeventfd *ioeventfds;
>> +    RamDiscardMgr *rdm; /* Only for RAM */
>>   };
> 
> 
> The idea of sending discard notifications is obviously good.  I have a
> couple of questions on the design that you used for the interface; I'm
> not saying that it should be done differently, I would only like to
> understand the trade offs that you chose:

Sure!

> 
> 1) can the RamDiscardManager (no abbreviations :)) be just the owner of

I used to call it "SparseRamManager", but wanted to stress the semantics 
- and can use RamDiscardManager ;) . Suggestions welcome.

> the memory region (obj->parent)?  Alternatively, if you want to make it
> separate from the owner, does it make sense for it to be a separate
> reusable object, sitting between virtio-mem and the MemoryRegion, so
> that the implementation can be reused?

virtio-mem consumes a memory backend (e.g., memory-backend-ram). That 
one logically "owns" the memory region (and thereby the RAMBlock).

The memory backend gets assigned to virtio-mem. At that point, 
virtio-mem "owns" the memory backend. It will set itself as the 
RamDiscardManager before mapping the memory region into system address 
space (whereby it will get exposed to the system).

This flow made sense to me. Regarding "reusable object" - I think the 
only stuff we could fit in there would be e.g., maintaining the lists of 
notifiers. I'd rather wait until we actually have a second user to see 
what we can factor out.

If you have any suggestion/preference, please let me know.

> 
> 2) why have the new RamDiscardListener instead of just extending
> MemoryListener? Conveniently, vfio already has a MemoryListener that can

It behaves more like the IOMMU notifier in that regard.

> be extended, and you wouldn't need the list of RamDiscardListeners.
> There is already a precedent of replaying the current state when a
> listener is added (see listener_add_address_space), so this is not
> something different between ML and RDL.

The main motivation is to let the listener decide how it wants to handle the 
memory region. For example, for vhost, vdpa, kvm, ... I only want a 
single region, not separate ones for each and every populated range, 
punching out discarded ranges. Note that there are cases (i.e., 
anonymous memory), where it's even valid for the guest to read discarded 
memory.

Special cases are only required in corner cases, namely whenever we 
unconditionally:

a) Read memory inside a memory region. (e.g., guest-memory-dump)
b) Write memory inside a memory region. (e.g., TPM, migration)
c) Populate memory inside a memory region. (e.g., vfio)
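
To make the contrast concrete, here is an illustrative sketch of the dedicated 
listener shape -- the signatures are paraphrased, not the exact code from this 
series:

typedef struct RamDiscardListener RamDiscardListener;

/* Paraphrased signatures -- sketch only, not the exact series code. */
typedef int (*NotifyRamPopulate)(RamDiscardListener *rdl,
                                 const MemoryRegion *mr,
                                 ram_addr_t offset, ram_addr_t size);
typedef void (*NotifyRamDiscard)(RamDiscardListener *rdl,
                                 const MemoryRegion *mr,
                                 ram_addr_t offset, ram_addr_t size);

struct RamDiscardListener {
    NotifyRamPopulate notify_populate;  /* e.g., vfio: DMA-map the range   */
    NotifyRamDiscard notify_discard;    /* e.g., vfio: DMA-unmap the range */
    QLIST_ENTRY(RamDiscardListener) next;
};

/* vhost/vdpa/kvm keep seeing one large region and simply never register;
 * only the special cases above hook into these notifications. */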

> 
> Also, if you add a new interface, you should have "method call" wrappers
> similar to user_creatable_complete or user_creatable_can_be_deleted.

I think I had those at some point but decided to drop them. Can re-add them.


-- 
Thanks,

David / dhildenb




* Re: [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions
  2021-02-22 14:03     ` David Hildenbrand
@ 2021-02-22 14:18       ` Paolo Bonzini
  2021-02-22 14:53         ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2021-02-22 14:18 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, Alex Williamson,
	Peter Xu, Dr . David Alan Gilbert, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski

On 22/02/21 15:03, David Hildenbrand wrote:
> 
>>> +    /**
>>> +     * @replay_populated:
>>> +     *
>>> +     * Notify the #RamDiscardListener about all populated parts 
>>> within the
>>> +     * #MemoryRegion via the #RamDiscardMgr.
>>> +     *
>>> +     * In case any notification fails, no further notifications are 
>>> triggered.
>>> +     * The listener is not required to be registered.
>>> +     *
>>> +     * @rdm: the #RamDiscardMgr
>>> +     * @mr: the #MemoryRegion
>>> +     * @rdl: the #RamDiscardListener
>>> +     *
>>> +     * Returns 0 on success, or a negative error if any notification 
>>> failed.
>>> +     */
>>> +    int (*replay_populated)(const RamDiscardMgr *rdm, const 
>>> MemoryRegion *mr,
>>> +                            RamDiscardListener *rdl);
>>
>> If this function is only going to use a single callback, just pass it
>> (together with a void *opaque) as the argument.
>>> +};
>>> +
>>>   typedef struct CoalescedMemoryRange CoalescedMemoryRange;
>>>   typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
>>> @@ -487,6 +683,7 @@ struct MemoryRegion {
>>>       const char *name;
>>>       unsigned ioeventfd_nb;
>>>       MemoryRegionIoeventfd *ioeventfds;
>>> +    RamDiscardMgr *rdm; /* Only for RAM */
>>>   };
>>
>>
>> The idea of sending discard notifications is obviously good.  I have a
>> couple of questions on the design that you used for the interface; I'm
>> not saying that it should be done differently, I would only like to
>> understand the trade offs that you chose:
> 
> Sure!
> 
>>
>> 1) can the RamDiscardManager (no abbreviations :)) be just the owner of
> 
> I used to call it "SparseRamManager", but wanted to stress the semantics 
> - and can use RamDiscardManager ;) . Suggestions welcome.
> 
>> the memory region (obj->parent)?  Alternatively, if you want to make it
>> separate from the owner, does it make sense for it to be a separate
>> reusable object, sitting between virtio-mem and the MemoryRegion, so
>> that the implementation can be reused?
> 
> virtio-mem consumes a memory backend (e.g., memory-backend-ram). That 
> one logically "owns" the memory region (and thereby the RAMBlock).
> 
> The memory backend gets assigned to virtio-mem. At that point, 
> virtio-mem "owns" the memory backend. It will set itself as the 
> RamDiscardMgr before mapping the memory region into system address 
> space (whereby it will get exposed to the system).
> 
> This flow made sense to me. Regarding "reusable object" - I think the 
> only stuff we could fit in there would be e.g., maintaining the lists of 
> notifiers. I'd rather wait until we actually have a second user to see 
> what we can factor out.
> 
> If you have any suggestion/preference, please let me know.
> 
>>
>> 2) why have the new RamDiscardListener instead of just extending
>> MemoryListener? Conveniently, vfio already has a MemoryListener that can
> 
> It behaves more like the IOMMU notifier in that regard.

Yes, but does it behave more like the IOMMU notifier in other regards? 
:)  The IOMMU notifier is concerned with an iova concept that doesn't 
exist at the MemoryRegion level, while RamDiscardListener works at the 
(MemoryRegion, offset) level that can easily be represented by a 
MemoryRegionSection.  Using MemoryRegionSection might even simplify the 
listener code.

>> be extended, and you wouldn't need the list of RamDiscardListeners.
>> There is already a precedent of replaying the current state when a
>> listener is added (see listener_add_address_space), so this is not
>> something different between ML and RDL.
> 
> The main motivation is to let listener decide how it wants to handle the 
> memory region. For example, for vhost, vdpa, kvm, ... I only want a 
> single region, not separate ones for each and every populated range, 
> punching out discarded ranges. Note that there are cases (i.e., 
> anonymous memory), where it's even valid for the guest to read discarded 
> memory.

Yes, I agree with that.  You would still have the same 
region-add/region_nop/region_del callbacks for KVM and friends; on top 
of that you would have region_populate/region_discard callbacks for VFIO.

Populated regions would be replayed after region_add, while I don't 
think it makes sense to have a region_discard_all callback before 
region_discard.

Paolo

> Special cases are only required in corner cases, namely whenever we 
> unconditionally:
> 
> a) Read memory inside a memory region. (e.g., guest-memory-dump)
> b) Write memory inside a memory region. (e.g., TPM, migration)
> c) Populate memory inside a memory region. (e.g., vfio)
> 
>>
>> Also, if you add a new interface, you should have "method call" wrappers
>> similar to user_creatable_complete or user_creatable_can_be_deleted.
> 
> I think I had those at some point but decided to drop them. Can re-add them.
> 
> 




* Re: [PATCH v6 05/12] vfio: Support for RamDiscardMgr in the !vIOMMU case
  2021-02-22 13:20   ` Paolo Bonzini
@ 2021-02-22 14:43     ` David Hildenbrand
  2021-02-22 17:29       ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 14:43 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Alex Williamson,
	teawater, Igor Mammedov, Marek Kedzierski

On 22.02.21 14:20, Paolo Bonzini wrote:
> On 22/02/21 12:57, David Hildenbrand wrote:
>>
>> +static int vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainer *container,
>> +                                                   MemoryRegionSection *section)
>> +{
>> +    RamDiscardMgr *rdm = memory_region_get_ram_discard_mgr(section->mr);
>> +    RamDiscardMgrClass *rdmc = RAM_DISCARD_MGR_GET_CLASS(rdm);
>> +    VFIORamDiscardListener tmp_vrdl, *vrdl = NULL;
>> +
>> +    QLIST_FOREACH(vrdl, &container->vrdl_list, next) {
>> +        if (vrdl->mr == section->mr &&
>> +            vrdl->offset_within_region == section->offset_within_region) {
>> +            break;
>> +        }
>> +    }
>> +
>> +    if (!vrdl) {
>> +        hw_error("vfio: Trying to sync missing RAM discard listener");
>> +    }
>> +
>> +    tmp_vrdl = *vrdl;
>> +    ram_discard_listener_init(&tmp_vrdl.listener,
>> +                              vfio_ram_discard_notify_dirty_bitmap, NULL, NULL);
>> +    return rdmc->replay_populated(rdm, section->mr, &tmp_vrdl.listener);
>> +}
>> +
> 
> Can you explain why this is related to the sync_dirty_bitmap call?  This
> needs a comment in vfio_sync_dirty_bitmap.

We can only synchronize the parts that actually got mapped via VFIO. So 
I have to walk all parts that are populated (and thus, were mapped via 
VFIO). This is similar to the IOMMU notifier handling.
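
For illustration, the dispatch boils down to something like this -- a condensed 
sketch in which vfio_sync_plain_section_dirty_bitmap() is only a placeholder 
name, not a real function:

/* Condensed sketch -- vfio_sync_plain_section_dirty_bitmap() is a placeholder. */
static int vfio_sync_dirty_bitmap(VFIOContainer *container,
                                  MemoryRegionSection *section)
{
    if (memory_region_get_ram_discard_mgr(section->mr)) {
        /* Only populated parts were vfio_dma_map()ed, so only those can
         * (and need to) be synced -- replay over all populated parts. */
        return vfio_sync_ram_discard_listener_dirty_bitmap(container, section);
    }
    /* Plain RAM (the vIOMMU case is handled elsewhere): sync the whole section. */
    return vfio_sync_plain_section_dirty_bitmap(container, section);
}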

> 
> Also, why can't you just pass vrdl to the call?

I have to modify the callbacks. Similarly done for 
memory_region_iommu_replay().

-- 
Thanks,

David / dhildenb




* Re: [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions
  2021-02-22 14:18       ` Paolo Bonzini
@ 2021-02-22 14:53         ` David Hildenbrand
  2021-02-22 17:37           ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 14:53 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, Alex Williamson,
	Peter Xu, Dr . David Alan Gilbert, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski

On 22.02.21 15:18, Paolo Bonzini wrote:
> On 22/02/21 15:03, David Hildenbrand wrote:
>>
>>>> +    /**
>>>> +     * @replay_populated:
>>>> +     *
>>>> +     * Notify the #RamDiscardListener about all populated parts
>>>> within the
>>>> +     * #MemoryRegion via the #RamDiscardMgr.
>>>> +     *
>>>> +     * In case any notification fails, no further notifications are
>>>> triggered.
>>>> +     * The listener is not required to be registered.
>>>> +     *
>>>> +     * @rdm: the #RamDiscardMgr
>>>> +     * @mr: the #MemoryRegion
>>>> +     * @rdl: the #RamDiscardListener
>>>> +     *
>>>> +     * Returns 0 on success, or a negative error if any notification
>>>> failed.
>>>> +     */
>>>> +    int (*replay_populated)(const RamDiscardMgr *rdm, const
>>>> MemoryRegion *mr,
>>>> +                            RamDiscardListener *rdl);
>>>
>>> If this function is only going to use a single callback, just pass it
>>> (together with a void *opaque) as the argument.
>>>> +};
>>>> +
>>>>    typedef struct CoalescedMemoryRange CoalescedMemoryRange;
>>>>    typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
>>>> @@ -487,6 +683,7 @@ struct MemoryRegion {
>>>>        const char *name;
>>>>        unsigned ioeventfd_nb;
>>>>        MemoryRegionIoeventfd *ioeventfds;
>>>> +    RamDiscardMgr *rdm; /* Only for RAM */
>>>>    };
>>>
>>>
>>> The idea of sending discard notifications is obviously good.  I have a
>>> couple of questions on the design that you used for the interface; I'm
>>> not saying that it should be done differently, I would only like to
>>> understand the trade offs that you chose:
>>
>> Sure!
>>
>>>
>>> 1) can the RamDiscardManager (no abbreviations :)) be just the owner of
>>
>> I used to call it "SparseRamManager", but wanted to stress the semantics
>> - and can use RamDiscardManager ;) . Suggestions welcome.
>>
>>> the memory region (obj->parent)?  Alternatively, if you want to make it
>>> separate from the owner, does it make sense for it to be a separate
>>> reusable object, sitting between virtio-mem and the MemoryRegion, so
>>> that the implementation can be reused?
>>
>> virtio-mem consumes a memory backend (e.g., memory-backend-ram). That
>> one logically "owns" the memory region (and thereby the RAMBlock).
>>
>> The memory backend gets assigned to virtio-mem. At that point,
>> virtio-mem "owns" the memory backend. It will set itself as the
>> RamDiscardMgr before mapping the memory region into system address
>> space (whereby it will get exposed to the system).
>>
>> This flow made sense to me. Regarding "reusable object" - I think the
>> only stuff we could fit in there would be e.g., maintaining the lists of
>> notifiers. I'd rather wait until we actually have a second user to see
>> what we can factor out.
>>
>> If you have any suggestion/preference, please let me know.
>>
>>>
>>> 2) why have the new RamDiscardListener instead of just extending
>>> MemoryListener? Conveniently, vfio already has a MemoryListener that can
>>
>> It behaves more like the IOMMU notifier in that regard.
> 
> Yes, but does it behave more like the IOMMU notifier in other regards?
> :)  The IOMMU notifier is concerned with an iova concept that doesn't
> exist at the MemoryRegion level, while RamDiscardListener works at the
> (MemoryRegion, offset) level that can easily be represented by a
> MemoryRegionSection.  Using MemoryRegionSection might even simplify the
> listener code.

It's similarly concerned with rather small, lightweight updates I would say.

> 
>>> be extended, and you wouldn't need the list of RamDiscardListeners.
>>> There is already a precedent of replaying the current state when a
>>> listener is added (see listener_add_address_space), so this is not
>>> something different between ML and RDL.
>>
>> The main motivation is to let listener decide how it wants to handle the
>> memory region. For example, for vhost, vdpa, kvm, ... I only want a
>> single region, not separate ones for each and every populated range,
>> punching out discarded ranges. Note that there are cases (i.e.,
>> anonymous memory), where it's even valid for the guest to read discarded
>> memory.
> 
> Yes, I agree with that.  You would still have the same
> region-add/region_nop/region_del callbacks for KVM and friends; on top
> of that you would have region_populate/region_discard callbacks for VFIO.

I see roughly how it could work; however, I am not sure yet whether this is 
the right approach.

I think instead of region_populate/region_discard we would want 
individual region_add/region_del when populating/discarding for all 
MemoryListeners that opt-in somehow (e.g., VFIO, dump-guest-memory, 
...). Similarly, we would want to call log_sync()/log_clear() then only 
for these parts.

But what happens when I populate/discard some memory? I don't want to 
trigger an address space transaction (begin()...region_nop()...commit()) 
- whenever I populate/discard memory (e.g., in 2 MB granularity). 
Especially not, if nothing might have changed for most other 
MemoryListeners.

> 
> Populated regions would be replayed after region_add, while I don't
> think it makes sense to have a region_discard_all callback before
> region_discard.

How would we handle vfio_listener_log_sync() / vfio_sync_dirty_bitmap()?

-- 
Thanks,

David / dhildenb




* Re: [PATCH v6 09/12] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require)
  2021-02-22 14:02       ` Paolo Bonzini
@ 2021-02-22 15:38         ` David Hildenbrand
  2021-02-22 17:32           ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 15:38 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Pankaj Gupta, Alex Williamson, Wei Yang, Michael S . Tsirkin,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski

On 22.02.21 15:02, Paolo Bonzini wrote:
> On 22/02/21 14:33, David Hildenbrand wrote:
>>> Also, uncoordinated require is unused, and therefore uncoordinated
>>> disable is also never going to block anything.  Does it make sense to
>>> keep it in the API?
>>
>> Right, "ram_block_discard_require()" is not used yet. I am planning on
>> using it in virtio-balloon context at some point, but can remove it for
>> now to simplify.
>>
>> ram_block_uncoordinated_discard_disable(), however, will block
>> virtio-balloon already via ram_block_discard_is_disabled(). (yes,
>> virtio-balloon is ugly)
> 
> Oops, I missed that API.
> 
> Does it make sense to turn the API inside out, with the
> coordinated/uncoordinated choice as an argument and the start/finish
> choice in the name?
> 
> enum {
>       RAM_DISCARD_ALLOW_COORDINATED = 1,
> };
> 

Any reason to go with an enum/flags for this case and not "bool 
allow_coordinated" ?

> bool ram_discard_disable(int flags, Error **errp);
> void ram_discard_enable(int flags);
> int ram_discard_start(bool coordinated, Error **errp);
> void ram_discard_finish(bool coordinated);

Yeah, I tried to avoid boolean flags ;) Don't have a strong opinion. At 
least we get shorter names.


-- 
Thanks,

David / dhildenb




* Re: [PATCH v6 05/12] vfio: Support for RamDiscardMgr in the !vIOMMU case
  2021-02-22 14:43     ` David Hildenbrand
@ 2021-02-22 17:29       ` Paolo Bonzini
  2021-02-22 17:34         ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2021-02-22 17:29 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Alex Williamson,
	teawater, Igor Mammedov, Marek Kedzierski

On 22/02/21 15:43, David Hildenbrand wrote:
>> Also, why can't you just pass vrdl to the call?
> 
> I have to modify the callbacks. Similarly done for 
> memory_region_iommu_replay().

Since replay_populate only uses a single callback, that suggests using a 
function pointer instead of the RamDiscardListener*.
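
For illustration, the suggested shape would be roughly the following; the 
typedef and parameter names here are guesses, not actual code:

/* Sketch of the suggestion -- typedef and parameter names are guesses. */
typedef int (*ReplayRamPopulate)(const MemoryRegion *mr, ram_addr_t offset,
                                 ram_addr_t size, void *opaque);

int (*replay_populated)(const RamDiscardMgr *rdm, const MemoryRegion *mr,
                        ReplayRamPopulate replay_fn, void *opaque);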

Paolo




* Re: [PATCH v6 09/12] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require)
  2021-02-22 15:38         ` David Hildenbrand
@ 2021-02-22 17:32           ` Paolo Bonzini
  2021-02-23  9:02             ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2021-02-22 17:32 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Pankaj Gupta, Alex Williamson, Wei Yang, Michael S . Tsirkin,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski

On 22/02/21 16:38, David Hildenbrand wrote:
> On 22.02.21 15:02, Paolo Bonzini wrote:
>> On 22/02/21 14:33, David Hildenbrand wrote:
>>>> Also, uncoordinated require is unused, and therefore uncoordinated
>>>> disable is also never going to block anything.  Does it make sense to
>>>> keep it in the API?
>>>
>>> Right, "ram_block_discard_require()" is not used yet. I am planning on
>>> using it in virtio-balloon context at some point, but can remove it for
>>> now to simplify.
>>>
>>> ram_block_uncoordinated_discard_disable(), however, will block
>>> virtio-balloon already via ram_block_discard_is_disabled(). (yes,
>>> virtio-balloon is ugly)
>>
>> Oops, I missed that API.
>>
>> Does it make sense to turn the API inside out, with the
>> coordinated/uncoordinated choice as an argument and the start/finish
>> choice in the name?
>>
>> enum {
>>       RAM_DISCARD_ALLOW_COORDINATED = 1,
>> };
>>
> 
> Any reason to go with an enum/flags for this case and not "bool 
> allow_coordinated" ?

I find it slightly easier to remember the meaning of true for "bool 
coordinated" than for "bool allow_coordinated".  I don't like the API 
below that much, but having both RAM_DISCARD_ALLOW_COORDINATED for 
disable/enable and RAM_DISCARD_SUPPORT_COORDINATED for start/finish 
would be even uglier...

Paolo

>> bool ram_discard_disable(int flags, Error **errp);
>> void ram_discard_enable(int flags);
>> int ram_discard_start(bool coordinated, Error **errp);
>> void ram_discard_finish(bool coordinated);
> 
> Yeah, I tried to avoid boolean flags ;) Don't have a strong opinion. At 
> least we get shorter names.




* Re: [PATCH v6 05/12] vfio: Support for RamDiscardMgr in the !vIOMMU case
  2021-02-22 17:29       ` Paolo Bonzini
@ 2021-02-22 17:34         ` David Hildenbrand
  0 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 17:34 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Alex Williamson,
	teawater, Igor Mammedov, Marek Kedzierski

On 22.02.21 18:29, Paolo Bonzini wrote:
> On 22/02/21 15:43, David Hildenbrand wrote:
>>> Also, why can't you just pass vrdl to the call?
>>
>> I have to modify the callbacks. Similarly done for
>> memory_region_iommu_replay().
> 
> Since replay_populate only uses a single callback, that suggests using a
> function pointer instead of the RamDiscardListener*.

Yep, already changed that. Thanks!


-- 
Thanks,

David / dhildenb




* Re: [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions
  2021-02-22 14:53         ` David Hildenbrand
@ 2021-02-22 17:37           ` Paolo Bonzini
  2021-02-22 17:48             ` David Hildenbrand
  2021-02-22 19:43             ` David Hildenbrand
  0 siblings, 2 replies; 34+ messages in thread
From: Paolo Bonzini @ 2021-02-22 17:37 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, Alex Williamson,
	Peter Xu, Dr . David Alan Gilbert, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski

On 22/02/21 15:53, David Hildenbrand wrote:
>> Yes, but does it behave more like the IOMMU notifier in other regards?
>> :)  The IOMMU notifier is concerned with an iova concept that doesn't
>> exist at the MemoryRegion level, while RamDiscardListener works at the
>> (MemoryRegion, offset) level that can easily be represented by a
>> MemoryRegionSection.  Using MemoryRegionSection might even simplify the
>> listener code.
> 
> It's similarly concerned with rather small, lightweight updates I would 
> say.

Why does that matter?  I think if it's concerned with the MemoryRegion 
address space it should use MemoryListener and MemoryRegionSection.

>>> The main motivation is to let listener decide how it wants to handle the
>>> memory region. For example, for vhost, vdpa, kvm, ... I only want a
>>> single region, not separate ones for each and every populated range,
>>> punching out discarded ranges. Note that there are cases (i.e.,
>>> anonymous memory), where it's even valid for the guest to read discarded
>>> memory.
>>
>> Yes, I agree with that.  You would still have the same
>> region-add/region_nop/region_del callbacks for KVM and friends; on top
>> of that you would have region_populate/region_discard callbacks for VFIO.
> 
> I think instead of region_populate/region_discard we would want 
> individual region_add/region_del when populating/discarding for all 
> MemoryListeners that opt-in somehow (e.g., VFIO, dump-guest-memory, 
> ...). Similarly, we would want to call log_sync()/log_clear() then only 
> for these parts.
> 
> But what happens when I populate/discard some memory? I don't want to 
> trigger an address space transaction (begin()...region_nop()...commit()) 
> - whenever I populate/discard memory (e.g., in 2 MB granularity). 
> Especially not, if nothing might have changed for most other 
> MemoryListeners.

Right, that was the reason why I was suggesting different callbacks. 
For the VFIO listener, which doesn't have begin or commit callbacks, I 
think you could just rename region_add to region_populate, and point 
both region_del and region_discard to the existing region_del commit.

Calling log_sync/log_clear only for populated parts also makes sense. 
log_sync and log_clear do not have to be within begin/commit, so you can 
change the semantics to call them more than once.
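
For illustration, the extension being suggested might look roughly like this -- 
a sketch, not an actual patch, with the existing callbacks elided:

/* Hypothetical sketch, not an actual patch. */
struct MemoryListener {
    /* ... existing callbacks (region_add, region_del, region_nop,
     *     log_sync, log_clear, ...) elided ... */

    /* New, optional callbacks, only invoked for listeners that set them: */
    void (*region_populate)(MemoryListener *listener,
                            MemoryRegionSection *section);
    void (*region_discard)(MemoryListener *listener,
                           MemoryRegionSection *section);
};

/* vfio would then roughly do:
 *   .region_populate = vfio_listener_region_add,  (renamed/adjusted)
 *   .region_discard  = vfio_listener_region_del,  (shared with region_del)
 */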



Paolo




* Re: [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions
  2021-02-22 17:37           ` Paolo Bonzini
@ 2021-02-22 17:48             ` David Hildenbrand
  2021-02-22 19:43             ` David Hildenbrand
  1 sibling, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 17:48 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, Alex Williamson,
	Peter Xu, Dr . David Alan Gilbert, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski

On 22.02.21 18:37, Paolo Bonzini wrote:
> On 22/02/21 15:53, David Hildenbrand wrote:
>>> Yes, but does it behave more like the IOMMU notifier in other regards?
>>> :)  The IOMMU notifier is concerned with an iova concept that doesn't
>>> exist at the MemoryRegion level, while RamDiscardListener works at the
>>> (MemoryRegion, offset) level that can easily be represented by a
>>> MemoryRegionSection.  Using MemoryRegionSection might even simplify the
>>> listener code.
>>
>> It's similarly concerned with rather small, lightweight updates I would
>> say.
> 
> Why does that matter?  I think if it's concerned with the MemoryRegion
> address space it should use MemoryListener and MemoryRegionSection.
> 
>>>> The main motivation is to let listener decide how it wants to handle the
>>>> memory region. For example, for vhost, vdpa, kvm, ... I only want a
>>>> single region, not separate ones for each and every populated range,
>>>> punching out discarded ranges. Note that there are cases (i.e.,
>>>> anonymous memory), where it's even valid for the guest to read discarded
>>>> memory.
>>>
>>> Yes, I agree with that.  You would still have the same
>>> region-add/region_nop/region_del callbacks for KVM and friends; on top
>>> of that you would have region_populate/region_discard callbacks for VFIO.
>>
>> I think instead of region_populate/region_discard we would want
>> individual region_add/region_del when populating/discarding for all
>> MemoryListeners that opt-in somehow (e.g., VFIO, dump-guest-memory,
>> ...). Similarly, we would want to call log_sync()/log_clear() then only
>> for these parts.
>>
>> But what happens when I populate/discard some memory? I don't want to
>> trigger an address space transaction (begin()...region_nop()...commit())
>> - whenever I populate/discard memory (e.g., in 2 MB granularity).
>> Especially not, if nothing might have changed for most other
>> MemoryListeners.
> 
> Right, that was the reason why I was suggesting different callbacks.
> For the VFIO listener, which doesn't have begin or commit callbacks, I
> think you could just rename region_add to region_populate, and point
> both region_del and region_discard to the existing region_del commit.
> 
> Calling log_sync/log_clear only for populated parts also makes sense.
> log_sync and log_clear do not have to be within begin/commit, so you can
> change the semantics to call them more than once.

I'll prototype to see how it looks/feels. As long as it's moving logic 
out of the VFIO code into the address space update code it could be 
quite alright.

-- 
Thanks,

David / dhildenb




* Re: [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions
  2021-02-22 17:37           ` Paolo Bonzini
  2021-02-22 17:48             ` David Hildenbrand
@ 2021-02-22 19:43             ` David Hildenbrand
  2021-02-23 10:50               ` David Hildenbrand
  1 sibling, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2021-02-22 19:43 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, Alex Williamson,
	Peter Xu, Dr . David Alan Gilbert, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski

>>>> The main motivation is to let listener decide how it wants to handle the
>>>> memory region. For example, for vhost, vdpa, kvm, ... I only want a
>>>> single region, not separate ones for each and every populated range,
>>>> punching out discarded ranges. Note that there are cases (i.e.,
>>>> anonymous memory), where it's even valid for the guest to read discarded
>>>> memory.
>>>
>>> Yes, I agree with that.  You would still have the same
>>> region-add/region_nop/region_del callbacks for KVM and friends; on top
>>> of that you would have region_populate/region_discard callbacks for VFIO.
>>
>> I think instead of region_populate/region_discard we would want
>> individual region_add/region_del when populating/discarding for all
>> MemoryListeners that opt-in somehow (e.g., VFIO, dump-guest-memory,
>> ...). Similarly, we would want to call log_sync()/log_clear() then only
>> for these parts.
>>
>> But what happens when I populate/discard some memory? I don't want to
>> trigger an address space transaction (begin()...region_nop()...commit())
>> - whenever I populate/discard memory (e.g., in 2 MB granularity).
>> Especially not, if nothing might have changed for most other
>> MemoryListeners.
> 
> Right, that was the reason why I was suggesting different callbacks.
> For the VFIO listener, which doesn't have begin or commit callbacks, I
> think you could just rename region_add to region_populate, and point
> both region_del and region_discard to the existing region_del commit.
> 
> Calling log_sync/log_clear only for populated parts also makes sense.
> log_sync and log_clear do not have to be within begin/commit, so you can
> change the semantics to call them more than once.

So I looked at the simplest of all cases (discard) and I am not convinced yet
that this is the right approach. I can understand why it looks like this fits
into the MemoryListener, but I am not sure if it gives us any real benefits or
makes the code any clearer (I'd even say it's the contrary).


+void memory_region_notify_discard(MemoryRegion *mr, hwaddr offset,
+                                  hwaddr size)
+{
+    hwaddr mr_start, mr_end;
+    MemoryRegionSection mrs;
+    MemoryListener *listener;
+    AddressSpace *as;
+    FlatView *view;
+    FlatRange *fr;
+
+    QTAILQ_FOREACH(listener, &memory_listeners, link) {
+        if (!listener->region_discard) {
+            continue;
+        }
+        as = listener->address_space;
+        view = address_space_get_flatview(as);
+        FOR_EACH_FLAT_RANGE(fr, view) {
+            if (fr->mr != mr) {
+                continue;
+            }
+
+            mrs = section_from_flat_range(fr, view);
+
+            mr_start = MAX(mrs.offset_within_region, offset);
+            mr_end = MIN(offset + size,
+                          mrs.offset_within_region + int128_get64(mrs.size));
+            mr_end = MIN(mr_end, offset + size);
+
+            if (mr_start >= mr_end) {
+                continue;
+            }
+
+            mrs.offset_within_address_space += mr_start -
+                                               mrs.offset_within_region;
+            mrs.offset_within_region = mr_start;
+            mrs.size = int128_make64(mr_end - mr_start);
+            listener->region_discard(listener, &mrs);
+        }
+        flatview_unref(view);
+    }
+}

Maybe I am missing something important. This looks highly inefficient.

1. Although we know the memory region we have to walk over the whole address
    space ... over and over again for each potential listener.

2. Even without any applicable listeners (=> ! VFIO) we loop over all listeners.
    There are ways around that but it doesn't make the code nicer IMHO.

3. In the future I am planning on sending populate/discard events
    without the BQL (in my approach, synchronizing internally against
    register/unregister/populate/discard ...). I don't see an easy way
    to achieve that here. I think we are required to hold the BQL on any
    updates.

memory_region_notify_populate() gets quite ugly when we realize halfway that
we have to revert what we already did by notifying about already populated
pieces ...
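
For illustration, the unwind looks roughly like this -- a sketch of the problem 
only, where notify_populate_range() is a placeholder for the per-listener, 
per-flat-range notification, not prototype code:

/* Sketch of the unwind problem -- notify_populate_range() is a placeholder. */
int memory_region_notify_populate(MemoryRegion *mr, hwaddr offset, hwaddr size)
{
    const hwaddr granularity = 2 * MiB;   /* example block size */
    hwaddr done;
    int ret;

    for (done = 0; done < size; done += granularity) {
        ret = notify_populate_range(mr, offset + done, granularity);
        if (ret) {
            /* Failed halfway: revert by notifying a discard for everything
             * that was already reported as populated. */
            memory_region_notify_discard(mr, offset, done);
            return ret;
        }
    }
    return 0;
}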

-- 
Thanks,

David / dhildenb




* Re: [PATCH v6 09/12] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require)
  2021-02-22 17:32           ` Paolo Bonzini
@ 2021-02-23  9:02             ` David Hildenbrand
  2021-02-23 15:02               ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2021-02-23  9:02 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Pankaj Gupta, Alex Williamson, Wei Yang, Michael S . Tsirkin,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski

On 22.02.21 18:32, Paolo Bonzini wrote:
> On 22/02/21 16:38, David Hildenbrand wrote:
>> On 22.02.21 15:02, Paolo Bonzini wrote:
>>> On 22/02/21 14:33, David Hildenbrand wrote:
>>>>> Also, uncoordinated require is unused, and therefore uncoordinated
>>>>> disable is also never going to block anything.  Does it make sense to
>>>>> keep it in the API?
>>>>
>>>> Right, "ram_block_discard_require()" is not used yet. I am planning on
>>>> using it in virtio-balloon context at some point, but can remove it for
>>>> now to simplify.
>>>>
>>>> ram_block_uncoordinated_discard_disable(), however, will block
>>>> virtio-balloon already via ram_block_discard_is_disabled(). (yes,
>>>> virtio-balloon is ugly)
>>>
>>> Oops, I missed that API.
>>>
>>> Does it make sense to turn the API inside out, with the
>>> coordinated/uncoordinated choice as an argument and the start/finish
>>> choice in the name?
>>>
>>> enum {
>>>        RAM_DISCARD_ALLOW_COORDINATED = 1,
>>> };
>>>
>>
>> Any reason to go with an enum/flags for this case and not "bool
>> allow_coordinated" ?
> 
> I find it slightly easier to remember the meaning of true for "bool
> coordinated" than for "bool allow_coordinated".  I don't like the API
> below that much, but having both RAM_DISCARD_ALLOW_COORDINATED for
> disable/enable and RAM_DISCARD_SUPPORT_COORDINATED for start/finish
> would be even uglier...
> 
> Paolo
> 
>>> bool ram_discard_disable(int flags, Error **errp);
>>> void ram_discard_enable(int flags);
>>> int ram_discard_start(bool coordinated, Error **errp);
>>> void ram_discard_finish(bool coordinated);
>>

So, the new API I propose is:

int ram_block_discard_disable(bool state);
int ram_block_uncoordinated_discard_disable(bool state);
int ram_block_discard_require(bool state);
int ram_block_coordinated_discard_require(bool state);
bool ram_block_discard_is_disabled(void);
bool ram_block_discard_is_required(void);


Some points (because I thought about this API a bit when I came up with it):

1. I'd really like to keep the functionality of 
ram_block_discard_is_disabled() / ram_block_discard_is_required(). I'd 
assume you just didn't include it in your proposal.

2. I prefer the "require" wording over "start/finish". Start/finish 
sounds like it's a temporary thing like a transaction. For example, 
"ram_block_discard_is_started()" sounds misleading to me.

3. "ram_discard_enable()" sounds a bit misleading to me as well. We're 
not actually enabling anything; we're just no longer disabling it.

4. I don't think returning an "Error **errp" makes a lot of sense here.


Unless there is real need for a major overhaul I'd like to keep it to 
minor changes.
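
For illustration, usage stays simple with this naming -- a sketch mirroring 
what patches #11/#12 do, not their literal code:

/* Sketch mirroring patches #11/#12, not their literal code. */

/* virtio-mem (realize): discards are fine as long as they are coordinated. */
if (ram_block_coordinated_discard_require(true)) {
    error_setg(errp, "Discarding RAM is disabled");
    return;
}

/* vfio (TYPE1 container setup): only uncoordinated discards break pinning. */
if (ram_block_uncoordinated_discard_disable(true)) {
    error_setg(errp, "Cannot disable uncoordinated discards of RAM");
    return;
}

/* The respective teardown paths undo this via the (false) calls. */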

-- 
Thanks,

David / dhildenb




* Re: [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions
  2021-02-22 19:43             ` David Hildenbrand
@ 2021-02-23 10:50               ` David Hildenbrand
  2021-02-23 15:03                 ` Paolo Bonzini
  0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand @ 2021-02-23 10:50 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, Alex Williamson,
	Peter Xu, Dr . David Alan Gilbert, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski

On 22.02.21 20:43, David Hildenbrand wrote:
>>>>> The main motivation is to let listener decide how it wants to handle the
>>>>> memory region. For example, for vhost, vdpa, kvm, ... I only want a
>>>>> single region, not separate ones for each and every populated range,
>>>>> punching out discarded ranges. Note that there are cases (i.e.,
>>>>> anonymous memory), where it's even valid for the guest to read discarded
>>>>> memory.
>>>>
>>>> Yes, I agree with that.  You would still have the same
>>>> region-add/region_nop/region_del callbacks for KVM and friends; on top
>>>> of that you would have region_populate/region_discard callbacks for VFIO.
>>>
>>> I think instead of region_populate/region_discard we would want
>>> individual region_add/region_del when populating/discarding for all
>>> MemoryListeners that opt-in somehow (e.g., VFIO, dump-guest-memory,
>>> ...). Similarly, we would want to call log_sync()/log_clear() then only
>>> for these parts.
>>>
>>> But what happens when I populate/discard some memory? I don't want to
>>> trigger an address space transaction (begin()...region_nop()...commit())
>>> - whenever I populate/discard memory (e.g., in 2 MB granularity).
>>> Especially not, if nothing might have changed for most other
>>> MemoryListeners.
>>
>> Right, that was the reason why I was suggesting different callbacks.
>> For the VFIO listener, which doesn't have begin or commit callbacks, I
>> think you could just rename region_add to region_populate, and point
>> both region_del and region_discard to the existing region_del commit.
>>
>> Calling log_sync/log_clear only for populated parts also makes sense.
>> log_sync and log_clear do not have to be within begin/commit, so you can
>> change the semantics to call them more than once.
> 
> So I looked at the simplest of all cases (discard) and I am not convinced yet
> that this is the right approach. I can understand why it looks like this fits
> into the MemoryListener, but I am not sure if it gives us any real benefits or
> makes the code any clearer (I'd even say it's the contrary).
> 
> 
> +void memory_region_notify_discard(MemoryRegion *mr, hwaddr offset,
> +                                  hwaddr size)
> +{
> +    hwaddr mr_start, mr_end;
> +    MemoryRegionSection mrs;
> +    MemoryListener *listener;
> +    AddressSpace *as;
> +    FlatView *view;
> +    FlatRange *fr;
> +
> +    QTAILQ_FOREACH(listener, &memory_listeners, link) {
> +        if (!listener->region_discard) {
> +            continue;
> +        }
> +        as = listener->address_space;
> +        view = address_space_get_flatview(as);
> +        FOR_EACH_FLAT_RANGE(fr, view) {
> +            if (fr->mr != mr) {
> +                continue;
> +            }
> +
> +            mrs = section_from_flat_range(fr, view);
> +
> +            mr_start = MAX(mrs.offset_within_region, offset);
> +            mr_end = MIN(offset + size,
> +                          mrs.offset_within_region + int128_get64(mrs.size));
> +            mr_end = MIN(mr_end, offset + size);
> +
> +            if (mr_start >= mr_end) {
> +                continue;
> +            }
> +
> +            mrs.offset_within_address_space += mr_start -
> +                                               mrs.offset_within_region;
> +            mrs.offset_within_region = mr_start;
> +            mrs.size = int128_make64(mr_end - mr_start);
> +            listener->region_discard(listener, &mrs);
> +        }
> +        flatview_unref(view);
> +    }
> +}
> 
> Maybe I am missing something important. This looks highly inefficient.
> 
> 1. Although we know the memory region we have to walk over the whole address
>      space ... over and over again for each potential listener.
> 
> 2. Even without any applicable listeners (=> ! VFIO) we loop over all listeners.
>      There are ways around that but it doesn't make the code nicer IMHO.
> 
> 3. In the future I am planning on sending populate/discard events
>      without the BQL (in my approach, synchronizing internally against
>      register/unregister/populate/discard ...). I don't see an easy way
>      to achieve that here. I think we are required to hold the BQL on any
>      updates.
> 
> memory_region_notify_populate() gets quite ugly when we realize halfway that
> we have to revert what we already did by notifying about already populated
> pieces ...
> 

So, the more I look into it the more I doubt this should go into the 
MemoryListener.

The RamDiscardManager is specific to RAM memory regions - similarly the 
IOMMU notifier is specific to IOMMU regions.

In the near future we will have two "clients" (vfio, 
tpm/dump-guest-memory), whereby only vfio will actually have to register 
for updates at runtime.

I really want to have a dedicated registration/notifier mechanism, for 
reasons already mentioned in my last mail, but also to later reuse that 
mechanism in other contexts, as noted in the cover letter:

"Note: At some point, we might want to let RAMBlock users (esp. vfio 
used for nvme://) consume this interface as well. We'll need RAMBlock 
notifier calls when a RAMBlock is getting mapped/unmapped (via the 
corresponding memory region), so we can properly register a listener 
there as well."


However, I do agree that the notifier should work with 
MemoryRegionSection - this will make the "client" code much easier.

The register()/replay_populate() mechanism should consume a 
MemoryRegionSection to work on, and call the listener with an (adjusted) 
MemoryRegionSection.

Maybe I'm even able to simplify/get rid of the discard_all() callback.
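
For illustration, the reworked hooks could then look roughly like this -- a 
sketch of the direction only, not settled code:

/* Sketch of the direction only, not settled code. */
typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void *opaque);

struct RamDiscardMgrClass {
    /* ... */
    void (*register_listener)(RamDiscardMgr *rdm, RamDiscardListener *rdl,
                              MemoryRegionSection *section);
    int (*replay_populated)(const RamDiscardMgr *rdm,
                            MemoryRegionSection *section,
                            ReplayRamPopulate replay_fn, void *opaque);
};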

-- 
Thanks,

David / dhildenb




* Re: [PATCH v6 09/12] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require)
  2021-02-23  9:02             ` David Hildenbrand
@ 2021-02-23 15:02               ` Paolo Bonzini
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2021-02-23 15:02 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Pankaj Gupta, Alex Williamson, Wei Yang, Michael S . Tsirkin,
	Dr . David Alan Gilbert, Peter Xu, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski

On 23/02/21 10:02, David Hildenbrand wrote:
> 
> 
> int ram_block_discard_disable(bool state);
> int ram_block_uncoordinated_discard_disable(bool state);
> int ram_block_discard_require(bool state);
> int ram_block_coordinated_discard_require(bool state);
> bool ram_block_discard_is_disabled(void);
> bool ram_block_discard_is_required(void);
> 

Fair enough.

Paolo




* Re: [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions
  2021-02-23 10:50               ` David Hildenbrand
@ 2021-02-23 15:03                 ` Paolo Bonzini
  2021-02-23 15:09                   ` David Hildenbrand
  0 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2021-02-23 15:03 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, Alex Williamson,
	Peter Xu, Dr . David Alan Gilbert, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski

On 23/02/21 11:50, David Hildenbrand wrote:
> 
> 
> However, I do agree that the notifier should work with 
> MemoryRegionSection - this will make the "client" code much easier.
> 
> The register()/replay_populate() mechanism should consume a 
> MemoryRegionSection to work on, and call the listener via (adjusted) 
> MemoryRegionSection.
> 
> Maybe I'm even able to simplify/get rid of the discard_all() callback.

Good, thanks for trying and finding the best of both worlds.

Paolo




* Re: [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions
  2021-02-23 15:03                 ` Paolo Bonzini
@ 2021-02-23 15:09                   ` David Hildenbrand
  0 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand @ 2021-02-23 15:09 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Pankaj Gupta, Wei Yang, Michael S . Tsirkin, Alex Williamson,
	Peter Xu, Dr . David Alan Gilbert, Auger Eric, Pankaj Gupta,
	teawater, Igor Mammedov, Marek Kedzierski

On 23.02.21 16:03, Paolo Bonzini wrote:
> On 23/02/21 11:50, David Hildenbrand wrote:
>>
>>
>> However, I do agree that the notifier should work with
>> MemoryRegionSection - this will make the "client" code much easier.
>>
>> The register()/replay_populate() mechanism should consume a
>> MemoryRegionSection to work on, and call the listener via (adjusted)
>> MemoryRegionSection.
>>
>> Maybe I'm even able to simplify/get rid of the discard_all() callback.
> 
> Good, thanks for trying and finding the best of both worlds.

I'll do a couple of reworks and then share the current state, then we 
can discuss if this is going into the right direction.

Thanks a lot for the feedback!


-- 
Thanks,

David / dhildenb




end of thread

Thread overview: 34+ messages
2021-02-22 11:56 [PATCH v6 00/12] virtio-mem: vfio support David Hildenbrand
2021-02-22 11:56 ` [PATCH v6 01/12] memory: Introduce RamDiscardMgr for RAM memory regions David Hildenbrand
2021-02-22 13:27   ` Paolo Bonzini
2021-02-22 14:03     ` David Hildenbrand
2021-02-22 14:18       ` Paolo Bonzini
2021-02-22 14:53         ` David Hildenbrand
2021-02-22 17:37           ` Paolo Bonzini
2021-02-22 17:48             ` David Hildenbrand
2021-02-22 19:43             ` David Hildenbrand
2021-02-23 10:50               ` David Hildenbrand
2021-02-23 15:03                 ` Paolo Bonzini
2021-02-23 15:09                   ` David Hildenbrand
2021-02-22 11:56 ` [PATCH v6 02/12] virtio-mem: Factor out traversing unplugged ranges David Hildenbrand
2021-02-22 11:56 ` [PATCH v6 03/12] virtio-mem: Don't report errors when ram_block_discard_range() fails David Hildenbrand
2021-02-22 11:57 ` [PATCH v6 04/12] virtio-mem: Implement RamDiscardMgr interface David Hildenbrand
2021-02-22 11:57 ` [PATCH v6 05/12] vfio: Support for RamDiscardMgr in the !vIOMMU case David Hildenbrand
2021-02-22 13:20   ` Paolo Bonzini
2021-02-22 14:43     ` David Hildenbrand
2021-02-22 17:29       ` Paolo Bonzini
2021-02-22 17:34         ` David Hildenbrand
2021-02-22 11:57 ` [PATCH v6 06/12] vfio: Query and store the maximum number of possible DMA mappings David Hildenbrand
2021-02-22 11:57 ` [PATCH v6 07/12] vfio: Sanity check maximum number of DMA mappings with RamDiscardMgr David Hildenbrand
2021-02-22 11:57 ` [PATCH v6 08/12] vfio: Support for RamDiscardMgr in the vIOMMU case David Hildenbrand
2021-02-22 11:57 ` [PATCH v6 09/12] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require) David Hildenbrand
2021-02-22 13:14   ` Paolo Bonzini
2021-02-22 13:33     ` David Hildenbrand
2021-02-22 14:02       ` Paolo Bonzini
2021-02-22 15:38         ` David Hildenbrand
2021-02-22 17:32           ` Paolo Bonzini
2021-02-23  9:02             ` David Hildenbrand
2021-02-23 15:02               ` Paolo Bonzini
2021-02-22 11:57 ` [PATCH v6 10/12] softmmu/physmem: Extend ram_block_discard_(require|disable) by two discard types David Hildenbrand
2021-02-22 11:57 ` [PATCH v6 11/12] virtio-mem: Require only coordinated discards David Hildenbrand
2021-02-22 11:57 ` [PATCH v6 12/12] vfio: Disable only uncoordinated discards for VFIO_TYPE1 iommus David Hildenbrand
