From: David Hildenbrand <david@redhat.com>
To: qemu-devel@nongnu.org
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>,
	Wei Yang <richard.weiyang@linux.alibaba.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	Peter Xu <peterx@redhat.com>, Auger Eric <eric.auger@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	teawater <teawaterz@linux.alibaba.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Marek Kedzierski <mkedzier@redhat.com>
Subject: [PATCH v6 12/12] vfio: Disable only uncoordinated discards for VFIO_TYPE1 iommus
Date: Mon, 22 Feb 2021 12:57:08 +0100
Message-ID: <20210222115708.7623-13-david@redhat.com>
In-Reply-To: <20210222115708.7623-1-david@redhat.com>

We support coordinated discarding of RAM using the RamDiscardMgr for
the VFIO_TYPE1 IOMMUs. Let's unlock support for coordinated discards,
keeping uncoordinated discards (e.g., via virtio-balloon) disabled if
possible.

This unlocks virtio-mem + vfio on x86-64. Note that vfio used via "nvme://"
by the block layer has to be implemented/unlocked separately. For now,
virtio-mem only supports x86-64; we don't restrict RamDiscardMgr to x86-64,
though: arm64 and s390x are expected to work as well, and we'll test them
once virtio-mem support is unlocked there. The spapr IOMMUs will need
special care, to be tackled later, e.g., once they support virtio-mem.

Note: The block size of a virtio-mem device has to be set to a sane size,
depending on the maximum hotplug size, so we don't run out of vfio
mappings. The default virtio-mem block size is usually in the range of a
couple of MiB. The maximum number of mappings is 64k, shared with other
users. Assume you want to hotplug 256 GiB using virtio-mem: the block size
would have to be set to at least 8 MiB (resulting in 32768 separate
mappings).
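
As a back-of-the-envelope sketch (a hypothetical helper, not part of this
patch; in practice the result would additionally be rounded up to a
supported block-size granularity), the minimum block size follows from a
simple division:

    #include <stdint.h>

    /*
     * Hypothetical sketch: the smallest virtio-mem block size that keeps
     * the mapping count for a given maximum hotplug size within the
     * number of DMA mappings available to us.
     */
    static uint64_t virtio_mem_min_block_size(uint64_t hotplug_size,
                                              uint64_t max_mappings)
    {
        /* Round up so hotplug_size / block_size never exceeds max_mappings. */
        return (hotplug_size + max_mappings - 1) / max_mappings;
    }

    /*
     * Example: hotplugging 256 GiB with 32768 of the 64k mappings
     * available yields a minimum block size of 8 MiB.
     */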

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/vfio/common.c | 63 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 51 insertions(+), 12 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 15ecd05a4b..d879b8ab92 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -135,6 +135,27 @@ static const char *index_to_str(VFIODevice *vbasedev, int index)
     }
 }
 
+static int vfio_ram_block_discard_disable(VFIOContainer *container, bool state)
+{
+    switch (container->iommu_type) {
+    case VFIO_TYPE1v2_IOMMU:
+    case VFIO_TYPE1_IOMMU:
+        /* We support coordinated discarding of RAM via the RamDiscardMgr. */
+        return ram_block_uncoordinated_discard_disable(state);
+    default:
+        /*
+         * VFIO_SPAPR_TCE_IOMMU most probably works just fine with
+         * RamDiscardMgr; however, it is completely untested.
+         *
+         * VFIO_SPAPR_TCE_v2_IOMMU with "DMA memory preregistering" does
+         * completely the opposite of managing mapping/pinning dynamically as
+         * required by RamDiscardMgr. We would have to special-case sections
+         * with a RamDiscardMgr.
+         */
+        return ram_block_discard_disable(state);
+    }
+}
+
 int vfio_set_irq_signaling(VFIODevice *vbasedev, int index, int subindex,
                            int action, int fd, Error **errp)
 {
@@ -1979,15 +2000,25 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
      * new memory, it will not yet set ram_block_discard_set_required() and
      * therefore, neither stops us here or deals with the sudden memory
      * consumption of inflated memory.
+     *
+     * We do support discarding of memory coordinated via the RamDiscardMgr
+     * with some IOMMU types. vfio_ram_block_discard_disable() handles the
+     * details once we know which type of IOMMU we are using.
      */
-    ret = ram_block_discard_disable(true);
-    if (ret) {
-        error_setg_errno(errp, -ret, "Cannot set discarding of RAM broken");
-        return ret;
-    }
 
     QLIST_FOREACH(container, &space->containers, next) {
         if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
+            ret = vfio_ram_block_discard_disable(container, true);
+            if (ret) {
+                error_setg_errno(errp, -ret,
+                                 "Cannot set discarding of RAM broken");
+                if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER,
+                          &container->fd)) {
+                    error_report("vfio: error disconnecting group %d from"
+                                 " container", group->groupid);
+                }
+                return ret;
+            }
             group->container = container;
             QLIST_INSERT_HEAD(&container->group_list, group, container_next);
             vfio_kvm_device_add_group(group);
@@ -2025,6 +2056,12 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         goto free_container_exit;
     }
 
+    ret = vfio_ram_block_discard_disable(container, true);
+    if (ret) {
+        error_setg_errno(errp, -ret, "Cannot set discarding of RAM broken");
+        goto free_container_exit;
+    }
+
     switch (container->iommu_type) {
     case VFIO_TYPE1v2_IOMMU:
     case VFIO_TYPE1_IOMMU:
@@ -2072,7 +2109,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
             if (ret) {
                 error_setg_errno(errp, errno, "failed to enable container");
                 ret = -errno;
-                goto free_container_exit;
+                goto enable_discards_exit;
             }
         } else {
             container->prereg_listener = vfio_prereg_listener;
@@ -2084,7 +2121,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
                 ret = -1;
                 error_propagate_prepend(errp, container->error,
                     "RAM memory listener initialization failed: ");
-                goto free_container_exit;
+                goto enable_discards_exit;
             }
         }
 
@@ -2097,7 +2134,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
             if (v2) {
                 memory_listener_unregister(&container->prereg_listener);
             }
-            goto free_container_exit;
+            goto enable_discards_exit;
         }
 
         if (v2) {
@@ -2112,7 +2149,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
             if (ret) {
                 error_setg_errno(errp, -ret,
                                  "failed to remove existing window");
-                goto free_container_exit;
+                goto enable_discards_exit;
             }
         } else {
             /* The default table uses 4K pages */
@@ -2153,6 +2190,9 @@ listener_release_exit:
     vfio_kvm_device_del_group(group);
     vfio_listener_release(container);
 
+enable_discards_exit:
+    vfio_ram_block_discard_disable(container, false);
+
 free_container_exit:
     g_free(container);
 
@@ -2160,7 +2200,6 @@ close_fd_exit:
     close(fd);
 
 put_space_exit:
-    ram_block_discard_disable(false);
     vfio_put_address_space(space);
 
     return ret;
@@ -2282,7 +2321,7 @@ void vfio_put_group(VFIOGroup *group)
     }
 
     if (!group->ram_block_discard_allowed) {
-        ram_block_discard_disable(false);
+        vfio_ram_block_discard_disable(group->container, false);
     }
     vfio_kvm_device_del_group(group);
     vfio_disconnect_container(group);
@@ -2336,7 +2375,7 @@ int vfio_get_device(VFIOGroup *group, const char *name,
 
         if (!group->ram_block_discard_allowed) {
             group->ram_block_discard_allowed = true;
-            ram_block_discard_disable(false);
+            vfio_ram_block_discard_disable(group->container, false);
         }
     }
 
-- 
2.29.2


