All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eduardo Habkost <ehabkost@redhat.com>
To: Peter Maydell <peter.maydell@linaro.org>, qemu-devel@nongnu.org
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>,
	Eduardo Habkost <ehabkost@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	Peter Xu <peterx@redhat.com>, Auger Eric <eric.auger@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	teawater <teawaterz@linux.alibaba.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Marek Kedzierski <mkedzier@redhat.com>,
	Wei Yang <richard.weiyang@linux.alibaba.com>
Subject: [PULL 11/15] vfio: Support for RamDiscardManager in the vIOMMU case
Date: Wed,  7 Jul 2021 15:32:37 -0400	[thread overview]
Message-ID: <20210707193241.2659335-12-ehabkost@redhat.com> (raw)
In-Reply-To: <20210707193241.2659335-1-ehabkost@redhat.com>

From: David Hildenbrand <david@redhat.com>

vIOMMU support works already with RamDiscardManager as long as guests only
map populated memory. Both, populated and discarded memory is mapped
into &address_space_memory, where vfio_get_xlat_addr() will find that
memory, to create the vfio mapping.

Sane guests will never map discarded memory (e.g., unplugged memory
blocks in virtio-mem) into an IOMMU - or keep it mapped into an IOMMU while
memory is getting discarded. However, there are two cases where a malicious
guests could trigger pinning of more memory than intended.

One case is easy to handle: the guest trying to map discarded memory
into an IOMMU.

The other case is harder to handle: the guest keeping memory mapped in
the IOMMU while it is getting discarded. We would have to walk over all
mappings when discarding memory and identify if any mapping would be a
violation. Let's keep it simple for now and print a warning, indicating
that setting RLIMIT_MEMLOCK can mitigate such attacks.

We have to take care of incoming migration: at the point the
IOMMUs get restored and start creating mappings in vfio, RamDiscardManager
implementations might not be back up and running yet: let's add runstate
priorities to enforce the order when restoring.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Auger Eric <eric.auger@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210413095531.25603-10-david@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
 include/migration/vmstate.h |  1 +
 hw/vfio/common.c            | 39 +++++++++++++++++++++++++++++++++++++
 hw/virtio/virtio-mem.c      |  1 +
 3 files changed, 41 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 8df7b69f389..017c03675ca 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -153,6 +153,7 @@ typedef enum {
     MIG_PRI_DEFAULT = 0,
     MIG_PRI_IOMMU,              /* Must happen before PCI devices */
     MIG_PRI_PCI_BUS,            /* Must happen before IOMMU */
+    MIG_PRI_VIRTIO_MEM,         /* Must happen before IOMMU */
     MIG_PRI_GICV3_ITS,          /* Must happen before PCI devices */
     MIG_PRI_GICV3,              /* Must happen before the ITS */
     MIG_PRI_MAX,
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f8a2fe8441a..8a9bbf27918 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -36,6 +36,7 @@
 #include "qemu/range.h"
 #include "sysemu/kvm.h"
 #include "sysemu/reset.h"
+#include "sysemu/runstate.h"
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/migration.h"
@@ -569,6 +570,44 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
         error_report("iommu map to non memory area %"HWADDR_PRIx"",
                      xlat);
         return false;
+    } else if (memory_region_has_ram_discard_manager(mr)) {
+        RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
+        MemoryRegionSection tmp = {
+            .mr = mr,
+            .offset_within_region = xlat,
+            .size = int128_make64(len),
+        };
+
+        /*
+         * Malicious VMs can map memory into the IOMMU, which is expected
+         * to remain discarded. vfio will pin all pages, populating memory.
+         * Disallow that. vmstate priorities make sure any RamDiscardManager
+         * were already restored before IOMMUs are restored.
+         */
+        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
+            error_report("iommu map to discarded memory (e.g., unplugged via"
+                         " virtio-mem): %"HWADDR_PRIx"",
+                         iotlb->translated_addr);
+            return false;
+        }
+
+        /*
+         * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
+         * pages will remain pinned inside vfio until unmapped, resulting in a
+         * higher memory consumption than expected. If memory would get
+         * populated again later, there would be an inconsistency between pages
+         * pinned by vfio and pages seen by QEMU. This is the case until
+         * unmapped from the IOMMU (e.g., during device reset).
+         *
+         * With malicious guests, we really only care about pinning more memory
+         * than expected. RLIMIT_MEMLOCK set for the user/process can never be
+         * exceeded and can be used to mitigate this problem.
+         */
+        warn_report_once("Using vfio with vIOMMUs and coordinated discarding of"
+                         " RAM (e.g., virtio-mem) works, however, malicious"
+                         " guests can trigger pinning of more memory than"
+                         " intended via an IOMMU. It's possible to mitigate "
+                         " by setting/adjusting RLIMIT_MEMLOCK.");
     }
 
     /*
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index f60cb8a3fc0..368ae1db903 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -886,6 +886,7 @@ static const VMStateDescription vmstate_virtio_mem_device = {
     .name = "virtio-mem-device",
     .minimum_version_id = 1,
     .version_id = 1,
+    .priority = MIG_PRI_VIRTIO_MEM,
     .post_load = virtio_mem_post_load,
     .fields = (VMStateField[]) {
         VMSTATE_WITH_TMP(VirtIOMEM, VirtIOMEMMigSanityChecks,
-- 
2.31.1



  parent reply	other threads:[~2021-07-07 19:38 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-07 19:32 [PULL 00/15] Machine queue, 2021-07-07 Eduardo Habkost
2021-07-07 19:32 ` [PULL 01/15] vmbus: Don't make QOM property registration conditional Eduardo Habkost
2021-07-07 19:32 ` [PULL 02/15] Deprecate pmem=on with non-DAX capable backend file Eduardo Habkost
2021-07-07 19:32 ` [PULL 03/15] memory: Introduce RamDiscardManager for RAM memory regions Eduardo Habkost
2021-07-07 19:32 ` [PULL 04/15] memory: Helpers to copy/free a MemoryRegionSection Eduardo Habkost
2021-07-07 19:32 ` [PULL 05/15] virtio-mem: Factor out traversing unplugged ranges Eduardo Habkost
2021-07-07 19:32 ` [PULL 06/15] virtio-mem: Don't report errors when ram_block_discard_range() fails Eduardo Habkost
2021-07-07 19:32 ` [PULL 07/15] virtio-mem: Implement RamDiscardManager interface Eduardo Habkost
2021-07-07 19:32 ` [PULL 08/15] vfio: Support for RamDiscardManager in the !vIOMMU case Eduardo Habkost
2021-07-07 19:32 ` [PULL 09/15] vfio: Query and store the maximum number of possible DMA mappings Eduardo Habkost
2021-07-07 19:32 ` [PULL 10/15] vfio: Sanity check maximum number of DMA mappings with RamDiscardManager Eduardo Habkost
2021-07-07 19:32 ` Eduardo Habkost [this message]
2021-07-07 19:32 ` [PULL 12/15] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require) Eduardo Habkost
2021-07-07 19:32 ` [PULL 13/15] softmmu/physmem: Extend ram_block_discard_(require|disable) by two discard types Eduardo Habkost
2021-07-07 19:32 ` [PULL 14/15] virtio-mem: Require only coordinated discards Eduardo Habkost
2021-07-07 19:32 ` [PULL 15/15] vfio: Disable only uncoordinated discards for VFIO_TYPE1 iommus Eduardo Habkost
2021-07-08  9:53 ` [PULL 00/15] Machine queue, 2021-07-07 Peter Maydell
2021-07-08 13:03   ` David Hildenbrand
2021-07-08 15:32   ` Eduardo Habkost

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210707193241.2659335-12-ehabkost@redhat.com \
    --to=ehabkost@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=david@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=eric.auger@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=mkedzier@redhat.com \
    --cc=mst@redhat.com \
    --cc=pankaj.gupta.linux@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.weiyang@linux.alibaba.com \
    --cc=teawaterz@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.