From: Jan Beulich <jbeulich@suse.com>
To: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: "Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Paul Durrant" <paul@xen.org>,
	"Roger Pau Monné" <roger.pau@citrix.com>, "Wei Liu" <wl@xen.org>
Subject: [PATCH v4 06/21] IOMMU/x86: perform PV Dom0 mappings in batches
Date: Mon, 25 Apr 2022 10:34:59 +0200	[thread overview]
Message-ID: <f85a5557-3483-8135-ff47-a15474aaebb4@suse.com> (raw)
In-Reply-To: <b92e294e-7277-d977-bb96-7c28d60000c6@suse.com>

For large page mappings to be easily usable (in particular without
needing to "un-shatter" previously established smaller page mappings)
and for mapping operations to also be more efficient, pass batches of
Dom0 memory to iommu_map().
In dom0_construct_pv() and its helpers (covering strict mode) this
additionally requires establishing the type of those pages (albeit with
zero type references).
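
As a purely illustrative aside (not part of the patch), with d, mfn,
nr, i, rc and flush_flags standing for suitably declared locals and
using the iommu_map() signature visible in the hunks below, the
batching amounts to replacing per-page requests by a single multi-page
one:

    /* Per-page: no chance of large pages, and per-call overhead. */
    for ( i = 0; i < nr; ++i )
        rc = iommu_map(d, _dfn(mfn + i), _mfn(mfn + i), 1,
                       IOMMUF_readable | IOMMUF_writable, &flush_flags);

    /*
     * Batched: one request covering the whole range, allowing
     * iommu_map() to pick the largest page sizes the range permits.
     */
    rc = iommu_map(d, _dfn(mfn), _mfn(mfn), nr,
                   IOMMUF_readable | IOMMUF_writable, &flush_flags);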

Establishing PGT_writable_page | PGT_validated earlier than before
requires the existing places where this gets done (through
get_page_and_type()) to be updated: For pages which actually have a
writable mapping, the type refcount needs to be 1.

There is actually a related bug that gets fixed here as a side effect:
Typically the last L1 table would get marked as such only after
get_page_and_type(..., PGT_writable_page). While this is fine as far as
refcounting goes, the IOMMU mapping established by that writable type
acquisition was never removed again when the page subsequently became a
page-table page, i.e. the page remained mapped in the IOMMU (when
"iommu=dom0-strict").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Subsequently p2m_add_identity_entry() may want to also gain an order
parameter, for arch_iommu_hwdom_init() to use. While this only affects
non-RAM regions, systems typically have 2-16MB of reserved space
immediately below 4GB, which hence could be mapped more efficiently.

The installing of zero-ref writable types has in fact shown (observed
while putting together the change) that, despite the intention of the
XSA-288 changes (affecting DomU-s only), for Dom0 a number of
sufficiently ordinary pages (at the very least initrd and P2M ones, as
well as pages that are part of the initial allocation but not part of
the initial mapping) still start out as PGT_none. This means they would
gain IOMMU mappings only the first time they get mapped writably.
Consequently an open question is whether iommu_memory_setup() should
set the pages to PGT_writable_page independent of
need_iommu_pt_sync().

I didn't think I needed to address the bug mentioned in the description
in a separate (prereq) patch, but if others disagree I could certainly
break out that part (which would then need to use iommu_legacy_unmap()
first).
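
Should splitting it out be preferred, the standalone fix in
mark_pv_pt_pages_rdonly() might look roughly like the below (a sketch
only; the legacy wrapper flushes internally, since flush_flags wouldn't
be threaded through at that point yet):

    if ( need_iommu_pt_sync(d) &&
         iommu_legacy_unmap(d, _dfn(mfn_x(page_to_mfn(page))),
                            1ul << PAGE_ORDER_4K) )
        BUG();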

Note that 4k P2M pages don't get (pre-)mapped in setup_pv_physmap():
They'll end up mapped via the later get_page_and_type().

As to the way these refs get installed: I've chosen to avoid the more
expensive {get,put}_page_and_type(), preferring to install the intended
type directly. I guess I could be convinced to avoid this bypassing of
the actual logic; I merely think it's unnecessarily expensive.
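
Concretely, the cheap variant used here and the fuller alternative
would compare roughly as follows (the latter being a hypothetical
sketch, not what the patch does):

    /* Used here: install the intended type directly, with zero refs. */
    ASSERT(!page->u.inuse.type_info);
    page->u.inuse.type_info = PGT_writable_page | PGT_validated;

    /*
     * Alternative: go through the real type machinery (including its
     * per-page IOMMU handling), then drop the acquired references
     * again.
     */
    if ( !get_page_and_type(page, d, PGT_writable_page) )
        BUG();
    put_page_and_type(page);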

Note also that strictly speaking the iommu_iotlb_flush_all() here (as
well as the pre-existing one in arch_iommu_hwdom_init()) shouldn't be
needed: Actual hooking up (AMD) or enabling of translation (VT-d)
occurs only afterwards anyway, so nothing can have made it into TLBs
just yet.
---
v3: Fold iommu_map() into (the now renamed) iommu_memory_setup(). Move
    iommu_unmap() into mark_pv_pt_pages_rdonly(). Adjust (split) log
    message in arch_iommu_hwdom_init().

--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -46,7 +46,8 @@ void __init dom0_update_physmap(bool com
 static __init void mark_pv_pt_pages_rdonly(struct domain *d,
                                            l4_pgentry_t *l4start,
                                            unsigned long vpt_start,
-                                           unsigned long nr_pt_pages)
+                                           unsigned long nr_pt_pages,
+                                           unsigned int *flush_flags)
 {
     unsigned long count;
     struct page_info *page;
@@ -71,6 +72,14 @@ static __init void mark_pv_pt_pages_rdon
         ASSERT((page->u.inuse.type_info & PGT_type_mask) <= PGT_root_page_table);
         ASSERT(!(page->u.inuse.type_info & ~(PGT_type_mask | PGT_pae_xen_l2)));
 
+        /*
+         * Page table pages need to be removed from the IOMMU again in case
+         * iommu_memory_setup() ended up mapping them.
+         */
+        if ( need_iommu_pt_sync(d) &&
+             iommu_unmap(d, _dfn(mfn_x(page_to_mfn(page))), 1, flush_flags) )
+            BUG();
+
         /* Read-only mapping + PGC_allocated + page-table page. */
         page->count_info         = PGC_allocated | 3;
         page->u.inuse.type_info |= PGT_validated | 1;
@@ -107,11 +116,43 @@ static __init void mark_pv_pt_pages_rdon
     unmap_domain_page(pl3e);
 }
 
+static void __init iommu_memory_setup(struct domain *d, const char *what,
+                                      struct page_info *page, unsigned long nr,
+                                      unsigned int *flush_flags)
+{
+    int rc;
+    mfn_t mfn = page_to_mfn(page);
+
+    if ( !need_iommu_pt_sync(d) )
+        return;
+
+    rc = iommu_map(d, _dfn(mfn_x(mfn)), mfn, nr,
+                   IOMMUF_readable | IOMMUF_writable, flush_flags);
+    if ( rc )
+    {
+        printk(XENLOG_ERR "pre-mapping %s MFN [%lx,%lx) into IOMMU failed: %d\n",
+               what, mfn_x(mfn), mfn_x(mfn) + nr, rc);
+        return;
+    }
+
+    /*
+     * For successfully established IOMMU mappings the type of the page(s)
+     * needs to match (for _get_page_type() to unmap upon type change). Set
+     * the page(s) to writable with no type ref.
+     */
+    for ( ; nr--; ++page )
+    {
+        ASSERT(!page->u.inuse.type_info);
+        page->u.inuse.type_info = PGT_writable_page | PGT_validated;
+    }
+}
+
 static __init void setup_pv_physmap(struct domain *d, unsigned long pgtbl_pfn,
                                     unsigned long v_start, unsigned long v_end,
                                     unsigned long vphysmap_start,
                                     unsigned long vphysmap_end,
-                                    unsigned long nr_pages)
+                                    unsigned long nr_pages,
+                                    unsigned int *flush_flags)
 {
     struct page_info *page = NULL;
     l4_pgentry_t *pl4e, *l4start = map_domain_page(_mfn(pgtbl_pfn));
@@ -177,6 +218,10 @@ static __init void setup_pv_physmap(stru
                                              L3_PAGETABLE_SHIFT - PAGE_SHIFT,
                                              MEMF_no_scrub)) != NULL )
             {
+                iommu_memory_setup(d, "P2M 1G", page,
+                                   SUPERPAGE_PAGES * SUPERPAGE_PAGES,
+                                   flush_flags);
+
                 *pl3e = l3e_from_page(page, L1_PROT|_PAGE_DIRTY|_PAGE_PSE);
                 vphysmap_start += 1UL << L3_PAGETABLE_SHIFT;
                 continue;
@@ -203,6 +248,9 @@ static __init void setup_pv_physmap(stru
                                              L2_PAGETABLE_SHIFT - PAGE_SHIFT,
                                              MEMF_no_scrub)) != NULL )
             {
+                iommu_memory_setup(d, "P2M 2M", page, SUPERPAGE_PAGES,
+                                   flush_flags);
+
                 *pl2e = l2e_from_page(page, L1_PROT|_PAGE_DIRTY|_PAGE_PSE);
                 vphysmap_start += 1UL << L2_PAGETABLE_SHIFT;
                 continue;
@@ -311,6 +359,7 @@ int __init dom0_construct_pv(struct doma
     unsigned long initrd_pfn = -1, initrd_mfn = 0;
     unsigned long count;
     struct page_info *page = NULL;
+    unsigned int flush_flags = 0;
     start_info_t *si;
     struct vcpu *v = d->vcpu[0];
     void *image_base = bootstrap_map(image);
@@ -573,6 +622,9 @@ int __init dom0_construct_pv(struct doma
                     BUG();
         }
         initrd->mod_end = 0;
+
+        iommu_memory_setup(d, "initrd", mfn_to_page(_mfn(initrd_mfn)),
+                           PFN_UP(initrd_len), &flush_flags);
     }
 
     printk("PHYSICAL MEMORY ARRANGEMENT:\n"
@@ -606,6 +658,13 @@ int __init dom0_construct_pv(struct doma
 
     process_pending_softirqs();
 
+    /*
+     * Map the full range here and then punch holes for page tables
+     * alongside marking them as such in mark_pv_pt_pages_rdonly().
+     */
+    iommu_memory_setup(d, "init-alloc", mfn_to_page(_mfn(alloc_spfn)),
+                       alloc_epfn - alloc_spfn, &flush_flags);
+
     mpt_alloc = (vpt_start - v_start) + pfn_to_paddr(alloc_spfn);
     if ( vinitrd_start )
         mpt_alloc -= PAGE_ALIGN(initrd_len);
@@ -690,7 +749,8 @@ int __init dom0_construct_pv(struct doma
         l1tab++;
 
         page = mfn_to_page(_mfn(mfn));
-        if ( !page->u.inuse.type_info &&
+        if ( (!page->u.inuse.type_info ||
+              page->u.inuse.type_info == (PGT_writable_page | PGT_validated)) &&
              !get_page_and_type(page, d, PGT_writable_page) )
             BUG();
     }
@@ -719,7 +779,7 @@ int __init dom0_construct_pv(struct doma
     }
 
     /* Pages that are part of page tables must be read only. */
-    mark_pv_pt_pages_rdonly(d, l4start, vpt_start, nr_pt_pages);
+    mark_pv_pt_pages_rdonly(d, l4start, vpt_start, nr_pt_pages, &flush_flags);
 
     /* Mask all upcalls... */
     for ( i = 0; i < XEN_LEGACY_MAX_VCPUS; i++ )
@@ -794,7 +854,7 @@ int __init dom0_construct_pv(struct doma
     {
         pfn = pagetable_get_pfn(v->arch.guest_table);
         setup_pv_physmap(d, pfn, v_start, v_end, vphysmap_start, vphysmap_end,
-                         nr_pages);
+                         nr_pages, &flush_flags);
     }
 
     /* Write the phys->machine and machine->phys table entries. */
@@ -825,7 +885,9 @@ int __init dom0_construct_pv(struct doma
         if ( get_gpfn_from_mfn(mfn) >= count )
         {
             BUG_ON(compat);
-            if ( !page->u.inuse.type_info &&
+            if ( (!page->u.inuse.type_info ||
+                  page->u.inuse.type_info == (PGT_writable_page |
+                                              PGT_validated)) &&
                  !get_page_and_type(page, d, PGT_writable_page) )
                 BUG();
 
@@ -841,8 +903,12 @@ int __init dom0_construct_pv(struct doma
 #endif
     while ( pfn < nr_pages )
     {
-        if ( (page = alloc_chunk(d, nr_pages - domain_tot_pages(d))) == NULL )
+        count = domain_tot_pages(d);
+        if ( (page = alloc_chunk(d, nr_pages - count)) == NULL )
             panic("Not enough RAM for DOM0 reservation\n");
+
+        iommu_memory_setup(d, "chunk", page, domain_tot_pages(d) - count,
+                           &flush_flags);
         while ( pfn < domain_tot_pages(d) )
         {
             mfn = mfn_x(page_to_mfn(page));
@@ -857,6 +923,10 @@ int __init dom0_construct_pv(struct doma
         }
     }
 
+    /* Use while() to avoid compiler warning. */
+    while ( iommu_iotlb_flush_all(d, flush_flags) )
+        break;
+
     if ( initrd_len != 0 )
     {
         si->mod_start = vinitrd_start ?: initrd_pfn;
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -347,8 +347,8 @@ static unsigned int __hwdom_init hwdom_i
 
 void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
 {
-    unsigned long i, top, max_pfn;
-    unsigned int flush_flags = 0;
+    unsigned long i, top, max_pfn, start, count;
+    unsigned int flush_flags = 0, start_perms = 0;
 
     BUG_ON(!is_hardware_domain(d));
 
@@ -379,9 +379,9 @@ void __hwdom_init arch_iommu_hwdom_init(
      * First Mb will get mapped in one go by pvh_populate_p2m(). Avoid
      * setting up potentially conflicting mappings here.
      */
-    i = paging_mode_translate(d) ? PFN_DOWN(MB(1)) : 0;
+    start = paging_mode_translate(d) ? PFN_DOWN(MB(1)) : 0;
 
-    for ( ; i < top; i++ )
+    for ( i = start, count = 0; i < top; )
     {
         unsigned long pfn = pdx_to_pfn(i);
         unsigned int perms = hwdom_iommu_map(d, pfn, max_pfn);
@@ -390,20 +390,41 @@ void __hwdom_init arch_iommu_hwdom_init(
         if ( !perms )
             rc = 0;
         else if ( paging_mode_translate(d) )
+        {
             rc = p2m_add_identity_entry(d, pfn,
                                         perms & IOMMUF_writable ? p2m_access_rw
                                                                 : p2m_access_r,
                                         0);
+            if ( rc )
+                printk(XENLOG_WARNING
+                       "%pd: identity mapping of %lx failed: %d\n",
+                       d, pfn, rc);
+        }
+        else if ( pfn != start + count || perms != start_perms )
+        {
+        commit:
+            rc = iommu_map(d, _dfn(start), _mfn(start), count, start_perms,
+                           &flush_flags);
+            if ( rc )
+                printk(XENLOG_WARNING
+                       "%pd: IOMMU identity mapping of [%lx,%lx) failed: %d\n",
+                       d, pfn, pfn + count, rc);
+            SWAP(start, pfn);
+            start_perms = perms;
+            count = 1;
+        }
         else
-            rc = iommu_map(d, _dfn(pfn), _mfn(pfn), 1ul << PAGE_ORDER_4K,
-                           perms, &flush_flags);
+        {
+            ++count;
+            rc = 0;
+        }
 
-        if ( rc )
-            printk(XENLOG_WARNING "%pd: identity %smapping of %lx failed: %d\n",
-                   d, !paging_mode_translate(d) ? "IOMMU " : "", pfn, rc);
 
-        if (!(i & 0xfffff))
+        if ( !(++i & 0xfffff) )
             process_pending_softirqs();
+
+        if ( i == top && count )
+            goto commit;
     }
 
     /* Use if to avoid compiler warning */



