All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Beulich <jbeulich@suse.com>
To: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: "Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Paul Durrant" <paul@xen.org>,
	"Roger Pau Monné" <roger.pau@citrix.com>,
	"Kevin Tian" <kevin.tian@intel.com>
Subject: [PATCH v3 20/23] VT-d: free all-empty page tables
Date: Mon, 10 Jan 2022 17:36:22 +0100	[thread overview]
Message-ID: <807a48fe-3829-d976-75dc-1767d32fb0f4@suse.com> (raw)
In-Reply-To: <76cb9f26-e316-98a2-b1ba-e51e3d20f335@suse.com>

When a page table ends up with no present entries left, it can be
replaced by a non-present entry at the next higher level. The page table
itself can then be scheduled for freeing.

Note that while its output isn't used there yet,
pt_update_contig_markers() right away needs to be called in all places
where entries get updated, not just the one where entries get cleared.

Note further that while pt_update_contig_markers() updates perhaps
several PTEs within the table, since these are changes to "avail" bits
only I do not think that cache flushing would be needed afterwards. Such
cache flushing (of entire pages, unless adding yet more logic to me more
selective) would be quite noticable performance-wise (very prominent
during Dom0 boot).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Properly bound loop. Re-base over changes earlier in the series.
v2: New.
---
The hang during boot on my Latitude E6410 (see the respective code
comment) was pretty close after iommu_enable_translation(). No errors,
no watchdog would kick in, just sometimes the first few pixel lines of
the next log message's (XEN) prefix would have made it out to the screen
(and there's no serial there). It's been a lot of experimenting until I
figured the workaround (which I consider ugly, but halfway acceptable).
I've been trying hard to make sure the workaround wouldn't be masking a
real issue, yet I'm still wary of it possibly doing so ... My best guess
at this point is that on these old IOMMUs the ignored bits 52...61
aren't really ignored for present entries, but also aren't "reserved"
enough to trigger faults. This guess is from having tried to set other
bits in this range (unconditionally, and with the workaround here in
place), which yielded the same behavior.

--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -42,6 +42,9 @@
 #include "vtd.h"
 #include "../ats.h"
 
+#define CONTIG_MASK DMA_PTE_CONTIG_MASK
+#include <asm/pt-contig-markers.h>
+
 /* dom_io is used as a sentinel for quarantined devices */
 #define QUARANTINE_SKIP(d) ((d) == dom_io && !dom_iommu(d)->arch.vtd.pgd_maddr)
 
@@ -452,6 +455,9 @@ static uint64_t addr_to_dma_page_maddr(s
 
             write_atomic(&pte->val, new_pte.val);
             iommu_sync_cache(pte, sizeof(struct dma_pte));
+            pt_update_contig_markers(&parent->val,
+                                     address_level_offset(addr, level),
+                                     level, PTE_kind_table);
         }
 
         if ( --level == target )
@@ -879,9 +885,31 @@ static int dma_pte_clear_one(struct doma
 
     old = *pte;
     dma_clear_pte(*pte);
+    iommu_sync_cache(pte, sizeof(*pte));
+
+    while ( pt_update_contig_markers(&page->val,
+                                     address_level_offset(addr, level),
+                                     level, PTE_kind_null) &&
+            ++level < min_pt_levels )
+    {
+        struct page_info *pg = maddr_to_page(pg_maddr);
+
+        unmap_vtd_domain_page(page);
+
+        pg_maddr = addr_to_dma_page_maddr(domain, addr, level, flush_flags,
+                                          false);
+        BUG_ON(pg_maddr < PAGE_SIZE);
+
+        page = map_vtd_domain_page(pg_maddr);
+        pte = &page[address_level_offset(addr, level)];
+        dma_clear_pte(*pte);
+        iommu_sync_cache(pte, sizeof(*pte));
+
+        *flush_flags |= IOMMU_FLUSHF_all;
+        iommu_queue_free_pgtable(domain, pg);
+    }
 
     spin_unlock(&hd->arch.mapping_lock);
-    iommu_sync_cache(pte, sizeof(struct dma_pte));
 
     unmap_vtd_domain_page(page);
 
@@ -2037,8 +2065,21 @@ static int __must_check intel_iommu_map_
     }
 
     *pte = new;
-
     iommu_sync_cache(pte, sizeof(struct dma_pte));
+
+    /*
+     * While the (ab)use of PTE_kind_table here allows to save some work in
+     * the function, the main motivation for it is that it avoids a so far
+     * unexplained hang during boot (while preparing Dom0) on a Westmere
+     * based laptop.
+     */
+    pt_update_contig_markers(&page->val,
+                             address_level_offset(dfn_to_daddr(dfn), level),
+                             level,
+                             (hd->platform_ops->page_sizes &
+                              (1UL << level_to_offset_bits(level + 1))
+                              ? PTE_kind_leaf : PTE_kind_table));
+
     spin_unlock(&hd->arch.mapping_lock);
     unmap_vtd_domain_page(page);
 



  parent reply	other threads:[~2022-01-10 16:36 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-10 16:19 [PATCH v3 00/23] IOMMU: superpage support when not sharing pagetables Jan Beulich
2022-01-10 16:22 ` [PATCH v3 01/23] AMD/IOMMU: have callers specify the target level for page table walks Jan Beulich
2022-01-10 16:22 ` [PATCH v3 02/23] VT-d: " Jan Beulich
2022-01-30  3:17   ` Tian, Kevin
2022-01-31 10:04     ` Jan Beulich
2022-01-10 16:23 ` [PATCH v3 03/23] VT-d: limit page table population in domain_pgd_maddr() Jan Beulich
2022-01-30  3:22   ` Tian, Kevin
2022-01-10 16:25 ` [PATCH v3 04/23] IOMMU: have vendor code announce supported page sizes Jan Beulich
2022-01-10 16:25 ` [PATCH v3 05/23] IOMMU: simplify unmap-on-error in iommu_map() Jan Beulich
2022-01-10 16:27 ` [PATCH v3 06/23] IOMMU: add order parameter to ->{,un}map_page() hooks Jan Beulich
2022-01-10 16:27 ` [PATCH v3 07/23] IOMMU: have iommu_{,un}map() split requests into largest possible chunks Jan Beulich
2022-01-10 16:28 ` [PATCH v3 08/23] IOMMU/x86: restrict IO-APIC mappings for PV Dom0 Jan Beulich
2022-01-10 16:28 ` [PATCH v3 09/23] IOMMU/x86: perform PV Dom0 mappings in batches Jan Beulich
2022-01-10 16:29 ` [PATCH v3 10/23] IOMMU/x86: support freeing of pagetables Jan Beulich
2022-01-10 16:29 ` [PATCH v3 11/23] AMD/IOMMU: drop stray TLB flush Jan Beulich
2022-01-10 16:30 ` [PATCH v3 12/23] AMD/IOMMU: walk trees upon page fault Jan Beulich
2022-01-10 16:30 ` [PATCH v3 13/23] AMD/IOMMU: return old PTE from {set,clear}_iommu_pte_present() Jan Beulich
2022-01-10 16:31 ` [PATCH v3 14/23] AMD/IOMMU: allow use of superpage mappings Jan Beulich
2022-01-10 16:32 ` [PATCH v3 15/23] VT-d: " Jan Beulich
2022-01-30  3:26   ` Tian, Kevin
2022-01-10 16:33 ` [PATCH v3 16/23] IOMMU: fold flush-all hook into "flush one" Jan Beulich
2022-01-30  3:38   ` Tian, Kevin
2022-01-10 16:34 ` [PATCH v3 17/23] IOMMU/x86: prefill newly allocate page tables Jan Beulich
2022-02-18  5:01   ` Tian, Kevin
2022-02-18  8:24     ` Jan Beulich
2022-02-18  8:26       ` Tian, Kevin
2022-01-10 16:35 ` [PATCH v3 18/23] x86: introduce helper for recording degree of contiguity in " Jan Beulich
2022-01-10 16:35 ` [PATCH v3 19/23] AMD/IOMMU: free all-empty " Jan Beulich
2022-01-10 16:36 ` Jan Beulich [this message]
2022-02-18  5:20   ` [PATCH v3 20/23] VT-d: " Tian, Kevin
2022-02-18  8:31     ` Jan Beulich
2022-03-14  4:01       ` Tian, Kevin
2022-03-14  7:33         ` Jan Beulich
2022-03-17  5:55           ` Tian, Kevin
2022-03-17  8:55             ` Jan Beulich
2022-01-10 16:37 ` [PATCH v3 21/23] AMD/IOMMU: replace all-contiguous page tables by superpage mappings Jan Beulich
2022-01-10 16:38 ` [PATCH v3 22/23] VT-d: " Jan Beulich
2022-02-18  5:22   ` Tian, Kevin
2022-01-10 16:38 ` [PATCH v3 23/23] IOMMU/x86: add perf counters for page table splitting / coalescing Jan Beulich
2022-02-18  5:23   ` Tian, Kevin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=807a48fe-3829-d976-75dc-1767d32fb0f4@suse.com \
    --to=jbeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=kevin.tian@intel.com \
    --cc=paul@xen.org \
    --cc=roger.pau@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.