From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Paul Durrant <paul@xen.org>, Wei Liu <wl@xen.org>
Subject: Re: [PATCH v4 06/21] IOMMU/x86: perform PV Dom0 mappings in batches
Date: Wed, 4 May 2022 13:20:00 +0200
Message-ID: <YnJhYJSZ2IXxPEyz@Air-de-Roger>
In-Reply-To: <73aabaf9-4509-53f9-8af4-354fd3d13fb3@suse.com>

On Wed, May 04, 2022 at 11:46:37AM +0200, Jan Beulich wrote:
> On 03.05.2022 16:49, Roger Pau Monné wrote:
> > On Mon, Apr 25, 2022 at 10:34:59AM +0200, Jan Beulich wrote:
> >> For large page mappings to be easily usable (i.e. in particular without
> >> un-shattering of smaller page mappings) and for mapping operations to
> >> then also be more efficient, pass batches of Dom0 memory to iommu_map().
> >> In dom0_construct_pv() and its helpers (covering strict mode) this
> >> additionally requires establishing the type of those pages (albeit with
> >> zero type references).
> > 
> > I think it's possible I've already asked this.  Would it make sense to
> > add the IOMMU mappings in alloc_domheap_pages(), maybe by passing a
> > specific flag?
> 
> I don't think you did ask, but now that you do: This would look like a
> layering violation to me. I don't think allocation should ever have
> mapping (into the IOMMU or elsewhere) as a "side effect", no matter
> that ...

Hm, I'm certainly not familiar enough with PV itself to be able to
make a proper argument here.  I fully agree with you for translated
guests using a p2m.

For PV we currently establish/teardown IOMMU mappings in
_get_page_type(), which already looks like a layering violation to me,
hence also doing so in alloc_domheap_pages() wouldn't seem that bad if
it allowed the resulting code to be simplified overall.
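
To make the idea concrete, here is a rough sketch of what I have in
mind.  The MEMF_iommu_map flag name and its handling at the tail of
alloc_domheap_pages() are made up for illustration; this is not
actual code:

    /*
     * Hypothetical: when the caller passes the (made-up) MEMF_iommu_map
     * flag, map the whole allocation into the IOMMU in a single call,
     * so an order-N allocation can use superpage mappings directly.
     * For PV, dfn == mfn (identity mapping).
     */
    if ( pg && d && (memflags & MEMF_iommu_map) && need_iommu_pt_sync(d) )
    {
        unsigned int flush_flags = 0;
        mfn_t mfn = page_to_mfn(pg);
        int rc = iommu_map(d, _dfn(mfn_x(mfn)), mfn, 1UL << order,
                           IOMMUF_readable | IOMMUF_writable,
                           &flush_flags);

        if ( !rc )
            rc = iommu_iotlb_flush(d, _dfn(mfn_x(mfn)), 1UL << order,
                                   flush_flags);
        /* Unwind the allocation rather than return it partially mapped. */
        if ( rc )
        {
            free_domheap_pages(pg, order);
            pg = NULL;
        }
    }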

> > It would seem to me that doing it that way would also allow the
> > mappings to get established in blocks for domUs.
> 
> ... then this would perhaps be possible.
> 
> >> The installing of zero-ref writable types has in fact shown (observed
> >> while putting together the change) that despite the intention by the
> >> XSA-288 changes (affecting DomU-s only) for Dom0 a number of
> >> sufficiently ordinary pages (at the very least initrd and P2M ones as
> >> well as pages that are part of the initial allocation but not part of
> >> the initial mapping) still have been starting out as PGT_none, meaning
> >> that they would have gained IOMMU mappings only the first time these
> >> pages would get mapped writably. Consequently an open question is
> >> whether iommu_memory_setup() should set the pages to PGT_writable_page
> >> independent of need_iommu_pt_sync().
> > 
> > I think I'm confused: doesn't the setting of PGT_writable_page happen
> > as a result of need_iommu_pt_sync() and having those pages added to
> > the IOMMU page tables?  (so they can be properly tracked, and IOMMU
> > mappings are removed if the page is also removed)
> 
> In principle yes - in guest_physmap_add_page(). But this function isn't
> called for the pages I enumerated in the remark. XSA-288 really only
> cared about getting this right for DomU-s.

Would it make sense to change guest_physmap_add_page() to be able to
pass the page_order parameter down to iommu_map(), and then use it for
the dom0 build instead of introducing iommu_memory_setup()?

I think guest_physmap_add_page() will need to be adjusted at some
point for domUs anyway, and hence it could be unified with the dom0
usage as well?  A rough sketch of what I mean is below.
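
This is only a sketch of the PV (!paging_mode_translate()) side, with
the per-page type handling and the error unwinding omitted:

    /*
     * Sketch: hand the whole range to the IOMMU in one call instead of
     * one 4k page at a time, so superpages can be used where possible.
     * For PV, dfn == mfn (identity mapping).
     */
    if ( need_iommu_pt_sync(d) )
    {
        unsigned int flush_flags = 0;
        int rc = iommu_map(d, _dfn(mfn_x(mfn)), mfn, 1UL << page_order,
                           IOMMUF_readable | IOMMUF_writable,
                           &flush_flags);

        if ( !rc )
            rc = iommu_iotlb_flush(d, _dfn(mfn_x(mfn)), 1UL << page_order,
                                   flush_flags);
        if ( rc )
            return rc;
    }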

> > If the pages are not added here (because dom0 is not running in strict
> > mode) then setting PGT_writable_page is not required?
> 
> Correct - in that case we skip fiddling with IOMMU mappings on
> transitions to/from PGT_writable_page, and hence putting this type in
> place would be benign (but improve consistency).
> 
> >> Note also that strictly speaking the iommu_iotlb_flush_all() here (as
> >> well as the pre-existing one in arch_iommu_hwdom_init()) shouldn't be
> >> needed: Actual hooking up (AMD) or enabling of translation (VT-d)
> >> occurs only afterwards anyway, so nothing can have made it into TLBs
> >> just yet.
> > 
> > Hm, indeed. I think the one in arch_iommu_hwdom_init can surely go
> > away, as we must strictly do the hwdom init before enabling the iommu
> > itself.
> 
> Why would that be? That's imo as much of an implementation detail as
> ...

Well, you want to have the reserved/inclusive options applied (and
the mappings created) before enabling the IOMMU, because at that
point devices have already been assigned.  So it depends on the
combination of devices being assigned and the IOMMU being enabled,
rather than just on the IOMMU being enabled.

> > The one in the dom0 build I'm less convinced about; I'd keep it just
> > to be on the safe side, in case we ever change the order of IOMMU
> > init and memory setup.
> 
> ... this. Just like we populate tables with the IOMMU already enabled
> for DomU-s, I think the same would be valid to do for Dom0.
> 
> > I would expect flushing an empty TLB to not be very expensive?
> 
> I wouldn't "expect" this - it might be this way, but it surely depends
> on whether an implementation can easily tell whether the TLB is empty,
> and whether its emptiness actually makes a difference for the latency
> of a flush operation.
> 
> >> --- a/xen/drivers/passthrough/x86/iommu.c
> >> +++ b/xen/drivers/passthrough/x86/iommu.c
> >> @@ -347,8 +347,8 @@ static unsigned int __hwdom_init hwdom_i
> >>  
> >>  void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> >>  {
> >> -    unsigned long i, top, max_pfn;
> >> -    unsigned int flush_flags = 0;
> >> +    unsigned long i, top, max_pfn, start, count;
> >> +    unsigned int flush_flags = 0, start_perms = 0;
> >>  
> >>      BUG_ON(!is_hardware_domain(d));
> >>  
> >> @@ -379,9 +379,9 @@ void __hwdom_init arch_iommu_hwdom_init(
> >>       * First Mb will get mapped in one go by pvh_populate_p2m(). Avoid
> >>       * setting up potentially conflicting mappings here.
> >>       */
> >> -    i = paging_mode_translate(d) ? PFN_DOWN(MB(1)) : 0;
> >> +    start = paging_mode_translate(d) ? PFN_DOWN(MB(1)) : 0;
> >>  
> >> -    for ( ; i < top; i++ )
> >> +    for ( i = start, count = 0; i < top; )
> >>      {
> >>          unsigned long pfn = pdx_to_pfn(i);
> >>          unsigned int perms = hwdom_iommu_map(d, pfn, max_pfn);
> >> @@ -390,20 +390,41 @@ void __hwdom_init arch_iommu_hwdom_init(
> >>          if ( !perms )
> >>              rc = 0;
> >>          else if ( paging_mode_translate(d) )
> >> +        {
> >>              rc = p2m_add_identity_entry(d, pfn,
> >>                                          perms & IOMMUF_writable ? p2m_access_rw
> >>                                                                  : p2m_access_r,
> >>                                          0);
> >> +            if ( rc )
> >> +                printk(XENLOG_WARNING
> >> +                       "%pd: identity mapping of %lx failed: %d\n",
> >> +                       d, pfn, rc);
> >> +        }
> >> +        else if ( pfn != start + count || perms != start_perms )
> >> +        {
> >> +        commit:
> >> +            rc = iommu_map(d, _dfn(start), _mfn(start), count, start_perms,
> >> +                           &flush_flags);
> >> +            if ( rc )
> >> +                printk(XENLOG_WARNING
> >> +                       "%pd: IOMMU identity mapping of [%lx,%lx) failed: %d\n",
> >> +                       d, pfn, pfn + count, rc);
> >> +            SWAP(start, pfn);
> >> +            start_perms = perms;
> >> +            count = 1;
> >> +        }
> >>          else
> >> -            rc = iommu_map(d, _dfn(pfn), _mfn(pfn), 1ul << PAGE_ORDER_4K,
> >> -                           perms, &flush_flags);
> >> +        {
> >> +            ++count;
> >> +            rc = 0;
> > 
> > Seeing as we want to process this in blocks now, I wonder whether it
> > would make sense to take a different approach, and use a rangeset to
> > track which regions need to be mapped.  What gets added would be based
> > on the host e820 plus the options
> > iommu_hwdom_{strict,inclusive,reserved}.  We would then punch holes
> > based on the logic in hwdom_iommu_map() and finally we could iterate
> > over the regions afterwards using rangeset_consume_ranges().
> > 
> > Not that you strictly need to do it here; I just think the end result
> > would be clearer.
> 
> The end result might indeed be, but it would be more of a change right
> here. Hence I'd prefer to leave that out of the series for now.

OK.  I think it might be nice to add a comment to that effect, mostly
because I tend to forget these things myself.  For reference, a rough
sketch of what I was thinking of is below.
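
This is a sketch only: it uses the existing rangeset interfaces, but
rangeset_report_ranges() instead of rangeset_consume_ranges() (its
callback shape is simpler), and names like map_data are made up:

    struct map_data {
        struct domain *d;
        unsigned int flush_flags;
    };

    /* Map one range that survived the hole punching. */
    static int __hwdom_init identity_map(unsigned long s, unsigned long e,
                                         void *data)
    {
        struct map_data *info = data;

        /* Rangeset ranges are inclusive, hence the e - s + 1. */
        return iommu_map(info->d, _dfn(s), _mfn(s), e - s + 1,
                         IOMMUF_readable | IOMMUF_writable,
                         &info->flush_flags);
    }

    void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
    {
        struct rangeset *map = rangeset_new(d, "hwdom IOMMU mappings", 0);
        struct map_data info = { .d = d };
        unsigned int i;
        int rc;

        /* Populate from the host E820 (plus the iommu_hwdom_* options). */
        for ( i = 0; i < e820.nr_map; i++ )
            rc = rangeset_add_range(map, PFN_DOWN(e820.map[i].addr),
                                    PFN_DOWN(e820.map[i].addr +
                                             e820.map[i].size - 1));

        /* Punch holes based on the logic in hwdom_iommu_map() here. */

        /* Finally, map whatever is left, range by range. */
        rc = rangeset_report_ranges(map, 0, ~0UL, identity_map, &info);

        /* A TLB flush using info.flush_flags would follow here. */
    }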

Thanks, Roger.

