From: "Durrant, Paul" <pdurrant@amazon.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"Kevin Tian" <kevin.tian@intel.com>,
	"Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Wei Liu" <wl@xen.org>, "Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [Xen-devel] [PATCH] x86 / iommu: set up a scratch page in the quarantine domain
Date: Wed, 27 Nov 2019 15:31:31 +0000	[thread overview]
Message-ID: <645bec84b69546089f2a018ac9254e2b@EX13D32EUC003.ant.amazon.com> (raw)
In-Reply-To: <daac5903-cc73-15e3-5d5a-2a1d9e147301@suse.com>

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan
> Beulich
> Sent: 20 November 2019 13:52
> To: Durrant, Paul <pdurrant@amazon.com>
> Cc: xen-devel@lists.xenproject.org; Kevin Tian <kevin.tian@intel.com>;
> Roger Pau Monné <roger.pau@citrix.com>; Wei Liu <wl@xen.org>; Andrew
> Cooper <andrew.cooper3@citrix.com>
> Subject: Re: [Xen-devel] [PATCH] x86 / iommu: set up a scratch page in the
> quarantine domain
> 
> On 20.11.2019 13:08, Paul Durrant wrote:
> > This patch introduces a new iommu_op to facilitate a per-implementation
> > quarantine set up, and then further code for x86 implementations
> > (amd and vtd) to set up a read/wrote scratch page to serve as the
> > source/target for all DMA whilst a device is assigned to dom_io.
> 
> A single page in the system won't do, I'm afraid. If one guest's
> (prior) device is retrying reads with data containing secrets of that
> guest, another guest's (prior) device could end up writing this data
> to e.g. storage where after a guest restart it is then available to
> the wrong guest.
> 

True. I was unsure whether this was a concern in the scenarios we had to deal with, but I'm informed it is, and it is in the general case too.

> Also nit: s/wrote/write/ .
> 

Yep. Will fix.

> > The reason for doing this is that some hardware may continue to re-try
> > DMA, despite FLR, in the event of an error. Having a scratch page mapped
> > will allow pending DMA to drain and thus quiesce such buggy hardware.
> 
> Without a "sink" page mapped, this would result in IOMMU faults aiui.
> What's the problem with having these faults surface and get handled,
> eventually leading to the device getting bus-mastering disabled? Is
> it that devices continue DMAing even when bus-mastering is off? If
> so, is it even safe to pass through any such device? In any event
> the description needs to be extended here.
> 

The devices in question ignore both FLR and BME, and some IOMMU faults are fatal. I believe, however, that write faults are not, and so I think a single read-only 'source' page will be sufficient.
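For illustration, a minimal sketch of the leaf-level change I have in mind on the AMD side, assuming the final two parameters of set_iommu_pde_present() remain the iw/ir permission flags as in the patch below:

    /*
     * Sketch only (not the patch as posted): install the leaf entries
     * without write permission, so the scratch page can serve as a DMA
     * read source while writes fault non-fatally.
     */
    set_iommu_pde_present(pde, mfn_x(page_to_mfn(pg)), level - 1,
                          false /* iw */, true /* ir */);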

> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> 
> What about Arm? Can devices which Arm allows to assign to guests
> also "babble" like this after de-assignment? If not, this should be
> said in the description. If so, obviously that side would also want
> fixing.
> 
> > --- a/xen/drivers/passthrough/amd/iommu_map.c
> > +++ b/xen/drivers/passthrough/amd/iommu_map.c
> > @@ -560,6 +560,63 @@ int amd_iommu_reserve_domain_unity_map(struct domain *domain,
> >      return rt;
> >  }
> >
> > +int amd_iommu_quarantine_init(struct domain *d)
> 
> __init
> 

Ok.

> > +{
> > +    struct domain_iommu *hd = dom_iommu(d);
> > +    unsigned int level;
> > +    struct amd_iommu_pte *table;
> > +
> > +    if ( hd->arch.root_table )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return 0;
> > +    }
> > +
> > +    spin_lock(&hd->arch.mapping_lock);
> > +
> > +    level = hd->arch.paging_mode;
> 
> With DomIO being PV in principle, this is going to be the
> fixed value PV domains get set, merely depending on RAM size at
> boot time (i.e. not accounting for memory hotplug). This could
> be easily too little for HVM guests, which are free to extend
> their GFN (and hence DFN) space. Therefore I think you need to
> set the maximum possible level here.
>

Ok. I'd not considered memory hotplug. I'll use a static maximum value instead, as VT-d does in general.
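Something along these lines, say, assuming DEFAULT_DOMAIN_ADDRESS_WIDTH and amd_iommu_get_paging_mode() are the right tools here:

    /*
     * Size the tables for the maximum possible DFN, not the current
     * paging mode of the (PV) quarantine domain.
     */
    unsigned long max_gfn =
        PFN_DOWN((1ul << DEFAULT_DOMAIN_ADDRESS_WIDTH) - 1);
    unsigned int level = amd_iommu_get_paging_mode(max_gfn);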

> > +    hd->arch.root_table = alloc_amd_iommu_pgtable();
> > +    if ( !hd->arch.root_table )
> > +        goto out;
> > +
> > +    table = __map_domain_page(hd->arch.root_table);
> > +    while ( level )
> > +    {
> > +        struct page_info *pg;
> > +        unsigned int i;
> > +
> > +        /*
> > +         * The pgtable allocator is fine for the leaf page, as well as
> > +         * page table pages.
> > +         */
> > +        pg = alloc_amd_iommu_pgtable();
> > +        if ( !pg )
> > +            break;
> > +
> > +        for ( i = 0; i < PTE_PER_TABLE_SIZE; i++ )
> > +        {
> > +            struct amd_iommu_pte *pde = &table[i];
> > +
> > +            set_iommu_pde_present(pde, mfn_x(page_to_mfn(pg)), level - 1,
> > +                                  true, true);
> 
> This would also benefit from a comment indicating that it's fine
> for the leaf level as well, despite the "pde" in the name (and
> its sibling set_iommu_pte_present() actually existing). Of course
> you could as well extend the comment a few lines up.

The AMD IOMMU code conflates PDE and PTE all over the place, but I'll add a comment here to that effect.
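E.g. something like this (wording to be polished):

    /*
     * The PTE and PDE layouts match for the bits set here, so despite
     * its name set_iommu_pde_present() is also fine for the leaf
     * (level 1) entries.
     */
    set_iommu_pde_present(pde, mfn_x(page_to_mfn(pg)), level - 1,
                          true, true);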

> 
> What you do need to do though is pre-fill the leaf page ...
> 
> > +        }
> > +
> > +        unmap_domain_page(table);
> > +        table = __map_domain_page(pg);
> > +        level--;
> > +    }
> 
> ... here, such that possible left over secrets can't end up
> getting written to e.g. persistent storage or over a network.
> 

That's actually one reason for using the pgtable allocator... It already does that.
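For reference, alloc_amd_iommu_pgtable() scrubs every page it hands out, roughly like this (paraphrasing the helper rather than quoting it verbatim):

    static inline struct page_info *alloc_amd_iommu_pgtable(void)
    {
        struct page_info *pg = alloc_domheap_page(NULL, 0);
        void *vaddr;

        if ( pg == NULL )
            return NULL;

        /* Newly allocated (page) tables are always zeroed. */
        vaddr = __map_domain_page(pg);
        memset(vaddr, 0, PAGE_SIZE);
        unmap_domain_page(vaddr);

        return pg;
    }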

> > @@ -2683,9 +2671,68 @@ static void vtd_dump_p2m_table(struct domain *d)
> >      vtd_dump_p2m_table_level(hd->arch.pgd_maddr, agaw_to_level(hd->arch.agaw), 0, 0);
> >  }
> >
> > +static int intel_iommu_quarantine_init(struct domain *d)
> 
> __init again.
> 

Ok.

> > +{
> > +    struct domain_iommu *hd = dom_iommu(d);
> > +    struct dma_pte *parent;
> > +    unsigned int level = agaw_to_level(hd->arch.agaw);
> 
> Other than for AMD this is not a problem here, as all domains
> (currently) get the same AGAW. I wonder though whether precautions
> would be possible here against the "normal" domain setting getting
> adjusted without recalling the need to come back here.
> 

I could just go for a hardcoded width value here too.
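I.e. something like the following, assuming width_to_agaw() is happy with DEFAULT_DOMAIN_ADDRESS_WIDTH:

    /* Decouple the quarantine tables from the normal per-domain AGAW. */
    unsigned int agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
    unsigned int level = agaw_to_level(agaw);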

> > +    int rc;
> > +
> > +    if ( hd->arch.pgd_maddr )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return 0;
> > +    }
> > +
> > +    spin_lock(&hd->arch.mapping_lock);
> > +
> > +    hd->arch.pgd_maddr = alloc_pgtable_maddr(1, hd->node);
> > +    if ( !hd->arch.pgd_maddr )
> > +        goto out;
> > +
> > +    parent = (struct dma_pte *)map_vtd_domain_page(hd->arch.pgd_maddr);
> 
> Unnecessary cast; funnily enough you don't have one further down.
> 

Sorry. Cut'n'paste.
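Will drop it, i.e.:

    parent = map_vtd_domain_page(hd->arch.pgd_maddr);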

> > +    while ( level )
> > +    {
> > +        uint64_t maddr;
> > +        unsigned int offset;
> > +
> > +        /*
> > +         * The pgtable allocator is fine for the leaf page, as well as
> > +         * page table pages.
> > +         */
> > +        maddr = alloc_pgtable_maddr(1, hd->node);
> > +        if ( !maddr )
> > +            break;
> > +
> > +        for ( offset = 0; offset < PTE_NUM; offset++ )
> > +        {
> > +            struct dma_pte *pte = &parent[offset];
> > +
> > +            dma_set_pte_addr(*pte, maddr);
> > +            dma_set_pte_readable(*pte);
> > +            dma_set_pte_writable(*pte);
> > +        }
> > +        iommu_flush_cache_page(parent, 1);
> > +
> > +        unmap_vtd_domain_page(parent);
> > +        parent = map_vtd_domain_page(maddr);
> > +        level--;
> > +    }
> 
> The leaf page wants scrubbing here as well.
> 

Already done by alloc_pgtable_maddr(). I'll note in the comments in this hunk and the AMD one that the pages are zeroed.
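Roughly, the amended comment (in both hunks) would read:

    /*
     * The pgtable allocator is fine for the leaf page, as well as
     * page table pages, and the resulting allocations are always
     * zeroed.
     */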

  Paul

> Jan

