All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: qemu-devel@nongnu.org, tianyu.lan@intel.com,
	kevin.tian@intel.com, mst@redhat.com, jan.kiszka@siemens.com,
	jasowang@redhat.com, alex.williamson@redhat.com,
	bd.aviv@gmail.com
Subject: Re: [Qemu-devel] [PATCH] intel_iommu: allow dynamic switch of IOMMU region
Date: Tue, 20 Dec 2016 12:16:50 +0800	[thread overview]
Message-ID: <20161220041650.GA22006@pxdev.xzpeter.org> (raw)
In-Reply-To: <20161219233012.GF23176@umbus.fritz.box>

On Tue, Dec 20, 2016 at 10:30:12AM +1100, David Gibson wrote:

[...]

> > +static void vtd_switch_address_space(IntelIOMMUState *s, bool enabled)
> > +{
> > +    GHashTableIter iter;
> > +    VTDBus *vtd_bus;
> > +    VTDAddressSpace *as;
> > +    int i;
> > +
> > +    g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
> > +    while (g_hash_table_iter_next (&iter, NULL, (void**)&vtd_bus)) {
> > +        for (i = 0; i < X86_IOMMU_PCI_DEVFN_MAX; i++) {
> > +            as = vtd_bus->dev_as[i];
> > +            if (as == NULL) {
> > +                continue;
> > +            }
> > +            trace_vtd_switch_address_space(pci_bus_num(vtd_bus->bus),
> > +                                           VTD_PCI_SLOT(i), VTD_PCI_FUNC(i),
> > +                                           enabled);
> > +            if (enabled) {
> > +                memory_region_add_subregion_overlap(&as->root, 0,
> > +                                                    &as->iommu, 2);
> > +            } else {
> > +                memory_region_del_subregion(&as->root, &as->iommu);
> 
> Why not use memory_region_set_enabled() rather than actually
> adding/deleting the subregion?

Good idea, thanks. :-)

[...]

> > @@ -2343,15 +2378,47 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
> >          vtd_dev_as->devfn = (uint8_t)devfn;
> >          vtd_dev_as->iommu_state = s;
> >          vtd_dev_as->context_cache_entry.context_cache_gen = 0;
> > +
> > +        /*
> > +         * When DMAR is disabled, memory region relationships looks
> > +         * like:
> > +         *
> > +         * 0000000000000000-ffffffffffffffff (prio 0, RW): vtd_root
> > +         *  0000000000000000-ffffffffffffffff (prio 1, RW): vtd_sys_alias
> > +         *  00000000fee00000-00000000feefffff (prio 64, RW): intel_iommu_ir
> > +         *
> > +         * When DMAR is disabled, it becomes:
> > +         *
> > +         * 0000000000000000-ffffffffffffffff (prio 0, RW): vtd_root
> > +         *  0000000000000000-ffffffffffffffff (prio 2, RW): intel_iommu
> > +         *  0000000000000000-ffffffffffffffff (prio 1, RW): vtd_sys_alias
> > +         *  00000000fee00000-00000000feefffff (prio 64, RW): intel_iommu_ir
> > +         *
> > +         * The intel_iommu region is dynamically added/removed.
> > +         */
> >          memory_region_init_iommu(&vtd_dev_as->iommu, OBJECT(s),
> >                                   &s->iommu_ops, "intel_iommu", UINT64_MAX);
> 
> I'm almost certain UINT64_MAX is wrong here.  For one thing it would
> collide with PCI BARs.  For another, I can't imagine that the IOMMU
> page tables can really span an entire 2^64 space.

Could you explain why here device address space has things to do with
PCI BARs? I thought BARs are for CPU address space only (so that CPU
can access PCI registers via MMIO manner), am I wrong?

I think we should have a big enough IOMMU region size here. If device
writes to invalid addresses, IMHO we should trap it and report to
guest. If we have a smaller size than UINT64_MAX, how we can trap this
behavior and report for the whole address space (it should cover [0,
2^64-1])?

> 
> > +        memory_region_init_alias(&vtd_dev_as->sys_alias, OBJECT(s),
> > +                                 "vtd_sys_alias", get_system_memory(),
> > +                                 0, memory_region_size(get_system_memory()));
> 
> I strongly suspect using memory_region_size(get_system_memory()) is
> also incorrect here.  System memory has size UINT64_MAX, but I'll bet
> you you can't actually access all of that via PCI space (again, it
> would collide with actual PCI BARs).  I also suspect you can't reach
> CPU MMIO regions via the PCI DMA space.

Hmm, sounds correct.

However if so we will have the same problem if without IOMMU? See
pci_device_iommu_address_space() - address_space_memory will be the
default if we have no IOMMU protection, and that will cover e.g. CPU
MMIO regions as well.

> 
> So, I think you should find out what this limit actually is and
> restrict the alias to that window.

/me needs some more reading to figure this out. Still not quite
familiar with the whole VM memory regions. Hints are welcomed...

> 
> >          memory_region_init_io(&vtd_dev_as->iommu_ir, OBJECT(s),
> >                                &vtd_mem_ir_ops, s, "intel_iommu_ir",
> >                                VTD_INTERRUPT_ADDR_SIZE);
> > -        memory_region_add_subregion(&vtd_dev_as->iommu, VTD_INTERRUPT_ADDR_FIRST,
> > -                                    &vtd_dev_as->iommu_ir);
> > -        address_space_init(&vtd_dev_as->as,
> > -                           &vtd_dev_as->iommu, "intel_iommu");
> > +        memory_region_init(&vtd_dev_as->root, OBJECT(s),
> > +                           "vtd_root", UINT64_MAX);
> > +        memory_region_add_subregion_overlap(&vtd_dev_as->root,
> > +                                            VTD_INTERRUPT_ADDR_FIRST,
> > +                                            &vtd_dev_as->iommu_ir, 64);
> > +        address_space_init(&vtd_dev_as->as, &vtd_dev_as->root, name);
> > +        memory_region_add_subregion_overlap(&vtd_dev_as->root, 0,
> > +                                            &vtd_dev_as->sys_alias, 1);
> > +        if (s->dmar_enabled) {
> > +            memory_region_add_subregion_overlap(&vtd_dev_as->root, 0,
> > +                                                &vtd_dev_as->iommu, 2);
> > +        }
> 
> Hmm.  You have the IOMMU translated region overlaying the
> direct-mapped alias.  You enable and disable the IOMMU subregion, but
> you always leave the direct mapping enabled.  You might get away with
> this because the size of the IOMMU region is UINT64_MAX, which will
> overlay everything - but as above, I think that's wrong.  If that's
> changed then guest devices may be able to access portions of the raw
> address space outside the IOMMU mapped region, which could break the
> guest's expectations of device isolation.
> 
> I think it would be much safer to disable the system memory alias when
> the IOMMU is enabled.

Reasonable. Will adopt.

Thanks!

-- peterx

  reply	other threads:[~2016-12-20  4:17 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-19 14:41 [Qemu-devel] [PATCH] intel_iommu: allow dynamic switch of IOMMU region Peter Xu
2016-12-19 16:56 ` Alex Williamson
2016-12-20  3:44   ` Peter Xu
2016-12-20  4:52     ` Alex Williamson
2016-12-20  6:38       ` Peter Xu
2016-12-21  0:04         ` Alex Williamson
2016-12-21  3:19           ` Peter Xu
2016-12-21  3:49           ` David Gibson
2016-12-21  3:30       ` David Gibson
2016-12-19 23:30 ` David Gibson
2016-12-20  4:16   ` Peter Xu [this message]
2016-12-21  2:53     ` David Gibson
2016-12-21 10:05       ` Peter Xu
2016-12-21 22:56         ` David Gibson
2016-12-20 23:02 ` no-reply
2016-12-21  3:33   ` Peter Xu
2016-12-20 23:57 ` no-reply
2016-12-21  3:39   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161220041650.GA22006@pxdev.xzpeter.org \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=bd.aviv@gmail.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=jan.kiszka@siemens.com \
    --cc=jasowang@redhat.com \
    --cc=kevin.tian@intel.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=tianyu.lan@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.