From: Maxime Ripard <maxime@cerno.tech>
To: Robin Murphy <robin.murphy@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	devicetree@vger.kernel.org, Joerg Roedel <joro@8bytes.org>,
	iommu@lists.linux-foundation.org, Chen-Yu Tsai <wens@csie.org>,
	Rob Herring <robh+dt@kernel.org>,
	Frank Rowand <frowand.list@gmail.com>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 2/3] iommu: Add Allwinner H6 IOMMU driver
Date: Tue, 11 Feb 2020 13:39:01 +0100	[thread overview]
Message-ID: <20200211123901.nm6ajh5sxqurim6w@gilmour.lan> (raw)
In-Reply-To: <99e9cb73-761f-3b30-bd73-c50aa7c21692@arm.com>


Hi Robin,

On Mon, Jan 27, 2020 at 07:01:02PM +0000, Robin Murphy wrote:
> > > > +static void sun50i_iommu_domain_free(struct iommu_domain *domain)
> > > > +{
> > > > +	struct sun50i_iommu_domain *sun50i_domain = to_sun50i_domain(domain);
> > > > +	struct sun50i_iommu *iommu = sun50i_domain->iommu;
> > > > +	unsigned long flags;
> > > > +	int i;
> > > > +
> > > > +	spin_lock_irqsave(&sun50i_domain->dt_lock, flags);
> > > > +
> > > > +	for (i = 0; i < NUM_DT_ENTRIES; i++) {
> > > > +		phys_addr_t pt_phys;
> > > > +		u32 *page_table;
> > > > +		u32 *dte_addr;
> > > > +		u32 dte;
> > > > +
> > > > +		dte_addr = &sun50i_domain->dt[i];
> > > > +		dte = *dte_addr;
> > > > +		if (!sun50i_dte_is_pt_valid(dte))
> > > > +			continue;
> > > > +
> > > > +		memset(dte_addr, 0, sizeof(*dte_addr));
> > > > +		sun50i_table_flush(sun50i_domain, virt_to_phys(dte_addr), 1);
> > >
> > > This shouldn't be necessary - freeing a domain while it's still live is an
> > > incredibly wrong thing to do, so the hardware should have already been
> > > programmed to no longer walk this table by this point.
> >
> > We never "garbage collect" and remove the dte for a page table we
> > don't use anymore elsewhere, though, so couldn't we end up in a
> > situation where we don't have a page table (because it has been freed)
> > at the other end of our dte, but the IOMMU doesn't know about it since
> > we never flushed?
>
> Let me reiterate: at the point of freeing, the assumption is that this
> domain should be long dissociated from the hardware. Any actual invalidation
> should have already happened at the point the TTB was disabled or pointed to
> some other domain, therefore fiddling with pagetable pages just before you
> free them back to the kernel is just pointless busywork.
>
> If the TTB *was* still live here, then as soon as you call free_pages()
> below the DT memory could get reallocated by someone else and filled with
> data that happens to look like valid pagetables, so even if you invalidate
> the TLBs the hardware could just immediately go walk that data and refill
> them with nonsense, thus any pretence at invalidation is in vain anyway.

Thanks, that makes a lot of sense.
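
So I'll drop the memset()/sun50i_table_flush() pair from the free path
and just hand the tables back, roughly like this (the helper pulling
the physical address back out of a DTE and the pt_pool name are
placeholders on my side):

	for (i = 0; i < NUM_DT_ENTRIES; i++) {
		u32 dte = sun50i_domain->dt[i];
		phys_addr_t pt_phys;

		if (!sun50i_dte_is_pt_valid(dte))
			continue;

		/*
		 * No invalidation needed: by the time we free the domain,
		 * the hardware must not be walking these tables anymore.
		 */
		pt_phys = sun50i_dte_get_pt_address(dte); /* placeholder helper */
		kmem_cache_free(iommu->pt_pool, phys_to_virt(pt_phys));
	}

	free_pages((unsigned long)sun50i_domain->dt, get_order(DT_SIZE));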

> The fly in the soup, however, is that default domains appear to be lacking
> proper detach notifications (I hadn't consciously noticed that before), so
> if you've been looking at the iommu_group_release() path it might have given
> the wrong impression. So what might be justifiable here is to check if the
> domain being freed is the one currently active in hardware, and if so
> perform a detach (i.e. disable the TTB and invalidate everything) first,
> then free everything as normal. Or just handwave that we essentially never
> free a default domain anyway so it's OK to just assume that we're not
> freeing anything live.

I'm still a bit unsure about what to do exactly here. I haven't found
a hook that would be called when a given domain no longer has any
devices attached to it. Or did you mean that I should just create a
separate function, not part of the IOMMU ops?
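
Or, if you meant the check-and-detach at free time, something like this
at the top of sun50i_iommu_domain_free()? (sun50i_iommu_detach_domain()
is a made-up name here for whatever ends up clearing the TTB and doing
a full invalidate):

	struct sun50i_iommu *iommu = sun50i_domain->iommu;

	/*
	 * If we're somehow freeing the domain that is live in hardware,
	 * detach it first so the TTB is disabled and the TLB flushed
	 * before the tables go back to the page allocator.
	 */
	if (iommu && iommu->domain == domain) {
		sun50i_iommu_detach_domain(iommu, sun50i_domain); /* made-up helper */
		iommu->domain = NULL;
	}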

> > > > +
> > > > +	if (iommu->domain == domain)
> > > > +		return 0;
> > > > +
> > > > +	if (iommu->domain)
> > > > +		sun50i_iommu_detach_device(iommu->domain, dev);
> > > > +
> > > > +	iommu->domain = domain;
> > > > +	sun50i_domain->iommu = iommu;
> > > > +
> > > > +	return pm_runtime_get_sync(iommu->dev);
> > >
> > > Deferring all the actual hardware programming to the suspend/resume hooks is
> > > a fiendishly clever idea that took me more than a moment to make sense of,
> > > but how well does it work when RPM is compiled out or runtime-inhibited?
> >
> > We have a bunch of other controllers that require runtime_pm already,
> > so it's going to be enabled. But that should be expressed in Kconfig.
>
> But it can still be inhibited from sysfs, right? We don't want driver
> behaviour to be *unnecessarily* fragile to user actions, however silly they
> may be.

That's a good point :)

> > > Furthermore, basing RPM on having a domain attached means that
> > > you'll effectively never turn the IOMMU off, even when all the
> > > clients are idle. It would make more sense to use device links like
> > > most other drivers do to properly model the producer/consumer
> > > relationship.
> >
> > I'm not familiar with device links for runtime_pm, I thought this was
> > only useful for system-wide resume and suspend?
>
> See DL_FLAG_PM_RUNTIME - we already have several IOMMU drivers taking full
> advantage of this.

I'll look into it, thanks!
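
From a quick look it seems like a single device_link_add() call, made
wherever a master gets hooked up to its IOMMU instance (probably the
add_device callback, exact placement to be figured out), would be
enough. Something along these lines:

	struct device_link *link;

	link = device_link_add(dev, iommu->dev,
			       DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);
	if (!link)
		dev_warn(dev, "Unable to create device link with %s\n",
			 dev_name(iommu->dev));

That way the IOMMU's runtime PM state just follows its consumers,
instead of being tied to whether a domain is attached.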

> > > > +static int __maybe_unused sun50i_iommu_resume(struct device *dev)
> > > > +{
> > > > +	struct sun50i_iommu_domain *sun50i_domain;
> > > > +	struct sun50i_iommu *iommu;
> > > > +	unsigned long flags;
> > > > +	dma_addr_t dt_dma;
> > > > +	int ret;
> > > > +
> > > > +	iommu = dev_get_drvdata(dev);
> > > > +	if (!iommu->domain)
> > > > +		return 0;
> > > > +
> > > > +	sun50i_domain = to_sun50i_domain(iommu->domain);
> > > > +	dt_dma = dma_map_single(dev, sun50i_domain->dt, DT_SIZE, DMA_TO_DEVICE);
> > >
> > > As above. The power state of the IOMMU should be entirely irrelevant to the
> > > contents of RAM.
> >
> > Sorry, I should have put a comment here.
> >
> > I'm not quite sure what the difference between a group and domain in
> > the IOMMU framework is, but since this IOMMU can only deal with a
> > single address space, my understanding was that we'd need to allocate
> > a single domain and group, and that the domain was the abstraction
> > tied to an address space (since it's what is passed as an argument to
> > map).
>
> That's correct, a domain is effectively an address space, while groups
> represent sets of devices that the IOMMU can isolate from each other.
> IOMMUs like this one (and the MediaTek M4U in mtk_iommu.c) are a little
> hard-done-by in that they do actually have some finer-grained isolation on a
> basic allow/deny level, but the API really assumes that isolation happens at
> the address space level, so it's easier to ignore it and just use the
> single-group model anyway.
>
> The really neat advantage of having a guaranteed single group, though, is
> that you then no longer need to care about address spaces: since the group
> can only ever be attached to one domain at a time, you can have as many
> domains as you like, and handle it by having the first attach_dev call on a
> given domain context-switch that pagetable into the hardware. That's more or
> less what you've done already, which is good, it would just benefit from
> that context-switching being done in a more robust and obvious manner :)

Got it, thanks :)
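
So attach_dev would do the context switch explicitly, something like
this (the attach/detach helpers are made-up names for the code that
programs or clears the TTB and invalidates the TLB):

	if (iommu->domain != domain) {
		if (iommu->domain)
			sun50i_iommu_detach_domain(iommu,
						   to_sun50i_domain(iommu->domain));

		/* Switch the hardware over to this domain's directory table. */
		sun50i_iommu_attach_domain(iommu, sun50i_domain); /* made-up helper */
		iommu->domain = domain;
	}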

> > So, given this, what made sense was to allocate the directory table
> > buffer at domain_alloc time and map it. But then, domain_alloc seems
> > to not have any pointer back to the iommu we registered for some
> > reason (I guess that a domain could be shared across multiple
> > IOMMUs?), and so we don't have access to our IOMMU's struct device.
>
> I'll spare you the unproductive "Robin complains bitterly about the
> iommu_domain_alloc() interface being terrible, episode #27"...
>
> You'll see two main ways that existing drivers work around that - if you're
> happy to assume that you'll only ever have one IOMMU instance, or that all
> instances will always be functionally equal, then you can simply keep track
> of any old IOMMU device handle for DMA purposes (e.g. exynos_iommu);
> otherwise if you might need to cope with multiple IOMMU instances having
> different DMA capabilities then deferring instance-specific setup to the
> first device attach is the de-facto standard (e.g. arm-smmu).

I don't have any idea how it's going to evolve, and since the latter
seems cleaner, I'll work on that.
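
Concretely, I'd move the directory table DMA mapping out of
domain_alloc and into the first attach, once the instance is known.
A rough sketch (dt_dma being a field I'd add to the domain structure):

	if (!sun50i_domain->iommu) {
		sun50i_domain->iommu = iommu;

		sun50i_domain->dt_dma = dma_map_single(iommu->dev,
						       sun50i_domain->dt,
						       DT_SIZE, DMA_TO_DEVICE);
		if (dma_mapping_error(iommu->dev, sun50i_domain->dt_dma))
			return -ENOMEM;
	}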

> > Also, a more general question. One of the cleanups I wanted to do was
> > to remove the kmem_cache in favour of a dma_pool, which triggered that
> > test. It looks like with a dma_pool, the physical address and dma
> > address are not the same, even though the IOMMU is directly connected
> > to the RAM so there should be no intermediate mapping. Do you know
> > why?
>
> DMA pools are backed by dma_alloc_coherent, so (at least on arm64) the
> virtual address you get will be a non-cacheable remap (assuming a
> non-coherent device), and thus calling virt_to_phys() on it is bogus and
> will give you nonsense.
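
Got it. So for the dma_pool conversion I should keep the dma_addr_t
handle that dma_pool_alloc() gives back and write that into the DTE,
instead of calling virt_to_phys() on the pointer. Roughly (the pool
name, PT_SIZE and PT_ALIGN are placeholders, and pt_pool would become
a struct dma_pool * instead of the current kmem_cache):

	/* At probe time: */
	iommu->pt_pool = dma_pool_create("sun50i-iommu-pt", iommu->dev,
					 PT_SIZE, PT_ALIGN, 0);
	if (!iommu->pt_pool)
		return -ENOMEM;

	/* When populating a DTE: */
	dma_addr_t pt_dma;
	u32 *page_table;

	page_table = dma_pool_zalloc(iommu->pt_pool, GFP_ATOMIC, &pt_dma);
	if (!page_table)
		return -ENOMEM;

	/*
	 * pt_dma is what goes into the DTE; the virtual address is only
	 * used for the CPU's updates of the page table and may be a
	 * non-cacheable remap, so virt_to_phys() on it would be bogus.
	 */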

Going further off-topic, why do we need a remap instead of a regular
physical address?

Thanks!
Maxime
