From: Robin Murphy <robin.murphy@arm.com>
To: John Garry <john.garry@huawei.com>, Marc Zyngier <maz@kernel.org>,
	Will Deacon <will@kernel.org>,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	Sudeep Holla <sudeep.holla@arm.com>,
	"Guohanjun (Hanjun Guo)" <guohanjun@huawei.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Linuxarm <linuxarm@huawei.com>,
	iommu <iommu@lists.linux-foundation.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: arm64 iommu groups issue
Date: Thu, 19 Sep 2019 14:25:38 +0100	[thread overview]
Message-ID: <4768c541-ebf4-61d5-0c5e-77dee83f8f94@arm.com> (raw)
In-Reply-To: <9625faf4-48ef-2dd3-d82f-931d9cf26976@huawei.com>

Hi John,

On 19/09/2019 09:43, John Garry wrote:
> Hi all,
> 
> We have noticed some unexpected behaviour on our arm64 D05 board when 
> the SMMU is enabled, with regard to PCI device iommu groups.
> 
> This platform does not support ACS, yet we find that not all functions 
> of a PCI device are grouped together:
> 
> root@ubuntu:/sys# dmesg | grep "Adding to iommu group"
> [    7.307539] hisi_sas_v2_hw HISI0162:01: Adding to iommu group 0
> [   12.590533] hns_dsaf HISI00B2:00: Adding to iommu group 1
> [   13.688527] mlx5_core 000a:11:00.0: Adding to iommu group 2
> [   14.324606] mlx5_core 000a:11:00.1: Adding to iommu group 3
> [   14.937090] ehci-platform PNP0D20:00: Adding to iommu group 4
> [   15.276637] pcieport 0002:f8:00.0: Adding to iommu group 5
> [   15.340845] pcieport 0004:88:00.0: Adding to iommu group 6
> [   15.392098] pcieport 0005:78:00.0: Adding to iommu group 7
> [   15.443356] pcieport 000a:10:00.0: Adding to iommu group 8
> [   15.484975] pcieport 000c:20:00.0: Adding to iommu group 9
> [   15.543647] pcieport 000d:30:00.0: Adding to iommu group 10
> [   15.599771] serial 0002:f9:00.0: Adding to iommu group 5
> [   15.690807] serial 0002:f9:00.1: Adding to iommu group 5
> [   84.322097] mlx5_core 000a:11:00.2: Adding to iommu group 8
> [   84.856408] mlx5_core 000a:11:00.3: Adding to iommu group 8
> 
> root@ubuntu:/sys#  lspci -tv
> -+-[000d:30]---00.0-[31]--
>    +-[000c:20]---00.0-[21]----00.0  Huawei Technologies Co., Ltd.
>    +-[000a:10]---00.0-[11-12]--+-00.0  Mellanox [ConnectX-5]
>    |                           +-00.1  Mellanox [ConnectX-5]
>    |                           +-00.2  Mellanox [ConnectX-5 VF]
>    |                           \-00.3  Mellanox [ConnectX-5 VF]
>    +-[0007:90]---00.0-[91]----00.0  Huawei Technologies Co., ...
>    +-[0006:c0]---00.0-[c1]--
>    +-[0005:78]---00.0-[79]--
>    +-[0004:88]---00.0-[89]--
>    +-[0002:f8]---00.0-[f9]--+-00.0  MosChip Semiconductor Technology ...
>    |                        +-00.1  MosChip Semiconductor Technology ...
>    |                        \-00.2  MosChip Semiconductor Technology ...
>    \-[0000:00]-
> 
> For the PCI devices in question - those behind port 000a:10:00.0 - you 
> will notice that the port and the VFs (000a:11:00.2 and .3) share 
> group 8, yet the two PFs (000a:11:00.0 and 000a:11:00.1) each sit in a 
> separate group (2 and 3).
> 
> I also notice the same ordering behaviour on our D06 platform - the 
> pcieport is added to an iommu group after the PFs behind that port. 
> However, that platform supports ACS, so it is not such a problem there.
> 
> After some checking, I find that when the pcieport driver probes, the 
> associated SMMU device has not yet registered with the IOMMU framework, 
> so we defer the probe of this device - in iort.c:iort_iommu_xlate(), 
> when no iommu ops are available yet, we return -EPROBE_DEFER.
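
For reference, the deferral described above is this check in 
drivers/acpi/arm64/iort.c - a paraphrased, trimmed sketch of the 
v5.3-era iort_iommu_xlate(), not the verbatim kernel code:

    static int iort_iommu_xlate(struct device *dev,
                                struct acpi_iort_node *node, u32 streamid)
    {
            const struct iommu_ops *ops;
            struct fwnode_handle *iort_fwnode;

            if (!node)
                    return -ENODEV;

            iort_fwnode = iort_get_fwnode(node);
            if (!iort_fwnode)
                    return -ENODEV;

            /*
             * No ops registered for this SMMU instance yet: if its
             * driver is enabled and may still probe, -EPROBE_DEFER
             * makes the endpoint's probe (and hence its group
             * allocation) wait for it.
             */
            ops = iommu_ops_from_fwnode(iort_fwnode);
            if (!ops)
                    return iort_iommu_driver_enabled(node->type) ?
                            -EPROBE_DEFER : -ENODEV;

            return arm_smmu_iort_xlate(dev, streamid, iort_fwnode, ops);
    }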
> 
> Yet by the time the mlx5 PF devices probe, the iommu ops are available. 
> So those probes continue and each device gets an iommu group - but not 
> the same group as the parent port, as the port has not yet been added 
> to a group. When the port eventually probes it gets a new, separate 
> group.
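
That outcome follows from the direction of the lookup: 
pci_device_group() in drivers/iommu/iommu.c only walks upwards. A 
simplified sketch of that walk (paraphrased; the DMA-alias and 
multifunction-alias handling is omitted):

    struct iommu_group *pci_device_group(struct device *dev)
    {
            struct pci_dev *pdev = to_pci_dev(dev);
            struct iommu_group *group;
            struct pci_bus *bus;

            /*
             * Walk upstream until ACS isolates us (or we hit the root
             * bus), reusing the first group found along the way. If
             * nothing upstream has a group yet - e.g. the parent
             * port's probe was deferred - fall through and allocate
             * a fresh one.
             */
            for (bus = pdev->bus; !pci_is_root_bus(bus); bus = bus->parent) {
                    if (!bus->self)
                            continue;

                    if (pci_acs_path_enabled(bus->self, NULL, REQ_ACS_FLAGS))
                            break;

                    pdev = bus->self;

                    group = iommu_group_get(&pdev->dev);
                    if (group)
                            return group;
            }

            /* No ancestor below the ACS boundary has a group yet */
            return iommu_group_alloc();
    }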
> 
> This all seems to be because the built-in driver init ordering is as 
> follows: pcieport drv, smmu drv, mlx5 drv
> 
> I notice that if I build the mlx5 drv as a module and insert it after 
> boot, all the functions plus the pcieport are in the same group:
> 
> [   11.530046] hisi_sas_v2_hw HISI0162:01: Adding to iommu group 0
> [   17.301093] hns_dsaf HISI00B2:00: Adding to iommu group 1
> [   18.743600] ehci-platform PNP0D20:00: Adding to iommu group 2
> [   20.212284] pcieport 0002:f8:00.0: Adding to iommu group 3
> [   20.356303] pcieport 0004:88:00.0: Adding to iommu group 4
> [   20.493337] pcieport 0005:78:00.0: Adding to iommu group 5
> [   20.702999] pcieport 000a:10:00.0: Adding to iommu group 6
> [   20.859183] pcieport 000c:20:00.0: Adding to iommu group 7
> [   20.996140] pcieport 000d:30:00.0: Adding to iommu group 8
> [   21.152637] serial 0002:f9:00.0: Adding to iommu group 3
> [   21.346991] serial 0002:f9:00.1: Adding to iommu group 3
> [  100.754306] mlx5_core 000a:11:00.0: Adding to iommu group 6
> [  101.420156] mlx5_core 000a:11:00.1: Adding to iommu group 6
> [  292.481714] mlx5_core 000a:11:00.2: Adding to iommu group 6
> [  293.281061] mlx5_core 000a:11:00.3: Adding to iommu group 6
> 
> This does seem like a problem for arm64 platforms which don't support 
> ACS yet enable an SMMU - and maybe a problem even for those which do 
> support ACS.
> 
> Opinion?

Yeah, this is less than ideal. One way to bodge it might be to make 
pci_device_group() also walk downwards to see whether any 
non-ACS-isolated children already have a group, rather than assuming 
that groups get allocated in hierarchical order - but that would only 
paper over the symptom.
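
Purely to illustrate the shape of such a bodge - every name below is 
hypothetical, none of these helpers exist in the kernel, and 
REQ_ACS_FLAGS is borrowed from iommu.c - it could hang a downward walk 
off the bridge's group lookup, something like:

    /* Stop pci_walk_bus() as soon as one grouped child is found */
    static int get_child_group(struct pci_dev *pdev, void *data)
    {
            struct iommu_group **group = data;

            *group = iommu_group_get(&pdev->dev);
            return *group ? 1 : 0;
    }

    static struct iommu_group *pci_bridge_group(struct pci_dev *bridge)
    {
            struct iommu_group *group = NULL;

            /*
             * Without ACS there is no isolation below this bridge, so
             * any group already assigned to a downstream device must
             * be the bridge's group too.
             */
            if (bridge->subordinate &&
                !pci_acs_enabled(bridge, REQ_ACS_FLAGS))
                    pci_walk_bus(bridge->subordinate, get_child_group,
                                 &group);

            return group ?: iommu_group_alloc();
    }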

The underlying issue is that, for historical reasons, OF/IORT-based 
IOMMU drivers have ended up with group allocation being tied to endpoint 
driver probing via the dma_configure() mechanism (long story short, 
driver probe is the only thing which can be delayed in order to wait for 
a specific IOMMU instance to be ready). However, in the meantime, the 
IOMMU API internals have evolved sufficiently that I think there's a way 
to really put things right - I have the spark of an idea which I'll try 
to sketch out ASAP...

Robin.