From: Robin Murphy <robin.murphy@arm.com>
To: John Garry <john.garry@huawei.com>, Marc Zyngier <maz@kernel.org>,
    Will Deacon <will@kernel.org>,
    Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
    Sudeep Holla <sudeep.holla@arm.com>,
    "Guohanjun (Hanjun Guo)" <guohanjun@huawei.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
    Alex Williamson <alex.williamson@redhat.com>,
    Linuxarm <linuxarm@huawei.com>,
    iommu <iommu@lists.linux-foundation.org>,
    Bjorn Helgaas <bhelgaas@google.com>,
    "linux-arm-kernel@lists.infradead.org"
    <linux-arm-kernel@lists.infradead.org>
Subject: Re: arm64 iommu groups issue
Date: Thu, 19 Sep 2019 14:25:38 +0100
Message-ID: <4768c541-ebf4-61d5-0c5e-77dee83f8f94@arm.com>
In-Reply-To: <9625faf4-48ef-2dd3-d82f-931d9cf26976@huawei.com>

Hi John,

On 19/09/2019 09:43, John Garry wrote:
> Hi all,
>
> We have noticed some unusual behaviour on our arm64 D05 board with regard
> to PCI device iommu groups when the SMMU is enabled.
>
> This platform does not support ACS, yet we find that not all functions of
> a PCI device are grouped together:
>
> root@ubuntu:/sys# dmesg | grep "Adding to iommu group"
> [ 7.307539] hisi_sas_v2_hw HISI0162:01: Adding to iommu group 0
> [ 12.590533] hns_dsaf HISI00B2:00: Adding to iommu group 1
> [ 13.688527] mlx5_core 000a:11:00.0: Adding to iommu group 2
> [ 14.324606] mlx5_core 000a:11:00.1: Adding to iommu group 3
> [ 14.937090] ehci-platform PNP0D20:00: Adding to iommu group 4
> [ 15.276637] pcieport 0002:f8:00.0: Adding to iommu group 5
> [ 15.340845] pcieport 0004:88:00.0: Adding to iommu group 6
> [ 15.392098] pcieport 0005:78:00.0: Adding to iommu group 7
> [ 15.443356] pcieport 000a:10:00.0: Adding to iommu group 8
> [ 15.484975] pcieport 000c:20:00.0: Adding to iommu group 9
> [ 15.543647] pcieport 000d:30:00.0: Adding to iommu group 10
> [ 15.599771] serial 0002:f9:00.0: Adding to iommu group 5
> [ 15.690807] serial 0002:f9:00.1: Adding to iommu group 5
> [ 84.322097] mlx5_core 000a:11:00.2: Adding to iommu group 8
> [ 84.856408] mlx5_core 000a:11:00.3: Adding to iommu group 8
>
> root@ubuntu:/sys# lspci -tv
> lspci -tvv
> -+-[000d:30]---00.0-[31]--
> +-[000c:20]---00.0-[21]----00.0 Huawei Technologies Co., Ltd.
> +-[000a:10]---00.0-[11-12]--+-00.0 Mellanox [ConnectX-5]
> | +-00.1 Mellanox [ConnectX-5]
> | +-00.2 Mellanox [ConnectX-5 VF]
> | \-00.3 Mellanox [ConnectX-5 VF]
> +-[0007:90]---00.0-[91]----00.0 Huawei Technologies Co., ...
> +-[0006:c0]---00.0-[c1]--
> +-[0005:78]---00.0-[79]--
> +-[0004:88]---00.0-[89]--
> +-[0002:f8]---00.0-[f9]--+-00.0 MosChip Semiconductor Technology ...
> | +-00.1 MosChip Semiconductor Technology ...
> | \-00.2 MosChip Semiconductor Technology ...
> \-[0000:00]-
>
> For the PCI devices in question - on port 000a:10:00.0 - you will notice
> that the port and VFs (000a:11:00.2, 3) are in one group, yet the 2 PFs
> (000a:11:00.0, 000a:11:00.1) are in separate groups.
>
> I also notice the same ordering nature on our D06 platform - the
> pcieport is added to an iommu group after PF for that port. However this
> platform supports ACS, so not such a problem.
>
> After some checking, I find that when the pcieport driver probes, the
> associated SMMU device had not registered yet with the IOMMU framework,
> so we defer the probe for this device - in iort.c:iort_iommu_xlate(),
> when no iommu ops are available, we defer.
>
> Yet, when the mlx5 PF devices probe, the iommu ops are available at this
> stage. So the probe continues and we get an iommu group for the device -
> but not the same group as the parent port, as it has not yet been added
> to a group. When the port eventually probes it gets a new, separate group.
>
> This all seems to be because the built-in module init ordering is as follows:
> pcieport drv, smmu drv, mlx5 drv
>
> I notice that if I build the mlx5 drv as a ko and insert after boot, all
> functions + pcieport are in the same group:
>
> [ 11.530046] hisi_sas_v2_hw HISI0162:01: Adding to iommu group 0
> [ 17.301093] hns_dsaf HISI00B2:00: Adding to iommu group 1
> [ 18.743600] ehci-platform PNP0D20:00: Adding to iommu group 2
> [ 20.212284] pcieport 0002:f8:00.0: Adding to iommu group 3
> [ 20.356303] pcieport 0004:88:00.0: Adding to iommu group 4
> [ 20.493337] pcieport 0005:78:00.0: Adding to iommu group 5
> [ 20.702999] pcieport 000a:10:00.0: Adding to iommu group 6
> [ 20.859183] pcieport 000c:20:00.0: Adding to iommu group 7
> [ 20.996140] pcieport 000d:30:00.0: Adding to iommu group 8
> [ 21.152637] serial 0002:f9:00.0: Adding to iommu group 3
> [ 21.346991] serial 0002:f9:00.1: Adding to iommu group 3
> [ 100.754306] mlx5_core 000a:11:00.0: Adding to iommu group 6
> [ 101.420156] mlx5_core 000a:11:00.1: Adding to iommu group 6
> [ 292.481714] mlx5_core 000a:11:00.2: Adding to iommu group 6
> [ 293.281061] mlx5_core 000a:11:00.3: Adding to iommu group 6
>
> This does seem like a problem for arm64 platforms which don't support
> ACS, yet enable an SMMU. Maybe also a problem even if they do support ACS.
>
> Opinion?

Yeah, this is less than ideal. One way to bodge it might be to make
pci_device_group() also walk downwards to see if any non-ACS-isolated
children already have a group, rather than assuming that groups get
allocated in hierarchical order, but that's hardly a proper fix.
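
A minimal sketch of that downward walk, as a toy Python model rather than
kernel code. Everything in it is an assumption made for illustration: the
Dev class, the single acs_isolated flag, and the helper names are
hypothetical and do not correspond to the real pci_device_group()
implementation.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional


@dataclass(eq=False)
class Dev:
    """Toy stand-in for a PCI device (hypothetical, not a kernel struct)."""
    name: str
    parent: Optional["Dev"] = None
    acs_isolated: bool = False  # assumption: True means isolated from parent
    group: Optional[int] = None
    children: List["Dev"] = field(default_factory=list)

    def __post_init__(self):
        # Register with the parent so the tree can be walked downwards.
        if self.parent is not None:
            self.parent.children.append(self)


def group_upstream(dev):
    """Walk towards the root and reuse the first group found on the way."""
    node = dev
    while node.parent is not None and not node.acs_isolated:
        node = node.parent
        if node.group is not None:
            return node.group
    return None


def group_downstream(dev):
    """Walk towards the leaves, skipping children that are ACS-isolated."""
    for child in dev.children:
        if child.acs_isolated:
            continue
        if child.group is not None:
            return child.group
        found = group_downstream(child)
        if found is not None:
            return found
    return None


def assign_group(dev, ids, walk_down=False):
    """Pick an existing group if one is reachable, else allocate a new one."""
    group = group_upstream(dev)
    if group is None and walk_down:
        group = group_downstream(dev)
    if group is None:
        group = next(ids)
    dev.group = group
    return group


def demo(walk_down):
    """Endpoint probes before its port, as in the report above."""
    ids = count()
    port = Dev("port")
    endpoint = Dev("endpoint", parent=port)
    assign_group(endpoint, ids, walk_down)  # endpoint probes first
    assign_group(port, ids, walk_down)      # port's deferred probe runs later
    return endpoint.group, port.group
```

In this model demo(False) leaves the endpoint and the port in different
groups, while demo(True) puts the port into the endpoint's group. Note that
two sibling functions which have already been given separate groups would
stay split even with the downward walk, which is one reason it is only a
bodge.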

The underlying issue is that, for historical reasons, OF/IORT-based
IOMMU drivers have ended up with group allocation being tied to endpoint
driver probing via the dma_configure() mechanism (long story short,
driver probe is the only thing which can be delayed in order to wait for
a specific IOMMU instance to be ready). However, in the meantime, the
IOMMU API internals have evolved sufficiently that I think there's a way
to really put things right - I have the spark of an idea which I'll try
to sketch out ASAP...
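
To make the ordering dependence concrete, here is a toy Python model of the
behaviour described above. It is a sketch under stated assumptions, not
kernel code: the simulate() function, the device names, and the rule that
deferred probes are retried only once the built-in list has been processed
are all simplifications invented for illustration, and group lookup is
reduced to "reuse the parent's group if it already has one".

```python
from itertools import count


def simulate(builtin_order, late_modules=()):
    """Return {device: group} for a toy port with two endpoint functions.

    builtin_order lists built-in drivers in init order; the special entry
    "smmu" marks the point at which IOMMU ops become available.
    late_modules are probed after everything else, like a module inserted
    after boot.
    """
    parent = {"port": None, "pf0": "port", "pf1": "port"}
    groups = {}
    deferred = []
    ids = count()
    iommu_ready = False

    def probe(dev):
        # Upstream lookup only: share the parent's group if it has one.
        up = parent[dev]
        if up is not None and up in groups:
            groups[dev] = groups[up]
        else:
            groups[dev] = next(ids)

    for item in builtin_order:
        if item == "smmu":
            iommu_ready = True
        elif not iommu_ready:
            deferred.append(item)  # stand-in for probe deferral
        else:
            probe(item)

    for dev in deferred:           # assumption: retried after built-in init
        probe(dev)
    for dev in late_modules:
        probe(dev)
    return groups
```

With the order pcieport, smmu, mlx5 the model gives each function its own
group and the port a third one, whereas simulate(["port", "smmu"],
["pf0", "pf1"]) puts everything in one group, which matches the pattern
John saw when loading mlx5 as a module after boot.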

Robin.