linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU
@ 2017-12-19 15:13 Tomasz Nowicki
  2017-12-19 15:20 ` [PATCH V1 1/1] iommu: Make sure device's ID array elements are unique Tomasz Nowicki
  2017-12-19 16:34 ` [PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU Robin Murphy
  0 siblings, 2 replies; 8+ messages in thread
From: Tomasz Nowicki @ 2017-12-19 15:13 UTC (permalink / raw)
  To: joro, robin.murphy, will.deacon, lorenzo.pieralisi, bhelgaas
  Cc: Jayachandran.Nair, Tomasz Nowicki, linux-pci, linux-kernel,
	stable, linux-acpi, iommu, Ganapatrao.Kulkarni, mw,
	linux-arm-kernel

Here is my lspci output of ThunderX2 for which I am observing kernel panic coming from
SMMUv3 driver -> arm_smmu_write_strtab_ent() -> BUG_ON(ste_live):

# lspci -vt
-[0000:00]-+-00.0-[01-1f]--+ [...]
                           + [...]
                           \-00.0-[1e-1f]----00.0-[1f]----00.0  ASPEED Technology, Inc. ASPEED Graphics Family

ASP device -> 1f:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family
PCI-Express to PCI/PCI-X Bridge -> 1e:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge
                           
While setting up ASP device SID in IORT dirver:
iort_iommu_configure() -> pci_for_each_dma_alias()
we need to walk up and iterate over each device which alias transaction from
downstream devices.

AST device (1f:00.0) gets BDF=0x1f00 and corresponding SID=0x1f00 from IORT.
Bridge (1e:00.0) is the first alias. Following PCI Express to PCI/PCI-X Bridge
spec: PCIe-to-PCI/X bridges alias transactions from downstream devices using
the subordinate bus number. For bridge (1e:00.0), the subordinate is equal
to 0x1f. This gives BDF=0x1f00 and SID=1f00 which is the same as downstream
device. So it is possible to have two identical SIDs. The question is what we
should do about such case. Presented patch prevents from registering the same
ID so that SMMUv3 is not complaining later on.

Tomasz Nowicki (1):
  iommu: Make sure device's ID array elements are unique

 drivers/iommu/iommu.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

-- 
2.7.4


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH V1 1/1] iommu: Make sure device's ID array elements are unique
  2017-12-19 15:13 [PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU Tomasz Nowicki
@ 2017-12-19 15:20 ` Tomasz Nowicki
  2017-12-19 16:37   ` Alex Williamson
  2017-12-19 16:34 ` [PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU Robin Murphy
  1 sibling, 1 reply; 8+ messages in thread
From: Tomasz Nowicki @ 2017-12-19 15:20 UTC (permalink / raw)
  To: joro, robin.murphy, will.deacon, lorenzo.pieralisi, bhelgaas
  Cc: Jayachandran.Nair, Tomasz Nowicki, linux-pci, linux-kernel,
	stable, linux-acpi, iommu, Ganapatrao.Kulkarni, mw,
	linux-arm-kernel

While iterating over DMA aliases for a PCI device, for some rare cases
(i.e. PCIe-to-PCI/X bridges) we may get exactly the same ID as initial child
device. In turn, the same ID may get registered for a device multiple times.
Eventually IOMMU  driver may try to configure the same ID within domain
multiple times too which for some IOMMU drivers is illegal and causes kernel
panic.

Rule out ID duplication prior to device ID array registration.

CC: stable@vger.kernel.org	# v4.14+
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
---
 drivers/iommu/iommu.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3de5c0b..9b2c138 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1945,6 +1945,31 @@ void iommu_fwspec_free(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_fwspec_free);
 
+static void iommu_fwspec_remove_ids_dup(struct device *dev, u32 *ids,
+					int *num_ids)
+{
+	struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+	int i, j, k, valid_ids = *num_ids;
+
+	for (i = 0; i < valid_ids; i++) {
+		for (j = 0; j < fwspec->num_ids; j++) {
+			if (ids[i] != fwspec->ids[j])
+				continue;
+
+			dev_info(dev, "found 0x%x ID duplication, skipped\n",
+				 ids[i]);
+
+			for (k = i + 1; k < valid_ids; k++)
+				ids[k - 1] = ids[k];
+
+			valid_ids--;
+			break;
+		}
+	}
+
+	*num_ids = valid_ids;
+}
+
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
 {
 	struct iommu_fwspec *fwspec = dev->iommu_fwspec;
@@ -1954,6 +1979,9 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
 	if (!fwspec)
 		return -EINVAL;
 
+	/* Rule out IDs already registered */
+	iommu_fwspec_remove_ids_dup(dev, ids, &num_ids);
+
 	size = offsetof(struct iommu_fwspec, ids[fwspec->num_ids + num_ids]);
 	if (size > sizeof(*fwspec)) {
 		fwspec = krealloc(dev->iommu_fwspec, size, GFP_KERNEL);
-- 
2.7.4


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU
  2017-12-19 15:13 [PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU Tomasz Nowicki
  2017-12-19 15:20 ` [PATCH V1 1/1] iommu: Make sure device's ID array elements are unique Tomasz Nowicki
@ 2017-12-19 16:34 ` Robin Murphy
  2017-12-20 10:27   ` Tomasz Nowicki
                     ` (2 more replies)
  1 sibling, 3 replies; 8+ messages in thread
From: Robin Murphy @ 2017-12-19 16:34 UTC (permalink / raw)
  To: Tomasz Nowicki, joro, will.deacon, lorenzo.pieralisi, bhelgaas
  Cc: Jayachandran.Nair, linux-pci, linux-kernel, stable, linux-acpi,
	iommu, Ganapatrao.Kulkarni, mw, linux-arm-kernel

Hi Tomasz,

On 19/12/17 15:13, Tomasz Nowicki wrote:
> Here is my lspci output of ThunderX2 for which I am observing kernel panic coming from
> SMMUv3 driver -> arm_smmu_write_strtab_ent() -> BUG_ON(ste_live):
> 
> # lspci -vt
> -[0000:00]-+-00.0-[01-1f]--+ [...]
>                             + [...]
>                             \-00.0-[1e-1f]----00.0-[1f]----00.0  ASPEED Technology, Inc. ASPEED Graphics Family
> 
> ASP device -> 1f:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family
> PCI-Express to PCI/PCI-X Bridge -> 1e:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge
>                             
> While setting up ASP device SID in IORT dirver:
> iort_iommu_configure() -> pci_for_each_dma_alias()
> we need to walk up and iterate over each device which alias transaction from
> downstream devices.
> 
> AST device (1f:00.0) gets BDF=0x1f00 and corresponding SID=0x1f00 from IORT.
> Bridge (1e:00.0) is the first alias. Following PCI Express to PCI/PCI-X Bridge
> spec: PCIe-to-PCI/X bridges alias transactions from downstream devices using
> the subordinate bus number. For bridge (1e:00.0), the subordinate is equal
> to 0x1f. This gives BDF=0x1f00 and SID=1f00 which is the same as downstream
> device. So it is possible to have two identical SIDs. The question is what we
> should do about such case. Presented patch prevents from registering the same
> ID so that SMMUv3 is not complaining later on.

Ooh, subtle :( There is logic in arm_smmu_attach_device() to tolerate
grouped devices aliasing to the same ID, but I guess I overlooked the
distinction of a device sharing an alias ID with itself. I'm not sure
I really like trying to work around this in generic code, since
fwspec->ids is essentially opaque data in a driver-specific format - in
theory a driver is free to encode a single logical ID into multiple
fwspec elements (I think I did that in an early draft of SMMUv2 SMR
support), at which point this approach might corrupt things massively.

Does the (untested) diff below suffice?

Robin.

----->8-----diff --git a/drivers/iommu/arm-smmu-v3.c 
b/drivers/iommu/arm-smmu-v3.c
index f122071688fd..d8a730d83401 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1731,7 +1731,7 @@ static __le64 *arm_smmu_get_step_for_sid(struct 
arm_smmu_device *smmu, u32 sid)

  static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec)
  {
-	int i;
+	int i, j;
  	struct arm_smmu_master_data *master = fwspec->iommu_priv;
  	struct arm_smmu_device *smmu = master->smmu;

@@ -1739,6 +1739,13 @@ static void arm_smmu_install_ste_for_dev(struct 
iommu_fwspec *fwspec)
  		u32 sid = fwspec->ids[i];
  		__le64 *step = arm_smmu_get_step_for_sid(smmu, sid);

+		/* Bridged PCI devices may end up with duplicated IDs */
+		for (j = 0; j < i; j++)
+			if (fwspec->ids[j] == sid)
+				break;
+		if (j < i)
+			continue;
+
  		arm_smmu_write_strtab_ent(smmu, sid, step, &master->ste);
  	}
  }

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH V1 1/1] iommu: Make sure device's ID array elements are unique
  2017-12-19 15:20 ` [PATCH V1 1/1] iommu: Make sure device's ID array elements are unique Tomasz Nowicki
@ 2017-12-19 16:37   ` Alex Williamson
  2017-12-20 10:28     ` Tomasz Nowicki
  0 siblings, 1 reply; 8+ messages in thread
From: Alex Williamson @ 2017-12-19 16:37 UTC (permalink / raw)
  To: Tomasz Nowicki
  Cc: joro, robin.murphy, will.deacon, lorenzo.pieralisi, bhelgaas,
	Jayachandran.Nair, linux-pci, linux-kernel, stable, linux-acpi,
	iommu, Ganapatrao.Kulkarni, mw, linux-arm-kernel

On Tue, 19 Dec 2017 16:20:21 +0100
Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> wrote:

> While iterating over DMA aliases for a PCI device, for some rare cases
> (i.e. PCIe-to-PCI/X bridges) we may get exactly the same ID as initial child
> device. In turn, the same ID may get registered for a device multiple times.
> Eventually IOMMU  driver may try to configure the same ID within domain
> multiple times too which for some IOMMU drivers is illegal and causes kernel
> panic.
> 
> Rule out ID duplication prior to device ID array registration.
> 
> CC: stable@vger.kernel.org	# v4.14+

You've identified a release, is there a specific commit this fixes?

> Signed-off-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
> ---
>  drivers/iommu/iommu.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3de5c0b..9b2c138 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1945,6 +1945,31 @@ void iommu_fwspec_free(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_fwspec_free);
>  
> +static void iommu_fwspec_remove_ids_dup(struct device *dev, u32 *ids,
> +					int *num_ids)
> +{
> +	struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> +	int i, j, k, valid_ids = *num_ids;
> +
> +	for (i = 0; i < valid_ids; i++) {
> +		for (j = 0; j < fwspec->num_ids; j++) {
> +			if (ids[i] != fwspec->ids[j])
> +				continue;
> +
> +			dev_info(dev, "found 0x%x ID duplication, skipped\n",
> +				 ids[i]);
> +
> +			for (k = i + 1; k < valid_ids; k++)
> +				ids[k - 1] = ids[k];

Use memmove()?

> +
> +			valid_ids--;
> +			break;

At this point ids[i] is not the ids[i] that we tested for dupes, it's
what was ids[i + 1], but we're going to i++ on the next iteration and
we therefore never test that entry.

> +		}
> +	}
> +
> +	*num_ids = valid_ids;
> +}
> +
>  int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
>  {
>  	struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> @@ -1954,6 +1979,9 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids)
>  	if (!fwspec)
>  		return -EINVAL;
>  
> +	/* Rule out IDs already registered */
> +	iommu_fwspec_remove_ids_dup(dev, ids, &num_ids);
> +
>  	size = offsetof(struct iommu_fwspec, ids[fwspec->num_ids + num_ids]);
>  	if (size > sizeof(*fwspec)) {
>  		fwspec = krealloc(dev->iommu_fwspec, size, GFP_KERNEL);


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU
  2017-12-19 16:34 ` [PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU Robin Murphy
@ 2017-12-20 10:27   ` Tomasz Nowicki
  2017-12-22 11:29   ` Tomasz Nowicki
  2017-12-27 19:34   ` Jayachandran C
  2 siblings, 0 replies; 8+ messages in thread
From: Tomasz Nowicki @ 2017-12-20 10:27 UTC (permalink / raw)
  To: Robin Murphy, Tomasz Nowicki, joro, will.deacon,
	lorenzo.pieralisi, bhelgaas
  Cc: Jayachandran.Nair, Ganapatrao.Kulkarni, linux-kernel, iommu,
	linux-arm-kernel, linux-acpi, linux-pci, mw, stable

Hi Robin,

On 19.12.2017 17:34, Robin Murphy wrote:
> Hi Tomasz,
> 
> On 19/12/17 15:13, Tomasz Nowicki wrote:
>> Here is my lspci output of ThunderX2 for which I am observing kernel 
>> panic coming from
>> SMMUv3 driver -> arm_smmu_write_strtab_ent() -> BUG_ON(ste_live):
>>
>> # lspci -vt
>> -[0000:00]-+-00.0-[01-1f]--+ [...]
>>                             + [...]
>>                             \-00.0-[1e-1f]----00.0-[1f]----00.0  
>> ASPEED Technology, Inc. ASPEED Graphics Family
>>
>> ASP device -> 1f:00.0 VGA compatible controller: ASPEED Technology, 
>> Inc. ASPEED Graphics Family
>> PCI-Express to PCI/PCI-X Bridge -> 1e:00.0 PCI bridge: ASPEED 
>> Technology, Inc. AST1150 PCI-to-PCI Bridge
>> While setting up ASP device SID in IORT dirver:
>> iort_iommu_configure() -> pci_for_each_dma_alias()
>> we need to walk up and iterate over each device which alias 
>> transaction from
>> downstream devices.
>>
>> AST device (1f:00.0) gets BDF=0x1f00 and corresponding SID=0x1f00 from 
>> IORT.
>> Bridge (1e:00.0) is the first alias. Following PCI Express to 
>> PCI/PCI-X Bridge
>> spec: PCIe-to-PCI/X bridges alias transactions from downstream devices 
>> using
>> the subordinate bus number. For bridge (1e:00.0), the subordinate is 
>> equal
>> to 0x1f. This gives BDF=0x1f00 and SID=1f00 which is the same as 
>> downstream
>> device. So it is possible to have two identical SIDs. The question is 
>> what we
>> should do about such case. Presented patch prevents from registering 
>> the same
>> ID so that SMMUv3 is not complaining later on.
> 
> Ooh, subtle :( There is logic in arm_smmu_attach_device() to tolerate
> grouped devices aliasing to the same ID, but I guess I overlooked the
> distinction of a device sharing an alias ID with itself. I'm not sure
> I really like trying to work around this in generic code, since
> fwspec->ids is essentially opaque data in a driver-specific format - in
> theory a driver is free to encode a single logical ID into multiple
> fwspec elements (I think I did that in an early draft of SMMUv2 SMR
> support), at which point this approach might corrupt things massively.

I don't have strong favourite here, the fix in SMMUv3 driver would work 
too. Initially we can fix things for SMMUv3 only and if ever face the 
same issue for other IOMMU driver, then we can move it to generic layer.

> 
> Does the (untested) diff below suffice?
> 
> Robin.
> 
> ----->8-----diff --git a/drivers/iommu/arm-smmu-v3.c 
> b/drivers/iommu/arm-smmu-v3.c
> index f122071688fd..d8a730d83401 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -1731,7 +1731,7 @@ static __le64 *arm_smmu_get_step_for_sid(struct 
> arm_smmu_device *smmu, u32 sid)
> 
>   static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec)
>   {
> -    int i;
> +    int i, j;
>       struct arm_smmu_master_data *master = fwspec->iommu_priv;
>       struct arm_smmu_device *smmu = master->smmu;
> 
> @@ -1739,6 +1739,13 @@ static void arm_smmu_install_ste_for_dev(struct 
> iommu_fwspec *fwspec)
>           u32 sid = fwspec->ids[i];
>           __le64 *step = arm_smmu_get_step_for_sid(smmu, sid);
> 
> +        /* Bridged PCI devices may end up with duplicated IDs */
> +        for (j = 0; j < i; j++)
> +            if (fwspec->ids[j] == sid)
> +                break;
> +        if (j < i)
> +            continue;
> +
>           arm_smmu_write_strtab_ent(smmu, sid, step, &master->ste);
>       }
>   }

Yes, worked for me.

Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V1 1/1] iommu: Make sure device's ID array elements are unique
  2017-12-19 16:37   ` Alex Williamson
@ 2017-12-20 10:28     ` Tomasz Nowicki
  0 siblings, 0 replies; 8+ messages in thread
From: Tomasz Nowicki @ 2017-12-20 10:28 UTC (permalink / raw)
  To: Alex Williamson, Tomasz Nowicki
  Cc: joro, robin.murphy, will.deacon, lorenzo.pieralisi, bhelgaas,
	Jayachandran.Nair, linux-pci, linux-kernel, stable, linux-acpi,
	iommu, Ganapatrao.Kulkarni, mw, linux-arm-kernel

On 19.12.2017 17:37, Alex Williamson wrote:
> On Tue, 19 Dec 2017 16:20:21 +0100
> Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> wrote:
> 
>> While iterating over DMA aliases for a PCI device, for some rare cases
>> (i.e. PCIe-to-PCI/X bridges) we may get exactly the same ID as initial child
>> device. In turn, the same ID may get registered for a device multiple times.
>> Eventually IOMMU  driver may try to configure the same ID within domain
>> multiple times too which for some IOMMU drivers is illegal and causes kernel
>> panic.
>>
>> Rule out ID duplication prior to device ID array registration.
>>
>> CC: stable@vger.kernel.org	# v4.14+
> 
> You've identified a release, is there a specific commit this fixes?

Yes, it was triggered by converting drm_pci_init() to 
pci_register_driver() in ast_drv.c

Fixes: 10631d724def ("drm/pci: Deprecate drm_pci_init/exit completely
")

> 
>> Signed-off-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
>> ---
>>   drivers/iommu/iommu.c | 28 ++++++++++++++++++++++++++++
>>   1 file changed, 28 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 3de5c0b..9b2c138 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1945,6 +1945,31 @@ void iommu_fwspec_free(struct device *dev)
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_fwspec_free);
>>   
>> +static void iommu_fwspec_remove_ids_dup(struct device *dev, u32 *ids,
>> +					int *num_ids)
>> +{
>> +	struct iommu_fwspec *fwspec = dev->iommu_fwspec;
>> +	int i, j, k, valid_ids = *num_ids;
>> +
>> +	for (i = 0; i < valid_ids; i++) {
>> +		for (j = 0; j < fwspec->num_ids; j++) {
>> +			if (ids[i] != fwspec->ids[j])
>> +				continue;
>> +
>> +			dev_info(dev, "found 0x%x ID duplication, skipped\n",
>> +				 ids[i]);
>> +
>> +			for (k = i + 1; k < valid_ids; k++)
>> +				ids[k - 1] = ids[k];
> 
> Use memmove()?

Right.

> 
>> +
>> +			valid_ids--;
>> +			break;
> 
> At this point ids[i] is not the ids[i] that we tested for dupes, it's
> what was ids[i + 1], but we're going to i++ on the next iteration and
> we therefore never test that entry.

Good point.

Now the fundamental question is where we should put the patch, here or 
in SMMUv3 driver as per Robin suggestion.

Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU
  2017-12-19 16:34 ` [PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU Robin Murphy
  2017-12-20 10:27   ` Tomasz Nowicki
@ 2017-12-22 11:29   ` Tomasz Nowicki
  2017-12-27 19:34   ` Jayachandran C
  2 siblings, 0 replies; 8+ messages in thread
From: Tomasz Nowicki @ 2017-12-22 11:29 UTC (permalink / raw)
  To: Robin Murphy, Tomasz Nowicki, joro, will.deacon,
	lorenzo.pieralisi, bhelgaas
  Cc: Jayachandran.Nair, Ganapatrao.Kulkarni, linux-kernel, iommu,
	linux-arm-kernel, linux-acpi, linux-pci, mw, stable

Hi Robin,

On 19.12.2017 17:34, Robin Murphy wrote:
> Hi Tomasz,
> 
> On 19/12/17 15:13, Tomasz Nowicki wrote:
>> Here is my lspci output of ThunderX2 for which I am observing kernel 
>> panic coming from
>> SMMUv3 driver -> arm_smmu_write_strtab_ent() -> BUG_ON(ste_live):
>>
>> # lspci -vt
>> -[0000:00]-+-00.0-[01-1f]--+ [...]
>>                             + [...]
>>                             \-00.0-[1e-1f]----00.0-[1f]----00.0  
>> ASPEED Technology, Inc. ASPEED Graphics Family
>>
>> ASP device -> 1f:00.0 VGA compatible controller: ASPEED Technology, 
>> Inc. ASPEED Graphics Family
>> PCI-Express to PCI/PCI-X Bridge -> 1e:00.0 PCI bridge: ASPEED 
>> Technology, Inc. AST1150 PCI-to-PCI Bridge
>> While setting up ASP device SID in IORT dirver:
>> iort_iommu_configure() -> pci_for_each_dma_alias()
>> we need to walk up and iterate over each device which alias 
>> transaction from
>> downstream devices.
>>
>> AST device (1f:00.0) gets BDF=0x1f00 and corresponding SID=0x1f00 from 
>> IORT.
>> Bridge (1e:00.0) is the first alias. Following PCI Express to 
>> PCI/PCI-X Bridge
>> spec: PCIe-to-PCI/X bridges alias transactions from downstream devices 
>> using
>> the subordinate bus number. For bridge (1e:00.0), the subordinate is 
>> equal
>> to 0x1f. This gives BDF=0x1f00 and SID=1f00 which is the same as 
>> downstream
>> device. So it is possible to have two identical SIDs. The question is 
>> what we
>> should do about such case. Presented patch prevents from registering 
>> the same
>> ID so that SMMUv3 is not complaining later on.
> 
> Ooh, subtle :( There is logic in arm_smmu_attach_device() to tolerate
> grouped devices aliasing to the same ID, but I guess I overlooked the
> distinction of a device sharing an alias ID with itself. I'm not sure
> I really like trying to work around this in generic code, since
> fwspec->ids is essentially opaque data in a driver-specific format - in
> theory a driver is free to encode a single logical ID into multiple
> fwspec elements (I think I did that in an early draft of SMMUv2 SMR
> support), at which point this approach might corrupt things massively.
> 
> Does the (untested) diff below suffice?
> 
> Robin.
> 
> ----->8-----diff --git a/drivers/iommu/arm-smmu-v3.c 
> b/drivers/iommu/arm-smmu-v3.c
> index f122071688fd..d8a730d83401 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -1731,7 +1731,7 @@ static __le64 *arm_smmu_get_step_for_sid(struct 
> arm_smmu_device *smmu, u32 sid)
> 
>   static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec)
>   {
> -    int i;
> +    int i, j;
>       struct arm_smmu_master_data *master = fwspec->iommu_priv;
>       struct arm_smmu_device *smmu = master->smmu;
> 
> @@ -1739,6 +1739,13 @@ static void arm_smmu_install_ste_for_dev(struct 
> iommu_fwspec *fwspec)
>           u32 sid = fwspec->ids[i];
>           __le64 *step = arm_smmu_get_step_for_sid(smmu, sid);
> 
> +        /* Bridged PCI devices may end up with duplicated IDs */
> +        for (j = 0; j < i; j++)
> +            if (fwspec->ids[j] == sid)
> +                break;
> +        if (j < i)
> +            continue;
> +
>           arm_smmu_write_strtab_ent(smmu, sid, step, &master->ste);
>       }
>   }

The patch works for me:

Tested-by: Tomasz Nowicki <Tomasz.Nowicki@cavium.com>

The bug causes a regression on our platform and would be nice to fix it 
for 4.15. Since no more comments so far, IMO we can put the fix to 
SMMUv3 driver. Do you prefer to send patch yourself or would you like me 
to do it?

Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU
  2017-12-19 16:34 ` [PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU Robin Murphy
  2017-12-20 10:27   ` Tomasz Nowicki
  2017-12-22 11:29   ` Tomasz Nowicki
@ 2017-12-27 19:34   ` Jayachandran C
  2 siblings, 0 replies; 8+ messages in thread
From: Jayachandran C @ 2017-12-27 19:34 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Jayachandran.Nair, lorenzo.pieralisi, Tomasz Nowicki, linux-pci,
	joro, will.deacon, linux-kernel, stable, linux-acpi, iommu,
	Ganapatrao.Kulkarni, bhelgaas, mw, linux-arm-kernel

Hi Robin,
On Tue, Dec 19, 2017 at 04:34:46PM +0000, Robin Murphy wrote:
> Hi Tomasz,
> 
> On 19/12/17 15:13, Tomasz Nowicki wrote:
> >Here is my lspci output of ThunderX2 for which I am observing kernel panic coming from
> >SMMUv3 driver -> arm_smmu_write_strtab_ent() -> BUG_ON(ste_live):
> >
> ># lspci -vt
> >-[0000:00]-+-00.0-[01-1f]--+ [...]
> >                            + [...]
> >                            \-00.0-[1e-1f]----00.0-[1f]----00.0  ASPEED Technology, Inc. ASPEED Graphics Family
> >
> >ASP device -> 1f:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family
> >PCI-Express to PCI/PCI-X Bridge -> 1e:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge
> >While setting up ASP device SID in IORT dirver:
> >iort_iommu_configure() -> pci_for_each_dma_alias()
> >we need to walk up and iterate over each device which alias transaction from
> >downstream devices.
> >
> >AST device (1f:00.0) gets BDF=0x1f00 and corresponding SID=0x1f00 from IORT.
> >Bridge (1e:00.0) is the first alias. Following PCI Express to PCI/PCI-X Bridge
> >spec: PCIe-to-PCI/X bridges alias transactions from downstream devices using
> >the subordinate bus number. For bridge (1e:00.0), the subordinate is equal
> >to 0x1f. This gives BDF=0x1f00 and SID=1f00 which is the same as downstream
> >device. So it is possible to have two identical SIDs. The question is what we
> >should do about such case. Presented patch prevents from registering the same
> >ID so that SMMUv3 is not complaining later on.
> 
> Ooh, subtle :( There is logic in arm_smmu_attach_device() to tolerate
> grouped devices aliasing to the same ID, but I guess I overlooked the
> distinction of a device sharing an alias ID with itself. I'm not sure
> I really like trying to work around this in generic code, since
> fwspec->ids is essentially opaque data in a driver-specific format - in
> theory a driver is free to encode a single logical ID into multiple
> fwspec elements (I think I did that in an early draft of SMMUv2 SMR
> support), at which point this approach might corrupt things massively.
> 
> Does the (untested) diff below suffice?
> 
> Robin.
> 
> ----->8-----diff --git a/drivers/iommu/arm-smmu-v3.c
> b/drivers/iommu/arm-smmu-v3.c
> index f122071688fd..d8a730d83401 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -1731,7 +1731,7 @@ static __le64 *arm_smmu_get_step_for_sid(struct
> arm_smmu_device *smmu, u32 sid)
> 
>  static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec)
>  {
> -	int i;
> +	int i, j;
>  	struct arm_smmu_master_data *master = fwspec->iommu_priv;
>  	struct arm_smmu_device *smmu = master->smmu;
> 
> @@ -1739,6 +1739,13 @@ static void arm_smmu_install_ste_for_dev(struct
> iommu_fwspec *fwspec)
>  		u32 sid = fwspec->ids[i];
>  		__le64 *step = arm_smmu_get_step_for_sid(smmu, sid);
> 
> +		/* Bridged PCI devices may end up with duplicated IDs */
> +		for (j = 0; j < i; j++)
> +			if (fwspec->ids[j] == sid)
> +				break;
> +		if (j < i)
> +			continue;
> +
>  		arm_smmu_write_strtab_ent(smmu, sid, step, &master->ste);
>  	}
>  }

Here is another tested by if you need one more:
Tested-by: Jayachandran C. <jnair@caviumnetworks.com>

This fixes the crash below seen in ThunderX2 boards with Aspeed BMC (when
booted without iommu.passthrough). Since this is a regression that breaks
bootup on the platform, can you consider submitting this for the 4.15 cycle?

[   84.729351] ------------[ cut here ]------------
[   84.729354] kernel BUG at /home/ubuntu/linux/drivers/iommu/arm-smmu-v3.c:1201!
[   84.729358] Internal error: Oops - BUG: 0 [#1] SMP
[   84.729360] Modules linked in: ast(+) ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops qede(+) drm q
ed bnx2x(+) mpt3sas raid_class scsi_transport_sas sdhci_acpi mdio sdhci libcrc32c gpio_xlp
[   84.729375] CPU: 190 PID: 1725 Comm: systemd-udevd Not tainted 4.15.0-rc5 #37
[   84.729376] Hardware name: Cavium Inc. Unknown/Unknown, BIOS 1.0 11/08/2017
[   84.729379] pstate: 80400009 (Nzcv daif +PAN -UAO)
[   84.729388] pc : arm_smmu_write_strtab_ent+0x1f0/0x1f8
[   84.729391] lr : arm_smmu_install_ste_for_dev+0x70/0xc8
[   84.729392] sp : ffff00001f6e3730
[   84.729393] x29: ffff00001f6e3730 x28: ffff809ecc0d2268
[   84.729395] x27: 0000000000000018 x26: ffff00001f6e3910
[   84.729397] x25: ffff809ecc0d2238 x24: ffff809ecdc94158
[   84.729398] x23: 0000000000000018 x22: 0000000000001d00
[   84.729400] x21: ffff809ecdc90018 x20: ffff809ecb5ef288
[   84.729401] x19: ffff80beca480000 x18: ffff0000093b8c08
[   84.729402] x17: ffff000008aeab70 x16: ffff00001f6e3a20
[   84.729404] x15: 0000000000000000 x14: dead000000000100
[   84.729405] x13: dead000000000200 x12: 0000000000000020
[   84.729407] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
[   84.729408] x9 : 0000000000000000 x8 : 0000000000000000
[   84.729410] x7 : 0000000000000100 x6 : 0000000000000015
[   84.729411] x5 : 0000000000000015 x4 : 0000000000000002
[   84.729413] x3 : ffff809ecb5ef288 x2 : 0000000000000001
[   84.729414] x1 : ffff000008691e30 x0 : ffff809ecc0d2238
[   84.729418] Process systemd-udevd (pid: 1725, stack limit = 0x000000002c585821)
[   84.729420] Call trace:
[   84.729423]  arm_smmu_write_strtab_ent+0x1f0/0x1f8
[   84.729425]  arm_smmu_install_ste_for_dev+0x70/0xc8
[   84.729426]  arm_smmu_attach_dev+0x100/0x2f8
[   84.729431]  __iommu_attach_device+0x54/0xe0
[   84.729433]  iommu_group_add_device+0x150/0x428
[   84.729435]  iommu_group_get_for_dev+0x84/0x180
[   84.729436]  arm_smmu_add_device+0x138/0x240
[   84.729445]  iort_iommu_configure+0x138/0x188
[   84.729452]  acpi_dma_configure+0x3c/0x80
[   84.729456]  dma_configure+0xb0/0xe0
[   84.729462]  driver_probe_device+0x1f0/0x4a8
[   84.729464]  __driver_attach+0x124/0x128
[   84.729466]  bus_for_each_dev+0x70/0xb0
[   84.729467]  driver_attach+0x30/0x40
[   84.729469]  bus_add_driver+0x248/0x2b8
[   84.729471]  driver_register+0x68/0x100
[   84.729478]  __pci_register_driver+0x5c/0x70
[   84.729488]  ast_init+0x30/0x1000 [ast]
[   84.729494]  do_one_initcall+0x5c/0x168
[   84.729501]  do_init_module+0x64/0x1f4
[   84.729502]  load_module+0x1e64/0x21e8
[   84.729503]  SyS_finit_module+0x108/0x118
[   84.729505]  el0_svc_naked+0x20/0x24
[   84.729508] Code: 34ffff82 d4210000 d2800120 17ffffd8 (d4210000)

Thanks,
JC.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-12-27 19:34 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-19 15:13 [PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU Tomasz Nowicki
2017-12-19 15:20 ` [PATCH V1 1/1] iommu: Make sure device's ID array elements are unique Tomasz Nowicki
2017-12-19 16:37   ` Alex Williamson
2017-12-20 10:28     ` Tomasz Nowicki
2017-12-19 16:34 ` [PATCH V1 0/1] Fix kernel panic caused by device ID duplication presented to the IOMMU Robin Murphy
2017-12-20 10:27   ` Tomasz Nowicki
2017-12-22 11:29   ` Tomasz Nowicki
2017-12-27 19:34   ` Jayachandran C

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).