iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs
@ 2019-11-04  5:58 Adrian Huang
  2019-11-11 15:22 ` Joerg Roedel
  0 siblings, 1 reply; 5+ messages in thread
From: Adrian Huang @ 2019-11-04  5:58 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, Adrian Huang

When attaching two Broadcom RAID controllers to a server, the first one
reports the failure during booting (the disks connecting to the RAID
controller cannot be detected):

  megaraid_sas 0000:42:00.0: Init cmd return status FAILED for SCSI host 0
  megaraid_sas 0000:42:00.0: Failed from megasas_init_fw 6376

Root-cause of the issue:
	1) Those two RAID controllers define their own IVMDs with the
	   valid exclusion range, and they are associated with the same
	   IOMMU hardware:

                 Subtable Type : 10 [Hardware Definition Block]
                         Flags : B0
                        Length : 0028
                      DeviceId : 4002

             Capability Offset : 0040
                  Base Address : 00000000B4100000
             PCI Segment Group : 0000
           Virtualization Info : 0000
                      Reserved : 80048F6F

                    Entry Type : 03
                     Device ID : 4008
                  Data Setting : 00

                    Entry Type : 04
                     Device ID : 7FFE
                  Data Setting : 00

                 Subtable Type : 21 [Memory Definition Block]
                         Flags : 08
                        Length : 0020
                      DeviceId : 4200

                Auxiliary Data : 0000
                      Reserved : 0000000000000000
                 Start Address : 000000009F58D000
                 Memory Length : 0000000008040000

                 Subtable Type : 21 [Memory Definition Block]
                         Flags : 08
                        Length : 0020
                      DeviceId : 4300

                Auxiliary Data : 0000
                      Reserved : 0000000000000000
                 Start Address : 000000009754D000
                 Memory Length : 0000000008040000

	2) When set_device_exclusion_range() parses the IVMD of devce id
	   '4200', the exclusion range of the amd_iommu struct becomes:

		iommu->exclusion_start = 0x9F58D000;
		iommu->exclusion_length = 0x8040000;

	3) When parsing the second IVMD (device id '4300') in
	   set_device_exclusion_range(), the exclusion range of the
	   amd_iommu struct becomes:

		iommu->exclusion_start = 0x9754D000;
		iommu->exclusion_length = 0x8040000;

	   This overwrites the first IVMD configuration, which leads to
	   the failure of the first RAID controller.

This patch fixes the issue by using unity map for multiple IVMDs if
those IVMDs define the valid exclusion range (different exclusion range)
and they are associated with the same IOMMU hardware. Note that the first
IVMD still uses the exclusion range.

Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
---
 drivers/iommu/amd_iommu_init.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 568c52317757..d65b548a42f5 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -71,6 +71,8 @@
 #define IVHD_FLAG_ISOC_EN_MASK          0x08
 
 #define IVMD_FLAG_EXCL_RANGE            0x08
+#define IVMD_FLAG_IW                    0x04
+#define IVMD_FLAG_IR                    0x02
 #define IVMD_FLAG_UNITY_MAP             0x01
 
 #define ACPI_DEVFLAG_INITPASS           0x01
@@ -1110,6 +1112,32 @@ static int __init add_early_maps(void)
 	return 0;
 }
 
+static int __init exclusion_range_has_configured(struct amd_iommu *iommu,
+							struct ivmd_header *m)
+{
+	/* Not configure yet. */
+	if (!iommu->exclusion_start) {
+		iommu->exclusion_start = m->range_start;
+		iommu->exclusion_length = m->range_length;
+
+		return 0;
+	}
+
+	if (iommu->exclusion_start == m->range_start &&
+	    iommu->exclusion_length == m->range_length)
+		return 0;
+
+	/*
+	 * The exclusion range of the iommu has been configured
+	 * by the other IVMD, so we need to use unity map for this
+	 * IVMD to avoid the overwritten exclusion range members of the
+	 * amd_iommu structure.
+	 */
+	m->flags = (IVMD_FLAG_IW | IVMD_FLAG_IR | IVMD_FLAG_UNITY_MAP);
+
+	return 1;
+}
+
 /*
  * Reads the device exclusion range from ACPI and initializes the IOMMU with
  * it
@@ -1122,14 +1150,15 @@ static void __init set_device_exclusion_range(u16 devid, struct ivmd_header *m)
 		return;
 
 	if (iommu) {
+		if (exclusion_range_has_configured(iommu, m))
+			return;
+
 		/*
 		 * We only can configure exclusion ranges per IOMMU, not
 		 * per device. But we can enable the exclusion range per
 		 * device. This is done here
 		 */
 		set_dev_entry_bit(devid, DEV_ENTRY_EX);
-		iommu->exclusion_start = m->range_start;
-		iommu->exclusion_length = m->range_length;
 	}
 }
 
-- 
2.17.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs
  2019-11-04  5:58 [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs Adrian Huang
@ 2019-11-11 15:22 ` Joerg Roedel
  2019-11-12  9:32   ` Huang Adrian
  0 siblings, 1 reply; 5+ messages in thread
From: Joerg Roedel @ 2019-11-11 15:22 UTC (permalink / raw)
  To: Adrian Huang; +Cc: iommu, Adrian Huang

Hi Adrian,

On Mon, Nov 04, 2019 at 01:58:52PM +0800, Adrian Huang wrote:
> 	2) When set_device_exclusion_range() parses the IVMD of devce id
> 	   '4200', the exclusion range of the amd_iommu struct becomes:
> 
> 		iommu->exclusion_start = 0x9F58D000;
> 		iommu->exclusion_length = 0x8040000;
> 
> 	3) When parsing the second IVMD (device id '4300') in
> 	   set_device_exclusion_range(), the exclusion range of the
> 	   amd_iommu struct becomes:
> 
> 		iommu->exclusion_start = 0x9754D000;
> 		iommu->exclusion_length = 0x8040000;
> 
> 	   This overwrites the first IVMD configuration, which leads to
> 	   the failure of the first RAID controller.

Okay, I think this is clearly a BIOS bug as the BIOS should not define
two different exclusion ranges in the IVRS table.

So there are a couple of options here:

	1) Bail out and disable the IOMMU as the BIOS screwed up

	2) Treat per-device exclusion ranges just as r/w unity-mapped
	   regions.


I think option 2) is the best fix here.


Regards,

	Joerg
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs
  2019-11-11 15:22 ` Joerg Roedel
@ 2019-11-12  9:32   ` Huang Adrian
  2019-11-12 16:05     ` Joerg Roedel
  0 siblings, 1 reply; 5+ messages in thread
From: Huang Adrian @ 2019-11-12  9:32 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, Adrian Huang

Hi Joerg,

> On Mon, Nov 11, 2019 at 11:22 PM Joerg Roedel <joro@8bytes.org> wrote:
>
> Hi Adrian,
>
> On Mon, Nov 04, 2019 at 01:58:52PM +0800, Adrian Huang wrote:
> >       2) When set_device_exclusion_range() parses the IVMD of devce id
> >          '4200', the exclusion range of the amd_iommu struct becomes:
> >
> >               iommu->exclusion_start = 0x9F58D000;
> >               iommu->exclusion_length = 0x8040000;
> >
> >       3) When parsing the second IVMD (device id '4300') in
> >          set_device_exclusion_range(), the exclusion range of the
> >          amd_iommu struct becomes:
> >
> >               iommu->exclusion_start = 0x9754D000;
> >               iommu->exclusion_length = 0x8040000;
> >
> >          This overwrites the first IVMD configuration, which leads to
> >          the failure of the first RAID controller.
>
> Okay, I think this is clearly a BIOS bug as the BIOS should not define
> two different exclusion ranges in the IVRS table.

Thanks for the review.

Yes. I understand this is a BIOS bug. The purpose of this patch is to
prevent the failure event though the system boots with the buggy BIOS.

> So there are a couple of options here:
>
>         1) Bail out and disable the IOMMU as the BIOS screwed up
>
>         2) Treat per-device exclusion ranges just as r/w unity-mapped
>            regions.
>
>
> I think option 2) is the best fix here.

Yes. This is what this patch does (The first exclusion region still
uses the exclusion range while the remaining exclusion regions are
modified to be r/w unity-mapped regions when detecting multiple
exclusion regions) .

But, I'm guessing you're talking about that BIOS has to define r/w
unity-mapped regions instead of the per-device exclusions (Fix from
BIOS, not prevent the failure from kernel). Right?

-- Adrian
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs
  2019-11-12  9:32   ` Huang Adrian
@ 2019-11-12 16:05     ` Joerg Roedel
  2019-11-13  1:05       ` Huang Adrian
  0 siblings, 1 reply; 5+ messages in thread
From: Joerg Roedel @ 2019-11-12 16:05 UTC (permalink / raw)
  To: Huang Adrian; +Cc: iommu, Adrian Huang

On Tue, Nov 12, 2019 at 05:32:31PM +0800, Huang Adrian wrote:
> > On Mon, Nov 11, 2019 at 11:22 PM Joerg Roedel <joro@8bytes.org> wrote:

> > So there are a couple of options here:
> >
> >         1) Bail out and disable the IOMMU as the BIOS screwed up
> >
> >         2) Treat per-device exclusion ranges just as r/w unity-mapped
> >            regions.
> >
> >
> > I think option 2) is the best fix here.
> 
> Yes. This is what this patch does (The first exclusion region still
> uses the exclusion range while the remaining exclusion regions are
> modified to be r/w unity-mapped regions when detecting multiple
> exclusion regions) .

Yeah, but I think it is better to just stop using the exclusion-range
feature of the hardware (because it meand BIOSes are correct) and just
treat every exclusion range (also the first one) as an r/w unity range.

> But, I'm guessing you're talking about that BIOS has to define r/w
> unity-mapped regions instead of the per-device exclusions (Fix from
> BIOS, not prevent the failure from kernel). Right?

That would be best, but I fear this is too much to wish for.

Regards,

	Joerg
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs
  2019-11-12 16:05     ` Joerg Roedel
@ 2019-11-13  1:05       ` Huang Adrian
  0 siblings, 0 replies; 5+ messages in thread
From: Huang Adrian @ 2019-11-13  1:05 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu, Adrian Huang

On Wed, Nov 13, 2019 at 12:05 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> On Tue, Nov 12, 2019 at 05:32:31PM +0800, Huang Adrian wrote:
> > > On Mon, Nov 11, 2019 at 11:22 PM Joerg Roedel <joro@8bytes.org> wrote:
>
> > > So there are a couple of options here:
> > >
> > >         1) Bail out and disable the IOMMU as the BIOS screwed up
> > >
> > >         2) Treat per-device exclusion ranges just as r/w unity-mapped
> > >            regions.
> > >
> > >
> > > I think option 2) is the best fix here.
> >
> > Yes. This is what this patch does (The first exclusion region still
> > uses the exclusion range while the remaining exclusion regions are
> > modified to be r/w unity-mapped regions when detecting multiple
> > exclusion regions) .
>
> Yeah, but I think it is better to just stop using the exclusion-range
> feature of the hardware (because it meand BIOSes are correct) and just
> treat every exclusion range (also the first one) as an r/w unity range.

Okay, I see. If you're okay with this (treat per-device exclusion
ranges as r/w unity-mapped regions - including the first one), I can
prepare the patch.

>
> > But, I'm guessing you're talking about that BIOS has to define r/w
> > unity-mapped regions instead of the per-device exclusions (Fix from
> > BIOS, not prevent the failure from kernel). Right?
>
> That would be best, but I fear this is too much to wish for.

Totally agree. That's why I submit this patch. :-)

-- Adrian
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-11-13  1:05 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-04  5:58 [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs Adrian Huang
2019-11-11 15:22 ` Joerg Roedel
2019-11-12  9:32   ` Huang Adrian
2019-11-12 16:05     ` Joerg Roedel
2019-11-13  1:05       ` Huang Adrian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).