linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] iommu/amd: Fix iommu remap panic while amd_iommu is set to disable
@ 2021-03-11 14:28 Huang Rui
  2021-03-16 13:16 ` Joerg Roedel
  0 siblings, 1 reply; 5+ messages in thread
From: Huang Rui @ 2021-03-11 14:28 UTC (permalink / raw)
  To: iommu, linux-kernel
  Cc: Joerg Roedel, Suravee Suthikulpanit, Alex Deucher, Xiaojian Du,
	Huang Rui, Joerg Roedel, stable

While amd_iommu is set to disable in cmd line, it will free the iommu
resources. Then the pages of rlookup table is freed as well. If that, we
have to check rlookup table in irq_remapping_select(), otherwise, it
will trigger a kernel panic below.

[    2.245855] BUG: kernel NULL pointer dereference, address: 0000000000000500
[    2.252861] #PF: supervisor read access in kernel mode
[    2.258053] #PF: error_code(0x0000) - not-present page
[    2.263247] PGD 0 P4D 0
[    2.265844] Oops: 0000 [#1] SMP NOPTI
[    2.269570] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-custom #1
[    2.276150] Hardware name: AMD Chachani-VN/Chachani-VN, BIOS VCH162755N.FD 03/04/2021
[    2.284085] RIP: 0010:irq_remapping_select+0x5c/0xb0
[    2.289107] Code: 4b 0c 48 3d 70 7c c8 8f 75 0d eb 35 48 8b 00 48 3d 70 7c c8 8f 74 2a 0f b6 50 10 39 d1 75 ed 0f b7 40 12 48 8b 15 f4 8a 3b 02 <48> 8b 14 c2 48 85 d2 74 0e b8 01 00 00 00 4c 3b a2 90 04 00 00 74
[    2.307999] RSP: 0000:ffffffff8f403d40 EFLAGS: 00010246
[    2.313285] RAX: 00000000000000a0 RBX: ffffffff8f403d98 RCX: 0000000000000021
[    2.320471] RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff8cbb4006118a
[    2.327658] RBP: ffffffff8f403d50 R08: 0000000000000021 R09: 0000000000000002
[    2.334838] R10: 000000000000000a R11: f000000000000000 R12: ffff8cbb401bfe40
[    2.342022] R13: 0000000000000000 R14: ffff8cbb401be900 R15: 0000000000000000
[    2.349210] FS:  0000000000000000(0000) GS:ffff8cbd73e00000(0000) knlGS:0000000000000000
[    2.357399] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.363198] CR2: 0000000000000500 CR3: 00000002dae0a000 CR4: 00000000000406b0
[    2.370382] Call Trace:
[    2.372897]  irq_find_matching_fwspec+0x48/0xd0
[    2.377489]  mp_irqdomain_create+0x7c/0x180
[    2.381736]  ? __raw_callee_save___native_queued_spin_unlock+0x15/0x23
[    2.388320]  setup_IO_APIC+0x81/0x875
[    2.392048]  ? clear_IO_APIC_pin+0xd6/0x130
[    2.396294]  ? clear_IO_APIC+0x39/0x60
[    2.400103]  apic_intr_mode_init+0x107/0x10a
[    2.404432]  x86_late_time_init+0x24/0x35
[    2.408507]  start_kernel+0x509/0x5c7
[    2.412230]  x86_64_start_reservations+0x24/0x26
[    2.416911]  x86_64_start_kernel+0x75/0x79
[    2.421068]  secondary_startup_64_no_verify+0xb0/0xbb

Fixes: a1a785b572425 ("iommu/amd: Implement select() method on remapping irqdomain")
Tested-by: Xiaojian Du <xiaojian.du@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
---
 drivers/iommu/amd/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index f0adbc48fd17..a08e885403b7 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3862,7 +3862,7 @@ static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec,
 	else if (x86_fwspec_is_hpet(fwspec))
 		devid = get_hpet_devid(fwspec->param[0]);
 
-	if (devid < 0)
+	if (devid < 0 || !amd_iommu_rlookup_table)
 		return 0;
 
 	iommu = amd_iommu_rlookup_table[devid];
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] iommu/amd: Fix iommu remap panic while amd_iommu is set to disable
  2021-03-11 14:28 [PATCH] iommu/amd: Fix iommu remap panic while amd_iommu is set to disable Huang Rui
@ 2021-03-16 13:16 ` Joerg Roedel
  2021-03-16 13:36   ` Huang Rui
  0 siblings, 1 reply; 5+ messages in thread
From: Joerg Roedel @ 2021-03-16 13:16 UTC (permalink / raw)
  To: Huang Rui
  Cc: iommu, linux-kernel, Joerg Roedel, Suravee Suthikulpanit,
	Alex Deucher, Xiaojian Du, stable

Hi Huang,

On Thu, Mar 11, 2021 at 10:28:07PM +0800, Huang Rui wrote:
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index f0adbc48fd17..a08e885403b7 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -3862,7 +3862,7 @@ static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec,
>  	else if (x86_fwspec_is_hpet(fwspec))
>  		devid = get_hpet_devid(fwspec->param[0]);
>  
> -	if (devid < 0)
> +	if (devid < 0 || !amd_iommu_rlookup_table)
>  		return 0;

The problem is deeper than this fix suggests. I prepared other fixes for
this particular problem. Please find them here:

	https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-fixes

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] iommu/amd: Fix iommu remap panic while amd_iommu is set to disable
  2021-03-16 13:16 ` Joerg Roedel
@ 2021-03-16 13:36   ` Huang Rui
  2021-03-16 15:00     ` Joerg Roedel
  0 siblings, 1 reply; 5+ messages in thread
From: Huang Rui @ 2021-03-16 13:36 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu, linux-kernel, Joerg Roedel, Suthikulpanit, Suravee,
	Deucher, Alexander, Du, Xiaojian, stable

On Tue, Mar 16, 2021 at 09:16:34PM +0800, Joerg Roedel wrote:
> Hi Huang,
> 
> On Thu, Mar 11, 2021 at 10:28:07PM +0800, Huang Rui wrote:
> > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> > index f0adbc48fd17..a08e885403b7 100644
> > --- a/drivers/iommu/amd/iommu.c
> > +++ b/drivers/iommu/amd/iommu.c
> > @@ -3862,7 +3862,7 @@ static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec,
> >  	else if (x86_fwspec_is_hpet(fwspec))
> >  		devid = get_hpet_devid(fwspec->param[0]);
> >  
> > -	if (devid < 0)
> > +	if (devid < 0 || !amd_iommu_rlookup_table)
> >  		return 0;
> 
> The problem is deeper than this fix suggests. I prepared other fixes for
> this particular problem. Please find them here:
> 

Thanks for the comments. Could you please elaborate this?

Do you mean while amd_iommu=off, we won't prepare the IVRS, and even
needn't get all ACPI talbes. Because they are never be used and the next
state will always goes into IOMMU_CMDLINE_DISABLED, am I right?

Thanks,
Ray

> 	https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Fjoro%2Flinux.git%2Flog%2F%3Fh%3Diommu-fixes&amp;data=04%7C01%7Cray.huang%40amd.com%7Cce731dda3b444ac9a14108d8e87dbb3e%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637514974013915073%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=NVahf1tIfgno%2BNWPu8hY4DygiGWdKXBJ8G6OsD%2BHC14%3D&amp;reserved=0
> 
> Regards,
> 
> 	Joerg

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] iommu/amd: Fix iommu remap panic while amd_iommu is set to disable
  2021-03-16 13:36   ` Huang Rui
@ 2021-03-16 15:00     ` Joerg Roedel
  2021-03-17 10:46       ` Huang Rui
  0 siblings, 1 reply; 5+ messages in thread
From: Joerg Roedel @ 2021-03-16 15:00 UTC (permalink / raw)
  To: Huang Rui
  Cc: iommu, linux-kernel, Joerg Roedel, Suthikulpanit, Suravee,
	Deucher, Alexander, Du, Xiaojian, stable

On Tue, Mar 16, 2021 at 09:36:02PM +0800, Huang Rui wrote:
> Thanks for the comments. Could you please elaborate this?
> 
> Do you mean while amd_iommu=off, we won't prepare the IVRS, and even
> needn't get all ACPI talbes. Because they are never be used and the next
> state will always goes into IOMMU_CMDLINE_DISABLED, am I right?

The first problem was that amd_iommu_irq_remap is never set back to
false when irq-remapping initialization fails in amd_iommu_prepare().

But there are other problems, like that even when the IOMMU is set to
disabled on the command line with amd_iommu=off, the code still sets up
all IOMMUs and registers IRQ domains for them.

Later the code checks wheter the IOMMU should stay disabled and tears
everything down, except for the IRQ domains, which stay in the global
list.

The APIs do not really support tearing down IRQ domains well, so its not
so easy to add this to the tear-down path. Now that the IRQ domains stay
in the list, the ACPI code will come along later and calls the
->select() call-back for every IRQ domain, which gets execution to
irq_remapping_select(), depite IOMMU being disabled and
amd_iommu_rlookup_table already de-allocated. But since
amd_iommu_irq_remap is still true the NULL pointer is dereferenced,
causing the crash.

When the IRQ domains would not be around, this would also not happen. So
my patches also change the initializtion to not do all the setup work
when amd_iommu=off was passed on the command line.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] iommu/amd: Fix iommu remap panic while amd_iommu is set to disable
  2021-03-16 15:00     ` Joerg Roedel
@ 2021-03-17 10:46       ` Huang Rui
  0 siblings, 0 replies; 5+ messages in thread
From: Huang Rui @ 2021-03-17 10:46 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu, linux-kernel, Joerg Roedel, Suthikulpanit, Suravee,
	Deucher, Alexander, Du, Xiaojian, stable

On Tue, Mar 16, 2021 at 11:00:26PM +0800, Joerg Roedel wrote:
> On Tue, Mar 16, 2021 at 09:36:02PM +0800, Huang Rui wrote:
> > Thanks for the comments. Could you please elaborate this?
> > 
> > Do you mean while amd_iommu=off, we won't prepare the IVRS, and even
> > needn't get all ACPI talbes. Because they are never be used and the next
> > state will always goes into IOMMU_CMDLINE_DISABLED, am I right?
> 
> The first problem was that amd_iommu_irq_remap is never set back to
> false when irq-remapping initialization fails in amd_iommu_prepare().
> 
> But there are other problems, like that even when the IOMMU is set to
> disabled on the command line with amd_iommu=off, the code still sets up
> all IOMMUs and registers IRQ domains for them.
> 

Yes, that was the problem I found in my side. No very clear about the
orignal intention. But if you said this is a problem, that makes sense...

> Later the code checks wheter the IOMMU should stay disabled and tears
> everything down, except for the IRQ domains, which stay in the global
> list.
> 
> The APIs do not really support tearing down IRQ domains well, so its not
> so easy to add this to the tear-down path. Now that the IRQ domains stay
> in the list, the ACPI code will come along later and calls the
> ->select() call-back for every IRQ domain, which gets execution to
> irq_remapping_select(), depite IOMMU being disabled and
> amd_iommu_rlookup_table already de-allocated. But since
> amd_iommu_irq_remap is still true the NULL pointer is dereferenced,
> causing the crash.

OK. We found some memleaks here before as well. It looks cause by iommu data
buffer initialization and not be cleared. And yes, if setup iommu here, it
actually causes many issues.

unreferenced object 0xffff888332047500 (size 224):
  comm "swapper/0", pid 0, jiffies 4294892296 (age 362.192s)
  hex dump (first 32 bytes):
    b0 22 03 00 00 00 00 00 00 00 00 40 00 00 00 00  .".........@....
    06 00 00 00 00 00 00 00 00 20 00 00 00 20 00 00  ......... ... ..
  backtrace:
    [<000000008f162024>] kmem_cache_alloc+0x145/0x440
    [<00000000420093ba>] kmem_cache_create_usercopy+0x127/0x2c0
    [<00000000f5ed7ff0>] kmem_cache_create+0x16/0x20
    [<000000004c1e4f47>] iommu_go_to_state+0x8e4/0x165d
    [<00000000a191b705>] amd_iommu_prepare+0x1a/0x2f
    [<00000000afe5b97e>] irq_remapping_prepare+0x35/0x6d
    [<00000000209f36b5>] enable_IR_x2apic+0x2b/0x1ae
    [<00000000b64491b5>] default_setup_apic_routing+0x16/0x65
    [<00000000e89c64a1>] apic_intr_mode_init+0x81/0x10a
    [<00000000eaa14bf6>] x86_late_time_init+0x24/0x35
    [<00000000e291fc71>] start_kernel+0x509/0x5c7
    [<0000000099930dfe>] x86_64_start_reservations+0x24/0x26
    [<00000000b369765d>] x86_64_start_kernel+0x75/0x79
    [<000000004f411919>] secondary_startup_64_no_verify+0xb0/0xbb

> 
> When the IRQ domains would not be around, this would also not happen. So
> my patches also change the initializtion to not do all the setup work
> when amd_iommu=off was passed on the command line.
> 

Thanks for the fix and explaination. :-)

Best Regards,
Ray

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-03-17 10:48 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-11 14:28 [PATCH] iommu/amd: Fix iommu remap panic while amd_iommu is set to disable Huang Rui
2021-03-16 13:16 ` Joerg Roedel
2021-03-16 13:36   ` Huang Rui
2021-03-16 15:00     ` Joerg Roedel
2021-03-17 10:46       ` Huang Rui

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).