* Kernel BUG with loongarch and CONFIG_KFENCE and CONFIG_DEBUG_SG
From: Guenter Roeck @ 2024-03-27 19:11 UTC
  To: loongarch
  Cc: Huacai Chen, WANG Xuerui, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, kasan-dev

Hi,

when enabling both CONFIG_KFENCE and CONFIG_DEBUG_SG, I get the following
backtraces when running loongarch images in qemu.

[    2.496257] kernel BUG at include/linux/scatterlist.h:187!
...
[    2.501925] Call Trace:
[    2.501950] [<9000000004ad59c4>] sg_init_one+0xac/0xc0
[    2.502204] [<9000000004a438f8>] do_test_kpp+0x278/0x6e4
[    2.502353] [<9000000004a43dd4>] alg_test_kpp+0x70/0xf4
[    2.502494] [<9000000004a41b48>] alg_test+0x128/0x690
[    2.502631] [<9000000004a3d898>] cryptomgr_test+0x20/0x40
[    2.502775] [<90000000041b4508>] kthread+0x138/0x158
[    2.502912] [<9000000004161c48>] ret_from_kernel_thread+0xc/0xa4

The backtrace is always similar but not exactly the same. It is always
triggered from cryptomgr_test, but not always from the same test.

Analysis shows that with CONFIG_KFENCE active, the address returned from
kmalloc() and friends is not always below vm_map_base. It is allocated by
kfence_alloc() which at least sometimes seems to get its memory from an
address space above vm_map_base. This causes virt_addr_valid() to return
false for the affected objects.

I have only seen this if CONFIG_DEBUG_SG is enabled because sg_set_buf()
otherwise does not call virt_addr_valid(), but I found that many memory
allocation calls return addresses above vm_map_base, making this a
potential problem when running loongarch images with CONFIG_KFENCE enabled
whenever some code calls virt_addr_valid().
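
To make the failure mode concrete, here is a small user-space model of the range check and of the CONFIG_DEBUG_SG assertion. It is only a sketch: the MODEL_* constants and model_* function names are hypothetical, the address values are made up for illustration, and any further checks the real __virt_addr_valid() performs after the range test are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout constants, chosen only for illustration. */
#define MODEL_PAGE_OFFSET UINT64_C(0x9000000000000000)
#define MODEL_VM_MAP_BASE UINT64_C(0xffff800000000000)

/* Models the range test: valid only inside [PAGE_OFFSET, vm_map_base). */
static bool model_virt_addr_valid(uint64_t vaddr)
{
	/* The model assumes the direct map sits below vm_map_base. */
	assert(MODEL_PAGE_OFFSET < MODEL_VM_MAP_BASE);

	if (vaddr < MODEL_PAGE_OFFSET || vaddr >= MODEL_VM_MAP_BASE)
		return false;
	return true;
}

/*
 * Models the debug-only check in sg_set_buf(): the validity test is only
 * evaluated when the debug option is on, so the problem stays hidden
 * otherwise. Returns true when the BUG would fire.
 */
static bool model_sg_set_buf_would_bug(uint64_t buf, bool debug_sg)
{
	return debug_sg && !model_virt_addr_valid(buf);
}
```

With these assumed values, a buffer at or above MODEL_VM_MAP_BASE trips the check only when debug_sg is true, which matches what I see with and without CONFIG_DEBUG_SG.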

I don't know how to solve the problem, but I did notice that virt_to_page()
does handle situations with addr >= vm_map_base. Maybe a similar solution
would be possible for virt_addr_valid().

Thanks,
Guenter


* Re: Kernel BUG with loongarch and CONFIG_KFENCE and CONFIG_DEBUG_SG
From: Xi Ruoyao @ 2024-03-27 19:33 UTC
  To: Guenter Roeck, loongarch
  Cc: Huacai Chen, WANG Xuerui, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, kasan-dev

On Wed, 2024-03-27 at 12:11 -0700, Guenter Roeck wrote:
> Hi,
> 
> when enabling both CONFIG_KFENCE and CONFIG_DEBUG_SG, I get the following
> backtraces when running loongarch images in qemu.
> 
> [    2.496257] kernel BUG at include/linux/scatterlist.h:187!
> ...
> [    2.501925] Call Trace:
> [    2.501950] [<9000000004ad59c4>] sg_init_one+0xac/0xc0
> [    2.502204] [<9000000004a438f8>] do_test_kpp+0x278/0x6e4
> [    2.502353] [<9000000004a43dd4>] alg_test_kpp+0x70/0xf4
> [    2.502494] [<9000000004a41b48>] alg_test+0x128/0x690
> [    2.502631] [<9000000004a3d898>] cryptomgr_test+0x20/0x40
> [    2.502775] [<90000000041b4508>] kthread+0x138/0x158
> [    2.502912] [<9000000004161c48>] ret_from_kernel_thread+0xc/0xa4
> 
> The backtrace is always similar but not exactly the same. It is always
> triggered from cryptomgr_test, but not always from the same test.
> 
> Analysis shows that with CONFIG_KFENCE active, the address returned from
> kmalloc() and friends is not always below vm_map_base. It is allocated by
> kfence_alloc() which at least sometimes seems to get its memory from an
> address space above vm_map_base. This causes virt_addr_valid() to return
> false for the affected objects.

Oops, Xuerui has been haunted by some "random" kernel crashes only
occurring with CONFIG_KFENCE=y for months but we weren't able to triage
the issue:

https://github.com/loongson-community/discussions/issues/34

It may or may not be the same issue.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* Re: Kernel BUG with loongarch and CONFIG_KFENCE and CONFIG_DEBUG_SG
From: Guenter Roeck @ 2024-03-27 23:38 UTC
  To: Xi Ruoyao
  Cc: loongarch, Huacai Chen, WANG Xuerui, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev

On Thu, Mar 28, 2024 at 03:33:03AM +0800, Xi Ruoyao wrote:
> Oops, Xuerui has been haunted by some "random" kernel crashes only
> occurring with CONFIG_KFENCE=y for months but we weren't able to triage
> the issue:
> 
> https://github.com/loongson-community/discussions/issues/34
> 
> Maybe the same issue or not.
> 

Good question. I suspect it might at least be related.

Maybe people can try the patch below. It seems to fix the problem for me.
It might well be, though, that there are other instances in the code
where the same or a similar check is needed.

Thanks,
Guenter

---
diff --git a/arch/loongarch/mm/mmap.c b/arch/loongarch/mm/mmap.c
index a9630a81b38a..89af7c12e8c0 100644
--- a/arch/loongarch/mm/mmap.c
+++ b/arch/loongarch/mm/mmap.c
@@ -4,6 +4,7 @@
  */
 #include <linux/export.h>
 #include <linux/io.h>
+#include <linux/kfence.h>
 #include <linux/memblock.h>
 #include <linux/mm.h>
 #include <linux/mman.h>
@@ -111,6 +112,9 @@ int __virt_addr_valid(volatile void *kaddr)
 {
 	unsigned long vaddr = (unsigned long)kaddr;
 
+	if (is_kfence_address((void *)kaddr))
+		return 1;
+
 	if ((vaddr < PAGE_OFFSET) || (vaddr >= vm_map_base))
 		return 0;
 

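
For anyone who wants to reason about the ordering of the checks outside the kernel, the following is a user-space model of the patched logic. It is a sketch under stated assumptions: the MODEL_* constants and model_* names are hypothetical, the pool location and size are made up, and the single-comparison range test is only an approximation of how is_kfence_address() decides whether an address lies inside the KFENCE pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout constants, chosen only for illustration. */
#define MODEL_PAGE_OFFSET UINT64_C(0x9000000000000000)
#define MODEL_VM_MAP_BASE UINT64_C(0xffff800000000000)
#define MODEL_POOL_BASE   UINT64_C(0xffff800000200000)
#define MODEL_POOL_SIZE   UINT64_C(0x200000)

/* Stand-in for is_kfence_address(): true if vaddr lies inside the pool. */
static bool model_is_kfence_address(uint64_t vaddr)
{
	/* Unsigned subtraction wraps for vaddr < base, so one compare covers both bounds. */
	return vaddr - MODEL_POOL_BASE < MODEL_POOL_SIZE;
}

/* Models the patched check: accept pool addresses before the range test. */
static bool model_virt_addr_valid_patched(uint64_t vaddr)
{
	/* The scenario being modelled: the pool lives at or above vm_map_base. */
	assert(MODEL_POOL_BASE >= MODEL_VM_MAP_BASE);

	if (model_is_kfence_address(vaddr))
		return true;
	if (vaddr < MODEL_PAGE_OFFSET || vaddr >= MODEL_VM_MAP_BASE)
		return false;
	return true;
}
```

The pool test has to come first, since in this model every pool address would otherwise be rejected by the vm_map_base comparison. Addresses above vm_map_base that are outside the pool are still treated as invalid.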

* Re: Kernel BUG with loongarch and CONFIG_KFENCE and CONFIG_DEBUG_SG
From: Huacai Chen @ 2024-03-29  2:17 UTC
  To: Guenter Roeck
  Cc: Xi Ruoyao, loongarch, WANG Xuerui, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev

Hi, Guenter,

Thank you for your report. We found several kfence-related problems
and have solved some of them.
Link: https://github.com/chenhuacai/linux/commits/loongarch-next

Huacai


* Re: Kernel BUG with loongarch and CONFIG_KFENCE and CONFIG_DEBUG_SG
From: Guenter Roeck @ 2024-03-29 16:32 UTC
  To: Huacai Chen
  Cc: Xi Ruoyao, loongarch, WANG Xuerui, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev

On 3/28/24 19:17, Huacai Chen wrote:
> Hi, Guenter,
> 
> Thank you for your report, we find there are several kfence-related
> problems, and we have solved part of them.
> Link: https://github.com/chenhuacai/linux/commits/loongarch-next
> 

Thanks a lot for the update.

A note regarding the patches in that tree, not related to the kfence
problem: I don't immediately see why the hwmon driver should reside
outside drivers/hwmon/, and hwmon_device_register_with_groups() is
deprecated and should not be used in new drivers.
On top of that, shutting off the system in case of thermal issues
is not the responsibility of a hardware monitoring driver.
That functionality should be handled by the thermal subsystem.

Thanks,
Guenter


