* Kernel BUG with loongarch and CONFIG_KFENCE and CONFIG_DEBUG_SG
From: Guenter Roeck @ 2024-03-27 19:11 UTC
  To: loongarch
  Cc: Huacai Chen, WANG Xuerui, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev

Hi,

when enabling both CONFIG_KFENCE and CONFIG_DEBUG_SG, I get the following
backtraces when running loongarch images in qemu.

[ 2.496257] kernel BUG at include/linux/scatterlist.h:187!
...
[ 2.501925] Call Trace:
[ 2.501950] [<9000000004ad59c4>] sg_init_one+0xac/0xc0
[ 2.502204] [<9000000004a438f8>] do_test_kpp+0x278/0x6e4
[ 2.502353] [<9000000004a43dd4>] alg_test_kpp+0x70/0xf4
[ 2.502494] [<9000000004a41b48>] alg_test+0x128/0x690
[ 2.502631] [<9000000004a3d898>] cryptomgr_test+0x20/0x40
[ 2.502775] [<90000000041b4508>] kthread+0x138/0x158
[ 2.502912] [<9000000004161c48>] ret_from_kernel_thread+0xc/0xa4

The backtrace is always similar but not exactly the same. It is always
triggered from cryptomgr_test, but not always from the same test.

Analysis shows that with CONFIG_KFENCE active, the address returned from
kmalloc() and friends is not always below vm_map_base. It is allocated by
kfence_alloc() which at least sometimes seems to get its memory from an
address space above vm_map_base. This causes virt_addr_valid() to return
false for the affected objects.

I have only seen this if CONFIG_DEBUG_SG is enabled because sg_set_buf()
otherwise does not call virt_addr_valid(), but I found that many memory
allocation calls return addresses above vm_map_base, making this a
potential problem when running loongarch images with CONFIG_KFENCE enabled
whenever some code calls virt_addr_valid().

I don't know how to solve the problem, but I did notice that virt_to_page()
does handle situations with addr >= vm_map_base. Maybe a similar solution
would be possible for virt_addr_valid().

Thanks,
Guenter

* Re: Kernel BUG with loongarch and CONFIG_KFENCE and CONFIG_DEBUG_SG
From: Xi Ruoyao @ 2024-03-27 19:33 UTC
  To: Guenter Roeck, loongarch
  Cc: Huacai Chen, WANG Xuerui, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev

On Wed, 2024-03-27 at 12:11 -0700, Guenter Roeck wrote:
> Hi,
>
> when enabling both CONFIG_KFENCE and CONFIG_DEBUG_SG, I get the following
> backtraces when running loongarch images in qemu.
>
> [ 2.496257] kernel BUG at include/linux/scatterlist.h:187!
> ...
> [ 2.501925] Call Trace:
> [ 2.501950] [<9000000004ad59c4>] sg_init_one+0xac/0xc0
> [ 2.502204] [<9000000004a438f8>] do_test_kpp+0x278/0x6e4
> [ 2.502353] [<9000000004a43dd4>] alg_test_kpp+0x70/0xf4
> [ 2.502494] [<9000000004a41b48>] alg_test+0x128/0x690
> [ 2.502631] [<9000000004a3d898>] cryptomgr_test+0x20/0x40
> [ 2.502775] [<90000000041b4508>] kthread+0x138/0x158
> [ 2.502912] [<9000000004161c48>] ret_from_kernel_thread+0xc/0xa4
>
> The backtrace is always similar but not exactly the same. It is always
> triggered from cryptomgr_test, but not always from the same test.
>
> Analysis shows that with CONFIG_KFENCE active, the address returned from
> kmalloc() and friends is not always below vm_map_base. It is allocated by
> kfence_alloc() which at least sometimes seems to get its memory from an
> address space above vm_map_base. This causes virt_addr_valid() to return
> false for the affected objects.

Oops, Xuerui has been haunted by some "random" kernel crashes only
occurring with CONFIG_KFENCE=y for months but we weren't able to triage
the issue:

https://github.com/loongson-community/discussions/issues/34

Maybe the same issue or not.

> I have only seen this if CONFIG_DEBUG_SG is enabled because sg_set_buf()
> otherwise does not call virt_addr_valid(), but I found that many memory
> allocation calls return addresses above vm_map_base, making this a
> potential problem when running loongarch images with CONFIG_KFENCE enabled
> whenever some code calls virt_addr_valid().
>
> I don't know how to solve the problem, but I did notice that virt_to_page()
> does handle situations with addr >= vm_map_base. Maybe a similar solution
> would be possible for virt_addr_valid().
>
> Thanks,
> Guenter
>

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

* Re: Kernel BUG with loongarch and CONFIG_KFENCE and CONFIG_DEBUG_SG
From: Guenter Roeck @ 2024-03-27 23:38 UTC
  To: Xi Ruoyao
  Cc: loongarch, Huacai Chen, WANG Xuerui, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev

On Thu, Mar 28, 2024 at 03:33:03AM +0800, Xi Ruoyao wrote:
> On Wed, 2024-03-27 at 12:11 -0700, Guenter Roeck wrote:
> > Hi,
> >
> > when enabling both CONFIG_KFENCE and CONFIG_DEBUG_SG, I get the following
> > backtraces when running loongarch images in qemu.
> >
> > [ 2.496257] kernel BUG at include/linux/scatterlist.h:187!
> > ...
> > [ 2.501925] Call Trace:
> > [ 2.501950] [<9000000004ad59c4>] sg_init_one+0xac/0xc0
> > [ 2.502204] [<9000000004a438f8>] do_test_kpp+0x278/0x6e4
> > [ 2.502353] [<9000000004a43dd4>] alg_test_kpp+0x70/0xf4
> > [ 2.502494] [<9000000004a41b48>] alg_test+0x128/0x690
> > [ 2.502631] [<9000000004a3d898>] cryptomgr_test+0x20/0x40
> > [ 2.502775] [<90000000041b4508>] kthread+0x138/0x158
> > [ 2.502912] [<9000000004161c48>] ret_from_kernel_thread+0xc/0xa4
> >
> > The backtrace is always similar but not exactly the same. It is always
> > triggered from cryptomgr_test, but not always from the same test.
> >
> > Analysis shows that with CONFIG_KFENCE active, the address returned from
> > kmalloc() and friends is not always below vm_map_base. It is allocated by
> > kfence_alloc() which at least sometimes seems to get its memory from an
> > address space above vm_map_base. This causes virt_addr_valid() to return
> > false for the affected objects.
>
> Oops, Xuerui has been haunted by some "random" kernel crashes only
> occurring with CONFIG_KFENCE=y for months but we weren't able to triage
> the issue:
>
> https://github.com/loongson-community/discussions/issues/34
>
> Maybe the same issue or not.
>

Good question. I suspect it might at least be related.

Maybe people can try the patch below. It seems to fix the problem for me.
It might well be, though, that there are other instances in the code
where the same or a similar check is needed.

Thanks,
Guenter

---
diff --git a/arch/loongarch/mm/mmap.c b/arch/loongarch/mm/mmap.c
index a9630a81b38a..89af7c12e8c0 100644
--- a/arch/loongarch/mm/mmap.c
+++ b/arch/loongarch/mm/mmap.c
@@ -4,6 +4,7 @@
  */
 #include <linux/export.h>
 #include <linux/io.h>
+#include <linux/kfence.h>
 #include <linux/memblock.h>
 #include <linux/mm.h>
 #include <linux/mman.h>
@@ -111,6 +112,9 @@ int __virt_addr_valid(volatile void *kaddr)
 {
 	unsigned long vaddr = (unsigned long)kaddr;
 
+	if (is_kfence_address((void *)kaddr))
+		return 1;
+
 	if ((vaddr < PAGE_OFFSET) || (vaddr >= vm_map_base))
 		return 0;
 

* Re: Kernel BUG with loongarch and CONFIG_KFENCE and CONFIG_DEBUG_SG
From: Huacai Chen @ 2024-03-29  2:17 UTC
  To: Guenter Roeck
  Cc: Xi Ruoyao, loongarch, WANG Xuerui, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev

Hi, Guenter,

Thank you for your report, we find there are several kfence-related
problems, and we have solved part of them.
Link: https://github.com/chenhuacai/linux/commits/loongarch-next

Huacai

On Thu, Mar 28, 2024 at 7:39 AM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On Thu, Mar 28, 2024 at 03:33:03AM +0800, Xi Ruoyao wrote:
> > On Wed, 2024-03-27 at 12:11 -0700, Guenter Roeck wrote:
> > > Hi,
> > >
> > > when enabling both CONFIG_KFENCE and CONFIG_DEBUG_SG, I get the following
> > > backtraces when running loongarch images in qemu.
> > >
> > > [ 2.496257] kernel BUG at include/linux/scatterlist.h:187!
> > > ...
> > > [ 2.501925] Call Trace:
> > > [ 2.501950] [<9000000004ad59c4>] sg_init_one+0xac/0xc0
> > > [ 2.502204] [<9000000004a438f8>] do_test_kpp+0x278/0x6e4
> > > [ 2.502353] [<9000000004a43dd4>] alg_test_kpp+0x70/0xf4
> > > [ 2.502494] [<9000000004a41b48>] alg_test+0x128/0x690
> > > [ 2.502631] [<9000000004a3d898>] cryptomgr_test+0x20/0x40
> > > [ 2.502775] [<90000000041b4508>] kthread+0x138/0x158
> > > [ 2.502912] [<9000000004161c48>] ret_from_kernel_thread+0xc/0xa4
> > >
> > > The backtrace is always similar but not exactly the same. It is always
> > > triggered from cryptomgr_test, but not always from the same test.
> > >
> > > Analysis shows that with CONFIG_KFENCE active, the address returned from
> > > kmalloc() and friends is not always below vm_map_base. It is allocated by
> > > kfence_alloc() which at least sometimes seems to get its memory from an
> > > address space above vm_map_base. This causes virt_addr_valid() to return
> > > false for the affected objects.
> >
> > Oops, Xuerui has been haunted by some "random" kernel crashes only
> > occurring with CONFIG_KFENCE=y for months but we weren't able to triage
> > the issue:
> >
> > https://github.com/loongson-community/discussions/issues/34
> >
> > Maybe the same issue or not.
> >
>
> Good question. I suspect it might at least be related.
>
> Maybe people can try the patch below. It seems to fix the problem for me.
> It might well be, though, that there are other instances in the code
> where the same or a similar check is needed.
>
> Thanks,
> Guenter
>
> ---
> diff --git a/arch/loongarch/mm/mmap.c b/arch/loongarch/mm/mmap.c
> index a9630a81b38a..89af7c12e8c0 100644
> --- a/arch/loongarch/mm/mmap.c
> +++ b/arch/loongarch/mm/mmap.c
> @@ -4,6 +4,7 @@
>   */
>  #include <linux/export.h>
>  #include <linux/io.h>
> +#include <linux/kfence.h>
>  #include <linux/memblock.h>
>  #include <linux/mm.h>
>  #include <linux/mman.h>
> @@ -111,6 +112,9 @@ int __virt_addr_valid(volatile void *kaddr)
>  {
>  	unsigned long vaddr = (unsigned long)kaddr;
>  
> +	if (is_kfence_address((void *)kaddr))
> +		return 1;
> +
>  	if ((vaddr < PAGE_OFFSET) || (vaddr >= vm_map_base))
>  		return 0;
>  

* Re: Kernel BUG with loongarch and CONFIG_KFENCE and CONFIG_DEBUG_SG
From: Guenter Roeck @ 2024-03-29 16:32 UTC
  To: Huacai Chen
  Cc: Xi Ruoyao, loongarch, WANG Xuerui, Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev

On 3/28/24 19:17, Huacai Chen wrote:
> Hi, Guenter,
>
> Thank you for your report, we find there are several kfence-related
> problems, and we have solved part of them.
> Link: https://github.com/chenhuacai/linux/commits/loongarch-next
>

Thanks a lot for the update.

A note regarding the patches in that tree, not related to the kfence
problem: I don't immediately see why the hwmon driver should reside
outside drivers/hwmon/, and hwmon_device_register_with_groups() is
deprecated and should not be used in new drivers. On top of that,
shutting off the system in case of thermal issues is not the
responsibility of a hardware monitoring driver. That functionality
should be handled by the thermal subsystem.

Thanks,
Guenter

Thread overview: 5+ messages
2024-03-27 19:11 Kernel BUG with loongarch and CONFIG_KFENCE and CONFIG_DEBUG_SG Guenter Roeck
2024-03-27 19:33 ` Xi Ruoyao
2024-03-27 23:38   ` Guenter Roeck
2024-03-29  2:17     ` Huacai Chen
2024-03-29 16:32       ` Guenter Roeck