* [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
@ 2021-05-12 20:29 akpm

From: akpm @ 2021-05-12 20:29 UTC
To: hdanton, mgorman, mhocko, mm-commits, npiggin, oleksiy.avramchenko,
	rostedt, sfr, urezki, willy


The patch titled
     Subject: mm/vmalloc: print a warning message first on failure
has been removed from the -mm tree.  Its filename was
     mm-vmalloc-print-a-warning-message-first-on-failure.patch

This patch was dropped because it had testing failures

------------------------------------------------------
From: Uladzislau Rezki <urezki@gmail.com>
Subject: mm/vmalloc: print a warning message first on failure

When a memory allocation for an array of pages does not succeed, emit a
warning message as a first step and then perform the further cleanup.

The reason it should be done in the right order is that the cleanup
function, free_vm_area(), can potentially also follow its error paths,
which can lead to confusion about what was broken first.

Link: https://lkml.kernel.org/r/20210510103342.GA2169@pc638.lan
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmalloc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/vmalloc.c~mm-vmalloc-print-a-warning-message-first-on-failure
+++ a/mm/vmalloc.c
@@ -2780,11 +2780,11 @@ static void *__vmalloc_area_node(struct
 	}
 
 	if (!area->pages) {
-		free_vm_area(area);
 		warn_alloc(gfp_mask, NULL,
 			"vmalloc size %lu allocation failure: "
 			"page array size %lu allocation failed",
 			nr_small_pages * PAGE_SIZE, array_size);
+		free_vm_area(area);
 		return NULL;
 	}
_

Patches currently in -mm which might be from urezki@gmail.com are
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-12 22:56 ` Stephen Rothwell

From: Stephen Rothwell @ 2021-05-12 22:56 UTC
To: akpm
Cc: hdanton, mgorman, mhocko, mm-commits, npiggin, oleksiy.avramchenko,
	rostedt, urezki, willy

Hi Andrew,

On Wed, 12 May 2021 13:29:52 -0700 akpm@linux-foundation.org wrote:
>
> The patch titled
>      Subject: mm/vmalloc: print a warning message first on failure
> has been removed from the -mm tree.  Its filename was
>      mm-vmalloc-print-a-warning-message-first-on-failure.patch
>
> This patch was dropped because it had testing failures

Removed from linux-next.

-- 
Cheers,
Stephen Rothwell
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-13 10:31 ` Uladzislau Rezki

From: Uladzislau Rezki @ 2021-05-13 10:31 UTC
To: Stephen Rothwell, akpm
Cc: akpm, hdanton, mgorman, mhocko, mm-commits, npiggin,
	oleksiy.avramchenko, rostedt, urezki, willy

On Thu, May 13, 2021 at 08:56:02AM +1000, Stephen Rothwell wrote:
> Hi Andrew,
>
> On Wed, 12 May 2021 13:29:52 -0700 akpm@linux-foundation.org wrote:
> >
> > The patch titled
> >      Subject: mm/vmalloc: print a warning message first on failure
> > has been removed from the -mm tree.  Its filename was
> >      mm-vmalloc-print-a-warning-message-first-on-failure.patch
> >
> > This patch was dropped because it had testing failures
>
> Removed from linux-next.
>
What kind of testing failures does it trigger? Where can I find the
details, logs or traces of it?

--
Vlad Rezki
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-13 11:11 ` Mel Gorman

From: Mel Gorman @ 2021-05-13 11:11 UTC
To: Uladzislau Rezki
Cc: Stephen Rothwell, akpm, hdanton, mhocko, mm-commits, npiggin,
	oleksiy.avramchenko, rostedt, willy

On Thu, May 13, 2021 at 12:31:56PM +0200, Uladzislau Rezki wrote:
> On Thu, May 13, 2021 at 08:56:02AM +1000, Stephen Rothwell wrote:
> > Hi Andrew,
> >
> > On Wed, 12 May 2021 13:29:52 -0700 akpm@linux-foundation.org wrote:
> > >
> > > The patch titled
> > >      Subject: mm/vmalloc: print a warning message first on failure
> > > has been removed from the -mm tree.  Its filename was
> > >      mm-vmalloc-print-a-warning-message-first-on-failure.patch
> > >
> > > This patch was dropped because it had testing failures
> >
> > Removed from linux-next.
> >
> What kind of testing failures does it trigger? Where can I find the
> details, logs or traces of it?

https://lore.kernel.org/linux-next/20210512175359.17793d34@canb.auug.org.au/

-- 
Mel Gorman
SUSE Labs
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-13 12:46 ` Uladzislau Rezki

From: Uladzislau Rezki @ 2021-05-13 12:46 UTC
To: Mel Gorman, Stephen Rothwell, akpm
Cc: Uladzislau Rezki, Stephen Rothwell, akpm, hdanton, mhocko,
	mm-commits, npiggin, oleksiy.avramchenko, rostedt, willy

On Thu, May 13, 2021 at 12:11:53PM +0100, Mel Gorman wrote:
> On Thu, May 13, 2021 at 12:31:56PM +0200, Uladzislau Rezki wrote:
> > On Thu, May 13, 2021 at 08:56:02AM +1000, Stephen Rothwell wrote:
> > > Hi Andrew,
> > >
> > > On Wed, 12 May 2021 13:29:52 -0700 akpm@linux-foundation.org wrote:
> > > >
> > > > The patch titled
> > > >      Subject: mm/vmalloc: print a warning message first on failure
> > > > has been removed from the -mm tree.  Its filename was
> > > >      mm-vmalloc-print-a-warning-message-first-on-failure.patch
> > > >
> > > > This patch was dropped because it had testing failures
> > >
> > > Removed from linux-next.
> > >
> > What kind of testing failures does it trigger? Where can I find the
> > details, logs or traces of it?
>
> https://lore.kernel.org/linux-next/20210512175359.17793d34@canb.auug.org.au/
>
Thanks, Mel.

OK. Now I see. The problem is with this patch:

mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()

<snip>
[ 0.097819][ T1] BUG: Unable to handle kernel data access on read at 0x200000c0a
[ 0.098533][ T1] Faulting instruction address: 0xc0000000003f6fa0
[ 0.099044][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
[ 0.099182][ T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 0.099506][ T1] Modules linked in:
[ 0.099896][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc1-00142-g6053672bb612 #12
[ 0.100254][ T1] NIP: c0000000003f6fa0 LR: c0000000003f6f68 CTR: 0000000000000000
[ 0.100342][ T1] REGS: c0000000063a3480 TRAP: 0380 Not tainted (5.13.0-rc1-00142-g6053672bb612)
[ 0.100550][ T1] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 24402840 XER: 00000000
[ 0.100900][ T1] CFAR: c0000000003f6f7c IRQMASK: 0
[ 0.100900][ T1] GPR00: c0000000003f6f68 c0000000063a3720 c00000000146b100 0000000000000000
[ 0.100900][ T1] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000002
[ 0.100900][ T1] GPR08: c0000000015219e8 0000000000000000 0000000200000c02 c000000006030010
[ 0.100900][ T1] GPR12: 0000000000008000 c000000001640000 0000000000000001 c000000000262f84
[ 0.100900][ T1] GPR16: c00a000000000000 c008000000000000 0000000000000dc0 0000000000000008
[ 0.100900][ T1] GPR20: 0000000000000522 0000000000010000 0000000000000cc0 c008000000000000
[ 0.100900][ T1] GPR24: 0000000000000001 0000000000000000 0000000000002cc2 0000000000000000
[ 0.100900][ T1] GPR28: 0000000000000000 0000000000000000 0000000200000c02 0000000000002cc2
[ 0.101927][ T1] NIP [c0000000003f6fa0] __alloc_pages+0x140/0x3f0
[ 0.102733][ T1] LR [c0000000003f6f68] __alloc_pages+0x108/0x3f0
[ 0.103032][ T1] Call Trace:
[ 0.103165][ T1] [c0000000063a3720] [0000000000000900] 0x900 (unreliable)
[ 0.103616][ T1] [c0000000063a37b0] [c0000000003f7810] __alloc_pages_bulk+0x5c0/0x840
[ 0.103787][ T1] [c0000000063a3890] [c0000000003ecf74] __vmalloc_node_range+0x4c4/0x600
[ 0.103871][ T1] [c0000000063a39b0] [c00000000004f598] module_alloc+0x58/0x70
[ 0.103962][ T1] [c0000000063a3a20] [c000000000262f84] alloc_insn_page+0x24/0x40
[ 0.104046][ T1] [c0000000063a3a40] [c00000000026629c] __get_insn_slot+0x1dc/0x280
[ 0.104143][ T1] [c0000000063a3a80] [c00000000005770c] arch_prepare_kprobe+0x15c/0x1f0
[ 0.104290][ T1] [c0000000063a3b00] [c000000000267880] register_kprobe+0x6d0/0x850
[ 0.104392][ T1] [c0000000063a3b60] [c00000000108fe2c] arch_init_kprobes+0x28/0x3c
[ 0.104524][ T1] [c0000000063a3b80] [c0000000010addb0] init_kprobes+0x120/0x174
[ 0.104629][ T1] [c0000000063a3bf0] [c000000000012190] do_one_initcall+0x60/0x2c0
[ 0.104722][ T1] [c0000000063a3cc0] [c0000000010845a0] kernel_init_freeable+0x1bc/0x3a0
[ 0.104826][ T1] [c0000000063a3da0] [c000000000012764] kernel_init+0x2c/0x168
[ 0.104911][ T1] [c0000000063a3e10] [c00000000000d5ec] ret_from_kernel_thread+0x5c/0x70
[ 0.105178][ T1] Instruction dump:
[ 0.105516][ T1] 40920018 57e9efbe 2c090001 4082000c 63050080 78b80020 e8a10028 57e9a7fe
[ 0.105759][ T1] 7fcaf378 99210040 2c250000 408201f4 <813e0008> 7c09c840 418101e8 57e50528
[ 0.107188][ T1] ---[ end trace 9bd7c2fac4d061e2 ]---
[ 0.107319][ T1]
[ 1.108818][ T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
<snip>

So during the boot process, when the module is about to be loaded, the
vmalloc allocation fails in __alloc_pages_bulk().

Will try to reproduce. It would be good to get a kernel config.
I'd appreciate any thoughts about it.

Thanks!

--
Vlad Rezki
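For context, the order-0 path that the suspect patch adds to
__vmalloc_area_node() has roughly the following shape. This is a sketch
reconstructed from the hunks quoted later in this thread, not the
verbatim patch; the key detail is that 'node' arrives unmodified from
__vmalloc_node_range() and may still be NUMA_NO_NODE at this point.

	/*
	 * Sketch, reconstructed from hunks quoted below: the order-0
	 * case hands the whole page array to the bulk allocator in one
	 * call, while the high-order case still allocates page by page.
	 */
	if (!page_order) {
		/* 'node' may still be NUMA_NO_NODE (-1) here. */
		area->nr_pages = __alloc_pages_bulk(gfp_mask, node, NULL,
					nr_small_pages, NULL, area->pages);
	} else {
		for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
				area->nr_pages += 1U << page_order) {
			struct page *page;

			page = alloc_pages_node(node, gfp_mask, page_order);
			if (!page)
				break;	/* partial array; caller cleans up */

			/* ... split the high-order page into area->pages ... */
		}
	}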
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-13 13:24 ` Uladzislau Rezki

From: Uladzislau Rezki @ 2021-05-13 13:24 UTC
To: Mel Gorman
Cc: Stephen Rothwell, akpm, hdanton, mhocko, mm-commits, npiggin,
	oleksiy.avramchenko, rostedt, willy

On Thu, May 13, 2021 at 02:46:05PM +0200, Uladzislau Rezki wrote:
> On Thu, May 13, 2021 at 12:11:53PM +0100, Mel Gorman wrote:
> > On Thu, May 13, 2021 at 12:31:56PM +0200, Uladzislau Rezki wrote:
> > > On Thu, May 13, 2021 at 08:56:02AM +1000, Stephen Rothwell wrote:
> > > > Hi Andrew,
> > > >
> > > > On Wed, 12 May 2021 13:29:52 -0700 akpm@linux-foundation.org wrote:
> > > > >
> > > > > The patch titled
> > > > >      Subject: mm/vmalloc: print a warning message first on failure
> > > > > has been removed from the -mm tree.  Its filename was
> > > > >      mm-vmalloc-print-a-warning-message-first-on-failure.patch
> > > > >
> > > > > This patch was dropped because it had testing failures
> > > >
> > > > Removed from linux-next.
> > > >
> > > What kind of testing failures does it trigger? Where can I find the
> > > details, logs or traces of it?
> >
> > https://lore.kernel.org/linux-next/20210512175359.17793d34@canb.auug.org.au/
> >
> Thanks, Mel.
>
> OK. Now I see. The problem is with this patch:
>
> mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()
>
> <snip>
> [ ... PPC oops trace snipped; quoted in full in the previous message ... ]
> <snip>
>
> So during the boot process, when the module is about to be loaded, the
> vmalloc allocation fails in __alloc_pages_bulk().
>
> Will try to reproduce. It would be good to get a kernel config.
> I'd appreciate any thoughts about it.
>
I see that on the target machine, when the problem occurs, the PAGE_SIZE
is 64K. Can it somehow be connected to that?

Also one question, just guessing: the crash happens during boot, therefore
the question is: is __alloc_pages_bulk() fully initialized by that time?

Thanks!

--
Vlad Rezki
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-13 14:18 ` Mel Gorman

From: Mel Gorman @ 2021-05-13 14:18 UTC
To: Uladzislau Rezki
Cc: Stephen Rothwell, akpm, hdanton, mhocko, mm-commits, npiggin,
	oleksiy.avramchenko, rostedt, willy

On Thu, May 13, 2021 at 03:24:18PM +0200, Uladzislau Rezki wrote:
> > > > What kind of testing failures does it trigger? Where can I find the
> > > > details, logs or traces of it?
> > >
> > > https://lore.kernel.org/linux-next/20210512175359.17793d34@canb.auug.org.au/
> > >
> > Thanks, Mel.
> >
> > OK. Now I see. The problem is with this patch:
> >
> > mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()
> >
> > <SNIP>
> >
> > So during the boot process, when the module is about to be loaded, the
> > vmalloc allocation fails in __alloc_pages_bulk().
> >
> > Will try to reproduce. It would be good to get a kernel config.
> > I'd appreciate any thoughts about it.
> >
> I see that on the target machine, when the problem occurs, the PAGE_SIZE
> is 64K. Can it somehow be connected to that?
>
> Also one question, just guessing: the crash happens during boot, therefore
> the question is: is __alloc_pages_bulk() fully initialized by that time?
>
A boot test on KVM using my distribution config (openSUSE Leap 15.2)
failed on x86-64 so it's not related to PAGE_SIZE.

-- 
Mel Gorman
SUSE Labs
[parent not found: <CA+KHdyXwdkosDYk4bKtRLVodrwUJnq3NN39xuRQzKJSPTn7+bQ@mail.gmail.com>]
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-13 15:51 ` Mel Gorman

From: Mel Gorman @ 2021-05-13 15:51 UTC
To: Uladzislau Rezki
Cc: Stephen Rothwell, Andrew Morton, Hillf Danton, Michal Hocko,
	mm-commits, Nicholas Piggin, Oleksiy Avramchenko, Steven Rostedt,
	Matthew Wilcox

On Thu, May 13, 2021 at 05:29:05PM +0200, Uladzislau Rezki wrote:
> Could you please send your config? I will try to reproduce with it.
>
Attached.

-- 
Mel Gorman
SUSE Labs

[-- Attachment: config-bulkvmalloc.gz --]
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-13 20:18 ` Uladzislau Rezki

From: Uladzislau Rezki @ 2021-05-13 20:18 UTC
To: Mel Gorman
Cc: Uladzislau Rezki, Stephen Rothwell, Andrew Morton, Hillf Danton,
	Michal Hocko, mm-commits, Nicholas Piggin, Oleksiy Avramchenko,
	Steven Rostedt, Matthew Wilcox

On Thu, May 13, 2021 at 04:51:33PM +0100, Mel Gorman wrote:
> On Thu, May 13, 2021 at 05:29:05PM +0200, Uladzislau Rezki wrote:
> > Could you please send your config? I will try to reproduce with it.
> >
> Attached.
>
Thanks. With your .config file I am able to reproduce the kernel panic.

Actually, when one page is requested, __alloc_pages_bulk() enters the
single page allocator:

<snip>
	/* Use the single page allocator for one page. */
	if (nr_pages - nr_populated == 1)
		goto failed;
...
failed:
	page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
	if (page) {
		if (page_list)
			list_add(&page->lru, page_list);
		else
			page_array[nr_populated] = page;
		nr_populated++;
	}

	return nr_populated;
<snip>

From the trace I get:

<snip>
[ 0.243916] RIP: 0010:__alloc_pages+0x11e/0x310
[ 0.243916] Code: 84 c0 0f 85 02 01 00 00 89 d8 48 8b 54 24 08 8b 74 24 1c c1 e8 0c 83 e0 01 88 44 24 20 48 8b 04 24 48 85 d2 0f 85 71 01 00 00 <3b> 70 08 0f 82 68 01 00 00 48 89 44 24 10 48 8b 00 89 da 81 e2 00
[ 0.243916] RSP: 0000:ffffffffae803c38 EFLAGS: 00010246
[ 0.243916] RAX: 0000000000001cc0 RBX: 0000000000002102 RCX: 0000000000000004
[ 0.243916] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002102
[ 0.243916] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffffdfff
[ 0.243916] R10: 0000000000000001 R11: ffffffffae803ac0 R12: 0000000000000000
[ 0.243916] R13: 0000000000002102 R14: 0000000000000001 R15: ffffa0938000d000
[ 0.243916] FS: 0000000000000000(0000) GS:ffff893ab7c00000(0000) knlGS:0000000000000000
[ 0.243916] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.243916] CR2: 0000000000001cc8 CR3: 0000000176e10000 CR4: 00000000000006b0
[ 0.243916] Call Trace:
[ 0.243916] __alloc_pages_bulk+0xaa1/0xb50
<snip>

(gdb) l *__alloc_pages+0x11e
0xffffffff8129d87e is in __alloc_pages (./include/linux/mmzone.h:1095).
1090		return zoneref->zone;
1091	}
1092
1093	static inline int zonelist_zone_idx(struct zoneref *zoneref)
1094	{
1095		return zoneref->zone_idx;
1096	}
1097
1098	static inline int zonelist_node_idx(struct zoneref *zoneref)
1099	{
(gdb)

Seems like "zoneref" refers to an invalid address.

Thoughts?

--
Vlad Rezki
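One way to see how a preferred node id of NUMA_NO_NODE (-1) could
produce such a bogus zoneref is the following user-space model. It only
mirrors the shape of the kernel's NODE_DATA()/zonelist lookup; the
structs and sizes below are illustrative assumptions, not kernel code.

	#include <stdio.h>

	#define MAX_NUMNODES	4
	#define NUMA_NO_NODE	(-1)

	/* toy stand-ins for the kernel structures */
	struct zoneref { void *zone; int zone_idx; };
	struct zonelist { struct zoneref zonerefs[4]; };
	struct pglist_data { struct zonelist node_zonelists[1]; };

	static struct pglist_data node_data[MAX_NUMNODES];

	int main(void)
	{
		int nid = NUMA_NO_NODE;

		/* mirrors NODE_DATA(nid): an array index with no range check */
		struct pglist_data *pgdat = &node_data[nid];
		struct zoneref *z = pgdat->node_zonelists[0].zonerefs;

		/*
		 * The kernel would now read z->zone_idx, i.e.
		 * zonelist_zone_idx(); with nid == -1 the pointer is out
		 * of bounds, matching the fault in the gdb listing above.
		 */
		printf("zoneref computed from node %d: %p (out of bounds)\n",
		       nid, (void *)z);
		return 0;
	}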
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-14 10:19 ` Mel Gorman

From: Mel Gorman @ 2021-05-14 10:19 UTC
To: Uladzislau Rezki
Cc: Stephen Rothwell, Andrew Morton, Hillf Danton, Michal Hocko,
	mm-commits, Nicholas Piggin, Oleksiy Avramchenko, Steven Rostedt,
	Matthew Wilcox

On Thu, May 13, 2021 at 10:18:51PM +0200, Uladzislau Rezki wrote:
> <SNIP>
>
> From the trace I get:
>
> <snip>
> [ 0.243916] RIP: 0010:__alloc_pages+0x11e/0x310
> [ 0.243916] Code: 84 c0 0f 85 02 01 00 00 89 d8 48 8b 54 24 08 8b 74 24 1c c1 e8 0c 83 e0 01 88 44 24 20 48 8b 04 24 48 85 d2 0f 85 71 01 00 00 <3b> 70 08 0f 82 68 01 00 00 48 89 44 24 10 48 8b 00 89 da 81 e2 00
> [ 0.243916] RSP: 0000:ffffffffae803c38 EFLAGS: 00010246
> [ 0.243916] RAX: 0000000000001cc0 RBX: 0000000000002102 RCX: 0000000000000004
> [ 0.243916] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002102
> [ 0.243916] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffffdfff
> [ 0.243916] R10: 0000000000000001 R11: ffffffffae803ac0 R12: 0000000000000000
> [ 0.243916] R13: 0000000000002102 R14: 0000000000000001 R15: ffffa0938000d000
> [ 0.243916] FS: 0000000000000000(0000) GS:ffff893ab7c00000(0000) knlGS:0000000000000000
> [ 0.243916] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.243916] CR2: 0000000000001cc8 CR3: 0000000176e10000 CR4: 00000000000006b0
> [ 0.243916] Call Trace:
> [ 0.243916] __alloc_pages_bulk+0xaa1/0xb50
> <snip>
>
> (gdb) l *__alloc_pages+0x11e
> 0xffffffff8129d87e is in __alloc_pages (./include/linux/mmzone.h:1095).
> 1090		return zoneref->zone;
> 1091	}
> 1092
> 1093	static inline int zonelist_zone_idx(struct zoneref *zoneref)
> 1094	{
> 1095		return zoneref->zone_idx;
> 1096	}
> 1097
> 1098	static inline int zonelist_node_idx(struct zoneref *zoneref)
> 1099	{
> (gdb)
>
> Seems like "zoneref" refers to an invalid address.
>
> Thoughts?

I have not previously read the patch but there are a few concerns and it's
probably just as well this blew up early. The bulk allocator assumes a
valid node but the patch can send in NUMA_NO_NODE (-1).

On the high-order path alloc_pages_node is used which checks nid ==
NUMA_NO_NODE. Also, area->pages is not necessarily initialised so that
could be interpreted as a partially populated array, so minimally you
need:

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f3c5dd4ccc5b9..b638ff31b07e1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2792,6 +2792,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	page_order = vm_area_page_order(area);
 
 	if (!page_order) {
+		memset(area->pages, 0, array_size);
+		if (node == NUMA_NO_NODE)
+			node = numa_mem_id();
 		area->nr_pages = __alloc_pages_bulk(gfp_mask, node,
 			NULL, nr_small_pages, NULL, area->pages);
 	} else {

However, the high-order path also looks suspicious. area->nr_pages is
advanced before the allocation attempt so in the event alloc_pages_node()
returns NULL prematurely, area->nr_pages does not reflect the number of
pages allocated so that needs examination.

As an aside, where or what is test_vmalloc.sh? It appears to have been
used a few times but it's not clear it's representative so are you aware
of workloads that are vmalloc-intensive? It does not matter for the
patch as such but it would be nice to know examples of vmalloc-intensive
workloads because I cannot recall a time during the last few years where
I saw vmalloc.c high in profiles.

-- 
Mel Gorman
SUSE Labs
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-14 11:45 ` Uladzislau Rezki

From: Uladzislau Rezki @ 2021-05-14 11:45 UTC
To: Mel Gorman
Cc: Uladzislau Rezki, Stephen Rothwell, Andrew Morton, Hillf Danton,
	Michal Hocko, mm-commits, Nicholas Piggin, Oleksiy Avramchenko,
	Steven Rostedt, Matthew Wilcox

On Fri, May 14, 2021 at 11:19:20AM +0100, Mel Gorman wrote:
> On Thu, May 13, 2021 at 10:18:51PM +0200, Uladzislau Rezki wrote:
> > <SNIP>
> >
> > From the trace I get:
> >
> > [ ... RIP trace and gdb listing snipped; quoted in full above ... ]
> >
> > Seems like "zoneref" refers to an invalid address.
> >
> > Thoughts?
>
> I have not previously read the patch but there are a few concerns and it's
> probably just as well this blew up early. The bulk allocator assumes a
> valid node but the patch can send in NUMA_NO_NODE (-1).
>
Should the bulk allocator handle NUMA_NO_NODE on its own? I mean, instead
of the user handling it, the allocator itself fixes it up if NUMA_NO_NODE
is passed.

> On the high-order path alloc_pages_node is used which checks nid ==
> NUMA_NO_NODE. Also, area->pages is not necessarily initialised so that
> could be interpreted as a partially populated array, so minimally you
> need:
>
area->pages is zeroed, because __GFP_ZERO is used when allocating the
array.

> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index f3c5dd4ccc5b9..b638ff31b07e1 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2792,6 +2792,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> 	page_order = vm_area_page_order(area);
> 
> 	if (!page_order) {
> +		memset(area->pages, 0, array_size);

This memset is not needed, as explained above.

> +		if (node == NUMA_NO_NODE)
> +			node = numa_mem_id();

This patch works. Again, should it be handled by the allocator?

> 		area->nr_pages = __alloc_pages_bulk(gfp_mask, node,
> 			NULL, nr_small_pages, NULL, area->pages);
> 	} else {
>
> However, the high-order path also looks suspicious. area->nr_pages is
> advanced before the allocation attempt so in the event alloc_pages_node()
> returns NULL prematurely, area->nr_pages does not reflect the number of
> pages allocated so that needs examination.
>
<snip>
	for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
		area->nr_pages += 1U << page_order) {
<snip>

If alloc_pages_node() fails, we break the loop. area->nr_pages is
initialized inside the for(...) loop, thus it will be zero if the single
page allocator fails on the first iteration.

Or do I miss your point?

> As an aside, where or what is test_vmalloc.sh? It appears to have been
> used a few times but it's not clear it's representative so are you aware
> of workloads that are vmalloc-intensive? It does not matter for the
> patch as such but it would be nice to know examples of vmalloc-intensive
> workloads because I cannot recall a time during the last few years where
> I saw vmalloc.c high in profiles.
>
test_vmalloc.sh is a shell script that is used for stressing and testing
the vmalloc subsystem, as well as for performance evaluation. You can
find it here:

	./tools/testing/selftests/vm/test_vmalloc.sh

As for workloads: mostly those which are critical to time and latency,
for example audio/video, especially in the mobile area. I did a big
rework of the KVA allocator because I found its allocation time to be
suboptimal. To be more specific, high-resolution audio playback was
suffering from glitches due to long allocation times and preemption
issues.

--
Vlad Rezki
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-14 13:45 ` Mel Gorman

From: Mel Gorman @ 2021-05-14 13:45 UTC
To: Uladzislau Rezki
Cc: Stephen Rothwell, Andrew Morton, Hillf Danton, Michal Hocko,
	mm-commits, Nicholas Piggin, Oleksiy Avramchenko, Steven Rostedt,
	Matthew Wilcox

On Fri, May 14, 2021 at 01:45:43PM +0200, Uladzislau Rezki wrote:
> > > Seems like "zoneref" refers to an invalid address.
> > >
> > > Thoughts?
> >
> > I have not previously read the patch but there are a few concerns and it's
> > probably just as well this blew up early. The bulk allocator assumes a
> > valid node but the patch can send in NUMA_NO_NODE (-1).
> >
> Should the bulk allocator handle NUMA_NO_NODE on its own? I mean, instead
> of the user handling it, the allocator itself fixes it up if NUMA_NO_NODE
> is passed.
>

No, for API similarity reasons. __alloc_pages_bulk is the API bulk
equivalent to __alloc_pages() and both expect valid node IDs. vmalloc
is using alloc_pages_node for high-order pages which first checks
the node ID, so your options are to check it within vmalloc.c or add an
alloc_pages_node_bulk helper that is API equivalent to alloc_pages_node
as a prerequisite to your patch.

> > On the high-order path alloc_pages_node is used which checks nid ==
> > NUMA_NO_NODE. Also, area->pages is not necessarily initialised so that
> > could be interpreted as a partially populated array, so minimally you
> > need:
> >
> area->pages is zeroed, because __GFP_ZERO is used when allocating the
> array.
>

Ah, yes.

> > However, the high-order path also looks suspicious. area->nr_pages is
> > advanced before the allocation attempt so in the event alloc_pages_node()
> > returns NULL prematurely, area->nr_pages does not reflect the number of
> > pages allocated so that needs examination.
> >
> <snip>
> 	for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
> 		area->nr_pages += 1U << page_order) {
> <snip>
>
> If alloc_pages_node() fails, we break the loop. area->nr_pages is
> initialized inside the for(...) loop, thus it will be zero if the single
> page allocator fails on the first iteration.
>
> Or do I miss your point?
>

At the time of the break, area->nr_pages += 1U << page_order happens
before the allocation failure. That looks very suspicious.

> > As an aside, where or what is test_vmalloc.sh? It appears to have been
> > used a few times but it's not clear it's representative so are you aware
> > of workloads that are vmalloc-intensive? It does not matter for the
> > patch as such but it would be nice to know examples of vmalloc-intensive
> > workloads because I cannot recall a time during the last few years where
> > I saw vmalloc.c high in profiles.
> >
> test_vmalloc.sh is a shell script that is used for stressing and testing
> the vmalloc subsystem, as well as for performance evaluation. You can
> find it here:
>
> 	./tools/testing/selftests/vm/test_vmalloc.sh
>

Thanks.

> As for workloads: mostly those which are critical to time and latency,
> for example audio/video, especially in the mobile area. I did a big
> rework of the KVA allocator because I found its allocation time to be
> suboptimal. To be more specific, high-resolution audio playback was
> suffering from glitches due to long allocation times and preemption
> issues.
>

Can you give an example benchmark that triggers it or is it somewhat
specific to mobile platforms with drivers that use vmalloc heavily?

-- 
Mel Gorman
SUSE Labs
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-14 14:50 ` Uladzislau Rezki

From: Uladzislau Rezki @ 2021-05-14 14:50 UTC
To: Mel Gorman
Cc: Uladzislau Rezki, Stephen Rothwell, Andrew Morton, Hillf Danton,
	Michal Hocko, mm-commits, Nicholas Piggin, Oleksiy Avramchenko,
	Steven Rostedt, Matthew Wilcox

> On Fri, May 14, 2021 at 01:45:43PM +0200, Uladzislau Rezki wrote:
> > > > Seems like "zoneref" refers to an invalid address.
> > > >
> > > > Thoughts?
> > >
> > > I have not previously read the patch but there are a few concerns and it's
> > > probably just as well this blew up early. The bulk allocator assumes a
> > > valid node but the patch can send in NUMA_NO_NODE (-1).
> > >
> > Should the bulk allocator handle NUMA_NO_NODE on its own? I mean, instead
> > of the user handling it, the allocator itself fixes it up if NUMA_NO_NODE
> > is passed.
> >
> No, for API similarity reasons. __alloc_pages_bulk is the API bulk
> equivalent to __alloc_pages() and both expect valid node IDs. vmalloc
> is using alloc_pages_node for high-order pages which first checks
> the node ID, so your options are to check it within vmalloc.c or add an
> alloc_pages_node_bulk helper that is API equivalent to alloc_pages_node
> as a prerequisite to your patch.
>
OK. Thanks.

> > > On the high-order path alloc_pages_node is used which checks nid ==
> > > NUMA_NO_NODE. Also, area->pages is not necessarily initialised so that
> > > could be interpreted as a partially populated array, so minimally you
> > > need:
> > >
> > area->pages is zeroed, because __GFP_ZERO is used when allocating the
> > array.
> >
> Ah, yes.
>
> > > However, the high-order path also looks suspicious. area->nr_pages is
> > > advanced before the allocation attempt so in the event alloc_pages_node()
> > > returns NULL prematurely, area->nr_pages does not reflect the number of
> > > pages allocated so that needs examination.
> > >
> > <snip>
> > 	for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
> > 		area->nr_pages += 1U << page_order) {
> > <snip>
> >
> > If alloc_pages_node() fails, we break the loop. area->nr_pages is
> > initialized inside the for(...) loop, thus it will be zero if the single
> > page allocator fails on the first iteration.
> >
> > Or do I miss your point?
> >
> At the time of the break, area->nr_pages += 1U << page_order happens
> before the allocation failure. That looks very suspicious.
>
The "for" loop does not work that way. If you break the loop, the
"area->nr_pages += 1U << page_order" increment is not applied. It is
applied only after the body of the "for" loop executes and it goes to
the next iteration.

> > > As an aside, where or what is test_vmalloc.sh? It appears to have been
> > > used a few times but it's not clear it's representative so are you aware
> > > of workloads that are vmalloc-intensive? It does not matter for the
> > > patch as such but it would be nice to know examples of vmalloc-intensive
> > > workloads because I cannot recall a time during the last few years where
> > > I saw vmalloc.c high in profiles.
> > >
> > test_vmalloc.sh is a shell script that is used for stressing and testing
> > the vmalloc subsystem, as well as for performance evaluation. You can
> > find it here:
> >
> > 	./tools/testing/selftests/vm/test_vmalloc.sh
> >
> Thanks.
>
> > As for workloads: mostly those which are critical to time and latency,
> > for example audio/video, especially in the mobile area. I did a big
> > rework of the KVA allocator because I found its allocation time to be
> > suboptimal.
> >
> Can you give an example benchmark that triggers it or is it somewhat
> specific to mobile platforms with drivers that use vmalloc heavily?
>
See below an example of audio glitches. That was related to our phones
and audio workloads:

# Explanation is here
wget ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt

# Audio 10 seconds sample is here.
# The drop occurs at 00:09.295; you can hear it.
wget ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav

Apart from that, a slow allocation can cause two types of issues. The
first one is direct: when, for example, a high-priority RT thread does
some allocation to pass data to the DSP, a long latency causes a delay
in the data being passed to the DSP. This is the drivers area.

Another example is when a task is doing an allocation and an RT task is
placed onto the same CPU. In that case a long preemption-off
(milliseconds) section can lead to starvation of the RT task. For mobile
devices it is the UI stack where RT tasks are used. As a result we face
frame drops.

All such issues have been solved after a rework:

wget ftp://vps418301.ovh.net/incoming/Reworking_of_KVA_allocator_in_Linux_kernel.pdf

--
Vlad Rezki
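As a side note on the loop-semantics question discussed above, a trivial
stand-alone C program (nothing kernel-specific) shows that the update
expression of a for loop never runs for the iteration that breaks, so
the counter only reflects completed iterations:

	#include <stdio.h>

	int main(void)
	{
		unsigned int nr_pages;
		const unsigned int nr_small_pages = 8;

		for (nr_pages = 0; nr_pages < nr_small_pages; nr_pages += 1) {
			/* pretend the allocation fails on the fourth attempt */
			if (nr_pages == 3)
				break;
		}

		/* prints 3: the increment never ran for the failed iteration */
		printf("nr_pages = %u\n", nr_pages);
		return 0;
	}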
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-14 15:41 ` Mel Gorman

From: Mel Gorman @ 2021-05-14 15:41 UTC
To: Uladzislau Rezki
Cc: Stephen Rothwell, Andrew Morton, Hillf Danton, Michal Hocko,
	mm-commits, Nicholas Piggin, Oleksiy Avramchenko, Steven Rostedt,
	Matthew Wilcox

On Fri, May 14, 2021 at 04:50:26PM +0200, Uladzislau Rezki wrote:
> > > > However, the high-order path also looks suspicious. area->nr_pages is
> > > > advanced before the allocation attempt so in the event alloc_pages_node()
> > > > returns NULL prematurely, area->nr_pages does not reflect the number of
> > > > pages allocated so that needs examination.
> > > >
> > > <snip>
> > > 	for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
> > > 		area->nr_pages += 1U << page_order) {
> > > <snip>
> > >
> > > If alloc_pages_node() fails, we break the loop. area->nr_pages is
> > > initialized inside the for(...) loop, thus it will be zero if the single
> > > page allocator fails on the first iteration.
> > >
> > > Or do I miss your point?
> > >
> > At the time of the break, area->nr_pages += 1U << page_order happens
> > before the allocation failure. That looks very suspicious.
> >
> The "for" loop does not work that way. If you break the loop, the
> "area->nr_pages += 1U << page_order" increment is not applied. It is
> applied only after the body of the "for" loop executes and it goes to
> the next iteration.
>

Yeah, I don't know what I was thinking -- not properly anyway.

> > > As for workloads: mostly those which are critical to time and latency,
> > > for example audio/video, especially in the mobile area. I did a big
> > > rework of the KVA allocator because I found its allocation time to be
> > > suboptimal.
> > >
> > Can you give an example benchmark that triggers it or is it somewhat
> > specific to mobile platforms with drivers that use vmalloc heavily?
> >
> See below an example of audio glitches. That was related to our phones
> and audio workloads:
>
> # Explanation is here
> wget ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt
>
> # Audio 10 seconds sample is here.
> # The drop occurs at 00:09.295; you can hear it.
> wget ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav
>
> Apart from that, a slow allocation can cause two types of issues. The
> first one is direct: when, for example, a high-priority RT thread does
> some allocation to pass data to the DSP, a long latency causes a delay
> in the data being passed to the DSP. This is the drivers area.
>
> Another example is when a task is doing an allocation and an RT task is
> placed onto the same CPU. In that case a long preemption-off
> (milliseconds) section can lead to starvation of the RT task. For mobile
> devices it is the UI stack where RT tasks are used. As a result we face
> frame drops.
>
> All such issues have been solved after a rework:
>
> wget ftp://vps418301.ovh.net/incoming/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
>

Thanks. That was enough for me to search to see what sort of general
workload would be affected. Mostly it's driver-specific. A lot of the
users that would be potentially hot are already using kvmalloc, so it's
probably not worth the effort and test_vmalloc.sh makes sense.

-- 
Mel Gorman
SUSE Labs
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-14 17:16 ` Uladzislau Rezki

From: Uladzislau Rezki @ 2021-05-14 17:16 UTC
To: Mel Gorman
Cc: Uladzislau Rezki, Stephen Rothwell, Andrew Morton, Hillf Danton,
	Michal Hocko, mm-commits, Nicholas Piggin, Oleksiy Avramchenko,
	Steven Rostedt, Matthew Wilcox

On Fri, May 14, 2021 at 04:41:53PM +0100, Mel Gorman wrote:
> On Fri, May 14, 2021 at 04:50:26PM +0200, Uladzislau Rezki wrote:
> > > > > However, the high-order path also looks suspicious. area->nr_pages is
> > > > > advanced before the allocation attempt so in the event alloc_pages_node()
> > > > > returns NULL prematurely, area->nr_pages does not reflect the number of
> > > > > pages allocated so that needs examination.
> > > > >
> > > > <snip>
> > > > 	for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
> > > > 		area->nr_pages += 1U << page_order) {
> > > > <snip>
> > > >
> > > > If alloc_pages_node() fails, we break the loop. area->nr_pages is
> > > > initialized inside the for(...) loop, thus it will be zero if the single
> > > > page allocator fails on the first iteration.
> > > >
> > > > Or do I miss your point?
> > > >
> > > At the time of the break, area->nr_pages += 1U << page_order happens
> > > before the allocation failure. That looks very suspicious.
> > >
> > The "for" loop does not work that way. If you break the loop, the
> > "area->nr_pages += 1U << page_order" increment is not applied. It is
> > applied only after the body of the "for" loop executes and it goes to
> > the next iteration.
> >
> Yeah, I don't know what I was thinking -- not properly anyway.
>
Hm.. to me it looks correct. Will see what I can do with it.

> > > > As for workloads: mostly those which are critical to time and latency,
> > > > for example audio/video, especially in the mobile area. I did a big
> > > > rework of the KVA allocator because I found its allocation time to be
> > > > suboptimal.
> > > >
> > > Can you give an example benchmark that triggers it or is it somewhat
> > > specific to mobile platforms with drivers that use vmalloc heavily?
> > >
> > See below an example of audio glitches. That was related to our phones
> > and audio workloads:
> >
> > # Explanation is here
> > wget ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt
> >
> > # Audio 10 seconds sample is here.
> > # The drop occurs at 00:09.295; you can hear it.
> > wget ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav
> >
> > Apart from that, a slow allocation can cause two types of issues. The
> > first one is direct: when, for example, a high-priority RT thread does
> > some allocation to pass data to the DSP, a long latency causes a delay
> > in the data being passed to the DSP. This is the drivers area.
> >
> > Another example is when a task is doing an allocation and an RT task is
> > placed onto the same CPU. In that case a long preemption-off
> > (milliseconds) section can lead to starvation of the RT task. For mobile
> > devices it is the UI stack where RT tasks are used. As a result we face
> > frame drops.
> >
> > All such issues have been solved after a rework:
> >
> > wget ftp://vps418301.ovh.net/incoming/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> >
> Thanks. That was enough for me to search to see what sort of general
> workload would be affected. Mostly it's driver-specific. A lot of the
> users that would be potentially hot are already using kvmalloc, so it's
> probably not worth the effort and test_vmalloc.sh makes sense.
>
You are welcome.

As for a helper, does it sound good to you? BTW, once upon a time I had
asked for it :)

From b4b0de2990defd43453ddcd2839521d117cb3bd9 Mon Sep 17 00:00:00 2001
From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Date: Fri, 14 May 2021 18:39:08 +0200
Subject: [PATCH] mm/page_alloc: Add an alloc_pages_bulk_array_node() helper

Add a "node" variant of the alloc_pages_bulk_array() function.
The helper guarantees that __alloc_pages_bulk() is invoked
with a valid NUMA node id.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 include/linux/gfp.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 11da8af06704..94f0b8b1cb55 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -536,6 +536,15 @@ alloc_pages_bulk_array(gfp_t gfp, unsigned long nr_pages, struct page **page_arr
 	return __alloc_pages_bulk(gfp, numa_mem_id(), NULL, nr_pages, NULL, page_array);
 }
 
+static inline unsigned long
+alloc_pages_bulk_array_node(gfp_t gfp, int nid, unsigned long nr_pages, struct page **page_array)
+{
+	if (nid == NUMA_NO_NODE)
+		nid = numa_mem_id();
+
+	return __alloc_pages_bulk(gfp, nid, NULL, nr_pages, NULL, page_array);
+}
+
 /*
  * Allocate pages, preferring the node given as nid. The node must be valid and
  * online. For more general interface, see alloc_pages_node().
-- 
2.20.1

--
Vlad Rezki
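To make the helper's intended use concrete, the vmalloc call site could
then look something like the sketch below. This is an assumption about
the follow-up series, not a quoted hunk; the point is that NUMA_NO_NODE
handling moves into the wrapper, keeping the order-0 path symmetric with
the alloc_pages_node() path.

	/* Sketch of the order-0 path using the proposed helper. */
	if (!page_order) {
		/* NUMA_NO_NODE is now resolved inside the helper. */
		area->nr_pages = alloc_pages_bulk_array_node(gfp_mask,
					node, nr_small_pages, area->pages);
	} else {
		/*
		 * High-order path unchanged: alloc_pages_node() already
		 * checks for NUMA_NO_NODE itself.
		 */
	}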
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-16 17:17 ` Mel Gorman

From: Mel Gorman @ 2021-05-16 17:17 UTC
To: Uladzislau Rezki
Cc: Stephen Rothwell, Andrew Morton, Hillf Danton, Michal Hocko,
	mm-commits, Nicholas Piggin, Oleksiy Avramchenko, Steven Rostedt,
	Matthew Wilcox

On Fri, May 14, 2021 at 07:16:23PM +0200, Uladzislau Rezki wrote:
> > > See below an example of audio glitches. That was related to our phones
> > > and audio workloads:
> > >
> > > # Explanation is here
> > > wget ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt
> > >
> > > # Audio 10 seconds sample is here.
> > > # The drop occurs at 00:09.295; you can hear it.
> > > wget ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav
> > >
> > > Apart from that, a slow allocation can cause two types of issues. The
> > > first one is direct: when, for example, a high-priority RT thread does
> > > some allocation to pass data to the DSP, a long latency causes a delay
> > > in the data being passed to the DSP. This is the drivers area.
> > >
> > > Another example is when a task is doing an allocation and an RT task is
> > > placed onto the same CPU. In that case a long preemption-off
> > > (milliseconds) section can lead to starvation of the RT task. For mobile
> > > devices it is the UI stack where RT tasks are used. As a result we face
> > > frame drops.
> > >
> > > All such issues have been solved after a rework:
> > >
> > > wget ftp://vps418301.ovh.net/incoming/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> > >
> > Thanks. That was enough for me to search to see what sort of general
> > workload would be affected. Mostly it's driver-specific. A lot of the
> > users that would be potentially hot are already using kvmalloc, so it's
> > probably not worth the effort and test_vmalloc.sh makes sense.
> >
> You are welcome.
>
> As for a helper, does it sound good to you? BTW, once upon a time I had
> asked for it :)
>

The intent was that, instead of guessing in advance what APIs would be
needed, users would add an API helper where appropriate.

> From b4b0de2990defd43453ddcd2839521d117cb3bd9 Mon Sep 17 00:00:00 2001
> From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
> Date: Fri, 14 May 2021 18:39:08 +0200
> Subject: [PATCH] mm/page_alloc: Add an alloc_pages_bulk_array_node() helper
>
> Add a "node" variant of the alloc_pages_bulk_array() function.
> The helper guarantees that __alloc_pages_bulk() is invoked
> with a valid NUMA node id.
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Acked-by: Mel Gorman <mgorman@suse.de>

Include it as part of your series adding the vmalloc user.

-- 
Mel Gorman
SUSE Labs
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-16 20:31 ` Uladzislau Rezki

From: Uladzislau Rezki @ 2021-05-16 20:31 UTC
To: Mel Gorman
Cc: Uladzislau Rezki, Stephen Rothwell, Andrew Morton, Hillf Danton,
	Michal Hocko, mm-commits, Nicholas Piggin, Oleksiy Avramchenko,
	Steven Rostedt, Matthew Wilcox

On Sun, May 16, 2021 at 06:17:38PM +0100, Mel Gorman wrote:
> On Fri, May 14, 2021 at 07:16:23PM +0200, Uladzislau Rezki wrote:
> > > > See below an example of audio glitches. That was related to our phones
> > > > and audio workloads:
> > > >
> > > > [ ... wget links and workload details snipped; quoted in full above ... ]
> > > >
> > > Thanks. That was enough for me to search to see what sort of general
> > > workload would be affected. Mostly it's driver-specific. A lot of the
> > > users that would be potentially hot are already using kvmalloc, so it's
> > > probably not worth the effort and test_vmalloc.sh makes sense.
> > >
> > You are welcome.
> >
> > As for a helper, does it sound good to you? BTW, once upon a time I had
> > asked for it :)
> >
> The intent was that, instead of guessing in advance what APIs would be
> needed, users would add an API helper where appropriate.
>
> > From b4b0de2990defd43453ddcd2839521d117cb3bd9 Mon Sep 17 00:00:00 2001
> > From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
> > Date: Fri, 14 May 2021 18:39:08 +0200
> > Subject: [PATCH] mm/page_alloc: Add an alloc_pages_bulk_array_node() helper
> >
> > Add a "node" variant of the alloc_pages_bulk_array() function.
> > The helper guarantees that __alloc_pages_bulk() is invoked
> > with a valid NUMA node id.
> >
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
>
> Acked-by: Mel Gorman <mgorman@suse.de>
>

Thanks!

--
Vlad Rezki
Thread overview: 17+ messages (newest: 2021-05-16 20:31 UTC)

2021-05-12 20:29 [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree  akpm
2021-05-12 22:56 ` Stephen Rothwell
2021-05-13 10:31   ` Uladzislau Rezki
2021-05-13 11:11     ` Mel Gorman
2021-05-13 12:46       ` Uladzislau Rezki
2021-05-13 13:24         ` Uladzislau Rezki
2021-05-13 14:18           ` Mel Gorman
     [not found]            ` <CA+KHdyXwdkosDYk4bKtRLVodrwUJnq3NN39xuRQzKJSPTn7+bQ@mail.gmail.com>
2021-05-13 15:51             ` Mel Gorman
2021-05-13 20:18               ` Uladzislau Rezki
2021-05-14 10:19                 ` Mel Gorman
2021-05-14 11:45                   ` Uladzislau Rezki
2021-05-14 13:45                     ` Mel Gorman
2021-05-14 14:50                       ` Uladzislau Rezki
2021-05-14 15:41                         ` Mel Gorman
2021-05-14 17:16                           ` Uladzislau Rezki
2021-05-16 17:17                             ` Mel Gorman
2021-05-16 20:31                               ` Uladzislau Rezki