* [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: akpm @ 2021-05-12 20:29 UTC (permalink / raw)
To: hdanton, mgorman, mhocko, mm-commits, npiggin,
oleksiy.avramchenko, rostedt, sfr, urezki, willy
The patch titled
Subject: mm/vmalloc: print a warning message first on failure
has been removed from the -mm tree. Its filename was
mm-vmalloc-print-a-warning-message-first-on-failure.patch
This patch was dropped because it had testing failures
------------------------------------------------------
From: Uladzislau Rezki <urezki@gmail.com>
Subject: mm/vmalloc: print a warning message first on failure
When the memory allocation for the array of pages does not succeed, emit a
warning message as a first step and only then perform the further cleanup.
The reason it should be done in this order is that the cleanup function,
free_vm_area(), can potentially also follow its error paths, which can lead
to confusion about what was broken first.
Link: https://lkml.kernel.org/r/20210510103342.GA2169@pc638.lan
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/vmalloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/vmalloc.c~mm-vmalloc-print-a-warning-message-first-on-failure
+++ a/mm/vmalloc.c
@@ -2780,11 +2780,11 @@ static void *__vmalloc_area_node(struct
}
if (!area->pages) {
- free_vm_area(area);
warn_alloc(gfp_mask, NULL,
"vmalloc size %lu allocation failure: "
"page array size %lu allocation failed",
nr_small_pages * PAGE_SIZE, array_size);
+ free_vm_area(area);
return NULL;
}
_
Patches currently in -mm which might be from urezki@gmail.com are
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: Stephen Rothwell @ 2021-05-12 22:56 UTC (permalink / raw)
To: akpm
Cc: hdanton, mgorman, mhocko, mm-commits, npiggin,
oleksiy.avramchenko, rostedt, urezki, willy
Hi Andrew,
On Wed, 12 May 2021 13:29:52 -0700 akpm@linux-foundation.org wrote:
>
> The patch titled
> Subject: mm/vmalloc: print a warning message first on failure
> has been removed from the -mm tree. Its filename was
> mm-vmalloc-print-a-warning-message-first-on-failure.patch
>
> This patch was dropped because it had testing failures
Removed from linux-next.
--
Cheers,
Stephen Rothwell
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: Uladzislau Rezki @ 2021-05-13 10:31 UTC (permalink / raw)
To: Stephen Rothwell, akpm
Cc: akpm, hdanton, mgorman, mhocko, mm-commits, npiggin,
oleksiy.avramchenko, rostedt, urezki, willy
On Thu, May 13, 2021 at 08:56:02AM +1000, Stephen Rothwell wrote:
> Hi Andrew,
>
> On Wed, 12 May 2021 13:29:52 -0700 akpm@linux-foundation.org wrote:
> >
> > The patch titled
> > Subject: mm/vmalloc: print a warning message first on failure
> > has been removed from the -mm tree. Its filename was
> > mm-vmalloc-print-a-warning-message-first-on-failure.patch
> >
> > This patch was dropped because it had testing failures
>
> Removed from linux-next.
>
What kind of testing failures does it trigger? Where can I find the
details, logs or traces of it?
--
Vlad Rezki
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: Mel Gorman @ 2021-05-13 11:11 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Stephen Rothwell, akpm, hdanton, mhocko, mm-commits, npiggin,
oleksiy.avramchenko, rostedt, willy
On Thu, May 13, 2021 at 12:31:56PM +0200, Uladzislau Rezki wrote:
> On Thu, May 13, 2021 at 08:56:02AM +1000, Stephen Rothwell wrote:
> > Hi Andrew,
> >
> > On Wed, 12 May 2021 13:29:52 -0700 akpm@linux-foundation.org wrote:
> > >
> > > The patch titled
> > > Subject: mm/vmalloc: print a warning message first on failure
> > > has been removed from the -mm tree. Its filename was
> > > mm-vmalloc-print-a-warning-message-first-on-failure.patch
> > >
> > > This patch was dropped because it had testing failures
> >
> > Removed from linux-next.
> >
> What can of testing failures does it trigger? Where can i find the
> details, logs or tracers of it?
https://lore.kernel.org/linux-next/20210512175359.17793d34@canb.auug.org.au/
--
Mel Gorman
SUSE Labs
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: Uladzislau Rezki @ 2021-05-13 12:46 UTC (permalink / raw)
To: Mel Gorman, Stephen Rothwell, akpm
Cc: Uladzislau Rezki, Stephen Rothwell, akpm, hdanton, mhocko,
mm-commits, npiggin, oleksiy.avramchenko, rostedt, willy
On Thu, May 13, 2021 at 12:11:53PM +0100, Mel Gorman wrote:
> On Thu, May 13, 2021 at 12:31:56PM +0200, Uladzislau Rezki wrote:
> > On Thu, May 13, 2021 at 08:56:02AM +1000, Stephen Rothwell wrote:
> > > Hi Andrew,
> > >
> > > On Wed, 12 May 2021 13:29:52 -0700 akpm@linux-foundation.org wrote:
> > > >
> > > > The patch titled
> > > > Subject: mm/vmalloc: print a warning message first on failure
> > > > has been removed from the -mm tree. Its filename was
> > > > mm-vmalloc-print-a-warning-message-first-on-failure.patch
> > > >
> > > > This patch was dropped because it had testing failures
> > >
> > > Removed from linux-next.
> > >
> > What can of testing failures does it trigger? Where can i find the
> > details, logs or tracers of it?
>
> https://lore.kernel.org/linux-next/20210512175359.17793d34@canb.auug.org.au/
>
Thanks, Mel.
OK. Now I see. The problem is with this patch:
mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()
<snip>
[ 0.097819][ T1] BUG: Unable to handle kernel data access on read at 0x200000c0a
[ 0.098533][ T1] Faulting instruction address: 0xc0000000003f6fa0
[ 0.099044][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
[ 0.099182][ T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 0.099506][ T1] Modules linked in:
[ 0.099896][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc1-00142-g6053672bb612 #12
[ 0.100254][ T1] NIP: c0000000003f6fa0 LR: c0000000003f6f68 CTR: 0000000000000000
[ 0.100342][ T1] REGS: c0000000063a3480 TRAP: 0380 Not tainted (5.13.0-rc1-00142-g6053672bb612)
[ 0.100550][ T1] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 24402840 XER: 00000000
[ 0.100900][ T1] CFAR: c0000000003f6f7c IRQMASK: 0
[ 0.100900][ T1] GPR00: c0000000003f6f68 c0000000063a3720 c00000000146b100 0000000000000000
[ 0.100900][ T1] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000002
[ 0.100900][ T1] GPR08: c0000000015219e8 0000000000000000 0000000200000c02 c000000006030010
[ 0.100900][ T1] GPR12: 0000000000008000 c000000001640000 0000000000000001 c000000000262f84
[ 0.100900][ T1] GPR16: c00a000000000000 c008000000000000 0000000000000dc0 0000000000000008
[ 0.100900][ T1] GPR20: 0000000000000522 0000000000010000 0000000000000cc0 c008000000000000
[ 0.100900][ T1] GPR24: 0000000000000001 0000000000000000 0000000000002cc2 0000000000000000
[ 0.100900][ T1] GPR28: 0000000000000000 0000000000000000 0000000200000c02 0000000000002cc2
[ 0.101927][ T1] NIP [c0000000003f6fa0] __alloc_pages+0x140/0x3f0
[ 0.102733][ T1] LR [c0000000003f6f68] __alloc_pages+0x108/0x3f0
[ 0.103032][ T1] Call Trace:
[ 0.103165][ T1] [c0000000063a3720] [0000000000000900] 0x900 (unreliable)
[ 0.103616][ T1] [c0000000063a37b0] [c0000000003f7810] __alloc_pages_bulk+0x5c0/0x840
[ 0.103787][ T1] [c0000000063a3890] [c0000000003ecf74] __vmalloc_node_range+0x4c4/0x600
[ 0.103871][ T1] [c0000000063a39b0] [c00000000004f598] module_alloc+0x58/0x70
[ 0.103962][ T1] [c0000000063a3a20] [c000000000262f84] alloc_insn_page+0x24/0x40
[ 0.104046][ T1] [c0000000063a3a40] [c00000000026629c] __get_insn_slot+0x1dc/0x280
[ 0.104143][ T1] [c0000000063a3a80] [c00000000005770c] arch_prepare_kprobe+0x15c/0x1f0
[ 0.104290][ T1] [c0000000063a3b00] [c000000000267880] register_kprobe+0x6d0/0x850
[ 0.104392][ T1] [c0000000063a3b60] [c00000000108fe2c] arch_init_kprobes+0x28/0x3c
[ 0.104524][ T1] [c0000000063a3b80] [c0000000010addb0] init_kprobes+0x120/0x174
[ 0.104629][ T1] [c0000000063a3bf0] [c000000000012190] do_one_initcall+0x60/0x2c0
[ 0.104722][ T1] [c0000000063a3cc0] [c0000000010845a0] kernel_init_freeable+0x1bc/0x3a0
[ 0.104826][ T1] [c0000000063a3da0] [c000000000012764] kernel_init+0x2c/0x168
[ 0.104911][ T1] [c0000000063a3e10] [c00000000000d5ec] ret_from_kernel_thread+0x5c/0x70
[ 0.105178][ T1] Instruction dump:
[ 0.105516][ T1] 40920018 57e9efbe 2c090001 4082000c 63050080 78b80020 e8a10028 57e9a7fe
[ 0.105759][ T1] 7fcaf378 99210040 2c250000 408201f4 <813e0008> 7c09c840 418101e8 57e50528
[ 0.107188][ T1] ---[ end trace 9bd7c2fac4d061e2 ]---
[ 0.107319][ T1]
[ 1.108818][ T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
<snip>
So during the boot process, when a module is about to be loaded, the vmalloc
allocation fails in __alloc_pages_bulk().
I will try to reproduce it. It would be good to get a kernel config.
Any thoughts about it would be appreciated.
Thanks!
--
Vlad Rezki
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: Uladzislau Rezki @ 2021-05-13 13:24 UTC (permalink / raw)
To: Mel Gorman
Cc: Stephen Rothwell, akpm, hdanton, mhocko, mm-commits, npiggin,
oleksiy.avramchenko, rostedt, willy
On Thu, May 13, 2021 at 02:46:05PM +0200, Uladzislau Rezki wrote:
> On Thu, May 13, 2021 at 12:11:53PM +0100, Mel Gorman wrote:
> > On Thu, May 13, 2021 at 12:31:56PM +0200, Uladzislau Rezki wrote:
> > > On Thu, May 13, 2021 at 08:56:02AM +1000, Stephen Rothwell wrote:
> > > > Hi Andrew,
> > > >
> > > > On Wed, 12 May 2021 13:29:52 -0700 akpm@linux-foundation.org wrote:
> > > > >
> > > > > The patch titled
> > > > > Subject: mm/vmalloc: print a warning message first on failure
> > > > > has been removed from the -mm tree. Its filename was
> > > > > mm-vmalloc-print-a-warning-message-first-on-failure.patch
> > > > >
> > > > > This patch was dropped because it had testing failures
> > > >
> > > > Removed from linux-next.
> > > >
> > > What can of testing failures does it trigger? Where can i find the
> > > details, logs or tracers of it?
> >
> > https://lore.kernel.org/linux-next/20210512175359.17793d34@canb.auug.org.au/
> >
> Thanks, Mel.
>
> OK. Now i see. The problem is with this patch:
>
> mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()
>
> <snip>
> [ 0.097819][ T1] BUG: Unable to handle kernel data access on read at 0x200000c0a
> [ 0.098533][ T1] Faulting instruction address: 0xc0000000003f6fa0
> [ 0.099044][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 0.099182][ T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [ 0.099506][ T1] Modules linked in:
> [ 0.099896][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc1-00142-g6053672bb612 #12
> [ 0.100254][ T1] NIP: c0000000003f6fa0 LR: c0000000003f6f68 CTR: 0000000000000000
> [ 0.100342][ T1] REGS: c0000000063a3480 TRAP: 0380 Not tainted (5.13.0-rc1-00142-g6053672bb612)
> [ 0.100550][ T1] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 24402840 XER: 00000000
> [ 0.100900][ T1] CFAR: c0000000003f6f7c IRQMASK: 0
> [ 0.100900][ T1] GPR00: c0000000003f6f68 c0000000063a3720 c00000000146b100 0000000000000000
> [ 0.100900][ T1] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000002
> [ 0.100900][ T1] GPR08: c0000000015219e8 0000000000000000 0000000200000c02 c000000006030010
> [ 0.100900][ T1] GPR12: 0000000000008000 c000000001640000 0000000000000001 c000000000262f84
> [ 0.100900][ T1] GPR16: c00a000000000000 c008000000000000 0000000000000dc0 0000000000000008
> [ 0.100900][ T1] GPR20: 0000000000000522 0000000000010000 0000000000000cc0 c008000000000000
> [ 0.100900][ T1] GPR24: 0000000000000001 0000000000000000 0000000000002cc2 0000000000000000
> [ 0.100900][ T1] GPR28: 0000000000000000 0000000000000000 0000000200000c02 0000000000002cc2
> [ 0.101927][ T1] NIP [c0000000003f6fa0] __alloc_pages+0x140/0x3f0
> [ 0.102733][ T1] LR [c0000000003f6f68] __alloc_pages+0x108/0x3f0
> [ 0.103032][ T1] Call Trace:
> [ 0.103165][ T1] [c0000000063a3720] [0000000000000900] 0x900 (unreliable)
> [ 0.103616][ T1] [c0000000063a37b0] [c0000000003f7810] __alloc_pages_bulk+0x5c0/0x840
> [ 0.103787][ T1] [c0000000063a3890] [c0000000003ecf74] __vmalloc_node_range+0x4c4/0x600
> [ 0.103871][ T1] [c0000000063a39b0] [c00000000004f598] module_alloc+0x58/0x70
> [ 0.103962][ T1] [c0000000063a3a20] [c000000000262f84] alloc_insn_page+0x24/0x40
> [ 0.104046][ T1] [c0000000063a3a40] [c00000000026629c] __get_insn_slot+0x1dc/0x280
> [ 0.104143][ T1] [c0000000063a3a80] [c00000000005770c] arch_prepare_kprobe+0x15c/0x1f0
> [ 0.104290][ T1] [c0000000063a3b00] [c000000000267880] register_kprobe+0x6d0/0x850
> [ 0.104392][ T1] [c0000000063a3b60] [c00000000108fe2c] arch_init_kprobes+0x28/0x3c
> [ 0.104524][ T1] [c0000000063a3b80] [c0000000010addb0] init_kprobes+0x120/0x174
> [ 0.104629][ T1] [c0000000063a3bf0] [c000000000012190] do_one_initcall+0x60/0x2c0
> [ 0.104722][ T1] [c0000000063a3cc0] [c0000000010845a0] kernel_init_freeable+0x1bc/0x3a0
> [ 0.104826][ T1] [c0000000063a3da0] [c000000000012764] kernel_init+0x2c/0x168
> [ 0.104911][ T1] [c0000000063a3e10] [c00000000000d5ec] ret_from_kernel_thread+0x5c/0x70
> [ 0.105178][ T1] Instruction dump:
> [ 0.105516][ T1] 40920018 57e9efbe 2c090001 4082000c 63050080 78b80020 e8a10028 57e9a7fe
> [ 0.105759][ T1] 7fcaf378 99210040 2c250000 408201f4 <813e0008> 7c09c840 418101e8 57e50528
> [ 0.107188][ T1] ---[ end trace 9bd7c2fac4d061e2 ]---
> [ 0.107319][ T1]
> [ 1.108818][ T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> <snip>
>
> So during the boot process when the module is about to be loaded, the vmalloc allocation
> gets failed in the __alloc_pages_bulk().
>
> Will try to reproduce. It would be good to get a kernel config.
> Appreciate for any thoughts about it?
>
I see that on the target machine where the problem occurs, PAGE_SIZE is 64K.
Can that somehow be connected to it? Also one question, just guessing: the crash
happens during boot, so is __alloc_pages_bulk()
fully initialized by that time?
Thanks!
--
Vlad Rezki
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: Mel Gorman @ 2021-05-13 14:18 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Stephen Rothwell, akpm, hdanton, mhocko, mm-commits, npiggin,
oleksiy.avramchenko, rostedt, willy
On Thu, May 13, 2021 at 03:24:18PM +0200, Uladzislau Rezki wrote:
> > > > What can of testing failures does it trigger? Where can i find the
> > > > details, logs or tracers of it?
> > >
> > > https://lore.kernel.org/linux-next/20210512175359.17793d34@canb.auug.org.au/
> > >
> > Thanks, Mel.
> >
> > OK. Now i see. The problem is with this patch:
> >
> > mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()
> >
> > <SNIP>
> >
> > So during the boot process when the module is about to be loaded, the vmalloc allocation
> > gets failed in the __alloc_pages_bulk().
> >
> > Will try to reproduce. It would be good to get a kernel config.
> > Appreciate for any thoughts about it?
> >
>
> I see that on the target machine when the problem occurs the PAGE_SIZE is 64K.
>
> Can it be somehow connected to it? Also one question, just guessing, the crash
> happens during the boot, therefore the question is: is __alloc_pages_bulk()
> fully initialized by that time?
>
A boot test on KVM using my distribution config (openSUSE Leap 15.2)
failed on x86-64 so it's not related to PAGE_SIZE.
--
Mel Gorman
SUSE Labs
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: Mel Gorman @ 2021-05-13 15:51 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Stephen Rothwell, Andrew Morton, Hillf Danton, Michal Hocko,
mm-commits, Nicholas Piggin, Oleksiy Avramchenko, Steven Rostedt,
Matthew Wilcox
On Thu, May 13, 2021 at 05:29:05PM +0200, Uladzislau Rezki wrote:
> Could you please send your config? I will try to reproduce with it.
>
Attached.
--
Mel Gorman
SUSE Labs
[-- Attachment #2: config-bulkvmalloc.gz --]
[-- Type: application/x-gzip, Size: 59818 bytes --]
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: Uladzislau Rezki @ 2021-05-13 20:18 UTC (permalink / raw)
To: Mel Gorman
Cc: Uladzislau Rezki, Stephen Rothwell, Andrew Morton, Hillf Danton,
Michal Hocko, mm-commits, Nicholas Piggin, Oleksiy Avramchenko,
Steven Rostedt, Matthew Wilcox
On Thu, May 13, 2021 at 04:51:33PM +0100, Mel Gorman wrote:
> On Thu, May 13, 2021 at 05:29:05PM +0200, Uladzislau Rezki wrote:
> > Could you please send your config? I will try to reproduce with it.
> >
>
> Attached.
>
Thanks.
With your .config file I am able to reproduce the kernel panic. Actually,
when a single page is requested, __alloc_pages_bulk() falls back to the
single-page allocator:
<snip>
/* Use the single page allocator for one page. */
if (nr_pages - nr_populated == 1)
goto failed;
...
failed:
page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
if (page) {
if (page_list)
list_add(&page->lru, page_list);
else
page_array[nr_populated] = page;
nr_populated++;
}
return nr_populated;
<snip>
From the trace I get:
<snip>
[ 0.243916] RIP: 0010:__alloc_pages+0x11e/0x310
[ 0.243916] Code: 84 c0 0f 85 02 01 00 00 89 d8 48 8b 54 24 08 8b 74 24 1c c1 e8 0c 83 e0 01 88 44 24 20 48 8b 04 24 48 85 d2 0f 85 71 01 00 00 <3b> 70 08 0f 82 68 01 00 00 48 89 44 24 10 48 8b 00 89 da 81 e2 00
[ 0.243916] RSP: 0000:ffffffffae803c38 EFLAGS: 00010246
[ 0.243916] RAX: 0000000000001cc0 RBX: 0000000000002102 RCX: 0000000000000004
[ 0.243916] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002102
[ 0.243916] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffffdfff
[ 0.243916] R10: 0000000000000001 R11: ffffffffae803ac0 R12: 0000000000000000
[ 0.243916] R13: 0000000000002102 R14: 0000000000000001 R15: ffffa0938000d000
[ 0.243916] FS: 0000000000000000(0000) GS:ffff893ab7c00000(0000) knlGS:0000000000000000
[ 0.243916] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.243916] CR2: 0000000000001cc8 CR3: 0000000176e10000 CR4: 00000000000006b0
[ 0.243916] Call Trace:
[ 0.243916] __alloc_pages_bulk+0xaa1/0xb50
<snip>
(gdb) l *__alloc_pages+0x11e
0xffffffff8129d87e is in __alloc_pages (./include/linux/mmzone.h:1095).
1090 return zoneref->zone;
1091 }
1092
1093 static inline int zonelist_zone_idx(struct zoneref *zoneref)
1094 {
1095 return zoneref->zone_idx;
1096 }
1097
1098 static inline int zonelist_node_idx(struct zoneref *zoneref)
1099 {
(gdb)
It seems like "zoneref" refers to an invalid address.
Thoughts?
--
Vlad Rezki
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: Mel Gorman @ 2021-05-14 10:19 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Stephen Rothwell, Andrew Morton, Hillf Danton, Michal Hocko,
mm-commits, Nicholas Piggin, Oleksiy Avramchenko, Steven Rostedt,
Matthew Wilcox
On Thu, May 13, 2021 at 10:18:51PM +0200, Uladzislau Rezki wrote:
> <SNIP>
>
> From the trace i get:
>
> <snip>
> [ 0.243916] RIP: 0010:__alloc_pages+0x11e/0x310
> [ 0.243916] Code: 84 c0 0f 85 02 01 00 00 89 d8 48 8b 54 24 08 8b 74 24 1c c1 e8 0c 83 e0 01 88 44 24 20 48 8b 04 24 48 85 d2 0f 85 71 01 00 00 <3b> 70 08 0f 82 68 01 00 00 48 89 44 24 10 48 8b 00 89 da 81 e2 00
> [ 0.243916] RSP: 0000:ffffffffae803c38 EFLAGS: 00010246
> [ 0.243916] RAX: 0000000000001cc0 RBX: 0000000000002102 RCX: 0000000000000004
> [ 0.243916] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002102
> [ 0.243916] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffffdfff
> [ 0.243916] R10: 0000000000000001 R11: ffffffffae803ac0 R12: 0000000000000000
> [ 0.243916] R13: 0000000000002102 R14: 0000000000000001 R15: ffffa0938000d000
> [ 0.243916] FS: 0000000000000000(0000) GS:ffff893ab7c00000(0000) knlGS:0000000000000000
> [ 0.243916] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.243916] CR2: 0000000000001cc8 CR3: 0000000176e10000 CR4: 00000000000006b0
> [ 0.243916] Call Trace:
> [ 0.243916] __alloc_pages_bulk+0xaa1/0xb50
> <snip>
>
> (gdb) l *__alloc_pages+0x11e
> 0xffffffff8129d87e is in __alloc_pages (./include/linux/mmzone.h:1095).
> 1090 return zoneref->zone;
> 1091 }
> 1092
> 1093 static inline int zonelist_zone_idx(struct zoneref *zoneref)
> 1094 {
> 1095 return zoneref->zone_idx;
> 1096 }
> 1097
> 1098 static inline int zonelist_node_idx(struct zoneref *zoneref)
> 1099 {
> (gdb)
>
> Seems like "zoneref" refers to invalid address.
>
> Thoughts?
I have not previously read the patch but there are a few concerns, and it's
probably just as well this blew up early. The bulk allocator assumes a
valid node but the patch can send in NUMA_NO_NODE (-1). On the high-order
path alloc_pages_node is used, which checks nid == NUMA_NO_NODE. Also,
area->pages is not necessarily initialised, so that could be interpreted
as a partially populated array, so minimally you need:
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f3c5dd4ccc5b9..b638ff31b07e1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2792,6 +2792,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
page_order = vm_area_page_order(area);
if (!page_order) {
+ memset(area->pages, 0, array_size);
+ if (node == NUMA_NO_NODE)
+ node = numa_mem_id();
area->nr_pages = __alloc_pages_bulk(gfp_mask, node,
NULL, nr_small_pages, NULL, area->pages);
} else {
However, the high-order path also looks suspicious. area->nr_pages is
advanced before the allocation attempt so in the event alloc_pages_node()
returns NULL prematurely, area->nr_pages does not reflect the number of
pages allocated so that needs examination.
As an aside, where or what is test_vmalloc.sh? It appears to have been
used a few times but it's not clear it's representative so are you aware
of workloads that are vmalloc-intensive? It does not matter for the
patch as such but it would be nice to know examples of vmalloc-intensive
workloads because I cannot recall a time during the last few years where
I saw vmalloc.c high in profiles.
--
Mel Gorman
SUSE Labs
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: Uladzislau Rezki @ 2021-05-14 11:45 UTC (permalink / raw)
To: Mel Gorman
Cc: Uladzislau Rezki, Stephen Rothwell, Andrew Morton, Hillf Danton,
Michal Hocko, mm-commits, Nicholas Piggin, Oleksiy Avramchenko,
Steven Rostedt, Matthew Wilcox
On Fri, May 14, 2021 at 11:19:20AM +0100, Mel Gorman wrote:
> On Thu, May 13, 2021 at 10:18:51PM +0200, Uladzislau Rezki wrote:
> > <SNIP>
> >
> > From the trace i get:
> >
> > <snip>
> > [ 0.243916] RIP: 0010:__alloc_pages+0x11e/0x310
> > [ 0.243916] Code: 84 c0 0f 85 02 01 00 00 89 d8 48 8b 54 24 08 8b 74 24 1c c1 e8 0c 83 e0 01 88 44 24 20 48 8b 04 24 48 85 d2 0f 85 71 01 00 00 <3b> 70 08 0f 82 68 01 00 00 48 89 44 24 10 48 8b 00 89 da 81 e2 00
> > [ 0.243916] RSP: 0000:ffffffffae803c38 EFLAGS: 00010246
> > [ 0.243916] RAX: 0000000000001cc0 RBX: 0000000000002102 RCX: 0000000000000004
> > [ 0.243916] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002102
> > [ 0.243916] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffffdfff
> > [ 0.243916] R10: 0000000000000001 R11: ffffffffae803ac0 R12: 0000000000000000
> > [ 0.243916] R13: 0000000000002102 R14: 0000000000000001 R15: ffffa0938000d000
> > [ 0.243916] FS: 0000000000000000(0000) GS:ffff893ab7c00000(0000) knlGS:0000000000000000
> > [ 0.243916] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 0.243916] CR2: 0000000000001cc8 CR3: 0000000176e10000 CR4: 00000000000006b0
> > [ 0.243916] Call Trace:
> > [ 0.243916] __alloc_pages_bulk+0xaa1/0xb50
> > <snip>
> >
> > (gdb) l *__alloc_pages+0x11e
> > 0xffffffff8129d87e is in __alloc_pages (./include/linux/mmzone.h:1095).
> > 1090 return zoneref->zone;
> > 1091 }
> > 1092
> > 1093 static inline int zonelist_zone_idx(struct zoneref *zoneref)
> > 1094 {
> > 1095 return zoneref->zone_idx;
> > 1096 }
> > 1097
> > 1098 static inline int zonelist_node_idx(struct zoneref *zoneref)
> > 1099 {
> > (gdb)
> >
> > Seems like "zoneref" refers to invalid address.
> >
> > Thoughts?
>
> I have not previously read the patch but there are a few concerns and it's
> probably just as well this blew up early. The bulk allocator assumes a
> valid node but the patch can send in NUMA_NO_NODE (-1).
>
Should the bulk allocator handle NUMA_NO_NODE on its own? I mean, instead
of the user handling it, the allocator itself could fix it up if NUMA_NO_NODE is passed.
>
> On the high-order path alloc_pages_node is used which checks nid == NUMA_NO_NODE.
> Also, area->pages is not necessarily initialised so that could be interpreted
> as a partially populated array so minmally you need.
>
area->pages is zeroed, because __GFP_ZERO is used when allocating the array.
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index f3c5dd4ccc5b9..b638ff31b07e1 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2792,6 +2792,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> page_order = vm_area_page_order(area);
>
> if (!page_order) {
> + memset(area->pages, 0, array_size);
This memset is not needed as explained above.
> + if (node == NUMA_NO_NODE)
> + node = numa_mem_id();
This patch works. Again, should this be handled by the allocator itself?
> area->nr_pages = __alloc_pages_bulk(gfp_mask, node,
> NULL, nr_small_pages, NULL, area->pages);
> } else {
>
> However, the high-order path also looks suspicious. area->nr_pages is
> advanced before the allocation attempt so in the event alloc_pages_node()
> returns NULL prematurely, area->nr_pages does not reflect the number of
> pages allocated so that needs examination.
>
<snip>
for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
area->nr_pages += 1U << page_order) {
<snip>
If alloc_pages_node() fails, we break out of the loop. area->nr_pages is initialized
inside the for(...) loop, thus it will be zero if the single-page allocator
fails on the first iteration.
Or did I miss your point?
> As an aside, where or what is test_vmalloc.sh? It appears to have been
> used a few times but it's not clear it's representative so are you aware
> of workloads that are vmalloc-intensive? It does not matter for the
> patch as such but it would be nice to know examples of vmalloc-intensive
> workloads because I cannot recall a time during the last few years where
> I saw vmalloc.c high in profiles.
>
test_vmalloc.sh is a shell script that is used for stressing and testing the
vmalloc subsystem, as well as for performance evaluation. You can find it here:
./tools/testing/selftests/vm/test_vmalloc.sh
As for workloads: mostly the ones that are critical to time and latency, for
example audio/video, especially in the mobile area. I did a big rework of
the KVA allocator because I found its allocation time to be suboptimal.
To be more specific, high-resolution audio playback was suffering from
glitches due to long allocation times and preemption issues.
--
Vlad Rezki
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: Mel Gorman @ 2021-05-14 13:45 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Stephen Rothwell, Andrew Morton, Hillf Danton, Michal Hocko,
mm-commits, Nicholas Piggin, Oleksiy Avramchenko, Steven Rostedt,
Matthew Wilcox
On Fri, May 14, 2021 at 01:45:43PM +0200, Uladzislau Rezki wrote:
> > > Seems like "zoneref" refers to invalid address.
> > >
> > > Thoughts?
> >
> > I have not previously read the patch but there are a few concerns and it's
> > probably just as well this blew up early. The bulk allocator assumes a
> > valid node but the patch can send in NUMA_NO_NODE (-1).
> >
>
> Should the bulk-allocator handle the NUMA_NO_NODE on its own? I mean instead
> of handling by user the allocator itself fixes it if NUMA_NO_NODE is passed.
>
No, for API similarity reasons. __alloc_pages_bulk is the bulk API
equivalent of __alloc_pages() and both expect valid node IDs. vmalloc
is using alloc_pages_node for high-order pages, which first checks
the node ID, so your options are to check it within vmalloc.c or add an
alloc_pages_node_bulk helper that is API-equivalent to alloc_pages_node
as a prerequisite to your patch.
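As a rough illustration of that second option, such a helper could look
something like the sketch below (the name and placement are only a
suggestion here, nothing like it exists yet):
<snip>
/*
 * Sketch only: a node-aware bulk helper that, like alloc_pages_node(),
 * accepts NUMA_NO_NODE and falls back to the local memory node before
 * calling into the bulk allocator.
 */
static inline unsigned long
alloc_pages_node_bulk(gfp_t gfp, int nid, unsigned long nr_pages,
		      struct page **page_array)
{
	if (nid == NUMA_NO_NODE)
		nid = numa_mem_id();

	return __alloc_pages_bulk(gfp, nid, NULL, nr_pages, NULL, page_array);
}
<snip>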
> >
> > On the high-order path alloc_pages_node is used which checks nid == NUMA_NO_NODE.
> > Also, area->pages is not necessarily initialised so that could be interpreted
> > as a partially populated array so minmally you need.
> >
>
> area->pages are zeroed, because __GFP_ZERO is sued during allocating an array.
>
Ah, yes.
> > However, the high-order path also looks suspicious. area->nr_pages is
> > advanced before the allocation attempt so in the event alloc_pages_node()
> > returns NULL prematurely, area->nr_pages does not reflect the number of
> > pages allocated so that needs examination.
> >
> <snip>
> for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
> area->nr_pages += 1U << page_order) {
> <snip>
>
> if alloc_pages_node() fails we break the loop. area->nr_pages is initialized
> inside the for(...) loop, thus it will be zero if the single page allocator
> fails on a first iteration.
>
> Or i miss your point?
>
At the time of the break, area->nr_pages += 1U << page_order has already
happened before the allocation failure. That looks very suspicious.
> > As an aside, where or what is test_vmalloc.sh? It appears to have been
> > used a few times but it's not clear it's representative so are you aware
> > of workloads that are vmalloc-intensive? It does not matter for the
> > patch as such but it would be nice to know examples of vmalloc-intensive
> > workloads because I cannot recall a time during the last few years where
> > I saw vmalloc.c high in profiles.
> >
> test_vmalloc.sh is a shell script that is used for stressing and testing a
> vmalloc subsystem as well as performance evaluation. You can find it here:
>
> ./tools/testing/selftests/vm/test_vmalloc.sh
>
Thanks.
> As for workloads. Most of them which are critical to time and latency. For
> example audio/video, especially in the mobile area. I did a big rework of
> the KVA allocator because i found it not optimal to allocation time.
>
Can you give an example benchmark that triggers it or is it somewhat
specific to mobile platforms with drivers that use vmalloc heavily?
--
Mel Gorman
SUSE Labs
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: Uladzislau Rezki @ 2021-05-14 14:50 UTC (permalink / raw)
To: Mel Gorman
Cc: Uladzislau Rezki, Stephen Rothwell, Andrew Morton, Hillf Danton,
Michal Hocko, mm-commits, Nicholas Piggin, Oleksiy Avramchenko,
Steven Rostedt, Matthew Wilcox
> On Fri, May 14, 2021 at 01:45:43PM +0200, Uladzislau Rezki wrote:
> > > > Seems like "zoneref" refers to invalid address.
> > > >
> > > > Thoughts?
> > >
> > > I have not previously read the patch but there are a few concerns and it's
> > > probably just as well this blew up early. The bulk allocator assumes a
> > > valid node but the patch can send in NUMA_NO_NODE (-1).
> > >
> >
> > Should the bulk-allocator handle the NUMA_NO_NODE on its own? I mean instead
> > of handling by user the allocator itself fixes it if NUMA_NO_NODE is passed.
> >
>
> No for API similarity reasons. __alloc_pages_bulk is the API bulk
> equivalent to __alloc_pages() and both expect valid node IDs. vmalloc
> is using alloc_pages_node for high-order pages which first checks
> the node ID so your options are to check it within vmalloc.c or add a
> alloc_pages_node_bulk helper that is API equivalent to alloc_pages_node
> as a prerequisite to your patch.
>
OK. Thanks.
> > > On the high-order path alloc_pages_node is used which checks nid == NUMA_NO_NODE.
> > > Also, area->pages is not necessarily initialised so that could be interpreted
> > > as a partially populated array so minmally you need.
> > >
> >
> > area->pages are zeroed, because __GFP_ZERO is sued during allocating an array.
> >
>
> Ah, yes.
>
> > > However, the high-order path also looks suspicious. area->nr_pages is
> > > advanced before the allocation attempt so in the event alloc_pages_node()
> > > returns NULL prematurely, area->nr_pages does not reflect the number of
> > > pages allocated so that needs examination.
> > >
> > <snip>
> > for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
> > area->nr_pages += 1U << page_order) {
> > <snip>
> >
> > if alloc_pages_node() fails we break the loop. area->nr_pages is initialized
> > inside the for(...) loop, thus it will be zero if the single page allocator
> > fails on a first iteration.
> >
> > Or i miss your point?
> >
>
> At the time of the break, area->nr_pages += 1U << page_order happened
> before the allocation failure happens. That looks very suspicious.
>
The "for" loop does not work that way. If you break the loop the
"area->nr_pages += 1U << page_order" or an "increment" is not increased.
It is increased only after the body of the "for" loop executes and it
goes to next iteration.
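If it helps, below is a tiny standalone user-space sketch of the same loop
shape (hypothetical numbers, not the real allocator) which shows that the
counter is left at the number of successful iterations when we break out:
<snip>
#include <stdio.h>

int main(void)
{
	unsigned int nr_pages, nr_small_pages = 8, page_order = 0;

	for (nr_pages = 0; nr_pages < nr_small_pages;
			nr_pages += 1U << page_order) {
		/* pretend the allocation fails on the 4th iteration */
		if (nr_pages == 3)
			break;
	}

	/* prints "allocated 3 of 8 pages" */
	printf("allocated %u of %u pages\n", nr_pages, nr_small_pages);
	return 0;
}
<snip>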
> > > As an aside, where or what is test_vmalloc.sh? It appears to have been
> > > used a few times but it's not clear it's representative so are you aware
> > > of workloads that are vmalloc-intensive? It does not matter for the
> > > patch as such but it would be nice to know examples of vmalloc-intensive
> > > workloads because I cannot recall a time during the last few years where
> > > I saw vmalloc.c high in profiles.
> > >
> > test_vmalloc.sh is a shell script that is used for stressing and testing a
> > vmalloc subsystem as well as performance evaluation. You can find it here:
> >
> > ./tools/testing/selftests/vm/test_vmalloc.sh
> >
>
> Thanks.
>
> > As for workloads. Most of them which are critical to time and latency. For
> > example audio/video, especially in the mobile area. I did a big rework of
> > the KVA allocator because i found it not optimal to allocation time.
> >
>
> Can you give an example benchmark that triggers it or is it somewhat
> specific to mobile platforms with drivers that use vmalloc heavily?
>
See below an example of audio glitches. That was related to our phones
and audio workloads:
# Explanation is here
wget ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt
# Audio 10 seconds sample is here.
# The drop occurs at 00:09.295 you can hear it
wget ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav
Apart from that, a slow allocation can cause two types of issues. The first
one is direct: when, for example, a high-priority RT thread does some allocation
to pass data to the DSP, the long latency causes a delay in the data being passed
to the DSP. This is the drivers area.
Another example is when a task is doing an allocation and an RT task is
placed onto the same CPU. In that case a long preemption-off (milliseconds)
section can lead to starvation of the RT task. For mobile devices it is the UI
stack where RT tasks are used. As a result we see frame drops.
All such issues have been solved by the rework:
wget ftp://vps418301.ovh.net/incoming/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
--
Vlad Rezki
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: Mel Gorman @ 2021-05-14 15:41 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Stephen Rothwell, Andrew Morton, Hillf Danton, Michal Hocko,
mm-commits, Nicholas Piggin, Oleksiy Avramchenko, Steven Rostedt,
Matthew Wilcox
On Fri, May 14, 2021 at 04:50:26PM +0200, Uladzislau Rezki wrote:
> > > > However, the high-order path also looks suspicious. area->nr_pages is
> > > > advanced before the allocation attempt so in the event alloc_pages_node()
> > > > returns NULL prematurely, area->nr_pages does not reflect the number of
> > > > pages allocated so that needs examination.
> > > >
> > > <snip>
> > > for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
> > > area->nr_pages += 1U << page_order) {
> > > <snip>
> > >
> > > if alloc_pages_node() fails we break the loop. area->nr_pages is initialized
> > > inside the for(...) loop, thus it will be zero if the single page allocator
> > > fails on a first iteration.
> > >
> > > Or i miss your point?
> > >
> >
> > At the time of the break, area->nr_pages += 1U << page_order happened
> > before the allocation failure happens. That looks very suspicious.
> >
> The "for" loop does not work that way. If you break the loop the
> "area->nr_pages += 1U << page_order" or an "increment" is not increased.
> It is increased only after the body of the "for" loop executes and it
> goes to next iteration.
>
Yeah, I don't know what I was thinking -- not properly anyway.
> > > As for workloads. Most of them which are critical to time and latency. For
> > > example audio/video, especially in the mobile area. I did a big rework of
> > > the KVA allocator because i found it not optimal to allocation time.
> > >
> >
> > Can you give an example benchmark that triggers it or is it somewhat
> > specific to mobile platforms with drivers that use vmalloc heavily?
> >
>
> See below an example of audio glitches. That was related to our phones
> and audio workloads:
>
> # Explanation is here
> wget ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt
>
> # Audio 10 seconds sample is here.
> # The drop occurs at 00:09.295 you can hear it
> wget ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav
>
> Apart of that a slow allocation can course two type of issues. First one
> is direct. When for example a high-priority RT thread does some allocation
> to bypass data to DSP. Long latency courses a delay of data to be passed to
> DSP. This is drivers area.
>
> Another example is when a task is doing an allocation and the RT task is
> placed onto a same CPU. In that case a long preemption-off(milliseconds)
> section can lead the RT task for starvation. For mobile devices it is UI
> stack where RT tasks are used. As a result we face frame drops.
>
> All such issues have been solved after a rework:
>
> wget ftp://vps418301.ovh.net/incoming/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
>
Thanks. That was enough for me to search to see what sort of general
workload would be affected. Mostly it's driver specific. A lot of the users
that would be potentially hot are already using kvmalloc, so it's probably not
worth the effort, and test_vmalloc.sh makes sense.
--
Mel Gorman
SUSE Labs
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: Uladzislau Rezki @ 2021-05-14 17:16 UTC (permalink / raw)
To: Mel Gorman
Cc: Uladzislau Rezki, Stephen Rothwell, Andrew Morton, Hillf Danton,
Michal Hocko, mm-commits, Nicholas Piggin, Oleksiy Avramchenko,
Steven Rostedt, Matthew Wilcox
On Fri, May 14, 2021 at 04:41:53PM +0100, Mel Gorman wrote:
> On Fri, May 14, 2021 at 04:50:26PM +0200, Uladzislau Rezki wrote:
> > > > > However, the high-order path also looks suspicious. area->nr_pages is
> > > > > advanced before the allocation attempt so in the event alloc_pages_node()
> > > > > returns NULL prematurely, area->nr_pages does not reflect the number of
> > > > > pages allocated so that needs examination.
> > > > >
> > > > <snip>
> > > > for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
> > > > area->nr_pages += 1U << page_order) {
> > > > <snip>
> > > >
> > > > if alloc_pages_node() fails we break the loop. area->nr_pages is initialized
> > > > inside the for(...) loop, thus it will be zero if the single page allocator
> > > > fails on a first iteration.
> > > >
> > > > Or i miss your point?
> > > >
> > >
> > > At the time of the break, area->nr_pages += 1U << page_order happened
> > > before the allocation failure happens. That looks very suspicious.
> > >
> > The "for" loop does not work that way. If you break the loop the
> > "area->nr_pages += 1U << page_order" or an "increment" is not increased.
> > It is increased only after the body of the "for" loop executes and it
> > goes to next iteration.
> >
>
> Yeah, I don't know what I was thinking -- not properly anyway.
>
Hm... to me it looks proper enough. I will see what I can do with it.
> > > > As for workloads. Most of them which are critical to time and latency. For
> > > > example audio/video, especially in the mobile area. I did a big rework of
> > > > the KVA allocator because i found it not optimal to allocation time.
> > > >
> > >
> > > Can you give an example benchmark that triggers it or is it somewhat
> > > specific to mobile platforms with drivers that use vmalloc heavily?
> > >
> >
> > See below an example of audio glitches. That was related to our phones
> > and audio workloads:
> >
> > # Explanation is here
> > wget ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt
> >
> > # Audio 10 seconds sample is here.
> > # The drop occurs at 00:09.295 you can hear it
> > wget ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav
> >
> > Apart of that a slow allocation can course two type of issues. First one
> > is direct. When for example a high-priority RT thread does some allocation
> > to bypass data to DSP. Long latency courses a delay of data to be passed to
> > DSP. This is drivers area.
> >
> > Another example is when a task is doing an allocation and the RT task is
> > placed onto a same CPU. In that case a long preemption-off(milliseconds)
> > section can lead the RT task for starvation. For mobile devices it is UI
> > stack where RT tasks are used. As a result we face frame drops.
> >
> > All such issues have been solved after a rework:
> >
> > wget ftp://vps418301.ovh.net/incoming/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> >
>
> Thanks. That was enough for me to search to see what sort of general
> workload would be affected. Mostly it's driver specific. A lot of the users
> that would be potentially hot are already using kvmalloc so probably not
> worth the effort so test_vmalloc.sh makes sense.
>
You are welcome.
As for a helper: does it sound good to you? BTW, once upon a time I had
asked for it :)
From b4b0de2990defd43453ddcd2839521d117cb3bd9 Mon Sep 17 00:00:00 2001
From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Date: Fri, 14 May 2021 18:39:08 +0200
Subject: [PATCH] mm/page_alloc: Add an alloc_pages_bulk_array_node() helper
Add a "node" variant of the alloc_pages_bulk_array() function.
The helper guarantees that a __alloc_pages_bulk() is invoked
with a valid NUMA node id.
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
include/linux/gfp.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 11da8af06704..94f0b8b1cb55 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -536,6 +536,15 @@ alloc_pages_bulk_array(gfp_t gfp, unsigned long nr_pages, struct page **page_arr
return __alloc_pages_bulk(gfp, numa_mem_id(), NULL, nr_pages, NULL, page_array);
}
+static inline unsigned long
+alloc_pages_bulk_array_node(gfp_t gfp, int nid, unsigned long nr_pages, struct page **page_array)
+{
+ if (nid == NUMA_NO_NODE)
+ nid = numa_mem_id();
+
+ return __alloc_pages_bulk(gfp, nid, NULL, nr_pages, NULL, page_array);
+}
+
/*
* Allocate pages, preferring the node given as nid. The node must be valid and
* online. For more general interface, see alloc_pages_node().
--
2.20.1
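For reference, the intended call site in __vmalloc_area_node() would then
become roughly the following (just a sketch, the actual patch in the series
may differ):
<snip>
	if (!page_order) {
		/* the helper handles NUMA_NO_NODE internally */
		area->nr_pages = alloc_pages_bulk_array_node(gfp_mask, node,
					nr_small_pages, area->pages);
	} else {
		/* the high-order path stays as it is */
	}
<snip>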
--
Vlad Rezki
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: Mel Gorman @ 2021-05-16 17:17 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Stephen Rothwell, Andrew Morton, Hillf Danton, Michal Hocko,
mm-commits, Nicholas Piggin, Oleksiy Avramchenko, Steven Rostedt,
Matthew Wilcox
On Fri, May 14, 2021 at 07:16:23PM +0200, Uladzislau Rezki wrote:
> > > See below an example of audio glitches. That was related to our phones
> > > and audio workloads:
> > >
> > > # Explanation is here
> > > wget ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt
> > >
> > > # Audio 10 seconds sample is here.
> > > # The drop occurs at 00:09.295 you can hear it
> > > wget ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav
> > >
> > > Apart of that a slow allocation can course two type of issues. First one
> > > is direct. When for example a high-priority RT thread does some allocation
> > > to bypass data to DSP. Long latency courses a delay of data to be passed to
> > > DSP. This is drivers area.
> > >
> > > Another example is when a task is doing an allocation and the RT task is
> > > placed onto a same CPU. In that case a long preemption-off(milliseconds)
> > > section can lead the RT task for starvation. For mobile devices it is UI
> > > stack where RT tasks are used. As a result we face frame drops.
> > >
> > > All such issues have been solved after a rework:
> > >
> > > wget ftp://vps418301.ovh.net/incoming/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> > >
> >
> > Thanks. That was enough for me to search to see what sort of general
> > workload would be affected. Mostly it's driver specific. A lot of the users
> > that would be potentially hot are already using kvmalloc so probably not
> > worth the effort so test_vmalloc.sh makes sense.
> >
> You are welcome.
>
> As for a helper. Does it sound good for you? BTW, once upon a time i had
> asked for it :)
>
The intent was that, instead of guessing in advance what APIs would be
needed, users would add an API helper where appropriate.
> From b4b0de2990defd43453ddcd2839521d117cb3bd9 Mon Sep 17 00:00:00 2001
> From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
> Date: Fri, 14 May 2021 18:39:08 +0200
> Subject: [PATCH] mm/page_alloc: Add an alloc_pages_bulk_array_node() helper
>
> Add a "node" variant of the alloc_pages_bulk_array() function.
> The helper guarantees that a __alloc_pages_bulk() is invoked
> with a valid NUMA node id.
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Include it as part of your series adding the vmalloc user.
--
Mel Gorman
SUSE Labs
* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
From: Uladzislau Rezki @ 2021-05-16 20:31 UTC (permalink / raw)
To: Mel Gorman
Cc: Uladzislau Rezki, Stephen Rothwell, Andrew Morton, Hillf Danton,
Michal Hocko, mm-commits, Nicholas Piggin, Oleksiy Avramchenko,
Steven Rostedt, Matthew Wilcox
On Sun, May 16, 2021 at 06:17:38PM +0100, Mel Gorman wrote:
> On Fri, May 14, 2021 at 07:16:23PM +0200, Uladzislau Rezki wrote:
> > > > See below an example of audio glitches. That was related to our phones
> > > > and audio workloads:
> > > >
> > > > # Explanation is here
> > > > wget ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt
> > > >
> > > > # Audio 10 seconds sample is here.
> > > > # The drop occurs at 00:09.295 you can hear it
> > > > wget ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav
> > > >
> > > > Apart of that a slow allocation can course two type of issues. First one
> > > > is direct. When for example a high-priority RT thread does some allocation
> > > > to bypass data to DSP. Long latency courses a delay of data to be passed to
> > > > DSP. This is drivers area.
> > > >
> > > > Another example is when a task is doing an allocation and the RT task is
> > > > placed onto a same CPU. In that case a long preemption-off(milliseconds)
> > > > section can lead the RT task for starvation. For mobile devices it is UI
> > > > stack where RT tasks are used. As a result we face frame drops.
> > > >
> > > > All such issues have been solved after a rework:
> > > >
> > > > wget ftp://vps418301.ovh.net/incoming/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> > > >
> > >
> > > Thanks. That was enough for me to search to see what sort of general
> > > workload would be affected. Mostly it's driver specific. A lot of the users
> > > that would be potentially hot are already using kvmalloc so probably not
> > > worth the effort so test_vmalloc.sh makes sense.
> > >
> > You are welcome.
> >
> > As for a helper. Does it sound good for you? BTW, once upon a time i had
> > asked for it :)
> >
>
> The intent was that instead of guessing in advance what APIs would be
> needed that users would add an API helper where appropriate.
>
> > From b4b0de2990defd43453ddcd2839521d117cb3bd9 Mon Sep 17 00:00:00 2001
> > From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
> > Date: Fri, 14 May 2021 18:39:08 +0200
> > Subject: [PATCH] mm/page_alloc: Add an alloc_pages_bulk_array_node() helper
> >
> > Add a "node" variant of the alloc_pages_bulk_array() function.
> > The helper guarantees that a __alloc_pages_bulk() is invoked
> > with a valid NUMA node id.
> >
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
>
> Acked-by: Mel Gorman <mgorman@suse.de>
>
Thanks!
--
Vlad Rezki