* [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
@ 2021-05-12 20:29 akpm
  2021-05-12 22:56 ` Stephen Rothwell
  0 siblings, 1 reply; 17+ messages in thread
From: akpm @ 2021-05-12 20:29 UTC (permalink / raw)
  To: hdanton, mgorman, mhocko, mm-commits, npiggin,
	oleksiy.avramchenko, rostedt, sfr, urezki, willy


The patch titled
     Subject: mm/vmalloc: print a warning message first on failure
has been removed from the -mm tree.  Its filename was
     mm-vmalloc-print-a-warning-message-first-on-failure.patch

This patch was dropped because it had testing failures

------------------------------------------------------
From: Uladzislau Rezki <urezki@gmail.com>
Subject: mm/vmalloc: print a warning message first on failure

When the memory allocation for the array of pages does not succeed, emit a
warning message first and then perform the further cleanup.

The reason it should be done in the right order is that the cleanup function,
free_vm_area(), can potentially also follow its own error paths, which can
lead to confusion about what was broken first.

Link: https://lkml.kernel.org/r/20210510103342.GA2169@pc638.lan
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmalloc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/vmalloc.c~mm-vmalloc-print-a-warning-message-first-on-failure
+++ a/mm/vmalloc.c
@@ -2780,11 +2780,11 @@ static void *__vmalloc_area_node(struct
 	}
 
 	if (!area->pages) {
-		free_vm_area(area);
 		warn_alloc(gfp_mask, NULL,
 			   "vmalloc size %lu allocation failure: "
 			   "page array size %lu allocation failed",
 			   nr_small_pages * PAGE_SIZE, array_size);
+		free_vm_area(area);
 		return NULL;
 	}
 
_

Patches currently in -mm which might be from urezki@gmail.com are




* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-12 20:29 [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree akpm
@ 2021-05-12 22:56 ` Stephen Rothwell
  2021-05-13 10:31   ` Uladzislau Rezki
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Rothwell @ 2021-05-12 22:56 UTC (permalink / raw)
  To: akpm
  Cc: hdanton, mgorman, mhocko, mm-commits, npiggin,
	oleksiy.avramchenko, rostedt, urezki, willy


Hi Andrew,

On Wed, 12 May 2021 13:29:52 -0700 akpm@linux-foundation.org wrote:
>
> The patch titled
>      Subject: mm/vmalloc: print a warning message first on failure
> has been removed from the -mm tree.  Its filename was
>      mm-vmalloc-print-a-warning-message-first-on-failure.patch
> 
> This patch was dropped because it had testing failures

Removed from linux-next.

-- 
Cheers,
Stephen Rothwell



* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-12 22:56 ` Stephen Rothwell
@ 2021-05-13 10:31   ` Uladzislau Rezki
  2021-05-13 11:11     ` Mel Gorman
  0 siblings, 1 reply; 17+ messages in thread
From: Uladzislau Rezki @ 2021-05-13 10:31 UTC (permalink / raw)
  To: Stephen Rothwell, akpm
  Cc: akpm, hdanton, mgorman, mhocko, mm-commits, npiggin,
	oleksiy.avramchenko, rostedt, urezki, willy

On Thu, May 13, 2021 at 08:56:02AM +1000, Stephen Rothwell wrote:
> Hi Andrew,
> 
> On Wed, 12 May 2021 13:29:52 -0700 akpm@linux-foundation.org wrote:
> >
> > The patch titled
> >      Subject: mm/vmalloc: print a warning message first on failure
> > has been removed from the -mm tree.  Its filename was
> >      mm-vmalloc-print-a-warning-message-first-on-failure.patch
> > 
> > This patch was dropped because it had testing failures
> 
> Removed from linux-next.
> 
What kind of testing failures does it trigger? Where can I find the
details, logs or traces of it?

--
Vlad Rezki


* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-13 10:31   ` Uladzislau Rezki
@ 2021-05-13 11:11     ` Mel Gorman
  2021-05-13 12:46       ` Uladzislau Rezki
  0 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2021-05-13 11:11 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Stephen Rothwell, akpm, hdanton, mhocko, mm-commits, npiggin,
	oleksiy.avramchenko, rostedt, willy

On Thu, May 13, 2021 at 12:31:56PM +0200, Uladzislau Rezki wrote:
> On Thu, May 13, 2021 at 08:56:02AM +1000, Stephen Rothwell wrote:
> > Hi Andrew,
> > 
> > On Wed, 12 May 2021 13:29:52 -0700 akpm@linux-foundation.org wrote:
> > >
> > > The patch titled
> > >      Subject: mm/vmalloc: print a warning message first on failure
> > > has been removed from the -mm tree.  Its filename was
> > >      mm-vmalloc-print-a-warning-message-first-on-failure.patch
> > > 
> > > This patch was dropped because it had testing failures
> > 
> > Removed from linux-next.
> > 
> What kind of testing failures does it trigger? Where can I find the
> details, logs or traces of it?

https://lore.kernel.org/linux-next/20210512175359.17793d34@canb.auug.org.au/

-- 
Mel Gorman
SUSE Labs


* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-13 11:11     ` Mel Gorman
@ 2021-05-13 12:46       ` Uladzislau Rezki
  2021-05-13 13:24         ` Uladzislau Rezki
  0 siblings, 1 reply; 17+ messages in thread
From: Uladzislau Rezki @ 2021-05-13 12:46 UTC (permalink / raw)
  To: Mel Gorman, Stephen Rothwell, akpm
  Cc: Uladzislau Rezki, Stephen Rothwell, akpm, hdanton, mhocko,
	mm-commits, npiggin, oleksiy.avramchenko, rostedt, willy

On Thu, May 13, 2021 at 12:11:53PM +0100, Mel Gorman wrote:
> On Thu, May 13, 2021 at 12:31:56PM +0200, Uladzislau Rezki wrote:
> > On Thu, May 13, 2021 at 08:56:02AM +1000, Stephen Rothwell wrote:
> > > Hi Andrew,
> > > 
> > > On Wed, 12 May 2021 13:29:52 -0700 akpm@linux-foundation.org wrote:
> > > >
> > > > The patch titled
> > > >      Subject: mm/vmalloc: print a warning message first on failure
> > > > has been removed from the -mm tree.  Its filename was
> > > >      mm-vmalloc-print-a-warning-message-first-on-failure.patch
> > > > 
> > > > This patch was dropped because it had testing failures
> > > 
> > > Removed from linux-next.
> > > 
> > What kind of testing failures does it trigger? Where can I find the
> > details, logs or traces of it?
> 
> https://lore.kernel.org/linux-next/20210512175359.17793d34@canb.auug.org.au/
> 
Thanks, Mel.

OK. Now I see. The problem is with this patch:

mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()

<snip>
[    0.097819][    T1] BUG: Unable to handle kernel data access on read at 0x200000c0a
[    0.098533][    T1] Faulting instruction address: 0xc0000000003f6fa0
[    0.099044][    T1] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.099182][    T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[    0.099506][    T1] Modules linked in:
[    0.099896][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc1-00142-g6053672bb612 #12
[    0.100254][    T1] NIP:  c0000000003f6fa0 LR: c0000000003f6f68 CTR: 0000000000000000
[    0.100342][    T1] REGS: c0000000063a3480 TRAP: 0380   Not tainted  (5.13.0-rc1-00142-g6053672bb612)
[    0.100550][    T1] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 24402840  XER: 00000000
[    0.100900][    T1] CFAR: c0000000003f6f7c IRQMASK: 0 
[    0.100900][    T1] GPR00: c0000000003f6f68 c0000000063a3720 c00000000146b100 0000000000000000 
[    0.100900][    T1] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000002 
[    0.100900][    T1] GPR08: c0000000015219e8 0000000000000000 0000000200000c02 c000000006030010 
[    0.100900][    T1] GPR12: 0000000000008000 c000000001640000 0000000000000001 c000000000262f84 
[    0.100900][    T1] GPR16: c00a000000000000 c008000000000000 0000000000000dc0 0000000000000008 
[    0.100900][    T1] GPR20: 0000000000000522 0000000000010000 0000000000000cc0 c008000000000000 
[    0.100900][    T1] GPR24: 0000000000000001 0000000000000000 0000000000002cc2 0000000000000000 
[    0.100900][    T1] GPR28: 0000000000000000 0000000000000000 0000000200000c02 0000000000002cc2 
[    0.101927][    T1] NIP [c0000000003f6fa0] __alloc_pages+0x140/0x3f0
[    0.102733][    T1] LR [c0000000003f6f68] __alloc_pages+0x108/0x3f0
[    0.103032][    T1] Call Trace:
[    0.103165][    T1] [c0000000063a3720] [0000000000000900] 0x900 (unreliable)
[    0.103616][    T1] [c0000000063a37b0] [c0000000003f7810] __alloc_pages_bulk+0x5c0/0x840
[    0.103787][    T1] [c0000000063a3890] [c0000000003ecf74] __vmalloc_node_range+0x4c4/0x600
[    0.103871][    T1] [c0000000063a39b0] [c00000000004f598] module_alloc+0x58/0x70
[    0.103962][    T1] [c0000000063a3a20] [c000000000262f84] alloc_insn_page+0x24/0x40
[    0.104046][    T1] [c0000000063a3a40] [c00000000026629c] __get_insn_slot+0x1dc/0x280
[    0.104143][    T1] [c0000000063a3a80] [c00000000005770c] arch_prepare_kprobe+0x15c/0x1f0
[    0.104290][    T1] [c0000000063a3b00] [c000000000267880] register_kprobe+0x6d0/0x850
[    0.104392][    T1] [c0000000063a3b60] [c00000000108fe2c] arch_init_kprobes+0x28/0x3c
[    0.104524][    T1] [c0000000063a3b80] [c0000000010addb0] init_kprobes+0x120/0x174
[    0.104629][    T1] [c0000000063a3bf0] [c000000000012190] do_one_initcall+0x60/0x2c0
[    0.104722][    T1] [c0000000063a3cc0] [c0000000010845a0] kernel_init_freeable+0x1bc/0x3a0
[    0.104826][    T1] [c0000000063a3da0] [c000000000012764] kernel_init+0x2c/0x168
[    0.104911][    T1] [c0000000063a3e10] [c00000000000d5ec] ret_from_kernel_thread+0x5c/0x70
[    0.105178][    T1] Instruction dump:
[    0.105516][    T1] 40920018 57e9efbe 2c090001 4082000c 63050080 78b80020 e8a10028 57e9a7fe 
[    0.105759][    T1] 7fcaf378 99210040 2c250000 408201f4 <813e0008> 7c09c840 418101e8 57e50528 
[    0.107188][    T1] ---[ end trace 9bd7c2fac4d061e2 ]---
[    0.107319][    T1] 
[    1.108818][    T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
<snip>

So during the boot process, when the module is about to be loaded, the vmalloc
allocation fails in __alloc_pages_bulk().

I will try to reproduce it. It would be good to get a kernel config.
Any thoughts about it would be appreciated.

Thanks!

--
Vlad Rezki


* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-13 12:46       ` Uladzislau Rezki
@ 2021-05-13 13:24         ` Uladzislau Rezki
  2021-05-13 14:18           ` Mel Gorman
  0 siblings, 1 reply; 17+ messages in thread
From: Uladzislau Rezki @ 2021-05-13 13:24 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Stephen Rothwell, akpm, hdanton, mhocko, mm-commits, npiggin,
	oleksiy.avramchenko, rostedt, willy

On Thu, May 13, 2021 at 02:46:05PM +0200, Uladzislau Rezki wrote:
> On Thu, May 13, 2021 at 12:11:53PM +0100, Mel Gorman wrote:
> > On Thu, May 13, 2021 at 12:31:56PM +0200, Uladzislau Rezki wrote:
> > > On Thu, May 13, 2021 at 08:56:02AM +1000, Stephen Rothwell wrote:
> > > > Hi Andrew,
> > > > 
> > > > On Wed, 12 May 2021 13:29:52 -0700 akpm@linux-foundation.org wrote:
> > > > >
> > > > > The patch titled
> > > > >      Subject: mm/vmalloc: print a warning message first on failure
> > > > > has been removed from the -mm tree.  Its filename was
> > > > >      mm-vmalloc-print-a-warning-message-first-on-failure.patch
> > > > > 
> > > > > This patch was dropped because it had testing failures
> > > > 
> > > > Removed from linux-next.
> > > > 
> > > What kind of testing failures does it trigger? Where can I find the
> > > details, logs or traces of it?
> > 
> > https://lore.kernel.org/linux-next/20210512175359.17793d34@canb.auug.org.au/
> > 
> Thanks, Mel.
> 
> OK. Now i see. The problem is with this patch:
> 
> mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()
> 
> <snip>
> [    0.097819][    T1] BUG: Unable to handle kernel data access on read at 0x200000c0a
> [    0.098533][    T1] Faulting instruction address: 0xc0000000003f6fa0
> [    0.099044][    T1] Oops: Kernel access of bad area, sig: 11 [#1]
> [    0.099182][    T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [    0.099506][    T1] Modules linked in:
> [    0.099896][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc1-00142-g6053672bb612 #12
> [    0.100254][    T1] NIP:  c0000000003f6fa0 LR: c0000000003f6f68 CTR: 0000000000000000
> [    0.100342][    T1] REGS: c0000000063a3480 TRAP: 0380   Not tainted  (5.13.0-rc1-00142-g6053672bb612)
> [    0.100550][    T1] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 24402840  XER: 00000000
> [    0.100900][    T1] CFAR: c0000000003f6f7c IRQMASK: 0 
> [    0.100900][    T1] GPR00: c0000000003f6f68 c0000000063a3720 c00000000146b100 0000000000000000 
> [    0.100900][    T1] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000002 
> [    0.100900][    T1] GPR08: c0000000015219e8 0000000000000000 0000000200000c02 c000000006030010 
> [    0.100900][    T1] GPR12: 0000000000008000 c000000001640000 0000000000000001 c000000000262f84 
> [    0.100900][    T1] GPR16: c00a000000000000 c008000000000000 0000000000000dc0 0000000000000008 
> [    0.100900][    T1] GPR20: 0000000000000522 0000000000010000 0000000000000cc0 c008000000000000 
> [    0.100900][    T1] GPR24: 0000000000000001 0000000000000000 0000000000002cc2 0000000000000000 
> [    0.100900][    T1] GPR28: 0000000000000000 0000000000000000 0000000200000c02 0000000000002cc2 
> [    0.101927][    T1] NIP [c0000000003f6fa0] __alloc_pages+0x140/0x3f0
> [    0.102733][    T1] LR [c0000000003f6f68] __alloc_pages+0x108/0x3f0
> [    0.103032][    T1] Call Trace:
> [    0.103165][    T1] [c0000000063a3720] [0000000000000900] 0x900 (unreliable)
> [    0.103616][    T1] [c0000000063a37b0] [c0000000003f7810] __alloc_pages_bulk+0x5c0/0x840
> [    0.103787][    T1] [c0000000063a3890] [c0000000003ecf74] __vmalloc_node_range+0x4c4/0x600
> [    0.103871][    T1] [c0000000063a39b0] [c00000000004f598] module_alloc+0x58/0x70
> [    0.103962][    T1] [c0000000063a3a20] [c000000000262f84] alloc_insn_page+0x24/0x40
> [    0.104046][    T1] [c0000000063a3a40] [c00000000026629c] __get_insn_slot+0x1dc/0x280
> [    0.104143][    T1] [c0000000063a3a80] [c00000000005770c] arch_prepare_kprobe+0x15c/0x1f0
> [    0.104290][    T1] [c0000000063a3b00] [c000000000267880] register_kprobe+0x6d0/0x850
> [    0.104392][    T1] [c0000000063a3b60] [c00000000108fe2c] arch_init_kprobes+0x28/0x3c
> [    0.104524][    T1] [c0000000063a3b80] [c0000000010addb0] init_kprobes+0x120/0x174
> [    0.104629][    T1] [c0000000063a3bf0] [c000000000012190] do_one_initcall+0x60/0x2c0
> [    0.104722][    T1] [c0000000063a3cc0] [c0000000010845a0] kernel_init_freeable+0x1bc/0x3a0
> [    0.104826][    T1] [c0000000063a3da0] [c000000000012764] kernel_init+0x2c/0x168
> [    0.104911][    T1] [c0000000063a3e10] [c00000000000d5ec] ret_from_kernel_thread+0x5c/0x70
> [    0.105178][    T1] Instruction dump:
> [    0.105516][    T1] 40920018 57e9efbe 2c090001 4082000c 63050080 78b80020 e8a10028 57e9a7fe 
> [    0.105759][    T1] 7fcaf378 99210040 2c250000 408201f4 <813e0008> 7c09c840 418101e8 57e50528 
> [    0.107188][    T1] ---[ end trace 9bd7c2fac4d061e2 ]---
> [    0.107319][    T1] 
> [    1.108818][    T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> <snip>
> 
> So during the boot process, when the module is about to be loaded, the vmalloc
> allocation fails in __alloc_pages_bulk().
> 
> I will try to reproduce it. It would be good to get a kernel config.
> Any thoughts about it would be appreciated.
> 

I see that on the target machine, when the problem occurs, the PAGE_SIZE is 64K.

Can it be somehow connected to that? Also, one question, just guessing: the crash
happens during boot, so is __alloc_pages_bulk() fully initialized by that time?

Thanks!

--
Vlad Rezki


* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-13 13:24         ` Uladzislau Rezki
@ 2021-05-13 14:18           ` Mel Gorman
       [not found]             ` <CA+KHdyXwdkosDYk4bKtRLVodrwUJnq3NN39xuRQzKJSPTn7+bQ@mail.gmail.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2021-05-13 14:18 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Stephen Rothwell, akpm, hdanton, mhocko, mm-commits, npiggin,
	oleksiy.avramchenko, rostedt, willy

On Thu, May 13, 2021 at 03:24:18PM +0200, Uladzislau Rezki wrote:
> > > > What kind of testing failures does it trigger? Where can I find the
> > > > details, logs or traces of it?
> > > 
> > > https://lore.kernel.org/linux-next/20210512175359.17793d34@canb.auug.org.au/
> > > 
> > Thanks, Mel.
> > 
> > OK. Now i see. The problem is with this patch:
> > 
> > mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()
> > 
> > <SNIP>
> > 
> > So during the boot process, when the module is about to be loaded, the vmalloc
> > allocation fails in __alloc_pages_bulk().
> > 
> > I will try to reproduce it. It would be good to get a kernel config.
> > Any thoughts about it would be appreciated.
> > 
> 
> I see that on the target machine, when the problem occurs, the PAGE_SIZE is 64K.
> 
> Can it be somehow connected to that? Also, one question, just guessing: the crash
> happens during boot, so is __alloc_pages_bulk() fully initialized by that time?
> 

A boot test on KVM using my distribution config (openSUSE Leap 15.2)
failed on x86-64 so it's not related to PAGE_SIZE.

-- 
Mel Gorman
SUSE Labs


* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
       [not found]             ` <CA+KHdyXwdkosDYk4bKtRLVodrwUJnq3NN39xuRQzKJSPTn7+bQ@mail.gmail.com>
@ 2021-05-13 15:51               ` Mel Gorman
  2021-05-13 20:18                 ` Uladzislau Rezki
  0 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2021-05-13 15:51 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Stephen Rothwell, Andrew Morton, Hillf Danton, Michal Hocko,
	mm-commits, Nicholas Piggin, Oleksiy Avramchenko, Steven Rostedt,
	Matthew Wilcox


On Thu, May 13, 2021 at 05:29:05PM +0200, Uladzislau Rezki wrote:
> Could you please send your config? I will try to reproduce with it.
> 

Attached.

-- 
Mel Gorman
SUSE Labs

[-- Attachment #2: config-bulkvmalloc.gz --]
[-- Type: application/x-gzip, Size: 59818 bytes --]


* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-13 15:51               ` Mel Gorman
@ 2021-05-13 20:18                 ` Uladzislau Rezki
  2021-05-14 10:19                   ` Mel Gorman
  0 siblings, 1 reply; 17+ messages in thread
From: Uladzislau Rezki @ 2021-05-13 20:18 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Uladzislau Rezki, Stephen Rothwell, Andrew Morton, Hillf Danton,
	Michal Hocko, mm-commits, Nicholas Piggin, Oleksiy Avramchenko,
	Steven Rostedt, Matthew Wilcox

On Thu, May 13, 2021 at 04:51:33PM +0100, Mel Gorman wrote:
> On Thu, May 13, 2021 at 05:29:05PM +0200, Uladzislau Rezki wrote:
> > Could you please send your config? I will try to reproduce with it.
> > 
> 
> Attached.
> 
Thanks.

With your .config file I am able to reproduce the kernel panic. When
a single page is requested, __alloc_pages_bulk() falls back to the single
page allocator:

<snip>
    /* Use the single page allocator for one page. */
    if (nr_pages - nr_populated == 1)
        goto failed;

...

failed:
    page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
    if (page) {
        if (page_list)
            list_add(&page->lru, page_list);
        else
            page_array[nr_populated] = page;
        nr_populated++;
    }

    return nr_populated;
<snip>

From the trace I get:

<snip>
[    0.243916] RIP: 0010:__alloc_pages+0x11e/0x310
[    0.243916] Code: 84 c0 0f 85 02 01 00 00 89 d8 48 8b 54 24 08 8b 74 24 1c c1 e8 0c 83 e0 01 88 44 24 20 48 8b 04 24 48 85 d2 0f 85 71 01 00 00 <3b> 70 08 0f 82 68 01 00 00 48 89 44 24 10 48 8b 00 89 da 81 e2 00
[    0.243916] RSP: 0000:ffffffffae803c38 EFLAGS: 00010246
[    0.243916] RAX: 0000000000001cc0 RBX: 0000000000002102 RCX: 0000000000000004
[    0.243916] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002102
[    0.243916] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffffdfff
[    0.243916] R10: 0000000000000001 R11: ffffffffae803ac0 R12: 0000000000000000
[    0.243916] R13: 0000000000002102 R14: 0000000000000001 R15: ffffa0938000d000
[    0.243916] FS:  0000000000000000(0000) GS:ffff893ab7c00000(0000) knlGS:0000000000000000
[    0.243916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.243916] CR2: 0000000000001cc8 CR3: 0000000176e10000 CR4: 00000000000006b0
[    0.243916] Call Trace:
[    0.243916]  __alloc_pages_bulk+0xaa1/0xb50
<snip>

(gdb) l *__alloc_pages+0x11e
0xffffffff8129d87e is in __alloc_pages (./include/linux/mmzone.h:1095).
1090            return zoneref->zone;
1091    }
1092
1093    static inline int zonelist_zone_idx(struct zoneref *zoneref)
1094    {
1095            return zoneref->zone_idx;
1096    }
1097
1098    static inline int zonelist_node_idx(struct zoneref *zoneref)
1099    {
(gdb)

Seems like "zoneref" refers to invalid address.

Thoughts?

--
Vlad Rezki


* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-13 20:18                 ` Uladzislau Rezki
@ 2021-05-14 10:19                   ` Mel Gorman
  2021-05-14 11:45                     ` Uladzislau Rezki
  0 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2021-05-14 10:19 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Stephen Rothwell, Andrew Morton, Hillf Danton, Michal Hocko,
	mm-commits, Nicholas Piggin, Oleksiy Avramchenko, Steven Rostedt,
	Matthew Wilcox

On Thu, May 13, 2021 at 10:18:51PM +0200, Uladzislau Rezki wrote:
> <SNIP>
>
> From the trace i get:
> 
> <snip>
> [    0.243916] RIP: 0010:__alloc_pages+0x11e/0x310
> [    0.243916] Code: 84 c0 0f 85 02 01 00 00 89 d8 48 8b 54 24 08 8b 74 24 1c c1 e8 0c 83 e0 01 88 44 24 20 48 8b 04 24 48 85 d2 0f 85 71 01 00 00 <3b> 70 08 0f 82 68 01 00 00 48 89 44 24 10 48 8b 00 89 da 81 e2 00
> [    0.243916] RSP: 0000:ffffffffae803c38 EFLAGS: 00010246
> [    0.243916] RAX: 0000000000001cc0 RBX: 0000000000002102 RCX: 0000000000000004
> [    0.243916] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002102
> [    0.243916] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffffdfff
> [    0.243916] R10: 0000000000000001 R11: ffffffffae803ac0 R12: 0000000000000000
> [    0.243916] R13: 0000000000002102 R14: 0000000000000001 R15: ffffa0938000d000
> [    0.243916] FS:  0000000000000000(0000) GS:ffff893ab7c00000(0000) knlGS:0000000000000000
> [    0.243916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.243916] CR2: 0000000000001cc8 CR3: 0000000176e10000 CR4: 00000000000006b0
> [    0.243916] Call Trace:
> [    0.243916]  __alloc_pages_bulk+0xaa1/0xb50
> <snip>
> 
> (gdb) l *__alloc_pages+0x11e
> 0xffffffff8129d87e is in __alloc_pages (./include/linux/mmzone.h:1095).
> 1090            return zoneref->zone;
> 1091    }
> 1092
> 1093    static inline int zonelist_zone_idx(struct zoneref *zoneref)
> 1094    {
> 1095            return zoneref->zone_idx;
> 1096    }
> 1097
> 1098    static inline int zonelist_node_idx(struct zoneref *zoneref)
> 1099    {
> (gdb)
> 
> Seems like "zoneref" refers to invalid address.
> 
> Thoughts?

I have not previously read the patch but there are a few concerns, and it's
probably just as well this blew up early. The bulk allocator assumes a
valid node but the patch can send in NUMA_NO_NODE (-1). On the high-order
path alloc_pages_node is used, which checks nid == NUMA_NO_NODE. Also,
area->pages is not necessarily initialised, so that could be interpreted
as a partially populated array, so minimally you need:

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f3c5dd4ccc5b9..b638ff31b07e1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2792,6 +2792,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	page_order = vm_area_page_order(area);
 
 	if (!page_order) {
+		memset(area->pages, 0, array_size);
+		if (node == NUMA_NO_NODE)
+			node = numa_mem_id();
 		area->nr_pages = __alloc_pages_bulk(gfp_mask, node,
 			NULL, nr_small_pages, NULL, area->pages);
 	} else {

However, the high-order path also looks suspicious. area->nr_pages is
advanced before the allocation attempt so in the event alloc_pages_node()
returns NULL prematurely, area->nr_pages does not reflect the number of
pages allocated so that needs examination.

As an aside, where or what is test_vmalloc.sh? It appears to have been
used a few times but it's not clear it's representative so are you aware
of workloads that are vmalloc-intensive? It does not matter for the
patch as such but it would be nice to know examples of vmalloc-intensive
workloads because I cannot recall a time during the last few years where
I saw vmalloc.c high in profiles.

-- 
Mel Gorman
SUSE Labs


* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-14 10:19                   ` Mel Gorman
@ 2021-05-14 11:45                     ` Uladzislau Rezki
  2021-05-14 13:45                       ` Mel Gorman
  0 siblings, 1 reply; 17+ messages in thread
From: Uladzislau Rezki @ 2021-05-14 11:45 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Uladzislau Rezki, Stephen Rothwell, Andrew Morton, Hillf Danton,
	Michal Hocko, mm-commits, Nicholas Piggin, Oleksiy Avramchenko,
	Steven Rostedt, Matthew Wilcox

On Fri, May 14, 2021 at 11:19:20AM +0100, Mel Gorman wrote:
> On Thu, May 13, 2021 at 10:18:51PM +0200, Uladzislau Rezki wrote:
> > <SNIP>
> >
> > From the trace i get:
> > 
> > <snip>
> > [    0.243916] RIP: 0010:__alloc_pages+0x11e/0x310
> > [    0.243916] Code: 84 c0 0f 85 02 01 00 00 89 d8 48 8b 54 24 08 8b 74 24 1c c1 e8 0c 83 e0 01 88 44 24 20 48 8b 04 24 48 85 d2 0f 85 71 01 00 00 <3b> 70 08 0f 82 68 01 00 00 48 89 44 24 10 48 8b 00 89 da 81 e2 00
> > [    0.243916] RSP: 0000:ffffffffae803c38 EFLAGS: 00010246
> > [    0.243916] RAX: 0000000000001cc0 RBX: 0000000000002102 RCX: 0000000000000004
> > [    0.243916] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002102
> > [    0.243916] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffffdfff
> > [    0.243916] R10: 0000000000000001 R11: ffffffffae803ac0 R12: 0000000000000000
> > [    0.243916] R13: 0000000000002102 R14: 0000000000000001 R15: ffffa0938000d000
> > [    0.243916] FS:  0000000000000000(0000) GS:ffff893ab7c00000(0000) knlGS:0000000000000000
> > [    0.243916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    0.243916] CR2: 0000000000001cc8 CR3: 0000000176e10000 CR4: 00000000000006b0
> > [    0.243916] Call Trace:
> > [    0.243916]  __alloc_pages_bulk+0xaa1/0xb50
> > <snip>
> > 
> > (gdb) l *__alloc_pages+0x11e
> > 0xffffffff8129d87e is in __alloc_pages (./include/linux/mmzone.h:1095).
> > 1090            return zoneref->zone;
> > 1091    }
> > 1092
> > 1093    static inline int zonelist_zone_idx(struct zoneref *zoneref)
> > 1094    {
> > 1095            return zoneref->zone_idx;
> > 1096    }
> > 1097
> > 1098    static inline int zonelist_node_idx(struct zoneref *zoneref)
> > 1099    {
> > (gdb)
> > 
> > Seems like "zoneref" refers to invalid address.
> > 
> > Thoughts?
> 
> I have not previously read the patch but there are a few concerns and it's
> probably just as well this blew up early. The bulk allocator assumes a
> valid node but the patch can send in NUMA_NO_NODE (-1). 
>
Should the bulk allocator handle NUMA_NO_NODE on its own? I mean, instead
of the user handling it, the allocator itself would fix it up if NUMA_NO_NODE is passed.

>
> On the high-order path alloc_pages_node is used which checks nid == NUMA_NO_NODE.
> Also, area->pages is not necessarily initialised so that could be interpreted
> as a partially populated array so minimally you need.
>
area->pages is already zeroed, because __GFP_ZERO is used when allocating the array.

> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index f3c5dd4ccc5b9..b638ff31b07e1 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2792,6 +2792,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	page_order = vm_area_page_order(area);
>  
>  	if (!page_order) {
> +		memset(area->pages, 0, array_size);
This memset is not needed as explained above.

> +		if (node == NUMA_NO_NODE)
> +			node = numa_mem_id();
This patch works. Again, should this be handled by the allocator itself?

>  		area->nr_pages = __alloc_pages_bulk(gfp_mask, node,
>  			NULL, nr_small_pages, NULL, area->pages);
>  	} else {
> 
> However, the high-order path also looks suspicious. area->nr_pages is
> advanced before the allocation attempt so in the event alloc_pages_node()
> returns NULL prematurely, area->nr_pages does not reflect the number of
> pages allocated so that needs examination.
> 
<snip>
    for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
        area->nr_pages += 1U << page_order) {
<snip>

If alloc_pages_node() fails, we break out of the loop. area->nr_pages is initialized
inside the for(...) loop, thus it will be zero if the single page allocator
fails on the first iteration.

Or am I missing your point?

> As an aside, where or what is test_vmalloc.sh? It appears to have been
> used a few times but it's not clear it's representative so are you aware
> of workloads that are vmalloc-intensive? It does not matter for the
> patch as such but it would be nice to know examples of vmalloc-intensive
> workloads because I cannot recall a time during the last few years where
> I saw vmalloc.c high in profiles.
> 
test_vmalloc.sh is a shell script that is used for stressing and testing the
vmalloc subsystem, as well as for performance evaluation. You can find it here:

./tools/testing/selftests/vm/test_vmalloc.sh

As for workloads: mostly the ones which are critical to time and latency, for
example audio/video, especially in the mobile area. I did a big rework of
the KVA allocator because I found it not optimal in terms of allocation time.

To be more specific, high resolution audio playback was suffering from
glitches due to long allocation times and preemption issues.

--
Vlad Rezki


* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-14 11:45                     ` Uladzislau Rezki
@ 2021-05-14 13:45                       ` Mel Gorman
  2021-05-14 14:50                         ` Uladzislau Rezki
  0 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2021-05-14 13:45 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Stephen Rothwell, Andrew Morton, Hillf Danton, Michal Hocko,
	mm-commits, Nicholas Piggin, Oleksiy Avramchenko, Steven Rostedt,
	Matthew Wilcox

On Fri, May 14, 2021 at 01:45:43PM +0200, Uladzislau Rezki wrote:
> > > Seems like "zoneref" refers to invalid address.
> > > 
> > > Thoughts?
> > 
> > I have not previously read the patch but there are a few concerns and it's
> > probably just as well this blew up early. The bulk allocator assumes a
> > valid node but the patch can send in NUMA_NO_NODE (-1). 
> >
>
> Should the bulk-allocator handle the NUMA_NO_NODE on its own? I mean instead
> of handling by user the allocator itself fixes it if NUMA_NO_NODE is passed.
> 

No, for API similarity reasons. __alloc_pages_bulk is the bulk API
equivalent of __alloc_pages() and both expect valid node IDs. vmalloc
is using alloc_pages_node for high-order pages, which first checks
the node ID, so your options are to check it within vmalloc.c or to add an
alloc_pages_node_bulk helper that is API-equivalent to alloc_pages_node
as a prerequisite to your patch.

> >
> > On the high-order path alloc_pages_node is used which checks nid == NUMA_NO_NODE.
> > Also, area->pages is not necessarily initialised so that could be interpreted
> > as a partially populated array so minmally you need.
> >
>
> > area->pages is already zeroed, because __GFP_ZERO is used when allocating the array.
> 

Ah, yes.

> > However, the high-order path also looks suspicious. area->nr_pages is
> > advanced before the allocation attempt so in the event alloc_pages_node()
> > returns NULL prematurely, area->nr_pages does not reflect the number of
> > pages allocated so that needs examination.
> > 
> <snip>
>     for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
>         area->nr_pages += 1U << page_order) {
> <snip>
> 
> if alloc_pages_node() fails we break the loop. area->nr_pages is initialized
> inside the for(...) loop, thus it will be zero if the single page allocator
> fails on a first iteration.
> 
> Or i miss your point?
> 

At the time of the break, area->nr_pages += 1U << page_order has already
happened before the allocation failure. That looks very suspicious.

> > As an aside, where or what is test_vmalloc.sh? It appears to have been
> > used a few times but it's not clear it's representative so are you aware
> > of workloads that are vmalloc-intensive? It does not matter for the
> > patch as such but it would be nice to know examples of vmalloc-intensive
> > workloads because I cannot recall a time during the last few years where
> > I saw vmalloc.c high in profiles.
> > 
> test_vmalloc.sh is a shell script that is used for stressing and testing a
> vmalloc subsystem as well as performance evaluation. You can find it here:
> 
> ./tools/testing/selftests/vm/test_vmalloc.sh
> 

Thanks.

> As for workloads. Most of them which are critical to time and latency. For
> example audio/video, especially in the mobile area. I did a big rework of
> the KVA allocator because i found it not optimal to allocation time.
> 

Can you give an example benchmark that triggers it or is it somewhat
specific to mobile platforms with drivers that use vmalloc heavily?

-- 
Mel Gorman
SUSE Labs


* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-14 13:45                       ` Mel Gorman
@ 2021-05-14 14:50                         ` Uladzislau Rezki
  2021-05-14 15:41                           ` Mel Gorman
  0 siblings, 1 reply; 17+ messages in thread
From: Uladzislau Rezki @ 2021-05-14 14:50 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Uladzislau Rezki, Stephen Rothwell, Andrew Morton, Hillf Danton,
	Michal Hocko, mm-commits, Nicholas Piggin, Oleksiy Avramchenko,
	Steven Rostedt, Matthew Wilcox

> On Fri, May 14, 2021 at 01:45:43PM +0200, Uladzislau Rezki wrote:
> > > > Seems like "zoneref" refers to invalid address.
> > > > 
> > > > Thoughts?
> > > 
> > > I have not previously read the patch but there are a few concerns and it's
> > > probably just as well this blew up early. The bulk allocator assumes a
> > > valid node but the patch can send in NUMA_NO_NODE (-1). 
> > >
> >
> > Should the bulk-allocator handle the NUMA_NO_NODE on its own? I mean instead
> > of handling by user the allocator itself fixes it if NUMA_NO_NODE is passed.
> > 
> 
> No for API similarity reasons. __alloc_pages_bulk is the API bulk
> equivalent to __alloc_pages() and both expect valid node IDs. vmalloc
> is using alloc_pages_node for high-order pages which first checks
> the node ID so your options are to check it within vmalloc.c or add a
> alloc_pages_node_bulk helper that is API equivalent to alloc_pages_node
> as a prerequisite to your patch.
> 
OK. Thanks.

> > > On the high-order path alloc_pages_node is used which checks nid == NUMA_NO_NODE.
> > > Also, area->pages is not necessarily initialised so that could be interpreted
> > > as a partially populated array so minimally you need.
> > >
> >
> > > area->pages is already zeroed, because __GFP_ZERO is used when allocating the array.
> > 
> 
> Ah, yes.
> 
> > > However, the high-order path also looks suspicious. area->nr_pages is
> > > advanced before the allocation attempt so in the event alloc_pages_node()
> > > returns NULL prematurely, area->nr_pages does not reflect the number of
> > > pages allocated so that needs examination.
> > > 
> > <snip>
> >     for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
> >         area->nr_pages += 1U << page_order) {
> > <snip>
> > 
> > if alloc_pages_node() fails we break the loop. area->nr_pages is initialized
> > inside the for(...) loop, thus it will be zero if the single page allocator
> > fails on a first iteration.
> > 
> > Or i miss your point?
> > 
> 
> At the time of the break, area->nr_pages += 1U << page_order happened
> before the allocation failure happens. That looks very suspicious.
> 
The "for" loop does not work that way. If you break the loop the
"area->nr_pages += 1U << page_order" or an "increment" is not increased.
It is increased only after the body of the "for" loop executes and it
goes to next iteration.
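
To illustrate, here is a tiny standalone C example, nothing kernel specific,
just the C semantics of break skipping the increment step (the names here are
made up for the example, they stand in for area->nr_pages and nr_small_pages):

<snip>
#include <stdio.h>

int main(void)
{
	unsigned int nr_pages = 0;	/* stands in for area->nr_pages */
	unsigned int nr_small_pages = 8;	/* stands in for nr_small_pages */

	for (nr_pages = 0; nr_pages < nr_small_pages; nr_pages += 2) {
		if (nr_pages == 4)
			break;	/* the += 2 increment is skipped, nr_pages stays 4 */
	}

	printf("%u\n", nr_pages);	/* prints 4, i.e. only what was "allocated" */
	return 0;
}
<snip>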

> > > As an aside, where or what is test_vmalloc.sh? It appears to have been
> > > used a few times but it's not clear it's representative so are you aware
> > > of workloads that are vmalloc-intensive? It does not matter for the
> > > patch as such but it would be nice to know examples of vmalloc-intensive
> > > workloads because I cannot recall a time during the last few years where
> > > I saw vmalloc.c high in profiles.
> > > 
> > test_vmalloc.sh is a shell script that is used for stressing and testing a
> > vmalloc subsystem as well as performance evaluation. You can find it here:
> > 
> > ./tools/testing/selftests/vm/test_vmalloc.sh
> > 
> 
> Thanks.
> 
> > As for workloads. Most of them which are critical to time and latency. For
> > example audio/video, especially in the mobile area. I did a big rework of
> > the KVA allocator because i found it not optimal to allocation time.
> > 
> 
> Can you give an example benchmark that triggers it or is it somewhat
> specific to mobile platforms with drivers that use vmalloc heavily?
> 
See below an example of audio glitches. That was related to our phones
and audio workloads:

# Explanation is here
wget ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt

# Audio 10 seconds sample is here.
# The drop occurs at 00:09.295 you can hear it
wget ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav

Apart from that, a slow allocation can cause two types of issues. The first one
is direct: when, for example, a high-priority RT thread does some allocation
to pass data to the DSP, long latency causes a delay of the data passed to
the DSP. This is the drivers area.

Another example is when a task is doing an allocation and an RT task is
placed onto the same CPU. In that case a long preemption-off (milliseconds)
section can lead to starvation of the RT task. For mobile devices it is the UI
stack where RT tasks are used. As a result we see frame drops.

All such issues have been solved after a rework:

wget ftp://vps418301.ovh.net/incoming/Reworking_of_KVA_allocator_in_Linux_kernel.pdf

--
Vlad Rezki


* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-14 14:50                         ` Uladzislau Rezki
@ 2021-05-14 15:41                           ` Mel Gorman
  2021-05-14 17:16                             ` Uladzislau Rezki
  0 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2021-05-14 15:41 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Stephen Rothwell, Andrew Morton, Hillf Danton, Michal Hocko,
	mm-commits, Nicholas Piggin, Oleksiy Avramchenko, Steven Rostedt,
	Matthew Wilcox

On Fri, May 14, 2021 at 04:50:26PM +0200, Uladzislau Rezki wrote:
> > > > However, the high-order path also looks suspicious. area->nr_pages is
> > > > advanced before the allocation attempt so in the event alloc_pages_node()
> > > > returns NULL prematurely, area->nr_pages does not reflect the number of
> > > > pages allocated so that needs examination.
> > > > 
> > > <snip>
> > >     for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
> > >         area->nr_pages += 1U << page_order) {
> > > <snip>
> > > 
> > > if alloc_pages_node() fails we break the loop. area->nr_pages is initialized
> > > inside the for(...) loop, thus it will be zero if the single page allocator
> > > fails on a first iteration.
> > > 
> > > Or i miss your point?
> > > 
> > 
> > At the time of the break, area->nr_pages += 1U << page_order happened
> > before the allocation failure happens. That looks very suspicious.
> > 
> The "for" loop does not work that way. If you break the loop the
> "area->nr_pages += 1U << page_order" or an "increment" is not increased.
> It is increased only after the body of the "for" loop executes and it
> goes to next iteration.
> 

Yeah, I don't know what I was thinking -- not properly anyway.

> > > As for workloads. Most of them which are critical to time and latency. For
> > > example audio/video, especially in the mobile area. I did a big rework of
> > > the KVA allocator because i found it not optimal to allocation time.
> > > 
> > 
> > Can you give an example benchmark that triggers it or is it somewhat
> > specific to mobile platforms with drivers that use vmalloc heavily?
> > 
>
> See below an example of audio glitches. That was related to our phones
> and audio workloads:
> 
> # Explanation is here
> wget ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt
> 
> # Audio 10 seconds sample is here.
> # The drop occurs at 00:09.295 you can hear it
> wget ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav
> 
> Apart from that, a slow allocation can cause two types of issues. The first one
> is direct: when, for example, a high-priority RT thread does some allocation
> to pass data to the DSP, long latency causes a delay of the data passed to
> the DSP. This is the drivers area.
> 
> Another example is when a task is doing an allocation and an RT task is
> placed onto the same CPU. In that case a long preemption-off (milliseconds)
> section can lead to starvation of the RT task. For mobile devices it is the UI
> stack where RT tasks are used. As a result we see frame drops.
> 
> All such issues have been solved after a rework:
> 
> wget ftp://vps418301.ovh.net/incoming/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> 

Thanks. That was enough for me to look into what sort of general
workload would be affected. Mostly it's driver-specific. A lot of the users
that would be potentially hot are already using kvmalloc, so it's probably not
worth the effort, and test_vmalloc.sh makes sense.

-- 
Mel Gorman
SUSE Labs


* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-14 15:41                           ` Mel Gorman
@ 2021-05-14 17:16                             ` Uladzislau Rezki
  2021-05-16 17:17                               ` Mel Gorman
  0 siblings, 1 reply; 17+ messages in thread
From: Uladzislau Rezki @ 2021-05-14 17:16 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Uladzislau Rezki, Stephen Rothwell, Andrew Morton, Hillf Danton,
	Michal Hocko, mm-commits, Nicholas Piggin, Oleksiy Avramchenko,
	Steven Rostedt, Matthew Wilcox

On Fri, May 14, 2021 at 04:41:53PM +0100, Mel Gorman wrote:
> On Fri, May 14, 2021 at 04:50:26PM +0200, Uladzislau Rezki wrote:
> > > > > However, the high-order path also looks suspicious. area->nr_pages is
> > > > > advanced before the allocation attempt so in the event alloc_pages_node()
> > > > > returns NULL prematurely, area->nr_pages does not reflect the number of
> > > > > pages allocated so that needs examination.
> > > > > 
> > > > <snip>
> > > >     for (area->nr_pages = 0; area->nr_pages < nr_small_pages;
> > > >         area->nr_pages += 1U << page_order) {
> > > > <snip>
> > > > 
> > > > if alloc_pages_node() fails we break the loop. area->nr_pages is initialized
> > > > inside the for(...) loop, thus it will be zero if the single page allocator
> > > > fails on a first iteration.
> > > > 
> > > > Or i miss your point?
> > > > 
> > > 
> > > At the time of the break, area->nr_pages += 1U << page_order happened
> > > before the allocation failure happens. That looks very suspicious.
> > > 
> > The "for" loop does not work that way. If you break the loop the
> > "area->nr_pages += 1U << page_order" or an "increment" is not increased.
> > It is increased only after the body of the "for" loop executes and it
> > goes to next iteration.
> > 
> 
> Yeah, I don't know what I was thinking -- not properly anyway.
> 
Hm... to me it looks correct enough. I will see what I can do with it.

> > > > As for workloads. Most of them which are critical to time and latency. For
> > > > example audio/video, especially in the mobile area. I did a big rework of
> > > > the KVA allocator because i found it not optimal to allocation time.
> > > > 
> > > 
> > > Can you give an example benchmark that triggers it or is it somewhat
> > > specific to mobile platforms with drivers that use vmalloc heavily?
> > > 
> >
> > See below an example of audio glitches. That was related to our phones
> > and audio workloads:
> > 
> > # Explanation is here
> > wget ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt
> > 
> > # Audio 10 seconds sample is here.
> > # The drop occurs at 00:09.295 you can hear it
> > wget ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav
> > 
> > Apart from that, a slow allocation can cause two types of issues. The first one
> > is direct: when, for example, a high-priority RT thread does some allocation
> > to pass data to the DSP, long latency causes a delay of the data passed to
> > the DSP. This is the drivers area.
> > 
> > Another example is when a task is doing an allocation and an RT task is
> > placed onto the same CPU. In that case a long preemption-off (milliseconds)
> > section can lead to starvation of the RT task. For mobile devices it is the UI
> > stack where RT tasks are used. As a result we see frame drops.
> > 
> > All such issues have been solved after a rework:
> > 
> > wget ftp://vps418301.ovh.net/incoming/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> > 
> 
> Thanks. That was enough for me to search to see what sort of general
> workload would be affected. Mostly it's driver specific. A lot of the users
> that would be potentially hot are already using kvmalloc so probably not
> worth the effort so test_vmalloc.sh makes sense.
> 
You are welcome.

As for a helper: does it sound good to you? BTW, once upon a time I had
asked for it :)

From b4b0de2990defd43453ddcd2839521d117cb3bd9 Mon Sep 17 00:00:00 2001
From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Date: Fri, 14 May 2021 18:39:08 +0200
Subject: [PATCH] mm/page_alloc: Add an alloc_pages_bulk_array_node() helper

Add a "node" variant of the alloc_pages_bulk_array() function.
The helper guarantees that __alloc_pages_bulk() is invoked
with a valid NUMA node id.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 include/linux/gfp.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 11da8af06704..94f0b8b1cb55 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -536,6 +536,15 @@ alloc_pages_bulk_array(gfp_t gfp, unsigned long nr_pages, struct page **page_arr
 	return __alloc_pages_bulk(gfp, numa_mem_id(), NULL, nr_pages, NULL, page_array);
 }
 
+static inline unsigned long
+alloc_pages_bulk_array_node(gfp_t gfp, int nid, unsigned long nr_pages, struct page **page_array)
+{
+	if (nid == NUMA_NO_NODE)
+		nid = numa_mem_id();
+
+	return __alloc_pages_bulk(gfp, nid, NULL, nr_pages, NULL, page_array);
+}
+
 /*
  * Allocate pages, preferring the node given as nid. The node must be valid and
  * online. For more general interface, see alloc_pages_node().
-- 
2.20.1
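
For context, a hypothetical sketch of how the vmalloc side could use this
helper once the bulk-allocator conversion is reapplied. This is an assumption
for illustration, not part of the patch above; only the helper's signature is
taken from the patch and the call shape from the earlier __vmalloc_area_node()
diff in this thread:

<snip>
	/* Hypothetical caller sketch in __vmalloc_area_node(); not the exact patch. */
	if (!page_order) {
		/* NUMA_NO_NODE is now fixed up inside the helper. */
		area->nr_pages = alloc_pages_bulk_array_node(gfp_mask, node,
					nr_small_pages, area->pages);
	} else {
		/* high-order path keeps using alloc_pages_node() */
	}
<snip>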

--
Vlad Rezki


* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-14 17:16                             ` Uladzislau Rezki
@ 2021-05-16 17:17                               ` Mel Gorman
  2021-05-16 20:31                                 ` Uladzislau Rezki
  0 siblings, 1 reply; 17+ messages in thread
From: Mel Gorman @ 2021-05-16 17:17 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Stephen Rothwell, Andrew Morton, Hillf Danton, Michal Hocko,
	mm-commits, Nicholas Piggin, Oleksiy Avramchenko, Steven Rostedt,
	Matthew Wilcox

On Fri, May 14, 2021 at 07:16:23PM +0200, Uladzislau Rezki wrote:
> > > See below an example of audio glitches. That was related to our phones
> > > and audio workloads:
> > > 
> > > # Explanation is here
> > > wget ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt
> > > 
> > > # Audio 10 seconds sample is here.
> > > # The drop occurs at 00:09.295 you can hear it
> > > wget ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav
> > > 
> > > Apart from that, a slow allocation can cause two types of issues. The first one
> > > is direct: when, for example, a high-priority RT thread does some allocation
> > > to pass data to the DSP, long latency causes a delay of the data passed to
> > > the DSP. This is the drivers area.
> > > 
> > > Another example is when a task is doing an allocation and an RT task is
> > > placed onto the same CPU. In that case a long preemption-off (milliseconds)
> > > section can lead to starvation of the RT task. For mobile devices it is the UI
> > > stack where RT tasks are used. As a result we see frame drops.
> > > 
> > > All such issues have been solved after a rework:
> > > 
> > > wget ftp://vps418301.ovh.net/incoming/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> > > 
> > 
> > Thanks. That was enough for me to search to see what sort of general
> > workload would be affected. Mostly it's driver specific. A lot of the users
> > that would be potentially hot are already using kvmalloc so probably not
> > worth the effort so test_vmalloc.sh makes sense.
> > 
> You are welcome.
> 
> As for a helper. Does it sound good for you? BTW, once upon a time i had
> asked for it :)
> 

The intent was that, instead of guessing in advance what APIs would be
needed, users would add an API helper where appropriate.

> From b4b0de2990defd43453ddcd2839521d117cb3bd9 Mon Sep 17 00:00:00 2001
> From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
> Date: Fri, 14 May 2021 18:39:08 +0200
> Subject: [PATCH] mm/page_alloc: Add an alloc_pages_bulk_array_node() helper
> 
> Add a "node" variant of the alloc_pages_bulk_array() function.
> The helper guarantees that a __alloc_pages_bulk() is invoked
> with a valid NUMA node id.
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Acked-by: Mel Gorman <mgorman@suse.de>

Include it as part of your series adding the vmalloc user.

-- 
Mel Gorman
SUSE Labs


* Re: [failures] mm-vmalloc-print-a-warning-message-first-on-failure.patch removed from -mm tree
  2021-05-16 17:17                               ` Mel Gorman
@ 2021-05-16 20:31                                 ` Uladzislau Rezki
  0 siblings, 0 replies; 17+ messages in thread
From: Uladzislau Rezki @ 2021-05-16 20:31 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Uladzislau Rezki, Stephen Rothwell, Andrew Morton, Hillf Danton,
	Michal Hocko, mm-commits, Nicholas Piggin, Oleksiy Avramchenko,
	Steven Rostedt, Matthew Wilcox

On Sun, May 16, 2021 at 06:17:38PM +0100, Mel Gorman wrote:
> On Fri, May 14, 2021 at 07:16:23PM +0200, Uladzislau Rezki wrote:
> > > > See below an example of audio glitches. That was related to our phones
> > > > and audio workloads:
> > > > 
> > > > # Explanation is here
> > > > wget ftp://vps418301.ovh.net/incoming/analysis_audio_glitches.txt
> > > > 
> > > > # Audio 10 seconds sample is here.
> > > > # The drop occurs at 00:09.295 you can hear it
> > > > wget ftp://vps418301.ovh.net/incoming/tst_440_HZ_tmp_1.wav
> > > > 
> > > > Apart from that, a slow allocation can cause two types of issues. The first one
> > > > is direct: when, for example, a high-priority RT thread does some allocation
> > > > to pass data to the DSP, long latency causes a delay of the data passed to
> > > > the DSP. This is the drivers area.
> > > > 
> > > > Another example is when a task is doing an allocation and an RT task is
> > > > placed onto the same CPU. In that case a long preemption-off (milliseconds)
> > > > section can lead to starvation of the RT task. For mobile devices it is the UI
> > > > stack where RT tasks are used. As a result we see frame drops.
> > > > 
> > > > All such issues have been solved after a rework:
> > > > 
> > > > wget ftp://vps418301.ovh.net/incoming/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> > > > 
> > > 
> > > Thanks. That was enough for me to search to see what sort of general
> > > workload would be affected. Mostly it's driver specific. A lot of the users
> > > that would be potentially hot are already using kvmalloc so probably not
> > > worth the effort so test_vmalloc.sh makes sense.
> > > 
> > You are welcome.
> > 
> > As for a helper. Does it sound good for you? BTW, once upon a time i had
> > asked for it :)
> > 
> 
> The intent was that instead of guessing in advance what APIs would be
> needed that users would add an API helper where appropriate.
> 
> > From b4b0de2990defd43453ddcd2839521d117cb3bd9 Mon Sep 17 00:00:00 2001
> > From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
> > Date: Fri, 14 May 2021 18:39:08 +0200
> > Subject: [PATCH] mm/page_alloc: Add an alloc_pages_bulk_array_node() helper
> > 
> > Add a "node" variant of the alloc_pages_bulk_array() function.
> > The helper guarantees that a __alloc_pages_bulk() is invoked
> > with a valid NUMA node id.
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> 
> Acked-by: Mel Gorman <mgorman@suse.de>
> 
Thanks!

--
Vlad Rezki

