linux-kernel.vger.kernel.org archive mirror
* OOPSes in mem_cgroup_protected
From: John Stultz @ 2018-06-13  1:02 UTC
  To: Tejun Heo, Johannes Weiner, Michal Hocko; +Cc: lkml

Hey Tejun,
  With the current linus/master, I'm able to fairly regularly trip
OOPSes (two examples below) in mem_cgroup_protected(), which seems to
be new.  I haven't managed to trigger this sort of thing with v4.17.

I've not had much time to dig in or bisect it - I only know that
enabling most of the memory debugging config options didn't seem to
trip anything prior to the issue. So I wanted to send you a heads up
to see if this was already known, or if there was anything you might
suggest to help chase this down.

It's fairly easy to reproduce for me, so let me know if you have
anything you'd like me to try.

thanks
-john

console:/ $ [  170.530896] Unable to handle kernel read from
unreadable memory at virtual address 0000000000000120
[  170.540158] Mem abort info:
[  170.543092]   ESR = 0x96000005
[  170.546193]   Exception class = DABT (current EL), IL = 32 bits
[  170.552251]   SET = 0, FnV = 0
[  170.555444]   EA = 0, S1PTW = 0
[  170.558698] Data abort info:
[  170.561624]   ISV = 0, ISS = 0x00000005
[  170.565572]   CM = 0, WnR = 0
[  170.568650] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000190bb04e
[  170.575374] [0000000000000120] pgd=0000000000000000, pud=0000000000000000
[  170.582297] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  170.587929] CPU: 7 PID: 663 Comm: kswapd0 Not tainted
4.17.0-11699-gb4f23f3 #411
[  170.595358] Hardware name: HiKey Development Board (DT)
[  170.600623] pstate: a0400005 (NzCv daif +PAN -UAO)
[  170.605478] pc : mem_cgroup_protected+0x34/0x120
[  170.610142] lr : shrink_node+0x120/0x478
[  170.614093] sp : ffffff8009d23c50
[  170.617438] x29: ffffff8009d23c50 x28: ffffff8009d23d48
[  170.622808] x27: ffffffc074ca1000 x26: ffffff8009d23e28
[  170.628160] x25: ffffff8009d23d88 x24: 0000000000000000
[  170.633481] x23: 0000000000000000 x22: ffffff8009071f80
[  170.638802] x21: 0000000000000012 x20: 0000000000000012
[  170.644124] x19: 0000000000000000 x18: 0000000000000400
[  170.649444] x17: 0000000000000000 x16: ffffffc074ca2000
[  170.654765] x15: 0000000000000000 x14: 0000000000000400
[  170.660087] x13: 00000000000000b1 x12: 0000000000000003
[  170.665408] x11: 0000000000000020 x10: 0000000000000000
[  170.670729] x9 : 0000000000000001 x8 : 0000000000000004
[  170.676050] x7 : ffffffc074d43c00 x6 : 0000000000000000
[  170.681370] x5 : 0000000000000000 x4 : 0000000000000000
[  170.686690] x3 : 000000000000dafa x2 : 0000000000000000
[  170.692010] x1 : ffffffc074ca1000 x0 : ffffffc0386e8000
[  170.697335] Process kswapd0 (pid: 663, stack limit = 0x00000000e0f0ae51)
[  170.704039] Call trace:
[  170.706497]  mem_cgroup_protected+0x34/0x120
[  170.710775]  balance_pgdat+0x1cc/0x418
[  170.714529]  kswapd+0x180/0x3b8
[  170.717674]  kthread+0xf8/0x128
[  170.720824]  ret_from_fork+0x10/0x18
[  170.724411] Code: b40007a2 d103e042 eb02001f 540006c0 (f9409046)
[  170.730542] ---[ end trace 7c961b6d409886f1 ]---
[  170.839299] Kernel panic - not syncing: Fatal exception
[  170.844549] SMP: stopping secondary CPUs
[  170.848488] Kernel Offset: disabled
[  170.851982] CPU features: 0x24802004
[  170.855556] Memory Limit: none
[  170.888494] Rebooting in 5 seconds..




console:/ # [  348.612152] Unable to handle kernel read from
unreadable memory at virtual address 0000000000000120
[  348.617384] Unable to handle kernel access to user memory outside
uaccess routines at virtual address 0000000000000120
[  348.621360] Mem abort info:
[  348.632086] Mem abort info:
[  348.634870]   ESR = 0x96000005
[  348.634885]   Exception class = DABT (current EL), IL = 32 bits
[  348.637686]   ESR = 0x96000005
[  348.640785]   SET = 0, FnV = 0
[  348.646740]   Exception class = DABT (current EL), IL = 32 bits
[  348.649799]   EA = 0, S1PTW = 0
[  348.652892]   SET = 0, FnV = 0
[  348.652901]   EA = 0, S1PTW = 0
[  348.652913] Data abort info:
[  348.658905] Data abort info:
[  348.662041]   ISV = 0, ISS = 0x00000005
[  348.662050]   CM = 0, WnR = 0
[  348.662071] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000697cecc4
[  348.665129]   ISV = 0, ISS = 0x00000005
[  348.668298] [0000000000000120] pgd=000000003a915003, pud=000000003a915003
[  348.671224]   CM = 0, WnR = 0
[  348.671242] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000c568bd29
[  348.674193] , pmd=0000000000000000
[  348.678021] [0000000000000120] pgd=0000000000000000, pud=0000000000000000
[  348.691540] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  348.723733] CPU: 5 PID: 3246 Comm: CrRendererMain Not tainted
4.17.0-11699-gb4f23f3 #412
[  348.731857] Hardware name: HiKey Development Board (DT)
[  348.737121] pstate: a0400005 (NzCv daif +PAN -UAO)
[  348.741975] pc : mem_cgroup_protected+0x34/0x120
[  348.746640] lr : shrink_node+0x120/0x478
[  348.750590] sp : ffffff800ac9b8a0
[  348.753934] x29: ffffff800ac9b8a0 x28: ffffff800ac9b9d8
[  348.759304] x27: ffffffc071982480 x26: ffffff800ac9bb30
[  348.764673] x25: ffffff800ac9ba18 x24: 0000000000000000
[  348.770038] x23: 0000000000000000 x22: ffffff8009113d00
[  348.775404] x21: 000000000000000f x20: 000000000000000f
[  348.780769] x19: 0000000000000000 x18: 0000000000000000
[  348.786134] x17: 0000000000000000 x16: ffffffc071985a80
[  348.791500] x15: 0000000000000000 x14: 00000000d5e75c2f
[  348.796868] x13: 00000000d7237d18 x12: 0000000000000003
[  348.802233] x11: 0000000000000020 x10: 0000000000000000
[  348.807598] x9 : 0000000000000001 x8 : 0000000000000004
[  348.812963] x7 : ffffffc072d58c80 x6 : 0000000000000000
[  348.818311] x5 : 0000000000000000 x4 : 0000000000000000
[  348.823626] x3 : 000000000000e1fc x2 : 0000000000000000
[  348.828941] x1 : ffffffc071982480 x0 : ffffffc038700080
[  348.834258] Process CrRendererMain (pid: 3246, stack limit =
0x00000000b82069c1)
[  348.841652] Call trace:
[  348.844100]  mem_cgroup_protected+0x34/0x120
[  348.848370]  do_try_to_free_pages+0xd0/0x3c0
[  348.852639]  try_to_free_pages+0xf8/0x120
[  348.856651]  __alloc_pages_nodemask+0x460/0xb68
[  348.861181]  do_huge_pmd_anonymous_page+0x328/0x7d8
[  348.866061]  __handle_mm_fault+0x57c/0xea0
[  348.870157]  handle_mm_fault+0x128/0x1f8
[  348.874082]  do_page_fault+0x1d0/0x490
[  348.877830]  do_translation_fault+0x5c/0x68
[  348.882012]  do_mem_abort+0x54/0x118
[  348.885587]  el0_da+0x20/0x24
[  348.888557] Code: b40007a2 d103e042 eb02001f 540006c0 (f9409046)
[  348.894651] ---[ end trace 58afd90183767ac2 ]---
[  348.942150] Kernel panic - not syncing: Fatal exception
[  348.947448] SMP: stopping secondary CPUs
[  349.784747] SMP: failed to stop secondary CPUs 2,5
[  349.789569] Kernel Offset: disabled
[  349.793089] CPU features: 0x24802004
[  349.796691] Memory Limit: none
[  349.909567] Rebooting in 5 seconds..
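
The "Mem abort info" fields printed in the traces above can be recovered
from the raw ESR value. What follows is a minimal sketch in C, assuming
the standard AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS
in bits 24:0). The names esr_fields and esr_decode are illustrative only
and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of an AArch64 ESR_ELx value, as relevant to a data abort. */
struct esr_fields {
	uint32_t ec;   /* exception class, bits [31:26] */
	bool il;       /* bit [25]: true if the trapped instruction is 32 bits */
	uint32_t iss;  /* instruction specific syndrome, bits [24:0] */
	bool wnr;      /* ISS bit [6]: true for a write, false for a read */
	uint32_t dfsc; /* data fault status code, ISS bits [5:0] */
};

/* Split a raw ESR value into the fields above (illustrative helper). */
static struct esr_fields esr_decode(uint32_t esr)
{
	struct esr_fields f;

	f.ec = esr >> 26;
	f.il = (esr >> 25) & 1;
	f.iss = esr & 0x1ffffff;
	f.wnr = (f.iss >> 6) & 1;
	f.dfsc = f.iss & 0x3f;
	return f;
}
```

Calling esr_decode(0x96000005) gives EC = 0x25 (a data abort taken from
the current exception level), IL set, ISS = 0x5, WnR clear (a read) and
a fault status code of 0x05, which is a level 1 translation fault. That
is consistent with the "kernel read from unreadable memory" message and
the zero pgd entry in the log.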


* Re: OOPSes in mem_cgroup_protected
From: John Stultz @ 2018-06-13  4:08 UTC
  To: Tejun Heo, Johannes Weiner, Michal Hocko, Roman Gushchin; +Cc: lkml

On Tue, Jun 12, 2018 at 6:02 PM, John Stultz <john.stultz@linaro.org> wrote:
> Hey Tejun,
>   With the current linus/master, I'm able to fairly regularly trip
> OOPSes (two examples below) in mem_cgroup_protected(), which seems to
> be new.  I haven't managed to trigger this sort of thing with v4.17.
>
> I've not had much time to dig in or bisect it - I only know that
> enabling most of the memory debugging config options didn't seem to
> trip anything prior to the issue. So I wanted to send you a heads up
> to see if this was already known, or if there was anything you might
> suggest to help chase this down.


So the line where we're crashing seems to be in mem_cgroup_protected():
  parent_emin = READ_ONCE(parent->memory.emin);

where I'm guessing the parent->memory value is null, and emin is at
the 0x120 offset in the structure.
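
The 0x120 offset can also be read directly from the faulting instruction,
which is the word in parentheses on the "Code:" line (f9409046). Below is
a minimal sketch in C, assuming the standard A64 encoding of the 64-bit
"LDR Xt, [Xn, #imm]" instruction with an unsigned scaled offset. The
names ldr_imm and decode_ldr_x_uimm are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded form of "LDR Xt, [Xn, #imm]" (64-bit, unsigned offset). */
struct ldr_imm {
	unsigned rt;     /* destination register number */
	unsigned rn;     /* base register number */
	uint32_t offset; /* byte offset added to the base register */
};

/*
 * Return true and fill *out if insn is a 64-bit LDR with an unsigned
 * immediate offset; return false for any other instruction.
 */
static bool decode_ldr_x_uimm(uint32_t insn, struct ldr_imm *out)
{
	/* Bits [31:22] are fixed to 0x3e5 for this encoding. */
	if ((insn >> 22) != 0x3e5)
		return false;

	out->rt = insn & 0x1f;
	out->rn = (insn >> 5) & 0x1f;
	/* imm12 is scaled by the access size, 8 bytes for an X register. */
	out->offset = ((insn >> 10) & 0xfff) * 8;
	return true;
}
```

For 0xf9409046 this gives rt = 6, rn = 2 and offset = 0x120, that is
"ldr x6, [x2, #0x120]". x2 is zero in both register dumps, so the load
address is 0 + 0x120, which matches the reported fault address. That x2
holds the parent pointer at this point is an inference from the source
line, not something the dump states.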

Reverting the following commits seems to avoid the issue.
bf8d5d52ffe8 ("memcg: introduce memory.min")
5f93ad67436b ("mm: treat memory.low value inclusive")
230671533d64 ("mm: memory.low hierarchical behavior")

I'm guessing I'm tripping over some path where the memory value never
gets initialized?

Any ideas or suggestions?

thanks
-john

(usually I'd trim the backtraces below, but keeping them as I added
Roman to the CC list)

> console:/ $ [  170.530896] Unable to handle kernel read from
> unreadable memory at virtual address 0000000000000120
> [  170.540158] Mem abort info:
> [  170.543092]   ESR = 0x96000005
> [  170.546193]   Exception class = DABT (current EL), IL = 32 bits
> [  170.552251]   SET = 0, FnV = 0
> [  170.555444]   EA = 0, S1PTW = 0
> [  170.558698] Data abort info:
> [  170.561624]   ISV = 0, ISS = 0x00000005
> [  170.565572]   CM = 0, WnR = 0
> [  170.568650] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000190bb04e
> [  170.575374] [0000000000000120] pgd=0000000000000000, pud=0000000000000000
> [  170.582297] Internal error: Oops: 96000005 [#1] PREEMPT SMP
> [  170.587929] CPU: 7 PID: 663 Comm: kswapd0 Not tainted
> 4.17.0-11699-gb4f23f3 #411
> [  170.595358] Hardware name: HiKey Development Board (DT)
> [  170.600623] pstate: a0400005 (NzCv daif +PAN -UAO)
> [  170.605478] pc : mem_cgroup_protected+0x34/0x120
> [  170.610142] lr : shrink_node+0x120/0x478
> [  170.614093] sp : ffffff8009d23c50
> [  170.617438] x29: ffffff8009d23c50 x28: ffffff8009d23d48
> [  170.622808] x27: ffffffc074ca1000 x26: ffffff8009d23e28
> [  170.628160] x25: ffffff8009d23d88 x24: 0000000000000000
> [  170.633481] x23: 0000000000000000 x22: ffffff8009071f80
> [  170.638802] x21: 0000000000000012 x20: 0000000000000012
> [  170.644124] x19: 0000000000000000 x18: 0000000000000400
> [  170.649444] x17: 0000000000000000 x16: ffffffc074ca2000
> [  170.654765] x15: 0000000000000000 x14: 0000000000000400
> [  170.660087] x13: 00000000000000b1 x12: 0000000000000003
> [  170.665408] x11: 0000000000000020 x10: 0000000000000000
> [  170.670729] x9 : 0000000000000001 x8 : 0000000000000004
> [  170.676050] x7 : ffffffc074d43c00 x6 : 0000000000000000
> [  170.681370] x5 : 0000000000000000 x4 : 0000000000000000
> [  170.686690] x3 : 000000000000dafa x2 : 0000000000000000
> [  170.692010] x1 : ffffffc074ca1000 x0 : ffffffc0386e8000
> [  170.697335] Process kswapd0 (pid: 663, stack limit = 0x00000000e0f0ae51)
> [  170.704039] Call trace:
> [  170.706497]  mem_cgroup_protected+0x34/0x120
> [  170.710775]  balance_pgdat+0x1cc/0x418
> [  170.714529]  kswapd+0x180/0x3b8
> [  170.717674]  kthread+0xf8/0x128
> [  170.720824]  ret_from_fork+0x10/0x18
> [  170.724411] Code: b40007a2 d103e042 eb02001f 540006c0 (f9409046)
> [  170.730542] ---[ end trace 7c961b6d409886f1 ]---
> [  170.839299] Kernel panic - not syncing: Fatal exception
> [  170.844549] SMP: stopping secondary CPUs
> [  170.848488] Kernel Offset: disabled
> [  170.851982] CPU features: 0x24802004
> [  170.855556] Memory Limit: none
> [  170.888494] Rebooting in 5 seconds..
>
>
>
>
> console:/ # [  348.612152] Unable to handle kernel read from
> unreadable memory at virtual address 0000000000000120
> [  348.617384] Unable to handle kernel access to user memory outside
> uaccess routines at virtual address 0000000000000120
> [  348.621360] Mem abort info:
> [  348.632086] Mem abort info:
> [  348.634870]   ESR = 0x96000005
> [  348.634885]   Exception class = DABT (current EL), IL = 32 bits
> [  348.637686]   ESR = 0x96000005
> [  348.640785]   SET = 0, FnV = 0
> [  348.646740]   Exception class = DABT (current EL), IL = 32 bits
> [  348.649799]   EA = 0, S1PTW = 0
> [  348.652892]   SET = 0, FnV = 0
> [  348.652901]   EA = 0, S1PTW = 0
> [  348.652913] Data abort info:
> [  348.658905] Data abort info:
> [  348.662041]   ISV = 0, ISS = 0x00000005
> [  348.662050]   CM = 0, WnR = 0
> [  348.662071] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000697cecc4
> [  348.665129]   ISV = 0, ISS = 0x00000005
> [  348.668298] [0000000000000120] pgd=000000003a915003, pud=000000003a915003
> [  348.671224]   CM = 0, WnR = 0
> [  348.671242] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000c568bd29
> [  348.674193] , pmd=0000000000000000
> [  348.678021] [0000000000000120] pgd=0000000000000000, pud=0000000000000000
> [  348.691540] Internal error: Oops: 96000005 [#1] PREEMPT SMP
> [  348.723733] CPU: 5 PID: 3246 Comm: CrRendererMain Not tainted
> 4.17.0-11699-gb4f23f3 #412
> [  348.731857] Hardware name: HiKey Development Board (DT)
> [  348.737121] pstate: a0400005 (NzCv daif +PAN -UAO)
> [  348.741975] pc : mem_cgroup_protected+0x34/0x120
> [  348.746640] lr : shrink_node+0x120/0x478
> [  348.750590] sp : ffffff800ac9b8a0
> [  348.753934] x29: ffffff800ac9b8a0 x28: ffffff800ac9b9d8
> [  348.759304] x27: ffffffc071982480 x26: ffffff800ac9bb30
> [  348.764673] x25: ffffff800ac9ba18 x24: 0000000000000000
> [  348.770038] x23: 0000000000000000 x22: ffffff8009113d00
> [  348.775404] x21: 000000000000000f x20: 000000000000000f
> [  348.780769] x19: 0000000000000000 x18: 0000000000000000
> [  348.786134] x17: 0000000000000000 x16: ffffffc071985a80
> [  348.791500] x15: 0000000000000000 x14: 00000000d5e75c2f
> [  348.796868] x13: 00000000d7237d18 x12: 0000000000000003
> [  348.802233] x11: 0000000000000020 x10: 0000000000000000
> [  348.807598] x9 : 0000000000000001 x8 : 0000000000000004
> [  348.812963] x7 : ffffffc072d58c80 x6 : 0000000000000000
> [  348.818311] x5 : 0000000000000000 x4 : 0000000000000000
> [  348.823626] x3 : 000000000000e1fc x2 : 0000000000000000
> [  348.828941] x1 : ffffffc071982480 x0 : ffffffc038700080
> [  348.834258] Process CrRendererMain (pid: 3246, stack limit =
> 0x00000000b82069c1)
> [  348.841652] Call trace:
> [  348.844100]  mem_cgroup_protected+0x34/0x120
> [  348.848370]  do_try_to_free_pages+0xd0/0x3c0
> [  348.852639]  try_to_free_pages+0xf8/0x120
> [  348.856651]  __alloc_pages_nodemask+0x460/0xb68
> [  348.861181]  do_huge_pmd_anonymous_page+0x328/0x7d8
> [  348.866061]  __handle_mm_fault+0x57c/0xea0
> [  348.870157]  handle_mm_fault+0x128/0x1f8
> [  348.874082]  do_page_fault+0x1d0/0x490
> [  348.877830]  do_translation_fault+0x5c/0x68
> [  348.882012]  do_mem_abort+0x54/0x118
> [  348.885587]  el0_da+0x20/0x24
> [  348.888557] Code: b40007a2 d103e042 eb02001f 540006c0 (f9409046)
> [  348.894651] ---[ end trace 58afd90183767ac2 ]---
> [  348.942150] Kernel panic - not syncing: Fatal exception
> [  348.947448] SMP: stopping secondary CPUs
> [  349.784747] SMP: failed to stop secondary CPUs 2,5
> [  349.789569] Kernel Offset: disabled
> [  349.793089] CPU features: 0x24802004
> [  349.796691] Memory Limit: none
> [  349.909567] Rebooting in 5 seconds..


* Re: OOPSes in mem_cgroup_protected
From: Roman Gushchin @ 2018-06-13  4:33 UTC
  To: John Stultz; +Cc: Tejun Heo, Johannes Weiner, Michal Hocko, lkml

On Tue, Jun 12, 2018 at 09:08:27PM -0700, John Stultz wrote:
> On Tue, Jun 12, 2018 at 6:02 PM, John Stultz <john.stultz@linaro.org> wrote:
> > Hey Tejun,
> >   With the current linus/master, I'm able to fairly regularly trip
> > OOPSes (two examples below) in mem_cgroup_protected(), which seems to
> > be new.  I haven't managed to trigger this sort of thing with v4.17.
> >
> > I've not had much time to dig in or bisect it - I only know that
> > enabling most of the memory debugging config options didn't seem to
> > trip anything prior to the issue. So I wanted to send you a heads up
> > to see if this was already known, or if there was anything you might
> > suggest to help chase this down.
> 
> 
> So the line where we're crashing seems to be in mem_cgroup_protected():
>   parent_emin = READ_ONCE(parent->memory.emin);
> 
> where I'm guessing the parent->memory value is null, and emin is at
> the 0x120 offset in the structure.
> 
> Reverting the following commits seems to avoid the issue.
> bf8d5d52ffe8 ("memcg: introduce memory.min")
> 5f93ad67436b ("mm: treat memory.low value inclusive")
> 230671533d64 ("mm: memory.low hierarchical behavior")
> 
> I'm guessing I'm tripping over some path where the memory value never
> gets initialized?
> 
> Any ideas or suggestions?

Hi, John!

The patch below should fix the problem.
It's in the mm tree right now, and hopefully will be merged upstream asap.
Sorry for the inconvenience.

Thanks!

--

From 276e916d62887b85c35a9d053543bb52b00a81bf Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro@fb.com>
Date: Wed, 13 Jun 2018 01:01:43 +0000
Subject: [PATCH] mm: fix null pointer dereference in mem_cgroup_protected

Shakeel reported a crash in mem_cgroup_protected(), which can be triggered
by memcg reclaim if the legacy cgroup v1 use_hierarchy=0 mode is used:

[  226.060572] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000120
[  226.068310] PGD 8000001ff55da067 P4D 8000001ff55da067 PUD 1fdc7df067 PMD 0
[  226.075191] Oops: 0000 [#4] SMP PTI
[  226.078637] CPU: 0 PID: 15581 Comm: bash Tainted: G      D
 4.17.0-smp-clean #5
[  226.086635] Hardware name: ...
[  226.094546] RIP: 0010:mem_cgroup_protected+0x54/0x130
[  226.099533] Code: 4c 8b 8e 00 01 00 00 4c 8b 86 08 01 00 00 48 8d
8a 08 ff ff ff 48 85 d2 ba 00 00 00 00 48 0f 44 ca 48 39 c8 0f 84 cf
00 00 00 <48> 8b 81 20 01 00 00 4d 89 ca 4c 39 c8 4c 0f 46 d0 4d 85 d2
74 05
[  226.118194] RSP: 0000:ffffabe64dfafa58 EFLAGS: 00010286
[  226.123358] RAX: ffff9fb6ff03d000 RBX: ffff9fb6f5b1b000 RCX: 0000000000000000
[  226.130406] RDX: 0000000000000000 RSI: ffff9fb6f5b1b000 RDI: ffff9fb6f5b1b000
[  226.137454] RBP: ffffabe64dfafb08 R08: 0000000000000000 R09: 0000000000000000
[  226.144503] R10: 0000000000000000 R11: 000000000000c800 R12: ffffabe64dfafb88
[  226.151551] R13: ffff9fb6f5b1b000 R14: ffffabe64dfafb88 R15: ffff9fb77fffe000
[  226.158602] FS:  00007fed1f8ac700(0000) GS:ffff9fb6ff400000(0000)
knlGS:0000000000000000
[  226.166594] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  226.172270] CR2: 0000000000000120 CR3: 0000001fdcf86003 CR4: 00000000001606f0
[  226.179317] Call Trace:
[  226.181732]  ? shrink_node+0x194/0x510
[  226.185435]  do_try_to_free_pages+0xfd/0x390
[  226.189653]  try_to_free_mem_cgroup_pages+0x123/0x210
[  226.194643]  try_charge+0x19e/0x700
[  226.198088]  mem_cgroup_try_charge+0x10b/0x1a0
[  226.202478]  wp_page_copy+0x134/0x5b0
[  226.206094]  do_wp_page+0x90/0x460
[  226.209453]  __handle_mm_fault+0x8e3/0xf30
[  226.213498]  handle_mm_fault+0xfe/0x220
[  226.217285]  __do_page_fault+0x262/0x500
[  226.221158]  do_page_fault+0x28/0xd0
[  226.224689]  ? page_fault+0x8/0x30
[  226.228048]  page_fault+0x1e/0x30
[  226.231323] RIP: 0033:0x485b72

The problem happens because parent_mem_cgroup() returns a NULL pointer,
which is dereferenced later without a check.

As cgroup v1 has no memory guarantee support, let's make
mem_cgroup_protected() immediately return MEMCG_PROT_NONE, if the given
cgroup has no parent (non-hierarchical mode is used).

Link: http://lkml.kernel.org/r/20180611175418.7007-2-guro@fb.com
Fixes: bf8d5d52ffe8 ("memcg: introduce memory.min")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Shakeel Butt <shakeelb@google.com>
Tested-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/memcontrol.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c1e64d60ed02..5a3873e9d657 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5480,6 +5480,10 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
 	elow = memcg->memory.low;
 
 	parent = parent_mem_cgroup(memcg);
+	/* No parent means a non-hierarchical mode on v1 memcg */
+	if (!parent)
+		return MEMCG_PROT_NONE;
+
 	if (parent == root)
 		goto exit;
 
-- 
2.14.4
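
For readers without the kernel source at hand, the guard added by the
patch can be sketched in user-space C. Everything below is a simplified,
hypothetical model: demo_memcg, demo_prot and demo_protected are not
kernel names, and the effective-protection arithmetic is reduced to a
plain minimum rather than the kernel's actual calculation. The only
point is the ordering: check the parent pointer before reading through
it.

```c
#include <stddef.h>

/* Hypothetical stand-in for enum mem_cgroup_protection. */
enum demo_prot { DEMO_PROT_NONE, DEMO_PROT_LOW, DEMO_PROT_MIN };

/* Hypothetical, heavily reduced stand-in for struct mem_cgroup. */
struct demo_memcg {
	struct demo_memcg *parent; /* NULL when the group has no parent */
	unsigned long min, low;    /* configured protections */
	unsigned long emin, elow;  /* effective protections */
	unsigned long usage;       /* current usage */
};

static unsigned long demo_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static enum demo_prot demo_protected(const struct demo_memcg *root,
				     const struct demo_memcg *memcg)
{
	const struct demo_memcg *parent;
	unsigned long emin, elow;

	if (memcg == root)
		return DEMO_PROT_NONE;

	parent = memcg->parent;
	/* The guard: without it, parent->emin below reads from NULL. */
	if (!parent)
		return DEMO_PROT_NONE;

	/* Simplified: cap the group's values by the parent's effective ones. */
	emin = demo_min(memcg->min, parent->emin);
	elow = demo_min(memcg->low, parent->elow);

	if (memcg->usage <= emin)
		return DEMO_PROT_MIN;
	if (memcg->usage <= elow)
		return DEMO_PROT_LOW;
	return DEMO_PROT_NONE;
}
```

In this model a group with parent == NULL plays the role of a memcg in
the v1 non-hierarchical mode: it is reported as unprotected instead of
faulting on the parent dereference.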



* Re: OOPSes in mem_cgroup_protected
From: John Stultz @ 2018-06-13 19:40 UTC
  To: Roman Gushchin; +Cc: Tejun Heo, Johannes Weiner, Michal Hocko, lkml

On Tue, Jun 12, 2018 at 9:33 PM, Roman Gushchin <guro@fb.com> wrote:
> On Tue, Jun 12, 2018 at 09:08:27PM -0700, John Stultz wrote:
>> On Tue, Jun 12, 2018 at 6:02 PM, John Stultz <john.stultz@linaro.org> wrote:
>> > Hey Tejun,
>> >   With the current linus/master, I'm able to fairly regularly trip
>> > OOPSes (two examples below) in mem_cgroup_protected(), which seems to
>> > be new.  I haven't managed to trigger this sort of thing with v4.17.
>> >
>> > I've not had much time to dig in or bisect it - I only know that
>> > enabling most of the memory debugging config options didn't seem to
>> > trip anything prior to the issue. So I wanted to send you a heads up
>> > to see if this was already known, or if there was anything you might
>> > suggest to help chase this down.
>>
>>
>> So the line where we're crashing seems to be in mem_cgroup_protected():
>>   parent_emin = READ_ONCE(parent->memory.emin);
>>
>> where I'm guessing the parent->memory value is null, and emin is at
>> the 0x120 offset in the structure.
>>
>> Reverting the following commits seems to avoid the issue.
>> bf8d5d52ffe8 ("memcg: introduce memory.min")
>> 5f93ad67436b ("mm: treat memory.low value inclusive")
>> 230671533d64 ("mm: memory.low hierarchical behavior")
>>
>> I'm guessing I'm tripping over some path where the memory value never
>> gets initialized?
>>
>> Any ideas or suggestions?
>
> Hi, John!
>
> The patch below should fix the problem.
> It's in the mm tree right now, and hopefully will be merged upstream asap.
> Sorry for the inconvenience.

No worries, thanks for the quick fix! The patch you sent seems to be
working well!

Thanks again!
-john


* Re: OOPSes in mem_cgroup_protected
From: Roman Gushchin @ 2018-06-13 19:50 UTC
  To: John Stultz; +Cc: Tejun Heo, Johannes Weiner, Michal Hocko, lkml

On Wed, Jun 13, 2018 at 12:40:23PM -0700, John Stultz wrote:
> On Tue, Jun 12, 2018 at 9:33 PM, Roman Gushchin <guro@fb.com> wrote:
> > On Tue, Jun 12, 2018 at 09:08:27PM -0700, John Stultz wrote:
> >> On Tue, Jun 12, 2018 at 6:02 PM, John Stultz <john.stultz@linaro.org> wrote:
> >> > Hey Tejun,
> >> >   With the current linus/master, I'm able to fairly regularly trip
> >> > OOPSes (two examples below) in mem_cgroup_protected(), which seems to
> >> > be new.  I haven't managed to trigger this sort of thing with v4.17.
> >> >
> >> > I've not had much time to dig in or bisect it - I only know that
> >> > enabling most of the memory debugging config options didn't seem to
> >> > trip anything prior to the issue. So I wanted to send you a heads up
> >> > to see if this was already known, or if there was anything you might
> >> > suggest to help chase this down.
> >>
> >>
> >> So the line where we're crashing seems to be in mem_cgroup_protected():
> >>   parent_emin = READ_ONCE(parent->memory.emin);
> >>
> >> where I'm guessing the parent->memory value is null, and emin is at
> >> the 0x120 offset in the structure.
> >>
> >> Reverting the following commits seems to avoid the issue.
> >> bf8d5d52ffe8 ("memcg: introduce memory.min")
> >> 5f93ad67436b ("mm: treat memory.low value inclusive")
> >> 230671533d64 ("mm: memory.low hierarchical behavior")
> >>
> >> I'm guessing I'm tripping over some path where the memory value never
> >> gets initialized?
> >>
> >> Any ideas or suggestions?
> >
> > Hi, John!
> >
> > The patch below should fix the problem.
> > It's in the mm tree right now, and hopefully will be merged upstream asap.
> > Sorry for the inconvenience.
> 
> No worries, thanks for the quick fix! The patch you sent seems to be
> working well!

Perfect, thanks!
