linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Naresh Kamboju <naresh.kamboju@linaro.org>
To: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	Shakeel Butt <shakeelb@google.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	 linux-mm <linux-mm@kvack.org>,
	open list <linux-kernel@vger.kernel.org>,
	 lkft-triage@lists.linaro.org, Chris Down <chris@chrisdown.name>
Subject: Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup
Date: Thu, 2 Jul 2020 12:22:03 +0530	[thread overview]
Message-ID: <CA+G9fYvC_+Gmab=b5pJW+5m1cR=JnLKLuMfxh3p-j8XKBzseSA@mail.gmail.com> (raw)
In-Reply-To: <20200701184552.GA61684@carbon.DHCP.thefacebook.com>

On Thu, 2 Jul 2020 at 00:16, Roman Gushchin <guro@fb.com> wrote:
>
> On Wed, Jul 01, 2020 at 10:29:04AM +0200, Michal Hocko wrote:
> > Smells like a different observable problem with the same/similar culprit
> > as http://lkml.kernel.org/r/CA+G9fYtrgF_EZHi0vi+HyWiXT5LGggDhVXtNspc=OzzFhL=xRQ@mail.gmail.com
> >
> > On Wed 01-07-20 13:48:57, Naresh Kamboju wrote:
> > > While running LTP mm test suite on x86_64 device the BUG: Bad page
> > > state in process
> > > noticed on linux-next 20200630 tag.
> > >
> > > Steps to reproduce:
> > > - boot linux-next 20200630 kernel on x86_64 device
> > > - cd /opt/ltp
> > > - ./runltp -f mm
> > >
> > > metadata:
> > >   git branch: master
> > >   git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> > >   git commit: f2b92b14533e646e434523abdbafddb727c23898
> > >   git describe: next-20200630
> > >   kernel-config:
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__builds.tuxbuild.com_j60yrp7CUpq3LCmqMB8Wdg_kernel.config&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=h_KJ0e7abuh0BK2eDlDmWnAxqHPccpqchPgBS-oJcVE&s=qofg2XRToTeHvi8vSdOvDPtKpJsUqf3IWfqwieZqITg&e=
> > >
> > > Test crash dump:
> > > [  803.905169] Node 0 Normal: 2608*4kB (UMEH) 1380*8kB (UMEH) 64*16kB
> > > (MEH) 28*32kB (MEH) 13*64kB (UMEH) 164*128kB (UMEH) 39*256kB (UE)
> > > 1*512kB (M) 1*1024kB (M) 1*2048kB (M) 1*4096kB (M) = 62880kB
> > > [  803.922375] Node 0 hugepages_total=0 hugepages_free=0
> > > hugepages_surp=0 hugepages_size=2048kB
> > > [  803.930806] 2418 total pagecache pages
> > > [  803.934559] 0 pages in swap cache
> > > [  803.937878] Swap cache stats: add 0, delete 0, find 0/0
> > > [  803.943108] Free swap  = 0kB
> > > [  803.945997] Total swap = 0kB
> > > [  803.948885] 4181245 pages RAM
> > > [  803.951857] 0 pages HighMem/MovableOnly
> > > [  803.955695] 626062 pages reserved
> > > [  803.959016] Tasks state (memory values in pages):
> > > [  803.963722] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes
> > > swapents oom_score_adj name
> > > [  803.972336] [    332]     0   332     8529      507   106496
> > > 0             0 systemd-journal
> > > [  803.981387] [    349]     0   349    10730      508   118784
> > > 0         -1000 systemd-udevd
> > > [  803.990262] [    371]   993   371     8666      108   118784
> > > 0             0 systemd-network
> > > [  803.999306] [    379]   992   379     9529       99   110592
> > > 0             0 systemd-resolve
> > > [  804.008347] [    388]     0   388     2112       19    61440
> > > 0             0 syslogd
> > > [  804.016709] [    389]   995   389     9308      108   122880
> > > 0             0 avahi-daemon
> > > [  804.025517] [    391]     0   391     1075       21    57344
> > > 0             0 acpid
> > > [  804.033695] [    394]   995   394     9277       68   114688
> > > 0             0 avahi-daemon
> > > [  804.042476] [    396]   996   396     7241      154   102400
> > > 0          -900 dbus-daemon
> > > [  804.051170] [    397]     0   397     2313       72    65536
> > > 0             0 crond
> > > [  804.059349] [    399]     0   399    34025      161   167936
> > > 0             0 thermald
> > > [  804.067783] [    400]     0   400     8615      115   110592
> > > 0             0 systemd-logind
> > > [  804.076734] [    401]     0   401     2112       32    57344
> > > 0             0 klogd
> > > [  804.084907] [    449] 65534   449     3245       39    69632
> > > 0             0 dnsmasq
> > > [  804.093254] [    450]     0   450     3187       33    73728
> > > 0             0 agetty
> > > [  804.101541] [    452]     0   452     3187       33    73728
> > > 0             0 agetty
> > > [  804.109826] [    453]     0   453    14707      107   159744
> > > 0             0 login
> > > [  804.118007] [    463]     0   463     9532      163   122880
> > > 0             0 systemd
> > > [  804.126362] [    464]     0   464    16132      424   172032
> > > 0             0 (sd-pam)
> > > [  804.134803] [    468]     0   468     4538      105    81920
> > > 0             0 sh
> > > [  804.142741] [    472]     0   472    11102       83   131072
> > > 0             0 su
> > > [  804.150680] [    473]     0   473     4538       99    81920
> > > 0             0 sh
> > > [  804.158637] [    519]     0   519     2396       57    61440
> > > 0             0 lava-test-runne
> > > [  804.167700] [   1220]     0  1220     2396       52    61440
> > > 0             0 lava-test-shell
> > > [  804.176738] [   1221]     0  1221     2396       55    61440
> > > 0             0 sh
> > > [  804.184680] [   1223]     0  1223     2462      135    61440
> > > 0             0 ltp.sh
> > > [  804.192946] [   1242]     0  1242     2462      134    61440
> > > 0             0 ltp.sh
> > > [  804.201207] [   1243]     0  1243     2462      134    61440
> > > 0             0 ltp.sh
> > > [  804.209475] [   1244]     0  1244     2462      134    61440
> > > 0             0 ltp.sh
> > > [  804.217742] [   1245]     0  1245     2561      229    65536
> > > 0             0 runltp
> > > [  804.226010] [   1246]     0  1246     1072       15    53248
> > > 0             0 tee
> > > [  804.234012] [   1313]     0  1313     1070       29    53248
> > > 0             0 ltp-pan
> > > [  804.242374] [   3216]     0  3216     1613       20    53248
> > > 0             0 oom01
> > > [  804.250554] [   3217]     0  3217     1646       31    57344
> > > 0             0 oom01
> > > [  804.258728] [   3245]     0  3245    81271      469   266240
> > > 0             0 NetworkManager
> > > [  804.267688] [   3249]     0  3249     6422       54    98304
> > > 0             0 systemd-hostnam
> > > [  804.276734] [   3250]     0  3250    52976      178   172032
> > > 0             0 nm-dispatcher
> > > [  804.285603] [   3254]   998  3254   131113      828   245760
> > > 0             0 polkitd
> > > [  804.293956] [   3261]     0  3261  4726385  3349389 26939392
> > > 0             0 oom01
> > > [  804.302129] [   3265]     0  3265     3187       33    73728
> > > 0             0 agetty
> > > [  804.310397] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=oom01,pid=3261,uid=0
> > > [  804.322751] Out of memory: Killed process 3261 (oom01)
> > > total-vm:18905540kB, anon-rss:13397556kB, file-rss:0kB, shmem-rss:0kB,
> > > UID:0 pgtables:26308kB oom_score_adj:0
> > > [  806.652952] oom_reaper: reaped process 3261 (oom01), now
> > > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > > [  807.579373] BUG: Bad page state in process kworker/u8:12  pfn:374308
> > > [  807.579521] BUG: Bad page state in process kworker/u8:13  pfn:4182a4
> > > [  807.585734] page:ffffea000dd0c200 refcount:0 mapcount:0
> > > mapping:0000000000000000 index:0xffff88837430e000
> > > head:ffffea000dd0c200 order:3 compound_mapcount:0 compound_pincount:0
> > > [  807.585736] flags: 0x200000000010000(head)
> > > [  807.585740] raw: 0200000000010000 ffffea000dce6e00 0000000200000002
> > > 0000000000000000
> > > [  807.592099] page:ffffea001060a900 refcount:0 mapcount:0
> > > mapping:0000000000000000 index:0xffff8884182a5e00
> > > head:ffffea001060a900 order:1 compound_mapcount:0
> > > [  807.607719] raw: ffff88837430e000 0000000000040000 00000000ffffffff
> > > ffff8883bda6cac1
> > > [  807.607720] page dumped because: page still charged to cgroup
> > > [  807.607720] page->mem_cgroup:ffff8883bda6cac1
> > > [  807.607721] Modules linked in: x86_pkg_temp_thermal
> > > [  807.607725] CPU: 0 PID: 3242 Comm: kworker/u8:12 Not tainted
> > > 5.8.0-rc3-next-20200630 #1
> > > [  807.607727] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> > > 2.0b 07/27/2017
> > > [  807.607731] Workqueue: rpciod rpc_async_schedule
> > > [  807.611836] flags: 0x200000000010000(head)
> > > [  807.619563] Call Trace:
> > > [  807.619567]  dump_stack+0x84/0xba
> > > [  807.619569]  bad_page.cold+0x7b/0xac
> > > [  807.619573]  __free_pages_ok+0x95b/0xab0
> > > [  807.633461] raw: 0200000000010000 dead000000000100 dead000000000122
> > > 0000000000000000
> > > [  807.641189]  __free_pages+0x42/0x50
> > > [  807.641191]  __free_slab+0xcd/0x1f0
>
> Hm, interesting, it means that page->obj_cgroups is still set.
> But before __free_pages() __free_slab() always calls uncharge_slab_page(),
> which sets page->obj_cgroups to NULL except when !memcg_kmem_enabled().
>
> So it makes me think that somehow memcg_kmem_enabled() became false
> after being true, which can cause refcounting problems as well.
>
> Naresh, can you, please, check if the following patch solves problems?
> And thank you for reporting the problem!
>

Thanks for providing a quick fix patch.

>
> From c97afecd32c0db5e024be9ba72f43d22974f5bcd Mon Sep 17 00:00:00 2001
> From: Roman Gushchin <guro@fb.com>
> Date: Wed, 1 Jul 2020 11:05:32 -0700
> Subject: [PATCH] mm: kmem: make memcg_kmem_enabled() irreversible
>
> Historically the kernel memory accounting was an opt-in feature, which
> could be enabled for individual cgroups. But now it's not true, and
> it's on by default both on cgroup v1 and cgroup v2.  And as long as a
> user has at least one non-root memory cgroup, the kernel memory
> accounting is on. So in most setups it's either always on (if memory
> cgroups are in use and kmem accounting is not disabled), either always
> off (otherwise).
>
> memcg_kmem_enabled() is used in many places to guard the kernel memory
> accounting code. If memcg_kmem_enabled() can reverse from returning
> true to returning false (as now), we can't rely on it on release paths
> and have to check if it was on before.
>
> If we'll make memcg_kmem_enabled() irreversible (always returning true
> after returning it for the first time), it'll make the general logic
> more simple and robust. It also will allow to guard some checks which
> otherwise would stay unguarded.
>
> Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>

The below patch applied on top of linux-next 20200701and tested and
the reported bug got fixed on arm64, arm, x86_64 and i386.

> ---
>  mm/memcontrol.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 50ae77f3985e..2d018a51c941 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3582,7 +3582,8 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
>         objcg->memcg = memcg;
>         rcu_assign_pointer(memcg->objcg, objcg);
>
> -       static_branch_inc(&memcg_kmem_enabled_key);
> +       if (!memcg_kmem_enabled())
> +               static_branch_inc(&memcg_kmem_enabled_key);
>         /*
>          * A memory cgroup is considered kmem-online as soon as it gets
>          * kmemcg_id. Setting the id after enabling static branching will
> @@ -3643,9 +3644,6 @@ static void memcg_free_kmem(struct mem_cgroup *memcg)
>         /* css_alloc() failed, offlining didn't happen */
>         if (unlikely(memcg->kmem_state == KMEM_ONLINE))
>                 memcg_offline_kmem(memcg);
> -
> -       if (memcg->kmem_state == KMEM_ALLOCATED)
> -               static_branch_dec(&memcg_kmem_enabled_key);
>  }
>  #else
>  static int memcg_online_kmem(struct mem_cgroup *memcg)

- Naresh


  parent reply	other threads:[~2020-07-02  6:52 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-01  8:18 BUG: Bad page state in process - page dumped because: page still charged to cgroup Naresh Kamboju
2020-07-01  8:29 ` Michal Hocko
2020-07-01 12:31   ` Naresh Kamboju
2020-07-01 18:45   ` Roman Gushchin
2020-07-02  6:19     ` Michal Hocko
2020-07-02  6:52     ` Naresh Kamboju [this message]
2020-07-02 15:49       ` Roman Gushchin
2020-07-02 15:55         ` Naresh Kamboju
2020-07-02 15:59           ` Roman Gushchin
2020-07-02 16:02     ` Shakeel Butt
2020-07-02 16:22     ` Michal Hocko
2020-07-02 16:35       ` Vlastimil Babka
2020-07-02 17:07         ` Roman Gushchin
2020-07-02 17:10         ` Michal Hocko
2020-07-02 16:37       ` Roman Gushchin
2020-07-02 17:13         ` Michal Hocko
2020-07-02 17:19           ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+G9fYvC_+Gmab=b5pJW+5m1cR=JnLKLuMfxh3p-j8XKBzseSA@mail.gmail.com' \
    --to=naresh.kamboju@linaro.org \
    --cc=akpm@linux-foundation.org \
    --cc=chris@chrisdown.name \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkft-triage@lists.linaro.org \
    --cc=mhocko@kernel.org \
    --cc=shakeelb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).