From: Qian Cai <cai@lca.pw>
To: Johannes Weiner <hannes@cmpxchg.org>, Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrew Morton <akpm@linux-foundation.org>, Alex Shi <alex.shi@linux.alibaba.com>, Joonsoo Kim <js1304@gmail.com>, Shakeel Butt <shakeelb@google.com>, Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@suse.com>, "Kirill A. Shutemov" <kirill@shutemov.name>, Roman Gushchin <guro@fb.com>, Linux-MM <linux-mm@kvack.org>, cgroups@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>, kernel-team@fb.com
Subject: Re: [PATCH 12/19] mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API
Date: Tue, 12 May 2020 13:11:15 -0400
Message-ID: <76E2FF28-1CD8-4A96-B2E6-5EDF51F8E3AB@lca.pw>
In-Reply-To: <45AA36A9-0C4D-49C2-BA3C-08753BBC30FB@lca.pw>

> On May 12, 2020, at 10:38 AM, Qian Cai <cai@lca.pw> wrote:
> 
>> On May 8, 2020, at 2:30 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> 
>> With the page->mapping requirement gone from memcg, we can charge anon
>> and file-thp pages in one single step, right after they're allocated.
>> 
>> This removes two out of three API calls - especially the tricky commit
>> step that needed to happen at just the right time between when the
>> page is "set up" and when it's "published" - somewhat vague and fluid
>> concepts that varied by page type. All we need is a freshly allocated
>> page and a memcg context to charge.
>> 
>> v2: prevent double charges on pre-allocated hugepages in khugepaged
>> 
>> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> ---
>>  include/linux/mm.h      |  4 +---
>>  kernel/events/uprobes.c | 11 +++--------
>>  mm/filemap.c            |  2 +-
>>  mm/huge_memory.c        |  9 +++------
>>  mm/khugepaged.c         | 35 ++++++++++-------------------------
>>  mm/memory.c             | 36 ++++++++++--------------------------
>>  mm/migrate.c            |  5 +----
>>  mm/swapfile.c           |  6 +-----
>>  mm/userfaultfd.c        |  5 +----
>>  9 files changed, 31 insertions(+), 82 deletions(-)
> []
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> 
>> @@ -1198,10 +1193,11 @@ static void collapse_huge_page(struct mm_struct *mm,
>>  out_up_write:
>>  	up_write(&mm->mmap_sem);
>>  out_nolock:
>> +	if (*hpage)
>> +		mem_cgroup_uncharge(*hpage);
>>  	trace_mm_collapse_huge_page(mm, isolated, result);
>>  	return;
>>  out:
>> -	mem_cgroup_cancel_charge(new_page, memcg);
>>  	goto out_up_write;
>>  }
> []
> 
> Some memory pressure will crash this new code. It looks somewhat racy.

Reverting the whole series fixed the crash, i.e.,

git revert --no-edit 6070efb8e52b..c986ddf58a95

There is a minor conflict during the revert due to another linux-next commit,

2a6b525f0de1 (“khugepaged: do not stop collapse if less than half PTEs are referenced”)

which is trivial to resolve:

--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@@ -1091,8 -1000,8 +1093,9 @@@ static void collapse_huge_page(struct m
  	 * If it fails, we release mmap_sem and jump out_nolock.
  	 * Continuing to collapse causes inconsistency.
  	 */
- 	if (!__collapse_huge_page_swapin(mm, vma, address, pmd, referenced)) {
+ 	if (unmapped && !__collapse_huge_page_swapin(mm, vma, address,
+ 						     pmd, referenced)) {
+ 		mem_cgroup_cancel_charge(new_page, memcg, true);
  		up_read(&mm->mmap_sem);
  		goto out_nolock;
  	}

> 
> if (!page->mem_cgroup)
> 
> where page == NULL in mem_cgroup_uncharge().
> 
> [ 2244.414421][ T726] BUG: Kernel NULL pointer dereference on read at 0x0000002c
> [ 2244.414454][ T726] Faulting instruction address: 0xc0000000004f7e44
> [ 2244.414467][ T726] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 2244.414488][ T726] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV
> [ 2244.414501][ T726] Modules linked in: brd ext4 crc16 mbcache jbd2 loop kvm_hv kvm ip_tables x_tables xfs sd_mod bnx2x ahci tg3 libahci libphy mdio libata firmware_class dm_mirror dm_region_hash dm_log dm_mod
> [ 2244.414556][ T726] CPU: 11 PID: 726 Comm: khugepaged Not tainted 5.7.0-rc5-next-20200512+ #8
> [ 2244.414579][ T726] NIP:  c0000000004f7e44 LR: c0000000004df95c CTR: c0000000001c1400
> [ 2244.414600][ T726] REGS: c000001a2398f6e0 TRAP: 0300   Not tainted  (5.7.0-rc5-next-20200512+)
> [ 2244.414630][ T726] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24000244  XER: 20040000
> [ 2244.414656][ T726] CFAR: c0000000004df958 DAR: 000000000000002c DSISR: 40000000 IRQMASK: 0
> [ 2244.414656][ T726] GPR00: c0000000004df95c c000001a2398f970 c00000000168a700 fffffffffffffff4
> [ 2244.414656][ T726] GPR04: ffffffffffffffff c000000000bd0980 0000000000000005 0000000000000080
> [ 2244.414656][ T726] GPR08: 0000001ffc030000 0000000000000001 0000000000000000 c00000000152bb58
> [ 2244.414656][ T726] GPR12: 0000000024000222 c000001fffff5680 c0000001d818ce00 c0000001d818cd00
> [ 2244.414656][ T726] GPR16: 0000000000000000 c000001a2398fce0 fe7fffffffffefff fffffffffffffe7f
> [ 2244.414656][ T726] GPR20: c000201320aa53c8 000000000000001e 0000000000000017 c00020047636b868
> [ 2244.414656][ T726] GPR24: 0000000000000000 0000000000000000 c000000001756080 c000001a2398fce0
> [ 2244.414656][ T726] GPR28: c000001a2398fa20 00007ffeeda00000 c000200f28547928 c000200f28547880
> [ 2244.414865][ T726] NIP [c0000000004f7e44] mem_cgroup_uncharge+0x34/0xb0
> mem_cgroup_uncharge at mm/memcontrol.c:6563
> [ 2244.414895][ T726] LR [c0000000004df95c] collapse_huge_page+0x24c/0x1000
> collapse_huge_page at mm/khugepaged.c:1197
> [ 2244.414924][ T726] Call Trace:
> [ 2244.414940][ T726] [c000001a2398f970] [0000000000000001] 0x1 (unreliable)
> [ 2244.414970][ T726] [c000001a2398f9c0] [c0000000004df814] collapse_huge_page+0x104/0x1000
> collapse_huge_page at mm/khugepaged.c:1064 (discriminator 10)
> [ 2244.414991][ T726] [c000001a2398faf0] [c0000000004e0f84] khugepaged_scan_pmd+0x874/0xc70
> [ 2244.415021][ T726] [c000001a2398fbf0] [c0000000004e2a90] khugepaged+0x900/0x1920
> [ 2244.415043][ T726] [c000001a2398fdb0] [c000000000155aa4] kthread+0x1c4/0x1d0
> [ 2244.415075][ T726] [c000001a2398fe20] [c00000000000cb28] ret_from_kernel_thread+0x5c/0x74
> [ 2244.415095][ T726] Instruction dump:
> [ 2244.415113][ T726] 384228f0 7c0802a6 60000000 f821ffb1 e92d0c70 f9210048 39200000 3d22ffec
> [ 2244.415146][ T726] 3929f9f4 81290000 2f890000 409d0048 <e9230038> 2fa90000 419e003c 7c0802a6
> [ 2244.415181][ T726] ---[ end trace 3488eb8818913a26 ]---
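The register dump allows a quick arithmetic check of the NULL-page reading above. On ppc64 the first argument arrives in r3, and GPR03 holds fffffffffffffff4, which is -12 (-ENOMEM). The faulting instruction e9230038 decodes as ld r9,0x38(r3). Adding the two gives -12 + 0x38 = 0x2c, which matches DAR. A NULL page would have faulted at 0x38 instead, so the values look more like an ERR_PTR(-ENOMEM) page pointer than a NULL one.

khugepaged's allocation path appears to leave ERR_PTR(-ENOMEM) in *hpage when the huge page allocation fails. That value would pass the new "if (*hpage)" test.

The sketch below is a userspace model built on those assumptions:
- ERR_PTR() and IS_ERR_OR_NULL() are local re-implementations modeled on include/linux/err.h.
- fake_page, fake_uncharge(), out_nolock_cleanup() and mem_cgroup_load_address() are hypothetical names.
- The 0x38 offset is read off the instruction dump, not from struct page itself.

It shows one possible guard. It is not necessarily the fix that went upstream.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins modeled on the kernel's include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	/* The top MAX_ERRNO addresses encode negative errno values. */
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Assumed offset of page->mem_cgroup, taken from the faulting
 * instruction e9230038 (ld r9,0x38(r3)) in the Oops above. */
#define MEM_CGROUP_OFFSET 0x38

/* Address that loading page->mem_cgroup would touch; unsigned
 * arithmetic wraps, so an error pointer lands just above zero. */
static uintptr_t mem_cgroup_load_address(const void *page)
{
	return (uintptr_t)page + MEM_CGROUP_OFFSET;
}

/* Hypothetical stand-in for struct page carrying a charge flag. */
struct fake_page {
	int charged;
};

static int uncharge_calls;

/* Hypothetical model of mem_cgroup_uncharge(): drops the charge. */
static void fake_uncharge(struct fake_page *page)
{
	page->charged = 0;
	uncharge_calls++;
}

/* Hypothetical out_nolock cleanup with the "if (*hpage)" test widened
 * so that error pointers are skipped as well as NULL. */
static void out_nolock_cleanup(struct fake_page **hpage)
{
	if (!IS_ERR_OR_NULL(*hpage))
		fake_uncharge(*hpage);
}
```

In this model, mem_cgroup_load_address(ERR_PTR(-ENOMEM)) evaluates to 0x2c and mem_cgroup_load_address(NULL) evaluates to 0x38. out_nolock_cleanup() calls fake_uncharge() only when *hpage points at a real page.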