From: Masoud Sharbiani <msharbiani@apple.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: mhocko@kernel.org, hannes@cmpxchg.org, vdavydov.dev@gmail.com,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1.
Date: Thu, 01 Aug 2019 18:08:42 -0700 [thread overview]
Message-ID: <7EE30F16-A90B-47DC-A065-3C21881CD1CC@apple.com> (raw)
In-Reply-To: <20190801181952.GA8425@kroah.com>
> On Aug 1, 2019, at 11:19 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Thu, Aug 01, 2019 at 11:04:14AM -0700, Masoud Sharbiani wrote:
>> Hey folks,
>> I’ve come across an issue that affects most of the 4.19, 4.20 and 5.2 linux-stable kernels, and that was only fixed in 5.3-rc1.
>> It was introduced by
>>
>> 29ef680 memcg, oom: move out_of_memory back to the charge path
>>
>> The gist of it is that if you have a memory control group for a process that repeatedly maps all of the pages of a file with repeated calls to:
>>
>> mmap(NULL, pages * PAGE_SIZE, PROT_WRITE|PROT_READ, MAP_FILE|MAP_PRIVATE, fd, 0)
>>
>> The memory cg eventually runs out of memory, as it should. However,
>> prior to the 29ef680 commit, it would kill the running process with
>> OOM; after that commit (and until 5.3-rc1; I haven’t pinpointed the
>> exact fix commit between 5.2.0 and 5.3-rc1), the offending process
>> spins at 100% CPU usage and neither dies (the prior behavior) nor
>> fails the mmap call (which is what happens if one runs the test
>> program with a low ulimit -v value).
>>
>> Any ideas on how to chase this down further?
>
> Finding the exact patch that fixes this would be great, as then I can
> add it to the 4.19 and 5.2 stable kernels (4.20 is long end-of-life, no
> idea why you are messing with that one...)
>
> thanks,
>
> greg k-h
Allow me to issue a correction:
Running this test on current linux master (commit 629f8205a6cc63d2e8e30956bad958a3507d018f) correctly terminates the leaker app with OOM.
However, running it a second time (after removing the memory cgroup, and allowing the test script to run it again), causes this:
kernel:watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [leaker1:7193]
[ 202.511024] CPU: 7 PID: 7193 Comm: leaker1 Not tainted 5.3.0-rc2+ #8
[ 202.517378] Hardware name: <redacted>
[ 202.525554] RIP: 0010:lruvec_lru_size+0x49/0xf0
[ 202.530085] Code: 41 89 ed b8 ff ff ff ff 45 31 f6 49 c1 e5 03 eb 19 48 63 d0 4c 89 e9 48 8b 14 d5 20 b7 11 b5 48 03 8b 88 00 00 00 4c 03 34 11 <48> c7 c6 80 c5 40 b5 89 c7 e8 29 a7 6f 00 3b 05 57 9d 24 01 72 d1
[ 202.548831] RSP: 0018:ffffa7c5480df620 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 202.556398] RAX: 0000000000000000 RBX: ffff8f5b7a1af800 RCX: 00003859bfa03bc0
[ 202.563528] RDX: ffff8f5b7f800000 RSI: 0000000000000018 RDI: ffffffffb540c580
[ 202.570662] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000004
[ 202.577795] R10: ffff8f5b62548000 R11: 0000000000000000 R12: 0000000000000004
[ 202.584928] R13: 0000000000000008 R14: 0000000000000000 R15: 0000000000000000
[ 202.592063] FS: 00007ff73d835740(0000) GS:ffff8f6b7f840000(0000) knlGS:0000000000000000
[ 202.600149] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 202.605895] CR2: 00007f1b1c00e428 CR3: 0000001021d56006 CR4: 00000000001606e0
[ 202.613026] Call Trace:
[ 202.615475] shrink_node_memcg+0xdb/0x7a0
[ 202.619488] ? shrink_slab+0x266/0x2a0
[ 202.623242] ? mem_cgroup_iter+0x10a/0x2c0
[ 202.627337] shrink_node+0xdd/0x4c0
[ 202.630831] do_try_to_free_pages+0xea/0x3c0
[ 202.635104] try_to_free_mem_cgroup_pages+0xf5/0x1e0
[ 202.640068] try_charge+0x279/0x7a0
[ 202.643565] mem_cgroup_try_charge+0x51/0x1a0
[ 202.647925] __add_to_page_cache_locked+0x19f/0x330
[ 202.652800] ? __mod_lruvec_state+0x40/0xe0
[ 202.656987] ? scan_shadow_nodes+0x30/0x30
[ 202.661086] add_to_page_cache_lru+0x49/0xd0
[ 202.665361] iomap_readpages_actor+0xea/0x230
[ 202.669718] ? iomap_migrate_page+0xe0/0xe0
[ 202.673906] iomap_apply+0xb8/0x150
[ 202.677398] iomap_readpages+0xa7/0x1a0
[ 202.681237] ? iomap_migrate_page+0xe0/0xe0
[ 202.685424] read_pages+0x68/0x190
[ 202.688829] __do_page_cache_readahead+0x19c/0x1b0
[ 202.693622] ondemand_readahead+0x168/0x2a0
[ 202.697808] filemap_fault+0x32d/0x830
[ 202.701562] ? __mod_lruvec_state+0x40/0xe0
[ 202.705747] ? page_remove_rmap+0xcf/0x150
[ 202.709846] ? alloc_set_pte+0x240/0x2c0
[ 202.713775] __xfs_filemap_fault+0x71/0x1c0
[ 202.717963] __do_fault+0x38/0xb0
[ 202.721280] __handle_mm_fault+0x73f/0x1080
[ 202.725467] ? __switch_to_asm+0x34/0x70
[ 202.729390] ? __switch_to_asm+0x40/0x70
[ 202.733318] handle_mm_fault+0xce/0x1f0
[ 202.737158] __do_page_fault+0x231/0x480
[ 202.741083] page_fault+0x2f/0x40
[ 202.744404] RIP: 0033:0x400c20
[ 202.747461] Code: 45 c8 48 89 c6 bf 32 0e 40 00 b8 00 00 00 00 e8 76 fb ff ff c7 45 ec 00 00 00 00 eb 36 8b 45 ec 48 63 d0 48 8b 45 c8 48 01 d0 <0f> b6 00 0f be c0 01 45 e4 8b 45 ec 25 ff 0f 00 00 85 c0 75 10 8b
[ 202.766208] RSP: 002b:00007ffde95ae460 EFLAGS: 00010206
[ 202.771432] RAX: 00007ff71e855000 RBX: 0000000000000000 RCX: 000000000000001a
[ 202.778558] RDX: 0000000001dfd000 RSI: 000000007fffffe5 RDI: 0000000000000000
[ 202.785692] RBP: 00007ffde95af4b0 R08: 0000000000000000 R09: 00007ff73d2a520d
[ 202.792823] R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000400850
[ 202.799949] R13: 00007ffde95af590 R14: 0000000000000000 R15: 0000000000000000
Further tests show that this also happens on 5.3-rc1 if one waits long enough.
So I don’t think we have a fix in the tree yet.
Cheers,
Masoud
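For context, the driving script referred to above does roughly the following. This is a hypothetical sketch only (the actual test script was not posted); it assumes root, a cgroup-v1 memory controller mounted at the usual path, and made-up names (`leaktest`, `bigfile.dat`) and a made-up 100 MiB limit:

```shell
#!/bin/sh
# Hypothetical sketch of the test driver: create a memcg with a small
# limit, run the leaker inside it, remove the cgroup, and repeat.
# The soft lockup above appeared on the second pass.
set -eu
CG=/sys/fs/cgroup/memory/leaktest

dd if=/dev/zero of=bigfile.dat bs=1M count=512 2>/dev/null

for pass in 1 2; do
	mkdir "$CG"
	echo $((100 * 1024 * 1024)) > "$CG/memory.limit_in_bytes"
	# Start a fresh shell, move it into the cgroup via its own PID,
	# then exec the leaker from it.  The leaker is expected to be
	# OOM-killed, hence the `|| true`.
	sh -c "echo \$\$ > '$CG/cgroup.procs'; exec ./leaker1 bigfile.dat" || true
	rmdir "$CG"
done
```

The `sh -c` indirection matters: `$$` inside the inner shell expands to that shell's own PID, so the correct task lands in the cgroup before the leaker starts.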
WARNING: multiple messages have this Message-ID (diff)
From: Hillf Danton <hdanton@sina.com>
To: Masoud Sharbiani <msharbiani@apple.com>
Cc: mhocko@kernel.org, hannes@cmpxchg.org, vdavydov.dev@gmail.com,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org,
Greg KH <gregkh@linuxfoundation.org>
Subject: Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1.
Date: Fri, 2 Aug 2019 16:08:01 +0800 [thread overview]
Message-ID: <7EE30F16-A90B-47DC-A065-3C21881CD1CC@apple.com> (raw)
Message-ID: <20190802080801.FgipbUiUIRs2pZQem8TIVsBBsNLNyssPt3Um-NmRPB4@z> (raw)
In-Reply-To: <20190801181952.GA8425@kroah.com>
On Thu, 01 Aug 2019 18:08:42 -0700 Masoud Sharbiani wrote:
>
> Allow me to issue a correction:
> Running this test on linux master
> <629f8205a6cc63d2e8e30956bad958a3507d018f> correctly terminates the
> leaker app with OOM.
> However, running it a second time (after removing the memory cgroup, and
> allowing the test script to run it again), causes this:
>
> kernel:watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [leaker1:7193]
>
>
> [register dump and call trace trimmed -- identical to the trace quoted in full above]
>
> Further tests show that this also happens if one waits long enough on
> 5.3-rc1 as well.
> So I don't think we have a fix in tree yet.
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2547,8 +2547,12 @@ retry:
 	nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages,
 						    gfp_mask, may_swap);
 
-	if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
-		goto retry;
+	if (mem_cgroup_margin(mem_over_limit) >= nr_pages) {
+		if (nr_retries--)
+			goto retry;
+		/* give up charging memhog */
+		return -ENOMEM;
+	}
 
 	if (!drained) {
 		drain_all_stock(mem_over_limit);
--
Thread overview: 28+ messages
2019-08-01 18:04 Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1 Masoud Sharbiani
2019-08-01 18:19 ` Greg KH
2019-08-01 22:26 ` Masoud Sharbiani
2019-08-02 1:08 ` Masoud Sharbiani [this message]
2019-08-02 8:08 ` Hillf Danton
2019-08-02 8:18 ` Michal Hocko
2019-08-02 7:40 ` Michal Hocko
2019-08-02 14:18 ` Masoud Sharbiani
2019-08-02 14:41 ` Michal Hocko
2019-08-02 18:00 ` Masoud Sharbiani
2019-08-02 19:14 ` Michal Hocko
2019-08-02 23:28 ` Masoud Sharbiani
2019-08-03 2:36 ` Tetsuo Handa
2019-08-03 15:51 ` Tetsuo Handa
2019-08-03 17:41 ` Masoud Sharbiani
2019-08-03 18:24 ` Masoud Sharbiani
2019-08-05 8:42 ` Michal Hocko
2019-08-05 11:36 ` Tetsuo Handa
2019-08-05 11:44 ` Michal Hocko
2019-08-05 14:00 ` Tetsuo Handa
2019-08-05 14:26 ` Michal Hocko
2019-08-06 10:26 ` Tetsuo Handa
2019-08-06 10:50 ` Michal Hocko
2019-08-06 12:48 ` [PATCH v3] memcg, oom: don't require __GFP_FS when invoking memcg OOM killer Tetsuo Handa
2019-08-05 8:18 ` Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1 Michal Hocko
2019-08-02 12:10 Hillf Danton
2019-08-02 13:40 ` Michal Hocko
2019-08-03 5:45 Hillf Danton