From: Masoud Sharbiani <msharbiani@apple.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: gregkh@linuxfoundation.org, hannes@cmpxchg.org,
	vdavydov.dev@gmail.com, linux-mm@kvack.org,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1.
Date: Fri, 02 Aug 2019 11:00:55 -0700
Message-ID: <5DE6F4AE-F3F9-4C52-9DFC-E066D9DD5EDC@apple.com>
In-Reply-To: <20190802144110.GL6461@dhcp22.suse.cz>

> On Aug 2, 2019, at 7:41 AM, Michal Hocko <mhocko@kernel.org> wrote:
> 
> On Fri 02-08-19 07:18:17, Masoud Sharbiani wrote:
>> 
>> 
>>> On Aug 2, 2019, at 12:40 AM, Michal Hocko <mhocko@kernel.org> wrote:
>>> 
>>> On Thu 01-08-19 11:04:14, Masoud Sharbiani wrote:
>>>> Hey folks,
>>>> I’ve come across an issue that affects most 4.19, 4.20, and 5.2 linux-stable kernels and was only fixed in 5.3-rc1.
>>>> It was introduced by
>>>> 
>>>> 29ef680 memcg, oom: move out_of_memory back to the charge path 
>>> 
>>> This commit shouldn't really change the OOM behavior for your particular
>>> test case. It would have changed MAP_POPULATE behavior but your usage is
>>> triggering the standard page fault path. The only difference with
>>> 29ef680 is that the OOM killer is invoked during the charge path rather
>>> than on the way out of the page fault.
>>> 
>>> Anyway, I tried to run your test case in a loop and leaker always ends
>>> up being killed as expected with 5.2. See the below oom report. There
>>> must be something else going on. How much swap do you have on your
>>> system?
>> 
>> I do not have swap defined. 
> 
> OK, I have retested with swap disabled and again everything seems to be
> working as expected. The oom happens earlier because I do not have to
> wait for the swap to get full.
> 

In my tests (with the script provided), it loops only 11 iterations before hanging and printing the soft lockup message.
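
For reference, the core of the leaker is roughly the following (a simplified
sketch, not the actual program; the file name and sizes are placeholders, and
the real binary takes the -p/-c options visible in the script output below):

/*
 * Minimal sketch: mmap a file-backed region inside the memory-limited
 * cgroup and read every byte, so each newly touched page goes through
 * the page fault -> page cache -> mem_cgroup_try_charge path seen in
 * the trace below.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 10240UL * 4096;		/* placeholder size */
	int fd = open("leak.dat", O_RDONLY);	/* placeholder file name */
	unsigned char *buf;
	long sum = 0;
	size_t i;

	if (fd < 0) { perror("open"); return 1; }
	buf = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED) { perror("mmap"); return 1; }
	printf("Buffer is @ %p\n", (void *)buf);

	for (i = 0; i < len; i++)		/* fault in every page */
		sum += buf[i];
	printf("Total sum was %ld\n", sum);

	munmap(buf, len);
	close(fd);
	return 0;
}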


> Which fs do you use to write the file that you mmap?

/dev/sda3 on / type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)

The soft lockup backtrace shows the fault path going through __xfs_filemap_fault():

[  561.452933] watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [leaker:3261]
[  561.459904] Modules linked in: dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt gpio_ich iTCO_vendor_support dcdbas ipmi_ssif intel_powerclamp coretemp kvm_intel ses ipmi_si kvm enclosure scsi_transport_sas ipmi_devintf irqbypass pcspkr lpc_ich sg joydev ipmi_msghandler wmi acpi_power_meter acpi_cpufreq xfs libcrc32c ata_generic sd_mod pata_acpi ata_piix libata megaraid_sas crc32c_intel serio_raw bnx2 bonding
[  561.495979] CPU: 4 PID: 3261 Comm: leaker Tainted: G          I  L    5.3.0-rc2+ #10
[  561.503704] Hardware name: Dell Inc. PowerEdge R710/0YDJK3, BIOS 6.4.0 07/23/2013
[  561.511168] RIP: 0010:lruvec_lru_size+0x49/0xf0
[  561.515687] Code: 41 89 ed b8 ff ff ff ff 45 31 f6 49 c1 e5 03 eb 19 48 63 d0 4c 89 e9 48 03 8b 88 00 00 00 48 8b 14 d5 60 a9 92 94 4c 03 34 11 <48> c7 c6 80 7c bf 94 89 c7 e8 89 d3 59 00 3b 05 27 eb ff 00 72 d1
[  561.534418] RSP: 0018:ffffb5f886a3f640 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  561.541968] RAX: 0000000000000002 RBX: ffff96fca3bba400 RCX: 00003ef5d82059f0
[  561.549085] RDX: ffff9702a7a40000 RSI: 0000000000000010 RDI: ffffffff94bf7c80
[  561.556202] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffff94ae1c00
[  561.563318] R10: ffff96fcc7802520 R11: 0000000000000000 R12: 0000000000000004
[  561.570435] R13: 0000000000000008 R14: 0000000000000000 R15: 0000000000000000
[  561.577553] FS:  00007f5522602740(0000) GS:ffff9702a7a80000(0000) knlGS:0000000000000000
[  561.585623] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  561.591352] CR2: 00007fba755f95b0 CR3: 0000000c646dc000 CR4: 00000000000006e0
[  561.598468] Call Trace:
[  561.600907]  shrink_node_memcg+0xc8/0x790
[  561.604905]  ? shrink_slab+0x245/0x280
[  561.608644]  ? mem_cgroup_iter+0x10a/0x2c0
[  561.612728]  shrink_node+0xcd/0x490
[  561.616208]  do_try_to_free_pages+0xda/0x3a0
[  561.620466]  ? mem_cgroup_select_victim_node+0x43/0x2f0
[  561.625678]  try_to_free_mem_cgroup_pages+0xe7/0x1c0
[  561.630629]  try_charge+0x246/0x7a0
[  561.634107]  mem_cgroup_try_charge+0x6b/0x1e0
[  561.638453]  ? mem_cgroup_commit_charge+0x5a/0x110
[  561.643231]  __add_to_page_cache_locked+0x195/0x330
[  561.648100]  ? scan_shadow_nodes+0x30/0x30
[  561.652184]  add_to_page_cache_lru+0x39/0xa0
[  561.656442]  iomap_readpages_actor+0xf2/0x230
[  561.660787]  iomap_apply+0xa3/0x130
[  561.664266]  iomap_readpages+0x97/0x180
[  561.668091]  ? iomap_migrate_page+0xe0/0xe0
[  561.672266]  read_pages+0x57/0x180
[  561.675657]  __do_page_cache_readahead+0x1ac/0x1c0
[  561.680436]  ondemand_readahead+0x168/0x2a0
[  561.684606]  filemap_fault+0x30d/0x830
[  561.688343]  ? flush_tlb_func_common.isra.8+0x147/0x230
[  561.693554]  ? __mod_lruvec_state+0x40/0xe0
[  561.697726]  ? alloc_set_pte+0x4e6/0x5b0
[  561.701669]  __xfs_filemap_fault+0x61/0x190 [xfs]
[  561.706361]  __do_fault+0x38/0xb0
[  561.709666]  __handle_mm_fault+0xbee/0xe90
[  561.713750]  handle_mm_fault+0xe2/0x200
[  561.717574]  __do_page_fault+0x224/0x490
[  561.721485]  do_page_fault+0x31/0x120
[  561.725137]  page_fault+0x3e/0x50
[  561.728439] RIP: 0033:0x400c5a
[  561.731483] Code: 45 c0 48 89 c6 bf 77 0e 40 00 b8 00 00 00 00 e8 3c fb ff ff c7 45 dc 00 00 00 00 eb 36 8b 45 dc 48 63 d0 48 8b 45 c0 48 01 d0 <0f> b6 00 0f be c0 01 45 e8 8b 45 dc 25 ff 0f 00 00 85 c0 75 10 8b
[  561.750214] RSP: 002b:00007fffba1d9450 EFLAGS: 00010206
[  561.755426] RAX: 00007f550346b000 RBX: 0000000000000000 RCX: 000000000000001a
[  561.762542] RDX: 0000000001c4c000 RSI: 000000007fffffe5 RDI: 0000000000000000
[  561.769659] RBP: 00007fffba1da4a0 R08: 0000000000000000 R09: 00007f552206c20d
[  561.776775] R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000400850
[  561.783892] R13: 00007fffba1da580 R14: 0000000000000000 R15: 0000000000000000


If I switch the backing file to an ext4 filesystem (on a separate hard drive), it OOMs.


If I instead map /dev/zero, it OOMs:
…
Todal sum was 0. Loop count is 11
Buffer is @ 0x7f2b66c00000
./test-script-devzero.sh: line 16:  3561 Killed                  ./leaker -p 10240 -c 100000


> Or could you try to
> simplify your test even further? E.g. does everything work as expected
> when doing anonymous mmap rather than file backed one?

It also OOMs with MAP_ANON. 
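(That is, replacing the open()+mmap() in the sketch above with an anonymous
mapping, and writing to each page so it actually gets charged — again a
sketch with placeholder sizes:)

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	for (i = 0; i < len; i += 4096)
		buf[i] = 1;	/* dirty each page to force a memcg charge */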

Hope that helps.
Masoud


> -- 
> Michal Hocko
> SUSE Labs



Thread overview: 31+ messages
2019-08-01 18:04 Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1 Masoud Sharbiani
2019-08-01 18:19 ` Greg KH
2019-08-01 22:26   ` Masoud Sharbiani
2019-08-02  1:08   ` Masoud Sharbiani
2019-08-02  8:08     ` Hillf Danton
2019-08-02  8:18     ` Michal Hocko
2019-08-02  7:40 ` Michal Hocko
2019-08-02 14:18   ` Masoud Sharbiani
2019-08-02 14:41     ` Michal Hocko
2019-08-02 18:00       ` Masoud Sharbiani [this message]
2019-08-02 19:14         ` Michal Hocko
2019-08-02 23:28           ` Masoud Sharbiani
2019-08-03  2:36             ` Tetsuo Handa
2019-08-03 15:51               ` Tetsuo Handa
2019-08-03 17:41                 ` Masoud Sharbiani
2019-08-03 18:24                   ` Masoud Sharbiani
2019-08-05  8:42                 ` Michal Hocko
2019-08-05 11:36                   ` Tetsuo Handa
2019-08-05 11:44                     ` Michal Hocko
2019-08-05 14:00                       ` Tetsuo Handa
2019-08-05 14:26                         ` Michal Hocko
2019-08-06 10:26                           ` Tetsuo Handa
2019-08-06 10:50                             ` Michal Hocko
2019-08-06 12:48                               ` [PATCH v3] memcg, oom: don't require __GFP_FS when invoking memcg OOM killer Tetsuo Handa
2019-08-05  8:18             ` Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1 Michal Hocko
2019-08-02 12:10 Hillf Danton
2019-08-02 13:40 ` Michal Hocko
2019-08-03  5:45 Hillf Danton
