From: Wang Yugui <wangyugui@e16-tech.com>
To: Dennis Zhou <dennis@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>,
linux-mm@kvack.org, linux-btrfs@vger.kernel.org
Subject: Re: unexpected -ENOMEM from percpu_counter_init()
Date: Fri, 09 Apr 2021 12:02:15 +0800
Message-ID: <20210409120214.7BB6.409509F4@e16-tech.com>
In-Reply-To: <YG+4fS7W+Ii4IxO6@google.com>
Hi,
> On Fri, Apr 09, 2021 at 08:08:00AM +0800, Wang Yugui wrote:
> > Hi,
> >
> > > > kernel: at least 5.10.26/5.10.27/5.10.28
> > > >
> > > > This problem is triggered by our application, NOT xfstests.
> > > > But our application has a heavy write load, just like xfstests generic/476.
> > > > Our application uses at most 75% of memory; if that is still not enough,
> > > > it writes out all buffer info to the filesystem.
> > >
> > > Do you use cgroups at all? If yes can you describe the workload pattern
> > > a bit.
> >
> > cgroups is enabled by default, so cgroups is used.
> >
> > This is the output of systemd-cgls. "samtools.nipt sort -m 60G" is one
> > of our applications, but our application is NOT cgroups-aware, and it does
> > NOT call any cgroup interface directly.
> >
> > Control group /:
> > -.slice
> > ├─user.slice
> > │ └─user-0.slice
> > │ ├─session-55.scope
> > │ │ ├─48747 sshd: root [priv]
> > │ │ ├─48788 sshd: root@notty
> > │ │ ├─48795 perl -e @GNU_Parallel=split/_/,"use_IPC::Open3;_use_MIME::Base6...
> > │ │ ├─48943 samtools.nipt sort -m 60G -T /nodetmp//nfs/biowrk/baseline.wgs2...
> > │ │ ├─....
> > │ └─user@0.service
> > │ └─init.scope
> > │ ├─48775 /usr/lib/systemd/systemd --user
> > │ └─48781 (sd-pam)
> > ├─init.scope
> > │ └─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 18
> > └─system.slice
> > ├─rngd.service
> > │ └─1577 /sbin/rngd -f --fill-watermark=0
> > ├─irqbalance.service
> > │ └─1543 /usr/sbin/irqbalance --foreground
> > ....
> >
> >
> > > > This problem happens in Linux kernel 5.10.x, but not in Linux
> > > > kernel 5.4.x. It also reproduces with high frequency.
> > >
> > > Ah. Can you try the following patch?
> > > https://lore.kernel.org/lkml/20210408035736.883861-4-guro@fb.com/
> > >
> > > Thanks,
> > > Dennis
> >
> > kernel: kernel 5.10.28 + this patch
> > result: has not happened yet after 4 test runs.
> > Without this patch, the reproduction frequency is >50%.
> >
> > And a question about this,
> > > > > > upper caller:
> > > > > > nofs_flag = memalloc_nofs_save();
> > > > > > ret = btrfs_drew_lock_init(&root->snapshot_lock);
> > > > > > memalloc_nofs_restore(nofs_flag);
> > >
> > > The issue is here. nofs is set which means percpu attempts an atomic
> > > allocation. If it cannot find anything already allocated it isn't happy.
> > > This was done before memalloc_nofs_{save/restore}() were pervasive.
> > >
> > > Percpu should probably try to allocate some pages if possible even if
> > > nofs is set.
> >
> > Should we check and pre-alloc memory inside memalloc_nofs_restore()?
> > another memalloc_nofs_save() may come soon.
> >
> > something like this in memalloc_nofs_save()?
> > if (pcpu_nr_empty_pop_pages[type] < PCPU_EMPTY_POP_PAGES_LOW)
> > pcpu_schedule_balance_work();
> >
>
> Percpu does do this via a workqueue item. The issue is in v5.9 we
> introduced 2 types of chunks. However, the free float page number was
> for the total. So even if 1 chunk type dropped below, the other chunk
> type might have enough pages. I'm queuing this for 5.12 and will send it
> out assuming it does fix your problem.
>
> >
> > By the way, this problem still happens in kernel 5.10.28 + this patch.
> > Is this a PANIC without an OOPS? Any guidance for troubleshooting, please?
>
> Sorry I don't follow. Above you said the problem hasn't reproed. But now
> you're saying it does? Does your issue still reproduce with the patch
> above?
I'm sorry for the confusion.
The problem (-ENOMEM from percpu_counter_init) has not happened again with
the patch (https://lore.kernel.org/lkml/20210408035736.883861-4-guro@fb.com/).
But another problem (the OS froze without a call trace; a PANIC without an
OOPS? The cause is still unknown) still happens.
Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2021/04/09
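
For reference, here is a minimal userspace sketch of the behaviour explained
in the quoted text above: inside a memalloc_nofs_save()/restore() section the
allocation is no longer a full GFP_KERNEL one, so percpu treats it as atomic.
The SK_* flag values, struct task_sketch and both helpers are illustrative
assumptions, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real flags are defined by the kernel. */
#define SK_GFP_IO      0x1u
#define SK_GFP_FS      0x2u
#define SK_GFP_RECLAIM 0x4u
#define SK_GFP_KERNEL  (SK_GFP_RECLAIM | SK_GFP_IO | SK_GFP_FS)

/* Hypothetical model of a task inside a memalloc_nofs_save() section. */
struct task_sketch {
    bool in_nofs_scope;
};

/* While the task is in a NOFS scope, the FS bit is stripped from the mask. */
static unsigned int effective_gfp(const struct task_sketch *t, unsigned int gfp)
{
    return t->in_nofs_scope ? (gfp & ~SK_GFP_FS) : gfp;
}

/* Assumption: any mask weaker than full GFP_KERNEL is handled as atomic. */
static bool percpu_is_atomic(const struct task_sketch *t, unsigned int gfp)
{
    unsigned int eff = effective_gfp(t, gfp);

    return (eff & SK_GFP_KERNEL) != SK_GFP_KERNEL;
}
```

In this model a caller that passes SK_GFP_KERNEL from inside a NOFS scope is
still classified as atomic, which matches the btrfs_drew_lock_init() call
path quoted above.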
>
> > > problem:
> > > The OS/VGA console is frozen, and no call trace is output.
> > > Only some info is logged to IPMI/Dell iDRAC:
> > > 2 | 04/03/2021 | 11:35:01 | OS Critical Stop #0x46 | Run-time critical stop () | Asserted
> > > 3 | Linux kernel panic: Fatal excep
> > > 4 | Linux kernel panic: tion
> >
> > Best Regards
> > Wang Yugui (wangyugui@e16-tech.com)
> > 2021/04/08
> >
>
> Thanks,
> Dennis
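
The accounting issue described in the quoted text (one free-page count shared
by two chunk types) can be sketched as follows. This is a simplified
userspace model, not the kernel code: the enum, struct, function names and
the watermark value are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names; they mirror the discussion, not the kernel source. */
enum chunk_type { CHUNK_ROOT, CHUNK_MEMCG, NR_CHUNK_TYPES };

#define EMPTY_POP_PAGES_LOW 2 /* assumed low watermark */

struct percpu_state {
    int empty_pop_pages[NR_CHUNK_TYPES];
};

/* Check against the total across all chunk types, as described for v5.9. */
static bool needs_refill_total(const struct percpu_state *s)
{
    int total = 0;

    for (int t = 0; t < NR_CHUNK_TYPES; t++)
        total += s->empty_pop_pages[t];
    return total < EMPTY_POP_PAGES_LOW;
}

/* Per-type check: each chunk type is compared against the watermark alone. */
static bool needs_refill_type(const struct percpu_state *s, enum chunk_type type)
{
    assert((int)type < NR_CHUNK_TYPES);
    return s->empty_pop_pages[type] < EMPTY_POP_PAGES_LOW;
}
```

With 8 free pages in one type and 0 in the other, the total-based check
reports that no refill is needed, while the per-type check flags the empty
type. An atomic allocation from the empty type would then find nothing
pre-populated.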