From: Wang Yugui <wangyugui@e16-tech.com>
To: Dennis Zhou <dennis@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>,
linux-mm@kvack.org, linux-btrfs@vger.kernel.org
Subject: Re: unexpected -ENOMEM from percpu_counter_init()
Date: Sat, 10 Apr 2021 23:29:17 +0800
Message-ID: <20210410232913.6F82.409509F4@e16-tech.com>
In-Reply-To: <YHBc+e8WQHZ/W0Bv@google.com>
Hi, Dennis Zhou
Thanks for your nice answer,
but I still have a few questions.
> Percpu is not really cheap memory to allocate because it has a
> amplification factor of NR_CPUS. As a result, percpu on the critical
> path is really not something that is expected to be high throughput.
> Ideally things like btrfs snapshots should preallocate a number of these
> and not try to do atomic allocations because that in theory could fail
> because even after we go to the page allocator in the future we can't
> get enough pages due to needing to go into reclaim.
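To put the amplification factor in concrete terms, here is a rough sketch of the arithmetic. It assumes one copy of the object per CPU and ignores allocator alignment and chunk overhead; the function name and the CPU count used in the note below are hypothetical, not values from this thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Total bytes backing one per-CPU object.
 * Assumption: one copy per CPU, no alignment or chunk overhead.
 */
size_t percpu_footprint(size_t obj_size, unsigned int nr_cpus)
{
	assert(nr_cpus > 0);
	return obj_size * nr_cpus;
}
```

For example, a 4-byte per-CPU slot on a hypothetical 64-CPU machine costs percpu_footprint(4, 64) = 256 bytes under these assumptions.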
Per-module preallocation such as mempool_t is used in only a few places
in linux/fs. So do most people prefer system-wide preallocation because
it is easier to use?
Could we add more opportunities to manage the system-wide preallocation,
for example like this?
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index dc1f4dc..eb3f592 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -226,6 +226,11 @@ static inline void memalloc_noio_restore(unsigned int flags)
 static inline unsigned int memalloc_nofs_save(void)
 {
 	unsigned int flags = current->flags & PF_MEMALLOC_NOFS;
+
+	// just like slab_pre_alloc_hook
+	fs_reclaim_acquire(current->flags & gfp_allowed_mask);
+	fs_reclaim_release(current->flags & gfp_allowed_mask);
+
 	current->flags |= PF_MEMALLOC_NOFS;
 	return flags;
 }
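For comparison, the per-module preallocation style mentioned above (the mempool_t idea) can be illustrated with a minimal userspace sketch. This is not the kernel mempool_t API; the struct and every function name here are hypothetical. It only shows the pattern of reserving objects up front so that the critical path never calls the allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace analogue of a preallocated pool. */
struct prealloc_pool {
	void **items;    /* stack of reserved objects */
	size_t nr_free;  /* objects currently available */
	size_t capacity; /* objects reserved at init time */
};

/* Reserve all objects up front, where failure can still be handled easily. */
int pool_init(struct prealloc_pool *p, size_t capacity, size_t obj_size)
{
	assert(capacity > 0 && obj_size > 0);
	p->items = calloc(capacity, sizeof(*p->items));
	if (!p->items)
		return -1;
	p->capacity = capacity;
	p->nr_free = 0;
	for (size_t i = 0; i < capacity; i++) {
		void *obj = malloc(obj_size);
		if (!obj) {
			/* Undo the partial reservation. */
			while (p->nr_free > 0)
				free(p->items[--p->nr_free]);
			free(p->items);
			p->items = NULL;
			return -1;
		}
		p->items[p->nr_free++] = obj;
	}
	return 0;
}

/* Critical path: take a reserved object without allocating. */
void *pool_take(struct prealloc_pool *p)
{
	return p->nr_free ? p->items[--p->nr_free] : NULL;
}

/* Give an object back to the reserve. */
void pool_put(struct prealloc_pool *p, void *obj)
{
	assert(p->nr_free < p->capacity);
	p->items[p->nr_free++] = obj;
}

/* Free everything still held in the reserve. */
void pool_destroy(struct prealloc_pool *p)
{
	while (p->nr_free > 0)
		free(p->items[--p->nr_free]);
	free(p->items);
	p->items = NULL;
}
```

The trade-off is that each user has to size and manage its own reserve, which is the extra effort the question above refers to.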
> The workqueue approach has been good enough so far. Technically there is
> a higher priority workqueue that this work could be scheduled on, but
> save for this miss on my part, the system workqueue has worked out fine.
> In the future as I mentioned above. It would be good to support actually
> getting pages, but it's work that needs to be tackled with a bit of
> care. I might target the work for v5.14.
>
> > this is our application pipeline.
> > file_pre_process |
> > bwa.nipt xx |
> > samtools.nipt sort xx |
> > file_post_process
> >
> > file_pre_process/file_post_process is fast, so often are blocked by
> > pipe input/output.
> >
> > 'bwa.nipt xx' is a high-cpu-load, almost all of CPU cores.
> >
> > 'samtools.nipt sort xx' is a high-mem-load, it keep the input in memory.
> > if the memory is not enough, it will save all the buffer to temp file,
> > so it is sometimes high-IO-load too(write 60G or more to file).
> >
> >
> > xfstests(generic/476) is just high-IO-load, cpu/memory load is NOT high.
> > so xfstests(generic/476) maybe easy than our application pipeline.
> >
> > Although there is yet not a simple reproducer for another problem
> > happend here, but there is a little high chance that something is wrong
> > in btrfs/mm/fs-buffer.
> > > but another problem(os freezed without call trace, PANIC without OOPS?,
> > > the reason is yet unkown) still happen.
>
> I do not have an answer for this. I would recommend looking into kdump.
Has the percpu -ENOMEM problem been blocking many heavy-load tests for a
rather long time?
I still suspect that the system freeze is an mm/btrfs problem.
Neither the OOM killer nor an OOPS is triggered.
I am trying to reproduce it with a simple script. I noticed that the
value of 'free' is rather low, although 'available' is large.
# free -h
              total        used        free      shared  buff/cache   available
Mem:          188Gi       1.4Gi       5.5Gi        17Mi       181Gi       175Gi
Swap:            0B          0B          0B
vm.min_free_kbytes is automatically configured to 4 GiB (4194304).
# write files with a total size >= 3 * memory size
# for ((i=0; i<10; ++i)); do dd if=/dev/zero bs=1M count=64K of=/nodetmp/${i}.txt; free -h; done
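As a rough check of these numbers, the arithmetic can be sketched as follows. It assumes that dd's K suffix means 1024 (as in GNU dd) and that vm.min_free_kbytes is in units of 1024 bytes; the function names are made up for this illustration.

```c
#include <assert.h>

/* Convert a kbytes value such as vm.min_free_kbytes to GiB. */
double kib_to_gib(unsigned long kib)
{
	return (double)kib / (1024.0 * 1024.0);
}

/* Total GiB written by nr_files dd runs of 'count' blocks of bs_mib MiB each. */
double dd_total_gib(unsigned int nr_files, unsigned long count,
		    unsigned long bs_mib)
{
	assert(bs_mib > 0);
	return (double)nr_files * (double)count * (double)bs_mib / 1024.0;
}
```

With the values from this mail, kib_to_gib(4194304) gives 4 GiB, and dd_total_gib(10, 65536, 1) gives 640 GiB, which is roughly 3.4 times the 188Gi of RAM reported by free.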
Is there any advice or patch to make the value of 'free' a little bigger?
Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2021/04/10