From: Dennis Zhou <dennis@kernel.org>
To: Wang Yugui <wangyugui@e16-tech.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	linux-mm@kvack.org, linux-btrfs@vger.kernel.org
Subject: Re: unexpected -ENOMEM from percpu_counter_init()
Date: Sat, 10 Apr 2021 15:52:04 +0000
Message-ID: <YHHJpCVpS8sQg7Go@google.com>
In-Reply-To: <20210410232913.6F82.409509F4@e16-tech.com>

On Sat, Apr 10, 2021 at 11:29:17PM +0800, Wang Yugui wrote:
> Hi, Dennis Zhou 
> 
> Thanks for your nice answer, but I still have a few questions.
> 
> > Percpu is not really cheap memory to allocate because it has an
> > amplification factor of NR_CPUS. As a result, percpu on the critical
> > path is really not something that is expected to be high throughput.
> 
> > Ideally things like btrfs snapshots should preallocate a number of these
> > and not try to do atomic allocations, because those could in theory
> > fail: even once we go to the page allocator in the future, we may not
> > get enough pages if that requires going into reclaim.
> 
> Per-module preallocation such as mempool_t is used in only a few places
> in linux/fs, so do most people prefer system-wide preallocation because
> it is easier to use?
> 
> Can we add more opportunities to manage the system-wide preallocation,
> like this?
> 
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index dc1f4dc..eb3f592 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -226,6 +226,11 @@ static inline void memalloc_noio_restore(unsigned int flags)
>  static inline unsigned int memalloc_nofs_save(void)
>  {
>  	unsigned int flags = current->flags & PF_MEMALLOC_NOFS;
> +
> +	// just like slab_pre_alloc_hook
> +	fs_reclaim_acquire(current->flags & gfp_allowed_mask);
> +	fs_reclaim_release(current->flags & gfp_allowed_mask);
> +
>  	current->flags |= PF_MEMALLOC_NOFS;
>  	return flags;
>  }
> 
> 
> > The workqueue approach has been good enough so far. Technically there is
> > a higher priority workqueue that this work could be scheduled on, but
> > save for this miss on my part, the system workqueue has worked out fine.
> 
> > In the future, as I mentioned above, it would be good to support
> > actually getting pages, but it's work that needs to be tackled with a
> > bit of care. I might target the work for v5.14.
> > 
> > > this is our application pipeline.
> > > 	file_pre_process |
> > > 	bwa.nipt xx |
> > > 	samtools.nipt sort xx |
> > > 	file_post_process
> > > 
> > > file_pre_process/file_post_process are fast, so they are often
> > > blocked on pipe input/output.
> > > 
> > > 'bwa.nipt xx' is a high CPU load; it uses almost all of the CPU cores.
> > > 
> > > 'samtools.nipt sort xx' is a high memory load; it keeps the input in
> > > memory. If the memory is not enough, it saves the whole buffer to a
> > > temp file, so it is sometimes a high IO load too (writing 60G or more
> > > to file).
> > > 
> > > 
> > > xfstests (generic/476) is just a high IO load; its CPU/memory load is
> > > NOT high. So xfstests (generic/476) may be easier than our application
> > > pipeline.
> > > 
> > > Although there is not yet a simple reproducer for the other problem
> > > that happened here, there is a fairly high chance that something is
> > > wrong in btrfs/mm/fs-buffer.
> > > > but another problem (OS froze without a call trace, PANIC without
> > > > OOPS?, the reason is still unknown) still happens.
> > 
> > I do not have an answer for this. I would recommend looking into kdump.
> 
> The percpu ENOMEM problem blocked many heavy-load tests for quite a
> long time. I still guess that this system freeze is an mm/btrfs problem.
> Neither the OOM killer nor an OOPS is triggered.
> 

I don't follow. Is this still a problem after the patch?

> I am trying to reproduce it with a simple script. I noticed that the
> value of 'free' is a little low, although 'available' is big.
> 
> # free -h
>               total        used        free      shared  buff/cache   available
> Mem:          188Gi       1.4Gi       5.5Gi        17Mi       181Gi       175Gi
> Swap:            0B          0B          0B
> 
> vm.min_free_kbytes is automatically configured to 4GiB (4194304 kB)
> 
> # write files with the size >= memory size *3
> #for((i=0;i<10;++i));do dd if=/dev/zero bs=1M count=64K of=/nodetmp/${i}.txt; free -h; done
> 
> Any advice or patch to make the value of 'free' a little bigger?
> 
> 
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2021/04/10
> 
> 
> 

