From: Jan Kara <jack@suse.cz>
To: NeilBrown <neilb@suse.de>
Cc: Trond Myklebust <trondmy@hammerspace.com>,
	"Anna.Schumaker@Netapp.com" <Anna.Schumaker@netapp.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.cz>,
	linux-mm@kvack.org, linux-nfs@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE
Date: Thu, 2 Apr 2020 18:35:01 +0200
Message-ID: <20200402163501.GC9751@quack2.suse.cz>
In-Reply-To: <87sghmyd8v.fsf@notabene.neil.brown.name>

On Thu 02-04-20 10:53:20, NeilBrown wrote:
> 
> PF_LESS_THROTTLE exists for loop-back nfsd, and a similar need in the
> loop block driver, where a daemon needs to write to one bdi in
> order to free up writes queued to another bdi.
> 
> The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty
> pages, so that it can still dirty pages after other processes have been
> throttled.
> 
> This approach was designed when all threads were blocked equally,
> regardless of which device they were writing to, or how fast it was.
> Since that time the writeback algorithm has changed substantially with
> different threads getting different allowances based on non-trivial
> heuristics.  This means the simple "add 25%" heuristic is no longer
> reliable.
> 
> This patch changes the heuristic to ignore the global limits and
> consider only the limit relevant to the bdi being written to.  This
> approach is already available for BDI_CAP_STRICTLIMIT users (fuse) and
> should not introduce surprises.  This has the desired result of
> protecting the task from the consequences of large amounts of dirty data
> queued for other devices.
> 
> This approach of "only consider the target bdi" is consistent with the
> other use of PF_LESS_THROTTLE in current_may_throttle(), where it causes
> attention to be focussed only on the target bdi.
> 
> So this patch
>  - renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE,
>  - removes the 25% bonus that that flag gives, and
>  - imposes 'strictlimit' handling for any process with PF_LOCAL_THROTTLE
>    set.
> 
> Note that previously realtime threads were treated the same as
> PF_LESS_THROTTLE threads.  This patch does *not* change the behaviour for
> real-time threads, so it is now different from the behaviour of nfsd and
> loop tasks.  I don't know what is wanted for realtime.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
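
To make the change described above concrete, here is a minimal user-space
sketch of the two heuristics. It is not kernel code. The arithmetic in
lifted_thresh() follows the "thresh / 4 + dirty_limit / 32" lines in the
quoted diff, and the PF_LOCAL_THROTTLE value is the one in sched.h.
TOY_CAP_STRICTLIMIT, lifted_thresh() and use_strictlimit() are hypothetical
names introduced only for illustration.

```c
#include <stdbool.h>

/* Flag value copied from the patch; everything else here is hypothetical. */
#define PF_LOCAL_THROTTLE   0x00100000u
/* Stand-in for BDI_CAP_STRICTLIMIT; the numeric value is made up. */
#define TOY_CAP_STRICTLIMIT 0x1u

/*
 * Old heuristic: lift a dirty threshold by 1/4 of itself plus 1/32 of the
 * global dirty limit. After the patch only real-time tasks get this bonus.
 */
unsigned long lifted_thresh(unsigned long thresh,
			    unsigned long global_dirty_limit)
{
	return thresh + thresh / 4 + global_dirty_limit / 32;
}

/*
 * New heuristic: strictlimit handling applies when the bdi requests it
 * or when the current task has PF_LOCAL_THROTTLE set.
 */
bool use_strictlimit(unsigned int task_flags, unsigned int bdi_caps)
{
	return (bdi_caps & TOY_CAP_STRICTLIMIT) ||
	       (task_flags & PF_LOCAL_THROTTLE);
}
```

With a threshold of 1000 pages and a global limit of 3200 pages, the old
bonus raises the threshold by 250 + 100 pages. The new rule changes no
threshold at all; it only selects which limit the task is measured against.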

This makes sense to me and the patch looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

Thanks.

								Honza

> ---
>  drivers/block/loop.c  |  2 +-
>  fs/nfsd/vfs.c         |  9 +++++----
>  include/linux/sched.h |  2 +-
>  kernel/sys.c          |  2 +-
>  mm/page-writeback.c   | 10 ++++++----
>  mm/vmscan.c           |  4 ++--
>  6 files changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 739b372a5112..2c59371ce936 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -897,7 +897,7 @@ static void loop_unprepare_queue(struct loop_device *lo)
>  
>  static int loop_kthread_worker_fn(void *worker_ptr)
>  {
> -	current->flags |= PF_LESS_THROTTLE | PF_MEMALLOC_NOIO;
> +	current->flags |= PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO;
>  	return kthread_worker_fn(worker_ptr);
>  }
>  
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 0aa02eb18bd3..c3fbab1753ec 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -979,12 +979,13 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
>  
>  	if (test_bit(RQ_LOCAL, &rqstp->rq_flags))
>  		/*
> -		 * We want less throttling in balance_dirty_pages()
> -		 * and shrink_inactive_list() so that nfs to
> +		 * We want throttling in balance_dirty_pages()
> +		 * and shrink_inactive_list() to only consider
> +		 * the backingdev we are writing to, so that nfs to
>  		 * localhost doesn't cause nfsd to lock up due to all
>  		 * the client's dirty pages or its congested queue.
>  		 */
> -		current->flags |= PF_LESS_THROTTLE;
> +		current->flags |= PF_LOCAL_THROTTLE;
>  
>  	exp = fhp->fh_export;
>  	use_wgather = (rqstp->rq_vers == 2) && EX_WGATHER(exp);
> @@ -1037,7 +1038,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
>  		nfserr = nfserrno(host_err);
>  	}
>  	if (test_bit(RQ_LOCAL, &rqstp->rq_flags))
> -		current_restore_flags(pflags, PF_LESS_THROTTLE);
> +		current_restore_flags(pflags, PF_LOCAL_THROTTLE);
>  	return nfserr;
>  }
>  
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 04278493bf15..5dcd27abc8cd 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1473,7 +1473,7 @@ extern struct pid *cad_pid;
>  #define PF_KSWAPD		0x00020000	/* I am kswapd */
>  #define PF_MEMALLOC_NOFS	0x00040000	/* All allocation requests will inherit GFP_NOFS */
>  #define PF_MEMALLOC_NOIO	0x00080000	/* All allocation requests will inherit GFP_NOIO */
> -#define PF_LESS_THROTTLE	0x00100000	/* Throttle me less: I clean memory */
> +#define PF_LOCAL_THROTTLE	0x00100000	/* Throttle me less: I clean memory */
>  #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
>  #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
>  #define PF_SWAPWRITE		0x00800000	/* Allowed to write to swap */
> diff --git a/kernel/sys.c b/kernel/sys.c
> index d325f3ab624a..180a2fa33f7f 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2262,7 +2262,7 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
>  	return -EINVAL;
>  }
>  
> -#define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LESS_THROTTLE)
> +#define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)
>  
>  SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  		unsigned long, arg4, unsigned long, arg5)
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 2caf780a42e7..2afb09fa2fe0 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -387,8 +387,7 @@ static unsigned long global_dirtyable_memory(void)
>   * Calculate @dtc->thresh and ->bg_thresh considering
>   * vm_dirty_{bytes|ratio} and dirty_background_{bytes|ratio}.  The caller
>   * must ensure that @dtc->avail is set before calling this function.  The
> - * dirty limits will be lifted by 1/4 for PF_LESS_THROTTLE (ie. nfsd) and
> - * real-time tasks.
> + * dirty limits will be lifted by 1/4 for real-time tasks.
>   */
>  static void domain_dirty_limits(struct dirty_throttle_control *dtc)
>  {
> @@ -436,7 +435,7 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
>  	if (bg_thresh >= thresh)
>  		bg_thresh = thresh / 2;
>  	tsk = current;
> -	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
> +	if (rt_task(tsk)) {
>  		bg_thresh += bg_thresh / 4 + global_wb_domain.dirty_limit / 32;
>  		thresh += thresh / 4 + global_wb_domain.dirty_limit / 32;
>  	}
> @@ -486,7 +485,7 @@ static unsigned long node_dirty_limit(struct pglist_data *pgdat)
>  	else
>  		dirty = vm_dirty_ratio * node_memory / 100;
>  
> -	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk))
> +	if (rt_task(tsk))
>  		dirty += dirty / 4;
>  
>  	return dirty;
> @@ -1580,6 +1579,9 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
>  	bool strictlimit = bdi->capabilities & BDI_CAP_STRICTLIMIT;
>  	unsigned long start_time = jiffies;
>  
> +	if (current->flags & PF_LOCAL_THROTTLE)
> +		/* This task must only be throttled by its own writeback */
> +		strictlimit = true;
>  	for (;;) {
>  		unsigned long now = jiffies;
>  		unsigned long dirty, thresh, bg_thresh;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 876370565455..c5cf25938c56 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1880,13 +1880,13 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
>  
>  /*
>   * If a kernel thread (such as nfsd for loop-back mounts) services
> - * a backing device by writing to the page cache it sets PF_LESS_THROTTLE.
> + * a backing device by writing to the page cache it sets PF_LOCAL_THROTTLE.
>   * In that case we should only throttle if the backing device it is
>   * writing to is congested.  In other cases it is safe to throttle.
>   */
>  static int current_may_throttle(void)
>  {
> -	return !(current->flags & PF_LESS_THROTTLE) ||
> +	return !(current->flags & PF_LOCAL_THROTTLE) ||
>  		current->backing_dev_info == NULL ||
>  		bdi_write_congested(current->backing_dev_info);
>  }
> -- 
> 2.26.0
> 
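
For readers following the vmscan.c hunk, the condition in
current_may_throttle() can be exercised outside the kernel with a small
model. This is only a sketch: struct toy_bdi, struct toy_task and
toy_may_throttle() are hypothetical stand-ins, and the write_congested
field replaces the kernel's bdi_write_congested() check.

```c
#include <stdbool.h>
#include <stddef.h>

#define PF_LOCAL_THROTTLE 0x00100000u	/* value taken from the patch */

/* Hypothetical stand-in for a backing device. */
struct toy_bdi {
	bool write_congested;
};

/* Hypothetical stand-in for the fields of the task that the check reads. */
struct toy_task {
	unsigned int flags;
	const struct toy_bdi *backing_dev_info;	/* bdi being written, or NULL */
};

/*
 * Same shape as current_may_throttle() in the patch: a task without
 * PF_LOCAL_THROTTLE may always be throttled; a flagged task may be
 * throttled only if it has no target bdi or that bdi is congested.
 */
bool toy_may_throttle(const struct toy_task *t)
{
	return !(t->flags & PF_LOCAL_THROTTLE) ||
	       t->backing_dev_info == NULL ||
	       t->backing_dev_info->write_congested;
}
```

A flagged task writing to an uncongested bdi is the only case in which the
model returns false, which matches the "only consider the target bdi"
description in the commit message.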


-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

