All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: NeilBrown <neilb@suse.de>
Cc: Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, linux-nfs@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 1/2 V4] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE
Date: Fri, 15 May 2020 13:10:43 +0200	[thread overview]
Message-ID: <20200515111043.GK9569@quack2.suse.cz> (raw)
In-Reply-To: <87y2pw9udb.fsf@notabene.neil.brown.name>

On Wed 13-05-20 17:17:20, NeilBrown wrote:
> 
> PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the
> loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where a
> daemon needs to write to one bdi (the final bdi) in order to free up
> writes queued to another bdi (the client bdi).
> 
> The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty
> pages, so that it can still dirty pages after other processses have been
> throttled.  The purpose of this is to avoid deadlock that happen when
> the PF_LESS_THROTTLE process must write for any dirty pages to be freed,
> but it is being thottled and cannot write.
> 
> This approach was designed when all threads were blocked equally,
> independently on which device they were writing to, or how fast it was.
> Since that time the writeback algorithm has changed substantially with
> different threads getting different allowances based on non-trivial
> heuristics.  This means the simple "add 25%" heuristic is no longer
> reliable.
> 
> The important issue is not that the daemon needs a *larger* dirty page
> allowance, but that it needs a *private* dirty page allowance, so that
> dirty pages for the "client" bdi that it is helping to clear (the bdi for
> an NFS filesystem or loop block device etc) do not affect the throttling
> of the deamon writing to the "final" bdi.
> 
> This patch changes the heuristic so that the task is not throttled when
> the bdi it is writing to has a dirty page count below below (or equal
> to) the free-run threshold for that bdi.  This ensures it will always be
> able to have some pages in flight, and so will not deadlock.
> 
> In a steady-state, it is expected that PF_LOCAL_THROTTLE tasks might
> still be throttled by global threshold, but that is acceptable as it is
> only the deadlock state that is interesting for this flag.
> 
> This approach of "only throttle when target bdi is busy" is consistent
> with the other use of PF_LESS_THROTTLE in current_may_throttle(), were
> it causes attention to be focussed only on the target bdi.
> 
> So this patch
>  - renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE,
>  - removes the 25% bonus that that flag gives, and
>  - If PF_LOCAL_THROTTLE is set, don't delay at all unless the
>    global and the local free-run thresholds are exceeded.
> 
> Note that previously realtime threads were treated the same as
> PF_LESS_THROTTLE threads.  This patch does *not* change the behvaiour for
> real-time threads, so it is now different from the behaviour of nfsd and
> loop tasks.  I don't know what is wanted for realtime.
> 
> Acked-by: Chuck Lever <chuck.lever@oracle.com> (for nfsd change)
> Signed-off-by: NeilBrown <neilb@suse.de>

The idea looks good to me. It will still have the "threads may hit hard
wall" behavior when BDI freerun threshold is crossed at the moment we are
very close to the global limit but that should be rare. So I think for now
this good enough.

The patch looks good to me so feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

Throttling patches usually go through mm tree so ask Andrew to pick them
up.


								Honza

> @@ -1653,8 +1652,12 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
>  		if (dirty <= dirty_freerun_ceiling(thresh, bg_thresh) &&
>  		    (!mdtc ||
>  		     m_dirty <= dirty_freerun_ceiling(m_thresh, m_bg_thresh))) {
> -			unsigned long intv = dirty_poll_interval(dirty, thresh);
> -			unsigned long m_intv = ULONG_MAX;
> +			unsigned long intv;
> +			unsigned long m_intv;
> +
> +		free_running:
> +			intv = dirty_poll_interval(dirty, thresh);
> +			m_intv = ULONG_MAX;
>  
>  			current->dirty_paused_when = now;
>  			current->nr_dirtied = 0;
> @@ -1673,9 +1676,20 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
>  		 * Calculate global domain's pos_ratio and select the
>  		 * global dtc by default.
>  		 */
> -		if (!strictlimit)
> +		if (!strictlimit) {
>  			wb_dirty_limits(gdtc);
>  
> +			if ((current->flags & PF_LOCAL_THROTTLE) &&
> +			    gdtc->wb_dirty <
> +			    dirty_freerun_ceiling(gdtc->wb_thresh,
> +						  gdtc->wb_bg_thresh))
> +				/*
> +				 * LOCAL_THROTTLE tasks must not be throttled
> +				 * when below the per-wb freerun ceiling.
> +				 */
> +				goto free_running;
> +		}
> +
>  		dirty_exceeded = (gdtc->wb_dirty > gdtc->wb_thresh) &&
>  			((gdtc->dirty > gdtc->thresh) || strictlimit);
>  
> @@ -1689,9 +1703,20 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
>  			 * both global and memcg domains.  Choose the one
>  			 * w/ lower pos_ratio.
>  			 */
> -			if (!strictlimit)
> +			if (!strictlimit) {
>  				wb_dirty_limits(mdtc);
>  
> +				if ((current->flags & PF_LOCAL_THROTTLE) &&
> +				    mdtc->wb_dirty <
> +				    dirty_freerun_ceiling(mdtc->wb_thresh,
> +							  mdtc->wb_bg_thresh))
> +					/*
> +					 * LOCAL_THROTTLE tasks must not be
> +					 * throttled when below the per-wb
> +					 * freerun ceiling.
> +					 */
> +					goto free_running;
> +			}
>  			dirty_exceeded |= (mdtc->wb_dirty > mdtc->wb_thresh) &&
>  				((mdtc->dirty > mdtc->thresh) || strictlimit);
>  
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a37c87b5aee2..b2f5deb3603c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1878,13 +1878,13 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
>  
>  /*
>   * If a kernel thread (such as nfsd for loop-back mounts) services
> - * a backing device by writing to the page cache it sets PF_LESS_THROTTLE.
> + * a backing device by writing to the page cache it sets PF_LOCAL_THROTTLE.
>   * In that case we should only throttle if the backing device it is
>   * writing to is congested.  In other cases it is safe to throttle.
>   */
>  static int current_may_throttle(void)
>  {
> -	return !(current->flags & PF_LESS_THROTTLE) ||
> +	return !(current->flags & PF_LOCAL_THROTTLE) ||
>  		current->backing_dev_info == NULL ||
>  		bdi_write_congested(current->backing_dev_info);
>  }
> -- 
> 2.26.2
> 


-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2020-05-15 11:10 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-26  3:25 [PATCH/RFC] MM: fix writeback for NFS NeilBrown
2020-04-01 23:52 ` Writeback fixes " NeilBrown
2020-04-01 23:53   ` [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE NeilBrown
2020-04-01 23:54     ` [PATCH 2/2] Deprecate NR_UNSTABLE_NFS, use NR_WRITEBACK NeilBrown
2020-04-02 15:10       ` Christoph Hellwig
2020-04-02 22:35         ` [PATCH 2/2 - v2] MM: Discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead NeilBrown
2020-04-03  9:42           ` Jan Kara
2020-04-03 11:03             ` Michal Hocko
2020-04-06  0:14               ` NeilBrown
2020-04-06  7:41                 ` Michal Hocko
2020-04-06 23:28             ` NeilBrown
2020-04-07  7:33               ` Michal Hocko
2020-04-02 19:55       ` [PATCH 2/2] Deprecate NR_UNSTABLE_NFS, use NR_WRITEBACK Jan Kara
2020-04-02 16:35     ` [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE Jan Kara
2020-04-03 15:15     ` Michal Hocko
2020-04-03 21:40       ` NeilBrown
2020-04-06  7:44         ` Michal Hocko
2020-04-06  9:36           ` Jan Kara
2020-04-06 10:57             ` Michal Hocko
2020-04-06 11:58             ` NeilBrown
2020-04-02  4:26   ` Hillf Danton
2020-04-02  4:57     ` NeilBrown
2020-04-06  3:58     ` Hillf Danton
2020-04-06 23:42   ` Writeback fixes for NFS - V2 NeilBrown
2020-04-06 23:42     ` NeilBrown
2020-04-06 23:43     ` [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE NeilBrown
2020-04-06 23:43       ` NeilBrown
2020-04-07 16:10       ` Chuck Lever
2020-04-16  0:29     ` Writeback fixes for NFS - V3 NeilBrown
2020-04-16  0:30       ` [PATCH 1/2 V3] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE NeilBrown
2020-04-16  6:54         ` Christoph Hellwig
2020-04-16 15:19         ` Jan Kara
2020-04-21  2:22           ` NeilBrown
2020-04-22 12:46             ` Jan Kara
2020-05-13  7:16               ` NeilBrown
2020-05-13  7:17                 ` [PATCH 1/2 V4] " NeilBrown
2020-05-15 11:10                   ` Jan Kara [this message]
2020-06-01  0:46                     ` Writeback fixes for NFS NeilBrown
2020-06-01  0:48                       ` [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE NeilBrown
2020-06-01  0:49                       ` [PATCH 2/2] MM: Discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead NeilBrown
2020-05-13  7:18                 ` [PATCH 2/2 V4] " NeilBrown
2020-05-15  9:59                   ` Jan Kara
2020-04-16  0:31       ` [PATCH 2/2 V3] " NeilBrown
2020-04-16  6:56         ` Christoph Hellwig
2020-04-16 15:24         ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200515111043.GK9569@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=hch@lst.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.