All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Michal Hocko <mhocko@kernel.org>
Cc: NeilBrown <neilb@suse.de>,
	Trond Myklebust <trondmy@hammerspace.com>,
	"Anna.Schumaker@Netapp.com" <Anna.Schumaker@netapp.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.cz>,
	linux-mm@kvack.org, linux-nfs@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE
Date: Mon, 6 Apr 2020 11:36:01 +0200	[thread overview]
Message-ID: <20200406093601.GA1143@quack2.suse.cz> (raw)
In-Reply-To: <20200406074453.GH19426@dhcp22.suse.cz>

On Mon 06-04-20 09:44:53, Michal Hocko wrote:
> On Sat 04-04-20 08:40:17, Neil Brown wrote:
> > On Fri, Apr 03 2020, Michal Hocko wrote:
> > 
> > > On Thu 02-04-20 10:53:20, Neil Brown wrote:
> > >> 
> > >> PF_LESS_THROTTLE exists for loop-back nfsd, and a similar need in the
> > >> loop block driver, where a daemon needs to write to one bdi in
> > >> order to free up writes queued to another bdi.
> > >> 
> > >> The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty
> > >> pages, so that it can still dirty pages after other processses have been
> > >> throttled.
> > >> 
> > >> This approach was designed when all threads were blocked equally,
> > >> independently on which device they were writing to, or how fast it was.
> > >> Since that time the writeback algorithm has changed substantially with
> > >> different threads getting different allowances based on non-trivial
> > >> heuristics.  This means the simple "add 25%" heuristic is no longer
> > >> reliable.
> > >> 
> > >> This patch changes the heuristic to ignore the global limits and
> > >> consider only the limit relevant to the bdi being written to.  This
> > >> approach is already available for BDI_CAP_STRICTLIMIT users (fuse) and
> > >> should not introduce surprises.  This has the desired result of
> > >> protecting the task from the consequences of large amounts of dirty data
> > >> queued for other devices.
> > >
> > > While I understand that you want to have per bdi throttling for those
> > > "special" files I am still missing how this is going to provide the
> > > additional room that the additnal 25% gave them previously. I might
> > > misremember or things have changed (what you mention as non-trivial
> > > heuristics) but PF_LESS_THROTTLE really needed that room to guarantee a
> > > forward progress. Care to expan some more on how this is handled now?
> > > Maybe we do not need it anymore but calling that out explicitly would be
> > > really helpful.
> > 
> > The 25% was a means to an end, not an end in itself.
> > 
> > The problem is that the NFS server needs to be able to write to the
> > backing filesystem when the dirty memory limits have been reached by
> > being totally consumed by dirty pages on the NFS filesystem.
> > 
> > The 25% was just a way of giving an allowance of dirty pages to nfsd
> > that could not be consumed by processes writing to an NFS filesystem.
> > i.e. it doesn't need 25% MORE, it needs 25% PRIVATELY.  Actually it only
> > really needs 1 page privately, but a few pages give better throughput
> > and 25% seemed like a good idea at the time.
> 
> Yes this part is clear to me.
>  
> > per-bdi throttling focuses on the "PRIVATELY" (the important bit) and
> > de-emphasises the 25% (the irrelevant detail).
> 
> It is still not clear to me how this patch is going to behave when the
> global dirty throttling is essentially equal to the per-bdi - e.g. there
> is only a single bdi and now the PF_LOCAL_THROTTLE process doesn't have
> anything private.

Let me think out loud so see whether I understand this properly. There are
two BDIs involved in NFS loop mount - the NFS virtual BDI (let's call it
simply NFS-bdi) and the bdi of the real filesystem that is backing NFS
(let's call this real-bdi). The case we are concerned about is when NFS-bdi
is full of dirty pages so that global dirty limit of the machine is
exceeded. Then flusher thread will take dirty pages from NFS-bdi and send
them over localhost to nfsd. Nfsd, which has PF_LOCAL_THROTTLE set, will take
these pages and write them to real-bdi. Now because PF_LOCAL_THROTTLE is
set for nfsd, the fact that we are over global limit does not take effect
and nfsd is still able to write to real-bdi until dirty limit on real-bdi
is reached. So things should work as Neil writes AFAIU.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2020-04-06  9:36 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-26  3:25 [PATCH/RFC] MM: fix writeback for NFS NeilBrown
2020-04-01 23:52 ` Writeback fixes " NeilBrown
2020-04-01 23:53   ` [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE NeilBrown
2020-04-01 23:54     ` [PATCH 2/2] Deprecate NR_UNSTABLE_NFS, use NR_WRITEBACK NeilBrown
2020-04-02 15:10       ` Christoph Hellwig
2020-04-02 22:35         ` [PATCH 2/2 - v2] MM: Discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead NeilBrown
2020-04-03  9:42           ` Jan Kara
2020-04-03 11:03             ` Michal Hocko
2020-04-06  0:14               ` NeilBrown
2020-04-06  7:41                 ` Michal Hocko
2020-04-06 23:28             ` NeilBrown
2020-04-07  7:33               ` Michal Hocko
2020-04-02 19:55       ` [PATCH 2/2] Deprecate NR_UNSTABLE_NFS, use NR_WRITEBACK Jan Kara
2020-04-02 16:35     ` [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE Jan Kara
2020-04-03 15:15     ` Michal Hocko
2020-04-03 21:40       ` NeilBrown
2020-04-06  7:44         ` Michal Hocko
2020-04-06  9:36           ` Jan Kara [this message]
2020-04-06 10:57             ` Michal Hocko
2020-04-06 11:58             ` NeilBrown
2020-04-02  4:26   ` Hillf Danton
2020-04-02  4:57     ` NeilBrown
2020-04-06  3:58     ` Hillf Danton
2020-04-06 23:42   ` Writeback fixes for NFS - V2 NeilBrown
2020-04-06 23:42     ` NeilBrown
2020-04-06 23:43     ` [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE NeilBrown
2020-04-06 23:43       ` NeilBrown
2020-04-07 16:10       ` Chuck Lever
2020-04-16  0:29     ` Writeback fixes for NFS - V3 NeilBrown
2020-04-16  0:30       ` [PATCH 1/2 V3] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE NeilBrown
2020-04-16  6:54         ` Christoph Hellwig
2020-04-16 15:19         ` Jan Kara
2020-04-21  2:22           ` NeilBrown
2020-04-22 12:46             ` Jan Kara
2020-05-13  7:16               ` NeilBrown
2020-05-13  7:17                 ` [PATCH 1/2 V4] " NeilBrown
2020-05-15 11:10                   ` Jan Kara
2020-06-01  0:46                     ` Writeback fixes for NFS NeilBrown
2020-06-01  0:48                       ` [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE NeilBrown
2020-06-01  0:49                       ` [PATCH 2/2] MM: Discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead NeilBrown
2020-05-13  7:18                 ` [PATCH 2/2 V4] " NeilBrown
2020-05-15  9:59                   ` Jan Kara
2020-04-16  0:31       ` [PATCH 2/2 V3] " NeilBrown
2020-04-16  6:56         ` Christoph Hellwig
2020-04-16 15:24         ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200406093601.GA1143@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=Anna.Schumaker@netapp.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=neilb@suse.de \
    --cc=trondmy@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.