linux-mm.kvack.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Michal Hocko <mhocko@kernel.org>
Cc: NeilBrown <neilb@suse.de>,
	Trond Myklebust <trondmy@hammerspace.com>,
	"Anna.Schumaker@Netapp.com" <Anna.Schumaker@netapp.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.cz>,
	linux-mm@kvack.org, linux-nfs@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE
Date: Mon, 6 Apr 2020 11:36:01 +0200
Message-ID: <20200406093601.GA1143@quack2.suse.cz>
In-Reply-To: <20200406074453.GH19426@dhcp22.suse.cz>

On Mon 06-04-20 09:44:53, Michal Hocko wrote:
> On Sat 04-04-20 08:40:17, Neil Brown wrote:
> > On Fri, Apr 03 2020, Michal Hocko wrote:
> > 
> > > On Thu 02-04-20 10:53:20, Neil Brown wrote:
> > >> 
> > >> PF_LESS_THROTTLE exists for loop-back nfsd, and a similar need in the
> > >> loop block driver, where a daemon needs to write to one bdi in
> > >> order to free up writes queued to another bdi.
> > >> 
> > >> The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty
> > >> pages, so that it can still dirty pages after other processes have been
> > >> throttled.
> > >> 
> > >> This approach was designed when all threads were blocked equally,
> > >> independently of which device they were writing to, or how fast it was.
> > >> Since that time the writeback algorithm has changed substantially with
> > >> different threads getting different allowances based on non-trivial
> > >> heuristics.  This means the simple "add 25%" heuristic is no longer
> > >> reliable.
> > >> 
> > >> This patch changes the heuristic to ignore the global limits and
> > >> consider only the limit relevant to the bdi being written to.  This
> > >> approach is already available for BDI_CAP_STRICTLIMIT users (fuse) and
> > >> should not introduce surprises.  This has the desired result of
> > >> protecting the task from the consequences of large amounts of dirty data
> > >> queued for other devices.
> > >
> > > While I understand that you want to have per bdi throttling for those
> > > "special" files I am still missing how this is going to provide the
> > > additional room that the additional 25% gave them previously. I might
> > > misremember or things have changed (what you mention as non-trivial
> > > heuristics) but PF_LESS_THROTTLE really needed that room to guarantee a
> > > forward progress. Care to expand some more on how this is handled now?
> > > Maybe we do not need it anymore but calling that out explicitly would be
> > > really helpful.
> > 
> > The 25% was a means to an end, not an end in itself.
> > 
> > The problem is that the NFS server needs to be able to write to the
> > backing filesystem when the dirty memory limits have been reached by
> > being totally consumed by dirty pages on the NFS filesystem.
> > 
> > The 25% was just a way of giving an allowance of dirty pages to nfsd
> > that could not be consumed by processes writing to an NFS filesystem.
> > i.e. it doesn't need 25% MORE, it needs 25% PRIVATELY.  Actually it only
> > really needs 1 page privately, but a few pages give better throughput
> > and 25% seemed like a good idea at the time.
> 
> Yes this part is clear to me.
>  
> > per-bdi throttling focuses on the "PRIVATELY" (the important bit) and
> > de-emphasises the 25% (the irrelevant detail).
> 
> It is still not clear to me how this patch is going to behave when the
> global dirty throttling is essentially equal to the per-bdi - e.g. there
> is only a single bdi and now the PF_LOCAL_THROTTLE process doesn't have
> anything private.

Let me think out loud to see whether I understand this properly. There are
two BDIs involved in an NFS loop mount - the NFS virtual BDI (let's call it
simply NFS-bdi) and the bdi of the real filesystem that is backing NFS
(let's call this real-bdi). The case we are concerned about is when NFS-bdi
is full of dirty pages so that the global dirty limit of the machine is
exceeded. Then the flusher thread will take dirty pages from NFS-bdi and send
them over localhost to nfsd. Nfsd, which has PF_LOCAL_THROTTLE set, will take
these pages and write them to real-bdi. Now because PF_LOCAL_THROTTLE is
set for nfsd, the fact that we are over the global limit does not take effect
and nfsd is still able to write to real-bdi until the dirty limit on real-bdi
is reached. So things should work as Neil writes AFAIU.
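
To make the reasoning above concrete, here is a rough user-space sketch of
the throttling decision as I understand it. It is only a toy model and not
the code from the patch: struct bdi_model, struct task_model,
must_throttle() and example_loopback() are hypothetical names invented for
illustration, and the real logic in mm/page-writeback.c is considerably more
involved (it uses rate limits and position ratios, not a simple yes/no).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one backing device (bdi). */
struct bdi_model {
	unsigned long dirty;	/* dirty pages currently queued for this bdi */
	unsigned long limit;	/* dirty limit that applies to this bdi alone */
};

/* Hypothetical stand-in for a task; the flag models PF_LOCAL_THROTTLE. */
struct task_model {
	bool local_throttle;
};

/*
 * Return true if the task has to wait before dirtying more pages on
 * 'target'. A task with local_throttle set looks only at the bdi it is
 * writing to; every other task is also subject to the global limit.
 */
bool must_throttle(const struct task_model *task,
		   const struct bdi_model *target,
		   unsigned long global_dirty,
		   unsigned long global_limit)
{
	bool over_bdi = target->dirty >= target->limit;

	if (task->local_throttle)
		return over_bdi;

	return over_bdi || global_dirty >= global_limit;
}

/* Walk through the loop-mount case; all numbers are made up. */
int example_loopback(void)
{
	struct bdi_model real_bdi = { .dirty = 0, .limit = 100 };
	struct task_model nfsd = { .local_throttle = true };
	struct task_model writer = { .local_throttle = false };
	/* NFS-bdi is full of dirty pages, so the global limit is exceeded. */
	unsigned long global_dirty = 1000, global_limit = 800;

	/* nfsd may still write to real-bdi ... */
	assert(!must_throttle(&nfsd, &real_bdi, global_dirty, global_limit));
	/* ... while an ordinary writer to the same bdi is throttled. */
	assert(must_throttle(&writer, &real_bdi, global_dirty, global_limit));

	/* Once real-bdi reaches its own limit, nfsd is throttled as well. */
	real_bdi.dirty = real_bdi.limit;
	assert(must_throttle(&nfsd, &real_bdi, global_dirty, global_limit));
	return 0;
}
```

In this model nfsd keeps making progress as long as real-bdi is below its
own limit, no matter how far NFS-bdi has pushed the machine over the global
limit, which is the "PRIVATELY" property Neil describes.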

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


