From: Jan Kara <jack@suse.cz>
To: Michal Hocko <mhocko@kernel.org>
Cc: NeilBrown <neilb@suse.de>,
Trond Myklebust <trondmy@hammerspace.com>,
"Anna.Schumaker@Netapp.com" <Anna.Schumaker@netapp.com>,
Andrew Morton <akpm@linux-foundation.org>,
Jan Kara <jack@suse.cz>,
linux-mm@kvack.org, linux-nfs@vger.kernel.org,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE
Date: Mon, 6 Apr 2020 11:36:01 +0200 [thread overview]
Message-ID: <20200406093601.GA1143@quack2.suse.cz> (raw)
In-Reply-To: <20200406074453.GH19426@dhcp22.suse.cz>
On Mon 06-04-20 09:44:53, Michal Hocko wrote:
> On Sat 04-04-20 08:40:17, Neil Brown wrote:
> > On Fri, Apr 03 2020, Michal Hocko wrote:
> >
> > > On Thu 02-04-20 10:53:20, Neil Brown wrote:
> > >>
> > >> PF_LESS_THROTTLE exists for loop-back nfsd, and a similar need in the
> > >> loop block driver, where a daemon needs to write to one bdi in
> > >> order to free up writes queued to another bdi.
> > >>
> > >> The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty
> > >> pages, so that it can still dirty pages after other processses have been
> > >> throttled.
> > >>
> > >> This approach was designed when all threads were blocked equally,
> > >> independently on which device they were writing to, or how fast it was.
> > >> Since that time the writeback algorithm has changed substantially with
> > >> different threads getting different allowances based on non-trivial
> > >> heuristics. This means the simple "add 25%" heuristic is no longer
> > >> reliable.
> > >>
> > >> This patch changes the heuristic to ignore the global limits and
> > >> consider only the limit relevant to the bdi being written to. This
> > >> approach is already available for BDI_CAP_STRICTLIMIT users (fuse) and
> > >> should not introduce surprises. This has the desired result of
> > >> protecting the task from the consequences of large amounts of dirty data
> > >> queued for other devices.
> > >
> > > While I understand that you want to have per bdi throttling for those
> > > "special" files I am still missing how this is going to provide the
> > > additional room that the additnal 25% gave them previously. I might
> > > misremember or things have changed (what you mention as non-trivial
> > > heuristics) but PF_LESS_THROTTLE really needed that room to guarantee a
> > > forward progress. Care to expan some more on how this is handled now?
> > > Maybe we do not need it anymore but calling that out explicitly would be
> > > really helpful.
> >
> > The 25% was a means to an end, not an end in itself.
> >
> > The problem is that the NFS server needs to be able to write to the
> > backing filesystem when the dirty memory limits have been reached by
> > being totally consumed by dirty pages on the NFS filesystem.
> >
> > The 25% was just a way of giving an allowance of dirty pages to nfsd
> > that could not be consumed by processes writing to an NFS filesystem.
> > i.e. it doesn't need 25% MORE, it needs 25% PRIVATELY. Actually it only
> > really needs 1 page privately, but a few pages give better throughput
> > and 25% seemed like a good idea at the time.
>
> Yes this part is clear to me.
>
> > per-bdi throttling focuses on the "PRIVATELY" (the important bit) and
> > de-emphasises the 25% (the irrelevant detail).
>
> It is still not clear to me how this patch is going to behave when the
> global dirty throttling is essentially equal to the per-bdi - e.g. there
> is only a single bdi and now the PF_LOCAL_THROTTLE process doesn't have
> anything private.
Let me think out loud so see whether I understand this properly. There are
two BDIs involved in NFS loop mount - the NFS virtual BDI (let's call it
simply NFS-bdi) and the bdi of the real filesystem that is backing NFS
(let's call this real-bdi). The case we are concerned about is when NFS-bdi
is full of dirty pages so that global dirty limit of the machine is
exceeded. Then flusher thread will take dirty pages from NFS-bdi and send
them over localhost to nfsd. Nfsd, which has PF_LOCAL_THROTTLE set, will take
these pages and write them to real-bdi. Now because PF_LOCAL_THROTTLE is
set for nfsd, the fact that we are over global limit does not take effect
and nfsd is still able to write to real-bdi until dirty limit on real-bdi
is reached. So things should work as Neil writes AFAIU.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2020-04-06 9:36 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-26 3:25 [PATCH/RFC] MM: fix writeback for NFS NeilBrown
2020-04-01 23:52 ` Writeback fixes " NeilBrown
2020-04-01 23:53 ` [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE NeilBrown
2020-04-01 23:54 ` [PATCH 2/2] Deprecate NR_UNSTABLE_NFS, use NR_WRITEBACK NeilBrown
2020-04-02 15:10 ` Christoph Hellwig
2020-04-02 22:35 ` [PATCH 2/2 - v2] MM: Discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead NeilBrown
2020-04-03 9:42 ` Jan Kara
2020-04-03 11:03 ` Michal Hocko
2020-04-06 0:14 ` NeilBrown
2020-04-06 7:41 ` Michal Hocko
2020-04-06 23:28 ` NeilBrown
2020-04-07 7:33 ` Michal Hocko
2020-04-02 19:55 ` [PATCH 2/2] Deprecate NR_UNSTABLE_NFS, use NR_WRITEBACK Jan Kara
2020-04-02 16:35 ` [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE Jan Kara
2020-04-03 15:15 ` Michal Hocko
2020-04-03 21:40 ` NeilBrown
2020-04-06 7:44 ` Michal Hocko
2020-04-06 9:36 ` Jan Kara [this message]
2020-04-06 10:57 ` Michal Hocko
2020-04-06 11:58 ` NeilBrown
2020-04-02 4:26 ` Hillf Danton
2020-04-02 4:57 ` NeilBrown
2020-04-06 3:58 ` Hillf Danton
2020-04-06 23:42 ` Writeback fixes for NFS - V2 NeilBrown
2020-04-06 23:43 ` [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE NeilBrown
2020-04-07 16:10 ` Chuck Lever
2020-04-16 0:29 ` Writeback fixes for NFS - V3 NeilBrown
2020-04-16 0:30 ` [PATCH 1/2 V3] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE NeilBrown
2020-04-16 6:54 ` Christoph Hellwig
2020-04-16 15:19 ` Jan Kara
2020-04-21 2:22 ` NeilBrown
2020-04-22 12:46 ` Jan Kara
2020-05-13 7:16 ` NeilBrown
2020-05-13 7:17 ` [PATCH 1/2 V4] " NeilBrown
2020-05-15 11:10 ` Jan Kara
2020-06-01 0:46 ` Writeback fixes for NFS NeilBrown
2020-06-01 0:48 ` [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE NeilBrown
2020-06-01 0:49 ` [PATCH 2/2] MM: Discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead NeilBrown
2020-05-13 7:18 ` [PATCH 2/2 V4] " NeilBrown
2020-05-15 9:59 ` Jan Kara
2020-04-16 0:31 ` [PATCH 2/2 V3] " NeilBrown
2020-04-16 6:56 ` Christoph Hellwig
2020-04-16 15:24 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200406093601.GA1143@quack2.suse.cz \
--to=jack@suse.cz \
--cc=Anna.Schumaker@netapp.com \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nfs@vger.kernel.org \
--cc=mhocko@kernel.org \
--cc=neilb@suse.de \
--cc=trondmy@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).