From: Hillf Danton <hdanton@sina.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	Mel Gorman <mgorman@suse.de>
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Congestion
Date: Thu, 13 May 2021 15:44:09 +0800
Message-ID: <20210513074409.3181-1-hdanton@sina.com>
In-Reply-To: <20200106232100.GL23195@dread.disaster.area>

On Tue, 7 Jan 2020 10:21:00 Dave Chinner wrote:
>On Mon, Jan 06, 2020 at 12:55:14PM +0100, Michal Hocko wrote:
>> On Tue 31-12-19 04:59:08, Matthew Wilcox wrote:
>> > 
>> > I don't want to present this topic; I merely noticed the problem.
>> > I nominate Jens Axboe and Michal Hocko as session leaders.  See the
>> > thread here:
>> 
>> Thanks for bringing this up Matthew! The change in the behavior came as
>> a surprise to me. I can lead the session for the MM side.
>> 
>> > https://lore.kernel.org/linux-mm/20190923111900.GH15392@bombadil.infradead.org/
>> > 
>> > Summary: Congestion is broken and has been for years, and everybody's
>> > system is sleeping waiting for congestion that will never clear.
>> > 
>> > A good outcome for this meeting would be:
>> > 
>> >  - MM defines what information they want from the block stack.
>> 
>> The history of congestion waiting is kinda hairy but I will try to
>> summarize the expectations we used to have, and we can discuss how much
>> of that has been real and what has merely lived on as cargo cult. Maybe
>> we will just find out that we do not need functionality like that
>> anymore. I believe Mel would be a great contributor to the discussion.
>
>We most definitely do need some form of reclaim throttling based on
>IO congestion, because it is trivial to drive the system into swap
>storms and OOM killer invocation when there are large dirty slab
>caches that require IO to make reclaim progress and there's little
>in the way of page cache to reclaim.
>
>This is one of the biggest issues I've come across trying to make
>XFS inode reclaim non-blocking - the existing code blocks on inode
>writeback IO congestion to throttle the overall reclaim rate and
>so prevents swap storms and OOM killer rampages from occurring.
>
>The moment I remove the inode writeback blocking from the reclaim
>path and move the backoffs to the core reclaim congestion backoff
>algorithms, I see a substantial increase in the typical reclaim scan
>priority. This is because the reclaim code does not have an
>integrated back-off mechanism that can balance reclaim throttling
>between slab cache and page cache reclaim. This results in
>insufficient page reclaim backoff under slab cache backoff
>conditions, leading to excessive page cache reclaim and swapping out
>all the anonymous pages in memory. Then performance goes to hell as
>userspace starts to block on page faults, swap thrashing like this:
>
>page_fault
>  swap_in
>    alloc page
>      direct reclaim
>        swap out anon page
>          submit_bio
>            wbt_throttle
>
>
>IOWs, page reclaim doesn't back off until userspace gets throttled
>in the block layer doing swap out during swap in during page
>faults. For these sorts of workloads there should be little to no
>swap thrashing occurring - throttling reclaim to the rate at which
>inodes are cleaned by async IO dispatcher threads is what is needed
>here, not continuing to wind up the reclaim priority until swap storms
>and the OOM killer end up killing the machine...
>
>I also see this when the inode cache load is on a separate device to
>the swap partition - both devices end up at 100% utilisation, one
>doing inode writeback flat out (about 300,000 inodes/sec from an
>inode cache of 5-10 million inodes), the other is swap thrashing
>from a page cache of only 250-500 pages in size.

Would a watermark of clean inodes in the inode cache help, say 3% of
the cache size? A laundry thread could kick off once clean inodes drop
below it, preferably independent of dirty page writeback and kswapd, to
ease the load on direct reclaimers.
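
Something like the rough sketch below is what I have in mind. It is only
an illustration: inode_cache_stats and its fields do not exist in the
tree, the names are made up, and wiring this into the real inode LRU
accounting is the actual work.

/*
 * Rough sketch only: inode_cache_stats, nr_total, nr_clean and
 * laundry_work are hypothetical names, invented for illustration.
 */
#include <linux/workqueue.h>

#define CLEAN_INODE_WMARK_PCT	3	/* keep ~3% of the cache clean */

struct inode_cache_stats {
	unsigned long nr_total;			/* inodes in the cache */
	unsigned long nr_clean;			/* clean, reclaimable now */
	struct work_struct laundry_work;	/* writes back dirty inodes */
};

/* true once clean inodes drop below the watermark */
static bool clean_inodes_below_wmark(const struct inode_cache_stats *s)
{
	return s->nr_clean * 100 < s->nr_total * CLEAN_INODE_WMARK_PCT;
}

/*
 * Called from the reclaim/shrinker path: kick the laundry work item
 * instead of blocking the direct reclaimer on inode writeback.
 */
static void maybe_kick_laundry(struct inode_cache_stats *s)
{
	if (clean_inodes_below_wmark(s))
		queue_work(system_unbound_wq, &s->laundry_work);
}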

Hillf
>
>Hence the way congestion was historically dealt with as a "global
>condition" still needs to exist in some manner - congestion on a
>single device is sufficient to cause the high level reclaim
>algorithms to misbehave badly...
>
>Hence it seems to me that having IO load feedback to the memory
>reclaim algorithms is most definitely required for memory reclaim to
>be able to make the correct decisions about what to reclaim. If the
>shrinker for the cache that uses 50% of RAM in the machine is saying
>"backoff needed" and it's underlying device is
>congested and limiting object reclaim rates, then it's a pretty good
>indication that reclaim should back off and wait for IO progress to
>be made instead of trying to reclaim from other LRUs that hold an
>insignificant amount of memory compared to the huge cache that is
>backed up waiting on IO completion to make progress....
>
>Cheers,
>
>Dave.
>-- 
>Dave Chinner
>david@fromorbit.com

