From: Mel Gorman <email@example.com>
To: Matthew Wilcox <firstname.lastname@example.org>
Cc: Linux-MM <email@example.com>, NeilBrown <firstname.lastname@example.org>,
	Theodore Ts'o <email@example.com>, Andreas Dilger <firstname.lastname@example.org>,
	"Darrick J . Wong" <email@example.com>, Michal Hocko <firstname.lastname@example.org>,
	Dave Chinner <email@example.com>, Rik van Riel <firstname.lastname@example.org>,
	Vlastimil Babka <email@example.com>, Johannes Weiner <firstname.lastname@example.org>,
	Jonathan Corbet <email@example.com>, Linux-fsdevel <firstname.lastname@example.org>,
	LKML <email@example.com>
Subject: Re: [RFC PATCH 0/5] Remove dependency on congestion_wait in mm/
Date: Mon, 20 Sep 2021 20:51:09 +0100
Message-ID: <20210920195109.GJ3959@techsingularity.net>
In-Reply-To: <YUhztA8TmplTluyQ@casper.infradead.org>

On Mon, Sep 20, 2021 at 12:42:44PM +0100, Matthew Wilcox wrote:
> On Mon, Sep 20, 2021 at 09:54:31AM +0100, Mel Gorman wrote:
> > This has been lightly tested only and the testing was useless as the
> > relevant code was not executed. The workload configurations I had that
> > used to trigger these corner cases no longer work (yey?) and I'll need
> > to implement a new synthetic workload. If someone is aware of a realistic
> > workload that forces reclaim activity to the point where reclaim stalls
> > then kindly share the details.
>
> The stereotypical "stalling on I/O" problem is to plug in one of the
> crap USB drives you were given at a trade show and simply
>
>	dd if=/dev/zero of=/dev/sdb
>	sync
>
> You can also set up qemu to have extremely slow I/O performance:
> https://serverfault.com/questions/675704/extremely-slow-qemu-storage-performance-with-qcow2-images
>

Ok, I managed to get something working and nothing blew up. The workload
was similar to what I described, except that the amount of dirty file data
is related to dirty_ratio, the memory hogs no longer sleep and I disabled
the parallel readers. There is still a configuration with the parallel
readers but I won't have those results until tomorrow.

Surprising no one, throttling on the vanilla kernel barely works:

      1 writeback_wait_iff_congested: usec_delayed=4000
      3 writeback_congestion_wait: usec_delayed=108000
    196 writeback_congestion_wait: usec_delayed=104000
  16697 writeback_wait_iff_congested: usec_delayed=0

too_many_isolated is not tracked at all, so we don't know what that looks
like, but kswapd "blocking" on dirty pages at the tail of the LRU basically
never stalls. The few congestion_wait() calls that did happen stalled for
the full duration because the bdi is not tracking congestion at all.

With the series, the breakdown of reasons to stall was

   5703 reason=VMSCAN_THROTTLE_WRITEBACK
  29644 reason=VMSCAN_THROTTLE_NOPROGRESS
1979999 reason=VMSCAN_THROTTLE_ISOLATED

kswapd stalls were rare but they did happen and, surprise surprise, it
was dirty pages:

    914 reason=VMSCAN_THROTTLE_WRITEBACK

All of them stalled for the full timeout, so there might be a bug in
patch 1 because that sounds suspicious.
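To make the suspicion concrete, here is a minimal sketch of the
waitqueue-plus-timeout pattern the series is built around. This is not
the code from patch 1; the reclaim_wait field and both function names
are assumptions made purely for illustration. The point is that if the
wake side never fires, schedule_timeout() always runs to expiry and
every traced stall equals the full timeout, which is exactly the
pattern above:

	#include <linux/wait.h>
	#include <linux/sched.h>
	#include <linux/mmzone.h>

	/* Sketch only: reclaim_wait is an assumed per-node waitqueue. */
	static void reclaim_throttle_sketch(pg_data_t *pgdat, long timeout)
	{
		long remaining;
		DEFINE_WAIT(wait);

		prepare_to_wait(&pgdat->reclaim_wait, &wait, TASK_UNINTERRUPTIBLE);
		/* Returns 0 if the full timeout expired with no wakeup. */
		remaining = schedule_timeout(timeout);
		finish_wait(&pgdat->reclaim_wait, &wait);

		/* A tracepoint would record timeout - remaining as the stall. */
	}

	/* Wake side, expected to run as writeback completes pages. */
	static void reclaim_throttle_wake_sketch(pg_data_t *pgdat)
	{
		if (waitqueue_active(&pgdat->reclaim_wait))
			wake_up(&pgdat->reclaim_wait);
	}

If every stall for a given reason hits the full timeout, the first thing
to check is whether the wake side is reachable at all for that reason.

For reference, stall-time histograms like the one below can be produced
from captured trace output with something like the following pipeline
(assuming the tracepoint output has been saved to a file called "trace"):

	grep -o 'usect_delayed=[0-9]*' trace | sort | uniq -c | sort -n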
As "too many pages isolated" was the top reason for stalling, the
frequency of each stall time was as follows:

      1 usect_delayed=164000
      1 usect_delayed=192000
      1 usect_delayed=200000
      1 usect_delayed=208000
      1 usect_delayed=220000
      1 usect_delayed=244000
      1 usect_delayed=308000
      1 usect_delayed=312000
      1 usect_delayed=316000
      1 usect_delayed=332000
      1 usect_delayed=588000
      1 usect_delayed=620000
      1 usect_delayed=836000
      3 usect_delayed=116000
      4 usect_delayed=124000
      4 usect_delayed=128000
      6 usect_delayed=120000
      9 usect_delayed=112000
     11 usect_delayed=100000
     13 usect_delayed=48000
     13 usect_delayed=96000
     14 usect_delayed=40000
     15 usect_delayed=88000
     15 usect_delayed=92000
     16 usect_delayed=80000
     18 usect_delayed=68000
     19 usect_delayed=76000
     22 usect_delayed=84000
     23 usect_delayed=108000
     23 usect_delayed=60000
     25 usect_delayed=44000
     25 usect_delayed=52000
     29 usect_delayed=36000
     30 usect_delayed=56000
     30 usect_delayed=64000
     33 usect_delayed=72000
     57 usect_delayed=32000
     91 usect_delayed=20000
    107 usect_delayed=24000
    125 usect_delayed=28000
    131 usect_delayed=16000
    180 usect_delayed=12000
    186 usect_delayed=8000
   1379 usect_delayed=104000
  16493 usect_delayed=4000
1960837 usect_delayed=0

In other words, the vast majority of stalls were for 0 time and the task
was woken again immediately. The next most common stall time was one tick
(usect_delayed=4000), but a sizable number of stalls reached the full
timeout. Everything else falls somewhere in between, so the event trigger
appears to be ok. I don't know how the application itself performed as I
still have to write the analysis script. Assuming I can look at this
tomorrow, I'll probably start with why VMSCAN_THROTTLE_WRITEBACK always
stalled for the full timeout.

-- 
Mel Gorman
SUSE Labs