From: Mel Gorman <mgorman@suse.de>
To: Hugh Dickins <hughd@google.com>
Cc: Minchan Kim <minchan@kernel.org>,
    Andrew Morton <akpm@linux-foundation.org>,
    Andrea Arcangeli <aarcange@redhat.com>,
    Minchan Kim <minchan.kim@gmail.com>,
    Dave Jones <davej@redhat.com>, Jan Kara <jack@suse.cz>,
    Andy Isaacson <adi@hexapodia.org>,
    Johannes Weiner <jweiner@redhat.com>,
    Rik van Riel <riel@redhat.com>, Nai Xia <nai.xia@gmail.com>,
    Linux-MM <linux-mm@kvack.org>,
    LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 11/11] mm: Isolate pages for immediate reclaim on their own LRU
Date: Thu, 29 Dec 2011 16:59:51 +0000
Message-ID: <20111229165951.GA15729@suse.de>
In-Reply-To: <alpine.LSU.2.00.1112231039030.17640@eggly.anvils>

I was offline for several days for the holidays and I'm not back online
properly until Jan 4th, hence the delay in responding.

On Fri, Dec 23, 2011 at 11:08:19AM -0800, Hugh Dickins wrote:
> Sorry, Mel, I've had to revert this patch (and its two little children)
> from my 3.2.0-rc6-next-20111222 testing: you really do need a page flag
> (or substitute) for your "immediate" lru.
>

Don't be sorry at all. I prefer that this was caught before merging to
mainline and thanks for catching this.

> How else can a del_page_from_lru[_list]() know whether to decrement
> the count of the immediate or the inactive list?

You are right, it cannot and because pages are removed from the LRU list
in contexts such as invalidating a mapping, we cannot be sure whether a
page is on the immediate LRU or inactive_file in all cases. It is further
complicated by the fact that PageReclaim and PageReadahead use the same
page flag.

> page_lru() says to
> decrement the count of the inactive list, so in due course that wraps
> to a gigantic number, and then page reclaim livelocks trying to wring
> pages out of an empty list. It's the memcg case I've been hitting,
> but presumably the same happens with global counts.
>

I've verified that the accounting can break. I did not see it wrap
negative because in my testing it was rare the problem occurred but it
would happen eventually.

I considered a few ways of fixing this. The obvious one is to add a new
page flag but that is difficult to justify as the high-cpu-usage problem
should only occur when there is a lot of writeback to slow storage which
I believe is a rare case. It is not a suitable use for an extended page
flag.

The second was to keep these PageReclaim pages off the LRU but this leads
to complications of its own.

The third was to use a combination of flags to mark pages that are on the
immediate LRU such as how PG_compound and PG_reclaim in combination mark
tail pages. This would not be free of races and would eventually cause
corruption. There is also the problem that we cannot atomically set
multiple bits so setting the bits in contexts such as set_page_dirty()
may be problematic.

Andrew, as there is not an easy uncontroversial fix can you remove the
following patches from mmotm please?

mm-isolate-pages-for-immediate-reclaim-on-their-own-lru.patch
mm-isolate-pages-for-immediate-reclaim-on-their-own-lru-fix.patch
mm-isolate-pages-for-immediate-reclaim-on-their-own-lru-fix-2.patch

The impact is that users writing to slow storage may see higher CPU usage
as the pages under writeback have to be skipped by scanning once the
dirty pages move to the end of the LRU list. I'm assuming once they are
removed from mmotm that they also get removed from linux-next.

> There is another such accounting bug in -next, been there longer and
> not so easy to hit: I'm fairly sure it will turn out to be memcg
> misaccounting a THPage somewhere, I'll have a look around shortly.
>
> p.s. Immediate? Isn't that an odd name for a list of pages which are
> not immediately freeable? Maybe Rik's launder/laundry name would be
> better: pages which are currently being cleaned.

That is potentially very misleading as not all pages being laundered are
on that list. reclaim_writeback might be a better name.

Thanks.

-- 
Mel Gorman
SUSE Labs
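
The accounting failure discussed in this message can be illustrated with
a small userspace sketch. It is not kernel code: every type and function
name below (zone_sketch, page_sketch, guess_lru() and so on) is
hypothetical, and guess_lru() only mimics the behaviour Hugh describes
for page_lru(), which, lacking a dedicated page flag, reports the
inactive list for a page that actually sits on the immediate list. The
sketch assumes the per-list counts are unsigned long, so a decrement
applied to the wrong list wraps instead of going negative.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's definitions. */
enum lru_sketch {
    LRU_INACTIVE_FILE_SKETCH,
    LRU_IMMEDIATE_SKETCH,
    NR_LRU_SKETCH
};

struct zone_sketch {
    unsigned long nr[NR_LRU_SKETCH];    /* per-list page counts */
};

struct page_sketch {
    /* Ground truth kept only for the demo. The premise of the bug is
     * that the real page carries no flag recording this. */
    enum lru_sketch actual;
};

/* Adding knows the target list, so the right counter is incremented. */
void add_page(struct zone_sketch *z, struct page_sketch *p,
              enum lru_sketch l)
{
    assert(l < NR_LRU_SKETCH);
    p->actual = l;
    z->nr[l]++;
}

/* With no flag to consult, the list is always guessed as inactive. */
enum lru_sketch guess_lru(const struct page_sketch *p)
{
    (void)p;
    return LRU_INACTIVE_FILE_SKETCH;
}

/* Removal trusts the guess, so the wrong counter may be decremented. */
void del_page(struct zone_sketch *z, struct page_sketch *p)
{
    z->nr[guess_lru(p)]--;
}

/* One page goes onto the immediate list and is then removed. */
struct zone_sketch run_scenario(void)
{
    struct zone_sketch z = { { 0 } };
    struct page_sketch p;

    add_page(&z, &p, LRU_IMMEDIATE_SKETCH);
    del_page(&z, &p);
    return z;
}

/* Helper to display both counters. */
void print_counts(const struct zone_sketch *z)
{
    printf("inactive_file=%lu immediate=%lu\n",
           z->nr[LRU_INACTIVE_FILE_SKETCH], z->nr[LRU_IMMEDIATE_SKETCH]);
}
```

Calling run_scenario() and passing the result to print_counts() shows
the immediate count stuck at 1 while the inactive count, decremented
from zero, has wrapped to the largest unsigned long value. That is the
"gigantic number" in Hugh's report, and a reclaim loop that trusts it
would keep scanning a list that is actually empty.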