From: Vlastimil Babka <vbabka@suse.cz>
To: Michal Hocko <mhocko@kernel.org>, Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Reza Arbab <arbab@linux.vnet.ibm.com>,
	Yasuaki Ishimatsu <yasu.isimatu@gmail.com>,
	qiuxishi@huawei.com, Igor Mammedov <imammedo@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
Date: Wed, 11 Oct 2017 10:04:39 +0200
Message-ID: <d29b6788-da1b-23e9-090c-d43428deb97d@suse.cz>
In-Reply-To: <20171011065123.e7jvoftmtso3vcha@dhcp22.suse.cz>

On 10/11/2017 08:51 AM, Michal Hocko wrote:
> On Wed 11-10-17 13:37:50, Michael Ellerman wrote:
>> Michal Hocko <mhocko@kernel.org> writes:
>>
>>> On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
>>>> Michal Hocko <mhocko@kernel.org> writes:
>>>>
>>>>> From: Michal Hocko <mhocko@suse.com>
>>>>>
>>>>> Memory offlining can fail just too eagerly under a heavy memory pressure.
>>>>>
>>>>> [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
>>>>> [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
>>>>> [ 5410.336811] page dumped because: isolation failed
>>>>> [ 5410.336813] page->mem_cgroup:ffff8801cd662000
>>>>> [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
>>>>>
>>>>> Isolation has failed here because the page is not on LRU. Most probably
>>>>> because it was on the pcp LRU cache or it has been removed from the LRU
>>>>> already but it hasn't been freed yet. In both cases the page doesn't look
>>>>> non-migrable so retrying more makes sense.
>>>>
>>>> This breaks offline for me.
>>>>
>>>> Prior to this commit:
>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>>   -bash: echo: write error: Device or resource busy

Well, that means offline didn't actually work for that block even before
this patch, right? Is it even a movable_node block? I guess not?

>>>>   real 0m0.001s
>>>>   user 0m0.000s
>>>>   sys 0m0.001s
>>>>
>>>> After:
>>>>   /sys/devices/system/memory/memory0# time echo 0 > online
>>>>   -bash: echo: write error: Device or resource busy
>>>>
>>>>   real 2m0.009s
>>>>   user 0m0.000s
>>>>   sys 1m25.035s
>>>>
>>>>
>>>> There's no way that block can be removed, it contains the kernel text,
>>>> so it should instantly fail - which it used to.

Ah, right. So your complaint is really that the failure is no longer
instant for blocks that can't be offlined.

>>> OK, that means that start_isolate_page_range should have failed but it
>>> hasn't for some reason. I strongly suspect has_unmovable_pages is doing
>>> something wrong. Is the kernel text marked somehow? E.g. PageReserved?
>>
>> I'm not sure how the text is marked, will have to dig into that.
>>
>>> In other words, does the diff below helps?
>>
>> No that doesn't help.
>
> This is really strange! As you write in other email the page is
> reserved. That means that some of the earlier checks
> 	if (zone_idx(zone) == ZONE_MOVABLE)
> 		return false;
> 	mt = get_pageblock_migratetype(page);
> 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))

The MIGRATE_MOVABLE check is indeed bogus, because a movable migratetype
doesn't guarantee there are no unmovable pages in the block (a CMA block,
OTOH, should be a guarantee).

> 		return false;
> has bailed out early. I would be quite surprised if the kernel text was
> sitting in the zone movable. The migrate type check is more fishy
> AFAICS. I can imagine that the kernel text can share the movable or CMA
> mt block. I am not really familiar with this function but it looks
> suspicious. So does it help to remove this check?
> ---
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3badcedf96a7..5b4d85ae445c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  	 */
>  	if (zone_idx(zone) == ZONE_MOVABLE)
>  		return false;
> -	mt = get_pageblock_migratetype(page);
> -	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> -		return false;
>
>  	pfn = page_to_pfn(page);
>  	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
> @@ -7368,6 +7365,9 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>
>  		page = pfn_to_page(check);
>
> +		if (PageReserved(page))
> +			return true;
> +
>  		/*
>  		 * Hugepages are not in LRU lists, but they're movable.
>  		 * We need not scan over tail pages bacause we don't
>
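
To illustrate why the early migratetype return can hide a reserved page,
here is a minimal userspace sketch. It is not the kernel code: the types
model_page and model_block, the MT_* constants and both functions are
hypothetical stand-ins, and the real has_unmovable_pages() also handles
free pages, hugepages and the "count" argument, all of which are left out
here. The only assumption carried over from the thread is that a page
holding kernel text is reserved and not on the LRU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for kernel state (not real kernel types). */
enum model_mt { MT_UNMOVABLE, MT_MOVABLE, MT_CMA };

struct model_page {
	bool reserved;	/* models PageReserved() */
	bool on_lru;	/* models PageLRU() */
};

struct model_block {
	enum model_mt mt;		/* models the pageblock migratetype */
	bool in_zone_movable;		/* models zone_idx(zone) == ZONE_MOVABLE */
	const struct model_page *pages;
	size_t nr_pages;
};

/* Models the check before the diff: the migratetype is trusted blindly. */
static bool has_unmovable_old(const struct model_block *b)
{
	if (b->in_zone_movable)
		return false;
	if (b->mt == MT_MOVABLE || b->mt == MT_CMA)
		return false;	/* bails out before looking at any page */

	for (size_t i = 0; i < b->nr_pages; i++)
		if (!b->pages[i].on_lru)
			return true;
	return false;
}

/* Models the check after the diff: no migratetype shortcut, reserved pages fail. */
static bool has_unmovable_new(const struct model_block *b)
{
	if (b->in_zone_movable)
		return false;

	for (size_t i = 0; i < b->nr_pages; i++) {
		if (b->pages[i].reserved)
			return true;
		if (!b->pages[i].on_lru)
			return true;
	}
	return false;
}

/* A block marked movable that still holds one reserved page (e.g. kernel text). */
static void model_demo(void)
{
	const struct model_page pages[] = {
		{ .reserved = true,  .on_lru = false },
		{ .reserved = false, .on_lru = true  },
	};
	const struct model_block blk = {
		.mt = MT_MOVABLE, .in_zone_movable = false,
		.pages = pages, .nr_pages = 2,
	};

	assert(!has_unmovable_old(&blk));	/* old check misses the reserved page */
	assert(has_unmovable_new(&blk));	/* new check reports it */
	printf("old=%d new=%d\n", has_unmovable_old(&blk), has_unmovable_new(&blk));
}
```

Calling model_demo() from a small main() exercises both variants on the
same block. In this model the old variant reports the block as offlinable,
which would match the symptom above where isolation does not fail up front
and offlining keeps retrying instead of failing instantly.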