All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael Ellerman <mpe@ellerman.id.au>
To: Michal Hocko <mhocko@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Reza Arbab <arbab@linux.vnet.ibm.com>,
	Yasuaki Ishimatsu <yasu.isimatu@gmail.com>,
	qiuxishi@huawei.com, Igor Mammedov <imammedo@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
Date: Tue, 10 Oct 2017 23:05:08 +1100	[thread overview]
Message-ID: <87bmlfw6mj.fsf@concordia.ellerman.id.au> (raw)
In-Reply-To: <20170918070834.13083-2-mhocko@kernel.org>

Michal Hocko <mhocko@kernel.org> writes:

> From: Michal Hocko <mhocko@suse.com>
>
> Memory offlining can fail just too eagerly under a heavy memory pressure.
>
> [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
> [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
> [ 5410.336811] page dumped because: isolation failed
> [ 5410.336813] page->mem_cgroup:ffff8801cd662000
> [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
>
> Isolation has failed here because the page is not on LRU. Most probably
> because it was on the pcp LRU cache or it has been removed from the LRU
> already but it hasn't been freed yet. In both cases the page doesn't look
> non-migrable so retrying more makes sense.

This breaks offline for me.

Prior to this commit:
  /sys/devices/system/memory/memory0# time echo 0 > online
  -bash: echo: write error: Device or resource busy
  
  real	0m0.001s
  user	0m0.000s
  sys	0m0.001s

After:
  /sys/devices/system/memory/memory0# time echo 0 > online
  -bash: echo: write error: Device or resource busy
  
  real	2m0.009s
  user	0m0.000s
  sys	1m25.035s


There's no way that block can be removed, it contains the kernel text,
so it should instantly fail - which it used to.


With commit 3aa2823fdf66 ("mm, memory_hotplug: remove timeout from
__offline_memory") also applied, it appears to just get stuck forever,
and I get lots of:

  [ 1232.112953] INFO: task kworker/3:0:4609 blocked for more than 120 seconds.
  [ 1232.113067]       Not tainted 4.14.0-rc4-gcc6-next-20171009-g49827b9 #1
  [ 1232.113183] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 1232.113319] kworker/3:0     D11984  4609      2 0x00000800
  [ 1232.113416] Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
  [ 1232.113531] Call Trace:
  [ 1232.113579] [c0000000fb2db7a0] [c0000000fb2db900] 0xc0000000fb2db900 (unreliable)
  [ 1232.113717] [c0000000fb2db970] [c00000000001c964] __switch_to+0x304/0x6e0
  [ 1232.113840] [c0000000fb2dba10] [c000000000a408c0] __schedule+0x2e0/0xa80
  [ 1232.113978] [c0000000fb2dbae0] [c000000000a410a8] schedule+0x48/0xc0
  [ 1232.114113] [c0000000fb2dbb10] [c000000000a44d88] rwsem_down_read_failed+0x128/0x1b0
  [ 1232.114269] [c0000000fb2dbb70] [c0000000001696a8] __percpu_down_read+0x108/0x110
  [ 1232.114426] [c0000000fb2dbba0] [c00000000032e498] get_online_mems+0x68/0x80
  [ 1232.115487] [c0000000fb2dbbc0] [c0000000002c82ec] memcg_create_kmem_cache+0x4c/0x190
  [ 1232.115651] [c0000000fb2dbc60] [c0000000003483b8] memcg_kmem_cache_create_func+0x38/0xf0
  [ 1232.115809] [c0000000fb2dbc90] [c000000000121594] process_one_work+0x2b4/0x590
  [ 1232.115964] [c0000000fb2dbd20] [c000000000121908] worker_thread+0x98/0x5d0
  [ 1232.116095] [c0000000fb2dbdc0] [c00000000012a134] kthread+0x164/0x1b0
  [ 1232.116229] [c0000000fb2dbe30] [c00000000000bae0] ret_from_kernel_thread+0x5c/0x7c


cheers

WARNING: multiple messages have this Message-ID (diff)
From: Michael Ellerman <mpe@ellerman.id.au>
To: Michal Hocko <mhocko@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Reza Arbab <arbab@linux.vnet.ibm.com>,
	Yasuaki Ishimatsu <yasu.isimatu@gmail.com>,
	qiuxishi@huawei.com, Igor Mammedov <imammedo@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early
Date: Tue, 10 Oct 2017 23:05:08 +1100	[thread overview]
Message-ID: <87bmlfw6mj.fsf@concordia.ellerman.id.au> (raw)
In-Reply-To: <20170918070834.13083-2-mhocko@kernel.org>

Michal Hocko <mhocko@kernel.org> writes:

> From: Michal Hocko <mhocko@suse.com>
>
> Memory offlining can fail just too eagerly under a heavy memory pressure.
>
> [ 5410.336792] page:ffffea22a646bd00 count:255 mapcount:252 mapping:ffff88ff926c9f38 index:0x3
> [ 5410.336809] flags: 0x9855fe40010048(uptodate|active|mappedtodisk)
> [ 5410.336811] page dumped because: isolation failed
> [ 5410.336813] page->mem_cgroup:ffff8801cd662000
> [ 5420.655030] memory offlining [mem 0x18b580000000-0x18b5ffffffff] failed
>
> Isolation has failed here because the page is not on LRU. Most probably
> because it was on the pcp LRU cache or it has been removed from the LRU
> already but it hasn't been freed yet. In both cases the page doesn't look
> non-migrable so retrying more makes sense.

This breaks offline for me.

Prior to this commit:
  /sys/devices/system/memory/memory0# time echo 0 > online
  -bash: echo: write error: Device or resource busy
  
  real	0m0.001s
  user	0m0.000s
  sys	0m0.001s

After:
  /sys/devices/system/memory/memory0# time echo 0 > online
  -bash: echo: write error: Device or resource busy
  
  real	2m0.009s
  user	0m0.000s
  sys	1m25.035s


There's no way that block can be removed, it contains the kernel text,
so it should instantly fail - which it used to.


With commit 3aa2823fdf66 ("mm, memory_hotplug: remove timeout from
__offline_memory") also applied, it appears to just get stuck forever,
and I get lots of:

  [ 1232.112953] INFO: task kworker/3:0:4609 blocked for more than 120 seconds.
  [ 1232.113067]       Not tainted 4.14.0-rc4-gcc6-next-20171009-g49827b9 #1
  [ 1232.113183] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 1232.113319] kworker/3:0     D11984  4609      2 0x00000800
  [ 1232.113416] Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
  [ 1232.113531] Call Trace:
  [ 1232.113579] [c0000000fb2db7a0] [c0000000fb2db900] 0xc0000000fb2db900 (unreliable)
  [ 1232.113717] [c0000000fb2db970] [c00000000001c964] __switch_to+0x304/0x6e0
  [ 1232.113840] [c0000000fb2dba10] [c000000000a408c0] __schedule+0x2e0/0xa80
  [ 1232.113978] [c0000000fb2dbae0] [c000000000a410a8] schedule+0x48/0xc0
  [ 1232.114113] [c0000000fb2dbb10] [c000000000a44d88] rwsem_down_read_failed+0x128/0x1b0
  [ 1232.114269] [c0000000fb2dbb70] [c0000000001696a8] __percpu_down_read+0x108/0x110
  [ 1232.114426] [c0000000fb2dbba0] [c00000000032e498] get_online_mems+0x68/0x80
  [ 1232.115487] [c0000000fb2dbbc0] [c0000000002c82ec] memcg_create_kmem_cache+0x4c/0x190
  [ 1232.115651] [c0000000fb2dbc60] [c0000000003483b8] memcg_kmem_cache_create_func+0x38/0xf0
  [ 1232.115809] [c0000000fb2dbc90] [c000000000121594] process_one_work+0x2b4/0x590
  [ 1232.115964] [c0000000fb2dbd20] [c000000000121908] worker_thread+0x98/0x5d0
  [ 1232.116095] [c0000000fb2dbdc0] [c00000000012a134] kthread+0x164/0x1b0
  [ 1232.116229] [c0000000fb2dbe30] [c00000000000bae0] ret_from_kernel_thread+0x5c/0x7c


cheers

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-10-10 12:05 UTC|newest]

Thread overview: 112+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-18  7:08 [PATCH v2 0/2] mm, memory_hotplug: redefine memory offline retry logic Michal Hocko
2017-09-18  7:08 ` Michal Hocko
2017-09-18  7:08 ` [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early Michal Hocko
2017-09-18  7:08   ` Michal Hocko
2017-10-10 12:05   ` Michael Ellerman [this message]
2017-10-10 12:05     ` Michael Ellerman
2017-10-10 12:27     ` Michal Hocko
2017-10-10 12:27       ` Michal Hocko
2017-10-11  2:37       ` Michael Ellerman
2017-10-11  2:37         ` Michael Ellerman
2017-10-11  5:19         ` Michael Ellerman
2017-10-11  5:19           ` Michael Ellerman
2017-10-11 14:05           ` Anshuman Khandual
2017-10-11 14:05             ` Anshuman Khandual
2017-10-11 14:16             ` Michal Hocko
2017-10-11 14:16               ` Michal Hocko
2017-10-11  6:51         ` Michal Hocko
2017-10-11  6:51           ` Michal Hocko
2017-10-11  8:04           ` Vlastimil Babka
2017-10-11  8:04             ` Vlastimil Babka
2017-10-11  8:13             ` Michal Hocko
2017-10-11  8:13               ` Michal Hocko
2017-10-11 11:17               ` Vlastimil Babka
2017-10-11 11:17                 ` Vlastimil Babka
2017-10-11 11:24                 ` Michal Hocko
2017-10-11 11:24                   ` Michal Hocko
2017-10-13 11:42             ` Michael Ellerman
2017-10-13 11:42               ` Michael Ellerman
2017-10-13 11:58               ` Michal Hocko
2017-10-13 11:58                 ` Michal Hocko
2017-10-13 12:00                 ` [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages Michal Hocko
2017-10-13 12:00                   ` Michal Hocko
2017-10-13 12:00                   ` [PATCH 2/2] mm, page_alloc: fail has_unmovable_pages when seeing reserved pages Michal Hocko
2017-10-13 12:00                     ` Michal Hocko
2017-10-13 12:04                     ` Vlastimil Babka
2017-10-13 12:04                       ` Vlastimil Babka
2017-10-13 12:07                       ` Michal Hocko
2017-10-13 12:07                         ` Michal Hocko
2017-10-17 13:03                         ` Vlastimil Babka
2017-10-17 13:03                           ` Vlastimil Babka
2017-10-17 11:41                   ` [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages Michael Ellerman
2017-10-17 11:41                     ` Michael Ellerman
2017-10-17 12:03                     ` Michal Hocko
2017-10-17 12:03                       ` Michal Hocko
2017-10-17 13:02                   ` Vlastimil Babka
2017-10-17 13:02                     ` Vlastimil Babka
2017-10-19  2:51                   ` Joonsoo Kim
2017-10-19  2:51                     ` Joonsoo Kim
2017-10-19  7:15                     ` Michal Hocko
2017-10-19  7:15                       ` Michal Hocko
2017-10-19  7:33                       ` Joonsoo Kim
2017-10-19  7:33                         ` Joonsoo Kim
2017-10-19  8:20                         ` Michal Hocko
2017-10-19  8:20                           ` Michal Hocko
2017-10-19 12:21                           ` Michal Hocko
2017-10-19 12:21                             ` Michal Hocko
2017-10-20  2:13                             ` Joonsoo Kim
2017-10-20  2:13                               ` Joonsoo Kim
2017-10-20  5:59                               ` Michal Hocko
2017-10-20  5:59                                 ` Michal Hocko
2017-10-20  6:50                                 ` Joonsoo Kim
2017-10-20  6:50                                   ` Joonsoo Kim
2017-10-20  7:02                                   ` Michal Hocko
2017-10-20  7:02                                     ` Michal Hocko
2017-10-23  5:23                                     ` Joonsoo Kim
2017-10-23  5:23                                       ` Joonsoo Kim
2017-10-23  8:10                                       ` Michal Hocko
2017-10-23  8:10                                         ` Michal Hocko
2017-10-24  4:44                                         ` Joonsoo Kim
2017-10-24  4:44                                           ` Joonsoo Kim
2017-10-24  7:44                                           ` Michal Hocko
2017-10-24  7:44                                             ` Michal Hocko
2017-10-24  8:12                                           ` Vlastimil Babka
2017-10-24  8:12                                             ` Vlastimil Babka
2017-10-24 12:25                                             ` Michal Hocko
2017-10-24 12:25                                               ` Michal Hocko
2017-10-26  2:47                                             ` Joonsoo Kim
2017-10-26  2:47                                               ` Joonsoo Kim
2017-10-26  7:41                                               ` Michal Hocko
2017-10-26  7:41                                                 ` Michal Hocko
2017-10-20  7:22                               ` Xishi Qiu
2017-10-20  7:22                                 ` Xishi Qiu
2017-10-20  8:17                                 ` Michal Hocko
2017-10-20  8:17                                   ` Michal Hocko
2017-10-23  5:26                                   ` Joonsoo Kim
2017-10-23  5:26                                     ` Joonsoo Kim
2017-10-26 13:04                             ` Vlastimil Babka
2017-10-26 13:04                               ` Vlastimil Babka
2017-10-26 13:59                             ` Michal Hocko
2017-10-26 13:59                               ` Michal Hocko
2017-09-18  7:08 ` [PATCH 2/2] mm, memory_hotplug: remove timeout from __offline_memory Michal Hocko
2017-09-18  7:08   ` Michal Hocko
  -- strict thread matches above, loose matches on Subject: below --
2017-09-04  8:21 [PATCH 0/2] mm, memory_hotplug: redefine memory offline retry logic Michal Hocko
2017-09-04  8:21 ` [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early Michal Hocko
2017-09-04  8:21   ` Michal Hocko
2017-09-05  6:29   ` Anshuman Khandual
2017-09-05  6:29     ` Anshuman Khandual
2017-09-05  7:13     ` Michal Hocko
2017-09-05  7:13       ` Michal Hocko
2017-09-08 17:26   ` Vlastimil Babka
2017-09-08 17:26     ` Vlastimil Babka
2017-09-11  8:17     ` Michal Hocko
2017-09-11  8:17       ` Michal Hocko
2017-09-13 11:41       ` Vlastimil Babka
2017-09-13 11:41         ` Vlastimil Babka
2017-09-13 12:10         ` Michal Hocko
2017-09-13 12:10           ` Michal Hocko
2017-09-13 12:14           ` Michal Hocko
2017-09-13 12:14             ` Michal Hocko
2017-09-13 12:19             ` Vlastimil Babka
2017-09-13 12:19               ` Vlastimil Babka
2017-09-13 12:32               ` Michal Hocko
2017-09-13 12:32                 ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bmlfw6mj.fsf@concordia.ellerman.id.au \
    --to=mpe@ellerman.id.au \
    --cc=akpm@linux-foundation.org \
    --cc=arbab@linux.vnet.ibm.com \
    --cc=imammedo@redhat.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=qiuxishi@huawei.com \
    --cc=vbabka@suse.cz \
    --cc=vkuznets@redhat.com \
    --cc=yasu.isimatu@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.