All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] BUG raised when onlining HWPoisoned page
@ 2017-04-25 14:27 ` Laurent Dufour
  0 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2017-04-25 14:27 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: linux-kernel, linux-mm, akpm

When a page is HWPoisoned and later offlined and onlined back, a BUG
warning is raised in the kernel:

BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
index:0x1
flags: 0x3ffff800200000(hwpoison)
raw: 003ffff800200000 0000000000000000 0000000000000001
00000000ffffffff
raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000
c0000007fe055800
page dumped because: page still charged to cgroup
page->mem_cgroup:c0000007fe055800
Modules linked in: pseries_rng rng_core vmx_crypto virtio_balloon
ip_tables x_tables autofs4 virtio_blk virtio_net virtio_pci
virtio_ring virtio
CPU: 34 PID: 5946 Comm: mem-on-off-test Tainted: G    B 4.11.0-rc7-hwp
#1
Call Trace:
[c0000007e4a737f0] [c000000000958e8c] dump_stack+0xb0/0xf0
(unreliable)
[c0000007e4a73830] [c00000000021588c] bad_page+0x11c/0x190
[c0000007e4a738c0] [c00000000021757c] free_pcppages_bulk+0x46c/0x600
[c0000007e4a73990] [c00000000021924c] free_hot_cold_page+0x2ec/0x320
[c0000007e4a739e0] [c0000000002a6440] generic_online_page+0x50/0x70
[c0000007e4a73a10] [c0000000002a6184] online_pages_range+0x94/0xe0
[c0000007e4a73a70] [c00000000005a2b0] walk_system_ram_range+0xe0/0x120
[c0000007e4a73ac0] [c0000000002cce44] online_pages+0x2b4/0x6b0
[c0000007e4a73b60] [c000000000600558] memory_subsys_online+0x218/0x270
[c0000007e4a73bf0] [c0000000005dec84] device_online+0xb4/0x110
[c0000007e4a73c30] [c000000000600f00] store_mem_state+0xc0/0x190
[c0000007e4a73c70] [c0000000005da1d4] dev_attr_store+0x34/0x60
[c0000007e4a73c90] [c000000000377c70] sysfs_kf_write+0x60/0xa0
[c0000007e4a73cb0] [c0000000003769fc] kernfs_fop_write+0x16c/0x240
[c0000007e4a73d00] [c0000000002d1b0c] __vfs_write+0x3c/0x1b0
[c0000007e4a73d90] [c0000000002d34dc] vfs_write+0xcc/0x230
[c0000007e4a73de0] [c0000000002d50e0] SyS_write+0x60/0x110
[c0000007e4a73e30] [c00000000000b760] system_call+0x38/0xfc

This has been seen on x86 kvm guest, PowerPC bare metal system and KVM
guest.

The issue is that the onlined page has already the mem_cgroup field
set.

It seems that the mem_cgroup field should be cleared when the page is
poisoned, which is done in the first patch of this series.

Then when the page is onlined back, the BUG warning is no more
triggered, but the page is now available for use, and once a process
is using it, it got killed because of the memory error.
It seems that the page should be ignored when onlined, as it is when
it is offlined (introduced by commit b023f46813cd "memory-hotplug:
skip HWPoisoned page when offlining pages"). The second patch of this
series is skipping HWPoisoned page when the memory block is onlined
back.

Changes against V1:
 - The first patch commit's description mentioned the BUG explicitly
 - In the patch as requested by Naoya Horiguchi, clear the page's
   reserved flag when the skipping the poisoned page.

Laurent Dufour (2):
  mm: Uncharge poisoned pages
  mm: skip HWPoisoned pages when onlining pages

 mm/memory-failure.c | 3 +++
 mm/memory_hotplug.c | 4 ++++
 2 files changed, 7 insertions(+)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 0/2] BUG raised when onlining HWPoisoned page
@ 2017-04-25 14:27 ` Laurent Dufour
  0 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2017-04-25 14:27 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: linux-kernel, linux-mm, akpm

When a page is HWPoisoned and later offlined and onlined back, a BUG
warning is raised in the kernel:

BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
index:0x1
flags: 0x3ffff800200000(hwpoison)
raw: 003ffff800200000 0000000000000000 0000000000000001
00000000ffffffff
raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000
c0000007fe055800
page dumped because: page still charged to cgroup
page->mem_cgroup:c0000007fe055800
Modules linked in: pseries_rng rng_core vmx_crypto virtio_balloon
ip_tables x_tables autofs4 virtio_blk virtio_net virtio_pci
virtio_ring virtio
CPU: 34 PID: 5946 Comm: mem-on-off-test Tainted: G    B 4.11.0-rc7-hwp
#1
Call Trace:
[c0000007e4a737f0] [c000000000958e8c] dump_stack+0xb0/0xf0
(unreliable)
[c0000007e4a73830] [c00000000021588c] bad_page+0x11c/0x190
[c0000007e4a738c0] [c00000000021757c] free_pcppages_bulk+0x46c/0x600
[c0000007e4a73990] [c00000000021924c] free_hot_cold_page+0x2ec/0x320
[c0000007e4a739e0] [c0000000002a6440] generic_online_page+0x50/0x70
[c0000007e4a73a10] [c0000000002a6184] online_pages_range+0x94/0xe0
[c0000007e4a73a70] [c00000000005a2b0] walk_system_ram_range+0xe0/0x120
[c0000007e4a73ac0] [c0000000002cce44] online_pages+0x2b4/0x6b0
[c0000007e4a73b60] [c000000000600558] memory_subsys_online+0x218/0x270
[c0000007e4a73bf0] [c0000000005dec84] device_online+0xb4/0x110
[c0000007e4a73c30] [c000000000600f00] store_mem_state+0xc0/0x190
[c0000007e4a73c70] [c0000000005da1d4] dev_attr_store+0x34/0x60
[c0000007e4a73c90] [c000000000377c70] sysfs_kf_write+0x60/0xa0
[c0000007e4a73cb0] [c0000000003769fc] kernfs_fop_write+0x16c/0x240
[c0000007e4a73d00] [c0000000002d1b0c] __vfs_write+0x3c/0x1b0
[c0000007e4a73d90] [c0000000002d34dc] vfs_write+0xcc/0x230
[c0000007e4a73de0] [c0000000002d50e0] SyS_write+0x60/0x110
[c0000007e4a73e30] [c00000000000b760] system_call+0x38/0xfc

This has been seen on x86 kvm guest, PowerPC bare metal system and KVM
guest.

The issue is that the onlined page has already the mem_cgroup field
set.

It seems that the mem_cgroup field should be cleared when the page is
poisoned, which is done in the first patch of this series.

Then when the page is onlined back, the BUG warning is no more
triggered, but the page is now available for use, and once a process
is using it, it got killed because of the memory error.
It seems that the page should be ignored when onlined, as it is when
it is offlined (introduced by commit b023f46813cd "memory-hotplug:
skip HWPoisoned page when offlining pages"). The second patch of this
series is skipping HWPoisoned page when the memory block is onlined
back.

Changes against V1:
 - The first patch commit's description mentioned the BUG explicitly
 - In the patch as requested by Naoya Horiguchi, clear the page's
   reserved flag when the skipping the poisoned page.

Laurent Dufour (2):
  mm: Uncharge poisoned pages
  mm: skip HWPoisoned pages when onlining pages

 mm/memory-failure.c | 3 +++
 mm/memory_hotplug.c | 4 ++++
 2 files changed, 7 insertions(+)

-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-04-25 14:27 ` Laurent Dufour
@ 2017-04-25 14:27   ` Laurent Dufour
  -1 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2017-04-25 14:27 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: linux-kernel, linux-mm, akpm

When page are poisoned, they should be uncharged from the root memory
cgroup.

This is required to avoid a BUG raised when the page is onlined back:
BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
index:0x1
flags: 0x3ffff800200000(hwpoison)
raw: 003ffff800200000 0000000000000000 0000000000000001
00000000ffffffff
raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000
c0000007fe055800
page dumped because: page still charged to cgroup
page->mem_cgroup:c0000007fe055800
Modules linked in: pseries_rng rng_core vmx_crypto virtio_balloon
ip_tables x_tables autofs4 virtio_blk virtio_net virtio_pci
virtio_ring virtio
CPU: 34 PID: 5946 Comm: mem-on-off-test Tainted: G    B 4.11.0-rc7-hwp
Call Trace:
[c0000007e4a737f0] [c000000000958e8c] dump_stack+0xb0/0xf0
(unreliable)
[c0000007e4a73830] [c00000000021588c] bad_page+0x11c/0x190
[c0000007e4a738c0] [c00000000021757c] free_pcppages_bulk+0x46c/0x600
[c0000007e4a73990] [c00000000021924c] free_hot_cold_page+0x2ec/0x320
[c0000007e4a739e0] [c0000000002a6440] generic_online_page+0x50/0x70
[c0000007e4a73a10] [c0000000002a6184] online_pages_range+0x94/0xe0
[c0000007e4a73a70] [c00000000005a2b0] walk_system_ram_range+0xe0/0x120
[c0000007e4a73ac0] [c0000000002cce44] online_pages+0x2b4/0x6b0
[c0000007e4a73b60] [c000000000600558] memory_subsys_online+0x218/0x270
[c0000007e4a73bf0] [c0000000005dec84] device_online+0xb4/0x110
[c0000007e4a73c30] [c000000000600f00] store_mem_state+0xc0/0x190
[c0000007e4a73c70] [c0000000005da1d4] dev_attr_store+0x34/0x60
[c0000007e4a73c90] [c000000000377c70] sysfs_kf_write+0x60/0xa0
[c0000007e4a73cb0] [c0000000003769fc] kernfs_fop_write+0x16c/0x240
[c0000007e4a73d00] [c0000000002d1b0c] __vfs_write+0x3c/0x1b0
[c0000007e4a73d90] [c0000000002d34dc] vfs_write+0xcc/0x230
[c0000007e4a73de0] [c0000000002d50e0] SyS_write+0x60/0x110
[c0000007e4a73e30] [c00000000000b760] system_call+0x38/0xfc

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 mm/memory-failure.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 27f7210e7fab..22bd22eb25cb 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -529,6 +529,9 @@ static const char * const action_page_types[] = {
  */
 static int delete_from_lru_cache(struct page *p)
 {
+	if (memcg_kmem_enabled())
+		memcg_kmem_uncharge(p, 0);
+
 	if (!isolate_lru_page(p)) {
 		/*
 		 * Clear sensible page flags, so that the buddy system won't
-- 
2.7.4

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-04-25 14:27   ` Laurent Dufour
  0 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2017-04-25 14:27 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: linux-kernel, linux-mm, akpm

When page are poisoned, they should be uncharged from the root memory
cgroup.

This is required to avoid a BUG raised when the page is onlined back:
BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
index:0x1
flags: 0x3ffff800200000(hwpoison)
raw: 003ffff800200000 0000000000000000 0000000000000001
00000000ffffffff
raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000
c0000007fe055800
page dumped because: page still charged to cgroup
page->mem_cgroup:c0000007fe055800
Modules linked in: pseries_rng rng_core vmx_crypto virtio_balloon
ip_tables x_tables autofs4 virtio_blk virtio_net virtio_pci
virtio_ring virtio
CPU: 34 PID: 5946 Comm: mem-on-off-test Tainted: G    B 4.11.0-rc7-hwp
Call Trace:
[c0000007e4a737f0] [c000000000958e8c] dump_stack+0xb0/0xf0
(unreliable)
[c0000007e4a73830] [c00000000021588c] bad_page+0x11c/0x190
[c0000007e4a738c0] [c00000000021757c] free_pcppages_bulk+0x46c/0x600
[c0000007e4a73990] [c00000000021924c] free_hot_cold_page+0x2ec/0x320
[c0000007e4a739e0] [c0000000002a6440] generic_online_page+0x50/0x70
[c0000007e4a73a10] [c0000000002a6184] online_pages_range+0x94/0xe0
[c0000007e4a73a70] [c00000000005a2b0] walk_system_ram_range+0xe0/0x120
[c0000007e4a73ac0] [c0000000002cce44] online_pages+0x2b4/0x6b0
[c0000007e4a73b60] [c000000000600558] memory_subsys_online+0x218/0x270
[c0000007e4a73bf0] [c0000000005dec84] device_online+0xb4/0x110
[c0000007e4a73c30] [c000000000600f00] store_mem_state+0xc0/0x190
[c0000007e4a73c70] [c0000000005da1d4] dev_attr_store+0x34/0x60
[c0000007e4a73c90] [c000000000377c70] sysfs_kf_write+0x60/0xa0
[c0000007e4a73cb0] [c0000000003769fc] kernfs_fop_write+0x16c/0x240
[c0000007e4a73d00] [c0000000002d1b0c] __vfs_write+0x3c/0x1b0
[c0000007e4a73d90] [c0000000002d34dc] vfs_write+0xcc/0x230
[c0000007e4a73de0] [c0000000002d50e0] SyS_write+0x60/0x110
[c0000007e4a73e30] [c00000000000b760] system_call+0x38/0xfc

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 mm/memory-failure.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 27f7210e7fab..22bd22eb25cb 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -529,6 +529,9 @@ static const char * const action_page_types[] = {
  */
 static int delete_from_lru_cache(struct page *p)
 {
+	if (memcg_kmem_enabled())
+		memcg_kmem_uncharge(p, 0);
+
 	if (!isolate_lru_page(p)) {
 		/*
 		 * Clear sensible page flags, so that the buddy system won't
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
  2017-04-25 14:27 ` Laurent Dufour
@ 2017-04-25 14:27   ` Laurent Dufour
  -1 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2017-04-25 14:27 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: linux-kernel, linux-mm, akpm

The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
offlining pages") skip the HWPoisoned pages when offlining pages, but
this should be skipped when onlining the pages too.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 mm/memory_hotplug.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6fa7208bcd56..741ddb50e7d2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
 	if (PageReserved(pfn_to_page(start_pfn)))
 		for (i = 0; i < nr_pages; i++) {
 			page = pfn_to_page(start_pfn + i);
+			if (PageHWPoison(page)) {
+				ClearPageReserved(page);
+				continue;
+			}
 			(*online_page_callback)(page);
 			onlined_pages++;
 		}
-- 
2.7.4

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
@ 2017-04-25 14:27   ` Laurent Dufour
  0 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2017-04-25 14:27 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: linux-kernel, linux-mm, akpm

The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
offlining pages") skip the HWPoisoned pages when offlining pages, but
this should be skipped when onlining the pages too.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 mm/memory_hotplug.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6fa7208bcd56..741ddb50e7d2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
 	if (PageReserved(pfn_to_page(start_pfn)))
 		for (i = 0; i < nr_pages; i++) {
 			page = pfn_to_page(start_pfn + i);
+			if (PageHWPoison(page)) {
+				ClearPageReserved(page);
+				continue;
+			}
 			(*online_page_callback)(page);
 			onlined_pages++;
 		}
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-04-25 14:27   ` Laurent Dufour
@ 2017-04-25 23:48     ` Naoya Horiguchi
  -1 siblings, 0 replies; 68+ messages in thread
From: Naoya Horiguchi @ 2017-04-25 23:48 UTC (permalink / raw)
  To: Laurent Dufour; +Cc: linux-kernel, linux-mm, akpm

On Tue, Apr 25, 2017 at 04:27:51PM +0200, Laurent Dufour wrote:
> When page are poisoned, they should be uncharged from the root memory
> cgroup.
> 
> This is required to avoid a BUG raised when the page is onlined back:
> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
> index:0x1
> flags: 0x3ffff800200000(hwpoison)
> raw: 003ffff800200000 0000000000000000 0000000000000001
> 00000000ffffffff
> raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000
> c0000007fe055800
> page dumped because: page still charged to cgroup
> page->mem_cgroup:c0000007fe055800
> Modules linked in: pseries_rng rng_core vmx_crypto virtio_balloon
> ip_tables x_tables autofs4 virtio_blk virtio_net virtio_pci
> virtio_ring virtio
> CPU: 34 PID: 5946 Comm: mem-on-off-test Tainted: G    B 4.11.0-rc7-hwp
> Call Trace:
> [c0000007e4a737f0] [c000000000958e8c] dump_stack+0xb0/0xf0
> (unreliable)
> [c0000007e4a73830] [c00000000021588c] bad_page+0x11c/0x190
> [c0000007e4a738c0] [c00000000021757c] free_pcppages_bulk+0x46c/0x600
> [c0000007e4a73990] [c00000000021924c] free_hot_cold_page+0x2ec/0x320
> [c0000007e4a739e0] [c0000000002a6440] generic_online_page+0x50/0x70
> [c0000007e4a73a10] [c0000000002a6184] online_pages_range+0x94/0xe0
> [c0000007e4a73a70] [c00000000005a2b0] walk_system_ram_range+0xe0/0x120
> [c0000007e4a73ac0] [c0000000002cce44] online_pages+0x2b4/0x6b0
> [c0000007e4a73b60] [c000000000600558] memory_subsys_online+0x218/0x270
> [c0000007e4a73bf0] [c0000000005dec84] device_online+0xb4/0x110
> [c0000007e4a73c30] [c000000000600f00] store_mem_state+0xc0/0x190
> [c0000007e4a73c70] [c0000000005da1d4] dev_attr_store+0x34/0x60
> [c0000007e4a73c90] [c000000000377c70] sysfs_kf_write+0x60/0xa0
> [c0000007e4a73cb0] [c0000000003769fc] kernfs_fop_write+0x16c/0x240
> [c0000007e4a73d00] [c0000000002d1b0c] __vfs_write+0x3c/0x1b0
> [c0000007e4a73d90] [c0000000002d34dc] vfs_write+0xcc/0x230
> [c0000007e4a73de0] [c0000000002d50e0] SyS_write+0x60/0x110
> [c0000007e4a73e30] [c00000000000b760] system_call+0x38/0xfc
> 
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>

Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

> ---
>  mm/memory-failure.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 27f7210e7fab..22bd22eb25cb 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -529,6 +529,9 @@ static const char * const action_page_types[] = {
>   */
>  static int delete_from_lru_cache(struct page *p)
>  {
> +	if (memcg_kmem_enabled())
> +		memcg_kmem_uncharge(p, 0);
> +
>  	if (!isolate_lru_page(p)) {
>  		/*
>  		 * Clear sensible page flags, so that the buddy system won't
> -- 
> 2.7.4
> 
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-04-25 23:48     ` Naoya Horiguchi
  0 siblings, 0 replies; 68+ messages in thread
From: Naoya Horiguchi @ 2017-04-25 23:48 UTC (permalink / raw)
  To: Laurent Dufour; +Cc: linux-kernel, linux-mm, akpm

On Tue, Apr 25, 2017 at 04:27:51PM +0200, Laurent Dufour wrote:
> When page are poisoned, they should be uncharged from the root memory
> cgroup.
> 
> This is required to avoid a BUG raised when the page is onlined back:
> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
> index:0x1
> flags: 0x3ffff800200000(hwpoison)
> raw: 003ffff800200000 0000000000000000 0000000000000001
> 00000000ffffffff
> raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000
> c0000007fe055800
> page dumped because: page still charged to cgroup
> page->mem_cgroup:c0000007fe055800
> Modules linked in: pseries_rng rng_core vmx_crypto virtio_balloon
> ip_tables x_tables autofs4 virtio_blk virtio_net virtio_pci
> virtio_ring virtio
> CPU: 34 PID: 5946 Comm: mem-on-off-test Tainted: G    B 4.11.0-rc7-hwp
> Call Trace:
> [c0000007e4a737f0] [c000000000958e8c] dump_stack+0xb0/0xf0
> (unreliable)
> [c0000007e4a73830] [c00000000021588c] bad_page+0x11c/0x190
> [c0000007e4a738c0] [c00000000021757c] free_pcppages_bulk+0x46c/0x600
> [c0000007e4a73990] [c00000000021924c] free_hot_cold_page+0x2ec/0x320
> [c0000007e4a739e0] [c0000000002a6440] generic_online_page+0x50/0x70
> [c0000007e4a73a10] [c0000000002a6184] online_pages_range+0x94/0xe0
> [c0000007e4a73a70] [c00000000005a2b0] walk_system_ram_range+0xe0/0x120
> [c0000007e4a73ac0] [c0000000002cce44] online_pages+0x2b4/0x6b0
> [c0000007e4a73b60] [c000000000600558] memory_subsys_online+0x218/0x270
> [c0000007e4a73bf0] [c0000000005dec84] device_online+0xb4/0x110
> [c0000007e4a73c30] [c000000000600f00] store_mem_state+0xc0/0x190
> [c0000007e4a73c70] [c0000000005da1d4] dev_attr_store+0x34/0x60
> [c0000007e4a73c90] [c000000000377c70] sysfs_kf_write+0x60/0xa0
> [c0000007e4a73cb0] [c0000000003769fc] kernfs_fop_write+0x16c/0x240
> [c0000007e4a73d00] [c0000000002d1b0c] __vfs_write+0x3c/0x1b0
> [c0000007e4a73d90] [c0000000002d34dc] vfs_write+0xcc/0x230
> [c0000007e4a73de0] [c0000000002d50e0] SyS_write+0x60/0x110
> [c0000007e4a73e30] [c00000000000b760] system_call+0x38/0xfc
> 
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>

Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

> ---
>  mm/memory-failure.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 27f7210e7fab..22bd22eb25cb 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -529,6 +529,9 @@ static const char * const action_page_types[] = {
>   */
>  static int delete_from_lru_cache(struct page *p)
>  {
> +	if (memcg_kmem_enabled())
> +		memcg_kmem_uncharge(p, 0);
> +
>  	if (!isolate_lru_page(p)) {
>  		/*
>  		 * Clear sensible page flags, so that the buddy system won't
> -- 
> 2.7.4
> 
> 
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-04-25 14:27   ` Laurent Dufour
@ 2017-04-26  1:54     ` Balbir Singh
  -1 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2017-04-26  1:54 UTC (permalink / raw)
  To: Laurent Dufour, Naoya Horiguchi; +Cc: linux-kernel, linux-mm, akpm

On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> When page are poisoned, they should be uncharged from the root memory
> cgroup.
> 
> This is required to avoid a BUG raised when the page is onlined back:
> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
> index:0x1
> flags: 0x3ffff800200000(hwpoison)
> raw: 003ffff800200000 0000000000000000 0000000000000001
> 00000000ffffffff
> raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000
> c0000007fe055800
> page dumped because: page still charged to cgroup
> page->mem_cgroup:c0000007fe055800
> Modules linked in: pseries_rng rng_core vmx_crypto virtio_balloon
> ip_tables x_tables autofs4 virtio_blk virtio_net virtio_pci
> virtio_ring virtio
> CPU: 34 PID: 5946 Comm: mem-on-off-test Tainted: G    B 4.11.0-rc7-hwp
> Call Trace:
> [c0000007e4a737f0] [c000000000958e8c] dump_stack+0xb0/0xf0
> (unreliable)
> [c0000007e4a73830] [c00000000021588c] bad_page+0x11c/0x190
> [c0000007e4a738c0] [c00000000021757c] free_pcppages_bulk+0x46c/0x600
> [c0000007e4a73990] [c00000000021924c] free_hot_cold_page+0x2ec/0x320
> [c0000007e4a739e0] [c0000000002a6440] generic_online_page+0x50/0x70
> [c0000007e4a73a10] [c0000000002a6184] online_pages_range+0x94/0xe0
> [c0000007e4a73a70] [c00000000005a2b0] walk_system_ram_range+0xe0/0x120
> [c0000007e4a73ac0] [c0000000002cce44] online_pages+0x2b4/0x6b0
> [c0000007e4a73b60] [c000000000600558] memory_subsys_online+0x218/0x270
> [c0000007e4a73bf0] [c0000000005dec84] device_online+0xb4/0x110
> [c0000007e4a73c30] [c000000000600f00] store_mem_state+0xc0/0x190
> [c0000007e4a73c70] [c0000000005da1d4] dev_attr_store+0x34/0x60
> [c0000007e4a73c90] [c000000000377c70] sysfs_kf_write+0x60/0xa0
> [c0000007e4a73cb0] [c0000000003769fc] kernfs_fop_write+0x16c/0x240
> [c0000007e4a73d00] [c0000000002d1b0c] __vfs_write+0x3c/0x1b0
> [c0000007e4a73d90] [c0000000002d34dc] vfs_write+0xcc/0x230
> [c0000007e4a73de0] [c0000000002d50e0] SyS_write+0x60/0x110
> [c0000007e4a73e30] [c00000000000b760] system_call+0x38/0xfc
> 
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> ---
>  mm/memory-failure.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 27f7210e7fab..22bd22eb25cb 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -529,6 +529,9 @@ static const char * const action_page_types[] = {
>   */
>  static int delete_from_lru_cache(struct page *p)
>  {
> +	if (memcg_kmem_enabled())
> +		memcg_kmem_uncharge(p, 0);
> +

The changelog is not quite clear, so we are uncharging a page using
memcg_kmem_uncharge for a page in swap cache/page cache?

Balbir Singh.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-04-26  1:54     ` Balbir Singh
  0 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2017-04-26  1:54 UTC (permalink / raw)
  To: Laurent Dufour, Naoya Horiguchi; +Cc: linux-kernel, linux-mm, akpm

On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> When page are poisoned, they should be uncharged from the root memory
> cgroup.
> 
> This is required to avoid a BUG raised when the page is onlined back:
> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
> index:0x1
> flags: 0x3ffff800200000(hwpoison)
> raw: 003ffff800200000 0000000000000000 0000000000000001
> 00000000ffffffff
> raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000
> c0000007fe055800
> page dumped because: page still charged to cgroup
> page->mem_cgroup:c0000007fe055800
> Modules linked in: pseries_rng rng_core vmx_crypto virtio_balloon
> ip_tables x_tables autofs4 virtio_blk virtio_net virtio_pci
> virtio_ring virtio
> CPU: 34 PID: 5946 Comm: mem-on-off-test Tainted: G    B 4.11.0-rc7-hwp
> Call Trace:
> [c0000007e4a737f0] [c000000000958e8c] dump_stack+0xb0/0xf0
> (unreliable)
> [c0000007e4a73830] [c00000000021588c] bad_page+0x11c/0x190
> [c0000007e4a738c0] [c00000000021757c] free_pcppages_bulk+0x46c/0x600
> [c0000007e4a73990] [c00000000021924c] free_hot_cold_page+0x2ec/0x320
> [c0000007e4a739e0] [c0000000002a6440] generic_online_page+0x50/0x70
> [c0000007e4a73a10] [c0000000002a6184] online_pages_range+0x94/0xe0
> [c0000007e4a73a70] [c00000000005a2b0] walk_system_ram_range+0xe0/0x120
> [c0000007e4a73ac0] [c0000000002cce44] online_pages+0x2b4/0x6b0
> [c0000007e4a73b60] [c000000000600558] memory_subsys_online+0x218/0x270
> [c0000007e4a73bf0] [c0000000005dec84] device_online+0xb4/0x110
> [c0000007e4a73c30] [c000000000600f00] store_mem_state+0xc0/0x190
> [c0000007e4a73c70] [c0000000005da1d4] dev_attr_store+0x34/0x60
> [c0000007e4a73c90] [c000000000377c70] sysfs_kf_write+0x60/0xa0
> [c0000007e4a73cb0] [c0000000003769fc] kernfs_fop_write+0x16c/0x240
> [c0000007e4a73d00] [c0000000002d1b0c] __vfs_write+0x3c/0x1b0
> [c0000007e4a73d90] [c0000000002d34dc] vfs_write+0xcc/0x230
> [c0000007e4a73de0] [c0000000002d50e0] SyS_write+0x60/0x110
> [c0000007e4a73e30] [c00000000000b760] system_call+0x38/0xfc
> 
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> ---
>  mm/memory-failure.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 27f7210e7fab..22bd22eb25cb 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -529,6 +529,9 @@ static const char * const action_page_types[] = {
>   */
>  static int delete_from_lru_cache(struct page *p)
>  {
> +	if (memcg_kmem_enabled())
> +		memcg_kmem_uncharge(p, 0);
> +

The changelog is not quite clear, so we are uncharging a page using
memcg_kmem_uncharge for a page in swap cache/page cache?

Balbir Singh.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
  2017-04-25 14:27   ` Laurent Dufour
@ 2017-04-26  2:10     ` Balbir Singh
  -1 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2017-04-26  2:10 UTC (permalink / raw)
  To: Laurent Dufour, Naoya Horiguchi; +Cc: linux-kernel, linux-mm, akpm

On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> offlining pages") skip the HWPoisoned pages when offlining pages, but
> this should be skipped when onlining the pages too.
> 
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> ---
>  mm/memory_hotplug.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 6fa7208bcd56..741ddb50e7d2 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>  	if (PageReserved(pfn_to_page(start_pfn)))
>  		for (i = 0; i < nr_pages; i++) {
>  			page = pfn_to_page(start_pfn + i);
> +			if (PageHWPoison(page)) {
> +				ClearPageReserved(page);

Why do we clear page reserved? Also if the page is marked PageHWPoison, it
was never offlined to begin with? Or do you expect this to be set on newly
hotplugged memory? Also don't we need to skip the entire pageblock?

Balbir Singh.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
@ 2017-04-26  2:10     ` Balbir Singh
  0 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2017-04-26  2:10 UTC (permalink / raw)
  To: Laurent Dufour, Naoya Horiguchi; +Cc: linux-kernel, linux-mm, akpm

On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> offlining pages") skip the HWPoisoned pages when offlining pages, but
> this should be skipped when onlining the pages too.
> 
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> ---
>  mm/memory_hotplug.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 6fa7208bcd56..741ddb50e7d2 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>  	if (PageReserved(pfn_to_page(start_pfn)))
>  		for (i = 0; i < nr_pages; i++) {
>  			page = pfn_to_page(start_pfn + i);
> +			if (PageHWPoison(page)) {
> +				ClearPageReserved(page);

Why do we clear page reserved? Also if the page is marked PageHWPoison, it
was never offlined to begin with? Or do you expect this to be set on newly
hotplugged memory? Also don't we need to skip the entire pageblock?

Balbir Singh.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-04-26  1:54     ` Balbir Singh
@ 2017-04-26  2:34       ` Naoya Horiguchi
  -1 siblings, 0 replies; 68+ messages in thread
From: Naoya Horiguchi @ 2017-04-26  2:34 UTC (permalink / raw)
  To: Balbir Singh; +Cc: Laurent Dufour, linux-kernel, linux-mm, akpm

On Wed, Apr 26, 2017 at 11:54:58AM +1000, Balbir Singh wrote:
> On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > When page are poisoned, they should be uncharged from the root memory
> > cgroup.
> > 
> > This is required to avoid a BUG raised when the page is onlined back:
> > BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
> > page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
> > index:0x1
> > flags: 0x3ffff800200000(hwpoison)
> > raw: 003ffff800200000 0000000000000000 0000000000000001
> > 00000000ffffffff
> > raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000
> > c0000007fe055800
> > page dumped because: page still charged to cgroup
> > page->mem_cgroup:c0000007fe055800
> > Modules linked in: pseries_rng rng_core vmx_crypto virtio_balloon
> > ip_tables x_tables autofs4 virtio_blk virtio_net virtio_pci
> > virtio_ring virtio
> > CPU: 34 PID: 5946 Comm: mem-on-off-test Tainted: G    B 4.11.0-rc7-hwp
> > Call Trace:
> > [c0000007e4a737f0] [c000000000958e8c] dump_stack+0xb0/0xf0
> > (unreliable)
> > [c0000007e4a73830] [c00000000021588c] bad_page+0x11c/0x190
> > [c0000007e4a738c0] [c00000000021757c] free_pcppages_bulk+0x46c/0x600
> > [c0000007e4a73990] [c00000000021924c] free_hot_cold_page+0x2ec/0x320
> > [c0000007e4a739e0] [c0000000002a6440] generic_online_page+0x50/0x70
> > [c0000007e4a73a10] [c0000000002a6184] online_pages_range+0x94/0xe0
> > [c0000007e4a73a70] [c00000000005a2b0] walk_system_ram_range+0xe0/0x120
> > [c0000007e4a73ac0] [c0000000002cce44] online_pages+0x2b4/0x6b0
> > [c0000007e4a73b60] [c000000000600558] memory_subsys_online+0x218/0x270
> > [c0000007e4a73bf0] [c0000000005dec84] device_online+0xb4/0x110
> > [c0000007e4a73c30] [c000000000600f00] store_mem_state+0xc0/0x190
> > [c0000007e4a73c70] [c0000000005da1d4] dev_attr_store+0x34/0x60
> > [c0000007e4a73c90] [c000000000377c70] sysfs_kf_write+0x60/0xa0
> > [c0000007e4a73cb0] [c0000000003769fc] kernfs_fop_write+0x16c/0x240
> > [c0000007e4a73d00] [c0000000002d1b0c] __vfs_write+0x3c/0x1b0
> > [c0000007e4a73d90] [c0000000002d34dc] vfs_write+0xcc/0x230
> > [c0000007e4a73de0] [c0000000002d50e0] SyS_write+0x60/0x110
> > [c0000007e4a73e30] [c00000000000b760] system_call+0x38/0xfc
> > 
> > Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > ---
> >  mm/memory-failure.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 27f7210e7fab..22bd22eb25cb 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -529,6 +529,9 @@ static const char * const action_page_types[] = {
> >   */
> >  static int delete_from_lru_cache(struct page *p)
> >  {
> > +	if (memcg_kmem_enabled())
> > +		memcg_kmem_uncharge(p, 0);
> > +
> 
> The changelog is not quite clear, so we are uncharging a page using
> memcg_kmem_uncharge for a page in swap cache/page cache?

Hi Balbir,

Yes, in the normal page lifecycle, uncharge is done in page free time.
But in memory error handling case, in-use pages (i.e. swap cache and page
cache) are removed from normal path and they don't pass page freeing code.
So I think that this change is to keep the consistent charging for such a case.

- Naoya Horiguchi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-04-26  2:34       ` Naoya Horiguchi
  0 siblings, 0 replies; 68+ messages in thread
From: Naoya Horiguchi @ 2017-04-26  2:34 UTC (permalink / raw)
  To: Balbir Singh; +Cc: Laurent Dufour, linux-kernel, linux-mm, akpm

On Wed, Apr 26, 2017 at 11:54:58AM +1000, Balbir Singh wrote:
> On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > When page are poisoned, they should be uncharged from the root memory
> > cgroup.
> > 
> > This is required to avoid a BUG raised when the page is onlined back:
> > BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
> > page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
> > index:0x1
> > flags: 0x3ffff800200000(hwpoison)
> > raw: 003ffff800200000 0000000000000000 0000000000000001
> > 00000000ffffffff
> > raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000
> > c0000007fe055800
> > page dumped because: page still charged to cgroup
> > page->mem_cgroup:c0000007fe055800
> > Modules linked in: pseries_rng rng_core vmx_crypto virtio_balloon
> > ip_tables x_tables autofs4 virtio_blk virtio_net virtio_pci
> > virtio_ring virtio
> > CPU: 34 PID: 5946 Comm: mem-on-off-test Tainted: G    B 4.11.0-rc7-hwp
> > Call Trace:
> > [c0000007e4a737f0] [c000000000958e8c] dump_stack+0xb0/0xf0
> > (unreliable)
> > [c0000007e4a73830] [c00000000021588c] bad_page+0x11c/0x190
> > [c0000007e4a738c0] [c00000000021757c] free_pcppages_bulk+0x46c/0x600
> > [c0000007e4a73990] [c00000000021924c] free_hot_cold_page+0x2ec/0x320
> > [c0000007e4a739e0] [c0000000002a6440] generic_online_page+0x50/0x70
> > [c0000007e4a73a10] [c0000000002a6184] online_pages_range+0x94/0xe0
> > [c0000007e4a73a70] [c00000000005a2b0] walk_system_ram_range+0xe0/0x120
> > [c0000007e4a73ac0] [c0000000002cce44] online_pages+0x2b4/0x6b0
> > [c0000007e4a73b60] [c000000000600558] memory_subsys_online+0x218/0x270
> > [c0000007e4a73bf0] [c0000000005dec84] device_online+0xb4/0x110
> > [c0000007e4a73c30] [c000000000600f00] store_mem_state+0xc0/0x190
> > [c0000007e4a73c70] [c0000000005da1d4] dev_attr_store+0x34/0x60
> > [c0000007e4a73c90] [c000000000377c70] sysfs_kf_write+0x60/0xa0
> > [c0000007e4a73cb0] [c0000000003769fc] kernfs_fop_write+0x16c/0x240
> > [c0000007e4a73d00] [c0000000002d1b0c] __vfs_write+0x3c/0x1b0
> > [c0000007e4a73d90] [c0000000002d34dc] vfs_write+0xcc/0x230
> > [c0000007e4a73de0] [c0000000002d50e0] SyS_write+0x60/0x110
> > [c0000007e4a73e30] [c00000000000b760] system_call+0x38/0xfc
> > 
> > Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > ---
> >  mm/memory-failure.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 27f7210e7fab..22bd22eb25cb 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -529,6 +529,9 @@ static const char * const action_page_types[] = {
> >   */
> >  static int delete_from_lru_cache(struct page *p)
> >  {
> > +	if (memcg_kmem_enabled())
> > +		memcg_kmem_uncharge(p, 0);
> > +
> 
> The changelog is not quite clear, so we are uncharging a page using
> memcg_kmem_uncharge for a page in swap cache/page cache?

Hi Balbir,

Yes, in the normal page lifecycle, uncharge is done in page free time.
But in memory error handling case, in-use pages (i.e. swap cache and page
cache) are removed from normal path and they don't pass page freeing code.
So I think that this change is to keep the consistent charging for such a case.

- Naoya Horiguchi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
  2017-04-26  2:10     ` Balbir Singh
@ 2017-04-26  3:13       ` Naoya Horiguchi
  -1 siblings, 0 replies; 68+ messages in thread
From: Naoya Horiguchi @ 2017-04-26  3:13 UTC (permalink / raw)
  To: Balbir Singh; +Cc: Laurent Dufour, linux-kernel, linux-mm, akpm

On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > this should be skipped when onlining the pages too.
> >
> > Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > ---
> >  mm/memory_hotplug.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 6fa7208bcd56..741ddb50e7d2 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> >  	if (PageReserved(pfn_to_page(start_pfn)))
> >  		for (i = 0; i < nr_pages; i++) {
> >  			page = pfn_to_page(start_pfn + i);
> > +			if (PageHWPoison(page)) {
> > +				ClearPageReserved(page);
>
> Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> was never offlined to begin with? Or do you expect this to be set on newly
> hotplugged memory? Also don't we need to skip the entire pageblock?

If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
that we skip the page status check for hwpoisoned pages *not* to prevent
memory offlining for memblocks with hwpoisoned pages. That means that
hwpoisoned pages can be offlined.

And another reason to clear PageReserved is that we could reuse the
hwpoisoned page after onlining back with replacing the broken DIMM.
In this usecase, we first do unpoisoning to clear PageHWPoison,
but it doesn't work if PageReserved is set. My simple testing shows
the BUG below in unpoisoning (without the ClearPageReserved):

  Unpoison: Software-unpoisoned page 0x18000
  BUG: Bad page state in process page-types  pfn:18000
  page:ffffda5440600000 count:0 mapcount:0 mapping:          (null) index:0x70006b599
  flags: 0x1fffc00004081a(error|uptodate|dirty|reserved|swapbacked)
  raw: 001fffc00004081a 0000000000000000 000000070006b599 00000000ffffffff
  raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
  page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
  bad because of flags: 0x800(reserved)

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
@ 2017-04-26  3:13       ` Naoya Horiguchi
  0 siblings, 0 replies; 68+ messages in thread
From: Naoya Horiguchi @ 2017-04-26  3:13 UTC (permalink / raw)
  To: Balbir Singh; +Cc: Laurent Dufour, linux-kernel, linux-mm, akpm

On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > this should be skipped when onlining the pages too.
> >
> > Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > ---
> >  mm/memory_hotplug.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 6fa7208bcd56..741ddb50e7d2 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> >  	if (PageReserved(pfn_to_page(start_pfn)))
> >  		for (i = 0; i < nr_pages; i++) {
> >  			page = pfn_to_page(start_pfn + i);
> > +			if (PageHWPoison(page)) {
> > +				ClearPageReserved(page);
>
> Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> was never offlined to begin with? Or do you expect this to be set on newly
> hotplugged memory? Also don't we need to skip the entire pageblock?

If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
that we skip the page status check for hwpoisoned pages *not* to prevent
memory offlining for memblocks with hwpoisoned pages. That means that
hwpoisoned pages can be offlined.

And another reason to clear PageReserved is that we could reuse the
hwpoisoned page after onlining back with replacing the broken DIMM.
In this usecase, we first do unpoisoning to clear PageHWPoison,
but it doesn't work if PageReserved is set. My simple testing shows
the BUG below in unpoisoning (without the ClearPageReserved):

  Unpoison: Software-unpoisoned page 0x18000
  BUG: Bad page state in process page-types  pfn:18000
  page:ffffda5440600000 count:0 mapcount:0 mapping:          (null) index:0x70006b599
  flags: 0x1fffc00004081a(error|uptodate|dirty|reserved|swapbacked)
  raw: 001fffc00004081a 0000000000000000 000000070006b599 00000000ffffffff
  raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
  page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
  bad because of flags: 0x800(reserved)

Thanks,
Naoya Horiguchi
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-04-26  2:34       ` Naoya Horiguchi
@ 2017-04-26  3:45         ` Balbir Singh
  -1 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2017-04-26  3:45 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Laurent Dufour, linux-kernel, linux-mm, akpm

> > >  static int delete_from_lru_cache(struct page *p)
> > >  {
> > > +	if (memcg_kmem_enabled())
> > > +		memcg_kmem_uncharge(p, 0);
> > > +
> > 
> > The changelog is not quite clear, so we are uncharging a page using
> > memcg_kmem_uncharge for a page in swap cache/page cache?
> 
> Hi Balbir,
> 
> Yes, in the normal page lifecycle, uncharge is done in page free time.
> But in memory error handling case, in-use pages (i.e. swap cache and page
> cache) are removed from normal path and they don't pass page freeing code.
> So I think that this change is to keep the consistent charging for such a case.

I agree we should uncharge, but looking at the API name, it seems to
be for kmem pages, why are we not using mem_cgroup_uncharge()? Am I missing
something?

Balbir Singh.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-04-26  3:45         ` Balbir Singh
  0 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2017-04-26  3:45 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Laurent Dufour, linux-kernel, linux-mm, akpm

> > >  static int delete_from_lru_cache(struct page *p)
> > >  {
> > > +	if (memcg_kmem_enabled())
> > > +		memcg_kmem_uncharge(p, 0);
> > > +
> > 
> > The changelog is not quite clear, so we are uncharging a page using
> > memcg_kmem_uncharge for a page in swap cache/page cache?
> 
> Hi Balbir,
> 
> Yes, in the normal page lifecycle, uncharge is done in page free time.
> But in memory error handling case, in-use pages (i.e. swap cache and page
> cache) are removed from normal path and they don't pass page freeing code.
> So I think that this change is to keep the consistent charging for such a case.

I agree we should uncharge, but looking at the API name, it seems to
be for kmem pages, why are we not using mem_cgroup_uncharge()? Am I missing
something?

Balbir Singh.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-04-26  3:45         ` Balbir Singh
@ 2017-04-26  4:46           ` Naoya Horiguchi
  -1 siblings, 0 replies; 68+ messages in thread
From: Naoya Horiguchi @ 2017-04-26  4:46 UTC (permalink / raw)
  To: Balbir Singh; +Cc: Laurent Dufour, linux-kernel, linux-mm, akpm

On Wed, Apr 26, 2017 at 01:45:00PM +1000, Balbir Singh wrote:
> > > >  static int delete_from_lru_cache(struct page *p)
> > > >  {
> > > > +	if (memcg_kmem_enabled())
> > > > +		memcg_kmem_uncharge(p, 0);
> > > > +
> > > 
> > > The changelog is not quite clear, so we are uncharging a page using
> > > memcg_kmem_uncharge for a page in swap cache/page cache?
> > 
> > Hi Balbir,
> > 
> > Yes, in the normal page lifecycle, uncharge is done in page free time.
> > But in memory error handling case, in-use pages (i.e. swap cache and page
> > cache) are removed from normal path and they don't pass page freeing code.
> > So I think that this change is to keep the consistent charging for such a case.
> 
> I agree we should uncharge, but looking at the API name, it seems to
> be for kmem pages, why are we not using mem_cgroup_uncharge()? Am I missing
> something?

Thank you for pointing out.
Actually I had the same question and this surely looks strange.
But simply calling mem_cgroup_uncharge() here doesn't work because it
assumes that page_refcount(p) == 0, which is not true in hwpoison context.
We need some other clearer way or at least some justifying comment about
why this is ok.

- Naoya

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-04-26  4:46           ` Naoya Horiguchi
  0 siblings, 0 replies; 68+ messages in thread
From: Naoya Horiguchi @ 2017-04-26  4:46 UTC (permalink / raw)
  To: Balbir Singh; +Cc: Laurent Dufour, linux-kernel, linux-mm, akpm

On Wed, Apr 26, 2017 at 01:45:00PM +1000, Balbir Singh wrote:
> > > >  static int delete_from_lru_cache(struct page *p)
> > > >  {
> > > > +	if (memcg_kmem_enabled())
> > > > +		memcg_kmem_uncharge(p, 0);
> > > > +
> > > 
> > > The changelog is not quite clear, so we are uncharging a page using
> > > memcg_kmem_uncharge for a page in swap cache/page cache?
> > 
> > Hi Balbir,
> > 
> > Yes, in the normal page lifecycle, uncharge is done in page free time.
> > But in memory error handling case, in-use pages (i.e. swap cache and page
> > cache) are removed from normal path and they don't pass page freeing code.
> > So I think that this change is to keep the consistent charging for such a case.
> 
> I agree we should uncharge, but looking at the API name, it seems to
> be for kmem pages, why are we not using mem_cgroup_uncharge()? Am I missing
> something?

Thank you for pointing out.
Actually I had the same question and this surely looks strange.
But simply calling mem_cgroup_uncharge() here doesn't work because it
assumes that page_refcount(p) == 0, which is not true in hwpoison context.
We need some other clearer way or at least some justifying comment about
why this is ok.

- Naoya
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-04-26  4:46           ` Naoya Horiguchi
@ 2017-04-26  8:59             ` Balbir Singh
  -1 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2017-04-26  8:59 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Laurent Dufour, linux-kernel, linux-mm, akpm

On Wed, 2017-04-26 at 04:46 +0000, Naoya Horiguchi wrote:
> On Wed, Apr 26, 2017 at 01:45:00PM +1000, Balbir Singh wrote:
> > > > >  static int delete_from_lru_cache(struct page *p)
> > > > >  {
> > > > > +	if (memcg_kmem_enabled())
> > > > > +		memcg_kmem_uncharge(p, 0);
> > > > > +
> > > > 
> > > > The changelog is not quite clear, so we are uncharging a page using
> > > > memcg_kmem_uncharge for a page in swap cache/page cache?
> > > 
> > > Hi Balbir,
> > > 
> > > Yes, in the normal page lifecycle, uncharge is done in page free time.
> > > But in memory error handling case, in-use pages (i.e. swap cache and page
> > > cache) are removed from normal path and they don't pass page freeing code.
> > > So I think that this change is to keep the consistent charging for such a case.
> > 
> > I agree we should uncharge, but looking at the API name, it seems to
> > be for kmem pages, why are we not using mem_cgroup_uncharge()? Am I missing
> > something?
> 
> Thank you for pointing out.
> Actually I had the same question and this surely looks strange.
> But simply calling mem_cgroup_uncharge() here doesn't work because it
> assumes that page_refcount(p) == 0, which is not true in hwpoison context.
> We need some other clearer way or at least some justifying comment about
> why this is ok.
>

We should call mem_cgroup_uncharge() after isolate_lru_page()/put_page().
We could check if page_count() is 0 or force if required (!MF_RECOVERED &&
!MF_DELAYED). We could even skip the VM_BUG_ON if the page is poisoned.

Balbir Singh. 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-04-26  8:59             ` Balbir Singh
  0 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2017-04-26  8:59 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Laurent Dufour, linux-kernel, linux-mm, akpm

On Wed, 2017-04-26 at 04:46 +0000, Naoya Horiguchi wrote:
> On Wed, Apr 26, 2017 at 01:45:00PM +1000, Balbir Singh wrote:
> > > > >  static int delete_from_lru_cache(struct page *p)
> > > > >  {
> > > > > +	if (memcg_kmem_enabled())
> > > > > +		memcg_kmem_uncharge(p, 0);
> > > > > +
> > > > 
> > > > The changelog is not quite clear, so we are uncharging a page using
> > > > memcg_kmem_uncharge for a page in swap cache/page cache?
> > > 
> > > Hi Balbir,
> > > 
> > > Yes, in the normal page lifecycle, uncharge is done in page free time.
> > > But in memory error handling case, in-use pages (i.e. swap cache and page
> > > cache) are removed from normal path and they don't pass page freeing code.
> > > So I think that this change is to keep the consistent charging for such a case.
> > 
> > I agree we should uncharge, but looking at the API name, it seems to
> > be for kmem pages, why are we not using mem_cgroup_uncharge()? Am I missing
> > something?
> 
> Thank you for pointing out.
> Actually I had the same question and this surely looks strange.
> But simply calling mem_cgroup_uncharge() here doesn't work because it
> assumes that page_refcount(p) == 0, which is not true in hwpoison context.
> We need some other clearer way or at least some justifying comment about
> why this is ok.
>

We should call mem_cgroup_uncharge() after isolate_lru_page()/put_page().
We could check if page_count() is 0 or force if required (!MF_RECOVERED &&
!MF_DELAYED). We could even skip the VM_BUG_ON if the page is poisoned.

Balbir Singh. 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-04-25 14:27   ` Laurent Dufour
@ 2017-04-27 14:37     ` Michal Hocko
  -1 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-04-27 14:37 UTC (permalink / raw)
  To: Laurent Dufour; +Cc: Naoya Horiguchi, linux-kernel, linux-mm, akpm

On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
> When page are poisoned, they should be uncharged from the root memory
> cgroup.
> 
> This is required to avoid a BUG raised when the page is onlined back:
> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
> index:0x1
> flags: 0x3ffff800200000(hwpoison)

My knowledge of memory poisoning is very rudimentary but aren't those
pages supposed to leak and never come back? In other words isn't the
hoplug code broken because it should leave them alone?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-04-27 14:37     ` Michal Hocko
  0 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-04-27 14:37 UTC (permalink / raw)
  To: Laurent Dufour; +Cc: Naoya Horiguchi, linux-kernel, linux-mm, akpm

On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
> When page are poisoned, they should be uncharged from the root memory
> cgroup.
> 
> This is required to avoid a BUG raised when the page is onlined back:
> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
> index:0x1
> flags: 0x3ffff800200000(hwpoison)

My knowledge of memory poisoning is very rudimentary but aren't those
pages supposed to leak and never come back? In other words isn't the
hoplug code broken because it should leave them alone?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-04-27 14:37     ` Michal Hocko
@ 2017-04-27 20:51       ` Andi Kleen
  -1 siblings, 0 replies; 68+ messages in thread
From: Andi Kleen @ 2017-04-27 20:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Laurent Dufour, Naoya Horiguchi, linux-kernel, linux-mm, akpm

Michal Hocko <mhocko@kernel.org> writes:

> On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
>> When page are poisoned, they should be uncharged from the root memory
>> cgroup.
>> 
>> This is required to avoid a BUG raised when the page is onlined back:
>> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
>> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
>> index:0x1
>> flags: 0x3ffff800200000(hwpoison)
>
> My knowledge of memory poisoning is very rudimentary but aren't those
> pages supposed to leak and never come back? In other words isn't the
> hoplug code broken because it should leave them alone?

Yes that would be the right interpretation. If it was really offlined
due to a hardware error the memory will be poisoned and any access
could cause a machine check.

hwpoison has an own "unpoison" option (only used for debugging), which
I think handles this.

-Andi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-04-27 20:51       ` Andi Kleen
  0 siblings, 0 replies; 68+ messages in thread
From: Andi Kleen @ 2017-04-27 20:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Laurent Dufour, Naoya Horiguchi, linux-kernel, linux-mm, akpm

Michal Hocko <mhocko@kernel.org> writes:

> On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
>> When page are poisoned, they should be uncharged from the root memory
>> cgroup.
>> 
>> This is required to avoid a BUG raised when the page is onlined back:
>> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
>> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
>> index:0x1
>> flags: 0x3ffff800200000(hwpoison)
>
> My knowledge of memory poisoning is very rudimentary but aren't those
> pages supposed to leak and never come back? In other words isn't the
> hoplug code broken because it should leave them alone?

Yes that would be the right interpretation. If it was really offlined
due to a hardware error the memory will be poisoned and any access
could cause a machine check.

hwpoison has an own "unpoison" option (only used for debugging), which
I think handles this.

-Andi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
  2017-04-26  3:13       ` Naoya Horiguchi
@ 2017-04-28  2:51         ` Balbir Singh
  -1 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2017-04-28  2:51 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Laurent Dufour, linux-kernel, linux-mm, akpm

On Wed, 2017-04-26 at 03:13 +0000, Naoya Horiguchi wrote:
> On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > this should be skipped when onlining the pages too.
> > > 
> > > Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > > ---
> > >  mm/memory_hotplug.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > --- a/mm/memory_hotplug.c
> > > +++ b/mm/memory_hotplug.c
> > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > >  	if (PageReserved(pfn_to_page(start_pfn)))
> > >  		for (i = 0; i < nr_pages; i++) {
> > >  			page = pfn_to_page(start_pfn + i);
> > > +			if (PageHWPoison(page)) {
> > > +				ClearPageReserved(page);
> > 
> > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > was never offlined to begin with? Or do you expect this to be set on newly
> > hotplugged memory? Also don't we need to skip the entire pageblock?
> 
> If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> that we skip the page status check for hwpoisoned pages *not* to prevent
> memory offlining for memblocks with hwpoisoned pages. That means that
> hwpoisoned pages can be offlined.
> 
> And another reason to clear PageReserved is that we could reuse the
> hwpoisoned page after onlining back with replacing the broken DIMM.
> In this usecase, we first do unpoisoning to clear PageHWPoison,
> but it doesn't work if PageReserved is set. My simple testing shows
> the BUG below in unpoisoning (without the ClearPageReserved):
>

Fair enough, thanks for the explanation

Balbir Singh. 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
@ 2017-04-28  2:51         ` Balbir Singh
  0 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2017-04-28  2:51 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Laurent Dufour, linux-kernel, linux-mm, akpm

On Wed, 2017-04-26 at 03:13 +0000, Naoya Horiguchi wrote:
> On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > this should be skipped when onlining the pages too.
> > > 
> > > Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > > ---
> > >  mm/memory_hotplug.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > --- a/mm/memory_hotplug.c
> > > +++ b/mm/memory_hotplug.c
> > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > >  	if (PageReserved(pfn_to_page(start_pfn)))
> > >  		for (i = 0; i < nr_pages; i++) {
> > >  			page = pfn_to_page(start_pfn + i);
> > > +			if (PageHWPoison(page)) {
> > > +				ClearPageReserved(page);
> > 
> > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > was never offlined to begin with? Or do you expect this to be set on newly
> > hotplugged memory? Also don't we need to skip the entire pageblock?
> 
> If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> that we skip the page status check for hwpoisoned pages *not* to prevent
> memory offlining for memblocks with hwpoisoned pages. That means that
> hwpoisoned pages can be offlined.
> 
> And another reason to clear PageReserved is that we could reuse the
> hwpoisoned page after onlining back with replacing the broken DIMM.
> In this usecase, we first do unpoisoning to clear PageHWPoison,
> but it doesn't work if PageReserved is set. My simple testing shows
> the BUG below in unpoisoning (without the ClearPageReserved):
>

Fair enough, thanks for the explanation

Balbir Singh. 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-04-27 20:51       ` Andi Kleen
@ 2017-04-28  6:07         ` Michal Hocko
  -1 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-04-28  6:07 UTC (permalink / raw)
  To: Laurent Dufour, Andi Kleen; +Cc: Naoya Horiguchi, linux-kernel, linux-mm, akpm

On Thu 27-04-17 13:51:23, Andi Kleen wrote:
> Michal Hocko <mhocko@kernel.org> writes:
> 
> > On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
> >> When page are poisoned, they should be uncharged from the root memory
> >> cgroup.
> >> 
> >> This is required to avoid a BUG raised when the page is onlined back:
> >> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
> >> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
> >> index:0x1
> >> flags: 0x3ffff800200000(hwpoison)
> >
> > My knowledge of memory poisoning is very rudimentary but aren't those
> > pages supposed to leak and never come back? In other words isn't the
> > hoplug code broken because it should leave them alone?
> 
> Yes that would be the right interpretation. If it was really offlined
> due to a hardware error the memory will be poisoned and any access
> could cause a machine check.

OK, thanks for the clarification. Then I am not sure the patch is
correct. Why do we need to uncharge that page at all? It is not freed.
The correct thing to do is to not online it in the first place which is
done in patch2 [1]. Even if we need to uncharge the page the reason is
not to silent the BUG, that is merely papering a issue than a real fix.
Laurent can you elaborate please.

[1] http://lkml.kernel.org/r/1493130472-22843-3-git-send-email-ldufour@linux.vnet.ibm.com
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-04-28  6:07         ` Michal Hocko
  0 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-04-28  6:07 UTC (permalink / raw)
  To: Laurent Dufour, Andi Kleen; +Cc: Naoya Horiguchi, linux-kernel, linux-mm, akpm

On Thu 27-04-17 13:51:23, Andi Kleen wrote:
> Michal Hocko <mhocko@kernel.org> writes:
> 
> > On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
> >> When page are poisoned, they should be uncharged from the root memory
> >> cgroup.
> >> 
> >> This is required to avoid a BUG raised when the page is onlined back:
> >> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
> >> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
> >> index:0x1
> >> flags: 0x3ffff800200000(hwpoison)
> >
> > My knowledge of memory poisoning is very rudimentary but aren't those
> > pages supposed to leak and never come back? In other words isn't the
> > hoplug code broken because it should leave them alone?
> 
> Yes that would be the right interpretation. If it was really offlined
> due to a hardware error the memory will be poisoned and any access
> could cause a machine check.

OK, thanks for the clarification. Then I am not sure the patch is
correct. Why do we need to uncharge that page at all? It is not freed.
The correct thing to do is to not online it in the first place which is
done in patch2 [1]. Even if we need to uncharge the page the reason is
not to silent the BUG, that is merely papering a issue than a real fix.
Laurent can you elaborate please.

[1] http://lkml.kernel.org/r/1493130472-22843-3-git-send-email-ldufour@linux.vnet.ibm.com
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
  2017-04-26  3:13       ` Naoya Horiguchi
@ 2017-04-28  6:30         ` Michal Hocko
  -1 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-04-28  6:30 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Balbir Singh, Laurent Dufour, linux-kernel, linux-mm, akpm, Wen Congyang

On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
> On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > this should be skipped when onlining the pages too.
> > >
> > > Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > > ---
> > >  mm/memory_hotplug.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > --- a/mm/memory_hotplug.c
> > > +++ b/mm/memory_hotplug.c
> > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > >  	if (PageReserved(pfn_to_page(start_pfn)))
> > >  		for (i = 0; i < nr_pages; i++) {
> > >  			page = pfn_to_page(start_pfn + i);
> > > +			if (PageHWPoison(page)) {
> > > +				ClearPageReserved(page);
> >
> > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > was never offlined to begin with? Or do you expect this to be set on newly
> > hotplugged memory? Also don't we need to skip the entire pageblock?
> 
> If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> that we skip the page status check for hwpoisoned pages *not* to prevent
> memory offlining for memblocks with hwpoisoned pages. That means that
> hwpoisoned pages can be offlined.

Is this patch actually correct? I am trying to wrap my head around it
but it smells like it tries to avoid the problem rather than fix it
properly. I might be wrong here of course but to me it sounds like
poisoned page should simply be offlined and keep its poison state all
the time. If the memory is hot-removed and added again we have lost the
struct page along with the state which is the expected behavior. If it
is still broken we will re-poison it.

Anyway a patch to skip over poisoned pages during online makes perfect
sense to me. The PageReserved fiddling around much less so.

Or am I missing something. Let's CC Wen Congyang for the clarification
here.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
@ 2017-04-28  6:30         ` Michal Hocko
  0 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-04-28  6:30 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Balbir Singh, Laurent Dufour, linux-kernel, linux-mm, akpm, Wen Congyang

On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
> On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > this should be skipped when onlining the pages too.
> > >
> > > Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > > ---
> > >  mm/memory_hotplug.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > --- a/mm/memory_hotplug.c
> > > +++ b/mm/memory_hotplug.c
> > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > >  	if (PageReserved(pfn_to_page(start_pfn)))
> > >  		for (i = 0; i < nr_pages; i++) {
> > >  			page = pfn_to_page(start_pfn + i);
> > > +			if (PageHWPoison(page)) {
> > > +				ClearPageReserved(page);
> >
> > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > was never offlined to begin with? Or do you expect this to be set on newly
> > hotplugged memory? Also don't we need to skip the entire pageblock?
> 
> If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> that we skip the page status check for hwpoisoned pages *not* to prevent
> memory offlining for memblocks with hwpoisoned pages. That means that
> hwpoisoned pages can be offlined.

Is this patch actually correct? I am trying to wrap my head around it
but it smells like it tries to avoid the problem rather than fix it
properly. I might be wrong here of course but to me it sounds like
poisoned page should simply be offlined and keep its poison state all
the time. If the memory is hot-removed and added again we have lost the
struct page along with the state which is the expected behavior. If it
is still broken we will re-poison it.

Anyway a patch to skip over poisoned pages during online makes perfect
sense to me. The PageReserved fiddling around much less so.

Or am I missing something. Let's CC Wen Congyang for the clarification
here.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
  2017-04-28  6:30         ` Michal Hocko
@ 2017-04-28  6:50           ` Michal Hocko
  -1 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-04-28  6:50 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Balbir Singh, Laurent Dufour, linux-kernel, linux-mm, akpm, Wen Congyang

[Drop Wen Congyang because his address bounces - we will have to find
out ourselves...]

On Fri 28-04-17 08:30:48, Michal Hocko wrote:
> On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
> > On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > > this should be skipped when onlining the pages too.
> > > >
> > > > Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > > > ---
> > > >  mm/memory_hotplug.c | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > > --- a/mm/memory_hotplug.c
> > > > +++ b/mm/memory_hotplug.c
> > > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > > >  	if (PageReserved(pfn_to_page(start_pfn)))
> > > >  		for (i = 0; i < nr_pages; i++) {
> > > >  			page = pfn_to_page(start_pfn + i);
> > > > +			if (PageHWPoison(page)) {
> > > > +				ClearPageReserved(page);
> > >
> > > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > > was never offlined to begin with? Or do you expect this to be set on newly
> > > hotplugged memory? Also don't we need to skip the entire pageblock?
> > 
> > If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> > that we skip the page status check for hwpoisoned pages *not* to prevent
> > memory offlining for memblocks with hwpoisoned pages. That means that
> > hwpoisoned pages can be offlined.
> 
> Is this patch actually correct? I am trying to wrap my head around it
> but it smells like it tries to avoid the problem rather than fix it
> properly. I might be wrong here of course but to me it sounds like
> poisoned page should simply be offlined and keep its poison state all
> the time. If the memory is hot-removed and added again we have lost the
> struct page along with the state which is the expected behavior. If it
> is still broken we will re-poison it.
> 
> Anyway a patch to skip over poisoned pages during online makes perfect
> sense to me. The PageReserved fiddling around much less so.
> 
> Or am I missing something. Let's CC Wen Congyang for the clarification
> here.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
@ 2017-04-28  6:50           ` Michal Hocko
  0 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-04-28  6:50 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Balbir Singh, Laurent Dufour, linux-kernel, linux-mm, akpm, Wen Congyang

[Drop Wen Congyang because his address bounces - we will have to find
out ourselves...]

On Fri 28-04-17 08:30:48, Michal Hocko wrote:
> On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
> > On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > > this should be skipped when onlining the pages too.
> > > >
> > > > Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > > > ---
> > > >  mm/memory_hotplug.c | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > > --- a/mm/memory_hotplug.c
> > > > +++ b/mm/memory_hotplug.c
> > > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > > >  	if (PageReserved(pfn_to_page(start_pfn)))
> > > >  		for (i = 0; i < nr_pages; i++) {
> > > >  			page = pfn_to_page(start_pfn + i);
> > > > +			if (PageHWPoison(page)) {
> > > > +				ClearPageReserved(page);
> > >
> > > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > > was never offlined to begin with? Or do you expect this to be set on newly
> > > hotplugged memory? Also don't we need to skip the entire pageblock?
> > 
> > If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> > that we skip the page status check for hwpoisoned pages *not* to prevent
> > memory offlining for memblocks with hwpoisoned pages. That means that
> > hwpoisoned pages can be offlined.
> 
> Is this patch actually correct? I am trying to wrap my head around it
> but it smells like it tries to avoid the problem rather than fix it
> properly. I might be wrong here of course but to me it sounds like
> poisoned page should simply be offlined and keep its poison state all
> the time. If the memory is hot-removed and added again we have lost the
> struct page along with the state which is the expected behavior. If it
> is still broken we will re-poison it.
> 
> Anyway a patch to skip over poisoned pages during online makes perfect
> sense to me. The PageReserved fiddling around much less so.
> 
> Or am I missing something. Let's CC Wen Congyang for the clarification
> here.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
  2017-04-28  6:50           ` Michal Hocko
@ 2017-04-28  6:51             ` Michal Hocko
  -1 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-04-28  6:51 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Balbir Singh, Laurent Dufour, linux-kernel, linux-mm, akpm

On Fri 28-04-17 08:50:50, Michal Hocko wrote:
> [Drop Wen Congyang because his address bounces - we will have to find
> out ourselves...]

Ble, for real this time. Sorry about spamming
 
> On Fri 28-04-17 08:30:48, Michal Hocko wrote:
> > On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
> > > On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > > > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > > > this should be skipped when onlining the pages too.
> > > > >
> > > > > Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > > > > ---
> > > > >  mm/memory_hotplug.c | 4 ++++
> > > > >  1 file changed, 4 insertions(+)
> > > > >
> > > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > > > --- a/mm/memory_hotplug.c
> > > > > +++ b/mm/memory_hotplug.c
> > > > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > > > >  	if (PageReserved(pfn_to_page(start_pfn)))
> > > > >  		for (i = 0; i < nr_pages; i++) {
> > > > >  			page = pfn_to_page(start_pfn + i);
> > > > > +			if (PageHWPoison(page)) {
> > > > > +				ClearPageReserved(page);
> > > >
> > > > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > > > was never offlined to begin with? Or do you expect this to be set on newly
> > > > hotplugged memory? Also don't we need to skip the entire pageblock?
> > > 
> > > If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> > > that we skip the page status check for hwpoisoned pages *not* to prevent
> > > memory offlining for memblocks with hwpoisoned pages. That means that
> > > hwpoisoned pages can be offlined.
> > 
> > Is this patch actually correct? I am trying to wrap my head around it
> > but it smells like it tries to avoid the problem rather than fix it
> > properly. I might be wrong here of course but to me it sounds like
> > poisoned page should simply be offlined and keep its poison state all
> > the time. If the memory is hot-removed and added again we have lost the
> > struct page along with the state which is the expected behavior. If it
> > is still broken we will re-poison it.
> > 
> > Anyway a patch to skip over poisoned pages during online makes perfect
> > sense to me. The PageReserved fiddling around much less so.
> > 
> > Or am I missing something. Let's CC Wen Congyang for the clarification
> > here.
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
@ 2017-04-28  6:51             ` Michal Hocko
  0 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-04-28  6:51 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Balbir Singh, Laurent Dufour, linux-kernel, linux-mm, akpm

On Fri 28-04-17 08:50:50, Michal Hocko wrote:
> [Drop Wen Congyang because his address bounces - we will have to find
> out ourselves...]

Ble, for real this time. Sorry about spamming
 
> On Fri 28-04-17 08:30:48, Michal Hocko wrote:
> > On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
> > > On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > > > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > > > this should be skipped when onlining the pages too.
> > > > >
> > > > > Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > > > > ---
> > > > >  mm/memory_hotplug.c | 4 ++++
> > > > >  1 file changed, 4 insertions(+)
> > > > >
> > > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > > > --- a/mm/memory_hotplug.c
> > > > > +++ b/mm/memory_hotplug.c
> > > > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > > > >  	if (PageReserved(pfn_to_page(start_pfn)))
> > > > >  		for (i = 0; i < nr_pages; i++) {
> > > > >  			page = pfn_to_page(start_pfn + i);
> > > > > +			if (PageHWPoison(page)) {
> > > > > +				ClearPageReserved(page);
> > > >
> > > > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > > > was never offlined to begin with? Or do you expect this to be set on newly
> > > > hotplugged memory? Also don't we need to skip the entire pageblock?
> > > 
> > > If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> > > that we skip the page status check for hwpoisoned pages *not* to prevent
> > > memory offlining for memblocks with hwpoisoned pages. That means that
> > > hwpoisoned pages can be offlined.
> > 
> > Is this patch actually correct? I am trying to wrap my head around it
> > but it smells like it tries to avoid the problem rather than fix it
> > properly. I might be wrong here of course but to me it sounds like
> > poisoned page should simply be offlined and keep its poison state all
> > the time. If the memory is hot-removed and added again we have lost the
> > struct page along with the state which is the expected behavior. If it
> > is still broken we will re-poison it.
> > 
> > Anyway a patch to skip over poisoned pages during online makes perfect
> > sense to me. The PageReserved fiddling around much less so.
> > 
> > Or am I missing something. Let's CC Wen Congyang for the clarification
> > here.
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-04-28  6:07         ` Michal Hocko
@ 2017-04-28  7:31           ` Michal Hocko
  -1 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-04-28  7:31 UTC (permalink / raw)
  To: Laurent Dufour, Andi Kleen
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, akpm, Johannes Weiner,
	Vladimir Davydov

[CC Johannes and Vladimir - the patch is
http://lkml.kernel.org/r/1493130472-22843-2-git-send-email-ldufour@linux.vnet.ibm.com]

On Fri 28-04-17 08:07:55, Michal Hocko wrote:
> On Thu 27-04-17 13:51:23, Andi Kleen wrote:
> > Michal Hocko <mhocko@kernel.org> writes:
> > 
> > > On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
> > >> When page are poisoned, they should be uncharged from the root memory
> > >> cgroup.
> > >> 
> > >> This is required to avoid a BUG raised when the page is onlined back:
> > >> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
> > >> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
> > >> index:0x1
> > >> flags: 0x3ffff800200000(hwpoison)
> > >
> > > My knowledge of memory poisoning is very rudimentary but aren't those
> > > pages supposed to leak and never come back? In other words isn't the
> > > hoplug code broken because it should leave them alone?
> > 
> > Yes that would be the right interpretation. If it was really offlined
> > due to a hardware error the memory will be poisoned and any access
> > could cause a machine check.
> 
> OK, thanks for the clarification. Then I am not sure the patch is
> correct. Why do we need to uncharge that page at all?

Now, I have realized that we actually want to uncharge that page because
it will pin the memcg and we do not want to have that memcg and its
whole hierarchy pinned as well. This used to work before the charge
rework 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") I guess
because we used to uncharge on page cache removal.

I do not think the patch is correct, though. memcg_kmem_enabled() will
check whether kmem accounting is enabled and we are talking about page
cache pages here. You should be using mem_cgroup_uncharge instead.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-04-28  7:31           ` Michal Hocko
  0 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-04-28  7:31 UTC (permalink / raw)
  To: Laurent Dufour, Andi Kleen
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, akpm, Johannes Weiner,
	Vladimir Davydov

[CC Johannes and Vladimir - the patch is
http://lkml.kernel.org/r/1493130472-22843-2-git-send-email-ldufour@linux.vnet.ibm.com]

On Fri 28-04-17 08:07:55, Michal Hocko wrote:
> On Thu 27-04-17 13:51:23, Andi Kleen wrote:
> > Michal Hocko <mhocko@kernel.org> writes:
> > 
> > > On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
> > >> When page are poisoned, they should be uncharged from the root memory
> > >> cgroup.
> > >> 
> > >> This is required to avoid a BUG raised when the page is onlined back:
> > >> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
> > >> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
> > >> index:0x1
> > >> flags: 0x3ffff800200000(hwpoison)
> > >
> > > My knowledge of memory poisoning is very rudimentary but aren't those
> > > pages supposed to leak and never come back? In other words isn't the
> > > hoplug code broken because it should leave them alone?
> > 
> > Yes that would be the right interpretation. If it was really offlined
> > due to a hardware error the memory will be poisoned and any access
> > could cause a machine check.
> 
> OK, thanks for the clarification. Then I am not sure the patch is
> correct. Why do we need to uncharge that page at all?

Now, I have realized that we actually want to uncharge that page because
it will pin the memcg and we do not want to have that memcg and its
whole hierarchy pinned as well. This used to work before the charge
rework 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") I guess
because we used to uncharge on page cache removal.

I do not think the patch is correct, though. memcg_kmem_enabled() will
check whether kmem accounting is enabled and we are talking about page
cache pages here. You should be using mem_cgroup_uncharge instead.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-04-28  7:31           ` Michal Hocko
@ 2017-04-28  9:17             ` Laurent Dufour
  -1 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2017-04-28  9:17 UTC (permalink / raw)
  To: Michal Hocko, Andi Kleen
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, akpm, Johannes Weiner,
	Vladimir Davydov

On 28/04/2017 09:31, Michal Hocko wrote:
> [CC Johannes and Vladimir - the patch is
> http://lkml.kernel.org/r/1493130472-22843-2-git-send-email-ldufour@linux.vnet.ibm.com]
> 
> On Fri 28-04-17 08:07:55, Michal Hocko wrote:
>> On Thu 27-04-17 13:51:23, Andi Kleen wrote:
>>> Michal Hocko <mhocko@kernel.org> writes:
>>>
>>>> On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
>>>>> When page are poisoned, they should be uncharged from the root memory
>>>>> cgroup.
>>>>>
>>>>> This is required to avoid a BUG raised when the page is onlined back:
>>>>> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
>>>>> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
>>>>> index:0x1
>>>>> flags: 0x3ffff800200000(hwpoison)
>>>>
>>>> My knowledge of memory poisoning is very rudimentary but aren't those
>>>> pages supposed to leak and never come back? In other words isn't the
>>>> hoplug code broken because it should leave them alone?
>>>
>>> Yes that would be the right interpretation. If it was really offlined
>>> due to a hardware error the memory will be poisoned and any access
>>> could cause a machine check.
>>
>> OK, thanks for the clarification. Then I am not sure the patch is
>> correct. Why do we need to uncharge that page at all?
> 
> Now, I have realized that we actually want to uncharge that page because
> it will pin the memcg and we do not want to have that memcg and its
> whole hierarchy pinned as well. This used to work before the charge
> rework 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") I guess
> because we used to uncharge on page cache removal.
> 
> I do not think the patch is correct, though. memcg_kmem_enabled() will
> check whether kmem accounting is enabled and we are talking about page
> cache pages here. You should be using mem_cgroup_uncharge instead.

Thanks for the review Michal.

I was not comfortable either with this patch.

I did some tests calling mem_cgroup_uncharge() when isolate_lru_page()
succeeds only, so not calling it if isolate_lru_page() failed.

This seems to work as well, so if everyone agree on that, I'll send a
new version soon.

Cheers,
Laurent.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-04-28  9:17             ` Laurent Dufour
  0 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2017-04-28  9:17 UTC (permalink / raw)
  To: Michal Hocko, Andi Kleen
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, akpm, Johannes Weiner,
	Vladimir Davydov

On 28/04/2017 09:31, Michal Hocko wrote:
> [CC Johannes and Vladimir - the patch is
> http://lkml.kernel.org/r/1493130472-22843-2-git-send-email-ldufour@linux.vnet.ibm.com]
> 
> On Fri 28-04-17 08:07:55, Michal Hocko wrote:
>> On Thu 27-04-17 13:51:23, Andi Kleen wrote:
>>> Michal Hocko <mhocko@kernel.org> writes:
>>>
>>>> On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
>>>>> When page are poisoned, they should be uncharged from the root memory
>>>>> cgroup.
>>>>>
>>>>> This is required to avoid a BUG raised when the page is onlined back:
>>>>> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
>>>>> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
>>>>> index:0x1
>>>>> flags: 0x3ffff800200000(hwpoison)
>>>>
>>>> My knowledge of memory poisoning is very rudimentary but aren't those
>>>> pages supposed to leak and never come back? In other words isn't the
>>>> hoplug code broken because it should leave them alone?
>>>
>>> Yes that would be the right interpretation. If it was really offlined
>>> due to a hardware error the memory will be poisoned and any access
>>> could cause a machine check.
>>
>> OK, thanks for the clarification. Then I am not sure the patch is
>> correct. Why do we need to uncharge that page at all?
> 
> Now, I have realized that we actually want to uncharge that page because
> it will pin the memcg and we do not want to have that memcg and its
> whole hierarchy pinned as well. This used to work before the charge
> rework 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") I guess
> because we used to uncharge on page cache removal.
> 
> I do not think the patch is correct, though. memcg_kmem_enabled() will
> check whether kmem accounting is enabled and we are talking about page
> cache pages here. You should be using mem_cgroup_uncharge instead.

Thanks for the review Michal.

I was not comfortable either with this patch.

I did some tests calling mem_cgroup_uncharge() when isolate_lru_page()
succeeds only, so not calling it if isolate_lru_page() failed.

This seems to work as well, so if everyone agree on that, I'll send a
new version soon.

Cheers,
Laurent.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-04-26  8:59             ` Balbir Singh
@ 2017-04-28  9:32               ` Laurent Dufour
  -1 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2017-04-28  9:32 UTC (permalink / raw)
  To: Balbir Singh, Naoya Horiguchi; +Cc: linux-kernel, linux-mm, akpm

On 26/04/2017 10:59, Balbir Singh wrote:
> On Wed, 2017-04-26 at 04:46 +0000, Naoya Horiguchi wrote:
>> On Wed, Apr 26, 2017 at 01:45:00PM +1000, Balbir Singh wrote:
>>>>>>  static int delete_from_lru_cache(struct page *p)
>>>>>>  {
>>>>>> +	if (memcg_kmem_enabled())
>>>>>> +		memcg_kmem_uncharge(p, 0);
>>>>>> +
>>>>>
>>>>> The changelog is not quite clear, so we are uncharging a page using
>>>>> memcg_kmem_uncharge for a page in swap cache/page cache?
>>>>
>>>> Hi Balbir,
>>>>
>>>> Yes, in the normal page lifecycle, uncharge is done in page free time.
>>>> But in memory error handling case, in-use pages (i.e. swap cache and page
>>>> cache) are removed from normal path and they don't pass page freeing code.
>>>> So I think that this change is to keep the consistent charging for such a case.
>>>
>>> I agree we should uncharge, but looking at the API name, it seems to
>>> be for kmem pages, why are we not using mem_cgroup_uncharge()? Am I missing
>>> something?
>>
>> Thank you for pointing out.
>> Actually I had the same question and this surely looks strange.
>> But simply calling mem_cgroup_uncharge() here doesn't work because it
>> assumes that page_refcount(p) == 0, which is not true in hwpoison context.
>> We need some other clearer way or at least some justifying comment about
>> why this is ok.
>>
> 
> We should call mem_cgroup_uncharge() after isolate_lru_page()/put_page().

Thanks for the review Naoya and Balbir,

I changed the patch to call mem_cgroup_uncharge() once
isolate_lru_page() succeeded, but before calling put_page().
It seems to work fine.

> We could check if page_count() is 0 or force if required (!MF_RECOVERED &&
> !MF_DELAYED). We could even skip the VM_BUG_ON if the page is poisoned.

This doesn't seem to be needed. Am I still missing something here ?

Cheers,
Laurent.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-04-28  9:32               ` Laurent Dufour
  0 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2017-04-28  9:32 UTC (permalink / raw)
  To: Balbir Singh, Naoya Horiguchi; +Cc: linux-kernel, linux-mm, akpm

On 26/04/2017 10:59, Balbir Singh wrote:
> On Wed, 2017-04-26 at 04:46 +0000, Naoya Horiguchi wrote:
>> On Wed, Apr 26, 2017 at 01:45:00PM +1000, Balbir Singh wrote:
>>>>>>  static int delete_from_lru_cache(struct page *p)
>>>>>>  {
>>>>>> +	if (memcg_kmem_enabled())
>>>>>> +		memcg_kmem_uncharge(p, 0);
>>>>>> +
>>>>>
>>>>> The changelog is not quite clear, so we are uncharging a page using
>>>>> memcg_kmem_uncharge for a page in swap cache/page cache?
>>>>
>>>> Hi Balbir,
>>>>
>>>> Yes, in the normal page lifecycle, uncharge is done in page free time.
>>>> But in memory error handling case, in-use pages (i.e. swap cache and page
>>>> cache) are removed from normal path and they don't pass page freeing code.
>>>> So I think that this change is to keep the consistent charging for such a case.
>>>
>>> I agree we should uncharge, but looking at the API name, it seems to
>>> be for kmem pages, why are we not using mem_cgroup_uncharge()? Am I missing
>>> something?
>>
>> Thank you for pointing out.
>> Actually I had the same question and this surely looks strange.
>> But simply calling mem_cgroup_uncharge() here doesn't work because it
>> assumes that page_refcount(p) == 0, which is not true in hwpoison context.
>> We need some other clearer way or at least some justifying comment about
>> why this is ok.
>>
> 
> We should call mem_cgroup_uncharge() after isolate_lru_page()/put_page().

Thanks for the review Naoya and Balbir,

I changed the patch to call mem_cgroup_uncharge() once
isolate_lru_page() succeeded, but before calling put_page().
It seems to work fine.

> We could check if page_count() is 0 or force if required (!MF_RECOVERED &&
> !MF_DELAYED). We could even skip the VM_BUG_ON if the page is poisoned.

This doesn't seem to be needed. Am I still missing something here ?

Cheers,
Laurent.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-04-28  9:17             ` Laurent Dufour
@ 2017-04-28 13:48               ` Michal Hocko
  -1 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-04-28 13:48 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Andi Kleen, Naoya Horiguchi, linux-kernel, linux-mm, akpm,
	Johannes Weiner, Vladimir Davydov

On Fri 28-04-17 11:17:34, Laurent Dufour wrote:
> On 28/04/2017 09:31, Michal Hocko wrote:
> > [CC Johannes and Vladimir - the patch is
> > http://lkml.kernel.org/r/1493130472-22843-2-git-send-email-ldufour@linux.vnet.ibm.com]
> > 
> > On Fri 28-04-17 08:07:55, Michal Hocko wrote:
> >> On Thu 27-04-17 13:51:23, Andi Kleen wrote:
> >>> Michal Hocko <mhocko@kernel.org> writes:
> >>>
> >>>> On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
> >>>>> When page are poisoned, they should be uncharged from the root memory
> >>>>> cgroup.
> >>>>>
> >>>>> This is required to avoid a BUG raised when the page is onlined back:
> >>>>> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
> >>>>> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
> >>>>> index:0x1
> >>>>> flags: 0x3ffff800200000(hwpoison)
> >>>>
> >>>> My knowledge of memory poisoning is very rudimentary but aren't those
> >>>> pages supposed to leak and never come back? In other words isn't the
> >>>> hoplug code broken because it should leave them alone?
> >>>
> >>> Yes that would be the right interpretation. If it was really offlined
> >>> due to a hardware error the memory will be poisoned and any access
> >>> could cause a machine check.
> >>
> >> OK, thanks for the clarification. Then I am not sure the patch is
> >> correct. Why do we need to uncharge that page at all?
> > 
> > Now, I have realized that we actually want to uncharge that page because
> > it will pin the memcg and we do not want to have that memcg and its
> > whole hierarchy pinned as well. This used to work before the charge
> > rework 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") I guess
> > because we used to uncharge on page cache removal.
> > 
> > I do not think the patch is correct, though. memcg_kmem_enabled() will
> > check whether kmem accounting is enabled and we are talking about page
> > cache pages here. You should be using mem_cgroup_uncharge instead.
> 
> Thanks for the review Michal.
> 
> I was not comfortable either with this patch.
> 
> I did some tests calling mem_cgroup_uncharge() when isolate_lru_page()
> succeeds only, so not calling it if isolate_lru_page() failed.

Wait a moment. This cannot possibly work. isolate_lru_page asumes page
count > 0 and increments the counter so the resulting page count is > 1
I have only now realized that we have VM_BUG_ON_PAGE(page_count(page), page)
in uncharge_list().

This is getting quite hairy. What is the expected page count of the
hwpoison page? I guess we would need to update the VM_BUG_ON in the
memcg uncharge code to ignore the page count of hwpoison pages if it can
be arbitrary.

Before we go any further, is there any documentation about the expected
behavior and the state of the hwpoison pages? I have a very bad feeling
that the current behavior is quite arbitrary and "testing driven"
holes plugging will make it only more messy. So let's start with the
clear description of what should happen with the hwpoison pages.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-04-28 13:48               ` Michal Hocko
  0 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-04-28 13:48 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Andi Kleen, Naoya Horiguchi, linux-kernel, linux-mm, akpm,
	Johannes Weiner, Vladimir Davydov

On Fri 28-04-17 11:17:34, Laurent Dufour wrote:
> On 28/04/2017 09:31, Michal Hocko wrote:
> > [CC Johannes and Vladimir - the patch is
> > http://lkml.kernel.org/r/1493130472-22843-2-git-send-email-ldufour@linux.vnet.ibm.com]
> > 
> > On Fri 28-04-17 08:07:55, Michal Hocko wrote:
> >> On Thu 27-04-17 13:51:23, Andi Kleen wrote:
> >>> Michal Hocko <mhocko@kernel.org> writes:
> >>>
> >>>> On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
> >>>>> When page are poisoned, they should be uncharged from the root memory
> >>>>> cgroup.
> >>>>>
> >>>>> This is required to avoid a BUG raised when the page is onlined back:
> >>>>> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
> >>>>> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
> >>>>> index:0x1
> >>>>> flags: 0x3ffff800200000(hwpoison)
> >>>>
> >>>> My knowledge of memory poisoning is very rudimentary but aren't those
> >>>> pages supposed to leak and never come back? In other words isn't the
> >>>> hoplug code broken because it should leave them alone?
> >>>
> >>> Yes that would be the right interpretation. If it was really offlined
> >>> due to a hardware error the memory will be poisoned and any access
> >>> could cause a machine check.
> >>
> >> OK, thanks for the clarification. Then I am not sure the patch is
> >> correct. Why do we need to uncharge that page at all?
> > 
> > Now, I have realized that we actually want to uncharge that page because
> > it will pin the memcg and we do not want to have that memcg and its
> > whole hierarchy pinned as well. This used to work before the charge
> > rework 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") I guess
> > because we used to uncharge on page cache removal.
> > 
> > I do not think the patch is correct, though. memcg_kmem_enabled() will
> > check whether kmem accounting is enabled and we are talking about page
> > cache pages here. You should be using mem_cgroup_uncharge instead.
> 
> Thanks for the review Michal.
> 
> I was not comfortable either with this patch.
> 
> I did some tests calling mem_cgroup_uncharge() when isolate_lru_page()
> succeeds only, so not calling it if isolate_lru_page() failed.

Wait a moment. This cannot possibly work. isolate_lru_page asumes page
count > 0 and increments the counter so the resulting page count is > 1
I have only now realized that we have VM_BUG_ON_PAGE(page_count(page), page)
in uncharge_list().

This is getting quite hairy. What is the expected page count of the
hwpoison page? I guess we would need to update the VM_BUG_ON in the
memcg uncharge code to ignore the page count of hwpoison pages if it can
be arbitrary.

Before we go any further, is there any documentation about the expected
behavior and the state of the hwpoison pages? I have a very bad feeling
that the current behavior is quite arbitrary and "testing driven"
holes plugging will make it only more messy. So let's start with the
clear description of what should happen with the hwpoison pages.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-04-28 13:48               ` Michal Hocko
@ 2017-05-02 14:59                 ` Laurent Dufour
  -1 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2017-05-02 14:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andi Kleen, Naoya Horiguchi, linux-kernel, linux-mm, akpm,
	Johannes Weiner, Vladimir Davydov

On 28/04/2017 15:48, Michal Hocko wrote:
> On Fri 28-04-17 11:17:34, Laurent Dufour wrote:
>> On 28/04/2017 09:31, Michal Hocko wrote:
>>> [CC Johannes and Vladimir - the patch is
>>> http://lkml.kernel.org/r/1493130472-22843-2-git-send-email-ldufour@linux.vnet.ibm.com]
>>>
>>> On Fri 28-04-17 08:07:55, Michal Hocko wrote:
>>>> On Thu 27-04-17 13:51:23, Andi Kleen wrote:
>>>>> Michal Hocko <mhocko@kernel.org> writes:
>>>>>
>>>>>> On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
>>>>>>> When page are poisoned, they should be uncharged from the root memory
>>>>>>> cgroup.
>>>>>>>
>>>>>>> This is required to avoid a BUG raised when the page is onlined back:
>>>>>>> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
>>>>>>> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
>>>>>>> index:0x1
>>>>>>> flags: 0x3ffff800200000(hwpoison)
>>>>>>
>>>>>> My knowledge of memory poisoning is very rudimentary but aren't those
>>>>>> pages supposed to leak and never come back? In other words isn't the
>>>>>> hoplug code broken because it should leave them alone?
>>>>>
>>>>> Yes that would be the right interpretation. If it was really offlined
>>>>> due to a hardware error the memory will be poisoned and any access
>>>>> could cause a machine check.
>>>>
>>>> OK, thanks for the clarification. Then I am not sure the patch is
>>>> correct. Why do we need to uncharge that page at all?
>>>
>>> Now, I have realized that we actually want to uncharge that page because
>>> it will pin the memcg and we do not want to have that memcg and its
>>> whole hierarchy pinned as well. This used to work before the charge
>>> rework 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") I guess
>>> because we used to uncharge on page cache removal.
>>>
>>> I do not think the patch is correct, though. memcg_kmem_enabled() will
>>> check whether kmem accounting is enabled and we are talking about page
>>> cache pages here. You should be using mem_cgroup_uncharge instead.
>>
>> Thanks for the review Michal.
>>
>> I was not comfortable either with this patch.
>>
>> I did some tests calling mem_cgroup_uncharge() when isolate_lru_page()
>> succeeds only, so not calling it if isolate_lru_page() failed.
> 
> Wait a moment. This cannot possibly work. isolate_lru_page asumes page
> count > 0 and increments the counter so the resulting page count is > 1
> I have only now realized that we have VM_BUG_ON_PAGE(page_count(page), page)
> in uncharge_list().

My mistake, my kernel was not build with CONFIG_DEBUG_VM set.
You're right this cannot work this way.

> This is getting quite hairy. What is the expected page count of the
> hwpoison page? I guess we would need to update the VM_BUG_ON in the
> memcg uncharge code to ignore the page count of hwpoison pages if it can
> be arbitrary.

Based on the experiment I did, page count == 2 when isolate_lru_page()
succeeds, even in the case of a poisoned page. In my case I think this
is because the page is still used by the process which is calling madvise().

I'm wondering if I'm looking at the right place. May be the poisoned
page should remain attach to the memory_cgroup until no one is using it.
In that case this means that something should be done when the page is
off-lined... I've to dig further here.

> 
> Before we go any further, is there any documentation about the expected
> behavior and the state of the hwpoison pages? I have a very bad feeling
> that the current behavior is quite arbitrary and "testing driven"
> holes plugging will make it only more messy. So let's start with the
> clear description of what should happen with the hwpoison pages.

I didn't find any documentation about that. The root cause is that a bug
message is displayed when a poisoned page is off-lined, may be this is
in that path that something is missing.

Cheers,
Laurent.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-05-02 14:59                 ` Laurent Dufour
  0 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2017-05-02 14:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andi Kleen, Naoya Horiguchi, linux-kernel, linux-mm, akpm,
	Johannes Weiner, Vladimir Davydov

On 28/04/2017 15:48, Michal Hocko wrote:
> On Fri 28-04-17 11:17:34, Laurent Dufour wrote:
>> On 28/04/2017 09:31, Michal Hocko wrote:
>>> [CC Johannes and Vladimir - the patch is
>>> http://lkml.kernel.org/r/1493130472-22843-2-git-send-email-ldufour@linux.vnet.ibm.com]
>>>
>>> On Fri 28-04-17 08:07:55, Michal Hocko wrote:
>>>> On Thu 27-04-17 13:51:23, Andi Kleen wrote:
>>>>> Michal Hocko <mhocko@kernel.org> writes:
>>>>>
>>>>>> On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
>>>>>>> When page are poisoned, they should be uncharged from the root memory
>>>>>>> cgroup.
>>>>>>>
>>>>>>> This is required to avoid a BUG raised when the page is onlined back:
>>>>>>> BUG: Bad page state in process mem-on-off-test  pfn:7ae3b
>>>>>>> page:f000000001eb8ec0 count:0 mapcount:0 mapping:          (null)
>>>>>>> index:0x1
>>>>>>> flags: 0x3ffff800200000(hwpoison)
>>>>>>
>>>>>> My knowledge of memory poisoning is very rudimentary but aren't those
>>>>>> pages supposed to leak and never come back? In other words isn't the
>>>>>> hoplug code broken because it should leave them alone?
>>>>>
>>>>> Yes that would be the right interpretation. If it was really offlined
>>>>> due to a hardware error the memory will be poisoned and any access
>>>>> could cause a machine check.
>>>>
>>>> OK, thanks for the clarification. Then I am not sure the patch is
>>>> correct. Why do we need to uncharge that page at all?
>>>
>>> Now, I have realized that we actually want to uncharge that page because
>>> it will pin the memcg and we do not want to have that memcg and its
>>> whole hierarchy pinned as well. This used to work before the charge
>>> rework 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") I guess
>>> because we used to uncharge on page cache removal.
>>>
>>> I do not think the patch is correct, though. memcg_kmem_enabled() will
>>> check whether kmem accounting is enabled and we are talking about page
>>> cache pages here. You should be using mem_cgroup_uncharge instead.
>>
>> Thanks for the review Michal.
>>
>> I was not comfortable either with this patch.
>>
>> I did some tests calling mem_cgroup_uncharge() when isolate_lru_page()
>> succeeds only, so not calling it if isolate_lru_page() failed.
> 
> Wait a moment. This cannot possibly work. isolate_lru_page asumes page
> count > 0 and increments the counter so the resulting page count is > 1
> I have only now realized that we have VM_BUG_ON_PAGE(page_count(page), page)
> in uncharge_list().

My mistake, my kernel was not build with CONFIG_DEBUG_VM set.
You're right this cannot work this way.

> This is getting quite hairy. What is the expected page count of the
> hwpoison page? I guess we would need to update the VM_BUG_ON in the
> memcg uncharge code to ignore the page count of hwpoison pages if it can
> be arbitrary.

Based on the experiment I did, page count == 2 when isolate_lru_page()
succeeds, even in the case of a poisoned page. In my case I think this
is because the page is still used by the process which is calling madvise().

I'm wondering if I'm looking at the right place. May be the poisoned
page should remain attach to the memory_cgroup until no one is using it.
In that case this means that something should be done when the page is
off-lined... I've to dig further here.

> 
> Before we go any further, is there any documentation about the expected
> behavior and the state of the hwpoison pages? I have a very bad feeling
> that the current behavior is quite arbitrary and "testing driven"
> holes plugging will make it only more messy. So let's start with the
> clear description of what should happen with the hwpoison pages.

I didn't find any documentation about that. The root cause is that a bug
message is displayed when a poisoned page is off-lined, may be this is
in that path that something is missing.

Cheers,
Laurent.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-05-02 14:59                 ` Laurent Dufour
@ 2017-05-02 18:55                   ` Michal Hocko
  -1 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-05-02 18:55 UTC (permalink / raw)
  To: Andi Kleen, Johannes Weiner
  Cc: Laurent Dufour, Naoya Horiguchi, linux-kernel, linux-mm, akpm,
	Vladimir Davydov

On Tue 02-05-17 16:59:30, Laurent Dufour wrote:
> On 28/04/2017 15:48, Michal Hocko wrote:
[...]
> > This is getting quite hairy. What is the expected page count of the
> > hwpoison page?

OK, so from the quick check of the hwpoison code it seems that the ref
count will be > 1 (from get_hwpoison_page).

> > I guess we would need to update the VM_BUG_ON in the
> > memcg uncharge code to ignore the page count of hwpoison pages if it can
> > be arbitrary.
> 
> Based on the experiment I did, page count == 2 when isolate_lru_page()
> succeeds, even in the case of a poisoned page.

that would make some sense to me. The page should have been already
unmapped therefore but memory_failure increases the ref count and 1 is
for isolate_lru_page().

> In my case I think this
> is because the page is still used by the process which is calling madvise().
> 
> I'm wondering if I'm looking at the right place. May be the poisoned
> page should remain attach to the memory_cgroup until no one is using it.
> In that case this means that something should be done when the page is
> off-lined... I've to dig further here.

No, AFAIU the page will not drop the reference count down to 0 in most
cases. Maybe there are some scenarios where this can happen but I would
expect that the poisoned page will be mapped and in use most of the time
and won't drop down 0. And then we should really uncharge it because it
will pin the memcg and make it unfreeable which doesn't seem to be what
we want.  So does the following work reasonable? Andi, Johannes, what do
you think? I cannot say I would be really comfortable touching hwpoison
code as I really do not understand the workflow. Maybe we want to move
this uncharge down to memory_failure() right before we report success?
---
>From 8bf0791bcf35996a859b6d33fb5494e5b53de49d Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Tue, 2 May 2017 20:32:24 +0200
Subject: [PATCH] hwpoison, memcg: forcibly uncharge LRU pages

Laurent Dufour has noticed that hwpoinsoned pages are kept charged. In
his particular case he has hit a bad_page("page still charged to cgroup")
when onlining a hwpoison page. While this looks like something that shouldn't
happen in the first place because onlining hwpages and returning them to
the page allocator makes only little sense it shows a real problem.

hwpoison pages do not get freed usually so we do not uncharge them (at
least not since 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")).
Each charge pins memcg (since e8ea14cc6ead ("mm: memcontrol: take a css
reference for each charged page")) as well and so the mem_cgroup and the
associated state will never go away. Fix this leak by forcibly
uncharging a LRU hwpoisoned page in delete_from_lru_cache(). We also
have to tweak uncharge_list because it cannot rely on zero ref count
for these pages.

Fixes: 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")
Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memcontrol.c     | 2 +-
 mm/memory-failure.c | 7 +++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 16c556ac103d..4cf26059adb1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5527,7 +5527,7 @@ static void uncharge_list(struct list_head *page_list)
 		next = page->lru.next;
 
 		VM_BUG_ON_PAGE(PageLRU(page), page);
-		VM_BUG_ON_PAGE(page_count(page), page);
+		VM_BUG_ON_PAGE(!PageHWPoison(page) && page_count(page), page);
 
 		if (!page->mem_cgroup)
 			continue;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 8a6bd3a9eb1e..4497d9619bb4 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -541,6 +541,13 @@ static int delete_from_lru_cache(struct page *p)
 		 */
 		ClearPageActive(p);
 		ClearPageUnevictable(p);
+
+		/*
+		 * Poisoned page might never drop its ref count to 0 so we have to
+		 * uncharge it manually from its memcg.
+		 */
+		mem_cgroup_uncharge(p);
+
 		/*
 		 * drop the page count elevated by isolate_lru_page()
 		 */
-- 
2.11.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-05-02 18:55                   ` Michal Hocko
  0 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-05-02 18:55 UTC (permalink / raw)
  To: Andi Kleen, Johannes Weiner
  Cc: Laurent Dufour, Naoya Horiguchi, linux-kernel, linux-mm, akpm,
	Vladimir Davydov

On Tue 02-05-17 16:59:30, Laurent Dufour wrote:
> On 28/04/2017 15:48, Michal Hocko wrote:
[...]
> > This is getting quite hairy. What is the expected page count of the
> > hwpoison page?

OK, so from the quick check of the hwpoison code it seems that the ref
count will be > 1 (from get_hwpoison_page).

> > I guess we would need to update the VM_BUG_ON in the
> > memcg uncharge code to ignore the page count of hwpoison pages if it can
> > be arbitrary.
> 
> Based on the experiment I did, page count == 2 when isolate_lru_page()
> succeeds, even in the case of a poisoned page.

that would make some sense to me. The page should have been already
unmapped therefore but memory_failure increases the ref count and 1 is
for isolate_lru_page().

> In my case I think this
> is because the page is still used by the process which is calling madvise().
> 
> I'm wondering if I'm looking at the right place. May be the poisoned
> page should remain attach to the memory_cgroup until no one is using it.
> In that case this means that something should be done when the page is
> off-lined... I've to dig further here.

No, AFAIU the page will not drop the reference count down to 0 in most
cases. Maybe there are some scenarios where this can happen but I would
expect that the poisoned page will be mapped and in use most of the time
and won't drop down 0. And then we should really uncharge it because it
will pin the memcg and make it unfreeable which doesn't seem to be what
we want.  So does the following work reasonable? Andi, Johannes, what do
you think? I cannot say I would be really comfortable touching hwpoison
code as I really do not understand the workflow. Maybe we want to move
this uncharge down to memory_failure() right before we report success?
---

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-05-02 18:55                   ` Michal Hocko
@ 2017-05-03 11:34                     ` Laurent Dufour
  -1 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2017-05-03 11:34 UTC (permalink / raw)
  To: Michal Hocko, Andi Kleen, Johannes Weiner
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, akpm, Vladimir Davydov

On 02/05/2017 20:55, Michal Hocko wrote:
> On Tue 02-05-17 16:59:30, Laurent Dufour wrote:
>> On 28/04/2017 15:48, Michal Hocko wrote:
> [...]
>>> This is getting quite hairy. What is the expected page count of the
>>> hwpoison page?
> 
> OK, so from the quick check of the hwpoison code it seems that the ref
> count will be > 1 (from get_hwpoison_page).
> 
>>> I guess we would need to update the VM_BUG_ON in the
>>> memcg uncharge code to ignore the page count of hwpoison pages if it can
>>> be arbitrary.
>>
>> Based on the experiment I did, page count == 2 when isolate_lru_page()
>> succeeds, even in the case of a poisoned page.
> 
> that would make some sense to me. The page should have been already
> unmapped therefore but memory_failure increases the ref count and 1 is
> for isolate_lru_page().
> 
>> In my case I think this
>> is because the page is still used by the process which is calling madvise().
>>
>> I'm wondering if I'm looking at the right place. May be the poisoned
>> page should remain attach to the memory_cgroup until no one is using it.
>> In that case this means that something should be done when the page is
>> off-lined... I've to dig further here.
> 
> No, AFAIU the page will not drop the reference count down to 0 in most
> cases. Maybe there are some scenarios where this can happen but I would
> expect that the poisoned page will be mapped and in use most of the time
> and won't drop down 0. And then we should really uncharge it because it
> will pin the memcg and make it unfreeable which doesn't seem to be what
> we want.  So does the following work reasonable? Andi, Johannes, what do
> you think? I cannot say I would be really comfortable touching hwpoison
> code as I really do not understand the workflow. Maybe we want to move
> this uncharge down to memory_failure() right before we report success?
> ---
> From 8bf0791bcf35996a859b6d33fb5494e5b53de49d Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Tue, 2 May 2017 20:32:24 +0200
> Subject: [PATCH] hwpoison, memcg: forcibly uncharge LRU pages
> 
> Laurent Dufour has noticed that hwpoinsoned pages are kept charged. In
> his particular case he has hit a bad_page("page still charged to cgroup")
> when onlining a hwpoison page. While this looks like something that shouldn't
> happen in the first place because onlining hwpages and returning them to
> the page allocator makes only little sense it shows a real problem.
> 
> hwpoison pages do not get freed usually so we do not uncharge them (at
> least not since 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")).
> Each charge pins memcg (since e8ea14cc6ead ("mm: memcontrol: take a css
> reference for each charged page")) as well and so the mem_cgroup and the
> associated state will never go away. Fix this leak by forcibly
> uncharging a LRU hwpoisoned page in delete_from_lru_cache(). We also
> have to tweak uncharge_list because it cannot rely on zero ref count
> for these pages.
> 
> Fixes: 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")
> Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>

FWIW:
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>

> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/memcontrol.c     | 2 +-
>  mm/memory-failure.c | 7 +++++++
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 16c556ac103d..4cf26059adb1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5527,7 +5527,7 @@ static void uncharge_list(struct list_head *page_list)
>  		next = page->lru.next;
> 
>  		VM_BUG_ON_PAGE(PageLRU(page), page);
> -		VM_BUG_ON_PAGE(page_count(page), page);
> +		VM_BUG_ON_PAGE(!PageHWPoison(page) && page_count(page), page);
> 
>  		if (!page->mem_cgroup)
>  			continue;
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 8a6bd3a9eb1e..4497d9619bb4 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -541,6 +541,13 @@ static int delete_from_lru_cache(struct page *p)
>  		 */
>  		ClearPageActive(p);
>  		ClearPageUnevictable(p);
> +
> +		/*
> +		 * Poisoned page might never drop its ref count to 0 so we have to
> +		 * uncharge it manually from its memcg.
> +		 */
> +		mem_cgroup_uncharge(p);
> +
>  		/*
>  		 * drop the page count elevated by isolate_lru_page()
>  		 */
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-05-03 11:34                     ` Laurent Dufour
  0 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2017-05-03 11:34 UTC (permalink / raw)
  To: Michal Hocko, Andi Kleen, Johannes Weiner
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, akpm, Vladimir Davydov

On 02/05/2017 20:55, Michal Hocko wrote:
> On Tue 02-05-17 16:59:30, Laurent Dufour wrote:
>> On 28/04/2017 15:48, Michal Hocko wrote:
> [...]
>>> This is getting quite hairy. What is the expected page count of the
>>> hwpoison page?
> 
> OK, so from the quick check of the hwpoison code it seems that the ref
> count will be > 1 (from get_hwpoison_page).
> 
>>> I guess we would need to update the VM_BUG_ON in the
>>> memcg uncharge code to ignore the page count of hwpoison pages if it can
>>> be arbitrary.
>>
>> Based on the experiment I did, page count == 2 when isolate_lru_page()
>> succeeds, even in the case of a poisoned page.
> 
> that would make some sense to me. The page should have been already
> unmapped therefore but memory_failure increases the ref count and 1 is
> for isolate_lru_page().
> 
>> In my case I think this
>> is because the page is still used by the process which is calling madvise().
>>
>> I'm wondering if I'm looking at the right place. May be the poisoned
>> page should remain attach to the memory_cgroup until no one is using it.
>> In that case this means that something should be done when the page is
>> off-lined... I've to dig further here.
> 
> No, AFAIU the page will not drop the reference count down to 0 in most
> cases. Maybe there are some scenarios where this can happen but I would
> expect that the poisoned page will be mapped and in use most of the time
> and won't drop down 0. And then we should really uncharge it because it
> will pin the memcg and make it unfreeable which doesn't seem to be what
> we want.  So does the following work reasonable? Andi, Johannes, what do
> you think? I cannot say I would be really comfortable touching hwpoison
> code as I really do not understand the workflow. Maybe we want to move
> this uncharge down to memory_failure() right before we report success?
> ---
> From 8bf0791bcf35996a859b6d33fb5494e5b53de49d Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Tue, 2 May 2017 20:32:24 +0200
> Subject: [PATCH] hwpoison, memcg: forcibly uncharge LRU pages
> 
> Laurent Dufour has noticed that hwpoinsoned pages are kept charged. In
> his particular case he has hit a bad_page("page still charged to cgroup")
> when onlining a hwpoison page. While this looks like something that shouldn't
> happen in the first place because onlining hwpages and returning them to
> the page allocator makes only little sense it shows a real problem.
> 
> hwpoison pages do not get freed usually so we do not uncharge them (at
> least not since 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")).
> Each charge pins memcg (since e8ea14cc6ead ("mm: memcontrol: take a css
> reference for each charged page")) as well and so the mem_cgroup and the
> associated state will never go away. Fix this leak by forcibly
> uncharging a LRU hwpoisoned page in delete_from_lru_cache(). We also
> have to tweak uncharge_list because it cannot rely on zero ref count
> for these pages.
> 
> Fixes: 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")
> Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>

FWIW:
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>

> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/memcontrol.c     | 2 +-
>  mm/memory-failure.c | 7 +++++++
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 16c556ac103d..4cf26059adb1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5527,7 +5527,7 @@ static void uncharge_list(struct list_head *page_list)
>  		next = page->lru.next;
> 
>  		VM_BUG_ON_PAGE(PageLRU(page), page);
> -		VM_BUG_ON_PAGE(page_count(page), page);
> +		VM_BUG_ON_PAGE(!PageHWPoison(page) && page_count(page), page);
> 
>  		if (!page->mem_cgroup)
>  			continue;
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 8a6bd3a9eb1e..4497d9619bb4 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -541,6 +541,13 @@ static int delete_from_lru_cache(struct page *p)
>  		 */
>  		ClearPageActive(p);
>  		ClearPageUnevictable(p);
> +
> +		/*
> +		 * Poisoned page might never drop its ref count to 0 so we have to
> +		 * uncharge it manually from its memcg.
> +		 */
> +		mem_cgroup_uncharge(p);
> +
>  		/*
>  		 * drop the page count elevated by isolate_lru_page()
>  		 */
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-05-02 18:55                   ` Michal Hocko
@ 2017-05-04  1:21                     ` Balbir Singh
  -1 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2017-05-04  1:21 UTC (permalink / raw)
  To: Michal Hocko, Andi Kleen, Johannes Weiner
  Cc: Laurent Dufour, Naoya Horiguchi, linux-kernel, linux-mm, akpm,
	Vladimir Davydov

> @@ -5527,7 +5527,7 @@ static void uncharge_list(struct list_head *page_list)
>  		next = page->lru.next;
>  
>  		VM_BUG_ON_PAGE(PageLRU(page), page);
> -		VM_BUG_ON_PAGE(page_count(page), page);
> +		VM_BUG_ON_PAGE(!PageHWPoison(page) && page_count(page), page);
>  
>  		if (!page->mem_cgroup)
>  			continue;
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 8a6bd3a9eb1e..4497d9619bb4 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -541,6 +541,13 @@ static int delete_from_lru_cache(struct page *p)
>  		 */
>  		ClearPageActive(p);
>  		ClearPageUnevictable(p);
> +
> +		/*
> +		 * Poisoned page might never drop its ref count to 0 so we have to
> +		 * uncharge it manually from its memcg.
> +		 */
> +		mem_cgroup_uncharge(p);
> +

Yep, that is the right fix

https://lkml.org/lkml/2017/4/26/133

Reviewed-by: Balbir Singh <bsingharora@gmail.com>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-05-04  1:21                     ` Balbir Singh
  0 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2017-05-04  1:21 UTC (permalink / raw)
  To: Michal Hocko, Andi Kleen, Johannes Weiner
  Cc: Laurent Dufour, Naoya Horiguchi, linux-kernel, linux-mm, akpm,
	Vladimir Davydov

> @@ -5527,7 +5527,7 @@ static void uncharge_list(struct list_head *page_list)
>  		next = page->lru.next;
>  
>  		VM_BUG_ON_PAGE(PageLRU(page), page);
> -		VM_BUG_ON_PAGE(page_count(page), page);
> +		VM_BUG_ON_PAGE(!PageHWPoison(page) && page_count(page), page);
>  
>  		if (!page->mem_cgroup)
>  			continue;
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 8a6bd3a9eb1e..4497d9619bb4 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -541,6 +541,13 @@ static int delete_from_lru_cache(struct page *p)
>  		 */
>  		ClearPageActive(p);
>  		ClearPageUnevictable(p);
> +
> +		/*
> +		 * Poisoned page might never drop its ref count to 0 so we have to
> +		 * uncharge it manually from its memcg.
> +		 */
> +		mem_cgroup_uncharge(p);
> +

Yep, that is the right fix

https://lkml.org/lkml/2017/4/26/133

Reviewed-by: Balbir Singh <bsingharora@gmail.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-05-02 18:55                   ` Michal Hocko
@ 2017-05-08  2:58                     ` Naoya Horiguchi
  -1 siblings, 0 replies; 68+ messages in thread
From: Naoya Horiguchi @ 2017-05-08  2:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andi Kleen, Johannes Weiner, Laurent Dufour, linux-kernel,
	linux-mm, akpm, Vladimir Davydov

On Tue, May 02, 2017 at 08:55:07PM +0200, Michal Hocko wrote:
> On Tue 02-05-17 16:59:30, Laurent Dufour wrote:
> > On 28/04/2017 15:48, Michal Hocko wrote:
> [...]
> > > This is getting quite hairy. What is the expected page count of the
> > > hwpoison page?
> 
> OK, so from the quick check of the hwpoison code it seems that the ref
> count will be > 1 (from get_hwpoison_page).
> 
> > > I guess we would need to update the VM_BUG_ON in the
> > > memcg uncharge code to ignore the page count of hwpoison pages if it can
> > > be arbitrary.
> > 
> > Based on the experiment I did, page count == 2 when isolate_lru_page()
> > succeeds, even in the case of a poisoned page.
> 
> that would make some sense to me. The page should have been already
> unmapped therefore but memory_failure increases the ref count and 1 is
> for isolate_lru_page().

# sorry for late reply, I was on holidays last week...

Right, and the refcount taken for memory_failure is not freed after
memory_failure() returns. unpoison_memory() does free the refcount.

> 
> > In my case I think this
> > is because the page is still used by the process which is calling madvise().
> > 
> > I'm wondering if I'm looking at the right place. May be the poisoned
> > page should remain attach to the memory_cgroup until no one is using it.
> > In that case this means that something should be done when the page is
> > off-lined... I've to dig further here.
> 
> No, AFAIU the page will not drop the reference count down to 0 in most
> cases. Maybe there are some scenarios where this can happen but I would
> expect that the poisoned page will be mapped and in use most of the time
> and won't drop down 0. And then we should really uncharge it because it
> will pin the memcg and make it unfreeable which doesn't seem to be what
> we want.  So does the following work reasonable? Andi, Johannes, what do
> you think? I cannot say I would be really comfortable touching hwpoison
> code as I really do not understand the workflow. Maybe we want to move
> this uncharge down to memory_failure() right before we report success?

memory_failure() can be called for any types of page (including slab or
any kernel/driver pages), and the reported problem seems happen only on
in-use user pages, so uncharging in delete_from_lru_cache() as done below
looks better to me.

> ---
> From 8bf0791bcf35996a859b6d33fb5494e5b53de49d Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Tue, 2 May 2017 20:32:24 +0200
> Subject: [PATCH] hwpoison, memcg: forcibly uncharge LRU pages
> 
> Laurent Dufour has noticed that hwpoinsoned pages are kept charged. In
> his particular case he has hit a bad_page("page still charged to cgroup")
> when onlining a hwpoison page.

> While this looks like something that shouldn't
> happen in the first place because onlining hwpages and returning them to
> the page allocator makes only little sense it shows a real problem.
> 
> hwpoison pages do not get freed usually so we do not uncharge them (at
> least not since 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")).
> Each charge pins memcg (since e8ea14cc6ead ("mm: memcontrol: take a css
> reference for each charged page")) as well and so the mem_cgroup and the
> associated state will never go away. Fix this leak by forcibly
> uncharging a LRU hwpoisoned page in delete_from_lru_cache(). We also
> have to tweak uncharge_list because it cannot rely on zero ref count
> for these pages.
> 
> Fixes: 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")
> Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

> ---
>  mm/memcontrol.c     | 2 +-
>  mm/memory-failure.c | 7 +++++++
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 16c556ac103d..4cf26059adb1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5527,7 +5527,7 @@ static void uncharge_list(struct list_head *page_list)
>  		next = page->lru.next;
>  
>  		VM_BUG_ON_PAGE(PageLRU(page), page);
> -		VM_BUG_ON_PAGE(page_count(page), page);
> +		VM_BUG_ON_PAGE(!PageHWPoison(page) && page_count(page), page);
>  
>  		if (!page->mem_cgroup)
>  			continue;
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 8a6bd3a9eb1e..4497d9619bb4 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -541,6 +541,13 @@ static int delete_from_lru_cache(struct page *p)
>  		 */
>  		ClearPageActive(p);
>  		ClearPageUnevictable(p);
> +
> +		/*
> +		 * Poisoned page might never drop its ref count to 0 so we have to
> +		 * uncharge it manually from its memcg.
> +		 */
> +		mem_cgroup_uncharge(p);
> +
>  		/*
>  		 * drop the page count elevated by isolate_lru_page()
>  		 */
> -- 
> 2.11.0
> 
> -- 
> Michal Hocko
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-05-08  2:58                     ` Naoya Horiguchi
  0 siblings, 0 replies; 68+ messages in thread
From: Naoya Horiguchi @ 2017-05-08  2:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andi Kleen, Johannes Weiner, Laurent Dufour, linux-kernel,
	linux-mm, akpm, Vladimir Davydov

On Tue, May 02, 2017 at 08:55:07PM +0200, Michal Hocko wrote:
> On Tue 02-05-17 16:59:30, Laurent Dufour wrote:
> > On 28/04/2017 15:48, Michal Hocko wrote:
> [...]
> > > This is getting quite hairy. What is the expected page count of the
> > > hwpoison page?
> 
> OK, so from the quick check of the hwpoison code it seems that the ref
> count will be > 1 (from get_hwpoison_page).
> 
> > > I guess we would need to update the VM_BUG_ON in the
> > > memcg uncharge code to ignore the page count of hwpoison pages if it can
> > > be arbitrary.
> > 
> > Based on the experiment I did, page count == 2 when isolate_lru_page()
> > succeeds, even in the case of a poisoned page.
> 
> that would make some sense to me. The page should have been already
> unmapped therefore but memory_failure increases the ref count and 1 is
> for isolate_lru_page().

# sorry for late reply, I was on holidays last week...

Right, and the refcount taken for memory_failure is not freed after
memory_failure() returns. unpoison_memory() does free the refcount.

> 
> > In my case I think this
> > is because the page is still used by the process which is calling madvise().
> > 
> > I'm wondering if I'm looking at the right place. May be the poisoned
> > page should remain attach to the memory_cgroup until no one is using it.
> > In that case this means that something should be done when the page is
> > off-lined... I've to dig further here.
> 
> No, AFAIU the page will not drop the reference count down to 0 in most
> cases. Maybe there are some scenarios where this can happen but I would
> expect that the poisoned page will be mapped and in use most of the time
> and won't drop down 0. And then we should really uncharge it because it
> will pin the memcg and make it unfreeable which doesn't seem to be what
> we want.  So does the following work reasonable? Andi, Johannes, what do
> you think? I cannot say I would be really comfortable touching hwpoison
> code as I really do not understand the workflow. Maybe we want to move
> this uncharge down to memory_failure() right before we report success?

memory_failure() can be called for any types of page (including slab or
any kernel/driver pages), and the reported problem seems happen only on
in-use user pages, so uncharging in delete_from_lru_cache() as done below
looks better to me.

> ---
> From 8bf0791bcf35996a859b6d33fb5494e5b53de49d Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Tue, 2 May 2017 20:32:24 +0200
> Subject: [PATCH] hwpoison, memcg: forcibly uncharge LRU pages
> 
> Laurent Dufour has noticed that hwpoinsoned pages are kept charged. In
> his particular case he has hit a bad_page("page still charged to cgroup")
> when onlining a hwpoison page.

> While this looks like something that shouldn't
> happen in the first place because onlining hwpages and returning them to
> the page allocator makes only little sense it shows a real problem.
> 
> hwpoison pages do not get freed usually so we do not uncharge them (at
> least not since 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")).
> Each charge pins memcg (since e8ea14cc6ead ("mm: memcontrol: take a css
> reference for each charged page")) as well and so the mem_cgroup and the
> associated state will never go away. Fix this leak by forcibly
> uncharging a LRU hwpoisoned page in delete_from_lru_cache(). We also
> have to tweak uncharge_list because it cannot rely on zero ref count
> for these pages.
> 
> Fixes: 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")
> Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

> ---
>  mm/memcontrol.c     | 2 +-
>  mm/memory-failure.c | 7 +++++++
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 16c556ac103d..4cf26059adb1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5527,7 +5527,7 @@ static void uncharge_list(struct list_head *page_list)
>  		next = page->lru.next;
>  
>  		VM_BUG_ON_PAGE(PageLRU(page), page);
> -		VM_BUG_ON_PAGE(page_count(page), page);
> +		VM_BUG_ON_PAGE(!PageHWPoison(page) && page_count(page), page);
>  
>  		if (!page->mem_cgroup)
>  			continue;
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 8a6bd3a9eb1e..4497d9619bb4 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -541,6 +541,13 @@ static int delete_from_lru_cache(struct page *p)
>  		 */
>  		ClearPageActive(p);
>  		ClearPageUnevictable(p);
> +
> +		/*
> +		 * Poisoned page might never drop its ref count to 0 so we have to
> +		 * uncharge it manually from its memcg.
> +		 */
> +		mem_cgroup_uncharge(p);
> +
>  		/*
>  		 * drop the page count elevated by isolate_lru_page()
>  		 */
> -- 
> 2.11.0
> 
> -- 
> Michal Hocko
> SUSE Labs
> 
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-05-04  1:21                     ` Balbir Singh
@ 2017-05-08 10:42                       ` Laurent Dufour
  -1 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2017-05-08 10:42 UTC (permalink / raw)
  To: Balbir Singh, Michal Hocko, Andi Kleen, Johannes Weiner
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, akpm, Vladimir Davydov

On 04/05/2017 03:21, Balbir Singh wrote:
>> @@ -5527,7 +5527,7 @@ static void uncharge_list(struct list_head *page_list)
>>  		next = page->lru.next;
>>  
>>  		VM_BUG_ON_PAGE(PageLRU(page), page);
>> -		VM_BUG_ON_PAGE(page_count(page), page);
>> +		VM_BUG_ON_PAGE(!PageHWPoison(page) && page_count(page), page);
>>  
>>  		if (!page->mem_cgroup)
>>  			continue;
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 8a6bd3a9eb1e..4497d9619bb4 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -541,6 +541,13 @@ static int delete_from_lru_cache(struct page *p)
>>  		 */
>>  		ClearPageActive(p);
>>  		ClearPageUnevictable(p);
>> +
>> +		/*
>> +		 * Poisoned page might never drop its ref count to 0 so we have to
>> +		 * uncharge it manually from its memcg.
>> +		 */
>> +		mem_cgroup_uncharge(p);
>> +
> 
> Yep, that is the right fix
> 
> https://lkml.org/lkml/2017/4/26/133

Sorry Balbir,

You pointed this out since the beginning but I missed your comment.
My mistake.

Thanks,
Laurent.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-05-08 10:42                       ` Laurent Dufour
  0 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2017-05-08 10:42 UTC (permalink / raw)
  To: Balbir Singh, Michal Hocko, Andi Kleen, Johannes Weiner
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, akpm, Vladimir Davydov

On 04/05/2017 03:21, Balbir Singh wrote:
>> @@ -5527,7 +5527,7 @@ static void uncharge_list(struct list_head *page_list)
>>  		next = page->lru.next;
>>  
>>  		VM_BUG_ON_PAGE(PageLRU(page), page);
>> -		VM_BUG_ON_PAGE(page_count(page), page);
>> +		VM_BUG_ON_PAGE(!PageHWPoison(page) && page_count(page), page);
>>  
>>  		if (!page->mem_cgroup)
>>  			continue;
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 8a6bd3a9eb1e..4497d9619bb4 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -541,6 +541,13 @@ static int delete_from_lru_cache(struct page *p)
>>  		 */
>>  		ClearPageActive(p);
>>  		ClearPageUnevictable(p);
>> +
>> +		/*
>> +		 * Poisoned page might never drop its ref count to 0 so we have to
>> +		 * uncharge it manually from its memcg.
>> +		 */
>> +		mem_cgroup_uncharge(p);
>> +
> 
> Yep, that is the right fix
> 
> https://lkml.org/lkml/2017/4/26/133

Sorry Balbir,

You pointed this out since the beginning but I missed your comment.
My mistake.

Thanks,
Laurent.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-05-08 10:42                       ` Laurent Dufour
@ 2017-05-09  1:41                         ` Balbir Singh
  -1 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2017-05-09  1:41 UTC (permalink / raw)
  To: Laurent Dufour, Michal Hocko, Andi Kleen, Johannes Weiner
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, akpm, Vladimir Davydov

On Mon, 2017-05-08 at 12:42 +0200, Laurent Dufour wrote:
> Sorry Balbir,
> 
> You pointed this out since the beginning but I missed your comment.
> My mistake.
>

No worries, as long as the right thing gets in

Balbir Singh 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-05-09  1:41                         ` Balbir Singh
  0 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2017-05-09  1:41 UTC (permalink / raw)
  To: Laurent Dufour, Michal Hocko, Andi Kleen, Johannes Weiner
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, akpm, Vladimir Davydov

On Mon, 2017-05-08 at 12:42 +0200, Laurent Dufour wrote:
> Sorry Balbir,
> 
> You pointed this out since the beginning but I missed your comment.
> My mistake.
>

No worries, as long as the right thing gets in

Balbir Singh 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-05-08  2:58                     ` Naoya Horiguchi
@ 2017-05-09  9:18                       ` Michal Hocko
  -1 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-05-09  9:18 UTC (permalink / raw)
  To: Naoya Horiguchi, akpm
  Cc: Andi Kleen, Johannes Weiner, Laurent Dufour, linux-kernel,
	linux-mm, Vladimir Davydov

On Mon 08-05-17 02:58:36, Naoya Horiguchi wrote:
> On Tue, May 02, 2017 at 08:55:07PM +0200, Michal Hocko wrote:
> > On Tue 02-05-17 16:59:30, Laurent Dufour wrote:
> > > On 28/04/2017 15:48, Michal Hocko wrote:
> > [...]
> > > > This is getting quite hairy. What is the expected page count of the
> > > > hwpoison page?
> > 
> > OK, so from the quick check of the hwpoison code it seems that the ref
> > count will be > 1 (from get_hwpoison_page).
> > 
> > > > I guess we would need to update the VM_BUG_ON in the
> > > > memcg uncharge code to ignore the page count of hwpoison pages if it can
> > > > be arbitrary.
> > > 
> > > Based on the experiment I did, page count == 2 when isolate_lru_page()
> > > succeeds, even in the case of a poisoned page.
> > 
> > that would make some sense to me. The page should have been already
> > unmapped therefore but memory_failure increases the ref count and 1 is
> > for isolate_lru_page().
> 
> # sorry for late reply, I was on holidays last week...
> 
> Right, and the refcount taken for memory_failure is not freed after
> memory_failure() returns. unpoison_memory() does free the refcount.

OK, from the charge POV this would be safe because we clear page->memcg
so it wouldn't get uncharged more times.

> > > In my case I think this
> > > is because the page is still used by the process which is calling madvise().
> > > 
> > > I'm wondering if I'm looking at the right place. May be the poisoned
> > > page should remain attach to the memory_cgroup until no one is using it.
> > > In that case this means that something should be done when the page is
> > > off-lined... I've to dig further here.
> > 
> > No, AFAIU the page will not drop the reference count down to 0 in most
> > cases. Maybe there are some scenarios where this can happen but I would
> > expect that the poisoned page will be mapped and in use most of the time
> > and won't drop down 0. And then we should really uncharge it because it
> > will pin the memcg and make it unfreeable which doesn't seem to be what
> > we want.  So does the following work reasonable? Andi, Johannes, what do
> > you think? I cannot say I would be really comfortable touching hwpoison
> > code as I really do not understand the workflow. Maybe we want to move
> > this uncharge down to memory_failure() right before we report success?
> 
> memory_failure() can be called for any types of page (including slab or
> any kernel/driver pages), and the reported problem seems happen only on
> in-use user pages, so uncharging in delete_from_lru_cache() as done below
> looks better to me.

Yeah, we do see problems only for LRU/page cache pages but my
understanding is that error_states (e.g. me_kernel for the kernel
memory) might change in the future and then we wouldn't catch the same
bug, no?

> > ---
> > From 8bf0791bcf35996a859b6d33fb5494e5b53de49d Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Tue, 2 May 2017 20:32:24 +0200
> > Subject: [PATCH] hwpoison, memcg: forcibly uncharge LRU pages
> > 
> > Laurent Dufour has noticed that hwpoinsoned pages are kept charged. In
> > his particular case he has hit a bad_page("page still charged to cgroup")
> > when onlining a hwpoison page.
> 
> > While this looks like something that shouldn't
> > happen in the first place because onlining hwpages and returning them to
> > the page allocator makes only little sense it shows a real problem.
> > 
> > hwpoison pages do not get freed usually so we do not uncharge them (at
> > least not since 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")).
> > Each charge pins memcg (since e8ea14cc6ead ("mm: memcontrol: take a css
> > reference for each charged page")) as well and so the mem_cgroup and the
> > associated state will never go away. Fix this leak by forcibly
> > uncharging a LRU hwpoisoned page in delete_from_lru_cache(). We also
> > have to tweak uncharge_list because it cannot rely on zero ref count
> > for these pages.
> > 
> > Fixes: 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")
> > Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Thanks! I will wait a day or two for Johannes and repost the patch.
Andrew could you drop
http://www.ozlabs.org/~akpm/mmotm/broken-out/mm-uncharge-poisoned-pages.patch
in the mean time, please?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-05-09  9:18                       ` Michal Hocko
  0 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-05-09  9:18 UTC (permalink / raw)
  To: Naoya Horiguchi, akpm
  Cc: Andi Kleen, Johannes Weiner, Laurent Dufour, linux-kernel,
	linux-mm, Vladimir Davydov

On Mon 08-05-17 02:58:36, Naoya Horiguchi wrote:
> On Tue, May 02, 2017 at 08:55:07PM +0200, Michal Hocko wrote:
> > On Tue 02-05-17 16:59:30, Laurent Dufour wrote:
> > > On 28/04/2017 15:48, Michal Hocko wrote:
> > [...]
> > > > This is getting quite hairy. What is the expected page count of the
> > > > hwpoison page?
> > 
> > OK, so from the quick check of the hwpoison code it seems that the ref
> > count will be > 1 (from get_hwpoison_page).
> > 
> > > > I guess we would need to update the VM_BUG_ON in the
> > > > memcg uncharge code to ignore the page count of hwpoison pages if it can
> > > > be arbitrary.
> > > 
> > > Based on the experiment I did, page count == 2 when isolate_lru_page()
> > > succeeds, even in the case of a poisoned page.
> > 
> > that would make some sense to me. The page should have been already
> > unmapped therefore but memory_failure increases the ref count and 1 is
> > for isolate_lru_page().
> 
> # sorry for late reply, I was on holidays last week...
> 
> Right, and the refcount taken for memory_failure is not freed after
> memory_failure() returns. unpoison_memory() does free the refcount.

OK, from the charge POV this would be safe because we clear page->memcg
so it wouldn't get uncharged more times.

> > > In my case I think this
> > > is because the page is still used by the process which is calling madvise().
> > > 
> > > I'm wondering if I'm looking at the right place. May be the poisoned
> > > page should remain attach to the memory_cgroup until no one is using it.
> > > In that case this means that something should be done when the page is
> > > off-lined... I've to dig further here.
> > 
> > No, AFAIU the page will not drop the reference count down to 0 in most
> > cases. Maybe there are some scenarios where this can happen but I would
> > expect that the poisoned page will be mapped and in use most of the time
> > and won't drop down 0. And then we should really uncharge it because it
> > will pin the memcg and make it unfreeable which doesn't seem to be what
> > we want.  So does the following work reasonable? Andi, Johannes, what do
> > you think? I cannot say I would be really comfortable touching hwpoison
> > code as I really do not understand the workflow. Maybe we want to move
> > this uncharge down to memory_failure() right before we report success?
> 
> memory_failure() can be called for any types of page (including slab or
> any kernel/driver pages), and the reported problem seems happen only on
> in-use user pages, so uncharging in delete_from_lru_cache() as done below
> looks better to me.

Yeah, we do see problems only for LRU/page cache pages but my
understanding is that error_states (e.g. me_kernel for the kernel
memory) might change in the future and then we wouldn't catch the same
bug, no?

> > ---
> > From 8bf0791bcf35996a859b6d33fb5494e5b53de49d Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Tue, 2 May 2017 20:32:24 +0200
> > Subject: [PATCH] hwpoison, memcg: forcibly uncharge LRU pages
> > 
> > Laurent Dufour has noticed that hwpoinsoned pages are kept charged. In
> > his particular case he has hit a bad_page("page still charged to cgroup")
> > when onlining a hwpoison page.
> 
> > While this looks like something that shouldn't
> > happen in the first place because onlining hwpages and returning them to
> > the page allocator makes only little sense it shows a real problem.
> > 
> > hwpoison pages do not get freed usually so we do not uncharge them (at
> > least not since 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")).
> > Each charge pins memcg (since e8ea14cc6ead ("mm: memcontrol: take a css
> > reference for each charged page")) as well and so the mem_cgroup and the
> > associated state will never go away. Fix this leak by forcibly
> > uncharging a LRU hwpoisoned page in delete_from_lru_cache(). We also
> > have to tweak uncharge_list because it cannot rely on zero ref count
> > for these pages.
> > 
> > Fixes: 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")
> > Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Thanks! I will wait a day or two for Johannes and repost the patch.
Andrew could you drop
http://www.ozlabs.org/~akpm/mmotm/broken-out/mm-uncharge-poisoned-pages.patch
in the mean time, please?

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
  2017-05-09  9:18                       ` Michal Hocko
@ 2017-05-09 22:59                         ` Naoya Horiguchi
  -1 siblings, 0 replies; 68+ messages in thread
From: Naoya Horiguchi @ 2017-05-09 22:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, Andi Kleen, Johannes Weiner, Laurent Dufour, linux-kernel,
	linux-mm, Vladimir Davydov

On Tue, May 09, 2017 at 11:18:23AM +0200, Michal Hocko wrote:
> On Mon 08-05-17 02:58:36, Naoya Horiguchi wrote:
> > On Tue, May 02, 2017 at 08:55:07PM +0200, Michal Hocko wrote:
> > > On Tue 02-05-17 16:59:30, Laurent Dufour wrote:
> > > > On 28/04/2017 15:48, Michal Hocko wrote:
> > > [...]
> > > > > This is getting quite hairy. What is the expected page count of the
> > > > > hwpoison page?
> > > 
> > > OK, so from the quick check of the hwpoison code it seems that the ref
> > > count will be > 1 (from get_hwpoison_page).
> > > 
> > > > > I guess we would need to update the VM_BUG_ON in the
> > > > > memcg uncharge code to ignore the page count of hwpoison pages if it can
> > > > > be arbitrary.
> > > > 
> > > > Based on the experiment I did, page count == 2 when isolate_lru_page()
> > > > succeeds, even in the case of a poisoned page.
> > > 
> > > that would make some sense to me. The page should have been already
> > > unmapped therefore but memory_failure increases the ref count and 1 is
> > > for isolate_lru_page().
> > 
> > # sorry for late reply, I was on holidays last week...
> > 
> > Right, and the refcount taken for memory_failure is not freed after
> > memory_failure() returns. unpoison_memory() does free the refcount.
> 
> OK, from the charge POV this would be safe because we clear page->memcg
> so it wouldn't get uncharged more times.
> 
> > > > In my case I think this
> > > > is because the page is still used by the process which is calling madvise().
> > > > 
> > > > I'm wondering if I'm looking at the right place. May be the poisoned
> > > > page should remain attach to the memory_cgroup until no one is using it.
> > > > In that case this means that something should be done when the page is
> > > > off-lined... I've to dig further here.
> > > 
> > > No, AFAIU the page will not drop the reference count down to 0 in most
> > > cases. Maybe there are some scenarios where this can happen but I would
> > > expect that the poisoned page will be mapped and in use most of the time
> > > and won't drop down 0. And then we should really uncharge it because it
> > > will pin the memcg and make it unfreeable which doesn't seem to be what
> > > we want.  So does the following work reasonable? Andi, Johannes, what do
> > > you think? I cannot say I would be really comfortable touching hwpoison
> > > code as I really do not understand the workflow. Maybe we want to move
> > > this uncharge down to memory_failure() right before we report success?
> > 
> > memory_failure() can be called for any types of page (including slab or
> > any kernel/driver pages), and the reported problem seems happen only on
> > in-use user pages, so uncharging in delete_from_lru_cache() as done below
> > looks better to me.
> 
> Yeah, we do see problems only for LRU/page cache pages but my
> understanding is that error_states (e.g. me_kernel for the kernel
> memory) might change in the future and then we wouldn't catch the same
> bug, no?

Right about future change, and we will see the same bug. I guess that the
first target of kernel page is slab page, and memcg_kmem_uncharge() will
be used there. Implementors/Reviewers should care about uncharging when the
time comes.

Thanks,
Naoya Horiguchi

> 
> > > ---
> > > From 8bf0791bcf35996a859b6d33fb5494e5b53de49d Mon Sep 17 00:00:00 2001
> > > From: Michal Hocko <mhocko@suse.com>
> > > Date: Tue, 2 May 2017 20:32:24 +0200
> > > Subject: [PATCH] hwpoison, memcg: forcibly uncharge LRU pages
> > > 
> > > Laurent Dufour has noticed that hwpoinsoned pages are kept charged. In
> > > his particular case he has hit a bad_page("page still charged to cgroup")
> > > when onlining a hwpoison page.
> > 
> > > While this looks like something that shouldn't
> > > happen in the first place because onlining hwpages and returning them to
> > > the page allocator makes only little sense it shows a real problem.
> > > 
> > > hwpoison pages do not get freed usually so we do not uncharge them (at
> > > least not since 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")).
> > > Each charge pins memcg (since e8ea14cc6ead ("mm: memcontrol: take a css
> > > reference for each charged page")) as well and so the mem_cgroup and the
> > > associated state will never go away. Fix this leak by forcibly
> > > uncharging a LRU hwpoisoned page in delete_from_lru_cache(). We also
> > > have to tweak uncharge_list because it cannot rely on zero ref count
> > > for these pages.
> > > 
> > > Fixes: 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")
> > > Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > 
> > Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> 
> Thanks! I will wait a day or two for Johannes and repost the patch.
> Andrew could you drop
> http://www.ozlabs.org/~akpm/mmotm/broken-out/mm-uncharge-poisoned-pages.patch
> in the mean time, please?
> 
> -- 
> Michal Hocko
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
@ 2017-05-09 22:59                         ` Naoya Horiguchi
  0 siblings, 0 replies; 68+ messages in thread
From: Naoya Horiguchi @ 2017-05-09 22:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, Andi Kleen, Johannes Weiner, Laurent Dufour, linux-kernel,
	linux-mm, Vladimir Davydov

On Tue, May 09, 2017 at 11:18:23AM +0200, Michal Hocko wrote:
> On Mon 08-05-17 02:58:36, Naoya Horiguchi wrote:
> > On Tue, May 02, 2017 at 08:55:07PM +0200, Michal Hocko wrote:
> > > On Tue 02-05-17 16:59:30, Laurent Dufour wrote:
> > > > On 28/04/2017 15:48, Michal Hocko wrote:
> > > [...]
> > > > > This is getting quite hairy. What is the expected page count of the
> > > > > hwpoison page?
> > > 
> > > OK, so from the quick check of the hwpoison code it seems that the ref
> > > count will be > 1 (from get_hwpoison_page).
> > > 
> > > > > I guess we would need to update the VM_BUG_ON in the
> > > > > memcg uncharge code to ignore the page count of hwpoison pages if it can
> > > > > be arbitrary.
> > > > 
> > > > Based on the experiment I did, page count == 2 when isolate_lru_page()
> > > > succeeds, even in the case of a poisoned page.
> > > 
> > > that would make some sense to me. The page should have been already
> > > unmapped therefore but memory_failure increases the ref count and 1 is
> > > for isolate_lru_page().
> > 
> > # sorry for late reply, I was on holidays last week...
> > 
> > Right, and the refcount taken for memory_failure is not freed after
> > memory_failure() returns. unpoison_memory() does free the refcount.
> 
> OK, from the charge POV this would be safe because we clear page->memcg
> so it wouldn't get uncharged more times.
> 
> > > > In my case I think this
> > > > is because the page is still used by the process which is calling madvise().
> > > > 
> > > > I'm wondering if I'm looking at the right place. May be the poisoned
> > > > page should remain attach to the memory_cgroup until no one is using it.
> > > > In that case this means that something should be done when the page is
> > > > off-lined... I've to dig further here.
> > > 
> > > No, AFAIU the page will not drop the reference count down to 0 in most
> > > cases. Maybe there are some scenarios where this can happen but I would
> > > expect that the poisoned page will be mapped and in use most of the time
> > > and won't drop down 0. And then we should really uncharge it because it
> > > will pin the memcg and make it unfreeable which doesn't seem to be what
> > > we want.  So does the following work reasonable? Andi, Johannes, what do
> > > you think? I cannot say I would be really comfortable touching hwpoison
> > > code as I really do not understand the workflow. Maybe we want to move
> > > this uncharge down to memory_failure() right before we report success?
> > 
> > memory_failure() can be called for any types of page (including slab or
> > any kernel/driver pages), and the reported problem seems happen only on
> > in-use user pages, so uncharging in delete_from_lru_cache() as done below
> > looks better to me.
> 
> Yeah, we do see problems only for LRU/page cache pages but my
> understanding is that error_states (e.g. me_kernel for the kernel
> memory) might change in the future and then we wouldn't catch the same
> bug, no?

Right about future change, and we will see the same bug. I guess that the
first target of kernel page is slab page, and memcg_kmem_uncharge() will
be used there. Implementors/Reviewers should care about uncharging when the
time comes.

Thanks,
Naoya Horiguchi

> 
> > > ---
> > > From 8bf0791bcf35996a859b6d33fb5494e5b53de49d Mon Sep 17 00:00:00 2001
> > > From: Michal Hocko <mhocko@suse.com>
> > > Date: Tue, 2 May 2017 20:32:24 +0200
> > > Subject: [PATCH] hwpoison, memcg: forcibly uncharge LRU pages
> > > 
> > > Laurent Dufour has noticed that hwpoinsoned pages are kept charged. In
> > > his particular case he has hit a bad_page("page still charged to cgroup")
> > > when onlining a hwpoison page.
> > 
> > > While this looks like something that shouldn't
> > > happen in the first place because onlining hwpages and returning them to
> > > the page allocator makes only little sense it shows a real problem.
> > > 
> > > hwpoison pages do not get freed usually so we do not uncharge them (at
> > > least not since 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")).
> > > Each charge pins memcg (since e8ea14cc6ead ("mm: memcontrol: take a css
> > > reference for each charged page")) as well and so the mem_cgroup and the
> > > associated state will never go away. Fix this leak by forcibly
> > > uncharging a LRU hwpoisoned page in delete_from_lru_cache(). We also
> > > have to tweak uncharge_list because it cannot rely on zero ref count
> > > for these pages.
> > > 
> > > Fixes: 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API")
> > > Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > 
> > Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> 
> Thanks! I will wait a day or two for Johannes and repost the patch.
> Andrew could you drop
> http://www.ozlabs.org/~akpm/mmotm/broken-out/mm-uncharge-poisoned-pages.patch
> in the mean time, please?
> 
> -- 
> Michal Hocko
> SUSE Labs
> 
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
  2017-04-28  6:51             ` Michal Hocko
@ 2017-05-10  7:41               ` Michal Hocko
  -1 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-05-10  7:41 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Balbir Singh, Laurent Dufour, linux-kernel, linux-mm, akpm

On Fri 28-04-17 08:51:31, Michal Hocko wrote:
> On Fri 28-04-17 08:50:50, Michal Hocko wrote:
> > [Drop Wen Congyang because his address bounces - we will have to find
> > out ourselves...]
> > On Fri 28-04-17 08:30:48, Michal Hocko wrote:
> > > On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
> > > > On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > > > > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > > > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > > > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > > > > this should be skipped when onlining the pages too.
> > > > > >
> > > > > > Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > > > > > ---
> > > > > >  mm/memory_hotplug.c | 4 ++++
> > > > > >  1 file changed, 4 insertions(+)
> > > > > >
> > > > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > > > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > > > > --- a/mm/memory_hotplug.c
> > > > > > +++ b/mm/memory_hotplug.c
> > > > > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > > > > >  	if (PageReserved(pfn_to_page(start_pfn)))
> > > > > >  		for (i = 0; i < nr_pages; i++) {
> > > > > >  			page = pfn_to_page(start_pfn + i);
> > > > > > +			if (PageHWPoison(page)) {
> > > > > > +				ClearPageReserved(page);
> > > > >
> > > > > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > > > > was never offlined to begin with? Or do you expect this to be set on newly
> > > > > hotplugged memory? Also don't we need to skip the entire pageblock?
> > > > 
> > > > If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> > > > that we skip the page status check for hwpoisoned pages *not* to prevent
> > > > memory offlining for memblocks with hwpoisoned pages. That means that
> > > > hwpoisoned pages can be offlined.
> > > 
> > > Is this patch actually correct? I am trying to wrap my head around it
> > > but it smells like it tries to avoid the problem rather than fix it
> > > properly. I might be wrong here of course but to me it sounds like
> > > poisoned page should simply be offlined and keep its poison state all
> > > the time. If the memory is hot-removed and added again we have lost the
> > > struct page along with the state which is the expected behavior. If it
> > > is still broken we will re-poison it.
> > > 
> > > Anyway a patch to skip over poisoned pages during online makes perfect
> > > sense to me. The PageReserved fiddling around much less so.
> > > 
> > > Or am I missing something. Let's CC Wen Congyang for the clarification
> > > here.

Can we revisit this please? The PageReserved() logic for poisoned pages
is completely unclear to me. I would rather not rely on the previous
changelogs and rather build the picture from what is the expected
behavior instead.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
@ 2017-05-10  7:41               ` Michal Hocko
  0 siblings, 0 replies; 68+ messages in thread
From: Michal Hocko @ 2017-05-10  7:41 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Balbir Singh, Laurent Dufour, linux-kernel, linux-mm, akpm

On Fri 28-04-17 08:51:31, Michal Hocko wrote:
> On Fri 28-04-17 08:50:50, Michal Hocko wrote:
> > [Drop Wen Congyang because his address bounces - we will have to find
> > out ourselves...]
> > On Fri 28-04-17 08:30:48, Michal Hocko wrote:
> > > On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
> > > > On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > > > > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > > > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > > > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > > > > this should be skipped when onlining the pages too.
> > > > > >
> > > > > > Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > > > > > ---
> > > > > >  mm/memory_hotplug.c | 4 ++++
> > > > > >  1 file changed, 4 insertions(+)
> > > > > >
> > > > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > > > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > > > > --- a/mm/memory_hotplug.c
> > > > > > +++ b/mm/memory_hotplug.c
> > > > > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > > > > >  	if (PageReserved(pfn_to_page(start_pfn)))
> > > > > >  		for (i = 0; i < nr_pages; i++) {
> > > > > >  			page = pfn_to_page(start_pfn + i);
> > > > > > +			if (PageHWPoison(page)) {
> > > > > > +				ClearPageReserved(page);
> > > > >
> > > > > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > > > > was never offlined to begin with? Or do you expect this to be set on newly
> > > > > hotplugged memory? Also don't we need to skip the entire pageblock?
> > > > 
> > > > If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> > > > that we skip the page status check for hwpoisoned pages *not* to prevent
> > > > memory offlining for memblocks with hwpoisoned pages. That means that
> > > > hwpoisoned pages can be offlined.
> > > 
> > > Is this patch actually correct? I am trying to wrap my head around it
> > > but it smells like it tries to avoid the problem rather than fix it
> > > properly. I might be wrong here of course but to me it sounds like
> > > poisoned page should simply be offlined and keep its poison state all
> > > the time. If the memory is hot-removed and added again we have lost the
> > > struct page along with the state which is the expected behavior. If it
> > > is still broken we will re-poison it.
> > > 
> > > Anyway a patch to skip over poisoned pages during online makes perfect
> > > sense to me. The PageReserved fiddling around much less so.
> > > 
> > > Or am I missing something. Let's CC Wen Congyang for the clarification
> > > here.

Can we revisit this please? The PageReserved() logic for poisoned pages
is completely unclear to me. I would rather not rely on the previous
changelogs and rather build the picture from what is the expected
behavior instead.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
  2017-04-28  6:30         ` Michal Hocko
@ 2018-01-17 23:03           ` Andrew Morton
  -1 siblings, 0 replies; 68+ messages in thread
From: Andrew Morton @ 2018-01-17 23:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Naoya Horiguchi, Balbir Singh, Laurent Dufour, linux-kernel,
	linux-mm, Wen Congyang

On Fri, 28 Apr 2017 08:30:48 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
> > On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > > this should be skipped when onlining the pages too.
> > > >
> > > > Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > > > ---
> > > >  mm/memory_hotplug.c | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > > --- a/mm/memory_hotplug.c
> > > > +++ b/mm/memory_hotplug.c
> > > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > > >  	if (PageReserved(pfn_to_page(start_pfn)))
> > > >  		for (i = 0; i < nr_pages; i++) {
> > > >  			page = pfn_to_page(start_pfn + i);
> > > > +			if (PageHWPoison(page)) {
> > > > +				ClearPageReserved(page);
> > >
> > > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > > was never offlined to begin with? Or do you expect this to be set on newly
> > > hotplugged memory? Also don't we need to skip the entire pageblock?
> > 
> > If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> > that we skip the page status check for hwpoisoned pages *not* to prevent
> > memory offlining for memblocks with hwpoisoned pages. That means that
> > hwpoisoned pages can be offlined.
> 
> Is this patch actually correct? I am trying to wrap my head around it
> but it smells like it tries to avoid the problem rather than fix it
> properly. I might be wrong here of course but to me it sounds like
> poisoned page should simply be offlined and keep its poison state all
> the time. If the memory is hot-removed and added again we have lost the
> struct page along with the state which is the expected behavior. If it
> is still broken we will re-poison it.
> 
> Anyway a patch to skip over poisoned pages during online makes perfect
> sense to me. The PageReserved fiddling around much less so.
> 
> Or am I missing something. Let's CC Wen Congyang for the clarification
> here.

Wen Congyang appears to have disappeared and this fix isn't yet
finalized.  Can we all please revisit it and have a think about
Michal's questions?

Thanks.


From: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Subject: mm: skip HWPoisoned pages when onlining pages

b023f46813cd ("memory-hotplug: skip HWPoisoned page when offlining pages")
skipped the HWPoisoned pages when offlining pages, but this should be
skipped when onlining the pages too.

n-horiguchi@ah.jp.nec.com said:

: If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd
: means that we skip the page status check for hwpoisoned pages *not* to
: prevent memory offlining for memblocks with hwpoisoned pages.  That
: means that hwpoisoned pages can be offlined.
: 
: And another reason to clear PageReserved is that we could reuse the
: hwpoisoned page after onlining back with replacing the broken DIMM.  In
: this usecase, we first do unpoisoning to clear PageHWPoison, but it
: doesn't work if PageReserved is set.  My simple testing shows the BUG
: below in unpoisoning (without the ClearPageReserved):
: 
:   Unpoison: Software-unpoisoned page 0x18000
:   BUG: Bad page state in process page-types  pfn:18000
:   page:ffffda5440600000 count:0 mapcount:0 mapping:          (null) index:0x70006b599
:   flags: 0x1fffc00004081a(error|uptodate|dirty|reserved|swapbacked)
:   raw: 001fffc00004081a 0000000000000000 000000070006b599 00000000ffffffff
:   raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
:   page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
:   bad because of flags: 0x800(reserved)

Link: http://lkml.kernel.org/r/1493130472-22843-3-git-send-email-ldufour@linux.vnet.ibm.com
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory_hotplug.c |    4 ++++
 1 file changed, 4 insertions(+)

diff -puN mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages mm/memory_hotplug.c
--- a/mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages
+++ a/mm/memory_hotplug.c
@@ -696,6 +696,10 @@ static int online_pages_range(unsigned l
 	if (PageReserved(pfn_to_page(start_pfn)))
 		for (i = 0; i < nr_pages; i++) {
 			page = pfn_to_page(start_pfn + i);
+			if (PageHWPoison(page)) {
+				ClearPageReserved(page);
+				continue;
+			}
 			(*online_page_callback)(page);
 			onlined_pages++;
 		}
_

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
@ 2018-01-17 23:03           ` Andrew Morton
  0 siblings, 0 replies; 68+ messages in thread
From: Andrew Morton @ 2018-01-17 23:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Naoya Horiguchi, Balbir Singh, Laurent Dufour, linux-kernel,
	linux-mm, Wen Congyang

On Fri, 28 Apr 2017 08:30:48 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
> > On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > > this should be skipped when onlining the pages too.
> > > >
> > > > Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> > > > ---
> > > >  mm/memory_hotplug.c | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > > --- a/mm/memory_hotplug.c
> > > > +++ b/mm/memory_hotplug.c
> > > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > > >  	if (PageReserved(pfn_to_page(start_pfn)))
> > > >  		for (i = 0; i < nr_pages; i++) {
> > > >  			page = pfn_to_page(start_pfn + i);
> > > > +			if (PageHWPoison(page)) {
> > > > +				ClearPageReserved(page);
> > >
> > > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > > was never offlined to begin with? Or do you expect this to be set on newly
> > > hotplugged memory? Also don't we need to skip the entire pageblock?
> > 
> > If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> > that we skip the page status check for hwpoisoned pages *not* to prevent
> > memory offlining for memblocks with hwpoisoned pages. That means that
> > hwpoisoned pages can be offlined.
> 
> Is this patch actually correct? I am trying to wrap my head around it
> but it smells like it tries to avoid the problem rather than fix it
> properly. I might be wrong here of course but to me it sounds like
> poisoned page should simply be offlined and keep its poison state all
> the time. If the memory is hot-removed and added again we have lost the
> struct page along with the state which is the expected behavior. If it
> is still broken we will re-poison it.
> 
> Anyway a patch to skip over poisoned pages during online makes perfect
> sense to me. The PageReserved fiddling around much less so.
> 
> Or am I missing something. Let's CC Wen Congyang for the clarification
> here.

Wen Congyang appears to have disappeared and this fix isn't yet
finalized.  Can we all please revisit it and have a think about
Michal's questions?

Thanks.


From: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Subject: mm: skip HWPoisoned pages when onlining pages

b023f46813cd ("memory-hotplug: skip HWPoisoned page when offlining pages")
skipped the HWPoisoned pages when offlining pages, but this should be
skipped when onlining the pages too.

n-horiguchi@ah.jp.nec.com said:

: If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd
: means that we skip the page status check for hwpoisoned pages *not* to
: prevent memory offlining for memblocks with hwpoisoned pages.  That
: means that hwpoisoned pages can be offlined.
: 
: And another reason to clear PageReserved is that we could reuse the
: hwpoisoned page after onlining back with replacing the broken DIMM.  In
: this usecase, we first do unpoisoning to clear PageHWPoison, but it
: doesn't work if PageReserved is set.  My simple testing shows the BUG
: below in unpoisoning (without the ClearPageReserved):
: 
:   Unpoison: Software-unpoisoned page 0x18000
:   BUG: Bad page state in process page-types  pfn:18000
:   page:ffffda5440600000 count:0 mapcount:0 mapping:          (null) index:0x70006b599
:   flags: 0x1fffc00004081a(error|uptodate|dirty|reserved|swapbacked)
:   raw: 001fffc00004081a 0000000000000000 000000070006b599 00000000ffffffff
:   raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
:   page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
:   bad because of flags: 0x800(reserved)

Link: http://lkml.kernel.org/r/1493130472-22843-3-git-send-email-ldufour@linux.vnet.ibm.com
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory_hotplug.c |    4 ++++
 1 file changed, 4 insertions(+)

diff -puN mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages mm/memory_hotplug.c
--- a/mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages
+++ a/mm/memory_hotplug.c
@@ -696,6 +696,10 @@ static int online_pages_range(unsigned l
 	if (PageReserved(pfn_to_page(start_pfn)))
 		for (i = 0; i < nr_pages; i++) {
 			page = pfn_to_page(start_pfn + i);
+			if (PageHWPoison(page)) {
+				ClearPageReserved(page);
+				continue;
+			}
 			(*online_page_callback)(page);
 			onlined_pages++;
 		}
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
  2018-01-17 23:03           ` Andrew Morton
@ 2018-01-23 18:15             ` Laurent Dufour
  -1 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2018-01-23 18:15 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko
  Cc: Naoya Horiguchi, Balbir Singh, linux-kernel, linux-mm, Wen Congyang

Hi Andrew,

On 18/01/2018 00:03, Andrew Morton wrote:
> On Fri, 28 Apr 2017 08:30:48 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> 
>> On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
>>> On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
>>>> On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
>>>>> The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
>>>>> offlining pages") skip the HWPoisoned pages when offlining pages, but
>>>>> this should be skipped when onlining the pages too.
>>>>>
>>>>> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
>>>>> ---
>>>>>  mm/memory_hotplug.c | 4 ++++
>>>>>  1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>>> index 6fa7208bcd56..741ddb50e7d2 100644
>>>>> --- a/mm/memory_hotplug.c
>>>>> +++ b/mm/memory_hotplug.c
>>>>> @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>>>>>  	if (PageReserved(pfn_to_page(start_pfn)))
>>>>>  		for (i = 0; i < nr_pages; i++) {
>>>>>  			page = pfn_to_page(start_pfn + i);
>>>>> +			if (PageHWPoison(page)) {
>>>>> +				ClearPageReserved(page);
>>>>
>>>> Why do we clear page reserved? Also if the page is marked PageHWPoison, it
>>>> was never offlined to begin with? Or do you expect this to be set on newly
>>>> hotplugged memory? Also don't we need to skip the entire pageblock?
>>>
>>> If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
>>> that we skip the page status check for hwpoisoned pages *not* to prevent
>>> memory offlining for memblocks with hwpoisoned pages. That means that
>>> hwpoisoned pages can be offlined.
>>
>> Is this patch actually correct? I am trying to wrap my head around it
>> but it smells like it tries to avoid the problem rather than fix it
>> properly. I might be wrong here of course but to me it sounds like
>> poisoned page should simply be offlined and keep its poison state all
>> the time. If the memory is hot-removed and added again we have lost the
>> struct page along with the state which is the expected behavior. If it
>> is still broken we will re-poison it.
>>
>> Anyway a patch to skip over poisoned pages during online makes perfect
>> sense to me. The PageReserved fiddling around much less so.
>>
>> Or am I missing something. Let's CC Wen Congyang for the clarification
>> here.
> 
> Wen Congyang appears to have disappeared and this fix isn't yet
> finalized.  Can we all please revisit it and have a think about
> Michal's questions?

I tried to recreate the original issue, but there were a lot of changes
done in this area since the last April.

I was not able to offline a poisoned page because isolate_movable_page() is
failing. I'll investigate that further...

Cheers,
Laurent.


> Thanks.
> 
> 
> From: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> Subject: mm: skip HWPoisoned pages when onlining pages
> 
> b023f46813cd ("memory-hotplug: skip HWPoisoned page when offlining pages")
> skipped the HWPoisoned pages when offlining pages, but this should be
> skipped when onlining the pages too.
> 
> n-horiguchi@ah.jp.nec.com said:
> 
> : If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd
> : means that we skip the page status check for hwpoisoned pages *not* to
> : prevent memory offlining for memblocks with hwpoisoned pages.  That
> : means that hwpoisoned pages can be offlined.
> : 
> : And another reason to clear PageReserved is that we could reuse the
> : hwpoisoned page after onlining back with replacing the broken DIMM.  In
> : this usecase, we first do unpoisoning to clear PageHWPoison, but it
> : doesn't work if PageReserved is set.  My simple testing shows the BUG
> : below in unpoisoning (without the ClearPageReserved):
> : 
> :   Unpoison: Software-unpoisoned page 0x18000
> :   BUG: Bad page state in process page-types  pfn:18000
> :   page:ffffda5440600000 count:0 mapcount:0 mapping:          (null) index:0x70006b599
> :   flags: 0x1fffc00004081a(error|uptodate|dirty|reserved|swapbacked)
> :   raw: 001fffc00004081a 0000000000000000 000000070006b599 00000000ffffffff
> :   raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
> :   page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> :   bad because of flags: 0x800(reserved)
> 
> Link: http://lkml.kernel.org/r/1493130472-22843-3-git-send-email-ldufour@linux.vnet.ibm.com
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Andrey Vagin <avagin@openvz.org>
> Cc: Glauber Costa <glommer@openvz.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  mm/memory_hotplug.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff -puN mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages mm/memory_hotplug.c
> --- a/mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages
> +++ a/mm/memory_hotplug.c
> @@ -696,6 +696,10 @@ static int online_pages_range(unsigned l
>  	if (PageReserved(pfn_to_page(start_pfn)))
>  		for (i = 0; i < nr_pages; i++) {
>  			page = pfn_to_page(start_pfn + i);
> +			if (PageHWPoison(page)) {
> +				ClearPageReserved(page);
> +				continue;
> +			}
>  			(*online_page_callback)(page);
>  			onlined_pages++;
>  		}
> _
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
@ 2018-01-23 18:15             ` Laurent Dufour
  0 siblings, 0 replies; 68+ messages in thread
From: Laurent Dufour @ 2018-01-23 18:15 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko
  Cc: Naoya Horiguchi, Balbir Singh, linux-kernel, linux-mm, Wen Congyang

Hi Andrew,

On 18/01/2018 00:03, Andrew Morton wrote:
> On Fri, 28 Apr 2017 08:30:48 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> 
>> On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
>>> On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
>>>> On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
>>>>> The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
>>>>> offlining pages") skip the HWPoisoned pages when offlining pages, but
>>>>> this should be skipped when onlining the pages too.
>>>>>
>>>>> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
>>>>> ---
>>>>>  mm/memory_hotplug.c | 4 ++++
>>>>>  1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>>> index 6fa7208bcd56..741ddb50e7d2 100644
>>>>> --- a/mm/memory_hotplug.c
>>>>> +++ b/mm/memory_hotplug.c
>>>>> @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>>>>>  	if (PageReserved(pfn_to_page(start_pfn)))
>>>>>  		for (i = 0; i < nr_pages; i++) {
>>>>>  			page = pfn_to_page(start_pfn + i);
>>>>> +			if (PageHWPoison(page)) {
>>>>> +				ClearPageReserved(page);
>>>>
>>>> Why do we clear page reserved? Also if the page is marked PageHWPoison, it
>>>> was never offlined to begin with? Or do you expect this to be set on newly
>>>> hotplugged memory? Also don't we need to skip the entire pageblock?
>>>
>>> If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
>>> that we skip the page status check for hwpoisoned pages *not* to prevent
>>> memory offlining for memblocks with hwpoisoned pages. That means that
>>> hwpoisoned pages can be offlined.
>>
>> Is this patch actually correct? I am trying to wrap my head around it
>> but it smells like it tries to avoid the problem rather than fix it
>> properly. I might be wrong here of course but to me it sounds like
>> poisoned page should simply be offlined and keep its poison state all
>> the time. If the memory is hot-removed and added again we have lost the
>> struct page along with the state which is the expected behavior. If it
>> is still broken we will re-poison it.
>>
>> Anyway a patch to skip over poisoned pages during online makes perfect
>> sense to me. The PageReserved fiddling around much less so.
>>
>> Or am I missing something. Let's CC Wen Congyang for the clarification
>> here.
> 
> Wen Congyang appears to have disappeared and this fix isn't yet
> finalized.  Can we all please revisit it and have a think about
> Michal's questions?

I tried to recreate the original issue, but there were a lot of changes
done in this area since the last April.

I was not able to offline a poisoned page because isolate_movable_page() is
failing. I'll investigate that further...

Cheers,
Laurent.


> Thanks.
> 
> 
> From: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> Subject: mm: skip HWPoisoned pages when onlining pages
> 
> b023f46813cd ("memory-hotplug: skip HWPoisoned page when offlining pages")
> skipped the HWPoisoned pages when offlining pages, but this should be
> skipped when onlining the pages too.
> 
> n-horiguchi@ah.jp.nec.com said:
> 
> : If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd
> : means that we skip the page status check for hwpoisoned pages *not* to
> : prevent memory offlining for memblocks with hwpoisoned pages.  That
> : means that hwpoisoned pages can be offlined.
> : 
> : And another reason to clear PageReserved is that we could reuse the
> : hwpoisoned page after onlining back with replacing the broken DIMM.  In
> : this usecase, we first do unpoisoning to clear PageHWPoison, but it
> : doesn't work if PageReserved is set.  My simple testing shows the BUG
> : below in unpoisoning (without the ClearPageReserved):
> : 
> :   Unpoison: Software-unpoisoned page 0x18000
> :   BUG: Bad page state in process page-types  pfn:18000
> :   page:ffffda5440600000 count:0 mapcount:0 mapping:          (null) index:0x70006b599
> :   flags: 0x1fffc00004081a(error|uptodate|dirty|reserved|swapbacked)
> :   raw: 001fffc00004081a 0000000000000000 000000070006b599 00000000ffffffff
> :   raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
> :   page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> :   bad because of flags: 0x800(reserved)
> 
> Link: http://lkml.kernel.org/r/1493130472-22843-3-git-send-email-ldufour@linux.vnet.ibm.com
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Andrey Vagin <avagin@openvz.org>
> Cc: Glauber Costa <glommer@openvz.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  mm/memory_hotplug.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff -puN mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages mm/memory_hotplug.c
> --- a/mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages
> +++ a/mm/memory_hotplug.c
> @@ -696,6 +696,10 @@ static int online_pages_range(unsigned l
>  	if (PageReserved(pfn_to_page(start_pfn)))
>  		for (i = 0; i < nr_pages; i++) {
>  			page = pfn_to_page(start_pfn + i);
> +			if (PageHWPoison(page)) {
> +				ClearPageReserved(page);
> +				continue;
> +			}
>  			(*online_page_callback)(page);
>  			onlined_pages++;
>  		}
> _
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2018-01-23 18:15 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-25 14:27 [PATCH v2 0/2] BUG raised when onlining HWPoisoned page Laurent Dufour
2017-04-25 14:27 ` Laurent Dufour
2017-04-25 14:27 ` [PATCH v2 1/2] mm: Uncharge poisoned pages Laurent Dufour
2017-04-25 14:27   ` Laurent Dufour
2017-04-25 23:48   ` Naoya Horiguchi
2017-04-25 23:48     ` Naoya Horiguchi
2017-04-26  1:54   ` Balbir Singh
2017-04-26  1:54     ` Balbir Singh
2017-04-26  2:34     ` Naoya Horiguchi
2017-04-26  2:34       ` Naoya Horiguchi
2017-04-26  3:45       ` Balbir Singh
2017-04-26  3:45         ` Balbir Singh
2017-04-26  4:46         ` Naoya Horiguchi
2017-04-26  4:46           ` Naoya Horiguchi
2017-04-26  8:59           ` Balbir Singh
2017-04-26  8:59             ` Balbir Singh
2017-04-28  9:32             ` Laurent Dufour
2017-04-28  9:32               ` Laurent Dufour
2017-04-27 14:37   ` Michal Hocko
2017-04-27 14:37     ` Michal Hocko
2017-04-27 20:51     ` Andi Kleen
2017-04-27 20:51       ` Andi Kleen
2017-04-28  6:07       ` Michal Hocko
2017-04-28  6:07         ` Michal Hocko
2017-04-28  7:31         ` Michal Hocko
2017-04-28  7:31           ` Michal Hocko
2017-04-28  9:17           ` Laurent Dufour
2017-04-28  9:17             ` Laurent Dufour
2017-04-28 13:48             ` Michal Hocko
2017-04-28 13:48               ` Michal Hocko
2017-05-02 14:59               ` Laurent Dufour
2017-05-02 14:59                 ` Laurent Dufour
2017-05-02 18:55                 ` Michal Hocko
2017-05-02 18:55                   ` Michal Hocko
2017-05-03 11:34                   ` Laurent Dufour
2017-05-03 11:34                     ` Laurent Dufour
2017-05-04  1:21                   ` Balbir Singh
2017-05-04  1:21                     ` Balbir Singh
2017-05-08 10:42                     ` Laurent Dufour
2017-05-08 10:42                       ` Laurent Dufour
2017-05-09  1:41                       ` Balbir Singh
2017-05-09  1:41                         ` Balbir Singh
2017-05-08  2:58                   ` Naoya Horiguchi
2017-05-08  2:58                     ` Naoya Horiguchi
2017-05-09  9:18                     ` Michal Hocko
2017-05-09  9:18                       ` Michal Hocko
2017-05-09 22:59                       ` Naoya Horiguchi
2017-05-09 22:59                         ` Naoya Horiguchi
2017-04-25 14:27 ` [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages Laurent Dufour
2017-04-25 14:27   ` Laurent Dufour
2017-04-26  2:10   ` Balbir Singh
2017-04-26  2:10     ` Balbir Singh
2017-04-26  3:13     ` Naoya Horiguchi
2017-04-26  3:13       ` Naoya Horiguchi
2017-04-28  2:51       ` Balbir Singh
2017-04-28  2:51         ` Balbir Singh
2017-04-28  6:30       ` Michal Hocko
2017-04-28  6:30         ` Michal Hocko
2017-04-28  6:50         ` Michal Hocko
2017-04-28  6:50           ` Michal Hocko
2017-04-28  6:51           ` Michal Hocko
2017-04-28  6:51             ` Michal Hocko
2017-05-10  7:41             ` Michal Hocko
2017-05-10  7:41               ` Michal Hocko
2018-01-17 23:03         ` Andrew Morton
2018-01-17 23:03           ` Andrew Morton
2018-01-23 18:15           ` Laurent Dufour
2018-01-23 18:15             ` Laurent Dufour

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.