linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2] virtio_balloon: fix race by fill and leak
@ 2015-12-27 23:35 Minchan Kim
  2015-12-27 23:35 ` [PATCH 2/2] virtio_balloon: fix race between migration and ballooning Minchan Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Minchan Kim @ 2015-12-27 23:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michael S. Tsirkin, linux-mm, linux-kernel, virtualization,
	Konstantin Khlebnikov, Rafael Aquini, Minchan Kim, stable

While working on compaction, I encountered a bug with ballooning.

With repeated inflate/deflate cycles, guest memory (i.e. MemTotal in
/proc/meminfo) decreases and is never recovered.

The reason is that balloon_lock doesn't cover release_pages_balloon,
so struct virtio_balloon fields can be overwritten by a racing
fill_balloon (e.g. the vb->*pfns fields are critical).

This patch fixes it in my test.

Cc: <stable@vger.kernel.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/virtio/virtio_balloon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7efc32945810..7d3e5d0e9aa4 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -209,8 +209,8 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	 */
 	if (vb->num_pfns != 0)
 		tell_host(vb, vb->deflate_vq);
-	mutex_unlock(&vb->balloon_lock);
 	release_pages_balloon(vb);
+	mutex_unlock(&vb->balloon_lock);
 	return num_freed_pages;
 }
 
-- 
1.9.1
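
For readers unfamiliar with the driver, here is a minimal user-space
sketch of the race being closed. It is only a model: pfns/num_pfns stand
in for the vb->pfns scratch state the commit message refers to, the
printf stands in for release_pages_balloon, and none of this is the
actual kernel code.

/* race_sketch.c - build with: cc -pthread -o race_sketch race_sketch.c */
#include <pthread.h>
#include <stdio.h>

#define MAX_PFNS 256

static pthread_mutex_t balloon_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long pfns[MAX_PFNS];    /* shared scratch, like vb->pfns */
static unsigned int num_pfns;           /* like vb->num_pfns */

/* inflate path: rewrites the shared scratch state under balloon_lock */
static void *fill(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&balloon_lock);
        num_pfns = 1;
        pfns[0] = 0xaaaa;
        pthread_mutex_unlock(&balloon_lock);
        return NULL;
}

/* deflate path with the pre-patch ordering */
static void *leak_buggy(void *arg)
{
        unsigned int i;

        (void)arg;
        pthread_mutex_lock(&balloon_lock);
        num_pfns = 2;
        pfns[0] = 0x1111;               /* pages we intend to give back */
        pfns[1] = 0x2222;
        pthread_mutex_unlock(&balloon_lock);

        /*
         * Bug: fill() may run right here and overwrite pfns[]/num_pfns,
         * so the "release" step below walks stale data and the pages we
         * meant to give back are lost, which is how MemTotal shrinks.
         */
        for (i = 0; i < num_pfns; i++)
                printf("releasing pfn %#lx\n", pfns[i]);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, leak_buggy, NULL);
        pthread_create(&b, NULL, fill, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}

Keeping the unlock after the release step, as the one-liner above does
for the real driver, keeps the scratch state stable until the pages are
actually handed back.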



* [PATCH 2/2] virtio_balloon: fix race between migration and ballooning
  2015-12-27 23:35 [PATCH 1/2] virtio_balloon: fix race by fill and leak Minchan Kim
@ 2015-12-27 23:35 ` Minchan Kim
  2015-12-27 23:36   ` Rafael Aquini
  2016-01-01  9:36   ` Michael S. Tsirkin
  2015-12-27 23:36 ` [PATCH 1/2] virtio_balloon: fix race by fill and leak Rafael Aquini
  2016-01-01  8:26 ` Michael S. Tsirkin
  2 siblings, 2 replies; 13+ messages in thread
From: Minchan Kim @ 2015-12-27 23:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michael S. Tsirkin, linux-mm, linux-kernel, virtualization,
	Konstantin Khlebnikov, Rafael Aquini, Minchan Kim, stable

In balloon_page_dequeue, pages_lock should cover the whole loop
(i.e. the list_for_each_entry_safe). Otherwise, the cursor page could
be isolated by compaction, and the list_del done by isolation would
poison page->lru.{prev,next}, so the loop could end up dereferencing
a bogus address, as in the oops below. This patch fixes the bug.

general protection fault: 0000 [#1] SMP
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1-access_bit+ #1906
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
RIP: 0010:[<ffffffff8115e754>]  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
RSP: 0018:ffff8800a7fefdc0  EFLAGS: 00010246
RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
FS:  0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
Stack:
 0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060
 0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020
 ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060
Call Trace:
 [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0
 [<ffffffff812c8bc7>] balloon+0x217/0x2a0
 [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0
 [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0
 [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0
 [<ffffffff8105b6e9>] kthread+0xc9/0xe0
 [<ffffffff8105b620>] ? kthread_park+0x60/0x60
 [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70
 [<ffffffff8105b620>] ? kthread_park+0x60/0x60
Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89
RIP  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
 RSP <ffff8800a7fefdc0>
---[ end trace 43cf28060d708d5f ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

Cc: <stable@vger.kernel.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/balloon_compaction.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index d3116be5a00f..300117f1a08f 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -61,6 +61,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
 	bool dequeued_page;
 
 	dequeued_page = false;
+	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
 	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
 		/*
 		 * Block others from accessing the 'page' while we get around
@@ -75,15 +76,14 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
 				continue;
 			}
 #endif
-			spin_lock_irqsave(&b_dev_info->pages_lock, flags);
 			balloon_page_delete(page);
 			__count_vm_event(BALLOON_DEFLATE);
-			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
 			unlock_page(page);
 			dequeued_page = true;
 			break;
 		}
 	}
+	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
 
 	if (!dequeued_page) {
 		/*
-- 
1.9.1
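
As a rough illustration of this failure mode (a sketch only, not the mm
code: node/isolate/dequeue_* are made-up names and the constants merely
mimic LIST_POISON1/2), the old loop walked the list with pages_lock
dropped while isolation unlinked entries under that lock:

#include <pthread.h>
#include <stdio.h>

struct node {
        struct node *prev, *next;
        int pfn;
};

static pthread_mutex_t pages_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node head = { &head, &head, -1 };  /* like b_dev_info->pages */

/* like balloon_page_isolate(): unlink under pages_lock; the kernel's
 * list_del() then leaves poison values in prev/next (mimicked here) */
static void isolate(struct node *n)
{
        pthread_mutex_lock(&pages_lock);
        n->prev->next = n->next;
        n->next->prev = n->prev;
        n->next = (struct node *)0x100;         /* mimics LIST_POISON1 */
        n->prev = (struct node *)0x200;         /* mimics LIST_POISON2 */
        pthread_mutex_unlock(&pages_lock);
}

/* pre-patch walk: pages_lock is not held, so a concurrent isolate() can
 * poison the cursor's links and the next iteration follows garbage */
static void dequeue_buggy(void)
{
        struct node *n, *tmp;

        for (n = head.next; n != &head; n = tmp) {
                tmp = n->next;                  /* may already be poisoned */
                printf("pfn %d\n", n->pfn);
        }
}

/* patched walk: the whole loop is covered by pages_lock, so isolation
 * cannot unlink entries underneath the cursor */
static void dequeue_fixed(void)
{
        struct node *n;

        pthread_mutex_lock(&pages_lock);
        for (n = head.next; n != &head; n = n->next)
                printf("pfn %d\n", n->pfn);
        pthread_mutex_unlock(&pages_lock);
}

int main(void)
{
        struct node page = { &head, &head, 42 };

        head.next = head.prev = &page;          /* single-entry list */
        dequeue_fixed();
        isolate(&page);
        dequeue_fixed();                        /* empty again, still safe */
        (void)dequeue_buggy;                    /* unsafe variant, for contrast */
        return 0;
}

Minchan's pseudo-code later in the thread makes the same interleaving
point with the real function names.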



* Re: [PATCH 1/2] virtio_balloon: fix race by fill and leak
  2015-12-27 23:35 [PATCH 1/2] virtio_balloon: fix race by fill and leak Minchan Kim
  2015-12-27 23:35 ` [PATCH 2/2] virtio_balloon: fix race between migration and ballooning Minchan Kim
@ 2015-12-27 23:36 ` Rafael Aquini
  2016-01-01  8:26 ` Michael S. Tsirkin
  2 siblings, 0 replies; 13+ messages in thread
From: Rafael Aquini @ 2015-12-27 23:36 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Michael S. Tsirkin, linux-mm, linux-kernel,
	virtualization, Konstantin Khlebnikov, stable

On Mon, Dec 28, 2015 at 08:35:12AM +0900, Minchan Kim wrote:
> During my compaction-related stuff, I encountered a bug
> with ballooning.
> 
> With repeated inflating and deflating cycle, guest memory(
> ie, cat /proc/meminfo | grep MemTotal) is decreased and
> couldn't be recovered.
> 
> The reason is balloon_lock doesn't cover release_pages_balloon
> so struct virtio_balloon fields could be overwritten by race
> of fill_balloon(e,g, vb->*pfns could be critical).
> 
> This patch fixes it in my test.
> 
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  drivers/virtio/virtio_balloon.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 7efc32945810..7d3e5d0e9aa4 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -209,8 +209,8 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>  	 */
>  	if (vb->num_pfns != 0)
>  		tell_host(vb, vb->deflate_vq);
> -	mutex_unlock(&vb->balloon_lock);
>  	release_pages_balloon(vb);
> +	mutex_unlock(&vb->balloon_lock);
>  	return num_freed_pages;
>  }
>  
> -- 
> 1.9.1
> 
Acked-by: Rafael Aquini <aquini@redhat.com>


* Re: [PATCH 2/2] virtio_balloon: fix race between migration and ballooning
  2015-12-27 23:35 ` [PATCH 2/2] virtio_balloon: fix race between migration and ballooning Minchan Kim
@ 2015-12-27 23:36   ` Rafael Aquini
  2016-01-01  9:36   ` Michael S. Tsirkin
  1 sibling, 0 replies; 13+ messages in thread
From: Rafael Aquini @ 2015-12-27 23:36 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Michael S. Tsirkin, linux-mm, linux-kernel,
	virtualization, Konstantin Khlebnikov, stable

On Mon, Dec 28, 2015 at 08:35:13AM +0900, Minchan Kim wrote:
> In balloon_page_dequeue, pages_lock should cover the loop
> (ie, list_for_each_entry_safe). Otherwise, the cursor page could
> be isolated by compaction and then list_del by isolation could
> poison the page->lru.{prev,next} so the loop finally could
> access wrong address like this. This patch fixes the bug.
> 
> general protection fault: 0000 [#1] SMP
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1-access_bit+ #1906
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
> RIP: 0010:[<ffffffff8115e754>]  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
> RSP: 0018:ffff8800a7fefdc0  EFLAGS: 00010246
> RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
> RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
> RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
> R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
> R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
> FS:  0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
> Stack:
>  0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060
>  0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020
>  ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060
> Call Trace:
>  [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0
>  [<ffffffff812c8bc7>] balloon+0x217/0x2a0
>  [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0
>  [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0
>  [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0
>  [<ffffffff8105b6e9>] kthread+0xc9/0xe0
>  [<ffffffff8105b620>] ? kthread_park+0x60/0x60
>  [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70
>  [<ffffffff8105b620>] ? kthread_park+0x60/0x60
> Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89
> RIP  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
>  RSP <ffff8800a7fefdc0>
> ---[ end trace 43cf28060d708d5f ]---
> Kernel panic - not syncing: Fatal exception
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled
> 
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  mm/balloon_compaction.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index d3116be5a00f..300117f1a08f 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -61,6 +61,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  	bool dequeued_page;
>  
>  	dequeued_page = false;
> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>  	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
>  		/*
>  		 * Block others from accessing the 'page' while we get around
> @@ -75,15 +76,14 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  				continue;
>  			}
>  #endif
> -			spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>  			balloon_page_delete(page);
>  			__count_vm_event(BALLOON_DEFLATE);
> -			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>  			unlock_page(page);
>  			dequeued_page = true;
>  			break;
>  		}
>  	}
> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>  
>  	if (!dequeued_page) {
>  		/*
> -- 
> 1.9.1
> 
Acked-by: Rafael Aquini <aquini@redhat.com>


* Re: [PATCH 1/2] virtio_balloon: fix race by fill and leak
  2015-12-27 23:35 [PATCH 1/2] virtio_balloon: fix race by fill and leak Minchan Kim
  2015-12-27 23:35 ` [PATCH 2/2] virtio_balloon: fix race between migration and ballooning Minchan Kim
  2015-12-27 23:36 ` [PATCH 1/2] virtio_balloon: fix race by fill and leak Rafael Aquini
@ 2016-01-01  8:26 ` Michael S. Tsirkin
  2 siblings, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2016-01-01  8:26 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, virtualization,
	Konstantin Khlebnikov, Rafael Aquini, stable

On Mon, Dec 28, 2015 at 08:35:12AM +0900, Minchan Kim wrote:
> During my compaction-related stuff, I encountered a bug
> with ballooning.
> 
> With repeated inflating and deflating cycle, guest memory(
> ie, cat /proc/meminfo | grep MemTotal) is decreased and
> couldn't be recovered.
> 
> The reason is balloon_lock doesn't cover release_pages_balloon
> so struct virtio_balloon fields could be overwritten by race
> of fill_balloon(e,g, vb->*pfns could be critical).
> 
> This patch fixes it in my test.
> 
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Minchan Kim <minchan@kernel.org>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/virtio/virtio_balloon.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 7efc32945810..7d3e5d0e9aa4 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -209,8 +209,8 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>  	 */
>  	if (vb->num_pfns != 0)
>  		tell_host(vb, vb->deflate_vq);
> -	mutex_unlock(&vb->balloon_lock);
>  	release_pages_balloon(vb);
> +	mutex_unlock(&vb->balloon_lock);
>  	return num_freed_pages;
>  }
>  
> -- 
> 1.9.1


* Re: [PATCH 2/2] virtio_balloon: fix race between migration and ballooning
  2015-12-27 23:35 ` [PATCH 2/2] virtio_balloon: fix race between migration and ballooning Minchan Kim
  2015-12-27 23:36   ` Rafael Aquini
@ 2016-01-01  9:36   ` Michael S. Tsirkin
  2016-01-04  0:27     ` Minchan Kim
  2016-01-08 19:56     ` Rafael Aquini
  1 sibling, 2 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2016-01-01  9:36 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, virtualization,
	Konstantin Khlebnikov, Rafael Aquini, stable

On Mon, Dec 28, 2015 at 08:35:13AM +0900, Minchan Kim wrote:
> In balloon_page_dequeue, pages_lock should cover the loop
> (ie, list_for_each_entry_safe). Otherwise, the cursor page could
> be isolated by compaction and then list_del by isolation could
> poison the page->lru.{prev,next} so the loop finally could
> access wrong address like this. This patch fixes the bug.
> 
> general protection fault: 0000 [#1] SMP
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1-access_bit+ #1906
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
> RIP: 0010:[<ffffffff8115e754>]  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
> RSP: 0018:ffff8800a7fefdc0  EFLAGS: 00010246
> RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
> RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
> RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
> R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
> R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
> FS:  0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
> Stack:
>  0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060
>  0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020
>  ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060
> Call Trace:
>  [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0
>  [<ffffffff812c8bc7>] balloon+0x217/0x2a0
>  [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0
>  [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0
>  [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0
>  [<ffffffff8105b6e9>] kthread+0xc9/0xe0
>  [<ffffffff8105b620>] ? kthread_park+0x60/0x60
>  [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70
>  [<ffffffff8105b620>] ? kthread_park+0x60/0x60
> Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89
> RIP  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
>  RSP <ffff8800a7fefdc0>
> ---[ end trace 43cf28060d708d5f ]---
> Kernel panic - not syncing: Fatal exception
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled
> 
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  mm/balloon_compaction.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index d3116be5a00f..300117f1a08f 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -61,6 +61,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  	bool dequeued_page;
>  
>  	dequeued_page = false;
> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>  	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
>  		/*
>  		 * Block others from accessing the 'page' while we get around
> @@ -75,15 +76,14 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  				continue;
>  			}
>  #endif
> -			spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>  			balloon_page_delete(page);
>  			__count_vm_event(BALLOON_DEFLATE);
> -			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>  			unlock_page(page);
>  			dequeued_page = true;
>  			break;
>  		}
>  	}
> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>  
>  	if (!dequeued_page) {
>  		/*

I think this will cause deadlocks.

pages_lock now nests within page lock, balloon_page_putback
nests them in the reverse order.

Did you test this with lockdep? You really should for
locking changes, and I'd expect it to warn about this.

Also, there's another issue there, I think: after isolation the page
could also get freed before we try to lock it.

We really must take a page reference before touching
the page.

I think we need something like the below to fix this issue.
Could you please try this out, and send Tested-by?
I will repost as a proper patch if this works for you.


diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index d3116be..66d69c5 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -56,12 +56,34 @@ EXPORT_SYMBOL_GPL(balloon_page_enqueue);
  */
 struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
 {
-	struct page *page, *tmp;
+	struct page *page;
 	unsigned long flags;
 	bool dequeued_page;
+	LIST_HEAD(processed); /* protected by b_dev_info->pages_lock */
 
 	dequeued_page = false;
-	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
+	/*
+	 * We need to go over b_dev_info->pages and lock each page,
+	 * but b_dev_info->pages_lock must nest within page lock.
+	 *
+	 * To make this safe, remove each page from b_dev_info->pages list
+	 * under b_dev_info->pages_lock, then drop this lock. Once list is
+	 * empty, re-add them also under b_dev_info->pages_lock.
+	 */
+	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
+	while (!list_empty(&b_dev_info->pages)) {
+		page = list_first_entry(&b_dev_info->pages, typeof(*page), lru);
+		/* move to processed list to avoid going over it another time */
+		list_move(&page->lru, &processed);
+
+		if (!get_page_unless_zero(page))
+			continue;
+		/*
+		 * pages_lock nests within page lock,
+		 * so drop it before trylock_page
+		 */
+		spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+
 		/*
 		 * Block others from accessing the 'page' while we get around
 		 * establishing additional references and preparing the 'page'
@@ -72,6 +94,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
 			if (!PagePrivate(page)) {
 				/* raced with isolation */
 				unlock_page(page);
+				put_page(page);
 				continue;
 			}
 #endif
@@ -80,11 +103,18 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
 			__count_vm_event(BALLOON_DEFLATE);
 			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
 			unlock_page(page);
+			put_page(page);
 			dequeued_page = true;
 			break;
 		}
+		put_page(page);
+		spin_lock_irqsave(&b_dev_info->pages_lock, flags);
 	}
 
+	/* re-add remaining entries */
+	list_splice(&processed, &b_dev_info->pages);
+	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+
 	if (!dequeued_page) {
 		/*
 		 * If we are unable to dequeue a balloon page because the page
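
To make the nesting concern above easier to see, here is a minimal
user-space analogue (a sketch only: pthread mutexes stand in for the
per-page lock and b_dev_info->pages_lock, and the ordering shown for
the migration side is my reading of the concern above, not a quote of
the mm code):

#include <pthread.h>

static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;  /* per-page lock */
static pthread_mutex_t pages_lock = PTHREAD_MUTEX_INITIALIZER; /* b_dev_info->pages_lock */

/* migration side (isolate/putback): page lock outside, pages_lock inside */
static void migration_side(void)
{
        pthread_mutex_lock(&page_lock);
        pthread_mutex_lock(&pages_lock);
        /* ... unlink or re-insert the page ... */
        pthread_mutex_unlock(&pages_lock);
        pthread_mutex_unlock(&page_lock);
}

/* dequeue as patched: pages_lock outside, page lock attempted inside,
 * i.e. the reverse nesting pointed out above, taken with a trylock */
static void dequeue_side(void)
{
        pthread_mutex_lock(&pages_lock);
        if (pthread_mutex_trylock(&page_lock) == 0) {
                /* ... dequeue the page ... */
                pthread_mutex_unlock(&page_lock);
        }
        pthread_mutex_unlock(&pages_lock);
}

int main(void)
{
        migration_side();
        dequeue_side();
        return 0;
}

Whether the trylock on the dequeue side is enough to rule a deadlock
out is exactly what the rest of the thread goes on to discuss.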


* Re: [PATCH 2/2] virtio_balloon: fix race between migration and ballooning
  2016-01-01  9:36   ` Michael S. Tsirkin
@ 2016-01-04  0:27     ` Minchan Kim
  2016-01-10 21:40       ` Michael S. Tsirkin
  2016-01-08 19:56     ` Rafael Aquini
  1 sibling, 1 reply; 13+ messages in thread
From: Minchan Kim @ 2016-01-04  0:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Andrew Morton, linux-mm, linux-kernel, virtualization,
	Konstantin Khlebnikov, Rafael Aquini, stable

On Fri, Jan 01, 2016 at 11:36:13AM +0200, Michael S. Tsirkin wrote:
> On Mon, Dec 28, 2015 at 08:35:13AM +0900, Minchan Kim wrote:
> > In balloon_page_dequeue, pages_lock should cover the loop
> > (ie, list_for_each_entry_safe). Otherwise, the cursor page could
> > be isolated by compaction and then list_del by isolation could
> > poison the page->lru.{prev,next} so the loop finally could
> > access wrong address like this. This patch fixes the bug.
> > 
> > general protection fault: 0000 [#1] SMP
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in:
> > CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1-access_bit+ #1906
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
> > RIP: 0010:[<ffffffff8115e754>]  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
> > RSP: 0018:ffff8800a7fefdc0  EFLAGS: 00010246
> > RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
> > RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
> > RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
> > R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
> > R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
> > FS:  0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
> > Stack:
> >  0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060
> >  0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020
> >  ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060
> > Call Trace:
> >  [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0
> >  [<ffffffff812c8bc7>] balloon+0x217/0x2a0
> >  [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0
> >  [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0
> >  [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0
> >  [<ffffffff8105b6e9>] kthread+0xc9/0xe0
> >  [<ffffffff8105b620>] ? kthread_park+0x60/0x60
> >  [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70
> >  [<ffffffff8105b620>] ? kthread_park+0x60/0x60
> > Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89
> > RIP  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
> >  RSP <ffff8800a7fefdc0>
> > ---[ end trace 43cf28060d708d5f ]---
> > Kernel panic - not syncing: Fatal exception
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Kernel Offset: disabled
> > 
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  mm/balloon_compaction.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> > index d3116be5a00f..300117f1a08f 100644
> > --- a/mm/balloon_compaction.c
> > +++ b/mm/balloon_compaction.c
> > @@ -61,6 +61,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> >  	bool dequeued_page;
> >  
> >  	dequeued_page = false;
> > +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >  	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> >  		/*
> >  		 * Block others from accessing the 'page' while we get around
> > @@ -75,15 +76,14 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> >  				continue;
> >  			}
> >  #endif
> > -			spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >  			balloon_page_delete(page);
> >  			__count_vm_event(BALLOON_DEFLATE);
> > -			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> >  			unlock_page(page);
> >  			dequeued_page = true;
> >  			break;
> >  		}
> >  	}
> > +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> >  
> >  	if (!dequeued_page) {
> >  		/*
> 
> I think this will cause deadlocks.
> 
> pages_lock now nests within page lock, balloon_page_putback
> nests them in the reverse order.

In balloon_page_dequeue, we use trylock_page, so I don't think it can
deadlock.

> 
> Did you test this with lockdep? You really should for
> locking changes, and I'd expect it to warn about this.

I did, but I didn't see any warning.

> 
> Also, there's another issue there I think: after isolation page could
> also get freed before we try to lock it.

If a page was isolated, it should no longer be on the b_dev_info->pages
list, so balloon_page_dequeue cannot see it.
Am I missing something?

> 
> We really must take a page reference before touching
> the page.
> 
> I think we need something like the below to fix this issue.
> Could you please try this out, and send Tested-by?
> I will repost as a proper patch if this works for you.

If I have missed something, I am happy to retest and report the result
when I'm back in the office.

Thanks.

> 
> 
> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index d3116be..66d69c5 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -56,12 +56,34 @@ EXPORT_SYMBOL_GPL(balloon_page_enqueue);
>   */
>  struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  {
> -	struct page *page, *tmp;
> +	struct page *page;
>  	unsigned long flags;
>  	bool dequeued_page;
> +	LIST_HEAD(processed); /* protected by b_dev_info->pages_lock */
>  
>  	dequeued_page = false;
> -	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> +	/*
> +	 * We need to go over b_dev_info->pages and lock each page,
> +	 * but b_dev_info->pages_lock must nest within page lock.
> +	 *
> +	 * To make this safe, remove each page from b_dev_info->pages list
> +	 * under b_dev_info->pages_lock, then drop this lock. Once list is
> +	 * empty, re-add them also under b_dev_info->pages_lock.
> +	 */
> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> +	while (!list_empty(&b_dev_info->pages)) {
> +		page = list_first_entry(&b_dev_info->pages, typeof(*page), lru);
> +		/* move to processed list to avoid going over it another time */
> +		list_move(&page->lru, &processed);
> +
> +		if (!get_page_unless_zero(page))
> +			continue;
> +		/*
> +		 * pages_lock nests within page lock,
> +		 * so drop it before trylock_page
> +		 */
> +		spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +
>  		/*
>  		 * Block others from accessing the 'page' while we get around
>  		 * establishing additional references and preparing the 'page'
> @@ -72,6 +94,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  			if (!PagePrivate(page)) {
>  				/* raced with isolation */
>  				unlock_page(page);
> +				put_page(page);
>  				continue;
>  			}
>  #endif
> @@ -80,11 +103,18 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  			__count_vm_event(BALLOON_DEFLATE);
>  			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>  			unlock_page(page);
> +			put_page(page);
>  			dequeued_page = true;
>  			break;
>  		}
> +		put_page(page);
> +		spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>  	}
>  
> +	/* re-add remaining entries */
> +	list_splice(&processed, &b_dev_info->pages);
> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +
>  	if (!dequeued_page) {
>  		/*
>  		 * If we are unable to dequeue a balloon page because the page

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 2/2] virtio_balloon: fix race between migration and ballooning
  2016-01-01  9:36   ` Michael S. Tsirkin
  2016-01-04  0:27     ` Minchan Kim
@ 2016-01-08 19:56     ` Rafael Aquini
  2016-01-08 23:43       ` Minchan Kim
  2016-01-09 21:43       ` Michael S. Tsirkin
  1 sibling, 2 replies; 13+ messages in thread
From: Rafael Aquini @ 2016-01-08 19:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Minchan Kim, Andrew Morton, linux-mm, linux-kernel,
	virtualization, Konstantin Khlebnikov, stable

On Fri, Jan 01, 2016 at 11:36:13AM +0200, Michael S. Tsirkin wrote:
> On Mon, Dec 28, 2015 at 08:35:13AM +0900, Minchan Kim wrote:
> > In balloon_page_dequeue, pages_lock should cover the loop
> > (ie, list_for_each_entry_safe). Otherwise, the cursor page could
> > be isolated by compaction and then list_del by isolation could
> > poison the page->lru.{prev,next} so the loop finally could
> > access wrong address like this. This patch fixes the bug.
> > 
> > general protection fault: 0000 [#1] SMP
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Modules linked in:
> > CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1-access_bit+ #1906
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
> > RIP: 0010:[<ffffffff8115e754>]  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
> > RSP: 0018:ffff8800a7fefdc0  EFLAGS: 00010246
> > RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
> > RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
> > RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
> > R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
> > R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
> > FS:  0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
> > Stack:
> >  0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060
> >  0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020
> >  ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060
> > Call Trace:
> >  [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0
> >  [<ffffffff812c8bc7>] balloon+0x217/0x2a0
> >  [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0
> >  [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0
> >  [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0
> >  [<ffffffff8105b6e9>] kthread+0xc9/0xe0
> >  [<ffffffff8105b620>] ? kthread_park+0x60/0x60
> >  [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70
> >  [<ffffffff8105b620>] ? kthread_park+0x60/0x60
> > Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89
> > RIP  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
> >  RSP <ffff8800a7fefdc0>
> > ---[ end trace 43cf28060d708d5f ]---
> > Kernel panic - not syncing: Fatal exception
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > Kernel Offset: disabled
> > 
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  mm/balloon_compaction.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> > index d3116be5a00f..300117f1a08f 100644
> > --- a/mm/balloon_compaction.c
> > +++ b/mm/balloon_compaction.c
> > @@ -61,6 +61,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> >  	bool dequeued_page;
> >  
> >  	dequeued_page = false;
> > +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >  	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> >  		/*
> >  		 * Block others from accessing the 'page' while we get around
> > @@ -75,15 +76,14 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> >  				continue;
> >  			}
> >  #endif
> > -			spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >  			balloon_page_delete(page);
> >  			__count_vm_event(BALLOON_DEFLATE);
> > -			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> >  			unlock_page(page);
> >  			dequeued_page = true;
> >  			break;
> >  		}
> >  	}
> > +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> >  
> >  	if (!dequeued_page) {
> >  		/*
> 
> I think this will cause deadlocks.
> 
> pages_lock now nests within page lock, balloon_page_putback
> nests them in the reverse order.
> 
> Did you test this with lockdep? You really should for
> locking changes, and I'd expect it to warn about this.
> 
> Also, there's another issue there I think: after isolation page could
> also get freed before we try to lock it.
> 
> We really must take a page reference before touching
> the page.
> 
> I think we need something like the below to fix this issue.
> Could you please try this out, and send Tested-by?
> I will repost as a proper patch if this works for you.
>

Nice catch! Thanks for spotting it. I just have one minor nit. See
below
 
> 
> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index d3116be..66d69c5 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -56,12 +56,34 @@ EXPORT_SYMBOL_GPL(balloon_page_enqueue);
>   */
>  struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  {
> -	struct page *page, *tmp;
> +	struct page *page;
>  	unsigned long flags;
>  	bool dequeued_page;
> +	LIST_HEAD(processed); /* protected by b_dev_info->pages_lock */
>  
>  	dequeued_page = false;
> -	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> +	/*
> +	 * We need to go over b_dev_info->pages and lock each page,
> +	 * but b_dev_info->pages_lock must nest within page lock.
> +	 *
> +	 * To make this safe, remove each page from b_dev_info->pages list
> +	 * under b_dev_info->pages_lock, then drop this lock. Once list is
> +	 * empty, re-add them also under b_dev_info->pages_lock.
> +	 */
> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> +	while (!list_empty(&b_dev_info->pages)) {
> +		page = list_first_entry(&b_dev_info->pages, typeof(*page), lru);
> +		/* move to processed list to avoid going over it another time */
> +		list_move(&page->lru, &processed);
> +
> +		if (!get_page_unless_zero(page))
> +			continue;
> +		/*
> +		 * pages_lock nests within page lock,
> +		 * so drop it before trylock_page
> +		 */
> +		spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +
>  		/*
>  		 * Block others from accessing the 'page' while we get around
>  		 * establishing additional references and preparing the 'page'
> @@ -72,6 +94,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  			if (!PagePrivate(page)) {
>  				/* raced with isolation */
>  				unlock_page(page);
> +				put_page(page);
>  				continue;
>  			}
>  #endif
> @@ -80,11 +103,18 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  			__count_vm_event(BALLOON_DEFLATE);
>  			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>  			unlock_page(page);
> +			put_page(page);
>  			dequeued_page = true;
>  			break;
                        ^^^^[1]

>  		}
> +		put_page(page);
> +		spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>  	}
>  
> +	/* re-add remaining entries */
> +	list_splice(&processed, &b_dev_info->pages);

By breaking out of the loop at its ordinary and expected exit case [1],
we'll hit list_splice without holding b_dev_info->pages_lock, won't we?

Perhaps by adding the following on top of your patch we can address that:

Cheers!
Rafael
--

diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index 66d69c5..74b3e9c 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -58,7 +58,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
 {
        struct page *page;
        unsigned long flags;
-       bool dequeued_page;
+       bool dequeued_page, locked;
        LIST_HEAD(processed); /* protected by b_dev_info->pages_lock */
 
        dequeued_page = false;
@@ -105,13 +105,17 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
                        unlock_page(page);
                        put_page(page);
                        dequeued_page = true;
+                       locked = false;
                        break;
                }
                put_page(page);
                spin_lock_irqsave(&b_dev_info->pages_lock, flags);
+               locked = true;
        }
 
        /* re-add remaining entries */
+       if (!locked)
+               spin_lock_irqsave(&b_dev_info->pages_lock, flags);
        list_splice(&processed, &b_dev_info->pages);
        spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);


* Re: [PATCH 2/2] virtio_balloon: fix race between migration and ballooning
  2016-01-08 19:56     ` Rafael Aquini
@ 2016-01-08 23:43       ` Minchan Kim
  2016-01-09 21:43       ` Michael S. Tsirkin
  1 sibling, 0 replies; 13+ messages in thread
From: Minchan Kim @ 2016-01-08 23:43 UTC (permalink / raw)
  To: Rafael Aquini
  Cc: Michael S. Tsirkin, Andrew Morton, linux-mm, linux-kernel,
	virtualization, Konstantin Khlebnikov, stable

On Fri, Jan 08, 2016 at 02:56:14PM -0500, Rafael Aquini wrote:
> On Fri, Jan 01, 2016 at 11:36:13AM +0200, Michael S. Tsirkin wrote:
> > On Mon, Dec 28, 2015 at 08:35:13AM +0900, Minchan Kim wrote:
> > > In balloon_page_dequeue, pages_lock should cover the loop
> > > (ie, list_for_each_entry_safe). Otherwise, the cursor page could
> > > be isolated by compaction and then list_del by isolation could
> > > poison the page->lru.{prev,next} so the loop finally could
> > > access wrong address like this. This patch fixes the bug.
> > > 
> > > general protection fault: 0000 [#1] SMP
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Modules linked in:
> > > CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1-access_bit+ #1906
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
> > > RIP: 0010:[<ffffffff8115e754>]  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
> > > RSP: 0018:ffff8800a7fefdc0  EFLAGS: 00010246
> > > RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
> > > RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
> > > RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
> > > R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
> > > R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
> > > FS:  0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
> > > Stack:
> > >  0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060
> > >  0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020
> > >  ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060
> > > Call Trace:
> > >  [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0
> > >  [<ffffffff812c8bc7>] balloon+0x217/0x2a0
> > >  [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0
> > >  [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0
> > >  [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0
> > >  [<ffffffff8105b6e9>] kthread+0xc9/0xe0
> > >  [<ffffffff8105b620>] ? kthread_park+0x60/0x60
> > >  [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70
> > >  [<ffffffff8105b620>] ? kthread_park+0x60/0x60
> > > Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89
> > > RIP  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
> > >  RSP <ffff8800a7fefdc0>
> > > ---[ end trace 43cf28060d708d5f ]---
> > > Kernel panic - not syncing: Fatal exception
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Kernel Offset: disabled
> > > 
> > > Cc: <stable@vger.kernel.org>
> > > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > > ---
> > >  mm/balloon_compaction.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> > > index d3116be5a00f..300117f1a08f 100644
> > > --- a/mm/balloon_compaction.c
> > > +++ b/mm/balloon_compaction.c
> > > @@ -61,6 +61,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> > >  	bool dequeued_page;
> > >  
> > >  	dequeued_page = false;
> > > +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> > >  	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> > >  		/*
> > >  		 * Block others from accessing the 'page' while we get around
> > > @@ -75,15 +76,14 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> > >  				continue;
> > >  			}
> > >  #endif
> > > -			spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> > >  			balloon_page_delete(page);
> > >  			__count_vm_event(BALLOON_DEFLATE);
> > > -			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> > >  			unlock_page(page);
> > >  			dequeued_page = true;
> > >  			break;
> > >  		}
> > >  	}
> > > +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> > >  
> > >  	if (!dequeued_page) {
> > >  		/*
> > 
> > I think this will cause deadlocks.
> > 
> > pages_lock now nests within page lock, balloon_page_putback
> > nests them in the reverse order.
> > 
> > Did you test this with lockdep? You really should for
> > locking changes, and I'd expect it to warn about this.
> > 
> > Also, there's another issue there I think: after isolation page could
> > also get freed before we try to lock it.
> > 
> > We really must take a page reference before touching
> > the page.
> > 
> > I think we need something like the below to fix this issue.
> > Could you please try this out, and send Tested-by?
> > I will repost as a proper patch if this works for you.
> >
> 
> Nice catch! Thanks for spotting it. I just have one minor nit. See
> below

Hmm, as I replied to mst's mail, I really cannot understand what you
guys are pointing out.

If we used lock_page in balloon_page_dequeue, I would agree it's a
deadlock, but we use trylock_page, so it's not a deadlock.

About the page refcount: we don't need to take a page reference,
because if one of the pages in the list was isolated for migration,
it should no longer be on the b_dev_info->pages list, so
balloon_page_dequeue cannot touch it.

Could you elaborate in more detail if I have missed something?
It's a stable patch, so I want to be careful.

Thanks.

balloon_page_isolate
{
        trylock_page(page)
        spin_lock_irqsave(&b_dev_info->pages_lock)
        list_del(&page->lru);
}

balloon_page_dequeue
{
        spin_lock_irqsave(&b_dev_info->pages_lock)
        list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
                trylock_page(page)
        }

}


* Re: [PATCH 2/2] virtio_balloon: fix race between migration and ballooning
  2016-01-08 19:56     ` Rafael Aquini
  2016-01-08 23:43       ` Minchan Kim
@ 2016-01-09 21:43       ` Michael S. Tsirkin
  2016-01-09 23:03         ` Rafael Aquini
  1 sibling, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2016-01-09 21:43 UTC (permalink / raw)
  To: Rafael Aquini
  Cc: Minchan Kim, Andrew Morton, linux-mm, linux-kernel,
	virtualization, Konstantin Khlebnikov, stable

On Fri, Jan 08, 2016 at 02:56:14PM -0500, Rafael Aquini wrote:
> On Fri, Jan 01, 2016 at 11:36:13AM +0200, Michael S. Tsirkin wrote:
> > On Mon, Dec 28, 2015 at 08:35:13AM +0900, Minchan Kim wrote:
> > > In balloon_page_dequeue, pages_lock should cover the loop
> > > (ie, list_for_each_entry_safe). Otherwise, the cursor page could
> > > be isolated by compaction and then list_del by isolation could
> > > poison the page->lru.{prev,next} so the loop finally could
> > > access wrong address like this. This patch fixes the bug.
> > > 
> > > general protection fault: 0000 [#1] SMP
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Modules linked in:
> > > CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1-access_bit+ #1906
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
> > > RIP: 0010:[<ffffffff8115e754>]  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
> > > RSP: 0018:ffff8800a7fefdc0  EFLAGS: 00010246
> > > RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
> > > RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
> > > RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
> > > R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
> > > R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
> > > FS:  0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
> > > Stack:
> > >  0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060
> > >  0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020
> > >  ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060
> > > Call Trace:
> > >  [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0
> > >  [<ffffffff812c8bc7>] balloon+0x217/0x2a0
> > >  [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0
> > >  [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0
> > >  [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0
> > >  [<ffffffff8105b6e9>] kthread+0xc9/0xe0
> > >  [<ffffffff8105b620>] ? kthread_park+0x60/0x60
> > >  [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70
> > >  [<ffffffff8105b620>] ? kthread_park+0x60/0x60
> > > Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89
> > > RIP  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
> > >  RSP <ffff8800a7fefdc0>
> > > ---[ end trace 43cf28060d708d5f ]---
> > > Kernel panic - not syncing: Fatal exception
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > Kernel Offset: disabled
> > > 
> > > Cc: <stable@vger.kernel.org>
> > > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > > ---
> > >  mm/balloon_compaction.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> > > index d3116be5a00f..300117f1a08f 100644
> > > --- a/mm/balloon_compaction.c
> > > +++ b/mm/balloon_compaction.c
> > > @@ -61,6 +61,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> > >  	bool dequeued_page;
> > >  
> > >  	dequeued_page = false;
> > > +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> > >  	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> > >  		/*
> > >  		 * Block others from accessing the 'page' while we get around
> > > @@ -75,15 +76,14 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> > >  				continue;
> > >  			}
> > >  #endif
> > > -			spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> > >  			balloon_page_delete(page);
> > >  			__count_vm_event(BALLOON_DEFLATE);
> > > -			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> > >  			unlock_page(page);
> > >  			dequeued_page = true;
> > >  			break;
> > >  		}
> > >  	}
> > > +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> > >  
> > >  	if (!dequeued_page) {
> > >  		/*
> > 
> > I think this will cause deadlocks.
> > 
> > pages_lock now nests within page lock, balloon_page_putback
> > nests them in the reverse order.
> > 
> > Did you test this with lockdep? You really should for
> > locking changes, and I'd expect it to warn about this.
> > 
> > Also, there's another issue there I think: after isolation page could
> > also get freed before we try to lock it.
> > 
> > We really must take a page reference before touching
> > the page.
> > 
> > I think we need something like the below to fix this issue.
> > Could you please try this out, and send Tested-by?
> > I will repost as a proper patch if this works for you.
> >
> 
> Nice catch! Thanks for spotting it. I just have one minor nit. See
> below
>  
> > 
> > diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> > index d3116be..66d69c5 100644
> > --- a/mm/balloon_compaction.c
> > +++ b/mm/balloon_compaction.c
> > @@ -56,12 +56,34 @@ EXPORT_SYMBOL_GPL(balloon_page_enqueue);
> >   */
> >  struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> >  {
> > -	struct page *page, *tmp;
> > +	struct page *page;
> >  	unsigned long flags;
> >  	bool dequeued_page;
> > +	LIST_HEAD(processed); /* protected by b_dev_info->pages_lock */
> >  
> >  	dequeued_page = false;
> > -	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> > +	/*
> > +	 * We need to go over b_dev_info->pages and lock each page,
> > +	 * but b_dev_info->pages_lock must nest within page lock.
> > +	 *
> > +	 * To make this safe, remove each page from b_dev_info->pages list
> > +	 * under b_dev_info->pages_lock, then drop this lock. Once list is
> > +	 * empty, re-add them also under b_dev_info->pages_lock.
> > +	 */
> > +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> > +	while (!list_empty(&b_dev_info->pages)) {
> > +		page = list_first_entry(&b_dev_info->pages, typeof(*page), lru);
> > +		/* move to processed list to avoid going over it another time */
> > +		list_move(&page->lru, &processed);
> > +
> > +		if (!get_page_unless_zero(page))
> > +			continue;
> > +		/*
> > +		 * pages_lock nests within page lock,
> > +		 * so drop it before trylock_page
> > +		 */
> > +		spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> > +
> >  		/*
> >  		 * Block others from accessing the 'page' while we get around
> >  		 * establishing additional references and preparing the 'page'
> > @@ -72,6 +94,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> >  			if (!PagePrivate(page)) {
> >  				/* raced with isolation */
> >  				unlock_page(page);
> > +				put_page(page);
> >  				continue;
> >  			}
> >  #endif
> > @@ -80,11 +103,18 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> >  			__count_vm_event(BALLOON_DEFLATE);
> >  			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> >  			unlock_page(page);
> > +			put_page(page);
> >  			dequeued_page = true;
> >  			break;
>                         ^^^^[1]
> 
> >  		}
> > +		put_page(page);
> > +		spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> >  	}
> >  
> > +	/* re-add remaining entries */
> > +	list_splice(&processed, &b_dev_info->pages);
> 
> By breaking the loop at its ordinary and expected way-out case [1] 
> we'll hit list_splice without holding b_dev_info->pages_lock, won't we?

Ouch, right.

> perhaps by adding the following on top of your patch we can address that pickle
> aforementioned:

I'd rather just goto out of the loop, or return.
But maybe Minchan is right and the original patch is OK.
I still need to look into this.

> Cheers!
> Rafael
> --
> 
> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index 66d69c5..74b3e9c 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -58,7 +58,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>  {
>         struct page *page;
>         unsigned long flags;
> -       bool dequeued_page;
> +       bool dequeued_page, locked;
>         LIST_HEAD(processed); /* protected by b_dev_info->pages_lock */
>  
>         dequeued_page = false;
> @@ -105,13 +105,17 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
>                         unlock_page(page);
>                         put_page(page);
>                         dequeued_page = true;
> +                       locked = false;
>                         break;
>                 }
>                 put_page(page);
>                 spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> +               locked = true;
>         }
>  
>         /* re-add remaining entries */
> +       if (!locked)
> +               spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>         list_splice(&processed, &b_dev_info->pages);
>         spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);


* Re: [PATCH 2/2] virtio_balloon: fix race between migration and ballooning
  2016-01-09 21:43       ` Michael S. Tsirkin
@ 2016-01-09 23:03         ` Rafael Aquini
  0 siblings, 0 replies; 13+ messages in thread
From: Rafael Aquini @ 2016-01-09 23:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Minchan Kim, Andrew Morton, linux-mm, linux-kernel,
	virtualization, Konstantin Khlebnikov, stable

On Sat, Jan 09, 2016 at 11:43:31PM +0200, Michael S. Tsirkin wrote:
> On Fri, Jan 08, 2016 at 02:56:14PM -0500, Rafael Aquini wrote:
> > On Fri, Jan 01, 2016 at 11:36:13AM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Dec 28, 2015 at 08:35:13AM +0900, Minchan Kim wrote:
> > > > In balloon_page_dequeue, pages_lock should cover the loop
> > > > (ie, list_for_each_entry_safe). Otherwise, the cursor page could
> > > > be isolated by compaction and then list_del by isolation could
> > > > poison the page->lru.{prev,next} so the loop finally could
> > > > access wrong address like this. This patch fixes the bug.
> > > > 
> > > > general protection fault: 0000 [#1] SMP
> > > > Dumping ftrace buffer:
> > > >    (ftrace buffer empty)
> > > > Modules linked in:
> > > > CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-mm1-access_bit+ #1906
> > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > > task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
> > > > RIP: 0010:[<ffffffff8115e754>]  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
> > > > RSP: 0018:ffff8800a7fefdc0  EFLAGS: 00010246
> > > > RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
> > > > RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
> > > > RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
> > > > R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
> > > > R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
> > > > FS:  0000000000000000(0000) GS:ffff88013fc40000(0000) knlGS:0000000000000000
> > > > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > > CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
> > > > Stack:
> > > >  0000000000000100 ffff880138809088 ffff880138809000 ffff880138809060
> > > >  0000000000000046 ffff8800a7fefe28 ffffffff812c86d3 ffff880138809020
> > > >  ffff880138809000 fffffffffff91900 0000000000000100 ffff880138809060
> > > > Call Trace:
> > > >  [<ffffffff812c86d3>] leak_balloon+0x93/0x1a0
> > > >  [<ffffffff812c8bc7>] balloon+0x217/0x2a0
> > > >  [<ffffffff8143739e>] ? __schedule+0x31e/0x8b0
> > > >  [<ffffffff81078160>] ? abort_exclusive_wait+0xb0/0xb0
> > > >  [<ffffffff812c89b0>] ? update_balloon_stats+0xf0/0xf0
> > > >  [<ffffffff8105b6e9>] kthread+0xc9/0xe0
> > > >  [<ffffffff8105b620>] ? kthread_park+0x60/0x60
> > > >  [<ffffffff8143b4af>] ret_from_fork+0x3f/0x70
> > > >  [<ffffffff8105b620>] ? kthread_park+0x60/0x60
> > > > Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89
> > > > RIP  [<ffffffff8115e754>] balloon_page_dequeue+0x54/0x130
> > > >  RSP <ffff8800a7fefdc0>
> > > > ---[ end trace 43cf28060d708d5f ]---
> > > > Kernel panic - not syncing: Fatal exception
> > > > Dumping ftrace buffer:
> > > >    (ftrace buffer empty)
> > > > Kernel Offset: disabled
> > > > 
> > > > Cc: <stable@vger.kernel.org>
> > > > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > > > ---
> > > >  mm/balloon_compaction.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> > > > index d3116be5a00f..300117f1a08f 100644
> > > > --- a/mm/balloon_compaction.c
> > > > +++ b/mm/balloon_compaction.c
> > > > @@ -61,6 +61,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> > > >  	bool dequeued_page;
> > > >  
> > > >  	dequeued_page = false;
> > > > +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> > > >  	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> > > >  		/*
> > > >  		 * Block others from accessing the 'page' while we get around
> > > > @@ -75,15 +76,14 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> > > >  				continue;
> > > >  			}
> > > >  #endif
> > > > -			spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> > > >  			balloon_page_delete(page);
> > > >  			__count_vm_event(BALLOON_DEFLATE);
> > > > -			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> > > >  			unlock_page(page);
> > > >  			dequeued_page = true;
> > > >  			break;
> > > >  		}
> > > >  	}
> > > > +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> > > >  
> > > >  	if (!dequeued_page) {
> > > >  		/*
> > > 
> > > I think this will cause deadlocks.
> > > 
> > > pages_lock now nests within page lock, balloon_page_putback
> > > nests them in the reverse order.
> > > 
> > > Did you test this with lockdep? You really should for
> > > locking changes, and I'd expect it to warn about this.
> > > 
> > > Also, there's another issue there I think: after isolation page could
> > > also get freed before we try to lock it.
> > > 
> > > We really must take a page reference before touching
> > > the page.
> > > 
> > > I think we need something like the below to fix this issue.
> > > Could you please try this out, and send Tested-by?
> > > I will repost as a proper patch if this works for you.
> > >
> > 
> > Nice catch! Thanks for spotting it. I just have one minor nit. See
> > below
> >  
> > > 
> > > diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> > > index d3116be..66d69c5 100644
> > > --- a/mm/balloon_compaction.c
> > > +++ b/mm/balloon_compaction.c
> > > @@ -56,12 +56,34 @@ EXPORT_SYMBOL_GPL(balloon_page_enqueue);
> > >   */
> > >  struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> > >  {
> > > -	struct page *page, *tmp;
> > > +	struct page *page;
> > >  	unsigned long flags;
> > >  	bool dequeued_page;
> > > +	LIST_HEAD(processed); /* protected by b_dev_info->pages_lock */
> > >  
> > >  	dequeued_page = false;
> > > -	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
> > > +	/*
> > > +	 * We need to go over b_dev_info->pages and lock each page,
> > > +	 * but b_dev_info->pages_lock must nest within page lock.
> > > +	 *
> > > +	 * To make this safe, remove each page from b_dev_info->pages list
> > > +	 * under b_dev_info->pages_lock, then drop this lock. Once list is
> > > +	 * empty, re-add them also under b_dev_info->pages_lock.
> > > +	 */
> > > +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> > > +	while (!list_empty(&b_dev_info->pages)) {
> > > +		page = list_first_entry(&b_dev_info->pages, typeof(*page), lru);
> > > +		/* move to processed list to avoid going over it another time */
> > > +		list_move(&page->lru, &processed);
> > > +
> > > +		if (!get_page_unless_zero(page))
> > > +			continue;
> > > +		/*
> > > +		 * pages_lock nests within page lock,
> > > +		 * so drop it before trylock_page
> > > +		 */
> > > +		spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> > > +
> > >  		/*
> > >  		 * Block others from accessing the 'page' while we get around
> > >  		 * establishing additional references and preparing the 'page'
> > > @@ -72,6 +94,7 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> > >  			if (!PagePrivate(page)) {
> > >  				/* raced with isolation */
> > >  				unlock_page(page);
> > > +				put_page(page);
> > >  				continue;
> > >  			}
> > >  #endif
> > > @@ -80,11 +103,18 @@ struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info)
> > >  			__count_vm_event(BALLOON_DEFLATE);
> > >  			spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> > >  			unlock_page(page);
> > > +			put_page(page);
> > >  			dequeued_page = true;
> > >  			break;
> >                         ^^^^[1]
> > 
> > >  		}
> > > +		put_page(page);
> > > +		spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> > >  	}
> > >  
> > > +	/* re-add remaining entries */
> > > +	list_splice(&processed, &b_dev_info->pages);
> > 
> > By breaking out of the loop at its ordinary and expected exit point [1],
> > we'll hit list_splice without holding b_dev_info->pages_lock, won't we?
> 
> Ouch. right.
> 
> > perhaps by adding the following on top of your patch we can address the
> > aforementioned pickle:
> 
> I'd rather just goto outside or return.
> But maybe Minchan is right and the original patch is ok.
> I still need to go into this.
>

I went back and followed up on Minchan's argument and his old patch and,
yes, I think we're fine there because, as he states, we're using trylock_page()
at the points where the page lock nests into b_dev_info->pages_lock.
OTOH, I understand your work on making sure we follow the lock nesting
order with strict correctness.

As Minchan's approach keeps the code simpler, I'm voting for it. 

Cheers!
-- Rafael
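
To spell out the nesting argument in code form (an illustrative sketch,
not a quote of the actual functions; balloon_page_insert() and the
event counting are elided into comments):

	/*
	 * putback/migration side: the page lock is already held by the
	 * compaction caller when pages_lock is taken, so the order here
	 * is page lock -> pages_lock.
	 */
	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
	list_add(&page->lru, &b_dev_info->pages);	/* balloon_page_insert() */
	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);

	/*
	 * dequeue side with Minchan's patch: pages_lock is held across the
	 * whole walk and the page lock is only *tried*, never waited for,
	 * so the reversed order cannot produce an ABBA deadlock.
	 */
	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
	list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
		if (!trylock_page(page))
			continue;
		/* ... balloon_page_delete(page), count the event ... */
		unlock_page(page);
		dequeued_page = true;
		break;
	}
	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);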

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] virtio_balloon: fix race between migration and ballooning
  2016-01-04  0:27     ` Minchan Kim
@ 2016-01-10 21:40       ` Michael S. Tsirkin
  2016-01-10 23:54         ` Minchan Kim
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 21:40 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, virtualization,
	Konstantin Khlebnikov, Rafael Aquini, stable

On Mon, Jan 04, 2016 at 09:27:47AM +0900, Minchan Kim wrote:
> > I think this will cause deadlocks.
> > 
> > pages_lock now nests within page lock, balloon_page_putback
> > nests them in the reverse order.
> 
> In balloon_page_dequeue, we used trylock so I don't think it's
> a deadlock.

I went over this again and I don't see the issue anymore.
I think I was mistaken, so I dropped my patch and picked
up yours. Sorry about the noise.


> > 
> > Also, there's another issue there I think: after isolation page could
> > also get freed before we try to lock it.
> 
> If a page was isolated, the page shouldn't stay on the b_dev_info->pages
> list, so balloon_page_dequeue cannot see the page.
> Am I missing something?

I mean without locks, as it is now. With either your or my patch in
place, it's fine.

-- 
MST
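
For reference, the pin-before-touch pattern the dropped alternative
relied on (an illustrative fragment, assumed to sit inside the walk over
b_dev_info->pages with pages_lock held on entry): taking a reference
first means the page cannot be freed once pages_lock is released.

	if (!get_page_unless_zero(page))
		continue;	/* page is already on its way to being freed */

	/* safe to drop pages_lock now; the reference pins the page */
	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);

	if (trylock_page(page)) {
		/* ... check PagePrivate(), dequeue if still a balloon page ... */
		unlock_page(page);
	}

	put_page(page);
	spin_lock_irqsave(&b_dev_info->pages_lock, flags);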

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] virtio_balloon: fix race between migration and ballooning
  2016-01-10 21:40       ` Michael S. Tsirkin
@ 2016-01-10 23:54         ` Minchan Kim
  0 siblings, 0 replies; 13+ messages in thread
From: Minchan Kim @ 2016-01-10 23:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Andrew Morton, linux-mm, linux-kernel, virtualization,
	Konstantin Khlebnikov, Rafael Aquini, stable

On Sun, Jan 10, 2016 at 11:40:17PM +0200, Michael S. Tsirkin wrote:
> On Mon, Jan 04, 2016 at 09:27:47AM +0900, Minchan Kim wrote:
> > > I think this will cause deadlocks.
> > > 
> > > pages_lock now nests within page lock, balloon_page_putback
> > > nests them in the reverse order.
> > 
> > In balloon_page_dequeue, we used trylock so I don't think it's
> > a deadlock.
> 
> I went over this again and I don't see the issue anymore.
> I think I was mistaken, so I dropped my patch and picked
> up yours. Sorry about the noise.

No problem. Thanks for the review.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-01-10 23:52 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
2015-12-27 23:35 [PATCH 1/2] virtio_balloon: fix race by fill and leak Minchan Kim
2015-12-27 23:35 ` [PATCH 2/2] virtio_balloon: fix race between migration and ballooning Minchan Kim
2015-12-27 23:36   ` Rafael Aquini
2016-01-01  9:36   ` Michael S. Tsirkin
2016-01-04  0:27     ` Minchan Kim
2016-01-10 21:40       ` Michael S. Tsirkin
2016-01-10 23:54         ` Minchan Kim
2016-01-08 19:56     ` Rafael Aquini
2016-01-08 23:43       ` Minchan Kim
2016-01-09 21:43       ` Michael S. Tsirkin
2016-01-09 23:03         ` Rafael Aquini
2015-12-27 23:36 ` [PATCH 1/2] virtio_balloon: fix race by fill and leak Rafael Aquini
2016-01-01  8:26 ` Michael S. Tsirkin
