All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-01 22:20 ` Andrea Righi
  0 siblings, 0 replies; 48+ messages in thread
From: Andrea Righi @ 2011-06-01 22:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Andrea Arcangeli, Rik van Riel, Chris Wright,
	Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel

I've just experienced this bug with ksmd:

[   55.837551] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8
[   55.837598] IP: [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
[   55.837630] PGD 0 
[   55.837643] Oops: 0000 [#1] SMP 
[   55.837663] CPU 2 
[   55.837674] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_conexant rtl8192ce rtl8192c_common rtlwifi mac80211 usbhid hid cfg80211 snd_hda_intel snd_hda_codec psmouse snd_pcm e1000e thinkpad_acpi snd_timer snd_page_alloc snd soundcore nvram
[   55.837816] 
[   55.837824] Pid: 33, comm: ksmd Not tainted 3.0.0-rc1+ #289 LENOVO 4286CTO/4286CTO
[   55.837850] RIP: 0010:[<ffffffff810bb9b2>]  [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
[   55.837878] RSP: 0018:ffff88023d3abc50  EFLAGS: 00010046
[   55.837894] RAX: 0000000000000046 RBX: 00000000000000e8 RCX: 0000000000000001
[   55.837915] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000e8
[   55.837936] RBP: ffff88023d3abd40 R08: 0000000000000002 R09: 0000000000000000
[   55.837957] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88023d3a3e00
[   55.837978] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
[   55.837999] FS:  0000000000000000(0000) GS:ffff88023e280000(0000) knlGS:0000000000000000
[   55.838022] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   55.838039] CR2: 00000000000000e8 CR3: 00000000016f5000 CR4: 00000000000406e0
[   55.838060] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   55.838081] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   55.838102] Process ksmd (pid: 33, threadinfo ffff88023d3aa000, task ffff88023d3a3e00)
[   55.838131] Stack:
[   55.838140]  ffff88023d3abce0 0000000000000000 ffffffff81d46810 00000000000012c7
[   55.838168]  000000000000037c ffff88023d3a3e00 0000000000000001 0000000000000000
[   55.838338]  0000000000000000 0000000000000000 00000000001ba37c ffffffff81a22000
[   55.838365] Call Trace:
[   55.838375]  [<ffffffff810be55f>] ? mark_held_locks+0x6f/0xa0
[   55.838394]  [<ffffffff814e3360>] ? _raw_spin_unlock_irqrestore+0x40/0x70
[   55.838416]  [<ffffffff810bdc90>] lock_acquire+0x90/0x110
[   55.838434]  [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
[   55.838453]  [<ffffffff8112df6c>] ? free_percpu+0x9c/0x130
[   55.838470]  [<ffffffff814e1cbc>] down_read+0x4c/0x70
[   55.838486]  [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
[   55.838505]  [<ffffffff814e33bb>] ? _raw_spin_unlock+0x2b/0x40
[   55.838523]  [<ffffffff8114c652>] ksm_scan_thread+0x132/0xe20
[   55.838541]  [<ffffffff814df822>] ? schedule+0x3b2/0x960
[   55.838559]  [<ffffffff810a5690>] ? wake_up_bit+0x40/0x40
[   55.838576]  [<ffffffff8114c520>] ? run_store+0x310/0x310
[   55.838593]  [<ffffffff810a5186>] kthread+0x96/0xa0
[   55.838609]  [<ffffffff814e5014>] kernel_thread_helper+0x4/0x10
[   55.838628]  [<ffffffff814e3700>] ? retint_restore_args+0xe/0xe
[   55.838647]  [<ffffffff810a50f0>] ? __init_kthread_worker+0x70/0x70
[   55.838666]  [<ffffffff814e5010>] ? gs_change+0xb/0xb
[   55.838681] Code: b7 00 00 48 89 fb 85 c0 41 89 f5 45 0f 45 f0 8b 05 84 de 68 00 85 c0 0f 84 7b 09 00 00 8b 05 7a 49 7a 00 85 c0 0f 84 c6 01 00 00 
[   55.838780]  8b 03 ba 01 00 00 00 48 3d e0 3c 8c 81 44 0f 44 f2 41 83 fd 
[   55.838830] RIP  [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
[   55.838850]  RSP <ffff88023d3abc50>
[   55.839567] CR2: 00000000000000e8
[   55.895721] ---[ end trace eea0fa5dfa6846f1 ]---

The bug can be easily reproduced using the following testcase:

========================
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

#define BUFSIZE getpagesize()

int main(int argc, char **argv)
{
	void *ptr;

	if (posix_memalign(&ptr, getpagesize(), BUFSIZE) < 0) {
		perror("posix_memalign");
		exit(1);
	}
	if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
		perror("madvise");
		exit(1);
	}
	*(char *)NULL = 0;

	return 0;
}
========================

It seems that when a task segfaults mm_slot->mm becomes NULL, but it's
still wrongly considered by the ksm scan. Is there a race with
__ksm_exit()?

Probably the following is not the right way to fix it, but if I apply
this the problem disappears. Anyway, I'm posting this information, it
can help you to debug the problem better.

Signed-off-by: Andrea Righi <andrea@betterlinux.com>
---
 mm/ksm.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index d708b3e..f457feb 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1308,6 +1308,8 @@ next_mm:
 	}
 
 	mm = slot->mm;
+	if (unlikely(!mm))
+		return NULL;
 	down_read(&mm->mmap_sem);
 	if (ksm_test_exit(mm))
 		vma = NULL;

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-01 22:20 ` Andrea Righi
  0 siblings, 0 replies; 48+ messages in thread
From: Andrea Righi @ 2011-06-01 22:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Andrea Arcangeli, Rik van Riel, Chris Wright,
	Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel

I've just experienced this bug with ksmd:

[   55.837551] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8
[   55.837598] IP: [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
[   55.837630] PGD 0 
[   55.837643] Oops: 0000 [#1] SMP 
[   55.837663] CPU 2 
[   55.837674] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_conexant rtl8192ce rtl8192c_common rtlwifi mac80211 usbhid hid cfg80211 snd_hda_intel snd_hda_codec psmouse snd_pcm e1000e thinkpad_acpi snd_timer snd_page_alloc snd soundcore nvram
[   55.837816] 
[   55.837824] Pid: 33, comm: ksmd Not tainted 3.0.0-rc1+ #289 LENOVO 4286CTO/4286CTO
[   55.837850] RIP: 0010:[<ffffffff810bb9b2>]  [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
[   55.837878] RSP: 0018:ffff88023d3abc50  EFLAGS: 00010046
[   55.837894] RAX: 0000000000000046 RBX: 00000000000000e8 RCX: 0000000000000001
[   55.837915] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000e8
[   55.837936] RBP: ffff88023d3abd40 R08: 0000000000000002 R09: 0000000000000000
[   55.837957] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88023d3a3e00
[   55.837978] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
[   55.837999] FS:  0000000000000000(0000) GS:ffff88023e280000(0000) knlGS:0000000000000000
[   55.838022] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   55.838039] CR2: 00000000000000e8 CR3: 00000000016f5000 CR4: 00000000000406e0
[   55.838060] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   55.838081] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   55.838102] Process ksmd (pid: 33, threadinfo ffff88023d3aa000, task ffff88023d3a3e00)
[   55.838131] Stack:
[   55.838140]  ffff88023d3abce0 0000000000000000 ffffffff81d46810 00000000000012c7
[   55.838168]  000000000000037c ffff88023d3a3e00 0000000000000001 0000000000000000
[   55.838338]  0000000000000000 0000000000000000 00000000001ba37c ffffffff81a22000
[   55.838365] Call Trace:
[   55.838375]  [<ffffffff810be55f>] ? mark_held_locks+0x6f/0xa0
[   55.838394]  [<ffffffff814e3360>] ? _raw_spin_unlock_irqrestore+0x40/0x70
[   55.838416]  [<ffffffff810bdc90>] lock_acquire+0x90/0x110
[   55.838434]  [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
[   55.838453]  [<ffffffff8112df6c>] ? free_percpu+0x9c/0x130
[   55.838470]  [<ffffffff814e1cbc>] down_read+0x4c/0x70
[   55.838486]  [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
[   55.838505]  [<ffffffff814e33bb>] ? _raw_spin_unlock+0x2b/0x40
[   55.838523]  [<ffffffff8114c652>] ksm_scan_thread+0x132/0xe20
[   55.838541]  [<ffffffff814df822>] ? schedule+0x3b2/0x960
[   55.838559]  [<ffffffff810a5690>] ? wake_up_bit+0x40/0x40
[   55.838576]  [<ffffffff8114c520>] ? run_store+0x310/0x310
[   55.838593]  [<ffffffff810a5186>] kthread+0x96/0xa0
[   55.838609]  [<ffffffff814e5014>] kernel_thread_helper+0x4/0x10
[   55.838628]  [<ffffffff814e3700>] ? retint_restore_args+0xe/0xe
[   55.838647]  [<ffffffff810a50f0>] ? __init_kthread_worker+0x70/0x70
[   55.838666]  [<ffffffff814e5010>] ? gs_change+0xb/0xb
[   55.838681] Code: b7 00 00 48 89 fb 85 c0 41 89 f5 45 0f 45 f0 8b 05 84 de 68 00 85 c0 0f 84 7b 09 00 00 8b 05 7a 49 7a 00 85 c0 0f 84 c6 01 00 00 
[   55.838780]  8b 03 ba 01 00 00 00 48 3d e0 3c 8c 81 44 0f 44 f2 41 83 fd 
[   55.838830] RIP  [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
[   55.838850]  RSP <ffff88023d3abc50>
[   55.839567] CR2: 00000000000000e8
[   55.895721] ---[ end trace eea0fa5dfa6846f1 ]---

The bug can be easily reproduced using the following testcase:

========================
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

#define BUFSIZE getpagesize()

int main(int argc, char **argv)
{
	void *ptr;

	if (posix_memalign(&ptr, getpagesize(), BUFSIZE) < 0) {
		perror("posix_memalign");
		exit(1);
	}
	if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
		perror("madvise");
		exit(1);
	}
	*(char *)NULL = 0;

	return 0;
}
========================

It seems that when a task segfaults mm_slot->mm becomes NULL, but it's
still wrongly considered by the ksm scan. Is there a race with
__ksm_exit()?

Probably the following is not the right way to fix it, but if I apply
this the problem disappears. Anyway, I'm posting this information, it
can help you to debug the problem better.

Signed-off-by: Andrea Righi <andrea@betterlinux.com>
---
 mm/ksm.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index d708b3e..f457feb 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1308,6 +1308,8 @@ next_mm:
 	}
 
 	mm = slot->mm;
+	if (unlikely(!mm))
+		return NULL;
 	down_read(&mm->mmap_sem);
 	if (ksm_test_exit(mm))
 		vma = NULL;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-01 22:20 ` Andrea Righi
@ 2011-06-02  1:53   ` Chris Wright
  -1 siblings, 0 replies; 48+ messages in thread
From: Chris Wright @ 2011-06-02  1:53 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, Hugh Dickins, Andrea Arcangeli, Rik van Riel,
	Chris Wright, Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki,
	linux-mm, linux-kernel

* Andrea Righi (andrea@betterlinux.com) wrote:
> The bug can be easily reproduced using the following testcase:

Thanks for the testcase.

> ========================
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/mman.h>
> 
> #define BUFSIZE getpagesize()
> 
> int main(int argc, char **argv)
> {
> 	void *ptr;
> 
> 	if (posix_memalign(&ptr, getpagesize(), BUFSIZE) < 0) {
> 		perror("posix_memalign");
> 		exit(1);
> 	}
> 	if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
> 		perror("madvise");
> 		exit(1);
> 	}
> 	*(char *)NULL = 0;
> 
> 	return 0;
> }
> ========================
> 
> It seems that when a task segfaults mm_slot->mm becomes NULL, but it's
> still wrongly considered by the ksm scan. Is there a race with
> __ksm_exit()?

Hmm, wonder if khugepaged has the same issue too.  We should be holding
a reference to ->mm, but we seem to have inconsistent serialization w/
mmap_sem.  Hugh mentioned some of these concerns when introducing
ksm_exit.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-02  1:53   ` Chris Wright
  0 siblings, 0 replies; 48+ messages in thread
From: Chris Wright @ 2011-06-02  1:53 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, Hugh Dickins, Andrea Arcangeli, Rik van Riel,
	Chris Wright, Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki,
	linux-mm, linux-kernel

* Andrea Righi (andrea@betterlinux.com) wrote:
> The bug can be easily reproduced using the following testcase:

Thanks for the testcase.

> ========================
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/mman.h>
> 
> #define BUFSIZE getpagesize()
> 
> int main(int argc, char **argv)
> {
> 	void *ptr;
> 
> 	if (posix_memalign(&ptr, getpagesize(), BUFSIZE) < 0) {
> 		perror("posix_memalign");
> 		exit(1);
> 	}
> 	if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
> 		perror("madvise");
> 		exit(1);
> 	}
> 	*(char *)NULL = 0;
> 
> 	return 0;
> }
> ========================
> 
> It seems that when a task segfaults mm_slot->mm becomes NULL, but it's
> still wrongly considered by the ksm scan. Is there a race with
> __ksm_exit()?

Hmm, wonder if khugepaged has the same issue too.  We should be holding
a reference to ->mm, but we seem to have inconsistent serialization w/
mmap_sem.  Hugh mentioned some of these concerns when introducing
ksm_exit.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-01 22:20 ` Andrea Righi
@ 2011-06-02  7:09   ` CAI Qian
  -1 siblings, 0 replies; 48+ messages in thread
From: CAI Qian @ 2011-06-02  7:09 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Hugh Dickins, Andrea Arcangeli, Rik van Riel, Chris Wright,
	Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, Andrew Morton

Hello,

----- Original Message -----
> I've just experienced this bug with ksmd:
> 
> [ 55.837551] BUG: unable to handle kernel NULL pointer dereference at
> 00000000000000e8
> [ 55.837598] IP: [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
> [ 55.837630] PGD 0
> [ 55.837643] Oops: 0000 [#1] SMP
> [ 55.837663] CPU 2
> [ 55.837674] Modules linked in: snd_hda_codec_hdmi
> snd_hda_codec_conexant rtl8192ce rtl8192c_common rtlwifi mac80211
> usbhid hid cfg80211 snd_hda_intel snd_hda_codec psmouse snd_pcm e1000e
> thinkpad_acpi snd_timer snd_page_alloc snd soundcore nvram
> [ 55.837816]
> [ 55.837824] Pid: 33, comm: ksmd Not tainted 3.0.0-rc1+ #289 LENOVO
> 4286CTO/4286CTO
> [ 55.837850] RIP: 0010:[<ffffffff810bb9b2>] [<ffffffff810bb9b2>]
> __lock_acquire+0x62/0x1d70
> [ 55.837878] RSP: 0018:ffff88023d3abc50 EFLAGS: 00010046
> [ 55.837894] RAX: 0000000000000046 RBX: 00000000000000e8 RCX:
> 0000000000000001
> [ 55.837915] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> 00000000000000e8
> [ 55.837936] RBP: ffff88023d3abd40 R08: 0000000000000002 R09:
> 0000000000000000
> [ 55.837957] R10: 0000000000000001 R11: 0000000000000000 R12:
> ffff88023d3a3e00
> [ 55.837978] R13: 0000000000000000 R14: 0000000000000002 R15:
> 0000000000000000
> [ 55.837999] FS: 0000000000000000(0000) GS:ffff88023e280000(0000)
> knlGS:0000000000000000
> [ 55.838022] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 55.838039] CR2: 00000000000000e8 CR3: 00000000016f5000 CR4:
> 00000000000406e0
> [ 55.838060] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 55.838081] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [ 55.838102] Process ksmd (pid: 33, threadinfo ffff88023d3aa000, task
> ffff88023d3a3e00)
> [ 55.838131] Stack:
> [ 55.838140] ffff88023d3abce0 0000000000000000 ffffffff81d46810
> 00000000000012c7
> [ 55.838168] 000000000000037c ffff88023d3a3e00 0000000000000001
> 0000000000000000
> [ 55.838338] 0000000000000000 0000000000000000 00000000001ba37c
> ffffffff81a22000
> [ 55.838365] Call Trace:
> [ 55.838375] [<ffffffff810be55f>] ? mark_held_locks+0x6f/0xa0
> [ 55.838394] [<ffffffff814e3360>] ?
> _raw_spin_unlock_irqrestore+0x40/0x70
> [ 55.838416] [<ffffffff810bdc90>] lock_acquire+0x90/0x110
> [ 55.838434] [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
> [ 55.838453] [<ffffffff8112df6c>] ? free_percpu+0x9c/0x130
> [ 55.838470] [<ffffffff814e1cbc>] down_read+0x4c/0x70
> [ 55.838486] [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
> [ 55.838505] [<ffffffff814e33bb>] ? _raw_spin_unlock+0x2b/0x40
> [ 55.838523] [<ffffffff8114c652>] ksm_scan_thread+0x132/0xe20
> [ 55.838541] [<ffffffff814df822>] ? schedule+0x3b2/0x960
> [ 55.838559] [<ffffffff810a5690>] ? wake_up_bit+0x40/0x40
> [ 55.838576] [<ffffffff8114c520>] ? run_store+0x310/0x310
> [ 55.838593] [<ffffffff810a5186>] kthread+0x96/0xa0
> [ 55.838609] [<ffffffff814e5014>] kernel_thread_helper+0x4/0x10
> [ 55.838628] [<ffffffff814e3700>] ? retint_restore_args+0xe/0xe
> [ 55.838647] [<ffffffff810a50f0>] ? __init_kthread_worker+0x70/0x70
> [ 55.838666] [<ffffffff814e5010>] ? gs_change+0xb/0xb
> [ 55.838681] Code: b7 00 00 48 89 fb 85 c0 41 89 f5 45 0f 45 f0 8b 05
> 84 de 68 00 85 c0 0f 84 7b 09 00 00 8b 05 7a 49 7a 00 85 c0 0f 84 c6
> 01 00 00
> [ 55.838780] 8b 03 ba 01 00 00 00 48 3d e0 3c 8c 81 44 0f 44 f2 41 83
> fd
> [ 55.838830] RIP [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
> [ 55.838850] RSP <ffff88023d3abc50>
> [ 55.839567] CR2: 00000000000000e8
> [ 55.895721] ---[ end trace eea0fa5dfa6846f1 ]---
> 
> The bug can be easily reproduced using the following testcase:
> 
> ========================
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/mman.h>
> 
> #define BUFSIZE getpagesize()
> 
> int main(int argc, char **argv)
> {
> void *ptr;
> 
> if (posix_memalign(&ptr, getpagesize(), BUFSIZE) < 0) {
> perror("posix_memalign");
> exit(1);
> }
> if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
> perror("madvise");
> exit(1);
> }
> *(char *)NULL = 0;
Hmm, the reproducer gave something else here but no panic.
$ strace ./test 
execve("./test", ["./test"], [/* 26 vars */]) = 0
brk(0)                                  = 0x220f000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd18ec0a000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=41227, ...}) = 0
mmap(NULL, 41227, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd18ebff000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\355\341n<\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1912928, ...}) = 0
mmap(0x3c6ee00000, 3737768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3c6ee00000
mprotect(0x3c6ef87000, 2097152, PROT_NONE) = 0
mmap(0x3c6f187000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x187000) = 0x3c6f187000
mmap(0x3c6f18c000, 18600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3c6f18c000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd18ebfe000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd18ebfd000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd18ebfc000
arch_prctl(ARCH_SET_FS, 0x7fd18ebfd700) = 0
mprotect(0x3c6f187000, 16384, PROT_READ) = 0
mprotect(0x3c6e81f000, 4096, PROT_READ) = 0
munmap(0x7fd18ebff000, 41227)           = 0
brk(0)                                  = 0x220f000
brk(0x2232000)                          = 0x2232000
madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)

Did I miss anything?

Thanks,
CAI Qian
> return 0;
> }
> ========================
> 
> It seems that when a task segfaults mm_slot->mm becomes NULL, but it's
> still wrongly considered by the ksm scan. Is there a race with
> __ksm_exit()?
> 
> Probably the following is not the right way to fix it, but if I apply
> this the problem disappears. Anyway, I'm posting this information, it
> can help you to debug the problem better.
> 
> Signed-off-by: Andrea Righi <andrea@betterlinux.com>
> ---
> mm/ksm.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index d708b3e..f457feb 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1308,6 +1308,8 @@ next_mm:
> }
> 
> mm = slot->mm;
> + if (unlikely(!mm))
> + return NULL;
> down_read(&mm->mmap_sem);
> if (ksm_test_exit(mm))
> vma = NULL;
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign
> http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-02  7:09   ` CAI Qian
  0 siblings, 0 replies; 48+ messages in thread
From: CAI Qian @ 2011-06-02  7:09 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Hugh Dickins, Andrea Arcangeli, Rik van Riel, Chris Wright,
	Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, Andrew Morton

Hello,

----- Original Message -----
> I've just experienced this bug with ksmd:
> 
> [ 55.837551] BUG: unable to handle kernel NULL pointer dereference at
> 00000000000000e8
> [ 55.837598] IP: [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
> [ 55.837630] PGD 0
> [ 55.837643] Oops: 0000 [#1] SMP
> [ 55.837663] CPU 2
> [ 55.837674] Modules linked in: snd_hda_codec_hdmi
> snd_hda_codec_conexant rtl8192ce rtl8192c_common rtlwifi mac80211
> usbhid hid cfg80211 snd_hda_intel snd_hda_codec psmouse snd_pcm e1000e
> thinkpad_acpi snd_timer snd_page_alloc snd soundcore nvram
> [ 55.837816]
> [ 55.837824] Pid: 33, comm: ksmd Not tainted 3.0.0-rc1+ #289 LENOVO
> 4286CTO/4286CTO
> [ 55.837850] RIP: 0010:[<ffffffff810bb9b2>] [<ffffffff810bb9b2>]
> __lock_acquire+0x62/0x1d70
> [ 55.837878] RSP: 0018:ffff88023d3abc50 EFLAGS: 00010046
> [ 55.837894] RAX: 0000000000000046 RBX: 00000000000000e8 RCX:
> 0000000000000001
> [ 55.837915] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> 00000000000000e8
> [ 55.837936] RBP: ffff88023d3abd40 R08: 0000000000000002 R09:
> 0000000000000000
> [ 55.837957] R10: 0000000000000001 R11: 0000000000000000 R12:
> ffff88023d3a3e00
> [ 55.837978] R13: 0000000000000000 R14: 0000000000000002 R15:
> 0000000000000000
> [ 55.837999] FS: 0000000000000000(0000) GS:ffff88023e280000(0000)
> knlGS:0000000000000000
> [ 55.838022] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 55.838039] CR2: 00000000000000e8 CR3: 00000000016f5000 CR4:
> 00000000000406e0
> [ 55.838060] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 55.838081] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [ 55.838102] Process ksmd (pid: 33, threadinfo ffff88023d3aa000, task
> ffff88023d3a3e00)
> [ 55.838131] Stack:
> [ 55.838140] ffff88023d3abce0 0000000000000000 ffffffff81d46810
> 00000000000012c7
> [ 55.838168] 000000000000037c ffff88023d3a3e00 0000000000000001
> 0000000000000000
> [ 55.838338] 0000000000000000 0000000000000000 00000000001ba37c
> ffffffff81a22000
> [ 55.838365] Call Trace:
> [ 55.838375] [<ffffffff810be55f>] ? mark_held_locks+0x6f/0xa0
> [ 55.838394] [<ffffffff814e3360>] ?
> _raw_spin_unlock_irqrestore+0x40/0x70
> [ 55.838416] [<ffffffff810bdc90>] lock_acquire+0x90/0x110
> [ 55.838434] [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
> [ 55.838453] [<ffffffff8112df6c>] ? free_percpu+0x9c/0x130
> [ 55.838470] [<ffffffff814e1cbc>] down_read+0x4c/0x70
> [ 55.838486] [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
> [ 55.838505] [<ffffffff814e33bb>] ? _raw_spin_unlock+0x2b/0x40
> [ 55.838523] [<ffffffff8114c652>] ksm_scan_thread+0x132/0xe20
> [ 55.838541] [<ffffffff814df822>] ? schedule+0x3b2/0x960
> [ 55.838559] [<ffffffff810a5690>] ? wake_up_bit+0x40/0x40
> [ 55.838576] [<ffffffff8114c520>] ? run_store+0x310/0x310
> [ 55.838593] [<ffffffff810a5186>] kthread+0x96/0xa0
> [ 55.838609] [<ffffffff814e5014>] kernel_thread_helper+0x4/0x10
> [ 55.838628] [<ffffffff814e3700>] ? retint_restore_args+0xe/0xe
> [ 55.838647] [<ffffffff810a50f0>] ? __init_kthread_worker+0x70/0x70
> [ 55.838666] [<ffffffff814e5010>] ? gs_change+0xb/0xb
> [ 55.838681] Code: b7 00 00 48 89 fb 85 c0 41 89 f5 45 0f 45 f0 8b 05
> 84 de 68 00 85 c0 0f 84 7b 09 00 00 8b 05 7a 49 7a 00 85 c0 0f 84 c6
> 01 00 00
> [ 55.838780] 8b 03 ba 01 00 00 00 48 3d e0 3c 8c 81 44 0f 44 f2 41 83
> fd
> [ 55.838830] RIP [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
> [ 55.838850] RSP <ffff88023d3abc50>
> [ 55.839567] CR2: 00000000000000e8
> [ 55.895721] ---[ end trace eea0fa5dfa6846f1 ]---
> 
> The bug can be easily reproduced using the following testcase:
> 
> ========================
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/mman.h>
> 
> #define BUFSIZE getpagesize()
> 
> int main(int argc, char **argv)
> {
> void *ptr;
> 
> if (posix_memalign(&ptr, getpagesize(), BUFSIZE) < 0) {
> perror("posix_memalign");
> exit(1);
> }
> if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
> perror("madvise");
> exit(1);
> }
> *(char *)NULL = 0;
Hmm, the reproducer gave something else here but no panic.
$ strace ./test 
execve("./test", ["./test"], [/* 26 vars */]) = 0
brk(0)                                  = 0x220f000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd18ec0a000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=41227, ...}) = 0
mmap(NULL, 41227, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd18ebff000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\355\341n<\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1912928, ...}) = 0
mmap(0x3c6ee00000, 3737768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3c6ee00000
mprotect(0x3c6ef87000, 2097152, PROT_NONE) = 0
mmap(0x3c6f187000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x187000) = 0x3c6f187000
mmap(0x3c6f18c000, 18600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3c6f18c000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd18ebfe000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd18ebfd000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd18ebfc000
arch_prctl(ARCH_SET_FS, 0x7fd18ebfd700) = 0
mprotect(0x3c6f187000, 16384, PROT_READ) = 0
mprotect(0x3c6e81f000, 4096, PROT_READ) = 0
munmap(0x7fd18ebff000, 41227)           = 0
brk(0)                                  = 0x220f000
brk(0x2232000)                          = 0x2232000
madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)

Did I miss anything?

Thanks,
CAI Qian
> return 0;
> }
> ========================
> 
> It seems that when a task segfaults mm_slot->mm becomes NULL, but it's
> still wrongly considered by the ksm scan. Is there a race with
> __ksm_exit()?
> 
> Probably the following is not the right way to fix it, but if I apply
> this the problem disappears. Anyway, I'm posting this information, it
> can help you to debug the problem better.
> 
> Signed-off-by: Andrea Righi <andrea@betterlinux.com>
> ---
> mm/ksm.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index d708b3e..f457feb 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1308,6 +1308,8 @@ next_mm:
> }
> 
> mm = slot->mm;
> + if (unlikely(!mm))
> + return NULL;
> down_read(&mm->mmap_sem);
> if (ksm_test_exit(mm))
> vma = NULL;
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign
> http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-02  7:09   ` CAI Qian
@ 2011-06-02 14:19     ` Andrea Righi
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrea Righi @ 2011-06-02 14:19 UTC (permalink / raw)
  To: CAI Qian
  Cc: Hugh Dickins, Andrea Arcangeli, Rik van Riel, Chris Wright,
	Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, Andrew Morton

On Thu, Jun 02, 2011 at 03:09:53AM -0400, CAI Qian wrote:
> Hello,
> 
> ----- Original Message -----
> > I've just experienced this bug with ksmd:
> > 
> > [ 55.837551] BUG: unable to handle kernel NULL pointer dereference at
> > 00000000000000e8
> > [ 55.837598] IP: [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
> > [ 55.837630] PGD 0
> > [ 55.837643] Oops: 0000 [#1] SMP
> > [ 55.837663] CPU 2
> > [ 55.837674] Modules linked in: snd_hda_codec_hdmi
> > snd_hda_codec_conexant rtl8192ce rtl8192c_common rtlwifi mac80211
> > usbhid hid cfg80211 snd_hda_intel snd_hda_codec psmouse snd_pcm e1000e
> > thinkpad_acpi snd_timer snd_page_alloc snd soundcore nvram
> > [ 55.837816]
> > [ 55.837824] Pid: 33, comm: ksmd Not tainted 3.0.0-rc1+ #289 LENOVO
> > 4286CTO/4286CTO
> > [ 55.837850] RIP: 0010:[<ffffffff810bb9b2>] [<ffffffff810bb9b2>]
> > __lock_acquire+0x62/0x1d70
> > [ 55.837878] RSP: 0018:ffff88023d3abc50 EFLAGS: 00010046
> > [ 55.837894] RAX: 0000000000000046 RBX: 00000000000000e8 RCX:
> > 0000000000000001
> > [ 55.837915] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> > 00000000000000e8
> > [ 55.837936] RBP: ffff88023d3abd40 R08: 0000000000000002 R09:
> > 0000000000000000
> > [ 55.837957] R10: 0000000000000001 R11: 0000000000000000 R12:
> > ffff88023d3a3e00
> > [ 55.837978] R13: 0000000000000000 R14: 0000000000000002 R15:
> > 0000000000000000
> > [ 55.837999] FS: 0000000000000000(0000) GS:ffff88023e280000(0000)
> > knlGS:0000000000000000
> > [ 55.838022] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 55.838039] CR2: 00000000000000e8 CR3: 00000000016f5000 CR4:
> > 00000000000406e0
> > [ 55.838060] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [ 55.838081] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > 0000000000000400
> > [ 55.838102] Process ksmd (pid: 33, threadinfo ffff88023d3aa000, task
> > ffff88023d3a3e00)
> > [ 55.838131] Stack:
> > [ 55.838140] ffff88023d3abce0 0000000000000000 ffffffff81d46810
> > 00000000000012c7
> > [ 55.838168] 000000000000037c ffff88023d3a3e00 0000000000000001
> > 0000000000000000
> > [ 55.838338] 0000000000000000 0000000000000000 00000000001ba37c
> > ffffffff81a22000
> > [ 55.838365] Call Trace:
> > [ 55.838375] [<ffffffff810be55f>] ? mark_held_locks+0x6f/0xa0
> > [ 55.838394] [<ffffffff814e3360>] ?
> > _raw_spin_unlock_irqrestore+0x40/0x70
> > [ 55.838416] [<ffffffff810bdc90>] lock_acquire+0x90/0x110
> > [ 55.838434] [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
> > [ 55.838453] [<ffffffff8112df6c>] ? free_percpu+0x9c/0x130
> > [ 55.838470] [<ffffffff814e1cbc>] down_read+0x4c/0x70
> > [ 55.838486] [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
> > [ 55.838505] [<ffffffff814e33bb>] ? _raw_spin_unlock+0x2b/0x40
> > [ 55.838523] [<ffffffff8114c652>] ksm_scan_thread+0x132/0xe20
> > [ 55.838541] [<ffffffff814df822>] ? schedule+0x3b2/0x960
> > [ 55.838559] [<ffffffff810a5690>] ? wake_up_bit+0x40/0x40
> > [ 55.838576] [<ffffffff8114c520>] ? run_store+0x310/0x310
> > [ 55.838593] [<ffffffff810a5186>] kthread+0x96/0xa0
> > [ 55.838609] [<ffffffff814e5014>] kernel_thread_helper+0x4/0x10
> > [ 55.838628] [<ffffffff814e3700>] ? retint_restore_args+0xe/0xe
> > [ 55.838647] [<ffffffff810a50f0>] ? __init_kthread_worker+0x70/0x70
> > [ 55.838666] [<ffffffff814e5010>] ? gs_change+0xb/0xb
> > [ 55.838681] Code: b7 00 00 48 89 fb 85 c0 41 89 f5 45 0f 45 f0 8b 05
> > 84 de 68 00 85 c0 0f 84 7b 09 00 00 8b 05 7a 49 7a 00 85 c0 0f 84 c6
> > 01 00 00
> > [ 55.838780] 8b 03 ba 01 00 00 00 48 3d e0 3c 8c 81 44 0f 44 f2 41 83
> > fd
> > [ 55.838830] RIP [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
> > [ 55.838850] RSP <ffff88023d3abc50>
> > [ 55.839567] CR2: 00000000000000e8
> > [ 55.895721] ---[ end trace eea0fa5dfa6846f1 ]---
> > 
> > The bug can be easily reproduced using the following testcase:
> > 
> > ========================
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> > #include <sys/mman.h>
> > 
> > #define BUFSIZE getpagesize()
> > 
> > int main(int argc, char **argv)
> > {
> > void *ptr;
> > 
> > if (posix_memalign(&ptr, getpagesize(), BUFSIZE) < 0) {
> > perror("posix_memalign");
> > exit(1);
> > }
> > if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
> > perror("madvise");
> > exit(1);
> > }
> > *(char *)NULL = 0;
> Hmm, the reproducer gave something else here but no panic.
> $ strace ./test 
> execve("./test", ["./test"], [/* 26 vars */]) = 0
> brk(0)                                  = 0x220f000
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd18ec0a000
> access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
> open("/etc/ld.so.cache", O_RDONLY)      = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=41227, ...}) = 0
> mmap(NULL, 41227, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd18ebff000
> close(3)                                = 0
> open("/lib64/libc.so.6", O_RDONLY)      = 3
> read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\355\341n<\0\0\0"..., 832) = 832
> fstat(3, {st_mode=S_IFREG|0755, st_size=1912928, ...}) = 0
> mmap(0x3c6ee00000, 3737768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3c6ee00000
> mprotect(0x3c6ef87000, 2097152, PROT_NONE) = 0
> mmap(0x3c6f187000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x187000) = 0x3c6f187000
> mmap(0x3c6f18c000, 18600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3c6f18c000
> close(3)                                = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd18ebfe000
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd18ebfd000
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd18ebfc000
> arch_prctl(ARCH_SET_FS, 0x7fd18ebfd700) = 0
> mprotect(0x3c6f187000, 16384, PROT_READ) = 0
> mprotect(0x3c6e81f000, 4096, PROT_READ) = 0
> munmap(0x7fd18ebff000, 41227)           = 0
> brk(0)                                  = 0x220f000
> brk(0x2232000)                          = 0x2232000
> madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> +++ killed by SIGSEGV (core dumped) +++
> Segmentation fault (core dumped)
> 
> Did I miss anything?

mmh.. I can reproduce the bug also with the standard ubuntu (11.04)
kernel. Could you post your .config?

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-02 14:19     ` Andrea Righi
  0 siblings, 0 replies; 48+ messages in thread
From: Andrea Righi @ 2011-06-02 14:19 UTC (permalink / raw)
  To: CAI Qian
  Cc: Hugh Dickins, Andrea Arcangeli, Rik van Riel, Chris Wright,
	Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, Andrew Morton

On Thu, Jun 02, 2011 at 03:09:53AM -0400, CAI Qian wrote:
> Hello,
> 
> ----- Original Message -----
> > I've just experienced this bug with ksmd:
> > 
> > [ 55.837551] BUG: unable to handle kernel NULL pointer dereference at
> > 00000000000000e8
> > [ 55.837598] IP: [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
> > [ 55.837630] PGD 0
> > [ 55.837643] Oops: 0000 [#1] SMP
> > [ 55.837663] CPU 2
> > [ 55.837674] Modules linked in: snd_hda_codec_hdmi
> > snd_hda_codec_conexant rtl8192ce rtl8192c_common rtlwifi mac80211
> > usbhid hid cfg80211 snd_hda_intel snd_hda_codec psmouse snd_pcm e1000e
> > thinkpad_acpi snd_timer snd_page_alloc snd soundcore nvram
> > [ 55.837816]
> > [ 55.837824] Pid: 33, comm: ksmd Not tainted 3.0.0-rc1+ #289 LENOVO
> > 4286CTO/4286CTO
> > [ 55.837850] RIP: 0010:[<ffffffff810bb9b2>] [<ffffffff810bb9b2>]
> > __lock_acquire+0x62/0x1d70
> > [ 55.837878] RSP: 0018:ffff88023d3abc50 EFLAGS: 00010046
> > [ 55.837894] RAX: 0000000000000046 RBX: 00000000000000e8 RCX:
> > 0000000000000001
> > [ 55.837915] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> > 00000000000000e8
> > [ 55.837936] RBP: ffff88023d3abd40 R08: 0000000000000002 R09:
> > 0000000000000000
> > [ 55.837957] R10: 0000000000000001 R11: 0000000000000000 R12:
> > ffff88023d3a3e00
> > [ 55.837978] R13: 0000000000000000 R14: 0000000000000002 R15:
> > 0000000000000000
> > [ 55.837999] FS: 0000000000000000(0000) GS:ffff88023e280000(0000)
> > knlGS:0000000000000000
> > [ 55.838022] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 55.838039] CR2: 00000000000000e8 CR3: 00000000016f5000 CR4:
> > 00000000000406e0
> > [ 55.838060] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [ 55.838081] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > 0000000000000400
> > [ 55.838102] Process ksmd (pid: 33, threadinfo ffff88023d3aa000, task
> > ffff88023d3a3e00)
> > [ 55.838131] Stack:
> > [ 55.838140] ffff88023d3abce0 0000000000000000 ffffffff81d46810
> > 00000000000012c7
> > [ 55.838168] 000000000000037c ffff88023d3a3e00 0000000000000001
> > 0000000000000000
> > [ 55.838338] 0000000000000000 0000000000000000 00000000001ba37c
> > ffffffff81a22000
> > [ 55.838365] Call Trace:
> > [ 55.838375] [<ffffffff810be55f>] ? mark_held_locks+0x6f/0xa0
> > [ 55.838394] [<ffffffff814e3360>] ?
> > _raw_spin_unlock_irqrestore+0x40/0x70
> > [ 55.838416] [<ffffffff810bdc90>] lock_acquire+0x90/0x110
> > [ 55.838434] [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
> > [ 55.838453] [<ffffffff8112df6c>] ? free_percpu+0x9c/0x130
> > [ 55.838470] [<ffffffff814e1cbc>] down_read+0x4c/0x70
> > [ 55.838486] [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
> > [ 55.838505] [<ffffffff814e33bb>] ? _raw_spin_unlock+0x2b/0x40
> > [ 55.838523] [<ffffffff8114c652>] ksm_scan_thread+0x132/0xe20
> > [ 55.838541] [<ffffffff814df822>] ? schedule+0x3b2/0x960
> > [ 55.838559] [<ffffffff810a5690>] ? wake_up_bit+0x40/0x40
> > [ 55.838576] [<ffffffff8114c520>] ? run_store+0x310/0x310
> > [ 55.838593] [<ffffffff810a5186>] kthread+0x96/0xa0
> > [ 55.838609] [<ffffffff814e5014>] kernel_thread_helper+0x4/0x10
> > [ 55.838628] [<ffffffff814e3700>] ? retint_restore_args+0xe/0xe
> > [ 55.838647] [<ffffffff810a50f0>] ? __init_kthread_worker+0x70/0x70
> > [ 55.838666] [<ffffffff814e5010>] ? gs_change+0xb/0xb
> > [ 55.838681] Code: b7 00 00 48 89 fb 85 c0 41 89 f5 45 0f 45 f0 8b 05
> > 84 de 68 00 85 c0 0f 84 7b 09 00 00 8b 05 7a 49 7a 00 85 c0 0f 84 c6
> > 01 00 00
> > [ 55.838780] 8b 03 ba 01 00 00 00 48 3d e0 3c 8c 81 44 0f 44 f2 41 83
> > fd
> > [ 55.838830] RIP [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
> > [ 55.838850] RSP <ffff88023d3abc50>
> > [ 55.839567] CR2: 00000000000000e8
> > [ 55.895721] ---[ end trace eea0fa5dfa6846f1 ]---
> > 
> > The bug can be easily reproduced using the following testcase:
> > 
> > ========================
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> > #include <sys/mman.h>
> > 
> > #define BUFSIZE getpagesize()
> > 
> > int main(int argc, char **argv)
> > {
> > void *ptr;
> > 
> > if (posix_memalign(&ptr, getpagesize(), BUFSIZE) < 0) {
> > perror("posix_memalign");
> > exit(1);
> > }
> > if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
> > perror("madvise");
> > exit(1);
> > }
> > *(char *)NULL = 0;
> Hmm, the reproducer gave something else here but no panic.
> $ strace ./test 
> execve("./test", ["./test"], [/* 26 vars */]) = 0
> brk(0)                                  = 0x220f000
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd18ec0a000
> access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
> open("/etc/ld.so.cache", O_RDONLY)      = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=41227, ...}) = 0
> mmap(NULL, 41227, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd18ebff000
> close(3)                                = 0
> open("/lib64/libc.so.6", O_RDONLY)      = 3
> read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\355\341n<\0\0\0"..., 832) = 832
> fstat(3, {st_mode=S_IFREG|0755, st_size=1912928, ...}) = 0
> mmap(0x3c6ee00000, 3737768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3c6ee00000
> mprotect(0x3c6ef87000, 2097152, PROT_NONE) = 0
> mmap(0x3c6f187000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x187000) = 0x3c6f187000
> mmap(0x3c6f18c000, 18600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3c6f18c000
> close(3)                                = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd18ebfe000
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd18ebfd000
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd18ebfc000
> arch_prctl(ARCH_SET_FS, 0x7fd18ebfd700) = 0
> mprotect(0x3c6f187000, 16384, PROT_READ) = 0
> mprotect(0x3c6e81f000, 4096, PROT_READ) = 0
> munmap(0x7fd18ebff000, 41227)           = 0
> brk(0)                                  = 0x220f000
> brk(0x2232000)                          = 0x2232000
> madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> +++ killed by SIGSEGV (core dumped) +++
> Segmentation fault (core dumped)
> 
> Did I miss anything?

mmh.. I can reproduce the bug also with the standard ubuntu (11.04)
kernel. Could you post your .config?

Thanks,
-Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-02  7:09   ` CAI Qian
@ 2011-06-02 14:31     ` Chris Wright
  -1 siblings, 0 replies; 48+ messages in thread
From: Chris Wright @ 2011-06-02 14:31 UTC (permalink / raw)
  To: CAI Qian
  Cc: Andrea Righi, Hugh Dickins, Andrea Arcangeli, Rik van Riel,
	Chris Wright, Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki,
	linux-mm, linux-kernel, Andrew Morton

* CAI Qian (caiqian@redhat.com) wrote:
> madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---

Right, that's just what the program is trying to do, segfault.

> +++ killed by SIGSEGV (core dumped) +++
> Segmentation fault (core dumped)
> 
> Did I miss anything?

I found it works but not 100% of the time.

So I just run the bug in a loop.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-02 14:31     ` Chris Wright
  0 siblings, 0 replies; 48+ messages in thread
From: Chris Wright @ 2011-06-02 14:31 UTC (permalink / raw)
  To: CAI Qian
  Cc: Andrea Righi, Hugh Dickins, Andrea Arcangeli, Rik van Riel,
	Chris Wright, Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki,
	linux-mm, linux-kernel, Andrew Morton

* CAI Qian (caiqian@redhat.com) wrote:
> madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---

Right, that's just what the program is trying to do, segfault.

> +++ killed by SIGSEGV (core dumped) +++
> Segmentation fault (core dumped)
> 
> Did I miss anything?

I found it works but not 100% of the time.

So I just run the bug in a loop.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-02 14:31     ` Chris Wright
@ 2011-06-02 14:36       ` Andrea Arcangeli
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrea Arcangeli @ 2011-06-02 14:36 UTC (permalink / raw)
  To: Chris Wright
  Cc: CAI Qian, Andrea Righi, Hugh Dickins, Rik van Riel, Mel Gorman,
	Izik Eidus, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	Andrew Morton

On Thu, Jun 02, 2011 at 07:31:43AM -0700, Chris Wright wrote:
> * CAI Qian (caiqian@redhat.com) wrote:
> > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> 
> Right, that's just what the program is trying to do, segfault.
> 
> > +++ killed by SIGSEGV (core dumped) +++
> > Segmentation fault (core dumped)
> > 
> > Did I miss anything?
> 
> I found it works but not 100% of the time.
> 
> So I just run the bug in a loop.

echo 0 >scan_millisecs helps.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-02 14:36       ` Andrea Arcangeli
  0 siblings, 0 replies; 48+ messages in thread
From: Andrea Arcangeli @ 2011-06-02 14:36 UTC (permalink / raw)
  To: Chris Wright
  Cc: CAI Qian, Andrea Righi, Hugh Dickins, Rik van Riel, Mel Gorman,
	Izik Eidus, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	Andrew Morton

On Thu, Jun 02, 2011 at 07:31:43AM -0700, Chris Wright wrote:
> * CAI Qian (caiqian@redhat.com) wrote:
> > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> 
> Right, that's just what the program is trying to do, segfault.
> 
> > +++ killed by SIGSEGV (core dumped) +++
> > Segmentation fault (core dumped)
> > 
> > Did I miss anything?
> 
> I found it works but not 100% of the time.
> 
> So I just run the bug in a loop.

echo 0 >scan_millisecs helps.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-02 14:36       ` Andrea Arcangeli
@ 2011-06-02 15:36         ` Chris Wright
  -1 siblings, 0 replies; 48+ messages in thread
From: Chris Wright @ 2011-06-02 15:36 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Chris Wright, CAI Qian, Andrea Righi, Hugh Dickins, Rik van Riel,
	Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, Andrew Morton

* Andrea Arcangeli (aarcange@redhat.com) wrote:
> On Thu, Jun 02, 2011 at 07:31:43AM -0700, Chris Wright wrote:
> > * CAI Qian (caiqian@redhat.com) wrote:
> > > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> > 
> > Right, that's just what the program is trying to do, segfault.
> > 
> > > +++ killed by SIGSEGV (core dumped) +++
> > > Segmentation fault (core dumped)
> > > 
> > > Did I miss anything?
> > 
> > I found it works but not 100% of the time.
> > 
> > So I just run the bug in a loop.
> 
> echo 0 >scan_millisecs helps.

BTW, here's my stack trace (I dropped back to 2.6.39 just to see if it
happened to be recent regression).  It looks like mm_slot is off the list:

R10: dead000000200200 R11: dead000000100100

w/ ->mm == NULL.  Smells like use after free, but doesn't quite all add up.

BUG: unable to handle kernel 
ksmd-bug[14824]: segfault at 0 ip 0000000000400677 sp 00007fff987cb8b0 error 6 in ksmd-bug[400000+1000]
NULL pointer dereference at 0000000000000060
IP: [<ffffffff815be345>] down_read+0x19/0x28
PGD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
CPU 6 
Modules linked in: bridge stp vhost_net macvtap macvlan tun kvm_intel kvm ixgbe mdio igb [last unloaded: scsi_wait_scan]

Pid: 825, comm: ksmd Not tainted 2.6.39+ #23 Supermicro X8DTN/X8DTN
RIP: 0010:[<ffffffff815be345>]  [<ffffffff815be345>] down_read+0x19/0x28
RSP: 0018:ffff8801b5325e10  EFLAGS: 00010246
RAX: 0000000000000060 RBX: 0000000000000060 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000060
RBP: ffff8801b5325e20 R08: ffffffffffffffff R09: ffff8801b5325db0
R10: dead000000200200 R11: dead000000100100 R12: 0000000000000000
R13: ffffffff81a20e60 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88033fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000060 CR3: 0000000001a03000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksmd (pid: 825, threadinfo ffff8801b5324000, task ffff8801b55196b0)
Stack:
 ffff8801b5325e20 0000000000000060 ffff8801b5325ee0 ffffffff810eec3a
 ffff88033fd51cc0 ffff8801b5325e80 ffff8801b5325ee0 ffffffff00000000
 ffff8801b55196b0 ffff8801b5325e98 ffff8801b5324000 00000064b5324010
Call Trace:
 [<ffffffff810eec3a>] ksm_scan_thread+0x12d/0xc47
 [<ffffffff8105a6e7>] ? wake_up_bit+0x2a/0x2a
 [<ffffffff810eeb0d>] ? try_to_merge_with_ksm_page+0x498/0x498
 [<ffffffff8105a25e>] kthread+0x82/0x8a
 [<ffffffff815c6354>] kernel_thread_helper+0x4/0x10
 [<ffffffff8105a1dc>] ? kthread_worker_fn+0x13f/0x13f
 [<ffffffff815c6350>] ? gs_change+0xb/0xb
Code: 48 0f c1 10 48 85 d2 74 05 e8 98 84 c8 ff 58 5b c9 c3 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 89 fb e8 85 f4 ff ff 48 89 d8 <f0> 48 ff 00 79 05 e8 40 84 c8 ff 5a 5b c9 c3 55 48 89 e5 0f 1f 
RIP  [<ffffffff815be345>] down_read+0x19/0x28
 RSP <ffff8801b5325e10>
CR2: 0000000000000060

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-02 15:36         ` Chris Wright
  0 siblings, 0 replies; 48+ messages in thread
From: Chris Wright @ 2011-06-02 15:36 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Chris Wright, CAI Qian, Andrea Righi, Hugh Dickins, Rik van Riel,
	Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, Andrew Morton

* Andrea Arcangeli (aarcange@redhat.com) wrote:
> On Thu, Jun 02, 2011 at 07:31:43AM -0700, Chris Wright wrote:
> > * CAI Qian (caiqian@redhat.com) wrote:
> > > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> > 
> > Right, that's just what the program is trying to do, segfault.
> > 
> > > +++ killed by SIGSEGV (core dumped) +++
> > > Segmentation fault (core dumped)
> > > 
> > > Did I miss anything?
> > 
> > I found it works but not 100% of the time.
> > 
> > So I just run the bug in a loop.
> 
> echo 0 >scan_millisecs helps.

BTW, here's my stack trace (I dropped back to 2.6.39 just to see if it
happened to be recent regression).  It looks like mm_slot is off the list:

R10: dead000000200200 R11: dead000000100100

w/ ->mm == NULL.  Smells like use after free, but doesn't quite all add up.

BUG: unable to handle kernel 
ksmd-bug[14824]: segfault at 0 ip 0000000000400677 sp 00007fff987cb8b0 error 6 in ksmd-bug[400000+1000]
NULL pointer dereference at 0000000000000060
IP: [<ffffffff815be345>] down_read+0x19/0x28
PGD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
CPU 6 
Modules linked in: bridge stp vhost_net macvtap macvlan tun kvm_intel kvm ixgbe mdio igb [last unloaded: scsi_wait_scan]

Pid: 825, comm: ksmd Not tainted 2.6.39+ #23 Supermicro X8DTN/X8DTN
RIP: 0010:[<ffffffff815be345>]  [<ffffffff815be345>] down_read+0x19/0x28
RSP: 0018:ffff8801b5325e10  EFLAGS: 00010246
RAX: 0000000000000060 RBX: 0000000000000060 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000060
RBP: ffff8801b5325e20 R08: ffffffffffffffff R09: ffff8801b5325db0
R10: dead000000200200 R11: dead000000100100 R12: 0000000000000000
R13: ffffffff81a20e60 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88033fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000060 CR3: 0000000001a03000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksmd (pid: 825, threadinfo ffff8801b5324000, task ffff8801b55196b0)
Stack:
 ffff8801b5325e20 0000000000000060 ffff8801b5325ee0 ffffffff810eec3a
 ffff88033fd51cc0 ffff8801b5325e80 ffff8801b5325ee0 ffffffff00000000
 ffff8801b55196b0 ffff8801b5325e98 ffff8801b5324000 00000064b5324010
Call Trace:
 [<ffffffff810eec3a>] ksm_scan_thread+0x12d/0xc47
 [<ffffffff8105a6e7>] ? wake_up_bit+0x2a/0x2a
 [<ffffffff810eeb0d>] ? try_to_merge_with_ksm_page+0x498/0x498
 [<ffffffff8105a25e>] kthread+0x82/0x8a
 [<ffffffff815c6354>] kernel_thread_helper+0x4/0x10
 [<ffffffff8105a1dc>] ? kthread_worker_fn+0x13f/0x13f
 [<ffffffff815c6350>] ? gs_change+0xb/0xb
Code: 48 0f c1 10 48 85 d2 74 05 e8 98 84 c8 ff 58 5b c9 c3 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 89 fb e8 85 f4 ff ff 48 89 d8 <f0> 48 ff 00 79 05 e8 40 84 c8 ff 5a 5b c9 c3 55 48 89 e5 0f 1f 
RIP  [<ffffffff815be345>] down_read+0x19/0x28
 RSP <ffff8801b5325e10>
CR2: 0000000000000060

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-02 15:36         ` Chris Wright
@ 2011-06-02 16:44           ` Andrea Arcangeli
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrea Arcangeli @ 2011-06-02 16:44 UTC (permalink / raw)
  To: Chris Wright
  Cc: CAI Qian, Andrea Righi, Hugh Dickins, Rik van Riel, Mel Gorman,
	Izik Eidus, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	Andrew Morton

On Thu, Jun 02, 2011 at 08:36:41AM -0700, Chris Wright wrote:
> * Andrea Arcangeli (aarcange@redhat.com) wrote:
> > On Thu, Jun 02, 2011 at 07:31:43AM -0700, Chris Wright wrote:
> > > * CAI Qian (caiqian@redhat.com) wrote:
> > > > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > > > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> > > 
> > > Right, that's just what the program is trying to do, segfault.
> > > 
> > > > +++ killed by SIGSEGV (core dumped) +++
> > > > Segmentation fault (core dumped)
> > > > 
> > > > Did I miss anything?
> > > 
> > > I found it works but not 100% of the time.
> > > 
> > > So I just run the bug in a loop.
> > 
> > echo 0 >scan_millisecs helps.
> 
> BTW, here's my stack trace (I dropped back to 2.6.39 just to see if it
> happened to be recent regression).  It looks like mm_slot is off the list:
> 
> R10: dead000000200200 R11: dead000000100100

Yes it had to be use after free.

I cooked this patch, still untested but it builds. Will test it soon.

===
Subject: ksm: fix __ksm_exit vs ksm scan SMP race

From: Andrea Arcangeli <aarcange@redhat.com>

If the KSM scan releases the ksm_mmlist_lock after the mm_users already dropped
to zero but before __ksm_exit had a chance runs, both the KSM scan and
__ksm_exit will free the slot. This fixes the SMP race condition by using
test_and_bit_set in __ksm_exit to see if __ksm_exit arrived before the KSM
scan or not.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/mm/ksm.c b/mm/ksm.c
index d708b3e..47ef4c1 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -645,10 +645,16 @@ static int unmerge_and_remove_all_rmap_items(void)
 		if (ksm_test_exit(mm)) {
 			hlist_del(&mm_slot->link);
 			list_del(&mm_slot->mm_list);
+			/*
+			 * After releasing ksm_mmlist_lock __ksm_exit
+			 * can run and we already changed mm_slot, so
+			 * notify it with MMF_VM_MERGEABLE not to free
+			 * this again.
+			 */
+			clear_bit(MMF_VM_MERGEABLE, &mm->flags);
 			spin_unlock(&ksm_mmlist_lock);
 
 			free_mm_slot(mm_slot);
-			clear_bit(MMF_VM_MERGEABLE, &mm->flags);
 			up_read(&mm->mmap_sem);
 			mmdrop(mm);
 		} else {
@@ -1377,10 +1383,15 @@ next_mm:
 		 */
 		hlist_del(&slot->link);
 		list_del(&slot->mm_list);
+		/*
+		 * After releasing ksm_mmlist_lock __ksm_exit can run
+		 * and we already changed mm_slot, so notify it with
+		 * MMF_VM_MERGEABLE not to free this again.
+		 */
+		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
 		spin_unlock(&ksm_mmlist_lock);
 
 		free_mm_slot(slot);
-		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
 		up_read(&mm->mmap_sem);
 		mmdrop(mm);
 	} else {
@@ -1463,6 +1474,11 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
 				 VM_NONLINEAR | VM_MIXEDMAP | VM_SAO))
 			return 0;		/* just ignore the advice */
 
+		/*
+		 * It should be safe to test_bit instead of
+		 * test_and_bit_set because the madvise generic caller
+		 * holds the mmap_sem write mode.
+		 */
 		if (!test_bit(MMF_VM_MERGEABLE, &mm->flags)) {
 			err = __ksm_enter(mm);
 			if (err)
@@ -1511,6 +1527,10 @@ int __ksm_enter(struct mm_struct *mm)
 	list_add_tail(&mm_slot->mm_list, &ksm_scan.mm_slot->mm_list);
 	spin_unlock(&ksm_mmlist_lock);
 
+	/*
+	 * It should be safe to set it outside ksm_mmlist_lock because
+	 * we hold a mm_user pin on the mm so __ksm_exit can't run.
+	 */
 	set_bit(MMF_VM_MERGEABLE, &mm->flags);
 	atomic_inc(&mm->mm_count);
 
@@ -1538,9 +1558,28 @@ void __ksm_exit(struct mm_struct *mm)
 	mm_slot = get_mm_slot(mm);
 	if (mm_slot && ksm_scan.mm_slot != mm_slot) {
 		if (!mm_slot->rmap_list) {
-			hlist_del(&mm_slot->link);
-			list_del(&mm_slot->mm_list);
-			easy_to_free = 1;
+			/*
+			 * If MMF_VM_MERGEABLE isn't set it was freed
+			 * by the scan immediately after mm_count
+			 * reached zero (visible by the scan) but
+			 * before __ksm_exit() run, so we don't need
+			 * to do anything here. We don't even need to
+			 * wait for the KSM scan to release the
+			 * mmap_sem as it's not working on the mm
+			 * anymore but it's just releasing it, and it
+			 * probably already did and dropped its
+			 * mm_count too (it would however be safe to
+			 * take mmap_sem here even if MMF_VM_MERGEABLE
+			 * is already clear, as the actual mm can't be
+			 * freed until we return and we run mmdrop
+			 * too, but it's unnecessary).
+			 */
+			if (test_and_clear_bit(MMF_VM_MERGEABLE, &mm->flags)) {
+				hlist_del(&mm_slot->link);
+				list_del(&mm_slot->mm_list);
+				easy_to_free = 1;
+			} else
+				mm_slot = NULL;
 		} else {
 			list_move(&mm_slot->mm_list,
 				  &ksm_scan.mm_slot->mm_list);
@@ -1550,7 +1589,6 @@ void __ksm_exit(struct mm_struct *mm)
 
 	if (easy_to_free) {
 		free_mm_slot(mm_slot);
-		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
 		mmdrop(mm);
 	} else if (mm_slot) {
 		down_write(&mm->mmap_sem);

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-02 16:44           ` Andrea Arcangeli
  0 siblings, 0 replies; 48+ messages in thread
From: Andrea Arcangeli @ 2011-06-02 16:44 UTC (permalink / raw)
  To: Chris Wright
  Cc: CAI Qian, Andrea Righi, Hugh Dickins, Rik van Riel, Mel Gorman,
	Izik Eidus, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	Andrew Morton

On Thu, Jun 02, 2011 at 08:36:41AM -0700, Chris Wright wrote:
> * Andrea Arcangeli (aarcange@redhat.com) wrote:
> > On Thu, Jun 02, 2011 at 07:31:43AM -0700, Chris Wright wrote:
> > > * CAI Qian (caiqian@redhat.com) wrote:
> > > > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > > > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> > > 
> > > Right, that's just what the program is trying to do, segfault.
> > > 
> > > > +++ killed by SIGSEGV (core dumped) +++
> > > > Segmentation fault (core dumped)
> > > > 
> > > > Did I miss anything?
> > > 
> > > I found it works but not 100% of the time.
> > > 
> > > So I just run the bug in a loop.
> > 
> > echo 0 >scan_millisecs helps.
> 
> BTW, here's my stack trace (I dropped back to 2.6.39 just to see if it
> happened to be recent regression).  It looks like mm_slot is off the list:
> 
> R10: dead000000200200 R11: dead000000100100

Yes it had to be use after free.

I cooked this patch, still untested but it builds. Will test it soon.

===
Subject: ksm: fix __ksm_exit vs ksm scan SMP race

From: Andrea Arcangeli <aarcange@redhat.com>

If the KSM scan releases the ksm_mmlist_lock after the mm_users already dropped
to zero but before __ksm_exit had a chance runs, both the KSM scan and
__ksm_exit will free the slot. This fixes the SMP race condition by using
test_and_bit_set in __ksm_exit to see if __ksm_exit arrived before the KSM
scan or not.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/mm/ksm.c b/mm/ksm.c
index d708b3e..47ef4c1 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -645,10 +645,16 @@ static int unmerge_and_remove_all_rmap_items(void)
 		if (ksm_test_exit(mm)) {
 			hlist_del(&mm_slot->link);
 			list_del(&mm_slot->mm_list);
+			/*
+			 * After releasing ksm_mmlist_lock __ksm_exit
+			 * can run and we already changed mm_slot, so
+			 * notify it with MMF_VM_MERGEABLE not to free
+			 * this again.
+			 */
+			clear_bit(MMF_VM_MERGEABLE, &mm->flags);
 			spin_unlock(&ksm_mmlist_lock);
 
 			free_mm_slot(mm_slot);
-			clear_bit(MMF_VM_MERGEABLE, &mm->flags);
 			up_read(&mm->mmap_sem);
 			mmdrop(mm);
 		} else {
@@ -1377,10 +1383,15 @@ next_mm:
 		 */
 		hlist_del(&slot->link);
 		list_del(&slot->mm_list);
+		/*
+		 * After releasing ksm_mmlist_lock __ksm_exit can run
+		 * and we already changed mm_slot, so notify it with
+		 * MMF_VM_MERGEABLE not to free this again.
+		 */
+		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
 		spin_unlock(&ksm_mmlist_lock);
 
 		free_mm_slot(slot);
-		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
 		up_read(&mm->mmap_sem);
 		mmdrop(mm);
 	} else {
@@ -1463,6 +1474,11 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
 				 VM_NONLINEAR | VM_MIXEDMAP | VM_SAO))
 			return 0;		/* just ignore the advice */
 
+		/*
+		 * It should be safe to test_bit instead of
+		 * test_and_bit_set because the madvise generic caller
+		 * holds the mmap_sem write mode.
+		 */
 		if (!test_bit(MMF_VM_MERGEABLE, &mm->flags)) {
 			err = __ksm_enter(mm);
 			if (err)
@@ -1511,6 +1527,10 @@ int __ksm_enter(struct mm_struct *mm)
 	list_add_tail(&mm_slot->mm_list, &ksm_scan.mm_slot->mm_list);
 	spin_unlock(&ksm_mmlist_lock);
 
+	/*
+	 * It should be safe to set it outside ksm_mmlist_lock because
+	 * we hold a mm_user pin on the mm so __ksm_exit can't run.
+	 */
 	set_bit(MMF_VM_MERGEABLE, &mm->flags);
 	atomic_inc(&mm->mm_count);
 
@@ -1538,9 +1558,28 @@ void __ksm_exit(struct mm_struct *mm)
 	mm_slot = get_mm_slot(mm);
 	if (mm_slot && ksm_scan.mm_slot != mm_slot) {
 		if (!mm_slot->rmap_list) {
-			hlist_del(&mm_slot->link);
-			list_del(&mm_slot->mm_list);
-			easy_to_free = 1;
+			/*
+			 * If MMF_VM_MERGEABLE isn't set it was freed
+			 * by the scan immediately after mm_count
+			 * reached zero (visible by the scan) but
+			 * before __ksm_exit() run, so we don't need
+			 * to do anything here. We don't even need to
+			 * wait for the KSM scan to release the
+			 * mmap_sem as it's not working on the mm
+			 * anymore but it's just releasing it, and it
+			 * probably already did and dropped its
+			 * mm_count too (it would however be safe to
+			 * take mmap_sem here even if MMF_VM_MERGEABLE
+			 * is already clear, as the actual mm can't be
+			 * freed until we return and we run mmdrop
+			 * too, but it's unnecessary).
+			 */
+			if (test_and_clear_bit(MMF_VM_MERGEABLE, &mm->flags)) {
+				hlist_del(&mm_slot->link);
+				list_del(&mm_slot->mm_list);
+				easy_to_free = 1;
+			} else
+				mm_slot = NULL;
 		} else {
 			list_move(&mm_slot->mm_list,
 				  &ksm_scan.mm_slot->mm_list);
@@ -1550,7 +1589,6 @@ void __ksm_exit(struct mm_struct *mm)
 
 	if (easy_to_free) {
 		free_mm_slot(mm_slot);
-		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
 		mmdrop(mm);
 	} else if (mm_slot) {
 		down_write(&mm->mmap_sem);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-02 14:19     ` Andrea Righi
@ 2011-06-02 16:48       ` Chris Wright
  -1 siblings, 0 replies; 48+ messages in thread
From: Chris Wright @ 2011-06-02 16:48 UTC (permalink / raw)
  To: Andrea Righi
  Cc: CAI Qian, Hugh Dickins, Andrea Arcangeli, Rik van Riel,
	Chris Wright, Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki,
	linux-mm, linux-kernel, Andrew Morton

* Andrea Righi (andrea@betterlinux.com) wrote:
> mmh.. I can reproduce the bug also with the standard ubuntu (11.04)
> kernel. Could you post your .config?

Andrea (Righi), can you tell me if this WARN fires?  This looks
like a pure race between removing from list and checking list, i.e.
insufficient locking.

ksm_scan.mm_slot == the only registered mm

CPU 1 (bug program)		CPU 2 (ksmd)
				list_empty() is false
lock
ksm_scan.mm_slot
list_del
unlock
				slot == &ksm_mm_head (but list is now empty_)


diff --git a/mm/ksm.c b/mm/ksm.c
index 942dfc7..ab79a92 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1301,6 +1301,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 		slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
 		ksm_scan.mm_slot = slot;
 		spin_unlock(&ksm_mmlist_lock);
+		WARN_ON(slot == &ksm_mm_head);
 next_mm:
 		ksm_scan.address = 0;
 		ksm_scan.rmap_list = &slot->rmap_list;

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-02 16:48       ` Chris Wright
  0 siblings, 0 replies; 48+ messages in thread
From: Chris Wright @ 2011-06-02 16:48 UTC (permalink / raw)
  To: Andrea Righi
  Cc: CAI Qian, Hugh Dickins, Andrea Arcangeli, Rik van Riel,
	Chris Wright, Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki,
	linux-mm, linux-kernel, Andrew Morton

* Andrea Righi (andrea@betterlinux.com) wrote:
> mmh.. I can reproduce the bug also with the standard ubuntu (11.04)
> kernel. Could you post your .config?

Andrea (Righi), can you tell me if this WARN fires?  This looks
like a pure race between removing from list and checking list, i.e.
insufficient locking.

ksm_scan.mm_slot == the only registered mm

CPU 1 (bug program)		CPU 2 (ksmd)
				list_empty() is false
lock
ksm_scan.mm_slot
list_del
unlock
				slot == &ksm_mm_head (but list is now empty_)


diff --git a/mm/ksm.c b/mm/ksm.c
index 942dfc7..ab79a92 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1301,6 +1301,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 		slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
 		ksm_scan.mm_slot = slot;
 		spin_unlock(&ksm_mmlist_lock);
+		WARN_ON(slot == &ksm_mm_head);
 next_mm:
 		ksm_scan.address = 0;
 		ksm_scan.rmap_list = &slot->rmap_list;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-02 16:48       ` Chris Wright
@ 2011-06-02 17:29         ` Hugh Dickins
  -1 siblings, 0 replies; 48+ messages in thread
From: Hugh Dickins @ 2011-06-02 17:29 UTC (permalink / raw)
  To: Chris Wright
  Cc: Andrea Righi, CAI Qian, Andrea Arcangeli, Rik van Riel,
	Mel Gorman, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	Andrew Morton

On Thu, 2 Jun 2011, Chris Wright wrote:
> * Andrea Righi (andrea@betterlinux.com) wrote:
> > mmh.. I can reproduce the bug also with the standard ubuntu (11.04)
> > kernel. Could you post your .config?
> 
> Andrea (Righi), can you tell me if this WARN fires?  This looks
> like a pure race between removing from list and checking list, i.e.
> insufficient locking.
> 
> ksm_scan.mm_slot == the only registered mm
> 
> CPU 1 (bug program)		CPU 2 (ksmd)
> 				list_empty() is false
> lock
> ksm_scan.mm_slot
> list_del
> unlock
> 				slot == &ksm_mm_head (but list is now empty_)
> 
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 942dfc7..ab79a92 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1301,6 +1301,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
>  		slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
>  		ksm_scan.mm_slot = slot;
>  		spin_unlock(&ksm_mmlist_lock);
> +		WARN_ON(slot == &ksm_mm_head);
>  next_mm:
>  		ksm_scan.address = 0;
>  		ksm_scan.rmap_list = &slot->rmap_list;

AndreaR, good find, many thanks for discovering and reporting it.
I couldn't look at it until last night, and even then, it was not
obvious to me exactly where my assumptions were going wrong.

Even now it's unclear what role the SIGSEGV plays, as opposed to an
normal exit: I guess it just happens to change the timing enough to
make the race dangerous.

Your patch was not wrong, but I do prefer a patch that plugs the
exact hole; and I needed to understand what was going on - without
understanding it, there was a danger we might leak memory instead.

AndreaA, I didn't study the patch you posted half an hour ago,
since by that time I'd worked it out and was preparing patch below.
I think your patch would be for a different bug, hopefully one we
don't have, it looks more complicated than we should need for this.

ChrisW, yes, your WARN_ON is spot on, matches what I saw exactly.

I'll fill in the patch description later, must dash now, probably
offline until late tonight.  Or if you're satisfied and don't want to
wait, you guys fill that in and send off to Linus & Andrew - thanks.

[PATCH] ksm: fix easily reproduced NULL pointer dereference

Reported-by: Andrea Righi <andrea@betterlinux.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
---

 mm/ksm.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- 3.0-rc1/mm/ksm.c	2011-05-29 18:42:37.429882601 -0700
+++ linux/mm/ksm.c	2011-06-02 09:55:31.729702490 -0700
@@ -1302,6 +1302,13 @@ static struct rmap_item *scan_get_next_r
 		slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
 		ksm_scan.mm_slot = slot;
 		spin_unlock(&ksm_mmlist_lock);
+
+		/*
+		 * Although we tested list_empty() above, a racing __ksm_exit
+		 * of the last mm on the list may have removed it since then.
+		 */
+		if (slot == &ksm_mm_head)
+			return NULL;
 next_mm:
 		ksm_scan.address = 0;
 		ksm_scan.rmap_list = &slot->rmap_list;

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-02 17:29         ` Hugh Dickins
  0 siblings, 0 replies; 48+ messages in thread
From: Hugh Dickins @ 2011-06-02 17:29 UTC (permalink / raw)
  To: Chris Wright
  Cc: Andrea Righi, CAI Qian, Andrea Arcangeli, Rik van Riel,
	Mel Gorman, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	Andrew Morton

On Thu, 2 Jun 2011, Chris Wright wrote:
> * Andrea Righi (andrea@betterlinux.com) wrote:
> > mmh.. I can reproduce the bug also with the standard ubuntu (11.04)
> > kernel. Could you post your .config?
> 
> Andrea (Righi), can you tell me if this WARN fires?  This looks
> like a pure race between removing from list and checking list, i.e.
> insufficient locking.
> 
> ksm_scan.mm_slot == the only registered mm
> 
> CPU 1 (bug program)		CPU 2 (ksmd)
> 				list_empty() is false
> lock
> ksm_scan.mm_slot
> list_del
> unlock
> 				slot == &ksm_mm_head (but list is now empty_)
> 
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 942dfc7..ab79a92 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1301,6 +1301,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
>  		slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
>  		ksm_scan.mm_slot = slot;
>  		spin_unlock(&ksm_mmlist_lock);
> +		WARN_ON(slot == &ksm_mm_head);
>  next_mm:
>  		ksm_scan.address = 0;
>  		ksm_scan.rmap_list = &slot->rmap_list;

AndreaR, good find, many thanks for discovering and reporting it.
I couldn't look at it until last night, and even then, it was not
obvious to me exactly where my assumptions were going wrong.

Even now it's unclear what role the SIGSEGV plays, as opposed to an
normal exit: I guess it just happens to change the timing enough to
make the race dangerous.

Your patch was not wrong, but I do prefer a patch that plugs the
exact hole; and I needed to understand what was going on - without
understanding it, there was a danger we might leak memory instead.

AndreaA, I didn't study the patch you posted half an hour ago,
since by that time I'd worked it out and was preparing patch below.
I think your patch would be for a different bug, hopefully one we
don't have, it looks more complicated than we should need for this.

ChrisW, yes, your WARN_ON is spot on, matches what I saw exactly.

I'll fill in the patch description later, must dash now, probably
offline until late tonight.  Or if you're satisfied and don't want to
wait, you guys fill that in and send off to Linus & Andrew - thanks.

[PATCH] ksm: fix easily reproduced NULL pointer dereference

Reported-by: Andrea Righi <andrea@betterlinux.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
---

 mm/ksm.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- 3.0-rc1/mm/ksm.c	2011-05-29 18:42:37.429882601 -0700
+++ linux/mm/ksm.c	2011-06-02 09:55:31.729702490 -0700
@@ -1302,6 +1302,13 @@ static struct rmap_item *scan_get_next_r
 		slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
 		ksm_scan.mm_slot = slot;
 		spin_unlock(&ksm_mmlist_lock);
+
+		/*
+		 * Although we tested list_empty() above, a racing __ksm_exit
+		 * of the last mm on the list may have removed it since then.
+		 */
+		if (slot == &ksm_mm_head)
+			return NULL;
 next_mm:
 		ksm_scan.address = 0;
 		ksm_scan.rmap_list = &slot->rmap_list;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH] ksm: fix race between ksmd and exiting task
  2011-06-02 16:48       ` Chris Wright
@ 2011-06-02 17:35         ` Chris Wright
  -1 siblings, 0 replies; 48+ messages in thread
From: Chris Wright @ 2011-06-02 17:35 UTC (permalink / raw)
  To: Chris Wright
  Cc: Andrea Righi, CAI Qian, Hugh Dickins, Andrea Arcangeli,
	Rik van Riel, Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki,
	linux-mm, linux-kernel, Andrew Morton

Andrea Righi reported a case where an exiting task can race against
ksmd.

ksm_scan.mm_slot == the only registered mm
CPU 1 (bug program)		CPU 2 (ksmd)
 				list_empty() is false
lock
ksm_scan.mm_slot
list_del
unlock
 				slot == &ksm_mm_head (but list is now empty_)

Close this race by revalidating that the new slot is not simply the list
head again.

Reported-by: Andrea Righi <andrea@betterlinux.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 mm/ksm.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 942dfc7..0373ce4 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1301,6 +1301,9 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 		slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
 		ksm_scan.mm_slot = slot;
 		spin_unlock(&ksm_mmlist_lock);
+		/* We raced against exit of last slot on the list */
+		if (slot == &ksm_mm_head)
+			return NULL;
 next_mm:
 		ksm_scan.address = 0;
 		ksm_scan.rmap_list = &slot->rmap_list;

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH] ksm: fix race between ksmd and exiting task
@ 2011-06-02 17:35         ` Chris Wright
  0 siblings, 0 replies; 48+ messages in thread
From: Chris Wright @ 2011-06-02 17:35 UTC (permalink / raw)
  To: Chris Wright
  Cc: Andrea Righi, CAI Qian, Hugh Dickins, Andrea Arcangeli,
	Rik van Riel, Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki,
	linux-mm, linux-kernel, Andrew Morton

Andrea Righi reported a case where an exiting task can race against
ksmd.

ksm_scan.mm_slot == the only registered mm
CPU 1 (bug program)		CPU 2 (ksmd)
 				list_empty() is false
lock
ksm_scan.mm_slot
list_del
unlock
 				slot == &ksm_mm_head (but list is now empty_)

Close this race by revalidating that the new slot is not simply the list
head again.

Reported-by: Andrea Righi <andrea@betterlinux.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 mm/ksm.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 942dfc7..0373ce4 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1301,6 +1301,9 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 		slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
 		ksm_scan.mm_slot = slot;
 		spin_unlock(&ksm_mmlist_lock);
+		/* We raced against exit of last slot on the list */
+		if (slot == &ksm_mm_head)
+			return NULL;
 next_mm:
 		ksm_scan.address = 0;
 		ksm_scan.rmap_list = &slot->rmap_list;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-02 17:29         ` Hugh Dickins
@ 2011-06-02 17:43           ` Andrea Arcangeli
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrea Arcangeli @ 2011-06-02 17:43 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Chris Wright, Andrea Righi, CAI Qian, Rik van Riel, Mel Gorman,
	KAMEZAWA Hiroyuki, linux-mm, linux-kernel, Andrew Morton

On Thu, Jun 02, 2011 at 10:29:39AM -0700, Hugh Dickins wrote:
> AndreaA, I didn't study the patch you posted half an hour ago,
> since by that time I'd worked it out and was preparing patch below.
> I think your patch would be for a different bug, hopefully one we
> don't have, it looks more complicated than we should need for this.

I didn't expect two different bugs leading to double free.

If you've time please review my other patch too because mmput runs
with no mmap_sem hold and I think the ksm scan code runs under the
assumption that __ksm_exit is waiting in down_write() when
ksm_mmlist_lock is released (before freeing the mm_slot), and that
assumption is wrong. ksm_test_exit may very well be true despite
__ksm_exit didn't run yet, and ksm scan will proceed freeing after
changing the mm_slot and ksm_exit will be free to run and free again
immediately after the ksm scan releases the ksm_mmlist_lock and before
it clears the MMF_VM_MERGEABLE (because the mm_slot has been changed
before releasing the ksm_mmlist_lock).

The rmap_list being null will kind of hide it, the fact there's so
little time between the unlock of the ksm_mmlist_lock and the clearing
of MMF_VM_MERGEABLE (that will stop ksm_exit from calling __ksm_exit
at all) will also hide it. At least in
unmerge_and_remove_all_rmap_items remove_trailing_rmap_items will nuke
the rmap_list just before this race runs so making it more likely
possible.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-02 17:43           ` Andrea Arcangeli
  0 siblings, 0 replies; 48+ messages in thread
From: Andrea Arcangeli @ 2011-06-02 17:43 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Chris Wright, Andrea Righi, CAI Qian, Rik van Riel, Mel Gorman,
	KAMEZAWA Hiroyuki, linux-mm, linux-kernel, Andrew Morton

On Thu, Jun 02, 2011 at 10:29:39AM -0700, Hugh Dickins wrote:
> AndreaA, I didn't study the patch you posted half an hour ago,
> since by that time I'd worked it out and was preparing patch below.
> I think your patch would be for a different bug, hopefully one we
> don't have, it looks more complicated than we should need for this.

I didn't expect two different bugs leading to double free.

If you've time please review my other patch too because mmput runs
with no mmap_sem hold and I think the ksm scan code runs under the
assumption that __ksm_exit is waiting in down_write() when
ksm_mmlist_lock is released (before freeing the mm_slot), and that
assumption is wrong. ksm_test_exit may very well be true despite
__ksm_exit didn't run yet, and ksm scan will proceed freeing after
changing the mm_slot and ksm_exit will be free to run and free again
immediately after the ksm scan releases the ksm_mmlist_lock and before
it clears the MMF_VM_MERGEABLE (because the mm_slot has been changed
before releasing the ksm_mmlist_lock).

The rmap_list being null will kind of hide it, the fact there's so
little time between the unlock of the ksm_mmlist_lock and the clearing
of MMF_VM_MERGEABLE (that will stop ksm_exit from calling __ksm_exit
at all) will also hide it. At least in
unmerge_and_remove_all_rmap_items remove_trailing_rmap_items will nuke
the rmap_list just before this race runs so making it more likely
possible.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-02 16:48       ` Chris Wright
@ 2011-06-02 20:10         ` Andrea Righi
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrea Righi @ 2011-06-02 20:10 UTC (permalink / raw)
  To: Chris Wright
  Cc: CAI Qian, Hugh Dickins, Andrea Arcangeli, Rik van Riel,
	Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, Andrew Morton

On Thu, Jun 02, 2011 at 09:48:41AM -0700, Chris Wright wrote:
> * Andrea Righi (andrea@betterlinux.com) wrote:
> > mmh.. I can reproduce the bug also with the standard ubuntu (11.04)
> > kernel. Could you post your .config?
> 
> Andrea (Righi), can you tell me if this WARN fires?  This looks
> like a pure race between removing from list and checking list, i.e.
> insufficient locking.

Yes, it does. With this patch:

[   50.968896] WARNING: at mm/ksm.c:1305 ksm_scan_thread+0x9e3/0xe50()

> 
> ksm_scan.mm_slot == the only registered mm
> 
> CPU 1 (bug program)		CPU 2 (ksmd)
> 				list_empty() is false
> lock
> ksm_scan.mm_slot
> list_del
> unlock
> 				slot == &ksm_mm_head (but list is now empty_)

It seems to be the exact problem.

-Andrea

> 
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 942dfc7..ab79a92 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1301,6 +1301,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
>  		slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
>  		ksm_scan.mm_slot = slot;
>  		spin_unlock(&ksm_mmlist_lock);
> +		WARN_ON(slot == &ksm_mm_head);
>  next_mm:
>  		ksm_scan.address = 0;
>  		ksm_scan.rmap_list = &slot->rmap_list;

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-02 20:10         ` Andrea Righi
  0 siblings, 0 replies; 48+ messages in thread
From: Andrea Righi @ 2011-06-02 20:10 UTC (permalink / raw)
  To: Chris Wright
  Cc: CAI Qian, Hugh Dickins, Andrea Arcangeli, Rik van Riel,
	Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, Andrew Morton

On Thu, Jun 02, 2011 at 09:48:41AM -0700, Chris Wright wrote:
> * Andrea Righi (andrea@betterlinux.com) wrote:
> > mmh.. I can reproduce the bug also with the standard ubuntu (11.04)
> > kernel. Could you post your .config?
> 
> Andrea (Righi), can you tell me if this WARN fires?  This looks
> like a pure race between removing from list and checking list, i.e.
> insufficient locking.

Yes, it does. With this patch:

[   50.968896] WARNING: at mm/ksm.c:1305 ksm_scan_thread+0x9e3/0xe50()

> 
> ksm_scan.mm_slot == the only registered mm
> 
> CPU 1 (bug program)		CPU 2 (ksmd)
> 				list_empty() is false
> lock
> ksm_scan.mm_slot
> list_del
> unlock
> 				slot == &ksm_mm_head (but list is now empty_)

It seems to be the exact problem.

-Andrea

> 
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 942dfc7..ab79a92 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1301,6 +1301,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
>  		slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
>  		ksm_scan.mm_slot = slot;
>  		spin_unlock(&ksm_mmlist_lock);
> +		WARN_ON(slot == &ksm_mm_head);
>  next_mm:
>  		ksm_scan.address = 0;
>  		ksm_scan.rmap_list = &slot->rmap_list;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] ksm: fix race between ksmd and exiting task
  2011-06-02 17:35         ` Chris Wright
@ 2011-06-02 20:12           ` Andrea Righi
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrea Righi @ 2011-06-02 20:12 UTC (permalink / raw)
  To: Chris Wright
  Cc: CAI Qian, Hugh Dickins, Andrea Arcangeli, Rik van Riel,
	Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, Andrew Morton

On Thu, Jun 02, 2011 at 10:35:49AM -0700, Chris Wright wrote:
> Andrea Righi reported a case where an exiting task can race against
> ksmd.
> 
> ksm_scan.mm_slot == the only registered mm
> CPU 1 (bug program)		CPU 2 (ksmd)
>  				list_empty() is false
> lock
> ksm_scan.mm_slot
> list_del
> unlock
>  				slot == &ksm_mm_head (but list is now empty_)
> 
> Close this race by revalidating that the new slot is not simply the list
> head again.

I confirm this fixes the problem on my side.

Tested-by: Andrea Righi <andrea@betterlinux.com>

> 
> Reported-by: Andrea Righi <andrea@betterlinux.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
> ---
>  mm/ksm.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 942dfc7..0373ce4 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1301,6 +1301,9 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
>  		slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
>  		ksm_scan.mm_slot = slot;
>  		spin_unlock(&ksm_mmlist_lock);
> +		/* We raced against exit of last slot on the list */
> +		if (slot == &ksm_mm_head)
> +			return NULL;
>  next_mm:
>  		ksm_scan.address = 0;
>  		ksm_scan.rmap_list = &slot->rmap_list;

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] ksm: fix race between ksmd and exiting task
@ 2011-06-02 20:12           ` Andrea Righi
  0 siblings, 0 replies; 48+ messages in thread
From: Andrea Righi @ 2011-06-02 20:12 UTC (permalink / raw)
  To: Chris Wright
  Cc: CAI Qian, Hugh Dickins, Andrea Arcangeli, Rik van Riel,
	Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, Andrew Morton

On Thu, Jun 02, 2011 at 10:35:49AM -0700, Chris Wright wrote:
> Andrea Righi reported a case where an exiting task can race against
> ksmd.
> 
> ksm_scan.mm_slot == the only registered mm
> CPU 1 (bug program)		CPU 2 (ksmd)
>  				list_empty() is false
> lock
> ksm_scan.mm_slot
> list_del
> unlock
>  				slot == &ksm_mm_head (but list is now empty_)
> 
> Close this race by revalidating that the new slot is not simply the list
> head again.

I confirm this fixes the problem on my side.

Tested-by: Andrea Righi <andrea@betterlinux.com>

> 
> Reported-by: Andrea Righi <andrea@betterlinux.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
> ---
>  mm/ksm.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 942dfc7..0373ce4 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1301,6 +1301,9 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
>  		slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
>  		ksm_scan.mm_slot = slot;
>  		spin_unlock(&ksm_mmlist_lock);
> +		/* We raced against exit of last slot on the list */
> +		if (slot == &ksm_mm_head)
> +			return NULL;
>  next_mm:
>  		ksm_scan.address = 0;
>  		ksm_scan.rmap_list = &slot->rmap_list;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-02 16:44           ` Andrea Arcangeli
@ 2011-06-02 20:15             ` Andrea Righi
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrea Righi @ 2011-06-02 20:15 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Chris Wright, CAI Qian, Hugh Dickins, Rik van Riel, Mel Gorman,
	Izik Eidus, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	Andrew Morton

On Thu, Jun 02, 2011 at 06:44:58PM +0200, Andrea Arcangeli wrote:
> On Thu, Jun 02, 2011 at 08:36:41AM -0700, Chris Wright wrote:
> > * Andrea Arcangeli (aarcange@redhat.com) wrote:
> > > On Thu, Jun 02, 2011 at 07:31:43AM -0700, Chris Wright wrote:
> > > > * CAI Qian (caiqian@redhat.com) wrote:
> > > > > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > > > > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> > > > 
> > > > Right, that's just what the program is trying to do, segfault.
> > > > 
> > > > > +++ killed by SIGSEGV (core dumped) +++
> > > > > Segmentation fault (core dumped)
> > > > > 
> > > > > Did I miss anything?
> > > > 
> > > > I found it works but not 100% of the time.
> > > > 
> > > > So I just run the bug in a loop.
> > > 
> > > echo 0 >scan_millisecs helps.
> > 
> > BTW, here's my stack trace (I dropped back to 2.6.39 just to see if it
> > happened to be recent regression).  It looks like mm_slot is off the list:
> > 
> > R10: dead000000200200 R11: dead000000100100
> 
> Yes it had to be use after free.
> 
> I cooked this patch, still untested but it builds. Will test it soon.

Hi Andrea,

I just tested this patch, but it doesn't seem to fix the problem, at
least not the one I reported. The same bug happens again.

Thanks,
-Andrea

> 
> ===
> Subject: ksm: fix __ksm_exit vs ksm scan SMP race
> 
> From: Andrea Arcangeli <aarcange@redhat.com>
> 
> If the KSM scan releases the ksm_mmlist_lock after the mm_users already dropped
> to zero but before __ksm_exit had a chance runs, both the KSM scan and
> __ksm_exit will free the slot. This fixes the SMP race condition by using
> test_and_bit_set in __ksm_exit to see if __ksm_exit arrived before the KSM
> scan or not.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index d708b3e..47ef4c1 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -645,10 +645,16 @@ static int unmerge_and_remove_all_rmap_items(void)
>  		if (ksm_test_exit(mm)) {
>  			hlist_del(&mm_slot->link);
>  			list_del(&mm_slot->mm_list);
> +			/*
> +			 * After releasing ksm_mmlist_lock __ksm_exit
> +			 * can run and we already changed mm_slot, so
> +			 * notify it with MMF_VM_MERGEABLE not to free
> +			 * this again.
> +			 */
> +			clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>  			spin_unlock(&ksm_mmlist_lock);
>  
>  			free_mm_slot(mm_slot);
> -			clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>  			up_read(&mm->mmap_sem);
>  			mmdrop(mm);
>  		} else {
> @@ -1377,10 +1383,15 @@ next_mm:
>  		 */
>  		hlist_del(&slot->link);
>  		list_del(&slot->mm_list);
> +		/*
> +		 * After releasing ksm_mmlist_lock __ksm_exit can run
> +		 * and we already changed mm_slot, so notify it with
> +		 * MMF_VM_MERGEABLE not to free this again.
> +		 */
> +		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>  		spin_unlock(&ksm_mmlist_lock);
>  
>  		free_mm_slot(slot);
> -		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>  		up_read(&mm->mmap_sem);
>  		mmdrop(mm);
>  	} else {
> @@ -1463,6 +1474,11 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
>  				 VM_NONLINEAR | VM_MIXEDMAP | VM_SAO))
>  			return 0;		/* just ignore the advice */
>  
> +		/*
> +		 * It should be safe to test_bit instead of
> +		 * test_and_bit_set because the madvise generic caller
> +		 * holds the mmap_sem write mode.
> +		 */
>  		if (!test_bit(MMF_VM_MERGEABLE, &mm->flags)) {
>  			err = __ksm_enter(mm);
>  			if (err)
> @@ -1511,6 +1527,10 @@ int __ksm_enter(struct mm_struct *mm)
>  	list_add_tail(&mm_slot->mm_list, &ksm_scan.mm_slot->mm_list);
>  	spin_unlock(&ksm_mmlist_lock);
>  
> +	/*
> +	 * It should be safe to set it outside ksm_mmlist_lock because
> +	 * we hold a mm_user pin on the mm so __ksm_exit can't run.
> +	 */
>  	set_bit(MMF_VM_MERGEABLE, &mm->flags);
>  	atomic_inc(&mm->mm_count);
>  
> @@ -1538,9 +1558,28 @@ void __ksm_exit(struct mm_struct *mm)
>  	mm_slot = get_mm_slot(mm);
>  	if (mm_slot && ksm_scan.mm_slot != mm_slot) {
>  		if (!mm_slot->rmap_list) {
> -			hlist_del(&mm_slot->link);
> -			list_del(&mm_slot->mm_list);
> -			easy_to_free = 1;
> +			/*
> +			 * If MMF_VM_MERGEABLE isn't set it was freed
> +			 * by the scan immediately after mm_count
> +			 * reached zero (visible by the scan) but
> +			 * before __ksm_exit() run, so we don't need
> +			 * to do anything here. We don't even need to
> +			 * wait for the KSM scan to release the
> +			 * mmap_sem as it's not working on the mm
> +			 * anymore but it's just releasing it, and it
> +			 * probably already did and dropped its
> +			 * mm_count too (it would however be safe to
> +			 * take mmap_sem here even if MMF_VM_MERGEABLE
> +			 * is already clear, as the actual mm can't be
> +			 * freed until we return and we run mmdrop
> +			 * too, but it's unnecessary).
> +			 */
> +			if (test_and_clear_bit(MMF_VM_MERGEABLE, &mm->flags)) {
> +				hlist_del(&mm_slot->link);
> +				list_del(&mm_slot->mm_list);
> +				easy_to_free = 1;
> +			} else
> +				mm_slot = NULL;
>  		} else {
>  			list_move(&mm_slot->mm_list,
>  				  &ksm_scan.mm_slot->mm_list);
> @@ -1550,7 +1589,6 @@ void __ksm_exit(struct mm_struct *mm)
>  
>  	if (easy_to_free) {
>  		free_mm_slot(mm_slot);
> -		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>  		mmdrop(mm);
>  	} else if (mm_slot) {
>  		down_write(&mm->mmap_sem);

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-02 20:15             ` Andrea Righi
  0 siblings, 0 replies; 48+ messages in thread
From: Andrea Righi @ 2011-06-02 20:15 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Chris Wright, CAI Qian, Hugh Dickins, Rik van Riel, Mel Gorman,
	Izik Eidus, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	Andrew Morton

On Thu, Jun 02, 2011 at 06:44:58PM +0200, Andrea Arcangeli wrote:
> On Thu, Jun 02, 2011 at 08:36:41AM -0700, Chris Wright wrote:
> > * Andrea Arcangeli (aarcange@redhat.com) wrote:
> > > On Thu, Jun 02, 2011 at 07:31:43AM -0700, Chris Wright wrote:
> > > > * CAI Qian (caiqian@redhat.com) wrote:
> > > > > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > > > > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> > > > 
> > > > Right, that's just what the program is trying to do, segfault.
> > > > 
> > > > > +++ killed by SIGSEGV (core dumped) +++
> > > > > Segmentation fault (core dumped)
> > > > > 
> > > > > Did I miss anything?
> > > > 
> > > > I found it works but not 100% of the time.
> > > > 
> > > > So I just run the bug in a loop.
> > > 
> > > echo 0 >scan_millisecs helps.
> > 
> > BTW, here's my stack trace (I dropped back to 2.6.39 just to see if it
> > happened to be recent regression).  It looks like mm_slot is off the list:
> > 
> > R10: dead000000200200 R11: dead000000100100
> 
> Yes it had to be use after free.
> 
> I cooked this patch, still untested but it builds. Will test it soon.

Hi Andrea,

I just tested this patch, but it doesn't seem to fix the problem, at
least not the one I reported. The same bug happens again.

Thanks,
-Andrea

> 
> ===
> Subject: ksm: fix __ksm_exit vs ksm scan SMP race
> 
> From: Andrea Arcangeli <aarcange@redhat.com>
> 
> If the KSM scan releases the ksm_mmlist_lock after the mm_users already dropped
> to zero but before __ksm_exit had a chance runs, both the KSM scan and
> __ksm_exit will free the slot. This fixes the SMP race condition by using
> test_and_bit_set in __ksm_exit to see if __ksm_exit arrived before the KSM
> scan or not.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index d708b3e..47ef4c1 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -645,10 +645,16 @@ static int unmerge_and_remove_all_rmap_items(void)
>  		if (ksm_test_exit(mm)) {
>  			hlist_del(&mm_slot->link);
>  			list_del(&mm_slot->mm_list);
> +			/*
> +			 * After releasing ksm_mmlist_lock __ksm_exit
> +			 * can run and we already changed mm_slot, so
> +			 * notify it with MMF_VM_MERGEABLE not to free
> +			 * this again.
> +			 */
> +			clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>  			spin_unlock(&ksm_mmlist_lock);
>  
>  			free_mm_slot(mm_slot);
> -			clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>  			up_read(&mm->mmap_sem);
>  			mmdrop(mm);
>  		} else {
> @@ -1377,10 +1383,15 @@ next_mm:
>  		 */
>  		hlist_del(&slot->link);
>  		list_del(&slot->mm_list);
> +		/*
> +		 * After releasing ksm_mmlist_lock __ksm_exit can run
> +		 * and we already changed mm_slot, so notify it with
> +		 * MMF_VM_MERGEABLE not to free this again.
> +		 */
> +		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>  		spin_unlock(&ksm_mmlist_lock);
>  
>  		free_mm_slot(slot);
> -		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>  		up_read(&mm->mmap_sem);
>  		mmdrop(mm);
>  	} else {
> @@ -1463,6 +1474,11 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
>  				 VM_NONLINEAR | VM_MIXEDMAP | VM_SAO))
>  			return 0;		/* just ignore the advice */
>  
> +		/*
> +		 * It should be safe to test_bit instead of
> +		 * test_and_bit_set because the madvise generic caller
> +		 * holds the mmap_sem write mode.
> +		 */
>  		if (!test_bit(MMF_VM_MERGEABLE, &mm->flags)) {
>  			err = __ksm_enter(mm);
>  			if (err)
> @@ -1511,6 +1527,10 @@ int __ksm_enter(struct mm_struct *mm)
>  	list_add_tail(&mm_slot->mm_list, &ksm_scan.mm_slot->mm_list);
>  	spin_unlock(&ksm_mmlist_lock);
>  
> +	/*
> +	 * It should be safe to set it outside ksm_mmlist_lock because
> +	 * we hold a mm_user pin on the mm so __ksm_exit can't run.
> +	 */
>  	set_bit(MMF_VM_MERGEABLE, &mm->flags);
>  	atomic_inc(&mm->mm_count);
>  
> @@ -1538,9 +1558,28 @@ void __ksm_exit(struct mm_struct *mm)
>  	mm_slot = get_mm_slot(mm);
>  	if (mm_slot && ksm_scan.mm_slot != mm_slot) {
>  		if (!mm_slot->rmap_list) {
> -			hlist_del(&mm_slot->link);
> -			list_del(&mm_slot->mm_list);
> -			easy_to_free = 1;
> +			/*
> +			 * If MMF_VM_MERGEABLE isn't set it was freed
> +			 * by the scan immediately after mm_count
> +			 * reached zero (visible by the scan) but
> +			 * before __ksm_exit() run, so we don't need
> +			 * to do anything here. We don't even need to
> +			 * wait for the KSM scan to release the
> +			 * mmap_sem as it's not working on the mm
> +			 * anymore but it's just releasing it, and it
> +			 * probably already did and dropped its
> +			 * mm_count too (it would however be safe to
> +			 * take mmap_sem here even if MMF_VM_MERGEABLE
> +			 * is already clear, as the actual mm can't be
> +			 * freed until we return and we run mmdrop
> +			 * too, but it's unnecessary).
> +			 */
> +			if (test_and_clear_bit(MMF_VM_MERGEABLE, &mm->flags)) {
> +				hlist_del(&mm_slot->link);
> +				list_del(&mm_slot->mm_list);
> +				easy_to_free = 1;
> +			} else
> +				mm_slot = NULL;
>  		} else {
>  			list_move(&mm_slot->mm_list,
>  				  &ksm_scan.mm_slot->mm_list);
> @@ -1550,7 +1589,6 @@ void __ksm_exit(struct mm_struct *mm)
>  
>  	if (easy_to_free) {
>  		free_mm_slot(mm_slot);
> -		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>  		mmdrop(mm);
>  	} else if (mm_slot) {
>  		down_write(&mm->mmap_sem);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] ksm: fix race between ksmd and exiting task
  2011-06-02 20:12           ` Andrea Righi
@ 2011-06-02 21:23             ` Chris Wright
  -1 siblings, 0 replies; 48+ messages in thread
From: Chris Wright @ 2011-06-02 21:23 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Chris Wright, CAI Qian, Hugh Dickins, Andrea Arcangeli,
	Rik van Riel, Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki,
	linux-mm, linux-kernel, Andrew Morton

* Andrea Righi (andrea@betterlinux.com) wrote:
> On Thu, Jun 02, 2011 at 10:35:49AM -0700, Chris Wright wrote:
> > Andrea Righi reported a case where an exiting task can race against
> > ksmd.
> > 
> > ksm_scan.mm_slot == the only registered mm
> > CPU 1 (bug program)		CPU 2 (ksmd)
> >  				list_empty() is false
> > lock
> > ksm_scan.mm_slot
> > list_del
> > unlock
> >  				slot == &ksm_mm_head (but list is now empty_)
> > 
> > Close this race by revalidating that the new slot is not simply the list
> > head again.
> 
> I confirm this fixes the problem on my side.
> 
> Tested-by: Andrea Righi <andrea@betterlinux.com>

Great, thanks for verifying.

thanks,
-chris

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] ksm: fix race between ksmd and exiting task
@ 2011-06-02 21:23             ` Chris Wright
  0 siblings, 0 replies; 48+ messages in thread
From: Chris Wright @ 2011-06-02 21:23 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Chris Wright, CAI Qian, Hugh Dickins, Andrea Arcangeli,
	Rik van Riel, Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki,
	linux-mm, linux-kernel, Andrew Morton

* Andrea Righi (andrea@betterlinux.com) wrote:
> On Thu, Jun 02, 2011 at 10:35:49AM -0700, Chris Wright wrote:
> > Andrea Righi reported a case where an exiting task can race against
> > ksmd.
> > 
> > ksm_scan.mm_slot == the only registered mm
> > CPU 1 (bug program)		CPU 2 (ksmd)
> >  				list_empty() is false
> > lock
> > ksm_scan.mm_slot
> > list_del
> > unlock
> >  				slot == &ksm_mm_head (but list is now empty_)
> > 
> > Close this race by revalidating that the new slot is not simply the list
> > head again.
> 
> I confirm this fixes the problem on my side.
> 
> Tested-by: Andrea Righi <andrea@betterlinux.com>

Great, thanks for verifying.

thanks,
-chris

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-02 20:15             ` Andrea Righi
@ 2011-06-02 21:35               ` Andrea Arcangeli
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrea Arcangeli @ 2011-06-02 21:35 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Chris Wright, CAI Qian, Hugh Dickins, Rik van Riel, Mel Gorman,
	Izik Eidus, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	Andrew Morton

On Thu, Jun 02, 2011 at 10:15:01PM +0200, Andrea Righi wrote:
> I just tested this patch, but it doesn't seem to fix the problem, at
> least not the one I reported. The same bug happens again.

Yes I probably found another (not reproducible) bug that would lead to
a similar error, I'm waiting some other opinion on it.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-02 21:35               ` Andrea Arcangeli
  0 siblings, 0 replies; 48+ messages in thread
From: Andrea Arcangeli @ 2011-06-02 21:35 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Chris Wright, CAI Qian, Hugh Dickins, Rik van Riel, Mel Gorman,
	Izik Eidus, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	Andrew Morton

On Thu, Jun 02, 2011 at 10:15:01PM +0200, Andrea Righi wrote:
> I just tested this patch, but it doesn't seem to fix the problem, at
> least not the one I reported. The same bug happens again.

Yes I probably found another (not reproducible) bug that would lead to
a similar error, I'm waiting some other opinion on it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-02 14:19     ` Andrea Righi
@ 2011-06-03  4:42       ` CAI Qian
  -1 siblings, 0 replies; 48+ messages in thread
From: CAI Qian @ 2011-06-03  4:42 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Hugh Dickins, Andrea Arcangeli, Rik van Riel, Chris Wright,
	Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, Andrew Morton



----- Original Message -----
> On Thu, Jun 02, 2011 at 03:09:53AM -0400, CAI Qian wrote:
> > Hello,
> >
> > ----- Original Message -----
> > > I've just experienced this bug with ksmd:
> > >
> > > [ 55.837551] BUG: unable to handle kernel NULL pointer dereference
> > > at
> > > 00000000000000e8
> > > [ 55.837598] IP: [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
> > > [ 55.837630] PGD 0
> > > [ 55.837643] Oops: 0000 [#1] SMP
> > > [ 55.837663] CPU 2
> > > [ 55.837674] Modules linked in: snd_hda_codec_hdmi
> > > snd_hda_codec_conexant rtl8192ce rtl8192c_common rtlwifi mac80211
> > > usbhid hid cfg80211 snd_hda_intel snd_hda_codec psmouse snd_pcm
> > > e1000e
> > > thinkpad_acpi snd_timer snd_page_alloc snd soundcore nvram
> > > [ 55.837816]
> > > [ 55.837824] Pid: 33, comm: ksmd Not tainted 3.0.0-rc1+ #289
> > > LENOVO
> > > 4286CTO/4286CTO
> > > [ 55.837850] RIP: 0010:[<ffffffff810bb9b2>] [<ffffffff810bb9b2>]
> > > __lock_acquire+0x62/0x1d70
> > > [ 55.837878] RSP: 0018:ffff88023d3abc50 EFLAGS: 00010046
> > > [ 55.837894] RAX: 0000000000000046 RBX: 00000000000000e8 RCX:
> > > 0000000000000001
> > > [ 55.837915] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> > > 00000000000000e8
> > > [ 55.837936] RBP: ffff88023d3abd40 R08: 0000000000000002 R09:
> > > 0000000000000000
> > > [ 55.837957] R10: 0000000000000001 R11: 0000000000000000 R12:
> > > ffff88023d3a3e00
> > > [ 55.837978] R13: 0000000000000000 R14: 0000000000000002 R15:
> > > 0000000000000000
> > > [ 55.837999] FS: 0000000000000000(0000) GS:ffff88023e280000(0000)
> > > knlGS:0000000000000000
> > > [ 55.838022] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > [ 55.838039] CR2: 00000000000000e8 CR3: 00000000016f5000 CR4:
> > > 00000000000406e0
> > > [ 55.838060] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > 0000000000000000
> > > [ 55.838081] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > > 0000000000000400
> > > [ 55.838102] Process ksmd (pid: 33, threadinfo ffff88023d3aa000,
> > > task
> > > ffff88023d3a3e00)
> > > [ 55.838131] Stack:
> > > [ 55.838140] ffff88023d3abce0 0000000000000000 ffffffff81d46810
> > > 00000000000012c7
> > > [ 55.838168] 000000000000037c ffff88023d3a3e00 0000000000000001
> > > 0000000000000000
> > > [ 55.838338] 0000000000000000 0000000000000000 00000000001ba37c
> > > ffffffff81a22000
> > > [ 55.838365] Call Trace:
> > > [ 55.838375] [<ffffffff810be55f>] ? mark_held_locks+0x6f/0xa0
> > > [ 55.838394] [<ffffffff814e3360>] ?
> > > _raw_spin_unlock_irqrestore+0x40/0x70
> > > [ 55.838416] [<ffffffff810bdc90>] lock_acquire+0x90/0x110
> > > [ 55.838434] [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
> > > [ 55.838453] [<ffffffff8112df6c>] ? free_percpu+0x9c/0x130
> > > [ 55.838470] [<ffffffff814e1cbc>] down_read+0x4c/0x70
> > > [ 55.838486] [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
> > > [ 55.838505] [<ffffffff814e33bb>] ? _raw_spin_unlock+0x2b/0x40
> > > [ 55.838523] [<ffffffff8114c652>] ksm_scan_thread+0x132/0xe20
> > > [ 55.838541] [<ffffffff814df822>] ? schedule+0x3b2/0x960
> > > [ 55.838559] [<ffffffff810a5690>] ? wake_up_bit+0x40/0x40
> > > [ 55.838576] [<ffffffff8114c520>] ? run_store+0x310/0x310
> > > [ 55.838593] [<ffffffff810a5186>] kthread+0x96/0xa0
> > > [ 55.838609] [<ffffffff814e5014>] kernel_thread_helper+0x4/0x10
> > > [ 55.838628] [<ffffffff814e3700>] ? retint_restore_args+0xe/0xe
> > > [ 55.838647] [<ffffffff810a50f0>] ?
> > > __init_kthread_worker+0x70/0x70
> > > [ 55.838666] [<ffffffff814e5010>] ? gs_change+0xb/0xb
> > > [ 55.838681] Code: b7 00 00 48 89 fb 85 c0 41 89 f5 45 0f 45 f0 8b
> > > 05
> > > 84 de 68 00 85 c0 0f 84 7b 09 00 00 8b 05 7a 49 7a 00 85 c0 0f 84
> > > c6
> > > 01 00 00
> > > [ 55.838780] 8b 03 ba 01 00 00 00 48 3d e0 3c 8c 81 44 0f 44 f2 41
> > > 83
> > > fd
> > > [ 55.838830] RIP [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
> > > [ 55.838850] RSP <ffff88023d3abc50>
> > > [ 55.839567] CR2: 00000000000000e8
> > > [ 55.895721] ---[ end trace eea0fa5dfa6846f1 ]---
> > >
> > > The bug can be easily reproduced using the following testcase:
> > >
> > > ========================
> > > #include <stdio.h>
> > > #include <stdlib.h>
> > > #include <unistd.h>
> > > #include <sys/mman.h>
> > >
> > > #define BUFSIZE getpagesize()
> > >
> > > int main(int argc, char **argv)
> > > {
> > > void *ptr;
> > >
> > > if (posix_memalign(&ptr, getpagesize(), BUFSIZE) < 0) {
> > > perror("posix_memalign");
> > > exit(1);
> > > }
> > > if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
> > > perror("madvise");
> > > exit(1);
> > > }
> > > *(char *)NULL = 0;
> > Hmm, the reproducer gave something else here but no panic.
> > $ strace ./test
> > execve("./test", ["./test"], [/* 26 vars */]) = 0
> > brk(0) = 0x220f000
> > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> > -1, 0) = 0x7fd18ec0a000
> > access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or
> > directory)
> > open("/etc/ld.so.cache", O_RDONLY) = 3
> > fstat(3, {st_mode=S_IFREG|0644, st_size=41227, ...}) = 0
> > mmap(NULL, 41227, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd18ebff000
> > close(3) = 0
> > open("/lib64/libc.so.6", O_RDONLY) = 3
> > read(3,
> > "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\355\341n<\0\0\0"...,
> > 832) = 832
> > fstat(3, {st_mode=S_IFREG|0755, st_size=1912928, ...}) = 0
> > mmap(0x3c6ee00000, 3737768, PROT_READ|PROT_EXEC,
> > MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3c6ee00000
> > mprotect(0x3c6ef87000, 2097152, PROT_NONE) = 0
> > mmap(0x3c6f187000, 20480, PROT_READ|PROT_WRITE,
> > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x187000) = 0x3c6f187000
> > mmap(0x3c6f18c000, 18600, PROT_READ|PROT_WRITE,
> > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3c6f18c000
> > close(3) = 0
> > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> > -1, 0) = 0x7fd18ebfe000
> > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> > -1, 0) = 0x7fd18ebfd000
> > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> > -1, 0) = 0x7fd18ebfc000
> > arch_prctl(ARCH_SET_FS, 0x7fd18ebfd700) = 0
> > mprotect(0x3c6f187000, 16384, PROT_READ) = 0
> > mprotect(0x3c6e81f000, 4096, PROT_READ) = 0
> > munmap(0x7fd18ebff000, 41227) = 0
> > brk(0) = 0x220f000
> > brk(0x2232000) = 0x2232000
> > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> > +++ killed by SIGSEGV (core dumped) +++
> > Segmentation fault (core dumped)
> >
> > Did I miss anything?
> 
> mmh.. I can reproduce the bug also with the standard ubuntu (11.04)
> kernel. Could you post your .config?
http://people.redhat.com/qcai/config-3rc1
> Thanks,
> -Andrea

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-03  4:42       ` CAI Qian
  0 siblings, 0 replies; 48+ messages in thread
From: CAI Qian @ 2011-06-03  4:42 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Hugh Dickins, Andrea Arcangeli, Rik van Riel, Chris Wright,
	Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, Andrew Morton



----- Original Message -----
> On Thu, Jun 02, 2011 at 03:09:53AM -0400, CAI Qian wrote:
> > Hello,
> >
> > ----- Original Message -----
> > > I've just experienced this bug with ksmd:
> > >
> > > [ 55.837551] BUG: unable to handle kernel NULL pointer dereference
> > > at
> > > 00000000000000e8
> > > [ 55.837598] IP: [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
> > > [ 55.837630] PGD 0
> > > [ 55.837643] Oops: 0000 [#1] SMP
> > > [ 55.837663] CPU 2
> > > [ 55.837674] Modules linked in: snd_hda_codec_hdmi
> > > snd_hda_codec_conexant rtl8192ce rtl8192c_common rtlwifi mac80211
> > > usbhid hid cfg80211 snd_hda_intel snd_hda_codec psmouse snd_pcm
> > > e1000e
> > > thinkpad_acpi snd_timer snd_page_alloc snd soundcore nvram
> > > [ 55.837816]
> > > [ 55.837824] Pid: 33, comm: ksmd Not tainted 3.0.0-rc1+ #289
> > > LENOVO
> > > 4286CTO/4286CTO
> > > [ 55.837850] RIP: 0010:[<ffffffff810bb9b2>] [<ffffffff810bb9b2>]
> > > __lock_acquire+0x62/0x1d70
> > > [ 55.837878] RSP: 0018:ffff88023d3abc50 EFLAGS: 00010046
> > > [ 55.837894] RAX: 0000000000000046 RBX: 00000000000000e8 RCX:
> > > 0000000000000001
> > > [ 55.837915] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> > > 00000000000000e8
> > > [ 55.837936] RBP: ffff88023d3abd40 R08: 0000000000000002 R09:
> > > 0000000000000000
> > > [ 55.837957] R10: 0000000000000001 R11: 0000000000000000 R12:
> > > ffff88023d3a3e00
> > > [ 55.837978] R13: 0000000000000000 R14: 0000000000000002 R15:
> > > 0000000000000000
> > > [ 55.837999] FS: 0000000000000000(0000) GS:ffff88023e280000(0000)
> > > knlGS:0000000000000000
> > > [ 55.838022] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > [ 55.838039] CR2: 00000000000000e8 CR3: 00000000016f5000 CR4:
> > > 00000000000406e0
> > > [ 55.838060] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > 0000000000000000
> > > [ 55.838081] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > > 0000000000000400
> > > [ 55.838102] Process ksmd (pid: 33, threadinfo ffff88023d3aa000,
> > > task
> > > ffff88023d3a3e00)
> > > [ 55.838131] Stack:
> > > [ 55.838140] ffff88023d3abce0 0000000000000000 ffffffff81d46810
> > > 00000000000012c7
> > > [ 55.838168] 000000000000037c ffff88023d3a3e00 0000000000000001
> > > 0000000000000000
> > > [ 55.838338] 0000000000000000 0000000000000000 00000000001ba37c
> > > ffffffff81a22000
> > > [ 55.838365] Call Trace:
> > > [ 55.838375] [<ffffffff810be55f>] ? mark_held_locks+0x6f/0xa0
> > > [ 55.838394] [<ffffffff814e3360>] ?
> > > _raw_spin_unlock_irqrestore+0x40/0x70
> > > [ 55.838416] [<ffffffff810bdc90>] lock_acquire+0x90/0x110
> > > [ 55.838434] [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
> > > [ 55.838453] [<ffffffff8112df6c>] ? free_percpu+0x9c/0x130
> > > [ 55.838470] [<ffffffff814e1cbc>] down_read+0x4c/0x70
> > > [ 55.838486] [<ffffffff8114c652>] ? ksm_scan_thread+0x132/0xe20
> > > [ 55.838505] [<ffffffff814e33bb>] ? _raw_spin_unlock+0x2b/0x40
> > > [ 55.838523] [<ffffffff8114c652>] ksm_scan_thread+0x132/0xe20
> > > [ 55.838541] [<ffffffff814df822>] ? schedule+0x3b2/0x960
> > > [ 55.838559] [<ffffffff810a5690>] ? wake_up_bit+0x40/0x40
> > > [ 55.838576] [<ffffffff8114c520>] ? run_store+0x310/0x310
> > > [ 55.838593] [<ffffffff810a5186>] kthread+0x96/0xa0
> > > [ 55.838609] [<ffffffff814e5014>] kernel_thread_helper+0x4/0x10
> > > [ 55.838628] [<ffffffff814e3700>] ? retint_restore_args+0xe/0xe
> > > [ 55.838647] [<ffffffff810a50f0>] ?
> > > __init_kthread_worker+0x70/0x70
> > > [ 55.838666] [<ffffffff814e5010>] ? gs_change+0xb/0xb
> > > [ 55.838681] Code: b7 00 00 48 89 fb 85 c0 41 89 f5 45 0f 45 f0 8b
> > > 05
> > > 84 de 68 00 85 c0 0f 84 7b 09 00 00 8b 05 7a 49 7a 00 85 c0 0f 84
> > > c6
> > > 01 00 00
> > > [ 55.838780] 8b 03 ba 01 00 00 00 48 3d e0 3c 8c 81 44 0f 44 f2 41
> > > 83
> > > fd
> > > [ 55.838830] RIP [<ffffffff810bb9b2>] __lock_acquire+0x62/0x1d70
> > > [ 55.838850] RSP <ffff88023d3abc50>
> > > [ 55.839567] CR2: 00000000000000e8
> > > [ 55.895721] ---[ end trace eea0fa5dfa6846f1 ]---
> > >
> > > The bug can be easily reproduced using the following testcase:
> > >
> > > ========================
> > > #include <stdio.h>
> > > #include <stdlib.h>
> > > #include <unistd.h>
> > > #include <sys/mman.h>
> > >
> > > #define BUFSIZE getpagesize()
> > >
> > > int main(int argc, char **argv)
> > > {
> > > void *ptr;
> > >
> > > if (posix_memalign(&ptr, getpagesize(), BUFSIZE) < 0) {
> > > perror("posix_memalign");
> > > exit(1);
> > > }
> > > if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
> > > perror("madvise");
> > > exit(1);
> > > }
> > > *(char *)NULL = 0;
> > Hmm, the reproducer gave something else here but no panic.
> > $ strace ./test
> > execve("./test", ["./test"], [/* 26 vars */]) = 0
> > brk(0) = 0x220f000
> > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> > -1, 0) = 0x7fd18ec0a000
> > access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or
> > directory)
> > open("/etc/ld.so.cache", O_RDONLY) = 3
> > fstat(3, {st_mode=S_IFREG|0644, st_size=41227, ...}) = 0
> > mmap(NULL, 41227, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd18ebff000
> > close(3) = 0
> > open("/lib64/libc.so.6", O_RDONLY) = 3
> > read(3,
> > "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\355\341n<\0\0\0"...,
> > 832) = 832
> > fstat(3, {st_mode=S_IFREG|0755, st_size=1912928, ...}) = 0
> > mmap(0x3c6ee00000, 3737768, PROT_READ|PROT_EXEC,
> > MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3c6ee00000
> > mprotect(0x3c6ef87000, 2097152, PROT_NONE) = 0
> > mmap(0x3c6f187000, 20480, PROT_READ|PROT_WRITE,
> > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x187000) = 0x3c6f187000
> > mmap(0x3c6f18c000, 18600, PROT_READ|PROT_WRITE,
> > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3c6f18c000
> > close(3) = 0
> > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> > -1, 0) = 0x7fd18ebfe000
> > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> > -1, 0) = 0x7fd18ebfd000
> > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> > -1, 0) = 0x7fd18ebfc000
> > arch_prctl(ARCH_SET_FS, 0x7fd18ebfd700) = 0
> > mprotect(0x3c6f187000, 16384, PROT_READ) = 0
> > mprotect(0x3c6e81f000, 4096, PROT_READ) = 0
> > munmap(0x7fd18ebff000, 41227) = 0
> > brk(0) = 0x220f000
> > brk(0x2232000) = 0x2232000
> > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> > +++ killed by SIGSEGV (core dumped) +++
> > Segmentation fault (core dumped)
> >
> > Did I miss anything?
> 
> mmh.. I can reproduce the bug also with the standard ubuntu (11.04)
> kernel. Could you post your .config?
http://people.redhat.com/qcai/config-3rc1
> Thanks,
> -Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-02 14:31     ` Chris Wright
@ 2011-06-03  4:44       ` CAI Qian
  -1 siblings, 0 replies; 48+ messages in thread
From: CAI Qian @ 2011-06-03  4:44 UTC (permalink / raw)
  To: Chris Wright
  Cc: Andrea Righi, Hugh Dickins, Andrea Arcangeli, Rik van Riel,
	Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, Andrew Morton



----- Original Message -----
> * CAI Qian (caiqian@redhat.com) wrote:
> > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> 
> Right, that's just what the program is trying to do, segfault.
> 
> > +++ killed by SIGSEGV (core dumped) +++
> > Segmentation fault (core dumped)
> >
> > Did I miss anything?
> 
> I found it works but not 100% of the time.
> 
> So I just run the bug in a loop.
Still no luck here.
# while :; do ./test ; done
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
...

I can't really see what different with the hardware
here. It is only a NUMA server system.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-03  4:44       ` CAI Qian
  0 siblings, 0 replies; 48+ messages in thread
From: CAI Qian @ 2011-06-03  4:44 UTC (permalink / raw)
  To: Chris Wright
  Cc: Andrea Righi, Hugh Dickins, Andrea Arcangeli, Rik van Riel,
	Mel Gorman, Izik Eidus, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel, Andrew Morton



----- Original Message -----
> * CAI Qian (caiqian@redhat.com) wrote:
> > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> 
> Right, that's just what the program is trying to do, segfault.
> 
> > +++ killed by SIGSEGV (core dumped) +++
> > Segmentation fault (core dumped)
> >
> > Did I miss anything?
> 
> I found it works but not 100% of the time.
> 
> So I just run the bug in a loop.
Still no luck here.
# while :; do ./test ; done
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
...

I can't really see what different with the hardware
here. It is only a NUMA server system.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-02 14:36       ` Andrea Arcangeli
@ 2011-06-03  4:50         ` CAI Qian
  -1 siblings, 0 replies; 48+ messages in thread
From: CAI Qian @ 2011-06-03  4:50 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrea Righi, Hugh Dickins, Rik van Riel, Mel Gorman, Izik Eidus,
	KAMEZAWA Hiroyuki, linux-mm, linux-kernel, Andrew Morton,
	Chris Wright



----- Original Message -----
> On Thu, Jun 02, 2011 at 07:31:43AM -0700, Chris Wright wrote:
> > * CAI Qian (caiqian@redhat.com) wrote:
> > > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> >
> > Right, that's just what the program is trying to do, segfault.
> >
> > > +++ killed by SIGSEGV (core dumped) +++
> > > Segmentation fault (core dumped)
> > >
> > > Did I miss anything?
> >
> > I found it works but not 100% of the time.
> >
> > So I just run the bug in a loop.
> 
> echo 0 >scan_millisecs helps.
Thanks. Indeed.

NULL pointer dereference at 0000000000000060
IP: [<ffffffff814d2659>] down_read+0x19/0x30
PGD 0 
Oops: 0002 [#1] SMP 
CPU 0 
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 dm_mirror dm_region_hash dm_log cdc_ether usbnet mii microcode serio_raw pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support sg shpchp ioatdma dca i7core_edac edac_core bnx2 ext4 mbcache jbd2 sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas dm_mod [last unloaded: scsi_wait_scan]

Pid: 103, comm: ksmd Not tainted 3.0.0-rc1+ #6 IBM System x3550 M3 -[7944I21]-/69Y4438     
RIP: 0010:[<ffffffff814d2659>]  [<ffffffff814d2659>] down_read+0x19/0x30
RSP: 0018:ffff880271307e00  EFLAGS: 00010246
RAX: 0000000000000060 RBX: 0000000000000060 RCX: 00000000000000db
RDX: 0000000000000000 RSI: 0000000000000282 RDI: 0000000000000060
RBP: ffff880271307e10 R08: 00000000000000df R09: ffff88026fcff400
R10: 0001b690ffffff90 R11: 0001b690ffffff90 R12: ffff880271307ea8
R13: 0000000000000050 R14: 0000000000000000 R15: ffffffff81a54960
FS:  0000000000000000(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000060 CR3: 0000000001a03000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksmd (pid: 103, threadinfo ffff880271306000, task ffff8802712fc0c0)
Stack:
 ffff880271307e10 ffff8802712fc0c0 ffff880271307e60 ffffffff81146e09
 ffff8802712fc0c0 0000000000000060 ffff880271307e80 ffff8802712fc0c0
 ffff880271307e80 ffff8802712fc0c0 ffff8802712fc0c0 0000000000000063
Call Trace:
 [<ffffffff81146e09>] scan_get_next_rmap_item+0x59/0x400
 [<ffffffff811476ce>] ksm_scan_thread+0xfe/0x2c0
 [<ffffffff81084d50>] ? wake_up_bit+0x40/0x40
 [<ffffffff811475d0>] ? cmp_and_merge_page+0x420/0x420
 [<ffffffff810846d6>] kthread+0x96/0xa0
 [<ffffffff814dc544>] kernel_thread_helper+0x4/0x10
 [<ffffffff81084640>] ? kthread_worker_fn+0x1a0/0x1a0
 [<ffffffff814dc540>] ? gs_change+0x13/0x13
Code: 9e cd d6 ff 48 83 c4 08 5b c9 c3 0f 1f 80 00 00 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 89 fb e8 0a ee ff ff 48 89 d8 <f0> 48 ff 00 79 05 e8 3c cd d6 ff 48 83 c4 08 5b c9 c3 00 00 00 
RIP  [<ffffffff814d2659>] down_read+0x19/0x30
 RSP <ffff880271307e00>
CR2: 0000000000000060
---[ end trace a6feafc139ba5f85 ]---

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-03  4:50         ` CAI Qian
  0 siblings, 0 replies; 48+ messages in thread
From: CAI Qian @ 2011-06-03  4:50 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrea Righi, Hugh Dickins, Rik van Riel, Mel Gorman, Izik Eidus,
	KAMEZAWA Hiroyuki, linux-mm, linux-kernel, Andrew Morton,
	Chris Wright



----- Original Message -----
> On Thu, Jun 02, 2011 at 07:31:43AM -0700, Chris Wright wrote:
> > * CAI Qian (caiqian@redhat.com) wrote:
> > > madvise(0x2210000, 4096, 0xc /* MADV_??? */) = 0
> > > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> >
> > Right, that's just what the program is trying to do, segfault.
> >
> > > +++ killed by SIGSEGV (core dumped) +++
> > > Segmentation fault (core dumped)
> > >
> > > Did I miss anything?
> >
> > I found it works but not 100% of the time.
> >
> > So I just run the bug in a loop.
> 
> echo 0 >scan_millisecs helps.
Thanks. Indeed.

NULL pointer dereference at 0000000000000060
IP: [<ffffffff814d2659>] down_read+0x19/0x30
PGD 0 
Oops: 0002 [#1] SMP 
CPU 0 
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 dm_mirror dm_region_hash dm_log cdc_ether usbnet mii microcode serio_raw pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support sg shpchp ioatdma dca i7core_edac edac_core bnx2 ext4 mbcache jbd2 sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas dm_mod [last unloaded: scsi_wait_scan]

Pid: 103, comm: ksmd Not tainted 3.0.0-rc1+ #6 IBM System x3550 M3 -[7944I21]-/69Y4438     
RIP: 0010:[<ffffffff814d2659>]  [<ffffffff814d2659>] down_read+0x19/0x30
RSP: 0018:ffff880271307e00  EFLAGS: 00010246
RAX: 0000000000000060 RBX: 0000000000000060 RCX: 00000000000000db
RDX: 0000000000000000 RSI: 0000000000000282 RDI: 0000000000000060
RBP: ffff880271307e10 R08: 00000000000000df R09: ffff88026fcff400
R10: 0001b690ffffff90 R11: 0001b690ffffff90 R12: ffff880271307ea8
R13: 0000000000000050 R14: 0000000000000000 R15: ffffffff81a54960
FS:  0000000000000000(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000060 CR3: 0000000001a03000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksmd (pid: 103, threadinfo ffff880271306000, task ffff8802712fc0c0)
Stack:
 ffff880271307e10 ffff8802712fc0c0 ffff880271307e60 ffffffff81146e09
 ffff8802712fc0c0 0000000000000060 ffff880271307e80 ffff8802712fc0c0
 ffff880271307e80 ffff8802712fc0c0 ffff8802712fc0c0 0000000000000063
Call Trace:
 [<ffffffff81146e09>] scan_get_next_rmap_item+0x59/0x400
 [<ffffffff811476ce>] ksm_scan_thread+0xfe/0x2c0
 [<ffffffff81084d50>] ? wake_up_bit+0x40/0x40
 [<ffffffff811475d0>] ? cmp_and_merge_page+0x420/0x420
 [<ffffffff810846d6>] kthread+0x96/0xa0
 [<ffffffff814dc544>] kernel_thread_helper+0x4/0x10
 [<ffffffff81084640>] ? kthread_worker_fn+0x1a0/0x1a0
 [<ffffffff814dc540>] ? gs_change+0x13/0x13
Code: 9e cd d6 ff 48 83 c4 08 5b c9 c3 0f 1f 80 00 00 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 89 fb e8 0a ee ff ff 48 89 d8 <f0> 48 ff 00 79 05 e8 3c cd d6 ff 48 83 c4 08 5b c9 c3 00 00 00 
RIP  [<ffffffff814d2659>] down_read+0x19/0x30
 RSP <ffff880271307e00>
CR2: 0000000000000060
---[ end trace a6feafc139ba5f85 ]---

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] ksm: fix race between ksmd and exiting task
  2011-06-02 17:35         ` Chris Wright
@ 2011-06-03 16:37           ` Hugh Dickins
  -1 siblings, 0 replies; 48+ messages in thread
From: Hugh Dickins @ 2011-06-03 16:37 UTC (permalink / raw)
  To: Chris Wright
  Cc: Andrea Righi, CAI Qian, Andrea Arcangeli, Rik van Riel,
	Mel Gorman, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	Andrew Morton

On Thu, 2 Jun 2011, Chris Wright wrote:

> Andrea Righi reported a case where an exiting task can race against
> ksmd.
> 
> ksm_scan.mm_slot == the only registered mm
> CPU 1 (bug program)		CPU 2 (ksmd)
>  				list_empty() is false
> lock
> ksm_scan.mm_slot
> list_del
> unlock
>  				slot == &ksm_mm_head (but list is now empty_)
> 
> Close this race by revalidating that the new slot is not simply the list
> head again.

Remarkably similar to my patch: it must be good!
But yours appears to be more popular - thanks, Chris.

> 
> Reported-by: Andrea Righi <andrea@betterlinux.com>
> Cc: Hugh Dickins <hughd@google.com>

Acked-by: Hugh Dickins <hughd@google.com>

> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Chris Wright <chrisw@sous-sol.org>

Cc: stable@kernel.org

> ---
>  mm/ksm.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 942dfc7..0373ce4 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1301,6 +1301,9 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
>  		slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
>  		ksm_scan.mm_slot = slot;
>  		spin_unlock(&ksm_mmlist_lock);
> +		/* We raced against exit of last slot on the list */
> +		if (slot == &ksm_mm_head)
> +			return NULL;
>  next_mm:
>  		ksm_scan.address = 0;
>  		ksm_scan.rmap_list = &slot->rmap_list;

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH] ksm: fix race between ksmd and exiting task
@ 2011-06-03 16:37           ` Hugh Dickins
  0 siblings, 0 replies; 48+ messages in thread
From: Hugh Dickins @ 2011-06-03 16:37 UTC (permalink / raw)
  To: Chris Wright
  Cc: Andrea Righi, CAI Qian, Andrea Arcangeli, Rik van Riel,
	Mel Gorman, KAMEZAWA Hiroyuki, linux-mm, linux-kernel,
	Andrew Morton

On Thu, 2 Jun 2011, Chris Wright wrote:

> Andrea Righi reported a case where an exiting task can race against
> ksmd.
> 
> ksm_scan.mm_slot == the only registered mm
> CPU 1 (bug program)		CPU 2 (ksmd)
>  				list_empty() is false
> lock
> ksm_scan.mm_slot
> list_del
> unlock
>  				slot == &ksm_mm_head (but list is now empty_)
> 
> Close this race by revalidating that the new slot is not simply the list
> head again.

Remarkably similar to my patch: it must be good!
But yours appears to be more popular - thanks, Chris.

> 
> Reported-by: Andrea Righi <andrea@betterlinux.com>
> Cc: Hugh Dickins <hughd@google.com>

Acked-by: Hugh Dickins <hughd@google.com>

> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Chris Wright <chrisw@sous-sol.org>

Cc: stable@kernel.org

> ---
>  mm/ksm.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 942dfc7..0373ce4 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1301,6 +1301,9 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
>  		slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
>  		ksm_scan.mm_slot = slot;
>  		spin_unlock(&ksm_mmlist_lock);
> +		/* We raced against exit of last slot on the list */
> +		if (slot == &ksm_mm_head)
> +			return NULL;
>  next_mm:
>  		ksm_scan.address = 0;
>  		ksm_scan.rmap_list = &slot->rmap_list;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-02 17:43           ` Andrea Arcangeli
@ 2011-06-03 17:06             ` Hugh Dickins
  -1 siblings, 0 replies; 48+ messages in thread
From: Hugh Dickins @ 2011-06-03 17:06 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Chris Wright, Andrea Righi, CAI Qian, Rik van Riel, Mel Gorman,
	KAMEZAWA Hiroyuki, linux-mm, linux-kernel, Andrew Morton

On Thu, 2 Jun 2011, Andrea Arcangeli wrote:
> On Thu, Jun 02, 2011 at 10:29:39AM -0700, Hugh Dickins wrote:
> > AndreaA, I didn't study the patch you posted half an hour ago,
> > since by that time I'd worked it out and was preparing patch below.
> > I think your patch would be for a different bug, hopefully one we
> > don't have, it looks more complicated than we should need for this.
> 
> I didn't expect two different bugs leading to double free.

There wasn't a double free there, just failure to cope with race
emptying the list, so accessing head when expecting a full entry.

> 
> If you've time please review my other patch too because mmput runs
> with no mmap_sem hold and I think the ksm scan code runs under the
> assumption that __ksm_exit is waiting in down_write() when
> ksm_mmlist_lock is released (before freeing the mm_slot), and that
> assumption is wrong. ksm_test_exit may very well be true despite
> __ksm_exit didn't run yet, and ksm scan will proceed freeing after
> changing the mm_slot and ksm_exit will be free to run and free again
> immediately after the ksm scan releases the ksm_mmlist_lock and before
> it clears the MMF_VM_MERGEABLE (because the mm_slot has been changed
> before releasing the ksm_mmlist_lock).
> 
> The rmap_list being null will kind of hide it, the fact there's so
> little time between the unlock of the ksm_mmlist_lock and the clearing
> of MMF_VM_MERGEABLE (that will stop ksm_exit from calling __ksm_exit
> at all) will also hide it. At least in
> unmerge_and_remove_all_rmap_items remove_trailing_rmap_items will nuke
> the rmap_list just before this race runs so making it more likely
> possible.

You'll see from the "beware" comment in scan_get_next_rmap_item()
that this case is expected, that it sometimes reaches freeing the
slots before the exiting task reaches __ksm_exit().

That race should already be handled.  I believe your patch is unnecessary,
because get_mm_slot() is a hashlist lookup, and will return NULL once
either end has done the hlist_del(&mm_slot->link).

Hugh

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-03 17:06             ` Hugh Dickins
  0 siblings, 0 replies; 48+ messages in thread
From: Hugh Dickins @ 2011-06-03 17:06 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Chris Wright, Andrea Righi, CAI Qian, Rik van Riel, Mel Gorman,
	KAMEZAWA Hiroyuki, linux-mm, linux-kernel, Andrew Morton

On Thu, 2 Jun 2011, Andrea Arcangeli wrote:
> On Thu, Jun 02, 2011 at 10:29:39AM -0700, Hugh Dickins wrote:
> > AndreaA, I didn't study the patch you posted half an hour ago,
> > since by that time I'd worked it out and was preparing patch below.
> > I think your patch would be for a different bug, hopefully one we
> > don't have, it looks more complicated than we should need for this.
> 
> I didn't expect two different bugs leading to double free.

There wasn't a double free there, just failure to cope with race
emptying the list, so accessing head when expecting a full entry.

> 
> If you've time please review my other patch too because mmput runs
> with no mmap_sem hold and I think the ksm scan code runs under the
> assumption that __ksm_exit is waiting in down_write() when
> ksm_mmlist_lock is released (before freeing the mm_slot), and that
> assumption is wrong. ksm_test_exit may very well be true despite
> __ksm_exit didn't run yet, and ksm scan will proceed freeing after
> changing the mm_slot and ksm_exit will be free to run and free again
> immediately after the ksm scan releases the ksm_mmlist_lock and before
> it clears the MMF_VM_MERGEABLE (because the mm_slot has been changed
> before releasing the ksm_mmlist_lock).
> 
> The rmap_list being null will kind of hide it, the fact there's so
> little time between the unlock of the ksm_mmlist_lock and the clearing
> of MMF_VM_MERGEABLE (that will stop ksm_exit from calling __ksm_exit
> at all) will also hide it. At least in
> unmerge_and_remove_all_rmap_items remove_trailing_rmap_items will nuke
> the rmap_list just before this race runs so making it more likely
> possible.

You'll see from the "beware" comment in scan_get_next_rmap_item()
that this case is expected, that it sometimes reaches freeing the
slots before the exiting task reaches __ksm_exit().

That race should already be handled.  I believe your patch is unnecessary,
because get_mm_slot() is a hashlist lookup, and will return NULL once
either end has done the hlist_del(&mm_slot->link).

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
  2011-06-03 17:06             ` Hugh Dickins
@ 2011-06-03 18:13               ` Andrea Arcangeli
  -1 siblings, 0 replies; 48+ messages in thread
From: Andrea Arcangeli @ 2011-06-03 18:13 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Chris Wright, Andrea Righi, CAI Qian, Rik van Riel, Mel Gorman,
	KAMEZAWA Hiroyuki, linux-mm, linux-kernel, Andrew Morton

On Fri, Jun 03, 2011 at 10:06:14AM -0700, Hugh Dickins wrote:
> On Thu, 2 Jun 2011, Andrea Arcangeli wrote:
> > On Thu, Jun 02, 2011 at 10:29:39AM -0700, Hugh Dickins wrote:
> > > AndreaA, I didn't study the patch you posted half an hour ago,
> > > since by that time I'd worked it out and was preparing patch below.
> > > I think your patch would be for a different bug, hopefully one we
> > > don't have, it looks more complicated than we should need for this.
> > 
> > I didn't expect two different bugs leading to double free.
> 
> There wasn't a double free there, just failure to cope with race
> emptying the list, so accessing head when expecting a full entry.

Yes, we thought it was a double free initially because of two dead
pointers but we couldn't explain why mm was null so consistently.

> You'll see from the "beware" comment in scan_get_next_rmap_item()
> that this case is expected, that it sometimes reaches freeing the
> slots before the exiting task reaches __ksm_exit().
> 
> That race should already be handled.  I believe your patch is unnecessary,
> because get_mm_slot() is a hashlist lookup, and will return NULL once
> either end has done the hlist_del(&mm_slot->link).

Ok so that case is handled by get_mm_slot not succeeding. I see thanks
for the review.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan()
@ 2011-06-03 18:13               ` Andrea Arcangeli
  0 siblings, 0 replies; 48+ messages in thread
From: Andrea Arcangeli @ 2011-06-03 18:13 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Chris Wright, Andrea Righi, CAI Qian, Rik van Riel, Mel Gorman,
	KAMEZAWA Hiroyuki, linux-mm, linux-kernel, Andrew Morton

On Fri, Jun 03, 2011 at 10:06:14AM -0700, Hugh Dickins wrote:
> On Thu, 2 Jun 2011, Andrea Arcangeli wrote:
> > On Thu, Jun 02, 2011 at 10:29:39AM -0700, Hugh Dickins wrote:
> > > AndreaA, I didn't study the patch you posted half an hour ago,
> > > since by that time I'd worked it out and was preparing patch below.
> > > I think your patch would be for a different bug, hopefully one we
> > > don't have, it looks more complicated than we should need for this.
> > 
> > I didn't expect two different bugs leading to double free.
> 
> There wasn't a double free there, just failure to cope with race
> emptying the list, so accessing head when expecting a full entry.

Yes, we thought it was a double free initially because of two dead
pointers but we couldn't explain why mm was null so consistently.

> You'll see from the "beware" comment in scan_get_next_rmap_item()
> that this case is expected, that it sometimes reaches freeing the
> slots before the exiting task reaches __ksm_exit().
> 
> That race should already be handled.  I believe your patch is unnecessary,
> because get_mm_slot() is a hashlist lookup, and will return NULL once
> either end has done the hlist_del(&mm_slot->link).

Ok so that case is handled by get_mm_slot not succeeding. I see thanks
for the review.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH] ksm: fix NULL pointer dereference in scan_get_next_rmap_item
  2011-06-03 16:37           ` Hugh Dickins
@ 2011-06-04  0:54             ` Chris Wright
  -1 siblings, 0 replies; 48+ messages in thread
From: Chris Wright @ 2011-06-04  0:54 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton
  Cc: Chris Wright, Andrea Righi, CAI Qian, Andrea Arcangeli,
	Rik van Riel, Mel Gorman, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel

* Hugh Dickins (hughd@google.com) wrote:
> On Thu, 2 Jun 2011, Chris Wright wrote:
> > Close this race by revalidating that the new slot is not simply the list
> > head again.
> 
> Remarkably similar to my patch: it must be good!

Indeed ;)

> But yours appears to be more popular - thanks, Chris.

But I like the comment in yours better...

Andrew, here's a refresh w/ the acks/tested-bys/etc + AndreaR's test case
in the changelog.

thanks,
-chris
--

Subject: [PATCH] ksm: fix NULL pointer dereference in scan_get_next_rmap_item

From: Hugh Dickins <hughd@google.com>

Andrea Righi reported a case where an exiting task can race against
ksmd::scan_get_next_rmap_item (http://lkml.org/lkml/2011/6/1/742)
easily triggering a NULL pointer dereference in ksmd.

ksm_scan.mm_slot == &ksm_mm_head with only one registered mm

CPU 1 (__ksm_exit)		CPU 2 (scan_get_next_rmap_item)
 				list_empty() is false
lock				slot == &ksm_mm_head
list_del(slot->mm_list)		
(list now empty)
unlock
				lock
				slot = list_entry(slot->mm_list.next)
				(list is empty, so slot is still ksm_mm_head)
				unlock
				slot->mm == NULL ... Oops

Close this race by revalidating that the new slot is not simply the list
head again.

Andrea's test case:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

#define BUFSIZE getpagesize()

int main(int argc, char **argv)
{
	void *ptr;

	if (posix_memalign(&ptr, getpagesize(), BUFSIZE) < 0) {
		perror("posix_memalign");
		exit(1);
	}
	if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
		perror("madvise");
		exit(1);
	}
	*(char *)NULL = 0;

	return 0;
}

Reported-by: Andrea Righi <andrea@betterlinux.com>
Tested-by: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 mm/ksm.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index d708b3e..9a68b0c 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1302,6 +1302,12 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 		slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
 		ksm_scan.mm_slot = slot;
 		spin_unlock(&ksm_mmlist_lock);
+		/*
+		 * Although we tested list_empty() above, a racing __ksm_exit
+		 * of the last mm on the list may have removed it since then.
+		 */
+		if (slot == &ksm_mm_head)
+			return NULL;
 next_mm:
 		ksm_scan.address = 0;
 		ksm_scan.rmap_list = &slot->rmap_list;

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH] ksm: fix NULL pointer dereference in scan_get_next_rmap_item
@ 2011-06-04  0:54             ` Chris Wright
  0 siblings, 0 replies; 48+ messages in thread
From: Chris Wright @ 2011-06-04  0:54 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton
  Cc: Chris Wright, Andrea Righi, CAI Qian, Andrea Arcangeli,
	Rik van Riel, Mel Gorman, KAMEZAWA Hiroyuki, linux-mm,
	linux-kernel

* Hugh Dickins (hughd@google.com) wrote:
> On Thu, 2 Jun 2011, Chris Wright wrote:
> > Close this race by revalidating that the new slot is not simply the list
> > head again.
> 
> Remarkably similar to my patch: it must be good!

Indeed ;)

> But yours appears to be more popular - thanks, Chris.

But I like the comment in yours better...

Andrew, here's a refresh w/ the acks/tested-bys/etc + AndreaR's test case
in the changelog.

thanks,
-chris
--

Subject: [PATCH] ksm: fix NULL pointer dereference in scan_get_next_rmap_item

From: Hugh Dickins <hughd@google.com>

Andrea Righi reported a case where an exiting task can race against
ksmd::scan_get_next_rmap_item (http://lkml.org/lkml/2011/6/1/742)
easily triggering a NULL pointer dereference in ksmd.

ksm_scan.mm_slot == &ksm_mm_head with only one registered mm

CPU 1 (__ksm_exit)		CPU 2 (scan_get_next_rmap_item)
 				list_empty() is false
lock				slot == &ksm_mm_head
list_del(slot->mm_list)		
(list now empty)
unlock
				lock
				slot = list_entry(slot->mm_list.next)
				(list is empty, so slot is still ksm_mm_head)
				unlock
				slot->mm == NULL ... Oops

Close this race by revalidating that the new slot is not simply the list
head again.

Andrea's test case:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

#define BUFSIZE getpagesize()

int main(int argc, char **argv)
{
	void *ptr;

	if (posix_memalign(&ptr, getpagesize(), BUFSIZE) < 0) {
		perror("posix_memalign");
		exit(1);
	}
	if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
		perror("madvise");
		exit(1);
	}
	*(char *)NULL = 0;

	return 0;
}

Reported-by: Andrea Righi <andrea@betterlinux.com>
Tested-by: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 mm/ksm.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index d708b3e..9a68b0c 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1302,6 +1302,12 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 		slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
 		ksm_scan.mm_slot = slot;
 		spin_unlock(&ksm_mmlist_lock);
+		/*
+		 * Although we tested list_empty() above, a racing __ksm_exit
+		 * of the last mm on the list may have removed it since then.
+		 */
+		if (slot == &ksm_mm_head)
+			return NULL;
 next_mm:
 		ksm_scan.address = 0;
 		ksm_scan.rmap_list = &slot->rmap_list;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2011-06-04  1:03 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-01 22:20 [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan() Andrea Righi
2011-06-01 22:20 ` Andrea Righi
2011-06-02  1:53 ` Chris Wright
2011-06-02  1:53   ` Chris Wright
2011-06-02  7:09 ` CAI Qian
2011-06-02  7:09   ` CAI Qian
2011-06-02 14:19   ` Andrea Righi
2011-06-02 14:19     ` Andrea Righi
2011-06-02 16:48     ` Chris Wright
2011-06-02 16:48       ` Chris Wright
2011-06-02 17:29       ` Hugh Dickins
2011-06-02 17:29         ` Hugh Dickins
2011-06-02 17:43         ` Andrea Arcangeli
2011-06-02 17:43           ` Andrea Arcangeli
2011-06-03 17:06           ` Hugh Dickins
2011-06-03 17:06             ` Hugh Dickins
2011-06-03 18:13             ` Andrea Arcangeli
2011-06-03 18:13               ` Andrea Arcangeli
2011-06-02 17:35       ` [PATCH] ksm: fix race between ksmd and exiting task Chris Wright
2011-06-02 17:35         ` Chris Wright
2011-06-02 20:12         ` Andrea Righi
2011-06-02 20:12           ` Andrea Righi
2011-06-02 21:23           ` Chris Wright
2011-06-02 21:23             ` Chris Wright
2011-06-03 16:37         ` Hugh Dickins
2011-06-03 16:37           ` Hugh Dickins
2011-06-04  0:54           ` [PATCH] ksm: fix NULL pointer dereference in scan_get_next_rmap_item Chris Wright
2011-06-04  0:54             ` Chris Wright
2011-06-02 20:10       ` [BUG 3.0.0-rc1] ksm: NULL pointer dereference in ksm_do_scan() Andrea Righi
2011-06-02 20:10         ` Andrea Righi
2011-06-03  4:42     ` CAI Qian
2011-06-03  4:42       ` CAI Qian
2011-06-02 14:31   ` Chris Wright
2011-06-02 14:31     ` Chris Wright
2011-06-02 14:36     ` Andrea Arcangeli
2011-06-02 14:36       ` Andrea Arcangeli
2011-06-02 15:36       ` Chris Wright
2011-06-02 15:36         ` Chris Wright
2011-06-02 16:44         ` Andrea Arcangeli
2011-06-02 16:44           ` Andrea Arcangeli
2011-06-02 20:15           ` Andrea Righi
2011-06-02 20:15             ` Andrea Righi
2011-06-02 21:35             ` Andrea Arcangeli
2011-06-02 21:35               ` Andrea Arcangeli
2011-06-03  4:50       ` CAI Qian
2011-06-03  4:50         ` CAI Qian
2011-06-03  4:44     ` CAI Qian
2011-06-03  4:44       ` CAI Qian

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.