linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2 x86] fix some page faults in nmi if kmemcheck is enabled
@ 2012-02-20  6:01 Li Zhong
  2012-02-20  6:04 ` [PATCH 1/2 x86] fix page faults by nmi handler " Li Zhong
  2012-02-20 11:00 ` [PATCH 0/2 x86] fix some page faults " Peter Zijlstra
  0 siblings, 2 replies; 22+ messages in thread
From: Li Zhong @ 2012-02-20  6:01 UTC (permalink / raw)
  To: LKML; +Cc: tglx, mingo, hpa, x86, a.p.zijlstra, paulus, mingo, acme

If CONFIG_KMEMCHECK is enabled, there might be page faults in nmi if the
pages are marked as not present by kmemcheck, like following:

[    4.535803] WARNING: at arch/x86/mm/kmemcheck/kmemcheck.c:634 kmemcheck_fault+0xb9/0xd0()
[    4.633429] Hardware name: System x3650 M3 -[7945AC1]-
[    4.694710] Modules linked in:
[    4.731105] Pid: 1, comm: swapper/0 Not tainted 3.3.0-rc3 #15
[    4.799654] Call Trace:
[    4.828751]  <NMI>  [<ffffffff81042eca>] warn_slowpath_common+0x7a/0xb0
[    4.907713]  [<ffffffff81042f15>] warn_slowpath_null+0x15/0x20
[    4.977301]  [<ffffffff8103ce89>] kmemcheck_fault+0xb9/0xd0
[    5.043778]  [<ffffffff81551ba6>] do_page_fault+0x406/0x550
[    5.110252]  [<ffffffff8154e235>] page_fault+0x25/0x30
[    5.171535]  [<ffffffff8154f005>] ? nmi_handle.clone.1+0x75/0xc0
[    5.243202]  [<ffffffff8154efcf>] ? nmi_handle.clone.1+0x3f/0xc0
[    5.314867]  [<ffffffff8154ef90>] ? __die+0xf0/0xf0
[    5.373038]  [<ffffffff8154f15f>] do_nmi+0x10f/0x360
[    5.432243]  [<ffffffff8154e5cd>] restart_nmi+0x1a/0x1e
[    5.494565]  [<ffffffff8154e210>] ? general_protection+0x30/0x30
[    5.566234]  [<ffffffff8154e210>] ? general_protection+0x30/0x30
[    5.637898]  [<ffffffff8154e210>] ? general_protection+0x30/0x30
[    5.709566]  <<EOE>>  [<ffffffff8126d814>] ? rb_insert_color+0xa4/0x150
[    5.788526]  [<ffffffff8119d17b>] sysfs_link_sibling+0x8b/0x110
[    5.859155]  [<ffffffff8119dff1>] __sysfs_add_one+0xc1/0x100
[    5.926666]  [<ffffffff8119e056>] sysfs_add_one+0x26/0xd0
[    5.991065]  [<ffffffff8119cdf4>] sysfs_add_file_mode+0xc4/0x100
[    6.062731]  [<ffffffff8119fc41>] internal_create_group+0xc1/0x1a0
[    6.136473]  [<ffffffff8119fd4e>] sysfs_create_group+0xe/0x10
[    6.205026]  [<ffffffff81351c1a>] dpm_sysfs_add+0x2a/0xd0
[    6.269425]  [<ffffffff81349bf5>] device_add+0x5e5/0x730
[    6.332783]  [<ffffffff81349d59>] device_register+0x19/0x20
[    6.399260]  [<ffffffff8135b6b8>] add_memory_section+0x158/0x1e0
[    6.470927]  [<ffffffff81ca757e>] memory_dev_init+0x75/0x108
[    6.538439]  [<ffffffff81ca73a9>] driver_init+0x31/0x33
[    6.600762]  [<ffffffff81c72c68>] kernel_init+0xcc/0x169
[    6.664121]  [<ffffffff81555e64>] kernel_thread_helper+0x4/0x10
[    6.734749]  [<ffffffff81c72b9c>] ? start_kernel+0x3ab/0x3ab
[    6.802261]  [<ffffffff81555e60>] ? gs_change+0x13/0x13
[    6.864585] ---[ end trace a7919e7f17c0a725 ]---

These two patches tries to fix some of the problems by avoiding using the
non-present pages.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/2 x86] fix page faults by nmi handler in nmi if kmemcheck is enabled
  2012-02-20  6:01 [PATCH 0/2 x86] fix some page faults in nmi if kmemcheck is enabled Li Zhong
@ 2012-02-20  6:04 ` Li Zhong
  2012-02-20  6:07   ` [PATCH 2/2 x86] fix page faults by perf events " Li Zhong
  2012-02-20 11:00 ` [PATCH 0/2 x86] fix some page faults " Peter Zijlstra
  1 sibling, 1 reply; 22+ messages in thread
From: Li Zhong @ 2012-02-20  6:04 UTC (permalink / raw)
  To: LKML; +Cc: tglx, mingo, hpa, x86, a.p.zijlstra, paulus, mingo, acme

This patch tries to
  change the nmi handler name from a pointer to an array. 
  use __get_free_pages instead of kzalloc if CONFIG_KMEMCHECK is
enabled.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/x86/kernel/nmi.c |   23 +++++++++++++++--------
 1 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 47acaf3..84aa03a 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -36,7 +36,7 @@ struct nmiaction {
 	struct list_head list;
 	nmi_handler_t handler;
 	unsigned int flags;
-	char *name;
+	char name[NMI_MAX_NAMELEN];
 };
 
 struct nmi_desc {
@@ -169,16 +169,18 @@ int register_nmi_handler(unsigned int type,
nmi_handler_t handler,
 	if (!handler)
 		return -EINVAL;
 
+#ifdef CONFIG_KMEMCHECK
+	action = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+				get_order(sizeof(*action)));
+#else
 	action = kzalloc(sizeof(struct nmiaction), GFP_KERNEL);
+#endif
 	if (!action)
 		goto fail_action;
 
 	action->handler = handler;
 	action->flags = nmiflags;
-	action->name = kstrndup(devname, NMI_MAX_NAMELEN, GFP_KERNEL);
-	if (!action->name)
-		goto fail_action_name;
-
+	strncpy(action->name, devname, sizeof(action->name));
 	retval = __setup_nmi(type, action);
 
 	if (retval)
@@ -187,9 +189,11 @@ int register_nmi_handler(unsigned int type,
nmi_handler_t handler,
 	return retval;
 
 fail_setup_nmi:
-	kfree(action->name);
-fail_action_name:
+#ifdef CONFIG_KMEMCHECK
+	free_pages((unsigned long)action, get_order(sizeof(*action)));
+#else
 	kfree(action);
+#endif
 fail_action:	
 
 	return retval;
@@ -202,8 +206,11 @@ void unregister_nmi_handler(unsigned int type,
const char *name)
 
 	a = __free_nmi(type, name);
 	if (a) {
-		kfree(a->name);
+#ifdef CONFIG_KMEMCHECK
+		free_pages((unsigned long)a, get_order(sizeof(*a)));
+#else
 		kfree(a);
+#endif
 	}
 }
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/2 x86] fix page faults by perf events in nmi if kmemcheck is enabled
  2012-02-20  6:04 ` [PATCH 1/2 x86] fix page faults by nmi handler " Li Zhong
@ 2012-02-20  6:07   ` Li Zhong
  0 siblings, 0 replies; 22+ messages in thread
From: Li Zhong @ 2012-02-20  6:07 UTC (permalink / raw)
  To: LKML; +Cc: tglx, mingo, hpa, x86, a.p.zijlstra, paulus, mingo, acme

This patch tries to use __get_free_pages for the perf_event and it's
callchain buffers if CONFIG_KMEMCHECK is enabled. 

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 kernel/events/callchain.c |   29 +++++++++++++++++++++++++++++
 kernel/events/core.c      |   13 +++++++++++++
 2 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 6581a04..57f5eaa 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -38,13 +38,25 @@ static void release_callchain_buffers_rcu(struct
rcu_head *head)
 {
 	struct callchain_cpus_entries *entries;
 	int cpu;
+#ifdef CONFIG_KMEMCHECK
+	int size;
+
+	size = sizeof(struct perf_callchain_entry) * PERF_NR_CONTEXTS;
+#endif
 
 	entries = container_of(head, struct callchain_cpus_entries, rcu_head);
 
 	for_each_possible_cpu(cpu)
+#ifdef CONFIG_KMEMCHECK
+		free_pages((unsigned long)entries->cpu_entries[cpu],
+			get_order(size));
+	size = offsetof(struct callchain_cpus_entries,
cpu_entries[nr_cpu_ids]);
+	free_pages((unsigned long)entries, get_order(size));
+#else
 		kfree(entries->cpu_entries[cpu]);
 
 	kfree(entries);
+#endif
 }
 
 static void release_callchain_buffers(void)
@@ -69,15 +81,25 @@ static int alloc_callchain_buffers(void)
 	 */
 	size = offsetof(struct callchain_cpus_entries,
cpu_entries[nr_cpu_ids]);
 
+#ifdef CONFIG_KMEMCHECK
+	entries = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+					get_order(size));
+#else
 	entries = kzalloc(size, GFP_KERNEL);
+#endif
 	if (!entries)
 		return -ENOMEM;
 
 	size = sizeof(struct perf_callchain_entry) * PERF_NR_CONTEXTS;
 
 	for_each_possible_cpu(cpu) {
+#ifdef CONFIG_KMEMCHECK
+		entries->cpu_entries[cpu] = (void *)__get_free_pages(
+				GFP_KERNEL, get_order(size));
+#else
 		entries->cpu_entries[cpu] = kmalloc_node(size, GFP_KERNEL,
 							 cpu_to_node(cpu));
+#endif
 		if (!entries->cpu_entries[cpu])
 			goto fail;
 	}
@@ -88,8 +110,15 @@ static int alloc_callchain_buffers(void)
 
 fail:
 	for_each_possible_cpu(cpu)
+#ifdef CONFIG_KMEMCHECK
+		free_pages((unsigned long)entries->cpu_entries[cpu],
+				get_order(size));
+	size = offsetof(struct callchain_cpus_entries,
cpu_entries[nr_cpu_ids]);
+	free_pages((unsigned long)entries, get_order(size));
+#else
 		kfree(entries->cpu_entries[cpu]);
 	kfree(entries);
+#endif
 
 	return -ENOMEM;
 }
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ba36013..841c2ab 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2758,7 +2758,11 @@ static void free_event_rcu(struct rcu_head *head)
 	if (event->ns)
 		put_pid_ns(event->ns);
 	perf_event_free_filter(event);
+#ifdef CONFIG_KMEMCHECK
+	free_pages((unsigned long)event, get_order(sizeof(*event)));
+#else
 	kfree(event);
+#endif
 }
 
 static void ring_buffer_put(struct ring_buffer *rb);
@@ -5723,7 +5727,12 @@ perf_event_alloc(struct perf_event_attr *attr,
int cpu,
 			return ERR_PTR(-EINVAL);
 	}
 
+#ifdef CONFIG_KMEMCHECK
+	event = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+				get_order(sizeof(*event)));
+#else
 	event = kzalloc(sizeof(*event), GFP_KERNEL);
+#endif
 	if (!event)
 		return ERR_PTR(-ENOMEM);
 
@@ -5810,7 +5819,11 @@ done:
 	if (err) {
 		if (event->ns)
 			put_pid_ns(event->ns);
+#ifdef CONFIG_KMEMCHECK
+		free_pages((unsigned long)event, get_order(sizeof(*event)));
+#else
 		kfree(event);
+#endif
 		return ERR_PTR(err);
 	}
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2 x86] fix some page faults in nmi if kmemcheck is enabled
  2012-02-20  6:01 [PATCH 0/2 x86] fix some page faults in nmi if kmemcheck is enabled Li Zhong
  2012-02-20  6:04 ` [PATCH 1/2 x86] fix page faults by nmi handler " Li Zhong
@ 2012-02-20 11:00 ` Peter Zijlstra
  2012-02-21  1:42   ` Li Zhong
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2012-02-20 11:00 UTC (permalink / raw)
  To: Li Zhong; +Cc: LKML, tglx, mingo, hpa, x86, paulus, mingo, acme

On Mon, 2012-02-20 at 14:01 +0800, Li Zhong wrote:
> If CONFIG_KMEMCHECK is enabled, there might be page faults in nmi if the
> pages are marked as not present by kmemcheck, like following:
> 
> [    4.535803] WARNING: at arch/x86/mm/kmemcheck/kmemcheck.c:634 kmemcheck_fault+0xb9/0xd0()
> [    4.633429] Hardware name: System x3650 M3 -[7945AC1]-
> [    4.694710] Modules linked in:
> [    4.731105] Pid: 1, comm: swapper/0 Not tainted 3.3.0-rc3 #15
> [    4.799654] Call Trace:
> [    4.828751]  <NMI>  [<ffffffff81042eca>] warn_slowpath_common+0x7a/0xb0
> [    4.907713]  [<ffffffff81042f15>] warn_slowpath_null+0x15/0x20
> [    4.977301]  [<ffffffff8103ce89>] kmemcheck_fault+0xb9/0xd0
> [    5.043778]  [<ffffffff81551ba6>] do_page_fault+0x406/0x550
> [    5.110252]  [<ffffffff8154e235>] page_fault+0x25/0x30
> [    5.171535]  [<ffffffff8154f005>] ? nmi_handle.clone.1+0x75/0xc0
> [    5.243202]  [<ffffffff8154efcf>] ? nmi_handle.clone.1+0x3f/0xc0
> [    5.314867]  [<ffffffff8154ef90>] ? __die+0xf0/0xf0
> [    5.373038]  [<ffffffff8154f15f>] do_nmi+0x10f/0x360
> [    5.432243]  [<ffffffff8154e5cd>] restart_nmi+0x1a/0x1e
> [    5.494565]  [<ffffffff8154e210>] ? general_protection+0x30/0x30
> [    5.566234]  [<ffffffff8154e210>] ? general_protection+0x30/0x30
> [    5.637898]  [<ffffffff8154e210>] ? general_protection+0x30/0x30
> [    5.709566]  <<EOE>>  [<ffffffff8126d814>] ? rb_insert_color+0xa4/0x150
> [    5.788526]  [<ffffffff8119d17b>] sysfs_link_sibling+0x8b/0x110
> [    5.859155]  [<ffffffff8119dff1>] __sysfs_add_one+0xc1/0x100
> [    5.926666]  [<ffffffff8119e056>] sysfs_add_one+0x26/0xd0
> [    5.991065]  [<ffffffff8119cdf4>] sysfs_add_file_mode+0xc4/0x100
> [    6.062731]  [<ffffffff8119fc41>] internal_create_group+0xc1/0x1a0
> [    6.136473]  [<ffffffff8119fd4e>] sysfs_create_group+0xe/0x10
> [    6.205026]  [<ffffffff81351c1a>] dpm_sysfs_add+0x2a/0xd0
> [    6.269425]  [<ffffffff81349bf5>] device_add+0x5e5/0x730
> [    6.332783]  [<ffffffff81349d59>] device_register+0x19/0x20
> [    6.399260]  [<ffffffff8135b6b8>] add_memory_section+0x158/0x1e0
> [    6.470927]  [<ffffffff81ca757e>] memory_dev_init+0x75/0x108
> [    6.538439]  [<ffffffff81ca73a9>] driver_init+0x31/0x33
> [    6.600762]  [<ffffffff81c72c68>] kernel_init+0xcc/0x169
> [    6.664121]  [<ffffffff81555e64>] kernel_thread_helper+0x4/0x10
> [    6.734749]  [<ffffffff81c72b9c>] ? start_kernel+0x3ab/0x3ab
> [    6.802261]  [<ffffffff81555e60>] ? gs_change+0x13/0x13
> [    6.864585] ---[ end trace a7919e7f17c0a725 ]---
> 
> These two patches tries to fix some of the problems by avoiding using the
> non-present pages.


Hell no, these are some of the ugliest patches I've seen in a while. Not
to mention that their changelogs are utter crap since they don't even
explain why they're doing what they're doing.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2 x86] fix some page faults in nmi if kmemcheck is enabled
  2012-02-20 11:00 ` [PATCH 0/2 x86] fix some page faults " Peter Zijlstra
@ 2012-02-21  1:42   ` Li Zhong
  2012-02-21 10:17     ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Li Zhong @ 2012-02-21  1:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, tglx, mingo, hpa, x86, paulus, mingo, acme

On Mon, 2012-02-20 at 12:00 +0100, Peter Zijlstra wrote:
> On Mon, 2012-02-20 at 14:01 +0800, Li Zhong wrote:
> > If CONFIG_KMEMCHECK is enabled, there might be page faults in nmi if the
> > pages are marked as not present by kmemcheck, like following:
> > 
> > [    4.535803] WARNING: at arch/x86/mm/kmemcheck/kmemcheck.c:634 kmemcheck_fault+0xb9/0xd0()
> > [    4.633429] Hardware name: System x3650 M3 -[7945AC1]-
> > [    4.694710] Modules linked in:
> > [    4.731105] Pid: 1, comm: swapper/0 Not tainted 3.3.0-rc3 #15
> > [    4.799654] Call Trace:
> > [    4.828751]  <NMI>  [<ffffffff81042eca>] warn_slowpath_common+0x7a/0xb0
> > [    4.907713]  [<ffffffff81042f15>] warn_slowpath_null+0x15/0x20
> > [    4.977301]  [<ffffffff8103ce89>] kmemcheck_fault+0xb9/0xd0
> > [    5.043778]  [<ffffffff81551ba6>] do_page_fault+0x406/0x550
> > [    5.110252]  [<ffffffff8154e235>] page_fault+0x25/0x30
> > [    5.171535]  [<ffffffff8154f005>] ? nmi_handle.clone.1+0x75/0xc0
> > [    5.243202]  [<ffffffff8154efcf>] ? nmi_handle.clone.1+0x3f/0xc0
> > [    5.314867]  [<ffffffff8154ef90>] ? __die+0xf0/0xf0
> > [    5.373038]  [<ffffffff8154f15f>] do_nmi+0x10f/0x360
> > [    5.432243]  [<ffffffff8154e5cd>] restart_nmi+0x1a/0x1e
> > [    5.494565]  [<ffffffff8154e210>] ? general_protection+0x30/0x30
> > [    5.566234]  [<ffffffff8154e210>] ? general_protection+0x30/0x30
> > [    5.637898]  [<ffffffff8154e210>] ? general_protection+0x30/0x30
> > [    5.709566]  <<EOE>>  [<ffffffff8126d814>] ? rb_insert_color+0xa4/0x150
> > [    5.788526]  [<ffffffff8119d17b>] sysfs_link_sibling+0x8b/0x110
> > [    5.859155]  [<ffffffff8119dff1>] __sysfs_add_one+0xc1/0x100
> > [    5.926666]  [<ffffffff8119e056>] sysfs_add_one+0x26/0xd0
> > [    5.991065]  [<ffffffff8119cdf4>] sysfs_add_file_mode+0xc4/0x100
> > [    6.062731]  [<ffffffff8119fc41>] internal_create_group+0xc1/0x1a0
> > [    6.136473]  [<ffffffff8119fd4e>] sysfs_create_group+0xe/0x10
> > [    6.205026]  [<ffffffff81351c1a>] dpm_sysfs_add+0x2a/0xd0
> > [    6.269425]  [<ffffffff81349bf5>] device_add+0x5e5/0x730
> > [    6.332783]  [<ffffffff81349d59>] device_register+0x19/0x20
> > [    6.399260]  [<ffffffff8135b6b8>] add_memory_section+0x158/0x1e0
> > [    6.470927]  [<ffffffff81ca757e>] memory_dev_init+0x75/0x108
> > [    6.538439]  [<ffffffff81ca73a9>] driver_init+0x31/0x33
> > [    6.600762]  [<ffffffff81c72c68>] kernel_init+0xcc/0x169
> > [    6.664121]  [<ffffffff81555e64>] kernel_thread_helper+0x4/0x10
> > [    6.734749]  [<ffffffff81c72b9c>] ? start_kernel+0x3ab/0x3ab
> > [    6.802261]  [<ffffffff81555e60>] ? gs_change+0x13/0x13
> > [    6.864585] ---[ end trace a7919e7f17c0a725 ]---
> > 
> > These two patches tries to fix some of the problems by avoiding using the
> > non-present pages.
> 
> 
> Hell no, these are some of the ugliest patches I've seen in a while. Not
> to mention that their changelogs are utter crap since they don't even
> explain why they're doing what they're doing.
> 
Hi Peter, 

I agree that the fix is ugly. I'm willing to change if there are some
better ways. 

The problem here is: 
1. It seems x86 doesn't allow page faults in nmi, and there are checks
in the code, like WARN_ON_ONCE(in_nmi()).
 
2. If CONFIG_KMEMCHECK is enabled, the pages allocated through slab will
be marked as non-present, to capture uninitialized memory access. More
information in Documentation/kmemcheck.txt .

3. From the log, there are some memories accessed in nmi, which are in
pages marked as non-present by kmemcheck, as they are allocated by
something like kmalloc(). 

Thanks,
Zhong


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2 x86] fix some page faults in nmi if kmemcheck is enabled
  2012-02-21  1:42   ` Li Zhong
@ 2012-02-21 10:17     ` Peter Zijlstra
  2012-02-23  9:53       ` Li Zhong
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2012-02-21 10:17 UTC (permalink / raw)
  To: Li Zhong; +Cc: LKML, tglx, mingo, hpa, x86, paulus, mingo, acme

On Tue, 2012-02-21 at 09:42 +0800, Li Zhong wrote:

> > Hell no, these are some of the ugliest patches I've seen in a while. Not
> > to mention that their changelogs are utter crap since they don't even
> > explain why they're doing what they're doing.
> > 
> Hi Peter, 
> 
> I agree that the fix is ugly. I'm willing to change if there are some
> better ways. 

There are always better ways..

> The problem here is: 
> 1. It seems x86 doesn't allow page faults in nmi, and there are checks
> in the code, like WARN_ON_ONCE(in_nmi()).

I bet that's not x86 only..

> 2. If CONFIG_KMEMCHECK is enabled, the pages allocated through slab will
> be marked as non-present, to capture uninitialized memory access. More
> information in Documentation/kmemcheck.txt .

So then kmemcheck is buggy, since the nmiaction structure is initialized
in register_nmi_handler(), so it should most definitely not be marked
non-present.

> 3. From the log, there are some memories accessed in nmi, which are in
> pages marked as non-present by kmemcheck, as they are allocated by
> something like kmalloc(). 

So figure out why and fix that instead of writing ugly ass patches that
seemingly work around the problem without actually thinking about it.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2 x86] fix some page faults in nmi if kmemcheck is enabled
  2012-02-21 10:17     ` Peter Zijlstra
@ 2012-02-23  9:53       ` Li Zhong
  2012-02-27 10:58         ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Li Zhong @ 2012-02-23  9:53 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, tglx, mingo, hpa, x86, paulus, mingo, acme

On Tue, 2012-02-21 at 11:17 +0100, Peter Zijlstra wrote:
> On Tue, 2012-02-21 at 09:42 +0800, Li Zhong wrote:
> 
> > > Hell no, these are some of the ugliest patches I've seen in a while. Not
> > > to mention that their changelogs are utter crap since they don't even
> > > explain why they're doing what they're doing.
> > > 
> > Hi Peter, 
> > 
> > I agree that the fix is ugly. I'm willing to change if there are some
> > better ways. 
> 
> There are always better ways..

Hi Peter,

I will think further about it, and would appreciate it if you could give
some good ideas. 

> 
> > The problem here is: 
> > 1. It seems x86 doesn't allow page faults in nmi, and there are checks
> > in the code, like WARN_ON_ONCE(in_nmi()).
> 
> I bet that's not x86 only..

Maybe, I found this problem on x86, didn't check other archs. 

However, from Documentation/kmemcheck.txt, seems kmemcheck only supports
x86. 

> 
> > 2. If CONFIG_KMEMCHECK is enabled, the pages allocated through slab will
> > be marked as non-present, to capture uninitialized memory access. More
> > information in Documentation/kmemcheck.txt .
> 
> So then kmemcheck is buggy, since the nmiaction structure is initialized
> in register_nmi_handler(), so it should most definitely not be marked
> non-present.
> 

I'm not sure whether I understand it correctly. Do you mean that
nmiaction is initialized in register_nmi_handler(), which indicates it
will be used in nmi, so it shouldn't be marked non-present?

But for kmemcheck, why need it know the information that page fault is
not allowed in nmi? 

Or maybe I misunderstand your point here? 

> > 3. From the log, there are some memories accessed in nmi, which are in
> > pages marked as non-present by kmemcheck, as they are allocated by
> > something like kmalloc(). 
> 
> So figure out why and fix that instead of writing ugly ass patches that
> seemingly work around the problem without actually thinking about it.
> 

I think the reason is that kmalloc ( or kzalloc ... ) uses malloc_sizes
slab caches to allocate memory. The malloc_sizes slab caches is set up
without SLAB_NOTRACK flag, then kmemcheck marks the pages non-present to
do its check in page fault handling code. I think we shouldn't disable
kmemechek for the general malloc_sizes caches. 

Thanks,
Zhong





^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2 x86] fix some page faults in nmi if kmemcheck is enabled
  2012-02-23  9:53       ` Li Zhong
@ 2012-02-27 10:58         ` Peter Zijlstra
  2012-02-28  2:45           ` Li Zhong
  2012-03-05 10:05           ` [PATCH v2 x86 1/2] fix page faults by nmiaction " Li Zhong
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2012-02-27 10:58 UTC (permalink / raw)
  To: Li Zhong
  Cc: LKML, tglx, mingo, hpa, x86, paulus, mingo, acme, Vegard Nossum,
	Don Zickus

On Thu, 2012-02-23 at 17:53 +0800, Li Zhong wrote:

> I will think further about it, and would appreciate it if you could give
> some good ideas. 

*sigh*.. or you could do your own damn work..

> > > 2. If CONFIG_KMEMCHECK is enabled, the pages allocated through slab will
> > > be marked as non-present, to capture uninitialized memory access. More
> > > information in Documentation/kmemcheck.txt .
> > 
> > So then kmemcheck is buggy, since the nmiaction structure is initialized
> > in register_nmi_handler(), so it should most definitely not be marked
> > non-present.
> > 
> 
> I'm not sure whether I understand it correctly. Do you mean that
> nmiaction is initialized in register_nmi_handler(), which indicates it
> will be used in nmi, so it shouldn't be marked non-present?

No, you said that it marks memory non-present to detect uninitialized
stuff, but since it is initialized, it shouldn't then be non-present,
right?

> But for kmemcheck, why need it know the information that page fault is
> not allowed in nmi? 

Uh, what?

> > > 3. From the log, there are some memories accessed in nmi, which are in
> > > pages marked as non-present by kmemcheck, as they are allocated by
> > > something like kmalloc(). 
> > 
> > So figure out why and fix that instead of writing ugly ass patches that
> > seemingly work around the problem without actually thinking about it.
> > 
> 
> I think the reason is that kmalloc ( or kzalloc ... ) uses malloc_sizes
> slab caches to allocate memory. The malloc_sizes slab caches is set up
> without SLAB_NOTRACK flag, then kmemcheck marks the pages non-present to
> do its check in page fault handling code. I think we shouldn't disable
> kmemechek for the general malloc_sizes caches. 

Nobody said you should.. there's plenty of solutions that aren't ass
backward stupid nor as ugly.

First you need to figure out why the page is marked non-present since
the data structure is initialized (I've got a fair idea why), then look
if you can tell kmemcheck not to be silly like that.

Alternatively you can change the nmi stuff to use static storage like
other notifiers (see notifier_block).

What you don't ever do is write alternative code paths that are never
ever used except for debugging, that is just asking for problems.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2 x86] fix some page faults in nmi if kmemcheck is enabled
  2012-02-27 10:58         ` Peter Zijlstra
@ 2012-02-28  2:45           ` Li Zhong
  2012-03-02 19:44             ` Don Zickus
  2012-03-05 10:05           ` [PATCH v2 x86 1/2] fix page faults by nmiaction " Li Zhong
  1 sibling, 1 reply; 22+ messages in thread
From: Li Zhong @ 2012-02-28  2:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, tglx, mingo, hpa, x86, paulus, mingo, acme, Vegard Nossum,
	Don Zickus

On Mon, 2012-02-27 at 11:58 +0100, Peter Zijlstra wrote:
> On Thu, 2012-02-23 at 17:53 +0800, Li Zhong wrote:
> 
> > I will think further about it, and would appreciate it if you could give
> > some good ideas. 
> 
> *sigh*.. or you could do your own damn work..

I'm still doing the work ...

> 
> > > > 2. If CONFIG_KMEMCHECK is enabled, the pages allocated through slab will
> > > > be marked as non-present, to capture uninitialized memory access. More
> > > > information in Documentation/kmemcheck.txt .
> > > 
> > > So then kmemcheck is buggy, since the nmiaction structure is initialized
> > > in register_nmi_handler(), so it should most definitely not be marked
> > > non-present.
> > > 
> > 
> > I'm not sure whether I understand it correctly. Do you mean that
> > nmiaction is initialized in register_nmi_handler(), which indicates it
> > will be used in nmi, so it shouldn't be marked non-present?
> 
> No, you said that it marks memory non-present to detect uninitialized
> stuff, but since it is initialized, it shouldn't then be non-present,
> right?

>From my understanding of kmemcheck, the checking is based on the
non-present page. So while handling page fault, if the memory hasn't
been written before read, kmemcheck knows that it is uninitialized. 

I think it is used to find code errors, so it need mark all non-present,
to check if there are any access to uninitialized memory. 

> 
> > But for kmemcheck, why need it know the information that page fault is
> > not allowed in nmi? 
> 
> Uh, what?

Please ignore it, as I misunderstood your point previously. 

> > > > 3. From the log, there are some memories accessed in nmi, which are in
> > > > pages marked as non-present by kmemcheck, as they are allocated by
> > > > something like kmalloc(). 
> > > 
> > > So figure out why and fix that instead of writing ugly ass patches that
> > > seemingly work around the problem without actually thinking about it.
> > > 
> > 
> > I think the reason is that kmalloc ( or kzalloc ... ) uses malloc_sizes
> > slab caches to allocate memory. The malloc_sizes slab caches is set up
> > without SLAB_NOTRACK flag, then kmemcheck marks the pages non-present to
> > do its check in page fault handling code. I think we shouldn't disable
> > kmemechek for the general malloc_sizes caches. 
> 
> Nobody said you should.. there's plenty of solutions that aren't ass
> backward stupid nor as ugly.
> 
> First you need to figure out why the page is marked non-present since
> the data structure is initialized (I've got a fair idea why), then look
> if you can tell kmemcheck not to be silly like that.
> 
> Alternatively you can change the nmi stuff to use static storage like
> other notifiers (see notifier_block).

OK, I will try to update the nmi one using this way.

But I think it couldn't be used to the perf stuff. 
For perf, maybe it's good for kmemcheck to have some flag like
__GFP_NO_PAGE_FAULT? Currently it seems only has flags like
__GFP_NOTRACK, which will still mark page non-present. 

> 
> What you don't ever do is write alternative code paths that are never
> ever used except for debugging, that is just asking for problems.
> 

Got it, thanks for the reminder! Previously, I thought the biggest
problem was wasting memory ...

Thanks,
Zhong



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2 x86] fix some page faults in nmi if kmemcheck is enabled
  2012-02-28  2:45           ` Li Zhong
@ 2012-03-02 19:44             ` Don Zickus
  2012-03-05  1:49               ` Li Zhong
  0 siblings, 1 reply; 22+ messages in thread
From: Don Zickus @ 2012-03-02 19:44 UTC (permalink / raw)
  To: Li Zhong
  Cc: Peter Zijlstra, LKML, tglx, mingo, hpa, x86, paulus, mingo, acme,
	Vegard Nossum

On Tue, Feb 28, 2012 at 10:45:41AM +0800, Li Zhong wrote:
> > > I'm not sure whether I understand it correctly. Do you mean that
> > > nmiaction is initialized in register_nmi_handler(), which indicates it
> > > will be used in nmi, so it shouldn't be marked non-present?
> > 
> > No, you said that it marks memory non-present to detect uninitialized
> > stuff, but since it is initialized, it shouldn't then be non-present,
> > right?
> 
> From my understanding of kmemcheck, the checking is based on the
> non-present page. So while handling page fault, if the memory hasn't
> been written before read, kmemcheck knows that it is uninitialized. 
> 
> I think it is used to find code errors, so it need mark all non-present,
> to check if there are any access to uninitialized memory. 

I am not sure if this is what your tool is catching, but someone pointed
out to me privately that when panic'ing in an NMI, either the shutdown
path I modified or the kdump path will register an NMI handler in an NMI
context.  Thus they will try to allocate memory in the NMI context.

So I will have to fix that.  Wonder if that popular lockless memory
allocator can help me there... otherwise I have to go back to pass in
structs like the notifier blocks do.

Cheers,
Don

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2 x86] fix some page faults in nmi if kmemcheck is enabled
  2012-03-02 19:44             ` Don Zickus
@ 2012-03-05  1:49               ` Li Zhong
  0 siblings, 0 replies; 22+ messages in thread
From: Li Zhong @ 2012-03-05  1:49 UTC (permalink / raw)
  To: Don Zickus
  Cc: Peter Zijlstra, LKML, tglx, mingo, hpa, x86, paulus, mingo, acme,
	Vegard Nossum

On Fri, 2012-03-02 at 14:44 -0500, Don Zickus wrote:
> On Tue, Feb 28, 2012 at 10:45:41AM +0800, Li Zhong wrote:
> > > > I'm not sure whether I understand it correctly. Do you mean that
> > > > nmiaction is initialized in register_nmi_handler(), which indicates it
> > > > will be used in nmi, so it shouldn't be marked non-present?
> > > 
> > > No, you said that it marks memory non-present to detect uninitialized
> > > stuff, but since it is initialized, it shouldn't then be non-present,
> > > right?
> > 
> > From my understanding of kmemcheck, the checking is based on the
> > non-present page. So while handling page fault, if the memory hasn't
> > been written before read, kmemcheck knows that it is uninitialized. 
> > 
> > I think it is used to find code errors, so it need mark all non-present,
> > to check if there are any access to uninitialized memory. 
> 
> I am not sure if this is what your tool is catching, but someone pointed
> out to me privately that when panic'ing in an NMI, either the shutdown
> path I modified or the kdump path will register an NMI handler in an NMI
> context.  Thus they will try to allocate memory in the NMI context.
> 

Maybe I didn't describe it clearly. 
This is not what kmemcheck is catching, and normally I think it is fine
to allocate memory that would be accessed in nmi. 

What is not allowed is page fault ( maybe also other exceptions) in nmi.
And if CONFIG_KMEMCHECK is enabled, for example, if kmemcheck needs
check whether any fields in the allocated nmiaction is read before
written (reading uninitialized memory), it marks the pages the nmiaction
belongs to as non-present, so the check could be done in the page fault
handling. 
In this case, as all the fields in nmiaction are assigned values in
register_nmi_handler, so no errors would be reported by kmemcheck. The
problem is there would be page fault when the nmi code calling the nmi
handlers, as it accesses the memory of nmiaction. 

> So I will have to fix that.  Wonder if that popular lockless memory
> allocator can help me there... otherwise I have to go back to pass in
> structs like the notifier blocks do.

I need check what lockless memory allocator is, but guessing from its
name "lockless", seems it couldn't help this problem.

By the way, I have done a draft fix to make the nmiaction using static
storage, will send out for review soon. 

Thanks,
Zhong
 
> 
> Cheers,
> Don
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 x86 1/2]  fix page faults by nmiaction in nmi if kmemcheck is enabled
  2012-02-27 10:58         ` Peter Zijlstra
  2012-02-28  2:45           ` Li Zhong
@ 2012-03-05 10:05           ` Li Zhong
  2012-03-05 10:29             ` Wim Van Sebroeck
                               ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Li Zhong @ 2012-03-05 10:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, tglx, mingo, hpa, x86, paulus, mingo, acme, Vegard Nossum,
	Don Zickus, tony.luck, bp, robert.richter, lenb, minyard, wim,
	linux-edac, oprofile-list, linux-acpi, openipmi-developer,
	linux-watchdog

This patch tries to fix the problem of page fault exception caused by
accessing nmiaction structure in nmi if kmemcheck is enabled. 

If kmemcheck is enabled, the memory allocated through slab are in pages
that are marked non-present, so that some checks could be done in the
page fault handling code ( e.g. whether the memory is read before
written to ). 
As nmiaction is allocated in this way, so it resides in a non-present
page. Then there is a page fault while the nmi code accessing the
nmiaction structure, which would then cause a warning by
WARN_ON_ONCE(in_nmi()) in kmemcheck_fault(), called by do_page_fault().

v2: as Peter suggested, changed the nmiaction to use static storage.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/x86/include/asm/nmi.h              |   10 +++++-
 arch/x86/kernel/apic/hw_nmi.c           |    8 ++++-
 arch/x86/kernel/apic/x2apic_uv_x.c      |    7 ++++-
 arch/x86/kernel/cpu/mcheck/mce-inject.c |    8 ++++-
 arch/x86/kernel/cpu/perf_event.c        |    7 ++++-
 arch/x86/kernel/kgdb.c                  |   11 ++++--
 arch/x86/kernel/nmi.c                   |   49 ++----------------------------
 arch/x86/kernel/nmi_selftest.c          |   16 ++++++++--
 arch/x86/kernel/reboot.c                |    9 ++++-
 arch/x86/kernel/smp.c                   |    9 ++++-
 arch/x86/oprofile/nmi_int.c             |    8 ++++-
 drivers/acpi/apei/ghes.c                |    8 ++++-
 drivers/char/ipmi/ipmi_watchdog.c       |   10 +++++-
 drivers/watchdog/hpwdt.c                |   21 +++++++++++--
 14 files changed, 108 insertions(+), 73 deletions(-)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index fd3f9f1..08d464f 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -35,8 +35,14 @@ enum {
 
 typedef int (*nmi_handler_t)(unsigned int, struct pt_regs *);
 
-int register_nmi_handler(unsigned int, nmi_handler_t, unsigned long,
-			 const char *);
+struct nmiaction {
+	struct list_head list;
+	nmi_handler_t handler;
+	unsigned int flags;
+	const char *name;
+};
+
+int register_nmi_handler(unsigned int, struct nmiaction *);
 
 void unregister_nmi_handler(unsigned int, const char *);
 
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 31cb9ae..9baea77 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -80,10 +80,14 @@ arch_trigger_all_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
 	return NMI_DONE;
 }
 
+static struct nmiaction arch_trigger_all_cpu_bt_nmiaction = {
+	.handler	= arch_trigger_all_cpu_backtrace_handler,
+	.name		= "arch_bt",
+};
+
 static int __init register_trigger_all_cpu_backtrace(void)
 {
-	register_nmi_handler(NMI_LOCAL, arch_trigger_all_cpu_backtrace_handler,
-				0, "arch_bt");
+	register_nmi_handler(NMI_LOCAL, &arch_trigger_all_cpu_bt_nmiaction);
 	return 0;
 }
 early_initcall(register_trigger_all_cpu_backtrace);
diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index 79b05b8..5739042 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -715,9 +715,14 @@ int uv_handle_nmi(unsigned int reason, struct pt_regs *regs)
 	return NMI_HANDLED;
 }
 
+static struct nmiaction uv_nmiaction = {
+	.handler	= uv_handle_nmi,
+	.name		= "uv",
+};
+
 void uv_register_nmi_notifier(void)
 {
-	if (register_nmi_handler(NMI_UNKNOWN, uv_handle_nmi, 0, "uv"))
+	if (register_nmi_handler(NMI_UNKNOWN, &uv_nmiaction))
 		printk(KERN_WARNING "UV NMI handler failed to register\n");
 }
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce-inject.c b/arch/x86/kernel/cpu/mcheck/mce-inject.c
index fc4beb3..f9ab41c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-inject.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-inject.c
@@ -229,14 +229,18 @@ static ssize_t mce_write(struct file *filp, const char __user *ubuf,
 	return usize;
 }
 
+static struct nmiaction mce_nmiaction = {
+	.handler	= mce_raise_notify,
+	.name		= "mce_notify",
+};
+
 static int inject_init(void)
 {
 	if (!alloc_cpumask_var(&mce_inject_cpumask, GFP_KERNEL))
 		return -ENOMEM;
 	printk(KERN_INFO "Machine check injector initialized\n");
 	register_mce_write_callback(mce_write);
-	register_nmi_handler(NMI_LOCAL, mce_raise_notify, 0,
-				"mce_notify");
+	register_nmi_handler(NMI_LOCAL, &mce_nmiaction);
 	return 0;
 }
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 5adce10..8bdff1b 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1246,6 +1246,11 @@ static void __init pmu_check_apic(void)
 	pr_info("no hardware sampling interrupt available.\n");
 }
 
+static struct nmiaction perf_event_nmiaction = {
+	.handler	= perf_event_nmi_handler,
+	.name		= "PMI",
+};
+
 static int __init init_hw_perf_events(void)
 {
 	struct x86_pmu_quirk *quirk;
@@ -1297,7 +1302,7 @@ static int __init init_hw_perf_events(void)
 		((1LL << x86_pmu.num_counters_fixed)-1) << X86_PMC_IDX_FIXED;
 
 	perf_events_lapic_init();
-	register_nmi_handler(NMI_LOCAL, perf_event_nmi_handler, 0, "PMI");
+	register_nmi_handler(NMI_LOCAL, &perf_event_nmiaction);
 
 	unconstrained = (struct event_constraint)
 		__EVENT_CONSTRAINT(0, (1ULL << x86_pmu.num_counters) - 1,
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index faba577..d259d2a 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -601,6 +601,11 @@ static struct notifier_block kgdb_notifier = {
 	.notifier_call	= kgdb_notify,
 };
 
+static struct nmiaction kgdb_nmiaction = {
+	.handler	= kgdb_nmi_handler,
+	.name		= "kgdb",
+};
+
 /**
  *	kgdb_arch_init - Perform any architecture specific initalization.
  *
@@ -615,13 +620,11 @@ int kgdb_arch_init(void)
 	if (retval)
 		goto out;
 
-	retval = register_nmi_handler(NMI_LOCAL, kgdb_nmi_handler,
-					0, "kgdb");
+	retval = register_nmi_handler(NMI_LOCAL, &kgdb_nmiaction);
 	if (retval)
 		goto out1;
 
-	retval = register_nmi_handler(NMI_UNKNOWN, kgdb_nmi_handler,
-					0, "kgdb");
+	retval = register_nmi_handler(NMI_UNKNOWN, &kgdb_nmiaction);
 
 	if (retval)
 		goto out2;
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 47acaf3..d844acc 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -31,14 +31,6 @@
 #include <asm/nmi.h>
 #include <asm/x86_init.h>
 
-#define NMI_MAX_NAMELEN	16
-struct nmiaction {
-	struct list_head list;
-	nmi_handler_t handler;
-	unsigned int flags;
-	char *name;
-};
-
 struct nmi_desc {
 	spinlock_t lock;
 	struct list_head head;
@@ -160,51 +152,18 @@ static struct nmiaction *__free_nmi(unsigned int type, const char *name)
 	return (n);
 }
 
-int register_nmi_handler(unsigned int type, nmi_handler_t handler,
-			unsigned long nmiflags, const char *devname)
+int register_nmi_handler(unsigned int type, struct nmiaction *na)
 {
-	struct nmiaction *action;
-	int retval = -ENOMEM;
-
-	if (!handler)
+	if (!na->handler)
 		return -EINVAL;
 
-	action = kzalloc(sizeof(struct nmiaction), GFP_KERNEL);
-	if (!action)
-		goto fail_action;
-
-	action->handler = handler;
-	action->flags = nmiflags;
-	action->name = kstrndup(devname, NMI_MAX_NAMELEN, GFP_KERNEL);
-	if (!action->name)
-		goto fail_action_name;
-
-	retval = __setup_nmi(type, action);
-
-	if (retval)
-		goto fail_setup_nmi;
-
-	return retval;
-
-fail_setup_nmi:
-	kfree(action->name);
-fail_action_name:
-	kfree(action);
-fail_action:	
-
-	return retval;
+	return __setup_nmi(type, na);
 }
 EXPORT_SYMBOL_GPL(register_nmi_handler);
 
 void unregister_nmi_handler(unsigned int type, const char *name)
 {
-	struct nmiaction *a;
-
-	a = __free_nmi(type, name);
-	if (a) {
-		kfree(a->name);
-		kfree(a);
-	}
+	__free_nmi(type, name);
 }
 
 EXPORT_SYMBOL_GPL(unregister_nmi_handler);
diff --git a/arch/x86/kernel/nmi_selftest.c b/arch/x86/kernel/nmi_selftest.c
index 0d01a8e..40fd637 100644
--- a/arch/x86/kernel/nmi_selftest.c
+++ b/arch/x86/kernel/nmi_selftest.c
@@ -37,10 +37,15 @@ static int nmi_unk_cb(unsigned int val, struct pt_regs *regs)
 	return NMI_HANDLED;
 }
 
+static struct nmiaction selftest_unk_nmiaction = {
+	.handler	= nmi_unk_cb,
+	.name		= "nmi_selftest_unk",
+};
+
 static void init_nmi_testsuite(void)
 {
 	/* trap all the unknown NMIs we may generate */
-	register_nmi_handler(NMI_UNKNOWN, nmi_unk_cb, 0, "nmi_selftest_unk");
+	register_nmi_handler(NMI_UNKNOWN, &selftest_unk_nmiaction);
 }
 
 static void cleanup_nmi_testsuite(void)
@@ -58,12 +63,17 @@ static int test_nmi_ipi_callback(unsigned int val, struct pt_regs *regs)
         return NMI_DONE;
 }
 
+static struct nmiaction selftest_nmiaction = {
+	.handler	= test_nmi_ipi_callback,
+	.flags		= NMI_FLAG_FIRST,
+	.name		= "nmi_selftest",
+};
+
 static void test_nmi_ipi(struct cpumask *mask)
 {
 	unsigned long timeout;
 
-	if (register_nmi_handler(NMI_LOCAL, test_nmi_ipi_callback,
-				 NMI_FLAG_FIRST, "nmi_selftest")) {
+	if (register_nmi_handler(NMI_LOCAL, &selftest_nmiaction)) {
 		nmi_fail = FAILURE;
 		return;
 	}
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index d840e69..a0f8c15 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -799,6 +799,12 @@ static void smp_send_nmi_allbutself(void)
 	apic->send_IPI_allbutself(NMI_VECTOR);
 }
 
+static struct nmiaction crash_nmiaction = {
+	.handler	= crash_nmi_callback,
+	.flags		= NMI_FLAG_FIRST,
+	.name		= "crash",
+};
+
 /* Halt all other CPUs, calling the specified function on each of them
  *
  * This function can be used to halt all other CPUs on crash
@@ -817,8 +823,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 
 	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
 	/* Would it be better to replace the trap vector here? */
-	if (register_nmi_handler(NMI_LOCAL, crash_nmi_callback,
-				 NMI_FLAG_FIRST, "crash"))
+	if (register_nmi_handler(NMI_LOCAL, &crash_nmiaction))
 		return;		/* return what? */
 	/* Ensure the new callback function is set before sending
 	 * out the NMI
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 66c74f4..135d0b2 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -162,6 +162,12 @@ static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
 	return NMI_HANDLED;
 }
 
+static struct nmiaction smp_stop_nmiaction = {
+	.handler	= smp_stop_nmi_callback,
+	.flags		= NMI_FLAG_FIRST,
+	.name		= "smp_stop",
+};
+
 static void native_nmi_stop_other_cpus(int wait)
 {
 	unsigned long flags;
@@ -179,8 +185,7 @@ static void native_nmi_stop_other_cpus(int wait)
 		if (atomic_cmpxchg(&stopping_cpu, -1, safe_smp_processor_id()) != -1)
 			return;
 
-		if (register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
-					 NMI_FLAG_FIRST, "smp_stop"))
+		if (register_nmi_handler(NMI_LOCAL, &smp_stop_nmiaction))
 			/* Note: we ignore failures here */
 			return;
 
diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index 26b8a85..c3408f6 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -453,6 +453,11 @@ static struct notifier_block oprofile_cpu_nb = {
 	.notifier_call = oprofile_cpu_notifier
 };
 
+static struct nmiaction oprofile_nmiaction = {
+	.handler	= profile_exceptions_notify,
+	.name		= "oprofile",
+};
+
 static int nmi_setup(void)
 {
 	int err = 0;
@@ -489,8 +494,7 @@ static int nmi_setup(void)
 	ctr_running = 0;
 	/* make variables visible to the nmi handler: */
 	smp_mb();
-	err = register_nmi_handler(NMI_LOCAL, profile_exceptions_notify,
-					0, "oprofile");
+	err = register_nmi_handler(NMI_LOCAL, &oprofile_nmiaction);
 	if (err)
 		goto fail;
 
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 9b3cac0..1d38f92 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -901,6 +901,11 @@ static unsigned long ghes_esource_prealloc_size(
 	return prealloc_size;
 }
 
+static struct nmiaction ghes_nmiaction = {
+	.handler	= ghes_notify_nmi,
+	.name		= "ghes",
+};
+
 static int __devinit ghes_probe(struct platform_device *ghes_dev)
 {
 	struct acpi_hest_generic *generic;
@@ -975,8 +980,7 @@ static int __devinit ghes_probe(struct platform_device *ghes_dev)
 		ghes_estatus_pool_expand(len);
 		mutex_lock(&ghes_list_mutex);
 		if (list_empty(&ghes_nmi))
-			register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0,
-						"ghes");
+			register_nmi_handler(NMI_LOCAL, &ghes_nmiaction);
 		list_add_rcu(&ghes->list, &ghes_nmi);
 		mutex_unlock(&ghes_list_mutex);
 		break;
diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index 34767a6..29a6e0a 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -1296,6 +1296,13 @@ static int preop_op(const char *inval, char *outval)
 	return 0;
 }
 
+#ifdef HAVE_DIE_NMI
+static struct nmiaction ipmi_nmiaction = {
+	.handler	= ipmi_nmi,
+	.name		= "ipmi",
+};
+#endif
+
 static void check_parms(void)
 {
 #ifdef HAVE_DIE_NMI
@@ -1313,8 +1320,7 @@ static void check_parms(void)
 		}
 	}
 	if (do_nmi && !nmi_handler_registered) {
-		rv = register_nmi_handler(NMI_UNKNOWN, ipmi_nmi, 0,
-						"ipmi");
+		rv = register_nmi_handler(NMI_UNKNOWN, &ipmi_nmiaction);
 		if (rv) {
 			printk(KERN_WARNING PFX
 			       "Can't register nmi handler\n");
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 8464ea1..e575e63 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -689,9 +689,22 @@ static void __devinit dmi_find_icru(const struct dmi_header *dm, void *dummy)
 	}
 }
 
+static struct nmiaction hpwdt_nmiaction[] = {
+	{
+		.handler	= hpwdt_pretimeout,
+		.name		= "hpwdt",
+	},
+	{
+		.handler	= hpwdt_pretimeout,
+		.flags		= NMI_FLAG_FIRST,
+		.name		= "hpwdt",
+	},
+};
+
 static int __devinit hpwdt_init_nmi_decoding(struct pci_dev *dev)
 {
 	int retval;
+	struct nmiaction *na = hpwdt_nmiaction;
 
 	/*
 	 * On typical CRU-based systems we need to map that service in
@@ -733,9 +746,11 @@ static int __devinit hpwdt_init_nmi_decoding(struct pci_dev *dev)
 	 * die notify list to handle a critical NMI. The default is to
 	 * be last so other users of the NMI signal can function.
 	 */
-	retval = register_nmi_handler(NMI_UNKNOWN, hpwdt_pretimeout,
-					(priority) ? NMI_FLAG_FIRST : 0,
-					"hpwdt");
+
+	if (priority)
+		na = &hpwdt_nmiaction[1];
+
+	retval = register_nmi_handler(NMI_UNKNOWN, na);
 	if (retval != 0) {
 		dev_warn(&dev->dev,
 			"Unable to register a die notifier (err=%d).\n",
-- 
1.7.1





^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 x86 1/2]  fix page faults by nmiaction in nmi if kmemcheck is enabled
  2012-03-05 10:05           ` [PATCH v2 x86 1/2] fix page faults by nmiaction " Li Zhong
@ 2012-03-05 10:29             ` Wim Van Sebroeck
  2012-03-06  1:46               ` Li Zhong
  2012-03-05 15:54             ` Don Zickus
  2012-03-05 17:49             ` Peter Zijlstra
  2 siblings, 1 reply; 22+ messages in thread
From: Wim Van Sebroeck @ 2012-03-05 10:29 UTC (permalink / raw)
  To: Li Zhong
  Cc: Peter Zijlstra, LKML, tglx, mingo, hpa, x86, paulus, mingo, acme,
	Vegard Nossum, Don Zickus, tony.luck, bp, robert.richter, lenb,
	minyard, linux-edac, oprofile-list, linux-acpi,
	openipmi-developer, linux-watchdog, Thomas.Mingarelli

Hi Li,

> diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> index 8464ea1..e575e63 100644
> --- a/drivers/watchdog/hpwdt.c
> +++ b/drivers/watchdog/hpwdt.c
> @@ -689,9 +689,22 @@ static void __devinit dmi_find_icru(const struct dmi_header *dm, void *dummy)
>  	}
>  }
>  
> +static struct nmiaction hpwdt_nmiaction[] = {
> +	{
> +		.handler	= hpwdt_pretimeout,
> +		.name		= "hpwdt",
> +	},
> +	{
> +		.handler	= hpwdt_pretimeout,
> +		.flags		= NMI_FLAG_FIRST,
> +		.name		= "hpwdt",
> +	},
> +};
> +
>  static int __devinit hpwdt_init_nmi_decoding(struct pci_dev *dev)
>  {
>  	int retval;
> +	struct nmiaction *na = hpwdt_nmiaction;
>  
>  	/*
>  	 * On typical CRU-based systems we need to map that service in
> @@ -733,9 +746,11 @@ static int __devinit hpwdt_init_nmi_decoding(struct pci_dev *dev)
>  	 * die notify list to handle a critical NMI. The default is to
>  	 * be last so other users of the NMI signal can function.
>  	 */
> -	retval = register_nmi_handler(NMI_UNKNOWN, hpwdt_pretimeout,
> -					(priority) ? NMI_FLAG_FIRST : 0,
> -					"hpwdt");
> +
> +	if (priority)
> +		na = &hpwdt_nmiaction[1];
> +
> +	retval = register_nmi_handler(NMI_UNKNOWN, na);
>  	if (retval != 0) {
>  		dev_warn(&dev->dev,
>  			"Unable to register a die notifier (err=%d).\n",

Why not do something like;

static struct nmiaction hpwdt_nmiaction = {
	.handler	= hpwdt_pretimeout,
	.name		= "hpwdt",
};

...
	if (priority)
		hpwdt_nmiaction.flags = NMI_FLAG_FIRST;
...

Kind regards,
Wim.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 x86 1/2]  fix page faults by nmiaction in nmi if kmemcheck is enabled
  2012-03-05 10:05           ` [PATCH v2 x86 1/2] fix page faults by nmiaction " Li Zhong
  2012-03-05 10:29             ` Wim Van Sebroeck
@ 2012-03-05 15:54             ` Don Zickus
  2012-03-05 17:55               ` Peter Zijlstra
  2012-03-05 17:49             ` Peter Zijlstra
  2 siblings, 1 reply; 22+ messages in thread
From: Don Zickus @ 2012-03-05 15:54 UTC (permalink / raw)
  To: Li Zhong
  Cc: Peter Zijlstra, LKML, tglx, mingo, hpa, x86, paulus, mingo, acme,
	Vegard Nossum, tony.luck, bp, robert.richter, lenb, minyard, wim,
	linux-edac, oprofile-list, linux-acpi, openipmi-developer,
	linux-watchdog

On Mon, Mar 05, 2012 at 06:05:17PM +0800, Li Zhong wrote:
> This patch tries to fix the problem of page fault exception caused by
> accessing nmiaction structure in nmi if kmemcheck is enabled. 
> 
> If kmemcheck is enabled, the memory allocated through slab are in pages
> that are marked non-present, so that some checks could be done in the
> page fault handling code ( e.g. whether the memory is read before
> written to ). 
> As nmiaction is allocated in this way, so it resides in a non-present
> page. Then there is a page fault while the nmi code accessing the
> nmiaction structure, which would then cause a warning by
> WARN_ON_ONCE(in_nmi()) in kmemcheck_fault(), called by do_page_fault().

This is one way of doing this.  I was trying to avoid this when I rewrote the
nmi handlers, because everyone kept screwing up the structs.  I thought it
would be safer to have callers pass in data based on an api instead.

So I am not necessarily a big fan of this patch (nor is it entirely
correct because you throwaway all the checks in register_nmi_handler()).

Then again I am not a memory expert and may be misunderstanding something
simple why it is safer to do static storage.

Cheers,
Don

> 
> v2: as Peter suggested, changed the nmiaction to use static storage.
> 
> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> ---
>  arch/x86/include/asm/nmi.h              |   10 +++++-
>  arch/x86/kernel/apic/hw_nmi.c           |    8 ++++-
>  arch/x86/kernel/apic/x2apic_uv_x.c      |    7 ++++-
>  arch/x86/kernel/cpu/mcheck/mce-inject.c |    8 ++++-
>  arch/x86/kernel/cpu/perf_event.c        |    7 ++++-
>  arch/x86/kernel/kgdb.c                  |   11 ++++--
>  arch/x86/kernel/nmi.c                   |   49 ++----------------------------
>  arch/x86/kernel/nmi_selftest.c          |   16 ++++++++--
>  arch/x86/kernel/reboot.c                |    9 ++++-
>  arch/x86/kernel/smp.c                   |    9 ++++-
>  arch/x86/oprofile/nmi_int.c             |    8 ++++-
>  drivers/acpi/apei/ghes.c                |    8 ++++-
>  drivers/char/ipmi/ipmi_watchdog.c       |   10 +++++-
>  drivers/watchdog/hpwdt.c                |   21 +++++++++++--
>  14 files changed, 108 insertions(+), 73 deletions(-)
> 
> diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
> index fd3f9f1..08d464f 100644
> --- a/arch/x86/include/asm/nmi.h
> +++ b/arch/x86/include/asm/nmi.h
> @@ -35,8 +35,14 @@ enum {
>  
>  typedef int (*nmi_handler_t)(unsigned int, struct pt_regs *);
>  
> -int register_nmi_handler(unsigned int, nmi_handler_t, unsigned long,
> -			 const char *);
> +struct nmiaction {
> +	struct list_head list;
> +	nmi_handler_t handler;
> +	unsigned int flags;
> +	const char *name;
> +};
> +
> +int register_nmi_handler(unsigned int, struct nmiaction *);
>  
>  void unregister_nmi_handler(unsigned int, const char *);
>  
> diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
> index 31cb9ae..9baea77 100644
> --- a/arch/x86/kernel/apic/hw_nmi.c
> +++ b/arch/x86/kernel/apic/hw_nmi.c
> @@ -80,10 +80,14 @@ arch_trigger_all_cpu_backtrace_handler(unsigned int cmd, struct pt_regs *regs)
>  	return NMI_DONE;
>  }
>  
> +static struct nmiaction arch_trigger_all_cpu_bt_nmiaction = {
> +	.handler	= arch_trigger_all_cpu_backtrace_handler,
> +	.name		= "arch_bt",
> +};
> +
>  static int __init register_trigger_all_cpu_backtrace(void)
>  {
> -	register_nmi_handler(NMI_LOCAL, arch_trigger_all_cpu_backtrace_handler,
> -				0, "arch_bt");
> +	register_nmi_handler(NMI_LOCAL, &arch_trigger_all_cpu_bt_nmiaction);
>  	return 0;
>  }
>  early_initcall(register_trigger_all_cpu_backtrace);
> diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
> index 79b05b8..5739042 100644
> --- a/arch/x86/kernel/apic/x2apic_uv_x.c
> +++ b/arch/x86/kernel/apic/x2apic_uv_x.c
> @@ -715,9 +715,14 @@ int uv_handle_nmi(unsigned int reason, struct pt_regs *regs)
>  	return NMI_HANDLED;
>  }
>  
> +static struct nmiaction uv_nmiaction = {
> +	.handler	= uv_handle_nmi,
> +	.name		= "uv",
> +};
> +
>  void uv_register_nmi_notifier(void)
>  {
> -	if (register_nmi_handler(NMI_UNKNOWN, uv_handle_nmi, 0, "uv"))
> +	if (register_nmi_handler(NMI_UNKNOWN, &uv_nmiaction))
>  		printk(KERN_WARNING "UV NMI handler failed to register\n");
>  }
>  
> diff --git a/arch/x86/kernel/cpu/mcheck/mce-inject.c b/arch/x86/kernel/cpu/mcheck/mce-inject.c
> index fc4beb3..f9ab41c 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce-inject.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce-inject.c
> @@ -229,14 +229,18 @@ static ssize_t mce_write(struct file *filp, const char __user *ubuf,
>  	return usize;
>  }
>  
> +static struct nmiaction mce_nmiaction = {
> +	.handler	= mce_raise_notify,
> +	.name		= "mce_notify",
> +};
> +
>  static int inject_init(void)
>  {
>  	if (!alloc_cpumask_var(&mce_inject_cpumask, GFP_KERNEL))
>  		return -ENOMEM;
>  	printk(KERN_INFO "Machine check injector initialized\n");
>  	register_mce_write_callback(mce_write);
> -	register_nmi_handler(NMI_LOCAL, mce_raise_notify, 0,
> -				"mce_notify");
> +	register_nmi_handler(NMI_LOCAL, &mce_nmiaction);
>  	return 0;
>  }
>  
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index 5adce10..8bdff1b 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -1246,6 +1246,11 @@ static void __init pmu_check_apic(void)
>  	pr_info("no hardware sampling interrupt available.\n");
>  }
>  
> +static struct nmiaction perf_event_nmiaction = {
> +	.handler	= perf_event_nmi_handler,
> +	.name		= "PMI",
> +};
> +
>  static int __init init_hw_perf_events(void)
>  {
>  	struct x86_pmu_quirk *quirk;
> @@ -1297,7 +1302,7 @@ static int __init init_hw_perf_events(void)
>  		((1LL << x86_pmu.num_counters_fixed)-1) << X86_PMC_IDX_FIXED;
>  
>  	perf_events_lapic_init();
> -	register_nmi_handler(NMI_LOCAL, perf_event_nmi_handler, 0, "PMI");
> +	register_nmi_handler(NMI_LOCAL, &perf_event_nmiaction);
>  
>  	unconstrained = (struct event_constraint)
>  		__EVENT_CONSTRAINT(0, (1ULL << x86_pmu.num_counters) - 1,
> diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
> index faba577..d259d2a 100644
> --- a/arch/x86/kernel/kgdb.c
> +++ b/arch/x86/kernel/kgdb.c
> @@ -601,6 +601,11 @@ static struct notifier_block kgdb_notifier = {
>  	.notifier_call	= kgdb_notify,
>  };
>  
> +static struct nmiaction kgdb_nmiaction = {
> +	.handler	= kgdb_nmi_handler,
> +	.name		= "kgdb",
> +};
> +
>  /**
>   *	kgdb_arch_init - Perform any architecture specific initalization.
>   *
> @@ -615,13 +620,11 @@ int kgdb_arch_init(void)
>  	if (retval)
>  		goto out;
>  
> -	retval = register_nmi_handler(NMI_LOCAL, kgdb_nmi_handler,
> -					0, "kgdb");
> +	retval = register_nmi_handler(NMI_LOCAL, &kgdb_nmiaction);
>  	if (retval)
>  		goto out1;
>  
> -	retval = register_nmi_handler(NMI_UNKNOWN, kgdb_nmi_handler,
> -					0, "kgdb");
> +	retval = register_nmi_handler(NMI_UNKNOWN, &kgdb_nmiaction);
>  
>  	if (retval)
>  		goto out2;
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 47acaf3..d844acc 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -31,14 +31,6 @@
>  #include <asm/nmi.h>
>  #include <asm/x86_init.h>
>  
> -#define NMI_MAX_NAMELEN	16
> -struct nmiaction {
> -	struct list_head list;
> -	nmi_handler_t handler;
> -	unsigned int flags;
> -	char *name;
> -};
> -
>  struct nmi_desc {
>  	spinlock_t lock;
>  	struct list_head head;
> @@ -160,51 +152,18 @@ static struct nmiaction *__free_nmi(unsigned int type, const char *name)
>  	return (n);
>  }
>  
> -int register_nmi_handler(unsigned int type, nmi_handler_t handler,
> -			unsigned long nmiflags, const char *devname)
> +int register_nmi_handler(unsigned int type, struct nmiaction *na)
>  {
> -	struct nmiaction *action;
> -	int retval = -ENOMEM;
> -
> -	if (!handler)
> +	if (!na->handler)
>  		return -EINVAL;
>  
> -	action = kzalloc(sizeof(struct nmiaction), GFP_KERNEL);
> -	if (!action)
> -		goto fail_action;
> -
> -	action->handler = handler;
> -	action->flags = nmiflags;
> -	action->name = kstrndup(devname, NMI_MAX_NAMELEN, GFP_KERNEL);
> -	if (!action->name)
> -		goto fail_action_name;
> -
> -	retval = __setup_nmi(type, action);
> -
> -	if (retval)
> -		goto fail_setup_nmi;
> -
> -	return retval;
> -
> -fail_setup_nmi:
> -	kfree(action->name);
> -fail_action_name:
> -	kfree(action);
> -fail_action:	
> -
> -	return retval;
> +	return __setup_nmi(type, na);
>  }
>  EXPORT_SYMBOL_GPL(register_nmi_handler);
>  
>  void unregister_nmi_handler(unsigned int type, const char *name)
>  {
> -	struct nmiaction *a;
> -
> -	a = __free_nmi(type, name);
> -	if (a) {
> -		kfree(a->name);
> -		kfree(a);
> -	}
> +	__free_nmi(type, name);
>  }
>  
>  EXPORT_SYMBOL_GPL(unregister_nmi_handler);
> diff --git a/arch/x86/kernel/nmi_selftest.c b/arch/x86/kernel/nmi_selftest.c
> index 0d01a8e..40fd637 100644
> --- a/arch/x86/kernel/nmi_selftest.c
> +++ b/arch/x86/kernel/nmi_selftest.c
> @@ -37,10 +37,15 @@ static int nmi_unk_cb(unsigned int val, struct pt_regs *regs)
>  	return NMI_HANDLED;
>  }
>  
> +static struct nmiaction selftest_unk_nmiaction = {
> +	.handler	= nmi_unk_cb,
> +	.name		= "nmi_selftest_unk",
> +};
> +
>  static void init_nmi_testsuite(void)
>  {
>  	/* trap all the unknown NMIs we may generate */
> -	register_nmi_handler(NMI_UNKNOWN, nmi_unk_cb, 0, "nmi_selftest_unk");
> +	register_nmi_handler(NMI_UNKNOWN, &selftest_unk_nmiaction);
>  }
>  
>  static void cleanup_nmi_testsuite(void)
> @@ -58,12 +63,17 @@ static int test_nmi_ipi_callback(unsigned int val, struct pt_regs *regs)
>          return NMI_DONE;
>  }
>  
> +static struct nmiaction selftest_nmiaction = {
> +	.handler	= test_nmi_ipi_callback,
> +	.flags		= NMI_FLAG_FIRST,
> +	.name		= "nmi_selftest",
> +};
> +
>  static void test_nmi_ipi(struct cpumask *mask)
>  {
>  	unsigned long timeout;
>  
> -	if (register_nmi_handler(NMI_LOCAL, test_nmi_ipi_callback,
> -				 NMI_FLAG_FIRST, "nmi_selftest")) {
> +	if (register_nmi_handler(NMI_LOCAL, &selftest_nmiaction)) {
>  		nmi_fail = FAILURE;
>  		return;
>  	}
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index d840e69..a0f8c15 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -799,6 +799,12 @@ static void smp_send_nmi_allbutself(void)
>  	apic->send_IPI_allbutself(NMI_VECTOR);
>  }
>  
> +static struct nmiaction crash_nmiaction = {
> +	.handler	= crash_nmi_callback,
> +	.flags		= NMI_FLAG_FIRST,
> +	.name		= "crash",
> +};
> +
>  /* Halt all other CPUs, calling the specified function on each of them
>   *
>   * This function can be used to halt all other CPUs on crash
> @@ -817,8 +823,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
>  
>  	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
>  	/* Would it be better to replace the trap vector here? */
> -	if (register_nmi_handler(NMI_LOCAL, crash_nmi_callback,
> -				 NMI_FLAG_FIRST, "crash"))
> +	if (register_nmi_handler(NMI_LOCAL, &crash_nmiaction))
>  		return;		/* return what? */
>  	/* Ensure the new callback function is set before sending
>  	 * out the NMI
> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
> index 66c74f4..135d0b2 100644
> --- a/arch/x86/kernel/smp.c
> +++ b/arch/x86/kernel/smp.c
> @@ -162,6 +162,12 @@ static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
>  	return NMI_HANDLED;
>  }
>  
> +static struct nmiaction smp_stop_nmiaction = {
> +	.handler	= smp_stop_nmi_callback,
> +	.flags		= NMI_FLAG_FIRST,
> +	.name		= "smp_stop",
> +};
> +
>  static void native_nmi_stop_other_cpus(int wait)
>  {
>  	unsigned long flags;
> @@ -179,8 +185,7 @@ static void native_nmi_stop_other_cpus(int wait)
>  		if (atomic_cmpxchg(&stopping_cpu, -1, safe_smp_processor_id()) != -1)
>  			return;
>  
> -		if (register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
> -					 NMI_FLAG_FIRST, "smp_stop"))
> +		if (register_nmi_handler(NMI_LOCAL, &smp_stop_nmiaction))
>  			/* Note: we ignore failures here */
>  			return;
>  
> diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
> index 26b8a85..c3408f6 100644
> --- a/arch/x86/oprofile/nmi_int.c
> +++ b/arch/x86/oprofile/nmi_int.c
> @@ -453,6 +453,11 @@ static struct notifier_block oprofile_cpu_nb = {
>  	.notifier_call = oprofile_cpu_notifier
>  };
>  
> +static struct nmiaction oprofile_nmiaction = {
> +	.handler	= profile_exceptions_notify,
> +	.name		= "oprofile",
> +};
> +
>  static int nmi_setup(void)
>  {
>  	int err = 0;
> @@ -489,8 +494,7 @@ static int nmi_setup(void)
>  	ctr_running = 0;
>  	/* make variables visible to the nmi handler: */
>  	smp_mb();
> -	err = register_nmi_handler(NMI_LOCAL, profile_exceptions_notify,
> -					0, "oprofile");
> +	err = register_nmi_handler(NMI_LOCAL, &oprofile_nmiaction);
>  	if (err)
>  		goto fail;
>  
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 9b3cac0..1d38f92 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -901,6 +901,11 @@ static unsigned long ghes_esource_prealloc_size(
>  	return prealloc_size;
>  }
>  
> +static struct nmiaction ghes_nmiaction = {
> +	.handler	= ghes_notify_nmi,
> +	.name		= "ghes",
> +};
> +
>  static int __devinit ghes_probe(struct platform_device *ghes_dev)
>  {
>  	struct acpi_hest_generic *generic;
> @@ -975,8 +980,7 @@ static int __devinit ghes_probe(struct platform_device *ghes_dev)
>  		ghes_estatus_pool_expand(len);
>  		mutex_lock(&ghes_list_mutex);
>  		if (list_empty(&ghes_nmi))
> -			register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0,
> -						"ghes");
> +			register_nmi_handler(NMI_LOCAL, &ghes_nmiaction);
>  		list_add_rcu(&ghes->list, &ghes_nmi);
>  		mutex_unlock(&ghes_list_mutex);
>  		break;
> diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
> index 34767a6..29a6e0a 100644
> --- a/drivers/char/ipmi/ipmi_watchdog.c
> +++ b/drivers/char/ipmi/ipmi_watchdog.c
> @@ -1296,6 +1296,13 @@ static int preop_op(const char *inval, char *outval)
>  	return 0;
>  }
>  
> +#ifdef HAVE_DIE_NMI
> +static struct nmiaction ipmi_nmiaction = {
> +	.handler	= ipmi_nmi,
> +	.name		= "ipmi",
> +};
> +#endif
> +
>  static void check_parms(void)
>  {
>  #ifdef HAVE_DIE_NMI
> @@ -1313,8 +1320,7 @@ static void check_parms(void)
>  		}
>  	}
>  	if (do_nmi && !nmi_handler_registered) {
> -		rv = register_nmi_handler(NMI_UNKNOWN, ipmi_nmi, 0,
> -						"ipmi");
> +		rv = register_nmi_handler(NMI_UNKNOWN, &ipmi_nmiaction);
>  		if (rv) {
>  			printk(KERN_WARNING PFX
>  			       "Can't register nmi handler\n");
> diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> index 8464ea1..e575e63 100644
> --- a/drivers/watchdog/hpwdt.c
> +++ b/drivers/watchdog/hpwdt.c
> @@ -689,9 +689,22 @@ static void __devinit dmi_find_icru(const struct dmi_header *dm, void *dummy)
>  	}
>  }
>  
> +static struct nmiaction hpwdt_nmiaction[] = {
> +	{
> +		.handler	= hpwdt_pretimeout,
> +		.name		= "hpwdt",
> +	},
> +	{
> +		.handler	= hpwdt_pretimeout,
> +		.flags		= NMI_FLAG_FIRST,
> +		.name		= "hpwdt",
> +	},
> +};
> +
>  static int __devinit hpwdt_init_nmi_decoding(struct pci_dev *dev)
>  {
>  	int retval;
> +	struct nmiaction *na = hpwdt_nmiaction;
>  
>  	/*
>  	 * On typical CRU-based systems we need to map that service in
> @@ -733,9 +746,11 @@ static int __devinit hpwdt_init_nmi_decoding(struct pci_dev *dev)
>  	 * die notify list to handle a critical NMI. The default is to
>  	 * be last so other users of the NMI signal can function.
>  	 */
> -	retval = register_nmi_handler(NMI_UNKNOWN, hpwdt_pretimeout,
> -					(priority) ? NMI_FLAG_FIRST : 0,
> -					"hpwdt");
> +
> +	if (priority)
> +		na = &hpwdt_nmiaction[1];
> +
> +	retval = register_nmi_handler(NMI_UNKNOWN, na);
>  	if (retval != 0) {
>  		dev_warn(&dev->dev,
>  			"Unable to register a die notifier (err=%d).\n",
> -- 
> 1.7.1
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 x86 1/2]  fix page faults by nmiaction in nmi if kmemcheck is enabled
  2012-03-05 10:05           ` [PATCH v2 x86 1/2] fix page faults by nmiaction " Li Zhong
  2012-03-05 10:29             ` Wim Van Sebroeck
  2012-03-05 15:54             ` Don Zickus
@ 2012-03-05 17:49             ` Peter Zijlstra
  2012-03-05 21:45               ` Don Zickus
  2 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2012-03-05 17:49 UTC (permalink / raw)
  To: Li Zhong
  Cc: LKML, tglx, mingo, hpa, x86, paulus, mingo, acme, Vegard Nossum,
	Don Zickus, tony.luck, bp, robert.richter, lenb, minyard, wim,
	linux-edac, oprofile-list, linux-acpi, openipmi-developer,
	linux-watchdog

On Mon, 2012-03-05 at 18:05 +0800, Li Zhong wrote:
> +static struct nmiaction arch_trigger_all_cpu_bt_nmiaction = {
> +       .handler        = arch_trigger_all_cpu_backtrace_handler,
> +       .name           = "arch_bt",
> +};
> +
>  static int __init register_trigger_all_cpu_backtrace(void)
>  {
> -       register_nmi_handler(NMI_LOCAL, arch_trigger_all_cpu_backtrace_handler,
> -                               0, "arch_bt");
> +       register_nmi_handler(NMI_LOCAL, &arch_trigger_all_cpu_bt_nmiaction);
>         return 0;
>  } 

If you look at things like cpu_notifier() you can shorten this still:

#define nmi_notifier(t, fn, n)				\
do {							\
	static struct nmiaction fn##_na = {		\
		.handler = (fn),			\
		.name = (n),				\
	};						\
	register_nmi_handler((t), &fn##_na);		\
} while (0)

The whole thing then becomes:

  nmi_notifier(NMI_LOCAL, arch_trigger_all_cpu_backtrace_handler, "arch_bt");



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 x86 1/2]  fix page faults by nmiaction in nmi if kmemcheck is enabled
  2012-03-05 15:54             ` Don Zickus
@ 2012-03-05 17:55               ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2012-03-05 17:55 UTC (permalink / raw)
  To: Don Zickus
  Cc: Li Zhong, LKML, tglx, mingo, hpa, x86, paulus, mingo, acme,
	Vegard Nossum, tony.luck, bp, robert.richter, lenb, minyard, wim,
	linux-edac, oprofile-list, linux-acpi, openipmi-developer,
	linux-watchdog

On Mon, 2012-03-05 at 10:54 -0500, Don Zickus wrote:
> This is one way of doing this.  I was trying to avoid this when I rewrote the
> nmi handlers, because everyone kept screwing up the structs.  I thought it
> would be safer to have callers pass in data based on an api instead.

Apparently kmemcheck marks pages as non-present and does magic in the
fault handler. Having the action thing allocated meant kmemcheck also
marks that thing as non-present in the page-tables, the list iteration
from NMI context would then fault and things would go funny.

There's two ways out, help kmemcheck with a new annotation (which of
course starts with checking if there isn't already such a thing).

Or this one, avoid the action things from being allocated, this
side-steps kmemcheck and avoids the problem thusly.

Sadly this patch doesn't at all mention the first possibility and why
that isn't a feasible approach. A well...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 x86 1/2]  fix page faults by nmiaction in nmi if kmemcheck is enabled
  2012-03-05 17:49             ` Peter Zijlstra
@ 2012-03-05 21:45               ` Don Zickus
  2012-03-06 10:09                 ` [PATCH v3 " Li Zhong
  0 siblings, 1 reply; 22+ messages in thread
From: Don Zickus @ 2012-03-05 21:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Li Zhong, LKML, tglx, mingo, hpa, x86, paulus, mingo, acme,
	Vegard Nossum, tony.luck, bp, robert.richter, lenb, minyard, wim,
	linux-edac, oprofile-list, linux-acpi, openipmi-developer,
	linux-watchdog

On Mon, Mar 05, 2012 at 06:49:07PM +0100, Peter Zijlstra wrote:
> On Mon, 2012-03-05 at 18:05 +0800, Li Zhong wrote:
> > +static struct nmiaction arch_trigger_all_cpu_bt_nmiaction = {
> > +       .handler        = arch_trigger_all_cpu_backtrace_handler,
> > +       .name           = "arch_bt",
> > +};
> > +
> >  static int __init register_trigger_all_cpu_backtrace(void)
> >  {
> > -       register_nmi_handler(NMI_LOCAL, arch_trigger_all_cpu_backtrace_handler,
> > -                               0, "arch_bt");
> > +       register_nmi_handler(NMI_LOCAL, &arch_trigger_all_cpu_bt_nmiaction);
> >         return 0;
> >  } 
> 
> If you look at things like cpu_notifier() you can shorten this still:
> 
> #define nmi_notifier(t, fn, n)				\
> do {							\
> 	static struct nmiaction fn##_na = {		\
> 		.handler = (fn),			\
> 		.name = (n),				\
> 	};						\
> 	register_nmi_handler((t), &fn##_na);		\
> } while (0)
> 
> The whole thing then becomes:
> 
>   nmi_notifier(NMI_LOCAL, arch_trigger_all_cpu_backtrace_handler, "arch_bt");

Hi Li,

I would be open to this suggestion.  If you want to modify the patch for
this approach.

Cheers,
Don

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 x86 1/2]  fix page faults by nmiaction in nmi if kmemcheck is enabled
  2012-03-05 10:29             ` Wim Van Sebroeck
@ 2012-03-06  1:46               ` Li Zhong
  0 siblings, 0 replies; 22+ messages in thread
From: Li Zhong @ 2012-03-06  1:46 UTC (permalink / raw)
  To: Wim Van Sebroeck
  Cc: Peter Zijlstra, LKML, tglx, mingo, hpa, x86, paulus, mingo, acme,
	Vegard Nossum, Don Zickus, tony.luck, bp, robert.richter, lenb,
	minyard, linux-edac, oprofile-list, linux-acpi,
	openipmi-developer, linux-watchdog, Thomas.Mingarelli

On Mon, 2012-03-05 at 11:29 +0100, Wim Van Sebroeck wrote:
> Hi Li,
> 
> > diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> > index 8464ea1..e575e63 100644
> > --- a/drivers/watchdog/hpwdt.c
> > +++ b/drivers/watchdog/hpwdt.c
> > @@ -689,9 +689,22 @@ static void __devinit dmi_find_icru(const struct dmi_header *dm, void *dummy)
> >  	}
> >  }
> >  
> > +static struct nmiaction hpwdt_nmiaction[] = {
> > +	{
> > +		.handler	= hpwdt_pretimeout,
> > +		.name		= "hpwdt",
> > +	},
> > +	{
> > +		.handler	= hpwdt_pretimeout,
> > +		.flags		= NMI_FLAG_FIRST,
> > +		.name		= "hpwdt",
> > +	},
> > +};
> > +
> >  static int __devinit hpwdt_init_nmi_decoding(struct pci_dev *dev)
> >  {
> >  	int retval;
> > +	struct nmiaction *na = hpwdt_nmiaction;
> >  
> >  	/*
> >  	 * On typical CRU-based systems we need to map that service in
> > @@ -733,9 +746,11 @@ static int __devinit hpwdt_init_nmi_decoding(struct pci_dev *dev)
> >  	 * die notify list to handle a critical NMI. The default is to
> >  	 * be last so other users of the NMI signal can function.
> >  	 */
> > -	retval = register_nmi_handler(NMI_UNKNOWN, hpwdt_pretimeout,
> > -					(priority) ? NMI_FLAG_FIRST : 0,
> > -					"hpwdt");
> > +
> > +	if (priority)
> > +		na = &hpwdt_nmiaction[1];
> > +
> > +	retval = register_nmi_handler(NMI_UNKNOWN, na);
> >  	if (retval != 0) {
> >  		dev_warn(&dev->dev,
> >  			"Unable to register a die notifier (err=%d).\n",
> 
> Why not do something like;
> 
> static struct nmiaction hpwdt_nmiaction = {
> 	.handler	= hpwdt_pretimeout,
> 	.name		= "hpwdt",
> };
> 
> ...
> 	if (priority)
> 		hpwdt_nmiaction.flags = NMI_FLAG_FIRST;
> ...
> 

Thank you, Wim. I'll update it. 

Thanks,
Zhong

> Kind regards,
> Wim.
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 x86 1/2]  fix page faults by nmiaction in nmi if kmemcheck is enabled
  2012-03-05 21:45               ` Don Zickus
@ 2012-03-06 10:09                 ` Li Zhong
  2012-03-06 10:27                   ` Vegard Nossum
  2012-03-06 15:00                   ` Don Zickus
  0 siblings, 2 replies; 22+ messages in thread
From: Li Zhong @ 2012-03-06 10:09 UTC (permalink / raw)
  To: Don Zickus
  Cc: Peter Zijlstra, LKML, tglx, mingo, hpa, x86, paulus, mingo, acme,
	Vegard Nossum, tony.luck, bp, robert.richter, lenb, minyard, wim,
	linux-edac, oprofile-list, linux-acpi, openipmi-developer,
	linux-watchdog

This patch tries to fix the problem of page fault exception caused by
accessing nmiaction structure in nmi if kmemcheck is enabled. 

If kmemcheck is enabled, the memory allocated through slab are in pages
that are marked non-present, so that some checks could be done in the
page fault handling code ( e.g. whether the memory is read before
written to ). 
As nmiaction is allocated in this way, so it resides in a non-present
page. Then there is a page fault while the nmi code accessing the
nmiaction structure, which would then cause a warning by
WARN_ON_ONCE(in_nmi()) in kmemcheck_fault(), called by do_page_fault().

v2: as Peter suggested, changed the nmiaction to use static storage.

v3: as Peter suggested, use macro to shorten the codes. Also keep the
original usage of register_nmi_handler, so users of this call doesn't
need change. 

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/x86/include/asm/nmi.h |   19 ++++++++++++++-
 arch/x86/kernel/nmi.c      |   52
++++++--------------------------------------
 2 files changed, 24 insertions(+), 47 deletions(-)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index fd3f9f1..5a2b2c6 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -35,8 +35,23 @@ enum {
 
 typedef int (*nmi_handler_t)(unsigned int, struct pt_regs *);
 
-int register_nmi_handler(unsigned int, nmi_handler_t, unsigned long,
-			 const char *);
+struct nmiaction {
+	struct list_head list;
+	nmi_handler_t handler;
+	unsigned int flags;
+	const char *name;
+};
+
+#define register_nmi_handler(t, fn, fg, n)		\
+({							\
+	static struct nmiaction fn##_na = {		\
+		.handler = (fn),			\
+		.name = (n),				\
+	};						\
+	__register_nmi_handler((t), (fg), &fn##_na);	\
+})
+
+int __register_nmi_handler(unsigned int, unsigned int, struct nmiaction
*);
 
 void unregister_nmi_handler(unsigned int, const char *);
 
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 47acaf3..a097559 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -31,14 +31,6 @@
 #include <asm/nmi.h>
 #include <asm/x86_init.h>
 
-#define NMI_MAX_NAMELEN	16
-struct nmiaction {
-	struct list_head list;
-	nmi_handler_t handler;
-	unsigned int flags;
-	char *name;
-};
-
 struct nmi_desc {
 	spinlock_t lock;
 	struct list_head head;
@@ -160,51 +152,21 @@ static struct nmiaction *__free_nmi(unsigned int
type, const char *name)
 	return (n);
 }
 
-int register_nmi_handler(unsigned int type, nmi_handler_t handler,
-			unsigned long nmiflags, const char *devname)
+int __register_nmi_handler(unsigned int type, unsigned int nmiflags,
+						struct nmiaction *na)
 {
-	struct nmiaction *action;
-	int retval = -ENOMEM;
-
-	if (!handler)
+	if (!na->handler)
 		return -EINVAL;
 
-	action = kzalloc(sizeof(struct nmiaction), GFP_KERNEL);
-	if (!action)
-		goto fail_action;
-
-	action->handler = handler;
-	action->flags = nmiflags;
-	action->name = kstrndup(devname, NMI_MAX_NAMELEN, GFP_KERNEL);
-	if (!action->name)
-		goto fail_action_name;
-
-	retval = __setup_nmi(type, action);
-
-	if (retval)
-		goto fail_setup_nmi;
+	na->flags = nmiflags;
 
-	return retval;
-
-fail_setup_nmi:
-	kfree(action->name);
-fail_action_name:
-	kfree(action);
-fail_action:	
-
-	return retval;
+	return __setup_nmi(type, na);
 }
-EXPORT_SYMBOL_GPL(register_nmi_handler);
+EXPORT_SYMBOL_GPL(__register_nmi_handler);
 
 void unregister_nmi_handler(unsigned int type, const char *name)
 {
-	struct nmiaction *a;
-
-	a = __free_nmi(type, name);
-	if (a) {
-		kfree(a->name);
-		kfree(a);
-	}
+	__free_nmi(type, name);
 }
 
 EXPORT_SYMBOL_GPL(unregister_nmi_handler);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 x86 1/2] fix page faults by nmiaction in nmi if kmemcheck is enabled
  2012-03-06 10:09                 ` [PATCH v3 " Li Zhong
@ 2012-03-06 10:27                   ` Vegard Nossum
  2012-03-09  9:52                     ` Li Zhong
  2012-03-06 15:00                   ` Don Zickus
  1 sibling, 1 reply; 22+ messages in thread
From: Vegard Nossum @ 2012-03-06 10:27 UTC (permalink / raw)
  To: Li Zhong
  Cc: Don Zickus, Peter Zijlstra, LKML, tglx, mingo, hpa, x86, paulus,
	mingo, acme, tony.luck, bp, robert.richter, lenb, minyard, wim,
	linux-edac, oprofile-list, linux-acpi, openipmi-developer,
	linux-watchdog, Pekka Enberg

On 6 March 2012 11:09, Li Zhong <zhong@linux.vnet.ibm.com> wrote:
> This patch tries to fix the problem of page fault exception caused by
> accessing nmiaction structure in nmi if kmemcheck is enabled.
>
> If kmemcheck is enabled, the memory allocated through slab are in pages
> that are marked non-present, so that some checks could be done in the
> page fault handling code ( e.g. whether the memory is read before
> written to ).
> As nmiaction is allocated in this way, so it resides in a non-present
> page. Then there is a page fault while the nmi code accessing the
> nmiaction structure, which would then cause a warning by
> WARN_ON_ONCE(in_nmi()) in kmemcheck_fault(), called by do_page_fault().
>
> v2: as Peter suggested, changed the nmiaction to use static storage.
>
> v3: as Peter suggested, use macro to shorten the codes. Also keep the
> original usage of register_nmi_handler, so users of this call doesn't
> need change.
>
> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>

Looks like you've solved this now. Thanks.

For the record, another way to prevent the page fault from happening
in the first place is to set up a new slab cache with the flag
SLAB_NOTRACK. This is different from the GFP_NOTRACK flag which, as
you noted, doesn't prevent page faults, just inhibits
checking/warnings for those memory areas.

It's a bit of a hassle, I admit. Maybe we could create an additional,
separate set of slab caches (using SLAB_NOTRACK) and a new GFP flag
which selects this set of caches instead. This would allow anything
that takes a gfp_t to allocate memory that is guaranteed not to page
fault when using kmemcheck. Pekka, any thoughts?


Vegard

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 x86 1/2]  fix page faults by nmiaction in nmi if kmemcheck is enabled
  2012-03-06 10:09                 ` [PATCH v3 " Li Zhong
  2012-03-06 10:27                   ` Vegard Nossum
@ 2012-03-06 15:00                   ` Don Zickus
  1 sibling, 0 replies; 22+ messages in thread
From: Don Zickus @ 2012-03-06 15:00 UTC (permalink / raw)
  To: Li Zhong
  Cc: Peter Zijlstra, LKML, tglx, mingo, hpa, x86, paulus, mingo, acme,
	Vegard Nossum, tony.luck, bp, robert.richter, lenb, minyard, wim,
	linux-edac, oprofile-list, linux-acpi, openipmi-developer,
	linux-watchdog

On Tue, Mar 06, 2012 at 06:09:43PM +0800, Li Zhong wrote:
> This patch tries to fix the problem of page fault exception caused by
> accessing nmiaction structure in nmi if kmemcheck is enabled. 
> 
> If kmemcheck is enabled, the memory allocated through slab are in pages
> that are marked non-present, so that some checks could be done in the
> page fault handling code ( e.g. whether the memory is read before
> written to ). 
> As nmiaction is allocated in this way, so it resides in a non-present
> page. Then there is a page fault while the nmi code accessing the
> nmiaction structure, which would then cause a warning by
> WARN_ON_ONCE(in_nmi()) in kmemcheck_fault(), called by do_page_fault().
> 
> v2: as Peter suggested, changed the nmiaction to use static storage.
> 
> v3: as Peter suggested, use macro to shorten the codes. Also keep the
> original usage of register_nmi_handler, so users of this call doesn't
> need change. 

Thanks Li!  Looks fine to me.  I'll run some tests to make sure things
still work correctly.

Cheers,
Don

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 x86 1/2] fix page faults by nmiaction in nmi if kmemcheck is enabled
  2012-03-06 10:27                   ` Vegard Nossum
@ 2012-03-09  9:52                     ` Li Zhong
  0 siblings, 0 replies; 22+ messages in thread
From: Li Zhong @ 2012-03-09  9:52 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Don Zickus, Peter Zijlstra, LKML, tglx, mingo, hpa, x86, paulus,
	mingo, acme, tony.luck, bp, robert.richter, lenb, minyard, wim,
	linux-edac, oprofile-list, linux-acpi, openipmi-developer,
	linux-watchdog, Pekka Enberg

On Tue, 2012-03-06 at 11:27 +0100, Vegard Nossum wrote:
> On 6 March 2012 11:09, Li Zhong <zhong@linux.vnet.ibm.com> wrote:
> > This patch tries to fix the problem of page fault exception caused by
> > accessing nmiaction structure in nmi if kmemcheck is enabled.
> >
> > If kmemcheck is enabled, the memory allocated through slab are in pages
> > that are marked non-present, so that some checks could be done in the
> > page fault handling code ( e.g. whether the memory is read before
> > written to ).
> > As nmiaction is allocated in this way, so it resides in a non-present
> > page. Then there is a page fault while the nmi code accessing the
> > nmiaction structure, which would then cause a warning by
> > WARN_ON_ONCE(in_nmi()) in kmemcheck_fault(), called by do_page_fault().
> >
> > v2: as Peter suggested, changed the nmiaction to use static storage.
> >
> > v3: as Peter suggested, use macro to shorten the codes. Also keep the
> > original usage of register_nmi_handler, so users of this call doesn't
> > need change.
> >
> > Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> 
> Looks like you've solved this now. Thanks.

There is still one place [2/2] not solved ... and I guess it might need
the way you suggested below.

> 
> For the record, another way to prevent the page fault from happening
> in the first place is to set up a new slab cache with the flag
> SLAB_NOTRACK. This is different from the GFP_NOTRACK flag which, as
> you noted, doesn't prevent page faults, just inhibits
> checking/warnings for those memory areas.
> 
> It's a bit of a hassle, I admit. Maybe we could create an additional,
> separate set of slab caches (using SLAB_NOTRACK) and a new GFP flag
> which selects this set of caches instead. This would allow anything
> that takes a gfp_t to allocate memory that is guaranteed not to page
> fault when using kmemcheck. Pekka, any thoughts?
> 

I'm not sure whether I understand it correctly? 
  If CONFIG_KMEMCHECK is enabled, create another two sets of
malloc_sizes caches, one for cs_cachep, one for cs_dmacachep, both with
SLAB_NOTRACK flag. 

  Create a new GFP flag, like __GFP_NO_PF for those places where page
fault is not allowed, and return memory from the caches created above.
This new GFP flag is set to 0 if CONFIG_KMEMCHECK is not enabled. 

I think there shouldn't be many such cases, so most of these caches
wouldn't actually be used ...

Thanks,
Zhong

> 
> Vegard
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2012-03-09  9:53 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-20  6:01 [PATCH 0/2 x86] fix some page faults in nmi if kmemcheck is enabled Li Zhong
2012-02-20  6:04 ` [PATCH 1/2 x86] fix page faults by nmi handler " Li Zhong
2012-02-20  6:07   ` [PATCH 2/2 x86] fix page faults by perf events " Li Zhong
2012-02-20 11:00 ` [PATCH 0/2 x86] fix some page faults " Peter Zijlstra
2012-02-21  1:42   ` Li Zhong
2012-02-21 10:17     ` Peter Zijlstra
2012-02-23  9:53       ` Li Zhong
2012-02-27 10:58         ` Peter Zijlstra
2012-02-28  2:45           ` Li Zhong
2012-03-02 19:44             ` Don Zickus
2012-03-05  1:49               ` Li Zhong
2012-03-05 10:05           ` [PATCH v2 x86 1/2] fix page faults by nmiaction " Li Zhong
2012-03-05 10:29             ` Wim Van Sebroeck
2012-03-06  1:46               ` Li Zhong
2012-03-05 15:54             ` Don Zickus
2012-03-05 17:55               ` Peter Zijlstra
2012-03-05 17:49             ` Peter Zijlstra
2012-03-05 21:45               ` Don Zickus
2012-03-06 10:09                 ` [PATCH v3 " Li Zhong
2012-03-06 10:27                   ` Vegard Nossum
2012-03-09  9:52                     ` Li Zhong
2012-03-06 15:00                   ` Don Zickus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).