All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
       [not found] <tip-3ff0141aa3a03ca3388b40b36167d0a37919f3fd@git.kernel.org>
@ 2009-06-15 14:46 ` Peter Zijlstra
  2009-06-15 15:30   ` Hugh Dickins
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2009-06-15 14:46 UTC (permalink / raw)
  To: linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx, mingo
  Cc: linux-tip-commits, hugh.dickins

On Mon, 2009-06-15 at 14:07 +0000, tip-bot for Peter Zijlstra wrote:
> Commit-ID:  3ff0141aa3a03ca3388b40b36167d0a37919f3fd
> Gitweb:     http://git.kernel.org/tip/3ff0141aa3a03ca3388b40b36167d0a37919f3fd
> Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
> AuthorDate: Mon, 15 Jun 2009 12:40:41 +0200
> Committer:  Ingo Molnar <mingo@elte.hu>
> CommitDate: Mon, 15 Jun 2009 15:57:52 +0200
> 
> x86: Add NMI types for kmap_atomic
> 
> Two new kmap_atomic slots for NMI context. And teach pte_offset_map()
> about NMI context.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> CC: Nick Piggin <npiggin@suse.de>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> 
> ---
>  arch/x86/include/asm/kmap_types.h |    4 +++-
>  arch/x86/include/asm/pgtable_32.h |    5 +++--
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kmap_types.h b/arch/x86/include/asm/kmap_types.h
> index 5759c16..ff00a44 100644
> --- a/arch/x86/include/asm/kmap_types.h
> +++ b/arch/x86/include/asm/kmap_types.h
> @@ -21,7 +21,9 @@ D(9)	KM_IRQ0,
>  D(10)	KM_IRQ1,
>  D(11)	KM_SOFTIRQ0,
>  D(12)	KM_SOFTIRQ1,
> -D(13)	KM_TYPE_NR
> +D(13)	KM_NMI,
> +D(14)	KM_NMI_PTE,
> +D(15)	KM_TYPE_NR
>  };
>  
>  #undef D
> diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
> index 31bd120..8546497 100644
> --- a/arch/x86/include/asm/pgtable_32.h
> +++ b/arch/x86/include/asm/pgtable_32.h
> @@ -49,13 +49,14 @@ extern void set_pmd_pfn(unsigned long, unsigned long, pgprot_t);
>  #endif
>  
>  #if defined(CONFIG_HIGHPTE)
> +#define __KM_PTE	(in_nmi() ? KM_NMI_PTE : KM_PTE0)
>  #define pte_offset_map(dir, address)					\
> -	((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), KM_PTE0) +		\
> +	((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), __KM_PTE) +		\
>  	 pte_index((address)))
>  #define pte_offset_map_nested(dir, address)				\
>  	((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), KM_PTE1) +		\
>  	 pte_index((address)))
> -#define pte_unmap(pte) kunmap_atomic((pte), KM_PTE0)
> +#define pte_unmap(pte) kunmap_atomic((pte), __KM_PTE)
>  #define pte_unmap_nested(pte) kunmap_atomic((pte), KM_PTE1)
>  #else
>  #define pte_offset_map(dir, address)					\

I just realized this has a kmap_atomic bug in... 

The below would fix it, but that's getting rather ugly :-/,
alternatively I would have to introduce something like
pte_offset_map_irq() which would make the irq/nmi detection and leave
the regular code paths alone, however that would mean either duplicating
the gup_fast() pagewalk or passing down a pte function pointer, which
would only duplicate the gup_pte_range() bit, neither is really
attractive...

Index: linux-2.6/arch/x86/include/asm/kmap_types.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/kmap_types.h
+++ linux-2.6/arch/x86/include/asm/kmap_types.h
@@ -19,11 +19,12 @@ D(7)	KM_PTE0,
 D(8)	KM_PTE1,
 D(9)	KM_IRQ0,
 D(10)	KM_IRQ1,
-D(11)	KM_SOFTIRQ0,
-D(12)	KM_SOFTIRQ1,
-D(13)	KM_NMI,
-D(14)	KM_NMI_PTE,
-D(15)	KM_TYPE_NR
+D(11)	KM_IRQ_PTE,
+D(12)	KM_SOFTIRQ0,
+D(13)	KM_SOFTIRQ1,
+D(14)	KM_NMI,
+D(15)	KM_NMI_PTE,
+D(16)	KM_TYPE_NR
 };
 
 #undef D
Index: linux-2.6/arch/x86/include/asm/pgtable_32.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/pgtable_32.h
+++ linux-2.6/arch/x86/include/asm/pgtable_32.h
@@ -49,7 +49,10 @@ extern void set_pmd_pfn(unsigned long, u
 #endif
 
 #if defined(CONFIG_HIGHPTE)
-#define __KM_PTE	(in_nmi() ? KM_NMI_PTE : KM_PTE0)
+#define __KM_PTE			\
+	(in_nmi() ? KM_NMI_PTE : 	\
+	 in_irq() ? KM_IRQ_PTE :	\
+	 KM_PTE0)
 #define pte_offset_map(dir, address)					\
 	((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), __KM_PTE) +		\
 	 pte_index((address)))




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-15 14:46 ` [tip:perfcounters/core] x86: Add NMI types for kmap_atomic Peter Zijlstra
@ 2009-06-15 15:30   ` Hugh Dickins
  2009-06-15 15:41     ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Hugh Dickins @ 2009-06-15 15:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx,
	mingo, linux-tip-commits, hugh.dickins

On Mon, 15 Jun 2009, Peter Zijlstra wrote:
> 
> The below would fix it, but that's getting rather ugly :-/,
> alternatively I would have to introduce something like
> pte_offset_map_irq() which would make the irq/nmi detection and leave
> the regular code paths alone, however that would mean either duplicating
> the gup_fast() pagewalk or passing down a pte function pointer, which
> would only duplicate the gup_pte_range() bit, neither is really
> attractive...

> Index: linux-2.6/arch/x86/include/asm/pgtable_32.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/pgtable_32.h
> +++ linux-2.6/arch/x86/include/asm/pgtable_32.h
> @@ -49,7 +49,10 @@ extern void set_pmd_pfn(unsigned long, u
>  #endif
>  
>  #if defined(CONFIG_HIGHPTE)
> -#define __KM_PTE	(in_nmi() ? KM_NMI_PTE : KM_PTE0)
> +#define __KM_PTE			\
> +	(in_nmi() ? KM_NMI_PTE : 	\
> +	 in_irq() ? KM_IRQ_PTE :	\
> +	 KM_PTE0)
>  #define pte_offset_map(dir, address)					\
>  	((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), __KM_PTE) +		\
>  	 pte_index((address)))

Yes, that does look ugly!

I've not been following the background to this, but I've often
wondered if a kmap_push() and kmap_pop() could be useful,
allowing you to reuse the slot in between - any use here?

Hugh

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-15 15:30   ` Hugh Dickins
@ 2009-06-15 15:41     ` Peter Zijlstra
  2009-06-15 15:52       ` Ingo Molnar
  2009-06-15 18:04       ` Peter Zijlstra
  0 siblings, 2 replies; 23+ messages in thread
From: Peter Zijlstra @ 2009-06-15 15:41 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx,
	mingo, linux-tip-commits

On Mon, 2009-06-15 at 16:30 +0100, Hugh Dickins wrote:
> On Mon, 15 Jun 2009, Peter Zijlstra wrote:
> > 
> > The below would fix it, but that's getting rather ugly :-/,
> > alternatively I would have to introduce something like
> > pte_offset_map_irq() which would make the irq/nmi detection and leave
> > the regular code paths alone, however that would mean either duplicating
> > the gup_fast() pagewalk or passing down a pte function pointer, which
> > would only duplicate the gup_pte_range() bit, neither is really
> > attractive...
> 
> > Index: linux-2.6/arch/x86/include/asm/pgtable_32.h
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/include/asm/pgtable_32.h
> > +++ linux-2.6/arch/x86/include/asm/pgtable_32.h
> > @@ -49,7 +49,10 @@ extern void set_pmd_pfn(unsigned long, u
> >  #endif
> >  
> >  #if defined(CONFIG_HIGHPTE)
> > -#define __KM_PTE	(in_nmi() ? KM_NMI_PTE : KM_PTE0)
> > +#define __KM_PTE			\
> > +	(in_nmi() ? KM_NMI_PTE : 	\
> > +	 in_irq() ? KM_IRQ_PTE :	\
> > +	 KM_PTE0)
> >  #define pte_offset_map(dir, address)					\
> >  	((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), __KM_PTE) +		\
> >  	 pte_index((address)))
> 
> Yes, that does look ugly!
> 
> I've not been following the background to this, 

We need/want to do a user-space stack walk from IRQ/NMI context. The NMI
bit means we cannot simply use __copy_from_user_inatomic() since that
will still fault (albeit not page), and the fault return path invokes
IRET which will terminate the NMI context.

Therefore I wrote a copy_from_user_nmi() variant that is based of of
__get_user_pages_fast() (a variant that doesn't fall back to the regular
GUP), but that means we get 2 kmap_atomic()s, one for HIGHPTE and one
for the user page.

So this introduces the pte map from IRQ context and one from NMI
context.

> but I've often
> wondered if a kmap_push() and kmap_pop() could be useful,
> allowing you to reuse the slot in between - any use here?

Yes, that would be much nicer, although less we would loose some of the
type validation that lives in -mm, (along with a massive overhaul of the
current kmap_atomic usage).

Hmm, if we give each explicit type an level and ensure the new push()'ed
type's level <= the previous one, we'd still have the full nesting
validation and such..

I'll look into doing this.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-15 15:41     ` Peter Zijlstra
@ 2009-06-15 15:52       ` Ingo Molnar
  2009-06-15 16:02         ` Hugh Dickins
  2009-06-15 18:04       ` Peter Zijlstra
  1 sibling, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2009-06-15 15:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, linux-tip-commits, mathieu.desnoyers, torvalds,
	Jeremy Fitzhardinge


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> > I've not been following the background to this,
> 
> We need/want to do a user-space stack walk from IRQ/NMI context. 
> The NMI bit means we cannot simply use __copy_from_user_inatomic() 
> since that will still fault (albeit not page), and the fault 
> return path invokes IRET which will terminate the NMI context.

The goal is to allow 'perf' (see tools/perf/) non-flat 
categorizations like the sample output in the (pending) commit (see 
it attached further below). Here's the kind of output it allows:

 $ perf record -g -m 512 -f -- make -j32 kernel
 $ perf report -s s --syscalls | grep '\[k\]' | grep -v nmi

     4.14%  [k] do_page_fault
     1.20%  [k] sys_write
     1.10%  [k] sys_open
     0.63%  [k] sys_exit_group
     0.48%  [k] smp_apic_timer_interrupt
     0.37%  [k] sys_read
     0.37%  [k] sys_execve
     0.20%  [k] sys_mmap
     0.18%  [k] sys_close
     0.14%  [k] sys_munmap
     0.13%  [k] sys_poll

Note that Oprofile uses the same method of __copy_user_inatomic() in 
arch/x86/oprofile/backtrace.c, but i believe that code is broken - i 
doubt the call-chain support for user-space stacks ever worked in 
oprofile - with perfcounters i can make this method crash under 
load. (we re-enter the NMI which due to IST executes over the exact 
same, still pending NMI frame. Kaboom.)

I saw you being involved with the Oprofile code 3 years ago:

| commit c34d1b4d165c67b966bca4aba026443d7ff161eb
| Author: Hugh Dickins <hugh@veritas.com>
| Date:   Sat Oct 29 18:16:32 2005 -0700
|
|    [PATCH] mm: kill check_user_page_readable

That method of __copy_user_inatomic(), while elegant, is subtly 
wrong in an NMI context. We really must avoid taking faults there.

	Ingo

------------>
>From 3dfabc74c65904c9e6cf952391312d16ea772ef5 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Mon, 15 Jun 2009 11:24:38 +0200
Subject: [PATCH] perf report: Add per system call overhead histogram

Take advantage of call-graph percounter sampling/recording to
display a non-trivial histogram: the true, collapsed/summarized
cost measurement, on a per system call total overhead basis:

 aldebaran:~/linux/linux/tools/perf> ./perf record -g -a -f ~/hackbench 10
 aldebaran:~/linux/linux/tools/perf> ./perf report -s symbol --syscalls | head -10
 #
 # (3536 samples)
 #
 # Overhead  Symbol
 # ........  ......
 #
     40.75%  [k] sys_write
     40.21%  [k] sys_read
      4.44%  [k] do_nmi
 ...

This is done by accounting each (reliable) call-chain that chains back
to a given system call to that system call function.

[ So in the above example we can see that hackbench spends about 40% of
  its total time somewhere in sys_write() and 40% somewhere in
  sys_read(), the rest of the time is spent in user-space. The time
  is not spent in sys_write() _itself_ but in one of its many child
  functions. ]

Or, a recording of a (source files are already in the page-cache) kernel build:

 $ perf record -g -m 512 -f -- make -j32 kernel
 $ perf report -s s --syscalls | grep '\[k\]' | grep -v nmi

     4.14%  [k] do_page_fault
     1.20%  [k] sys_write
     1.10%  [k] sys_open
     0.63%  [k] sys_exit_group
     0.48%  [k] smp_apic_timer_interrupt
     0.37%  [k] sys_read
     0.37%  [k] sys_execve
     0.20%  [k] sys_mmap
     0.18%  [k] sys_close
     0.14%  [k] sys_munmap
     0.13%  [k] sys_poll
     0.09%  [k] sys_newstat
     0.07%  [k] sys_clone
     0.06%  [k] sys_newfstat
     0.05%  [k] sys_access
     0.05%  [k] schedule

Shows the true total cost of each syscall variant that gets used
during a kernel build. This profile reveals it that pagefaults are
the costliest, followed by read()/write().

An interesting detail: timer interrupts cost 0.5% - or 0.5 seconds
per 100 seconds of kernel build-time. (this was done with HZ=1000)

The summary is done in 'perf report', i.e. in the post-processing
stage - so once we have a good call-graph recording, this type of
non-trivial high-level analysis becomes possible.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 tools/perf/builtin-report.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index aebba56..1e2f5dd 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -40,6 +40,7 @@ static int		dump_trace = 0;
 
 static int		verbose;
 static int		full_paths;
+static int		collapse_syscalls;
 
 static unsigned long	page_size;
 static unsigned long	mmap_window = 32;
@@ -983,6 +984,15 @@ process_overflow_event(event_t *event, unsigned long offset, unsigned long head)
 			for (i = 0; i < chain->nr; i++)
 				dprintf("..... %2d: %p\n", i, (void *)chain->ips[i]);
 		}
+		if (collapse_syscalls) {
+			/*
+			 * Find the all-but-last kernel entry
+			 * amongst the call-chains - to get
+			 * to the level of system calls:
+			 */
+			if (chain->kernel >= 2)
+				ip = chain->ips[chain->kernel-2];
+		}
 	}
 
 	dprintf(" ... thread: %s:%d\n", thread->comm, thread->pid);
@@ -1343,6 +1353,8 @@ static const struct option options[] = {
 		   "sort by key(s): pid, comm, dso, symbol. Default: pid,symbol"),
 	OPT_BOOLEAN('P', "full-paths", &full_paths,
 		    "Don't shorten the pathnames taking into account the cwd"),
+	OPT_BOOLEAN('S', "syscalls", &collapse_syscalls,
+		    "show per syscall summary overhead, using call graph"),
 	OPT_END()
 };
 

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-15 15:52       ` Ingo Molnar
@ 2009-06-15 16:02         ` Hugh Dickins
  0 siblings, 0 replies; 23+ messages in thread
From: Hugh Dickins @ 2009-06-15 16:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, linux-tip-commits, mathieu.desnoyers, torvalds,
	Jeremy Fitzhardinge

On Mon, 15 Jun 2009, Ingo Molnar wrote:
> 
> Note that Oprofile uses the same method of __copy_user_inatomic() in 
> arch/x86/oprofile/backtrace.c, but i believe that code is broken - i 
> doubt the call-chain support for user-space stacks ever worked in 
> oprofile - with perfcounters i can make this method crash under 
> load. (we re-enter the NMI which due to IST executes over the exact 
> same, still pending NMI frame. Kaboom.)
> 
> I saw you being involved with the Oprofile code 3 years ago:
> 
> | commit c34d1b4d165c67b966bca4aba026443d7ff161eb
> | Author: Hugh Dickins <hugh@veritas.com>
> | Date:   Sat Oct 29 18:16:32 2005 -0700
> |
> |    [PATCH] mm: kill check_user_page_readable
> 
> That method of __copy_user_inatomic(), while elegant, is subtly 
> wrong in an NMI context. We really must avoid taking faults there.

Yes, I'm afraid that subtlety escaped me - thanks for explaining.

Hugh

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-15 15:41     ` Peter Zijlstra
  2009-06-15 15:52       ` Ingo Molnar
@ 2009-06-15 18:04       ` Peter Zijlstra
  2009-06-15 18:15         ` Ingo Molnar
  2009-06-15 18:42         ` Andrew Morton
  1 sibling, 2 replies; 23+ messages in thread
From: Peter Zijlstra @ 2009-06-15 18:04 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-kernel, mingo, hpa, paulus, acme, efault, npiggin, tglx,
	mingo, linux-tip-commits, Linus Torvalds, Andrew Morton

On Mon, 2009-06-15 at 17:41 +0200, Peter Zijlstra wrote:
> On Mon, 2009-06-15 at 16:30 +0100, Hugh Dickins wrote:
> > On Mon, 15 Jun 2009, Peter Zijlstra wrote:
> > > 
> > > The below would fix it, but that's getting rather ugly :-/,
> > > alternatively I would have to introduce something like
> > > pte_offset_map_irq() which would make the irq/nmi detection and leave
> > > the regular code paths alone, however that would mean either duplicating
> > > the gup_fast() pagewalk or passing down a pte function pointer, which
> > > would only duplicate the gup_pte_range() bit, neither is really
> > > attractive...
> > 
> > > Index: linux-2.6/arch/x86/include/asm/pgtable_32.h
> > > ===================================================================
> > > --- linux-2.6.orig/arch/x86/include/asm/pgtable_32.h
> > > +++ linux-2.6/arch/x86/include/asm/pgtable_32.h
> > > @@ -49,7 +49,10 @@ extern void set_pmd_pfn(unsigned long, u
> > >  #endif
> > >  
> > >  #if defined(CONFIG_HIGHPTE)
> > > -#define __KM_PTE	(in_nmi() ? KM_NMI_PTE : KM_PTE0)
> > > +#define __KM_PTE			\
> > > +	(in_nmi() ? KM_NMI_PTE : 	\
> > > +	 in_irq() ? KM_IRQ_PTE :	\
> > > +	 KM_PTE0)
> > >  #define pte_offset_map(dir, address)					\
> > >  	((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), __KM_PTE) +		\
> > >  	 pte_index((address)))
> > 
> > Yes, that does look ugly!
> > 
> > I've not been following the background to this, 
> 
> We need/want to do a user-space stack walk from IRQ/NMI context. The NMI
> bit means we cannot simply use __copy_from_user_inatomic() since that
> will still fault (albeit not page), and the fault return path invokes
> IRET which will terminate the NMI context.
> 
> Therefore I wrote a copy_from_user_nmi() variant that is based of of
> __get_user_pages_fast() (a variant that doesn't fall back to the regular
> GUP), but that means we get 2 kmap_atomic()s, one for HIGHPTE and one
> for the user page.
> 
> So this introduces the pte map from IRQ context and one from NMI
> context.
> 
> > but I've often
> > wondered if a kmap_push() and kmap_pop() could be useful,
> > allowing you to reuse the slot in between - any use here?
> 
> Yes, that would be much nicer, although less we would loose some of the
> type validation that lives in -mm, (along with a massive overhaul of the
> current kmap_atomic usage).
> 
> Hmm, if we give each explicit type an level and ensure the new push()'ed
> type's level <= the previous one, we'd still have the full nesting
> validation and such..
> 
> I'll look into doing this.

OK, utterly untested, but how does this look?

Not-Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/arch/x86/include/asm/kmap_types.h b/arch/x86/include/asm/kmap_types.h
index ff00a44..f866138 100644
--- a/arch/x86/include/asm/kmap_types.h
+++ b/arch/x86/include/asm/kmap_types.h
@@ -19,11 +19,12 @@ D(7)	KM_PTE0,
 D(8)	KM_PTE1,
 D(9)	KM_IRQ0,
 D(10)	KM_IRQ1,
-D(11)	KM_SOFTIRQ0,
-D(12)	KM_SOFTIRQ1,
-D(13)	KM_NMI,
-D(14)	KM_NMI_PTE,
-D(15)	KM_TYPE_NR
+D(11)	KM_IRQ_PTE,
+D(12)	KM_SOFTIRQ0,
+D(13)	KM_SOFTIRQ1,
+D(14)	KM_NMI,
+D(15)	KM_NMI_PTE,
+D(16)	KM_TYPE_NR
 };
 
 #undef D
diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index 8546497..d31930a 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -49,7 +49,20 @@ extern void set_pmd_pfn(unsigned long, unsigned long, pgprot_t);
 #endif
 
 #if defined(CONFIG_HIGHPTE)
-#define __KM_PTE	(in_nmi() ? KM_NMI_PTE : KM_PTE0)
+
+#ifdef CONFIG_DEBUG_VM
+/*
+ * for debug we need the right type so that we can validate context
+ * nesting
+ */
+#define __KM_PTE			\
+	(in_nmi() ? KM_NMI_PTE :	\
+	 in_irq() ? KM_IRQ_PTE :	\
+	 KM_PTE0)
+#else
+#define __KM_PTE	KM_PTE0
+#endif
+
 #define pte_offset_map(dir, address)					\
 	((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), __KM_PTE) +		\
 	 pte_index((address)))
diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index 58f621e..62d15f7 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -19,6 +19,147 @@ void kunmap(struct page *page)
 	kunmap_high(page);
 }
 
+struct kmap_atomic_state {
+	int slot;
+#ifdef CONFIG_DEBUG_VM
+	int taken[KM_TYPE_NR];
+	int context;
+#endif
+};
+
+#ifdef CONFIG_DEBUG_VM
+enum km_context {
+	KM_CTX_USER,
+	KM_CTX_SOFTIRQ,
+	KM_CTX_IRQ,
+	KM_CTX_NMI,
+
+	KM_CTX_MAX
+};
+
+static int kmap_type_to_context(enum km_type type)
+{
+	switch (type) {
+	case KM_BOUNCE_READ:
+		return KM_CTX_USER;
+	case KM_SKB_SUNRPC_DATA:
+		return KM_CTX_USER;
+	case KM_SKB_DATA_SOFTIRQ:
+		return KM_CTX_SOFTIRQ;
+	case KM_USER0:
+		return KM_CTX_USER;
+	case KM_USER1:
+		return KM_CTX_USER;
+	case KM_BIO_SRC_IRQ:
+		return KM_CTX_IRQ;
+	case KM_BIO_DST_IRQ:
+		return KM_CTX_IRQ;
+	case KM_PTE0:
+		return KM_CTX_USER;
+	case KM_PTE1:
+		return KM_CTX_USER;
+	case KM_IRQ0:
+		return KM_CTX_IRQ;
+	case KM_IRQ1:
+		return KM_CTX_IRQ;
+	case KM_SOFTIRQ0:
+		return KM_CTX_SOFTIRQ;
+	case KM_SOFTIRQ1:
+		return KM_CTX_SOFTIRQ;
+	case KM_NMI:
+		return KM_CTX_NMI;
+	case KM_NMI_PTE:
+		return KM_CTX_NMI;
+	}
+
+	return KM_CTX_MAX;
+}
+
+static void
+kmap_atomic_push_debug(struct kmap_atomic_state *state, enum km_type type)
+{
+	int context = kmap_type_to_context(type);
+	int i;
+
+	for (i = 0; i < state->slot; i++)
+		WARN_ON_ONCE(state->taken[i] == type);
+
+	WARN_ON_ONCE(state->context > context);
+
+	if (context > state->context)
+		state->context = context;
+
+	switch (context) {
+	case KM_CTX_USER:
+		WARN_ON_ONCE(in_irq());
+		WARN_ON_ONCE(in_nmi());
+		break;
+
+	case KM_CTX_SOFTIRQ:
+		WARN_ON_ONCE(in_irq());
+		WARN_ON_ONCE(in_nmi());
+		break;
+
+	case KM_CTX_IRQ:
+		WARN_ON_ONCE(in_nmi());
+		break;
+
+	case KM_CTX_NMI:
+		break;
+
+	case KM_CTX_MAX:
+		WARN_ON_ONCE(1);
+		break;
+	}
+}
+
+static void
+kmap_atomic_pop_debug(struct kmap_atomic_state *state, enum km_type type)
+{
+	WARN_ON_ONCE(state->taken[state->slot] != type);
+
+	if (!state->slot) {
+		state->context = KM_CTX_USER;
+		return;
+	}
+
+	state->context = kmap_type_to_context(state->taken[state->slot - 1]);
+}
+
+#else /* !CONFIG_DEBUG_VM */
+static inline void
+kmap_atomic_push_debug(struct kmap_atomic_state *state, enum km_type type)
+{
+}
+
+static inline void
+kmap_atomic_pop_debug(struct kmap_atomic_state *state, enum km_type type)
+{
+}
+#endif
+
+static DEFINE_PER_CPU(struct kmap_atomic_state, kmap_state);
+
+static int kmap_atomic_push(enum km_type type)
+{
+	struct kmap_atomic_state *state = &per_cpu(kmap_state);
+
+	kmap_atomic_push_debug(state, type);
+
+	return state->slot++;
+}
+
+static int kmap_atomic_pop(enum km_type type)
+{
+	struct kmap_atomic_state *state = &per_cpu(kmap_state);
+
+	state->slot--;
+
+	kmap_atomic_pop_debug(state, type);
+
+	return state->slot;
+}
+
 /*
  * kmap_atomic/kunmap_atomic is significantly faster than kmap/kunmap because
  * no global lock is needed and because the kmap code must perform a global TLB
@@ -38,9 +179,7 @@ void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot)
 	if (!PageHighMem(page))
 		return page_address(page);
 
-	debug_kmap_atomic(type);
-
-	idx = type + KM_TYPE_NR*smp_processor_id();
+	idx = KM_TYPE_NR*smp_processor_id() + kmap_atomic_push(type);
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
 	BUG_ON(!pte_none(*(kmap_pte-idx)));
 	set_pte(kmap_pte-idx, mk_pte(page, prot));
@@ -56,8 +195,9 @@ void *kmap_atomic(struct page *page, enum km_type type)
 void kunmap_atomic(void *kvaddr, enum km_type type)
 {
 	unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-	enum fixed_addresses idx = type + KM_TYPE_NR*smp_processor_id();
+	enum fixed_addresses idx = KM_TYPE_NR*smp_processor_id();
 
+	idx += kmap_atomic_pop(type);
 	/*
 	 * Force other mappings to Oops if they'll try to access this pte
 	 * without first remap it.  Keeping stale mappings around is a bad idea
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 1fcb712..92123b9 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -21,18 +21,6 @@ static inline void flush_kernel_dcache_page(struct page *page)
 
 #include <asm/kmap_types.h>
 
-#if defined(CONFIG_DEBUG_HIGHMEM) && defined(CONFIG_TRACE_IRQFLAGS_SUPPORT)
-
-void debug_kmap_atomic(enum km_type type);
-
-#else
-
-static inline void debug_kmap_atomic(enum km_type type)
-{
-}
-
-#endif
-
 #ifdef CONFIG_HIGHMEM
 #include <asm/highmem.h>
 
diff --git a/mm/highmem.c b/mm/highmem.c
index 68eb1d9..9101980 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -422,48 +422,3 @@ void __init page_address_init(void)
 }
 
 #endif	/* defined(CONFIG_HIGHMEM) && !defined(WANT_PAGE_VIRTUAL) */
-
-#if defined(CONFIG_DEBUG_HIGHMEM) && defined(CONFIG_TRACE_IRQFLAGS_SUPPORT)
-
-void debug_kmap_atomic(enum km_type type)
-{
-	static unsigned warn_count = 10;
-
-	if (unlikely(warn_count == 0))
-		return;
-
-	if (unlikely(in_interrupt())) {
-		if (in_irq()) {
-			if (type != KM_IRQ0 && type != KM_IRQ1 &&
-			    type != KM_BIO_SRC_IRQ && type != KM_BIO_DST_IRQ &&
-			    type != KM_BOUNCE_READ) {
-				WARN_ON(1);
-				warn_count--;
-			}
-		} else if (!irqs_disabled()) {	/* softirq */
-			if (type != KM_IRQ0 && type != KM_IRQ1 &&
-			    type != KM_SOFTIRQ0 && type != KM_SOFTIRQ1 &&
-			    type != KM_SKB_SUNRPC_DATA &&
-			    type != KM_SKB_DATA_SOFTIRQ &&
-			    type != KM_BOUNCE_READ) {
-				WARN_ON(1);
-				warn_count--;
-			}
-		}
-	}
-
-	if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ ||
-			type == KM_BIO_SRC_IRQ || type == KM_BIO_DST_IRQ) {
-		if (!irqs_disabled()) {
-			WARN_ON(1);
-			warn_count--;
-		}
-	} else if (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1) {
-		if (irq_count() == 0 && !irqs_disabled()) {
-			WARN_ON(1);
-			warn_count--;
-		}
-	}
-}
-
-#endif


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-15 18:04       ` Peter Zijlstra
@ 2009-06-15 18:15         ` Ingo Molnar
  2009-06-15 18:19           ` Peter Zijlstra
  2009-06-15 18:42         ` Andrew Morton
  1 sibling, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2009-06-15 18:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton


* Peter Zijlstra <peterz@infradead.org> wrote:

> +static int kmap_type_to_context(enum km_type type)
> +{
> +	switch (type) {
> +	case KM_BOUNCE_READ:
> +		return KM_CTX_USER;
> +	case KM_SKB_SUNRPC_DATA:
> +		return KM_CTX_USER;
> +	case KM_SKB_DATA_SOFTIRQ:
> +		return KM_CTX_SOFTIRQ;
> +	case KM_USER0:
> +		return KM_CTX_USER;
> +	case KM_USER1:
> +		return KM_CTX_USER;
> +	case KM_BIO_SRC_IRQ:
> +		return KM_CTX_IRQ;
> +	case KM_BIO_DST_IRQ:
> +		return KM_CTX_IRQ;
> +	case KM_PTE0:
> +		return KM_CTX_USER;
> +	case KM_PTE1:
> +		return KM_CTX_USER;
> +	case KM_IRQ0:
> +		return KM_CTX_IRQ;
> +	case KM_IRQ1:
> +		return KM_CTX_IRQ;
> +	case KM_SOFTIRQ0:
> +		return KM_CTX_SOFTIRQ;
> +	case KM_SOFTIRQ1:
> +		return KM_CTX_SOFTIRQ;
> +	case KM_NMI:
> +		return KM_CTX_NMI;
> +	case KM_NMI_PTE:
> +		return KM_CTX_NMI;
> +	}
> +
> +	return KM_CTX_MAX;

why not do a very simple stack of atomic kmaps, like Hugh suggested?

That would mean a much simpler interface:

	kaddr = kmap_atomic(page);

no index needed. The kunmap pops the entry off the stack:

	kunmap_atomic(kaddr);

This becomes simpler too.

Now, a stack can be overflown by imbalance - but that's easy to 
detect and existing entries are easily printed and thus the source 
of the leak is easily identified.

In my book this design beats the current enumeration of kmap types 
indices hands down ... It would likely be much more robust as well, 
and much more easy to extend.

Am i missing any subtlety?

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-15 18:15         ` Ingo Molnar
@ 2009-06-15 18:19           ` Peter Zijlstra
  2009-06-15 18:25             ` Ingo Molnar
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2009-06-15 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton

On Mon, 2009-06-15 at 20:15 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > +static int kmap_type_to_context(enum km_type type)
> > +{
> > +	switch (type) {
> > +	case KM_BOUNCE_READ:
> > +		return KM_CTX_USER;
> > +	case KM_SKB_SUNRPC_DATA:
> > +		return KM_CTX_USER;
> > +	case KM_SKB_DATA_SOFTIRQ:
> > +		return KM_CTX_SOFTIRQ;
> > +	case KM_USER0:
> > +		return KM_CTX_USER;
> > +	case KM_USER1:
> > +		return KM_CTX_USER;
> > +	case KM_BIO_SRC_IRQ:
> > +		return KM_CTX_IRQ;
> > +	case KM_BIO_DST_IRQ:
> > +		return KM_CTX_IRQ;
> > +	case KM_PTE0:
> > +		return KM_CTX_USER;
> > +	case KM_PTE1:
> > +		return KM_CTX_USER;
> > +	case KM_IRQ0:
> > +		return KM_CTX_IRQ;
> > +	case KM_IRQ1:
> > +		return KM_CTX_IRQ;
> > +	case KM_SOFTIRQ0:
> > +		return KM_CTX_SOFTIRQ;
> > +	case KM_SOFTIRQ1:
> > +		return KM_CTX_SOFTIRQ;
> > +	case KM_NMI:
> > +		return KM_CTX_NMI;
> > +	case KM_NMI_PTE:
> > +		return KM_CTX_NMI;
> > +	}
> > +
> > +	return KM_CTX_MAX;
> 
> why not do a very simple stack of atomic kmaps, like Hugh suggested?
> 
> That would mean a much simpler interface:
> 
> 	kaddr = kmap_atomic(page);
> 
> no index needed. The kunmap pops the entry off the stack:
> 
> 	kunmap_atomic(kaddr);
> 
> This becomes simpler too.
> 
> Now, a stack can be overflown by imbalance - but that's easy to 
> detect and existing entries are easily printed and thus the source 
> of the leak is easily identified.
> 
> In my book this design beats the current enumeration of kmap types 
> indices hands down ... It would likely be much more robust as well, 
> and much more easy to extend.
> 
> Am i missing any subtlety?

The above is mostly debug code used to validate the kmap_atomic
conditions.

KM_CTX_NMI nests in KM_CTX_IRQ nests in KM_CTX_SOFTIRQ nests in
KM_CTX_USER.

And validate that we indeed are in the context specified by the type.
That is, it will warn if we use KM_IRQ1 with KM_CTX_IRQ from user
context.

Some of this was already captured in the old kmap debug code which I
removed.

But yes, I should write that nicer..

/me goes clean up

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-15 18:19           ` Peter Zijlstra
@ 2009-06-15 18:25             ` Ingo Molnar
  2009-06-15 18:30               ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2009-06-15 18:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, 2009-06-15 at 20:15 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > +static int kmap_type_to_context(enum km_type type)
> > > +{
> > > +	switch (type) {
> > > +	case KM_BOUNCE_READ:
> > > +		return KM_CTX_USER;
> > > +	case KM_SKB_SUNRPC_DATA:
> > > +		return KM_CTX_USER;
> > > +	case KM_SKB_DATA_SOFTIRQ:
> > > +		return KM_CTX_SOFTIRQ;
> > > +	case KM_USER0:
> > > +		return KM_CTX_USER;
> > > +	case KM_USER1:
> > > +		return KM_CTX_USER;
> > > +	case KM_BIO_SRC_IRQ:
> > > +		return KM_CTX_IRQ;
> > > +	case KM_BIO_DST_IRQ:
> > > +		return KM_CTX_IRQ;
> > > +	case KM_PTE0:
> > > +		return KM_CTX_USER;
> > > +	case KM_PTE1:
> > > +		return KM_CTX_USER;
> > > +	case KM_IRQ0:
> > > +		return KM_CTX_IRQ;
> > > +	case KM_IRQ1:
> > > +		return KM_CTX_IRQ;
> > > +	case KM_SOFTIRQ0:
> > > +		return KM_CTX_SOFTIRQ;
> > > +	case KM_SOFTIRQ1:
> > > +		return KM_CTX_SOFTIRQ;
> > > +	case KM_NMI:
> > > +		return KM_CTX_NMI;
> > > +	case KM_NMI_PTE:
> > > +		return KM_CTX_NMI;
> > > +	}
> > > +
> > > +	return KM_CTX_MAX;
> > 
> > why not do a very simple stack of atomic kmaps, like Hugh suggested?
> > 
> > That would mean a much simpler interface:
> > 
> > 	kaddr = kmap_atomic(page);
> > 
> > no index needed. The kunmap pops the entry off the stack:
> > 
> > 	kunmap_atomic(kaddr);
> > 
> > This becomes simpler too.
> > 
> > Now, a stack can be overflown by imbalance - but that's easy to 
> > detect and existing entries are easily printed and thus the source 
> > of the leak is easily identified.
> > 
> > In my book this design beats the current enumeration of kmap types 
> > indices hands down ... It would likely be much more robust as well, 
> > and much more easy to extend.
> > 
> > Am i missing any subtlety?
> 
> The above is mostly debug code used to validate the kmap_atomic
> conditions.
> 
> KM_CTX_NMI nests in KM_CTX_IRQ nests in KM_CTX_SOFTIRQ nests in
> KM_CTX_USER.
> 
> And validate that we indeed are in the context specified by the type.
> That is, it will warn if we use KM_IRQ1 with KM_CTX_IRQ from user
> context.
> 
> Some of this was already captured in the old kmap debug code which I
> removed.
> 
> But yes, I should write that nicer..

but ... look at the APIs i propose above. We dont need _any_ 
'types'.

That type enumeration is basically an open-coded allocator. If we do 
a _real_ allocator (a balanced stack of atomic kmaps) we dont need 
any of those indices, and all the potential for mismatch goes away 
as well - a stack nests trivially with IRQ and NMI and arbitrary 
other contexts.

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-15 18:25             ` Ingo Molnar
@ 2009-06-15 18:30               ` Peter Zijlstra
  2009-06-15 18:42                 ` Ingo Molnar
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2009-06-15 18:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton

On Mon, 2009-06-15 at 20:25 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:

> but ... look at the APIs i propose above. We dont need _any_ 
> 'types'.
> 
> That type enumeration is basically an open-coded allocator. If we do 
> a _real_ allocator (a balanced stack of atomic kmaps) we dont need 
> any of those indices, and all the potential for mismatch goes away 
> as well - a stack nests trivially with IRQ and NMI and arbitrary 
> other contexts.

You want types because:
 - they encode the intent, and can be verified
 - they help keep track of the max nesting depth

In the proposed implementation all type code basically falls away no !
CONFIG_DEBUG_VM, but is kept around for robustness.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-15 18:30               ` Peter Zijlstra
@ 2009-06-15 18:42                 ` Ingo Molnar
  2009-06-15 18:47                   ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2009-06-15 18:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, 2009-06-15 at 20:25 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > but ... look at the APIs i propose above. We dont need _any_ 
> > 'types'.
> > 
> > That type enumeration is basically an open-coded allocator. If we do 
> > a _real_ allocator (a balanced stack of atomic kmaps) we dont need 
> > any of those indices, and all the potential for mismatch goes away 
> > as well - a stack nests trivially with IRQ and NMI and arbitrary 
> > other contexts.
> 
> You want types because:
>  - they encode the intent, and can be verified
>  - they help keep track of the max nesting depth
> 
> In the proposed implementation all type code basically falls away 
> no ! CONFIG_DEBUG_VM, but is kept around for robustness.

But much of the fragility of the types (and their clumsiness - for 
example in highpte ops we have to know at which level of the 
pagetables we are, and use the right kind of index) is _precisely_ 
because we have the types ...

Unbalanced unmaps will be detected under CONFIG_DEBUG_VM: kunmap 
uses 'page' as a parameter which is checked against the pte entry - 
they must match.

I.e. it becomes a symmetric and expressive API:

	kaddr = kmap_atomic(page);
	...
	kunmap_atomic(page);

Hm?

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-15 18:04       ` Peter Zijlstra
  2009-06-15 18:15         ` Ingo Molnar
@ 2009-06-15 18:42         ` Andrew Morton
  2009-06-15 18:45           ` Peter Zijlstra
  1 sibling, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2009-06-15 18:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: hugh.dickins, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, mingo, linux-tip-commits, torvalds, Randy Dunlap

On Mon, 15 Jun 2009 20:04:25 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> OK, utterly untested, but how does this look?

 arch/x86/include/asm/kmap_types.h |   11 +-
 arch/x86/include/asm/pgtable_32.h |   15 +++
 arch/x86/mm/highmem_32.c          |  148 ++++++++++++++++++++++++++++++++++++--
 include/linux/highmem.h           |   12 ---
 mm/highmem.c                      |   45 -----------
 5 files changed, 164 insertions(+), 67 deletions(-)

Did
http://userweb.kernel.org/~akpm/mmotm/broken-out/kmap_types-make-most-arches-use-generic-header-file.patch
just die a horrid death?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-15 18:42         ` Andrew Morton
@ 2009-06-15 18:45           ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2009-06-15 18:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: hugh.dickins, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, mingo, linux-tip-commits, torvalds, Randy Dunlap

On Mon, 2009-06-15 at 11:42 -0700, Andrew Morton wrote:
> On Mon, 15 Jun 2009 20:04:25 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > OK, utterly untested, but how does this look?
> 
>  arch/x86/include/asm/kmap_types.h |   11 +-
>  arch/x86/include/asm/pgtable_32.h |   15 +++
>  arch/x86/mm/highmem_32.c          |  148 ++++++++++++++++++++++++++++++++++++--
>  include/linux/highmem.h           |   12 ---
>  mm/highmem.c                      |   45 -----------
>  5 files changed, 164 insertions(+), 67 deletions(-)
> 
> Did
> http://userweb.kernel.org/~akpm/mmotm/broken-out/kmap_types-make-most-arches-use-generic-header-file.patch
> just die a horrid death?

hehe, yep, but I can fix that up..


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-15 18:42                 ` Ingo Molnar
@ 2009-06-15 18:47                   ` Peter Zijlstra
  2009-06-15 18:52                     ` Ingo Molnar
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2009-06-15 18:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton

On Mon, 2009-06-15 at 20:42 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Mon, 2009-06-15 at 20:25 +0200, Ingo Molnar wrote:
> > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > but ... look at the APIs i propose above. We dont need _any_ 
> > > 'types'.
> > > 
> > > That type enumeration is basically an open-coded allocator. If we do 
> > > a _real_ allocator (a balanced stack of atomic kmaps) we dont need 
> > > any of those indices, and all the potential for mismatch goes away 
> > > as well - a stack nests trivially with IRQ and NMI and arbitrary 
> > > other contexts.
> > 
> > You want types because:
> >  - they encode the intent, and can be verified
> >  - they help keep track of the max nesting depth
> > 
> > In the proposed implementation all type code basically falls away 
> > no ! CONFIG_DEBUG_VM, but is kept around for robustness.
> 
> But much of the fragility of the types (and their clumsiness - for 
> example in highpte ops we have to know at which level of the 
> pagetables we are, and use the right kind of index) is _precisely_ 
> because we have the types ...

How will you manage the max depth?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-15 18:47                   ` Peter Zijlstra
@ 2009-06-15 18:52                     ` Ingo Molnar
  2009-06-15 19:00                       ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2009-06-15 18:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, 2009-06-15 at 20:42 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Mon, 2009-06-15 at 20:25 +0200, Ingo Molnar wrote:
> > > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > 
> > > > but ... look at the APIs i propose above. We dont need _any_ 
> > > > 'types'.
> > > > 
> > > > That type enumeration is basically an open-coded allocator. If we do 
> > > > a _real_ allocator (a balanced stack of atomic kmaps) we dont need 
> > > > any of those indices, and all the potential for mismatch goes away 
> > > > as well - a stack nests trivially with IRQ and NMI and arbitrary 
> > > > other contexts.
> > > 
> > > You want types because:
> > >  - they encode the intent, and can be verified
> > >  - they help keep track of the max nesting depth
> > > 
> > > In the proposed implementation all type code basically falls away 
> > > no ! CONFIG_DEBUG_VM, but is kept around for robustness.
> > 
> > But much of the fragility of the types (and their clumsiness - for 
> > example in highpte ops we have to know at which level of the 
> > pagetables we are, and use the right kind of index) is _precisely_ 
> > because we have the types ...
> 
> How will you manage the max depth?

	if (++depth == MAX_DEPTH) {
		print_all_entries_and_nasty_warning();
		 /* hope we'll live long enough for the syslog to touch disk */
		depth = 0;
	}

unbalanced kmap is a bad bug - the easier we make it to catch, the 
better. The system wouldnt survive anyway.

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-15 18:52                     ` Ingo Molnar
@ 2009-06-15 19:00                       ` Peter Zijlstra
  2009-06-16  8:13                         ` Ingo Molnar
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2009-06-15 19:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton

On Mon, 2009-06-15 at 20:52 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Mon, 2009-06-15 at 20:42 +0200, Ingo Molnar wrote:
> > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > 
> > > > On Mon, 2009-06-15 at 20:25 +0200, Ingo Molnar wrote:
> > > > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > > 
> > > > > but ... look at the APIs i propose above. We dont need _any_ 
> > > > > 'types'.
> > > > > 
> > > > > That type enumeration is basically an open-coded allocator. If we do 
> > > > > a _real_ allocator (a balanced stack of atomic kmaps) we dont need 
> > > > > any of those indices, and all the potential for mismatch goes away 
> > > > > as well - a stack nests trivially with IRQ and NMI and arbitrary 
> > > > > other contexts.
> > > > 
> > > > You want types because:
> > > >  - they encode the intent, and can be verified
> > > >  - they help keep track of the max nesting depth
> > > > 
> > > > In the proposed implementation all type code basically falls away 
> > > > no ! CONFIG_DEBUG_VM, but is kept around for robustness.
> > > 
> > > But much of the fragility of the types (and their clumsiness - for 
> > > example in highpte ops we have to know at which level of the 
> > > pagetables we are, and use the right kind of index) is _precisely_ 
> > > because we have the types ...
> > 
> > How will you manage the max depth?
> 
> 	if (++depth == MAX_DEPTH) {
> 		print_all_entries_and_nasty_warning();
> 		 /* hope we'll live long enough for the syslog to touch disk */
> 		depth = 0;
> 	}

That will only trigger if we hit it, which will be _very_ rare.

> unbalanced kmap is a bad bug - the easier we make it to catch, the 
> better. The system wouldnt survive anyway.

My proposed patch validates strict balance of types. But I can easily
add the above as well.

By removing the types it becomes very difficult to verify the max depth.
I really don't like removing them.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-15 19:00                       ` Peter Zijlstra
@ 2009-06-16  8:13                         ` Ingo Molnar
  2009-06-16 12:38                           ` Hugh Dickins
  2009-06-17  7:44                           ` Peter Zijlstra
  0 siblings, 2 replies; 23+ messages in thread
From: Ingo Molnar @ 2009-06-16  8:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, 2009-06-15 at 20:52 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Mon, 2009-06-15 at 20:42 +0200, Ingo Molnar wrote:
> > > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > > 
> > > > > On Mon, 2009-06-15 at 20:25 +0200, Ingo Molnar wrote:
> > > > > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > > > 
> > > > > > but ... look at the APIs i propose above. We dont need _any_ 
> > > > > > 'types'.
> > > > > > 
> > > > > > That type enumeration is basically an open-coded allocator. If we do 
> > > > > > a _real_ allocator (a balanced stack of atomic kmaps) we dont need 
> > > > > > any of those indices, and all the potential for mismatch goes away 
> > > > > > as well - a stack nests trivially with IRQ and NMI and arbitrary 
> > > > > > other contexts.
> > > > > 
> > > > > You want types because:
> > > > >  - they encode the intent, and can be verified
> > > > >  - they help keep track of the max nesting depth
> > > > > 
> > > > > In the proposed implementation all type code basically falls away 
> > > > > no ! CONFIG_DEBUG_VM, but is kept around for robustness.
> > > > 
> > > > But much of the fragility of the types (and their clumsiness - for 
> > > > example in highpte ops we have to know at which level of the 
> > > > pagetables we are, and use the right kind of index) is _precisely_ 
> > > > because we have the types ...
> > > 
> > > How will you manage the max depth?
> > 
> > 	if (++depth == MAX_DEPTH) {
> > 		print_all_entries_and_nasty_warning();
> > 		 /* hope we'll live long enough for the syslog to touch disk */
> > 		depth = 0;
> > 	}
> 
> That will only trigger if we hit it, which will be _very_ rare.
> 
> > unbalanced kmap is a bad bug - the easier we make it to catch, 
> > the better. The system wouldnt survive anyway.
> 
> My proposed patch validates strict balance of types. But I can 
> easily add the above as well.
> 
> By removing the types it becomes very difficult to verify the max 
> depth. I really don't like removing them.

The fact that it implies an atomic section pretty much limits its 
depth in practice, doesnt it?

All we need to track in the debug code is 
max-{syscall,softirq,hardirq,nmi}. The sum of these 4 counts must be 
smaller than the max - even if (as you are right to point out) we 
dont hit that magic combo that truly maximizes the depth.

And note that in practice many of the current types are exclusive to 
each other - so using the stack would _reduce_ the amount of 
kmap-atomic space we need.

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-16  8:13                         ` Ingo Molnar
@ 2009-06-16 12:38                           ` Hugh Dickins
  2009-06-17  7:58                             ` Peter Zijlstra
  2009-06-17  7:44                           ` Peter Zijlstra
  1 sibling, 1 reply; 23+ messages in thread
From: Hugh Dickins @ 2009-06-16 12:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton

On Tue, 16 Jun 2009, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, 2009-06-15 at 20:52 +0200, Ingo Molnar wrote:
> > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > > On Mon, 2009-06-15 at 20:42 +0200, Ingo Molnar wrote:
> > > > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > > > > On Mon, 2009-06-15 at 20:25 +0200, Ingo Molnar wrote:
> > > > > > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > > > > 
> > > > > > > but ... look at the APIs i propose above. We dont need _any_ 
> > > > > > > 'types'.
> > > > > > > 
> > > > > > > That type enumeration is basically an open-coded allocator. If we do 
> > > > > > > a _real_ allocator (a balanced stack of atomic kmaps) we dont need 
> > > > > > > any of those indices, and all the potential for mismatch goes away 
> > > > > > > as well - a stack nests trivially with IRQ and NMI and arbitrary 
> > > > > > > other contexts.
> > > > > > 
> > > > > > You want types because:
> > > > > >  - they encode the intent, and can be verified
> > > > > >  - they help keep track of the max nesting depth
> > > > > > 
> > > > > > In the proposed implementation all type code basically falls away 
> > > > > > no ! CONFIG_DEBUG_VM, but is kept around for robustness.
> > > > > 
> > > > > But much of the fragility of the types (and their clumsiness - for 
> > > > > example in highpte ops we have to know at which level of the 
> > > > > pagetables we are, and use the right kind of index) is _precisely_ 
> > > > > because we have the types ...
> > > > 
> > > > How will you manage the max depth?
> > > 
> > > 	if (++depth == MAX_DEPTH) {
> > > 		print_all_entries_and_nasty_warning();
> > > 		 /* hope we'll live long enough for the syslog to touch disk */
> > > 		depth = 0;
> > > 	}
> > 
> > That will only trigger if we hit it, which will be _very_ rare.
> > 
> > > unbalanced kmap is a bad bug - the easier we make it to catch, 
> > > the better. The system wouldnt survive anyway.
> > 
> > My proposed patch validates strict balance of types. But I can 
> > easily add the above as well.
> > 
> > By removing the types it becomes very difficult to verify the max 
> > depth. I really don't like removing them.
> 
> The fact that it implies an atomic section pretty much limits its 
> depth in practice, doesnt it?
> 
> All we need to track in the debug code is 
> max-{syscall,softirq,hardirq,nmi}. The sum of these 4 counts must be 
> smaller than the max - even if (as you are right to point out) we 
> dont hit that magic combo that truly maximizes the depth.
> 
> And note that in practice many of the current types are exclusive to 
> each other - so using the stack would _reduce_ the amount of 
> kmap-atomic space we need.

I'll briefly resurface into the discussion before submerging again ;)

I like very much the direction you're taking this, Ingo.

Yes, that is how I've sometimes thought we should go - though when
making the kmap_push/kmap_pop suggestion to Peter yesterday, I wasn't
expecting him to make that revolution, just provide a way to save a
current KM_type mapping and restore it later, so he can safely use
the standard primitives like pte_offset_map() within.

I wasn't expecting in_nmi() and in_irq() tests still to be there,
even if only when debug.  I can understand Peter's lockdep background
wanting to retain the checking and KM_types, but if we're actually
going to overhaul this area, I'd love just to get rid of them.

Yes, that should reduce the amount of kmap_atomic space needed;
though I've not thought how we keep track of the maximum needed
as the kernel goes on developing.

There might be a very few places where we expect to kmap_atomic A,
kmap_atomic B, kunmap_atomic A, kunmap_atomic B?

Something else to throw in: what if they were not just atomic,
but also replaced the current sleeping kmaps? i.e. a task context
carries around its own stack of these.

I've always rejected that as introducing a pretty terrible overhead
just where we don't want it; but maybe you're ingenious enough to
devise ways of amortizing that cost.

It would be nice to delete mm/highmem.c is we could.  Ah, but there
are probably places where one task passes a kmap address to another?

Hugh

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-16  8:13                         ` Ingo Molnar
  2009-06-16 12:38                           ` Hugh Dickins
@ 2009-06-17  7:44                           ` Peter Zijlstra
  2009-06-17 12:28                             ` Ingo Molnar
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2009-06-17  7:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton

On Tue, 2009-06-16 at 10:13 +0200, Ingo Molnar wrote:

> > By removing the types it becomes very difficult to verify the max 
> > depth. I really don't like removing them.
> 
> The fact that it implies an atomic section pretty much limits its 
> depth in practice, doesnt it?
> 
> All we need to track in the debug code is 
> max-{syscall,softirq,hardirq,nmi}. The sum of these 4 counts must be 
> smaller than the max - even if (as you are right to point out) we 
> dont hit that magic combo that truly maximizes the depth.

Right, so the thing I'd worry about is someone adding kmap_atomic() to
an interrupt context that didn't have interrupts disabled and then
managing to nest that a few times.

Suppose you put it in some IO completion handler, and someone has 4 IO
controllers installed and all 4 IO interrupts come in at the 'same'
time.

With types you could warn on similarly to what we do today, but with the
simple push/pop that might be a lot harder.

Anyway, with the whole cr2 fiddling bit being discussed this seems to
become redundant.

> And note that in practice many of the current types are exclusive to 
> each other - so using the stack would _reduce_ the amount of 
> kmap-atomic space we need.

Yeah, I did realize that.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-16 12:38                           ` Hugh Dickins
@ 2009-06-17  7:58                             ` Peter Zijlstra
  2009-06-17  8:43                               ` Tejun Heo
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2009-06-17  7:58 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Ingo Molnar, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton,
	Tejun Heo

On Tue, 2009-06-16 at 13:38 +0100, Hugh Dickins wrote:
> 
> Something else to throw in: what if they were not just atomic,
> but also replaced the current sleeping kmaps? i.e. a task context
> carries around its own stack of these.

I actually did that once, but it means the task needs to be cpu-affine,
because fixmaps have different addresses between cpus. And disabling
migration for tasks has subtle side-effects so I dropped that again.

However, I recently considered the possiblity of putting the fixmaps in
the new per-cpu address space so that we might use the %gs segment to
normalize the fixmap addresses between the cpus.

This would allow full preemptible kmaps (yay for -rt).

However I suspect it might greatly complicate kmaps for the !i386 world.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-17  7:58                             ` Peter Zijlstra
@ 2009-06-17  8:43                               ` Tejun Heo
  2009-06-17  9:05                                 ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2009-06-17  8:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hugh Dickins, Ingo Molnar, linux-kernel, mingo, hpa, paulus,
	acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds,
	Andrew Morton

Hello,

Peter Zijlstra wrote:
> On Tue, 2009-06-16 at 13:38 +0100, Hugh Dickins wrote:
>> Something else to throw in: what if they were not just atomic,
>> but also replaced the current sleeping kmaps? i.e. a task context
>> carries around its own stack of these.
> 
> I actually did that once, but it means the task needs to be cpu-affine,
> because fixmaps have different addresses between cpus. And disabling
> migration for tasks has subtle side-effects so I dropped that again.
> 
> However, I recently considered the possiblity of putting the fixmaps in
> the new per-cpu address space so that we might use the %gs segment to
> normalize the fixmap addresses between the cpus.
> 
> This would allow full preemptible kmaps (yay for -rt).
> 
> However I suspect it might greatly complicate kmaps for the !i386 world.

Other archs are in the process of conversion so once that is complete,
there's no reason this should be more difficult but it means that
kmapped addresses should be accessed differently from regular ones
which we can't do.  :-(

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-17  8:43                               ` Tejun Heo
@ 2009-06-17  9:05                                 ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2009-06-17  9:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Hugh Dickins, Ingo Molnar, linux-kernel, mingo, hpa, paulus,
	acme, efault, npiggin, tglx, linux-tip-commits, Linus Torvalds,
	Andrew Morton

On Wed, 2009-06-17 at 17:43 +0900, Tejun Heo wrote:
> Hello,
> 
> Peter Zijlstra wrote:
> > On Tue, 2009-06-16 at 13:38 +0100, Hugh Dickins wrote:
> >> Something else to throw in: what if they were not just atomic,
> >> but also replaced the current sleeping kmaps? i.e. a task context
> >> carries around its own stack of these.
> > 
> > I actually did that once, but it means the task needs to be cpu-affine,
> > because fixmaps have different addresses between cpus. And disabling
> > migration for tasks has subtle side-effects so I dropped that again.
> > 
> > However, I recently considered the possiblity of putting the fixmaps in
> > the new per-cpu address space so that we might use the %gs segment to
> > normalize the fixmap addresses between the cpus.
> > 
> > This would allow full preemptible kmaps (yay for -rt).
> > 
> > However I suspect it might greatly complicate kmaps for the !i386 world.
> 
> Other archs are in the process of conversion so once that is complete,
> there's no reason this should be more difficult but it means that
> kmapped addresses should be accessed differently from regular ones
> which we can't do.  :-(

Right, we'd need percpu_memset, percpu_copy_from, percpu_copy_to, etc.
that all operate on %gs like addrs. might be a tad invasive.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip:perfcounters/core] x86: Add NMI types for kmap_atomic
  2009-06-17  7:44                           ` Peter Zijlstra
@ 2009-06-17 12:28                             ` Ingo Molnar
  0 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2009-06-17 12:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Hugh Dickins, linux-kernel, mingo, hpa, paulus, acme, efault,
	npiggin, tglx, linux-tip-commits, Linus Torvalds, Andrew Morton


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, 2009-06-16 at 10:13 +0200, Ingo Molnar wrote:
> 
> > > By removing the types it becomes very difficult to verify the max 
> > > depth. I really don't like removing them.
> > 
> > The fact that it implies an atomic section pretty much limits its 
> > depth in practice, doesnt it?
> > 
> > All we need to track in the debug code is 
> > max-{syscall,softirq,hardirq,nmi}. The sum of these 4 counts 
> > must be smaller than the max - even if (as you are right to 
> > point out) we dont hit that magic combo that truly maximizes the 
> > depth.
> 
> Right, so the thing I'd worry about is someone adding 
> kmap_atomic() to an interrupt context that didn't have interrupts 
> disabled and then managing to nest that a few times.
> 
> Suppose you put it in some IO completion handler, and someone has 
> 4 IO controllers installed and all 4 IO interrupts come in at the 
> 'same' time.
> 
> With types you could warn on similarly to what we do today, but 
> with the simple push/pop that might be a lot harder.

Yes, fixed-purpose allocations are easier to warn about - they imply 
more constraints, no doubt about that.

But we could warn about using kmap-atomic with in irq context with 
irqs enabled and thus exclude the case you are worried about?

> Anyway, with the whole cr2 fiddling bit being discussed this seems 
> to become redundant.

It's not just the cr2 fiddling but also conversion of pagefault 
returns from IRET to RET. The kmap_atomic API change is a nice 
cleanup in itself.

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2009-06-17 12:28 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <tip-3ff0141aa3a03ca3388b40b36167d0a37919f3fd@git.kernel.org>
2009-06-15 14:46 ` [tip:perfcounters/core] x86: Add NMI types for kmap_atomic Peter Zijlstra
2009-06-15 15:30   ` Hugh Dickins
2009-06-15 15:41     ` Peter Zijlstra
2009-06-15 15:52       ` Ingo Molnar
2009-06-15 16:02         ` Hugh Dickins
2009-06-15 18:04       ` Peter Zijlstra
2009-06-15 18:15         ` Ingo Molnar
2009-06-15 18:19           ` Peter Zijlstra
2009-06-15 18:25             ` Ingo Molnar
2009-06-15 18:30               ` Peter Zijlstra
2009-06-15 18:42                 ` Ingo Molnar
2009-06-15 18:47                   ` Peter Zijlstra
2009-06-15 18:52                     ` Ingo Molnar
2009-06-15 19:00                       ` Peter Zijlstra
2009-06-16  8:13                         ` Ingo Molnar
2009-06-16 12:38                           ` Hugh Dickins
2009-06-17  7:58                             ` Peter Zijlstra
2009-06-17  8:43                               ` Tejun Heo
2009-06-17  9:05                                 ` Peter Zijlstra
2009-06-17  7:44                           ` Peter Zijlstra
2009-06-17 12:28                             ` Ingo Molnar
2009-06-15 18:42         ` Andrew Morton
2009-06-15 18:45           ` Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.