linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm: create a separate slab for page->ptl allocation
@ 2013-10-22 11:53 Kirill A. Shutemov
  2013-10-22 12:55 ` Fengguang Wu
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Kirill A. Shutemov @ 2013-10-22 11:53 UTC (permalink / raw)
  To: Andrew Morton, Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, linux-mm, linux-arch, Kirill A. Shutemov

If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled, spinlock_t on x86_64
is 72 bytes. For page->ptl it is allocated from the kmalloc-96 slab, so
we lose 24 bytes on each. An average system can easily allocate a few
tens of thousands of page->ptl locks, so the overhead is significant.

Let's create a separate slab for page->ptl allocation to solve this.
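
A rough user-space sketch of the rounding described above. It assumes an
illustrative subset of kmalloc size classes (the exact list depends on the
kernel version and configuration), and all fake_* names are hypothetical:

```c
#include <stddef.h>

/*
 * Illustrative subset of general-purpose kmalloc size classes.
 * Assumption: the real list depends on kernel version and config.
 */
static const size_t fake_kmalloc_classes[] = { 8, 16, 32, 64, 96, 128, 192, 256 };

/* Smallest class that fits the request, or 0 if none in this subset does. */
static size_t fake_kmalloc_class(size_t request)
{
	size_t n = sizeof(fake_kmalloc_classes) / sizeof(fake_kmalloc_classes[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if (fake_kmalloc_classes[i] >= request)
			return fake_kmalloc_classes[i];
	return 0;
}

/* Bytes lost to rounding for nr objects of the given size. */
static size_t fake_kmalloc_waste(size_t request, size_t nr)
{
	size_t cls = fake_kmalloc_class(request);

	if (!cls)
		return 0;	/* request does not fit this subset */
	return (cls - request) * nr;
}
```

A 72-byte request lands in the 96-byte class, so each lock wastes 24 bytes;
for a hypothetical 10,000 locks that is 240,000 bytes. A dedicated cache
sized at sizeof(spinlock_t) avoids this rounding.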

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm.h |  8 ++++++++
 init/main.c        |  2 +-
 mm/memory.c        | 12 ++++++++++--
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9a4a873b2f..2de5da0a41 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1233,6 +1233,7 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
 #endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */
 
 #if USE_SPLIT_PTE_PTLOCKS
+void __init ptlock_cache_init(void);
 bool __ptlock_alloc(struct page *page);
 void __ptlock_free(struct page *page);
 static inline bool ptlock_alloc(struct page *page)
@@ -1285,6 +1286,7 @@ static inline void pte_lock_deinit(struct page *page)
 }
 
 #else	/* !USE_SPLIT_PTE_PTLOCKS */
+static inline void ptlock_cache_init(void) {}
 /*
  * We use mm->page_table_lock to guard all pagetable pages of the mm.
  */
@@ -1296,6 +1298,12 @@ static inline bool ptlock_init(struct page *page) { return true; }
 static inline void pte_lock_deinit(struct page *page) {}
 #endif /* USE_SPLIT_PTE_PTLOCKS */
 
+static inline void pgtable_init(void)
+{
+	ptlock_cache_init();
+	pgtable_cache_init();
+}
+
 static inline bool pgtable_page_ctor(struct page *page)
 {
 	inc_zone_page_state(page, NR_PAGETABLE);
diff --git a/init/main.c b/init/main.c
index af310afbef..c71b505392 100644
--- a/init/main.c
+++ b/init/main.c
@@ -466,7 +466,7 @@ static void __init mm_init(void)
 	mem_init();
 	kmem_cache_init();
 	percpu_init_late();
-	pgtable_cache_init();
+	pgtable_init();
 	vmalloc_init();
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index 7e11f745bc..d7e583e270 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4332,11 +4332,19 @@ void copy_user_huge_page(struct page *dst, struct page *src,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
 
 #if USE_SPLIT_PTE_PTLOCKS
+struct kmem_cache *page_ptl_cachep;
+void __init ptlock_cache_init(void)
+{
+	if (sizeof(spinlock_t) > sizeof(long))
+		page_ptl_cachep = kmem_cache_create("page->ptl",
+				sizeof(spinlock_t), 0, SLAB_PANIC, NULL);
+}
+
 bool __ptlock_alloc(struct page *page)
 {
 	spinlock_t *ptl;
 
-	ptl = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
+	ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
 	if (!ptl)
 		return false;
 	page->ptl = (unsigned long)ptl;
@@ -4346,6 +4354,6 @@ bool __ptlock_alloc(struct page *page)
 void __ptlock_free(struct page *page)
 {
 	if (sizeof(spinlock_t) > sizeof(page->ptl))
-		kfree((spinlock_t *)page->ptl);
+		kmem_cache_free(page_ptl_cachep, (spinlock_t *)page->ptl);
 }
 #endif
-- 
1.8.4.rc3



* Re: [PATCH] mm: create a separate slab for page->ptl allocation
  2013-10-22 11:53 [PATCH] mm: create a separate slab for page->ptl allocation Kirill A. Shutemov
@ 2013-10-22 12:55 ` Fengguang Wu
  2013-11-04 10:42 ` Kirill A. Shutemov
  2013-11-05 23:01 ` Andrew Morton
  2 siblings, 0 replies; 17+ messages in thread
From: Fengguang Wu @ 2013-10-22 12:55 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-kernel,
	linux-mm, linux-arch

On Tue, Oct 22, 2013 at 02:53:59PM +0300, Kirill A. Shutemov wrote:
> If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled, spinlock_t on x86_64
> is 72 bytes. For page->ptl it is allocated from the kmalloc-96 slab, so
> we lose 24 bytes on each. An average system can easily allocate a few
> tens of thousands of page->ptl locks, so the overhead is significant.
> 
> Let's create a separate slab for page->ptl allocation to solve this.

Tested-by: Fengguang Wu <fengguang.wu@intel.com>

On a 4p server, we noticed up to a +469.1% increase in the will-it-scale
page_fault3 test case and +199.8% in the vm-scalability
case-shm-pread-seq-mt test case.

    5c02216ce3110aab070d      5a58baaa0a1af0a43d7c
------------------------  ------------------------  
               300409.00      +440.2%   1622770.80  TOTAL will-it-scale.page_fault3.90.threads

    5c02216ce3110aab070d      5a58baaa0a1af0a43d7c
------------------------  ------------------------  
               291257.80      +469.1%   1657582.20  TOTAL will-it-scale.page_fault3.120.threads

...

    5c02216ce3110aab070d      5a58baaa0a1af0a43d7c
------------------------  ------------------------  
              4034831.40      +199.8%  12095649.80  TOTAL vm-scalability.throughput
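
The percentages are the relative change from the first column to the
second. A small sketch of that arithmetic, with hypothetical helper names:

```c
/* Relative change from base to patched, in percent. */
static double percent_change(double base, double patched)
{
	return (patched - base) / base * 100.0;
}

/*
 * The same change in tenths of a percent, rounded to nearest.
 * Only meant for positive changes, as in the tables above.
 */
static long percent_change_tenths(double base, double patched)
{
	return (long)(percent_change(base, patched) * 10.0 + 0.5);
}
```

For example, percent_change_tenths(300409.00, 1622770.80) gives 4402,
matching the +440.2% reported for the 90-thread case.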

Thanks,
Fengguang


* RE: [PATCH] mm: create a separate slab for page->ptl allocation
  2013-10-22 11:53 [PATCH] mm: create a separate slab for page->ptl allocation Kirill A. Shutemov
  2013-10-22 12:55 ` Fengguang Wu
@ 2013-11-04 10:42 ` Kirill A. Shutemov
  2013-11-05 23:01 ` Andrew Morton
  2 siblings, 0 replies; 17+ messages in thread
From: Kirill A. Shutemov @ 2013-11-04 10:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, linux-mm, linux-arch,
	Kirill A. Shutemov

Kirill A. Shutemov wrote:
> If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled, spinlock_t on x86_64
> is 72 bytes. For page->ptl it is allocated from the kmalloc-96 slab, so
> we lose 24 bytes on each. An average system can easily allocate a few
> tens of thousands of page->ptl locks, so the overhead is significant.
> 
> Let's create a separate slab for page->ptl allocation to solve this.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

ping?

-- 
 Kirill A. Shutemov


* Re: [PATCH] mm: create a separate slab for page->ptl allocation
  2013-11-05 23:01 ` Andrew Morton
@ 2013-11-05 22:42   ` Kirill A. Shutemov
  2013-11-05 23:56     ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Kirill A. Shutemov @ 2013-11-05 22:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Peter Zijlstra, Ingo Molnar, linux-kernel,
	linux-mm, linux-arch


[ sorry, resend to all ]

On Tue, Nov 05, 2013 at 03:01:45PM -0800, Andrew Morton wrote:
> On Tue, 22 Oct 2013 14:53:59 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled, spinlock_t on x86_64
> > is 72 bytes. For page->ptl it is allocated from the kmalloc-96 slab, so
> > we lose 24 bytes on each. An average system can easily allocate a few
> > tens of thousands of page->ptl locks, so the overhead is significant.
> > 
> > Let's create a separate slab for page->ptl allocation to solve this.
> > 
> > ...
> >
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -4332,11 +4332,19 @@ void copy_user_huge_page(struct page *dst, struct page *src,
> >  #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
> >  
> >  #if USE_SPLIT_PTE_PTLOCKS
> > +struct kmem_cache *page_ptl_cachep;
> > +void __init ptlock_cache_init(void)
> > +{
> > +	if (sizeof(spinlock_t) > sizeof(long))
> > +		page_ptl_cachep = kmem_cache_create("page->ptl",
> > +				sizeof(spinlock_t), 0, SLAB_PANIC, NULL);
> > +}
> 
> Confused.  If (sizeof(spinlock_t) > sizeof(long)) happens to be false
> then the kernel will later crash.  It would be better to use BUILD_BUG_ON()
> here, if that works.  Otherwise BUG_ON.

If (sizeof(spinlock_t) > sizeof(long)) is false, we don't need to
dynamically allocate page->ptl: it is embedded in struct page itself.
__ptlock_alloc() is never called in this case.
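
A minimal user-space sketch of that embed-or-allocate pattern. The fake_*
types and functions are hypothetical stand-ins, not the kernel's
definitions, and malloc() stands in for the slab allocation:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for spinlock_t; add padding to model a debug build. */
typedef struct { unsigned int slock; } fake_spinlock_t;

/* Stand-in for the relevant word of struct page. */
struct fake_page { unsigned long ptl; };

/* Allocate only when the lock does not fit into the word itself. */
static bool fake_ptlock_alloc(struct fake_page *page)
{
	if (sizeof(fake_spinlock_t) > sizeof(page->ptl)) {
		fake_spinlock_t *ptl = malloc(sizeof(*ptl));

		if (!ptl)
			return false;
		page->ptl = (unsigned long)ptl;
	}
	return true;
}

/* Either the word holds a pointer to the lock, or the word is the lock. */
static fake_spinlock_t *fake_ptlock_ptr(struct fake_page *page)
{
	if (sizeof(fake_spinlock_t) > sizeof(page->ptl))
		return (fake_spinlock_t *)page->ptl;
	return (fake_spinlock_t *)&page->ptl;
}
```

Both sizeof() comparisons are compile-time constants, so with a small lock
type the allocation branch is dead code and the word is left untouched.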

> Also, we have the somewhat silly KMEM_CACHE() macro, but it looks
> inapplicable here?

The first argument of KMEM_CACHE() is a struct name, but we have a typedef
here.
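
A sketch of why a typedef name does not fit. FAKE_KMEM_CACHE() below is a
hypothetical user-space macro modeled on the shape of KMEM_CACHE(): it
stringifies its argument for the name and prefixes it with "struct" for
the size and alignment. The raw_lock/lock_t types are made up for the
example:

```c
#include <stddef.h>

/* What the hypothetical macro produces: a description of a cache. */
struct fake_cache_desc {
	const char *name;
	size_t size;
	size_t align;
};

/* The argument must be a struct tag, because "struct" is pasted in front. */
#define FAKE_KMEM_CACHE(__struct) \
	((struct fake_cache_desc){ #__struct, sizeof(struct __struct), \
				   _Alignof(struct __struct) })

struct raw_lock { unsigned int slock; };
typedef struct raw_lock lock_t;

/* Works: raw_lock is a struct tag. */
static struct fake_cache_desc fake_by_tag(void)
{
	return FAKE_KMEM_CACHE(raw_lock);
}

/*
 * FAKE_KMEM_CACHE(lock_t) would expand to sizeof(struct lock_t), and no
 * such struct tag exists, so a typedef needs the open-coded form.
 */
static struct fake_cache_desc fake_by_typedef(void)
{
	return (struct fake_cache_desc){ "lock_t", sizeof(lock_t),
					 _Alignof(lock_t) };
}
```

The open-coded form also leaves the cache name free to choose, which the
macro would otherwise derive from the struct tag.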

> >  bool __ptlock_alloc(struct page *page)
> >  {
> >  	spinlock_t *ptl;
> >  
> > -	ptl = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
> > +	ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
> >  	if (!ptl)
> >  		return false;
> >  	page->ptl = (unsigned long)ptl;
> > @@ -4346,6 +4354,6 @@ bool __ptlock_alloc(struct page *page)
> >  void __ptlock_free(struct page *page)
> >  {
> >  	if (sizeof(spinlock_t) > sizeof(page->ptl))
> > -		kfree((spinlock_t *)page->ptl);
> > +		kmem_cache_free(page_ptl_cachep, (spinlock_t *)page->ptl);
> 
> A void* cast would suffice here, but I suppose the spinlock_t* cast has
> some documentation value.

Right.

-- 
 Kirill A. Shutemov


* Re: [PATCH] mm: create a separate slab for page->ptl allocation
  2013-10-22 11:53 [PATCH] mm: create a separate slab for page->ptl allocation Kirill A. Shutemov
  2013-10-22 12:55 ` Fengguang Wu
  2013-11-04 10:42 ` Kirill A. Shutemov
@ 2013-11-05 23:01 ` Andrew Morton
  2013-11-05 22:42   ` Kirill A. Shutemov
  2 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2013-11-05 23:01 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, linux-mm, linux-arch

On Tue, 22 Oct 2013 14:53:59 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled, spinlock_t on x86_64
> is 72 bytes. For page->ptl it is allocated from the kmalloc-96 slab, so
> we lose 24 bytes on each. An average system can easily allocate a few
> tens of thousands of page->ptl locks, so the overhead is significant.
> 
> Let's create a separate slab for page->ptl allocation to solve this.
> 
> ...
>
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4332,11 +4332,19 @@ void copy_user_huge_page(struct page *dst, struct page *src,
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
>  
>  #if USE_SPLIT_PTE_PTLOCKS
> +struct kmem_cache *page_ptl_cachep;
> +void __init ptlock_cache_init(void)
> +{
> +	if (sizeof(spinlock_t) > sizeof(long))
> +		page_ptl_cachep = kmem_cache_create("page->ptl",
> +				sizeof(spinlock_t), 0, SLAB_PANIC, NULL);
> +}

Confused.  If (sizeof(spinlock_t) > sizeof(long)) happens to be false
then the kernel will later crash.  It would be better to use BUILD_BUG_ON()
here, if that works.  Otherwise BUG_ON.

Also, we have the somewhat silly KMEM_CACHE() macro, but it looks
inapplicable here?

>  bool __ptlock_alloc(struct page *page)
>  {
>  	spinlock_t *ptl;
>  
> -	ptl = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
> +	ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
>  	if (!ptl)
>  		return false;
>  	page->ptl = (unsigned long)ptl;
> @@ -4346,6 +4354,6 @@ bool __ptlock_alloc(struct page *page)
>  void __ptlock_free(struct page *page)
>  {
>  	if (sizeof(spinlock_t) > sizeof(page->ptl))
> -		kfree((spinlock_t *)page->ptl);
> +		kmem_cache_free(page_ptl_cachep, (spinlock_t *)page->ptl);

A void* cast would suffice here, but I suppose the spinlock_t* cast has
some documentation value.



* Re: [PATCH] mm: create a separate slab for page->ptl allocation
  2013-11-05 23:56     ` Andrew Morton
@ 2013-11-05 23:13       ` Kirill A. Shutemov
  2013-11-06  0:43         ` Andrew Morton
                           ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Kirill A. Shutemov @ 2013-11-05 23:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Peter Zijlstra, Ingo Molnar, linux-kernel,
	linux-mm, linux-arch

On Tue, Nov 05, 2013 at 03:56:19PM -0800, Andrew Morton wrote:
> On Wed, 6 Nov 2013 00:42:17 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > > >  #if USE_SPLIT_PTE_PTLOCKS
> > > > +struct kmem_cache *page_ptl_cachep;
> > > > +void __init ptlock_cache_init(void)
> > > > +{
> > > > +	if (sizeof(spinlock_t) > sizeof(long))
> > > > +		page_ptl_cachep = kmem_cache_create("page->ptl",
> > > > +				sizeof(spinlock_t), 0, SLAB_PANIC, NULL);
> > > > +}
> > > 
> > > Confused.  If (sizeof(spinlock_t) > sizeof(long)) happens to be false
> > > then the kernel will later crash.  It would be better to use BUILD_BUG_ON()
> > > here, if that works.  Otherwise BUG_ON.
> > 
> > If (sizeof(spinlock_t) > sizeof(long)) is false, we don't need to
> > dynamically allocate page->ptl: it is embedded in struct page itself.
> > __ptlock_alloc() is never called in this case.
> 
> OK.  Please add a comment explaining this so the next reader doesn't get
> tripped up like I was.

Okay, I will tomorrow.

> Really the function shouldn't exist in this case.  It is __init so the
> sin is not terrible, but can this be arranged?

I would like to get rid of __ptlock_alloc()/__ptlock_free() too, but I
don't see a way within C: we need to know sizeof(spinlock_t) at the
preprocessor stage.

We could add a hack at the kbuild level: write a small helper program
that finds out sizeof(spinlock_t) before the build starts and turns it
into a define. But that is overkill from my POV, and cross-compilation
would be fun.
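
To make the helper-program idea concrete, a minimal sketch of what such a
generator could look like. The emit_define() helper, the SPINLOCK_SIZE
name, and the fake_spinlock_t type are hypothetical:

```c
#include <stdio.h>

/* Hypothetical stand-in for the type whose size the build needs. */
typedef struct { unsigned int slock; } fake_spinlock_t;

/*
 * Format one line of a generated header into buf.
 * Returns what snprintf() returns.
 */
static int emit_define(char *buf, size_t len, const char *name,
		       unsigned long value)
{
	return snprintf(buf, len, "#define %s %lu\n", name, value);
}
```

A helper built around this would call something like
emit_define(buf, sizeof(buf), "SPINLOCK_SIZE", sizeof(fake_spinlock_t))
and write the result to a header. The catch is that the helper runs on the
build host, so sizeof() reflects the host ABI rather than the target's,
which is where cross-compilation gets awkward.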

-- 
 Kirill A. Shutemov


* Re: [PATCH] mm: create a separate slab for page->ptl allocation
  2013-11-05 22:42   ` Kirill A. Shutemov
@ 2013-11-05 23:56     ` Andrew Morton
  2013-11-05 23:13       ` Kirill A. Shutemov
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2013-11-05 23:56 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Peter Zijlstra, Ingo Molnar, linux-kernel,
	linux-mm, linux-arch

On Wed, 6 Nov 2013 00:42:17 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> > >  #if USE_SPLIT_PTE_PTLOCKS
> > > +struct kmem_cache *page_ptl_cachep;
> > > +void __init ptlock_cache_init(void)
> > > +{
> > > +	if (sizeof(spinlock_t) > sizeof(long))
> > > +		page_ptl_cachep = kmem_cache_create("page->ptl",
> > > +				sizeof(spinlock_t), 0, SLAB_PANIC, NULL);
> > > +}
> > 
> > Confused.  If (sizeof(spinlock_t) > sizeof(long)) happens to be false
> > then the kernel will later crash.  It would be better to use BUILD_BUG_ON()
> > here, if that works.  Otherwise BUG_ON.
> 
> If (sizeof(spinlock_t) > sizeof(long)) is false, we don't need to
> dynamically allocate page->ptl: it is embedded in struct page itself.
> __ptlock_alloc() is never called in this case.

OK.  Please add a comment explaining this so the next reader doesn't get
tripped up like I was.

Really the function shouldn't exist in this case.  It is __init so the
sin is not terrible, but can this be arranged?


* Re: [PATCH] mm: create a separate slab for page->ptl allocation
  2013-11-05 23:13       ` Kirill A. Shutemov
@ 2013-11-06  0:43         ` Andrew Morton
  2013-11-06  9:31         ` Peter Zijlstra
  2013-11-06 10:34         ` Will Deacon
  2 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2013-11-06  0:43 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Peter Zijlstra, Ingo Molnar, linux-kernel,
	linux-mm, linux-arch

On Wed, 6 Nov 2013 01:13:11 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> > Really the function shouldn't exist in this case.  It is __init so the
> > sin is not terrible, but can this be arranged?
> 
> I would like to get rid of __ptlock_alloc()/__ptlock_free() too, but I
> don't see a way within C: we need to know sizeof(spinlock_t) at the
> preprocessor stage.
> 
> We could add a hack at the kbuild level: write a small helper program
> that finds out sizeof(spinlock_t) before the build starts and turns it
> into a define. But that is overkill from my POV, and cross-compilation
> would be fun.

Yes, it doesn't seem worth the fuss.  The compiler will remove all this
code anyway, so for example ptlock_cache_init() becomes an empty function.


* Re: [PATCH] mm: create a separate slab for page->ptl allocation
  2013-11-05 23:13       ` Kirill A. Shutemov
  2013-11-06  0:43         ` Andrew Morton
@ 2013-11-06  9:31         ` Peter Zijlstra
  2013-11-06 11:18           ` Peter Zijlstra
  2013-11-06 13:21           ` [PATCH] mm: create a separate slab for page->ptl allocation Kirill A. Shutemov
  2013-11-06 10:34         ` Will Deacon
  2 siblings, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2013-11-06  9:31 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Kirill A. Shutemov, Ingo Molnar, linux-kernel,
	linux-mm, linux-arch

On Wed, Nov 06, 2013 at 01:13:11AM +0200, Kirill A. Shutemov wrote:
> I would like to get rid of __ptlock_alloc()/__ptlock_free() too, but I
> don't see a way within C: we need to know sizeof(spinlock_t) at the
> preprocessor stage.
> 
> We could add a hack at the kbuild level: write a small helper program
> that finds out sizeof(spinlock_t) before the build starts and turns it
> into a define. But that is overkill from my POV, and cross-compilation
> would be fun.

Ah, I just remembered, we have such a thing!
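
For context, a simplified model of how a kernel/bounds.c value can reach
the preprocessor. The assumption in this sketch is that DEFINE() makes the
compiler emit a marker line into generated assembly, and a build step
rewrites each marker into a #define in a generated header. The exact
marker text and the marker_to_define() helper are illustrative, not the
actual kbuild script:

```c
#include <stdio.h>
#include <string.h>

/*
 * Turn an assumed marker line of the form
 *   ->NAME $VALUE original expression
 * into
 *   #define NAME VALUE (followed by the expression in a C comment)
 * Returns -1 if the line is not a marker, else snprintf()'s result.
 */
static int marker_to_define(const char *line, char *out, size_t len)
{
	char name[64];
	long value;
	int consumed = 0;

	if (strncmp(line, "->", 2) != 0)
		return -1;
	line += 2;

	/* Symbol name, then skip the whitespace that follows it. */
	if (sscanf(line, "%63s %n", name, &consumed) != 1)
		return -1;
	line += consumed;

	/* Immediate-operand prefixes differ between assemblers. */
	while (*line == '$' || *line == '#')
		line++;

	/* Constant value, then the rest of the line is the expression. */
	if (sscanf(line, "%ld %n", &value, &consumed) != 1)
		return -1;
	line += consumed;

	return snprintf(out, len, "#define %s %ld /* %s */", name, value, line);
}
```

Because the value is computed by the target compiler and only read back
as text, nothing has to run on the target, which sidesteps the
cross-compilation concern.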

---
Subject: mm: Properly separate the bloated ptl from the regular case

Use kernel/bounds.c to convert build-time spinlock_t size into a
preprocessor symbol and apply that to properly separate the page::ptl
situation.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/mm.h       | 24 +++++++++++++-----------
 include/linux/mm_types.h |  9 +++++----
 kernel/bounds.c          |  2 ++
 mm/memory.c              | 11 +++++------
 4 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d0339741b6ce..6ab26704671b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1317,27 +1317,29 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
 #endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */
 
 #if USE_SPLIT_PTE_PTLOCKS
-bool __ptlock_alloc(struct page *page);
-void __ptlock_free(struct page *page);
+#if BLOATED_SPINLOCKS
+extern bool ptlock_alloc(struct page *page);
+extern void ptlock_free(struct page *page);
+
+static inline spinlock_t *ptlock_ptr(struct page *page)
+{
+	return page->ptl;
+}
+#else /* BLOATED_SPINLOCKS */
 static inline bool ptlock_alloc(struct page *page)
 {
-	if (sizeof(spinlock_t) > sizeof(page->ptl))
-		return __ptlock_alloc(page);
 	return true;
 }
+
 static inline void ptlock_free(struct page *page)
 {
-	if (sizeof(spinlock_t) > sizeof(page->ptl))
-		__ptlock_free(page);
 }
 
 static inline spinlock_t *ptlock_ptr(struct page *page)
 {
-	if (sizeof(spinlock_t) > sizeof(page->ptl))
-		return (spinlock_t *) page->ptl;
-	else
-		return (spinlock_t *) &page->ptl;
+	return &page->ptl;
 }
+#endif /* BLOATED_SPINLOCKS */
 
 static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
 {
@@ -1354,7 +1356,7 @@ static inline bool ptlock_init(struct page *page)
 	 * slab code uses page->slab_cache and page->first_page (for tail
 	 * pages), which share storage with page->ptl.
 	 */
-	VM_BUG_ON(page->ptl);
+	VM_BUG_ON(*(unsigned long *)&page->ptl);
 	if (!ptlock_alloc(page))
 		return false;
 	spin_lock_init(ptlock_ptr(page));
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5bee515c4505..f706743b63bb 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -155,10 +155,11 @@ struct page {
 						 * system if PG_buddy is set.
 						 */
 #if USE_SPLIT_PTE_PTLOCKS
-		unsigned long ptl; /* It's spinlock_t if it fits to long,
-				    * otherwise it's pointer to dynamicaly
-				    * allocated spinlock_t.
-				    */
+#if BLOATED_SPINLOCKS
+		spinlock_t *ptl;
+#else
+		spinlock_t ptl;
+#endif
 #endif
 		struct kmem_cache *slab_cache;	/* SL[AU]B: Pointer to slab */
 		struct page *first_page;	/* Compound tail pages */
diff --git a/kernel/bounds.c b/kernel/bounds.c
index e8ca97b5c386..5982437eca2c 100644
--- a/kernel/bounds.c
+++ b/kernel/bounds.c
@@ -11,6 +11,7 @@
 #include <linux/kbuild.h>
 #include <linux/page_cgroup.h>
 #include <linux/log2.h>
+#include <linux/spinlock.h>
 
 void foo(void)
 {
@@ -21,5 +22,6 @@ void foo(void)
 #ifdef CONFIG_SMP
 	DEFINE(NR_CPUS_BITS, ilog2(CONFIG_NR_CPUS));
 #endif
+	DEFINE(BLOATED_SPINLOCKS, sizeof(spinlock_t) > sizeof(int));
 	/* End of constants */
 }
diff --git a/mm/memory.c b/mm/memory.c
index 6f7bdee617e2..8356eac27d0a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4271,21 +4271,20 @@ void copy_user_huge_page(struct page *dst, struct page *src,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
 
-#if USE_SPLIT_PTE_PTLOCKS
-bool __ptlock_alloc(struct page *page)
+#if USE_SPLIT_PTE_PTLOCKS && BLOATED_SPINLOCKS
+bool ptlock_alloc(struct page *page)
 {
 	spinlock_t *ptl;
 
 	ptl = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
 	if (!ptl)
 		return false;
-	page->ptl = (unsigned long)ptl;
+	page->ptl = ptl;
 	return true;
 }
 
-void __ptlock_free(struct page *page)
+void ptlock_free(struct page *page)
 {
-	if (sizeof(spinlock_t) > sizeof(page->ptl))
-		kfree((spinlock_t *)page->ptl);
+	kfree(page->ptl);
 }
 #endif


* Re: [PATCH] mm: create a separate slab for page->ptl allocation
  2013-11-05 23:13       ` Kirill A. Shutemov
  2013-11-06  0:43         ` Andrew Morton
  2013-11-06  9:31         ` Peter Zijlstra
@ 2013-11-06 10:34         ` Will Deacon
  2013-11-06 10:49           ` Geert Uytterhoeven
  2013-11-06 11:02           ` Peter Zijlstra
  2 siblings, 2 replies; 17+ messages in thread
From: Will Deacon @ 2013-11-06 10:34 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Kirill A. Shutemov, Peter Zijlstra, Ingo Molnar,
	linux-kernel, linux-mm, linux-arch

On Tue, Nov 05, 2013 at 11:13:11PM +0000, Kirill A. Shutemov wrote:
> On Tue, Nov 05, 2013 at 03:56:19PM -0800, Andrew Morton wrote:
> > On Wed, 6 Nov 2013 00:42:17 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > 
> > > > >  #if USE_SPLIT_PTE_PTLOCKS
> > > > > +struct kmem_cache *page_ptl_cachep;
> > > > > +void __init ptlock_cache_init(void)
> > > > > +{
> > > > > +	if (sizeof(spinlock_t) > sizeof(long))
> > > > > +		page_ptl_cachep = kmem_cache_create("page->ptl",
> > > > > +				sizeof(spinlock_t), 0, SLAB_PANIC, NULL);
> > > > > +}
> > > > 
> > > > Confused.  If (sizeof(spinlock_t) > sizeof(long)) happens to be false
> > > > then the kernel will later crash.  It would be better to use BUILD_BUG_ON()
> > > > here, if that works.  Otherwise BUG_ON.
> > > 
> > > If (sizeof(spinlock_t) > sizeof(long)) is false, we don't need to
> > > dynamically allocate page->ptl: it is embedded in struct page itself.
> > > __ptlock_alloc() is never called in this case.
> > 
> > OK.  Please add a comment explaining this so the next reader doesn't get
> > tripped up like I was.
> 
> Okay, I will tomorrow.
> 
> > Really the function shouldn't exist in this case.  It is __init so the
> > sin is not terrible, but can this be arranged?
> 
> > I would like to get rid of __ptlock_alloc()/__ptlock_free() too, but I
> > don't see a way within C: we need to know sizeof(spinlock_t) at the
> > preprocessor stage.

FWIW: if the architecture selects ARCH_USE_CMPXCHG_LOCKREF, then a spinlock_t
is 32-bit (assuming that unsigned int is also 32-bit).
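
A user-space sketch of why the size matters for lockref: a 32-bit lock and
a 32-bit count fit into one 64-bit word, so the count can be bumped with a
single compare-and-exchange while the lock is observed unlocked. This uses
C11 atomics and hypothetical fake_* names; it is not the lib/lockref.c
implementation:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: lock word (0 means unlocked) plus reference count. */
struct fake_lockref {
	uint32_t lock;
	uint32_t count;
};

/* The trick only works if both fields fit into one 64-bit word. */
_Static_assert(sizeof(struct fake_lockref) == sizeof(uint64_t),
	       "lock + count must fit in 64 bits");

static uint64_t fake_pack(struct fake_lockref v)
{
	uint64_t w;

	memcpy(&w, &v, sizeof(w));
	return w;
}

static struct fake_lockref fake_unpack(uint64_t w)
{
	struct fake_lockref v;

	memcpy(&v, &w, sizeof(v));
	return v;
}

/*
 * Try to take a reference without taking the lock.
 * Returns 1 on success, 0 if the lock is held and the caller would have
 * to fall back to a locked slow path.
 */
static int fake_lockref_get(_Atomic uint64_t *word)
{
	uint64_t old = atomic_load(word);

	for (;;) {
		struct fake_lockref v = fake_unpack(old);

		if (v.lock != 0)
			return 0;
		v.count++;
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(word, &old, fake_pack(v)))
			return 1;
	}
}
```

With a debug-sized spinlock_t the static assertion would fail, which is
the situation the Kconfig dependencies are meant to rule out.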

Will


* Re: [PATCH] mm: create a separate slab for page->ptl allocation
  2013-11-06 10:34         ` Will Deacon
@ 2013-11-06 10:49           ` Geert Uytterhoeven
  2013-11-06 11:02           ` Peter Zijlstra
  1 sibling, 0 replies; 17+ messages in thread
From: Geert Uytterhoeven @ 2013-11-06 10:49 UTC (permalink / raw)
  To: Will Deacon
  Cc: Kirill A. Shutemov, Andrew Morton, Kirill A. Shutemov,
	Peter Zijlstra, Ingo Molnar, linux-kernel, linux-mm, linux-arch

On Wed, Nov 6, 2013 at 11:34 AM, Will Deacon <will.deacon@arm.com> wrote:
> FWIW: if the architecture selects ARCH_USE_CMPXCHG_LOCKREF, then a spinlock_t
> is 32-bit (assuming that unsigned int is also 32-bit).

Linux already assumes (unsigned) int is 32-bit.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH] mm: create a separate slab for page->ptl allocation
  2013-11-06 10:34         ` Will Deacon
  2013-11-06 10:49           ` Geert Uytterhoeven
@ 2013-11-06 11:02           ` Peter Zijlstra
  1 sibling, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2013-11-06 11:02 UTC (permalink / raw)
  To: Will Deacon
  Cc: Kirill A. Shutemov, Andrew Morton, Kirill A. Shutemov,
	Ingo Molnar, linux-kernel, linux-mm, linux-arch

On Wed, Nov 06, 2013 at 10:34:03AM +0000, Will Deacon wrote:
> FWIW: if the architecture selects ARCH_USE_CMPXCHG_LOCKREF, then a spinlock_t
> is 32-bit (assuming that unsigned int is also 32-bit).

Egads, talk about fragile. That thing relies on someone actually keeping
lib/Kconfig up-to-date.


* Re: [PATCH] mm: create a separate slab for page->ptl allocation
  2013-11-06  9:31         ` Peter Zijlstra
@ 2013-11-06 11:18           ` Peter Zijlstra
  2013-11-06 13:31             ` lockref: Use bloated_spinlocks to avoid explicit config dependencies Kirill A. Shutemov
  2013-11-06 13:21           ` [PATCH] mm: create a separate slab for page->ptl allocation Kirill A. Shutemov
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2013-11-06 11:18 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Kirill A. Shutemov, Ingo Molnar, linux-kernel,
	linux-mm, linux-arch

On Wed, Nov 06, 2013 at 10:31:31AM +0100, Peter Zijlstra wrote:
> Subject: mm: Properly separate the bloated ptl from the regular case
> 
> Use kernel/bounds.c to convert build-time spinlock_t size into a
> preprocessor symbol and apply that to properly separate the page::ptl
> situation.
> 
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  include/linux/mm.h       | 24 +++++++++++++-----------
>  include/linux/mm_types.h |  9 +++++----
>  kernel/bounds.c          |  2 ++
>  mm/memory.c              | 11 +++++------
>  4 files changed, 25 insertions(+), 21 deletions(-)
> 
> diff --git a/kernel/bounds.c b/kernel/bounds.c
> index e8ca97b5c386..5982437eca2c 100644
> --- a/kernel/bounds.c
> +++ b/kernel/bounds.c
> @@ -11,6 +11,7 @@
>  #include <linux/kbuild.h>
>  #include <linux/page_cgroup.h>
>  #include <linux/log2.h>
> +#include <linux/spinlock.h>
>  
>  void foo(void)
>  {
> @@ -21,5 +22,6 @@ void foo(void)
>  #ifdef CONFIG_SMP
>  	DEFINE(NR_CPUS_BITS, ilog2(CONFIG_NR_CPUS));
>  #endif
> +	DEFINE(BLOATED_SPINLOCKS, sizeof(spinlock_t) > sizeof(int));
>  	/* End of constants */
>  }

Using that we could also do the following; it has not been near a compiler.

---
Subject: lockref: Use bloated_spinlocks to avoid explicit config dependencies

Avoid the fragile Kconfig construct guesstimating spinlock_t sizes; use a
friendly compile-time test to determine this.

Not-Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 lib/Kconfig   | 3 ---
 lib/lockref.c | 2 +-
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/lib/Kconfig b/lib/Kconfig
index b3c8be0da17f..254af289d1d0 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -54,9 +54,6 @@ config ARCH_USE_CMPXCHG_LOCKREF
 config CMPXCHG_LOCKREF
 	def_bool y if ARCH_USE_CMPXCHG_LOCKREF
 	depends on SMP
-	depends on !GENERIC_LOCKBREAK
-	depends on !DEBUG_SPINLOCK
-	depends on !DEBUG_LOCK_ALLOC
 
 config CRC_CCITT
 	tristate "CRC-CCITT functions"
diff --git a/lib/lockref.c b/lib/lockref.c
index 6f9d434c1521..a158fd86aa1a 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -1,7 +1,7 @@
 #include <linux/export.h>
 #include <linux/lockref.h>
 
-#ifdef CONFIG_CMPXCHG_LOCKREF
+#if defined(CONFIG_CMPXCHG_LOCKREF) && !BLOATED_SPINLOCKS
 
 /*
  * Allow weakly-ordered memory architectures to provide barrier-less


* Re: [PATCH] mm: create a separate slab for page->ptl allocation
  2013-11-06  9:31         ` Peter Zijlstra
  2013-11-06 11:18           ` Peter Zijlstra
@ 2013-11-06 13:21           ` Kirill A. Shutemov
  2013-11-06 14:30             ` Peter Zijlstra
  1 sibling, 1 reply; 17+ messages in thread
From: Kirill A. Shutemov @ 2013-11-06 13:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Kirill A. Shutemov, Ingo Molnar, linux-kernel,
	linux-mm, linux-arch

On Wed, Nov 06, 2013 at 10:31:31AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 06, 2013 at 01:13:11AM +0200, Kirill A. Shutemov wrote:
> > I would like to get rid of __ptlock_alloc()/__ptlock_free() too, but I
> > don't see a way within C: we need to know sizeof(spinlock_t) at the
> > preprocessor stage.
> > 
> > We could add a hack at the kbuild level: write a small helper program
> > that finds out sizeof(spinlock_t) before the build starts and turns it
> > into a define. But that is overkill from my POV, and cross-compilation
> > would be fun.
> 
> Ah, I just remembered, we have such a thing!

Great!

> @@ -1354,7 +1356,7 @@ static inline bool ptlock_init(struct page *page)
>  	 * slab code uses page->slab_cache and page->first_page (for tail
>  	 * pages), which share storage with page->ptl.
>  	 */
> -	VM_BUG_ON(page->ptl);
> +	VM_BUG_ON(*(unsigned long *)&page->ptl);

Huh? Why not a direct cast to unsigned long?

VM_BUG_ON((unsigned long)page->ptl);

Otherwise:

Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>


-- 
 Kirill A. Shutemov


* lockref: Use bloated_spinlocks to avoid explicit config dependencies
  2013-11-06 11:18           ` Peter Zijlstra
@ 2013-11-06 13:31             ` Kirill A. Shutemov
  2013-11-06 14:32               ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Kirill A. Shutemov @ 2013-11-06 13:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Kirill A. Shutemov, Ingo Molnar, linux-kernel,
	linux-mm, linux-arch, Linus Torvalds

On Wed, Nov 06, 2013 at 12:18:45PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 06, 2013 at 10:31:31AM +0100, Peter Zijlstra wrote:
> > Subject: mm: Properly separate the bloated ptl from the regular case
> > 
> > Use kernel/bounds.c to convert build-time spinlock_t size into a
> > preprocessor symbol and apply that to properly separate the page::ptl
> > situation.
> > 
> > Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> > ---
> >  include/linux/mm.h       | 24 +++++++++++++-----------
> >  include/linux/mm_types.h |  9 +++++----
> >  kernel/bounds.c          |  2 ++
> >  mm/memory.c              | 11 +++++------
> >  4 files changed, 25 insertions(+), 21 deletions(-)
> > 
> > diff --git a/kernel/bounds.c b/kernel/bounds.c
> > index e8ca97b5c386..5982437eca2c 100644
> > --- a/kernel/bounds.c
> > +++ b/kernel/bounds.c
> > @@ -11,6 +11,7 @@
> >  #include <linux/kbuild.h>
> >  #include <linux/page_cgroup.h>
> >  #include <linux/log2.h>
> > +#include <linux/spinlock.h>
> >  
> >  void foo(void)
> >  {
> > @@ -21,5 +22,6 @@ void foo(void)
> >  #ifdef CONFIG_SMP
> >  	DEFINE(NR_CPUS_BITS, ilog2(CONFIG_NR_CPUS));
> >  #endif
> > +	DEFINE(BLOATED_SPINLOCKS, sizeof(spinlock_t) > sizeof(int));
> >  	/* End of constants */
> >  }
> 
> Using that we could also do the following; it has not been near a compiler.
> 

[ Subject adjusted, CC: +Linus ]
> ---
> Subject: lockref: Use bloated_spinlocks to avoid explicit config dependencies
> 
> Avoid the fragile Kconfig construct guesstimating spinlock_t sizes; use a
> friendly compile-time test to determine this.
> 
> Not-Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  lib/Kconfig   | 3 ---
>  lib/lockref.c | 2 +-
>  2 files changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/lib/Kconfig b/lib/Kconfig
> index b3c8be0da17f..254af289d1d0 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -54,9 +54,6 @@ config ARCH_USE_CMPXCHG_LOCKREF
>  config CMPXCHG_LOCKREF
>  	def_bool y if ARCH_USE_CMPXCHG_LOCKREF
>  	depends on SMP
> -	depends on !GENERIC_LOCKBREAK
> -	depends on !DEBUG_SPINLOCK
> -	depends on !DEBUG_LOCK_ALLOC
>  
>  config CRC_CCITT
>  	tristate "CRC-CCITT functions"
> diff --git a/lib/lockref.c b/lib/lockref.c
> index 6f9d434c1521..a158fd86aa1a 100644
> --- a/lib/lockref.c
> +++ b/lib/lockref.c
> @@ -1,7 +1,7 @@
>  #include <linux/export.h>
>  #include <linux/lockref.h>
>  
> -#ifdef CONFIG_CMPXCHG_LOCKREF
> +#if defined(CONFIG_CMPXCHG_LOCKREF) && !BLOATED_SPINLOCKS

Having CONFIG_CMPXCHG_LOCKREF=y without actually using it could be
misleading.
Should we get rid of CONFIG_CMPXCHG_LOCKREF completely and have here:

#if defined(CONFIG_ARCH_USE_CMPXCHG_LOCKREF) && \
	defined(CONFIG_SMP) && !BLOATED_SPINLOCKS

?

-- 
 Kirill A. Shutemov


* Re: [PATCH] mm: create a separate slab for page->ptl allocation
  2013-11-06 13:21           ` [PATCH] mm: create a separate slab for page->ptl allocation Kirill A. Shutemov
@ 2013-11-06 14:30             ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2013-11-06 14:30 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Kirill A. Shutemov, Ingo Molnar, linux-kernel,
	linux-mm, linux-arch

On Wed, Nov 06, 2013 at 03:21:55PM +0200, Kirill A. Shutemov wrote:
> On Wed, Nov 06, 2013 at 10:31:31AM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 06, 2013 at 01:13:11AM +0200, Kirill A. Shutemov wrote:
> > > I would like to get rid of __ptlock_alloc()/__ptlock_free() too, but I
> > > don't see a way within C: we need to know sizeof(spinlock_t) on
> > > preprocessor stage.
> > > 
> > > We can have a hack on kbuild level: write small helper program to find out
> > > sizeof(spinlock_t) before start building and turn it into define.
> > > But it's overkill from my POV. And cross-compilation will be a fun.
> > 
> > Ah, I just remembered, we have such a thing!
> 
> Great!
> 
> > @@ -1354,7 +1356,7 @@ static inline bool ptlock_init(struct page *page)
> >  	 * slab code uses page->slab_cache and page->first_page (for tail
> >  	 * pages), which share storage with page->ptl.
> >  	 */
> > -	VM_BUG_ON(page->ptl);
> > +	VM_BUG_ON(*(unsigned long *)&page->ptl);
> 
> Huh? Why not direct cast to unsigned long?
> 
> VM_BUG_ON((unsigned long)page->ptl);

I tried, GCC didn't dig that. I think that's because spinlock_t is a
composite type and you cannot cast that to a primitive type.
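
[ Illustration, not part of the original mail: a minimal sketch of why the
  direct cast is rejected while the pointer form compiles. fake_spinlock_t
  and both helper names are hypothetical; the sketch assumes a lock that is
  a struct wrapping a single unsigned long, which is not the kernel's real
  spinlock_t. ]

```c
#include <string.h>

/* Hypothetical stand-in for an embedded lock: a struct holding one word.
 * This is an assumption for illustration, not the kernel definition. */
typedef struct {
	unsigned long raw;
} fake_spinlock_t;

/*
 * "(unsigned long)*lock" would not compile: C only permits casts between
 * scalar types, and a struct is not a scalar type.
 */

/* Pointer form, as in the quoted VM_BUG_ON(): read the lock's storage
 * through an unsigned long pointer. Valid here because the first (and
 * only) member of the struct is itself an unsigned long. */
static unsigned long lock_word_ptrcast(fake_spinlock_t *lock)
{
	return *(unsigned long *)lock;
}

/* Alternative that does not depend on the member layout: copy the bytes. */
static unsigned long lock_word_memcpy(const fake_spinlock_t *lock)
{
	unsigned long word;

	_Static_assert(sizeof(fake_spinlock_t) >= sizeof(unsigned long),
		       "lock must be at least one word wide");
	memcpy(&word, lock, sizeof(word));
	return word;
}
```

[ Either helper returns zero for a zero-initialised lock, which is the
  property the assertion checks for. ]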


* Re: lockref: Use bloated_spinlocks to avoid explicit config dependencies
  2013-11-06 13:31             ` lockref: Use bloated_spinlocks to avoid explicit config dependencies Kirill A. Shutemov
@ 2013-11-06 14:32               ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2013-11-06 14:32 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Kirill A. Shutemov, Ingo Molnar, linux-kernel,
	linux-mm, linux-arch, Linus Torvalds

On Wed, Nov 06, 2013 at 03:31:12PM +0200, Kirill A. Shutemov wrote:
> Should we get rid of CONFIG_CMPXCHG_LOCKREF completely and have here:
> 
> #if defined(CONFIG_ARCH_USE_CMPXCHG_LOCKREF) && \
> 	defined(CONFIG_SMP) && !BLOATED_SPINLOCKS
> 

Yeah, that might make more sense.
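
[ Illustration, not part of the original mail: a rough sketch of how the
  combined condition would behave. The three macros at the top are written
  by hand here; in the kernel they would come from Kconfig and from the
  header generated out of kernel/bounds.c. fake_spinlock_t,
  USE_CMPXCHG_LOCKREF and lockref_uses_cmpxchg() are hypothetical names. ]

```c
/* Hand-written stand-ins for Kconfig output and the generated bounds
 * header; the values are assumptions chosen for this example. */
#define CONFIG_ARCH_USE_CMPXCHG_LOCKREF 1
#define CONFIG_SMP 1
#define BLOATED_SPINLOCKS 0

/* Hypothetical non-debug lock: exactly the size of an int. */
typedef struct {
	unsigned int slock;
} fake_spinlock_t;

/* The preprocessor cannot evaluate sizeof, but the compiler can verify
 * that the hand-written symbol agrees with the real size. */
_Static_assert(BLOATED_SPINLOCKS == (sizeof(fake_spinlock_t) > sizeof(int)),
	       "BLOATED_SPINLOCKS disagrees with sizeof(fake_spinlock_t)");

/* The condition proposed in the thread, folded into one local switch. */
#if defined(CONFIG_ARCH_USE_CMPXCHG_LOCKREF) && \
	defined(CONFIG_SMP) && !BLOATED_SPINLOCKS
#define USE_CMPXCHG_LOCKREF 1
#else
#define USE_CMPXCHG_LOCKREF 0
#endif

/* Reports which path the build selected: 1 for cmpxchg, 0 for fallback. */
static int lockref_uses_cmpxchg(void)
{
	return USE_CMPXCHG_LOCKREF;
}
```

[ Setting BLOATED_SPINLOCKS to 1 without enlarging fake_spinlock_t trips
  the static assertion, which is the kind of mismatch a generated symbol
  is meant to rule out. ]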


Thread overview: 17+ messages
2013-10-22 11:53 [PATCH] mm: create a separate slab for page->ptl allocation Kirill A. Shutemov
2013-10-22 12:55 ` Fengguang Wu
2013-11-04 10:42 ` Kirill A. Shutemov
2013-11-05 23:01 ` Andrew Morton
2013-11-05 22:42   ` Kirill A. Shutemov
2013-11-05 23:56     ` Andrew Morton
2013-11-05 23:13       ` Kirill A. Shutemov
2013-11-06  0:43         ` Andrew Morton
2013-11-06  9:31         ` Peter Zijlstra
2013-11-06 11:18           ` Peter Zijlstra
2013-11-06 13:31             ` lockref: Use bloated_spinlocks to avoid explicit config dependencies Kirill A. Shutemov
2013-11-06 14:32               ` Peter Zijlstra
2013-11-06 13:21           ` [PATCH] mm: create a separate slab for page->ptl allocation Kirill A. Shutemov
2013-11-06 14:30             ` Peter Zijlstra
2013-11-06 10:34         ` Will Deacon
2013-11-06 10:49           ` Geert Uytterhoeven
2013-11-06 11:02           ` Peter Zijlstra