* [PATCH] mm: create a separate slab for page->ptl allocation
@ 2013-10-22 11:53 Kirill A. Shutemov
2013-10-22 12:55 ` Fengguang Wu
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Kirill A. Shutemov @ 2013-10-22 11:53 UTC (permalink / raw)
To: Andrew Morton, Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, linux-mm, linux-arch, Kirill A. Shutemov
If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled, spinlock_t on x86_64
is 72 bytes. For page->ptl, they will be allocated from the kmalloc-96 slab,
so we lose 24 bytes on each. An average system can easily allocate a few tens
of thousands of page->ptl, and the overhead is significant.
Let's create a separate slab for page->ptl allocation to solve this.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
include/linux/mm.h | 8 ++++++++
init/main.c | 2 +-
mm/memory.c | 12 ++++++++++--
3 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9a4a873b2f..2de5da0a41 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1233,6 +1233,7 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
#endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */
#if USE_SPLIT_PTE_PTLOCKS
+void __init ptlock_cache_init(void);
bool __ptlock_alloc(struct page *page);
void __ptlock_free(struct page *page);
static inline bool ptlock_alloc(struct page *page)
@@ -1285,6 +1286,7 @@ static inline void pte_lock_deinit(struct page *page)
}
#else /* !USE_SPLIT_PTE_PTLOCKS */
+static inline void ptlock_cache_init(void) {}
/*
* We use mm->page_table_lock to guard all pagetable pages of the mm.
*/
@@ -1296,6 +1298,12 @@ static inline bool ptlock_init(struct page *page) { return true; }
static inline void pte_lock_deinit(struct page *page) {}
#endif /* USE_SPLIT_PTE_PTLOCKS */
+static inline void pgtable_init(void)
+{
+ ptlock_cache_init();
+ pgtable_cache_init();
+}
+
static inline bool pgtable_page_ctor(struct page *page)
{
inc_zone_page_state(page, NR_PAGETABLE);
diff --git a/init/main.c b/init/main.c
index af310afbef..c71b505392 100644
--- a/init/main.c
+++ b/init/main.c
@@ -466,7 +466,7 @@ static void __init mm_init(void)
mem_init();
kmem_cache_init();
percpu_init_late();
- pgtable_cache_init();
+ pgtable_init();
vmalloc_init();
}
diff --git a/mm/memory.c b/mm/memory.c
index 7e11f745bc..d7e583e270 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4332,11 +4332,19 @@ void copy_user_huge_page(struct page *dst, struct page *src,
#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
#if USE_SPLIT_PTE_PTLOCKS
+struct kmem_cache *page_ptl_cachep;
+void __init ptlock_cache_init(void)
+{
+ if (sizeof(spinlock_t) > sizeof(long))
+ page_ptl_cachep = kmem_cache_create("page->ptl",
+ sizeof(spinlock_t), 0, SLAB_PANIC, NULL);
+}
+
bool __ptlock_alloc(struct page *page)
{
spinlock_t *ptl;
- ptl = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
+ ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
if (!ptl)
return false;
page->ptl = (unsigned long)ptl;
@@ -4346,6 +4354,6 @@ bool __ptlock_alloc(struct page *page)
void __ptlock_free(struct page *page)
{
if (sizeof(spinlock_t) > sizeof(page->ptl))
- kfree((spinlock_t *)page->ptl);
+ kmem_cache_free(page_ptl_cachep, (spinlock_t *)page->ptl);
}
#endif
--
1.8.4.rc3
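
For illustration only, the arithmetic in the changelog above can be sketched in plain userspace C. The size-class table is an assumption (a common x86_64 kmalloc layout that includes the 96- and 192-byte caches; real kernels vary by config and allocator), and `kmalloc_bucket()` and `kmalloc_waste()` are hypothetical helper names, not kernel APIs.

```c
#include <stddef.h>

/* Assumed kmalloc size classes; not taken from any particular kernel build. */
static const size_t kmalloc_classes[] = {
	8, 16, 32, 64, 96, 128, 192, 256, 512, 1024
};

/* Return the smallest assumed size class that fits `size`, or 0 if none does. */
static size_t kmalloc_bucket(size_t size)
{
	size_t i;

	for (i = 0; i < sizeof(kmalloc_classes) / sizeof(kmalloc_classes[0]); i++)
		if (kmalloc_classes[i] >= size)
			return kmalloc_classes[i];
	return 0;
}

/* Bytes lost per object when `size` bytes come from a generic kmalloc cache. */
static size_t kmalloc_waste(size_t size)
{
	size_t bucket = kmalloc_bucket(size);

	return bucket ? bucket - size : 0;
}
```

Under these assumptions kmalloc_waste(72) is 24, so, taking 30,000 locks as an example figure, roughly 720,000 bytes go to padding. A dedicated cache whose object size is exactly sizeof(spinlock_t) avoids that padding, ignoring any per-slab overhead.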
* Re: [PATCH] mm: create a separate slab for page->ptl allocation
2013-10-22 11:53 [PATCH] mm: create a separate slab for page->ptl allocation Kirill A. Shutemov
@ 2013-10-22 12:55 ` Fengguang Wu
2013-11-04 10:42 ` Kirill A. Shutemov
2013-11-05 23:01 ` Andrew Morton
2 siblings, 0 replies; 17+ messages in thread
From: Fengguang Wu @ 2013-10-22 12:55 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-kernel,
linux-mm, linux-arch
On Tue, Oct 22, 2013 at 02:53:59PM +0300, Kirill A. Shutemov wrote:
> If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled spinlock_t on x86_64
> is 72 bytes. For page->ptl they will be allocated from kmalloc-96 slab,
> so we loose 24 on each. An average system can easily allocate few tens
> thousands of page->ptl and overhead is significant.
>
> Let's create a separate slab for page->ptl allocation to solve this.
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
On a 4p server, we noticed up to a +469.1% increase in the will-it-scale
page_fault3 test case and +199.8% in vm-scalability case-shm-pread-seq-mt.
    5c02216ce3110aab070d      5a58baaa0a1af0a43d7c
------------------------  ------------------------
               300409.00   +440.2%      1622770.80  TOTAL will-it-scale.page_fault3.90.threads

    5c02216ce3110aab070d      5a58baaa0a1af0a43d7c
------------------------  ------------------------
               291257.80   +469.1%      1657582.20  TOTAL will-it-scale.page_fault3.120.threads
...
    5c02216ce3110aab070d      5a58baaa0a1af0a43d7c
------------------------  ------------------------
              4034831.40   +199.8%     12095649.80  TOTAL vm-scalability.throughput
Thanks,
Fengguang
* RE: [PATCH] mm: create a separate slab for page->ptl allocation
2013-10-22 11:53 [PATCH] mm: create a separate slab for page->ptl allocation Kirill A. Shutemov
2013-10-22 12:55 ` Fengguang Wu
@ 2013-11-04 10:42 ` Kirill A. Shutemov
2013-11-05 23:01 ` Andrew Morton
2 siblings, 0 replies; 17+ messages in thread
From: Kirill A. Shutemov @ 2013-11-04 10:42 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, linux-mm, linux-arch,
Kirill A. Shutemov
Kirill A. Shutemov wrote:
> If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled spinlock_t on x86_64
> is 72 bytes. For page->ptl they will be allocated from kmalloc-96 slab,
> so we loose 24 on each. An average system can easily allocate few tens
> thousands of page->ptl and overhead is significant.
>
> Let's create a separate slab for page->ptl allocation to solve this.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
ping?
--
Kirill A. Shutemov
* Re: [PATCH] mm: create a separate slab for page->ptl allocation
2013-11-05 23:01 ` Andrew Morton
@ 2013-11-05 22:42 ` Kirill A. Shutemov
2013-11-05 23:56 ` Andrew Morton
0 siblings, 1 reply; 17+ messages in thread
From: Kirill A. Shutemov @ 2013-11-05 22:42 UTC (permalink / raw)
To: Andrew Morton
Cc: Kirill A. Shutemov, Peter Zijlstra, Ingo Molnar, linux-kernel,
linux-mm, linux-arch
[ sorry, resend to all ]
On Tue, Nov 05, 2013 at 03:01:45PM -0800, Andrew Morton wrote:
> On Tue, 22 Oct 2013 14:53:59 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
>
> > If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled spinlock_t on x86_64
> > is 72 bytes. For page->ptl they will be allocated from kmalloc-96 slab,
> > so we loose 24 on each. An average system can easily allocate few tens
> > thousands of page->ptl and overhead is significant.
> >
> > Let's create a separate slab for page->ptl allocation to solve this.
> >
> > ...
> >
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -4332,11 +4332,19 @@ void copy_user_huge_page(struct page *dst, struct page *src,
> > #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
> >
> > #if USE_SPLIT_PTE_PTLOCKS
> > +struct kmem_cache *page_ptl_cachep;
> > +void __init ptlock_cache_init(void)
> > +{
> > + if (sizeof(spinlock_t) > sizeof(long))
> > + page_ptl_cachep = kmem_cache_create("page->ptl",
> > + sizeof(spinlock_t), 0, SLAB_PANIC, NULL);
> > +}
>
> Confused. If (sizeof(spinlock_t) > sizeof(long)) happens to be false
> then the kernel will later crash. It would be better to use BUILD_BUG_ON()
> here, if that works. Otherwise BUG_ON.
If (sizeof(spinlock_t) > sizeof(long)) is false, we don't need to dynamically
allocate page->ptl: it's embedded in struct page itself. __ptlock_alloc() is
never called in this case.
> Also, we have the somewhat silly KMEM_CACHE() macro, but it looks
> inapplicable here?
The first argument of KMEM_CACHE() is a struct name, but we have a typedef
here.
> > bool __ptlock_alloc(struct page *page)
> > {
> > spinlock_t *ptl;
> >
> > - ptl = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
> > + ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
> > if (!ptl)
> > return false;
> > page->ptl = (unsigned long)ptl;
> > @@ -4346,6 +4354,6 @@ bool __ptlock_alloc(struct page *page)
> > void __ptlock_free(struct page *page)
> > {
> > if (sizeof(spinlock_t) > sizeof(page->ptl))
> > - kfree((spinlock_t *)page->ptl);
> > + kmem_cache_free(page_ptl_cachep, (spinlock_t *)page->ptl);
>
> A void* cast would suffice here, but I suppose the spinlock_t* cast has
> some documentation value.
Right.
--
Kirill A. Shutemov
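
A rough userspace sketch of the pattern discussed above, under stated assumptions: the lock either fits in the unsigned long slot and is used in place, or the slot holds a pointer to a separately allocated lock. The `demo_` names are hypothetical, the 72-byte size is taken from the changelog, calloc()/free() stand in for the kernel allocator, and unsigned long is assumed to be wide enough to hold a pointer, as it is on Linux.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a debug-sized spinlock_t (72 bytes). */
typedef struct { unsigned char raw[72]; } demo_lock_t;

/* Hypothetical stand-in for the relevant part of struct page. */
struct demo_page {
	unsigned long ptl;	/* embedded lock, or pointer to an allocated one */
};

/* Compile-time constant: true when the lock cannot live inside page->ptl. */
#define DEMO_PTL_IS_POINTER (sizeof(demo_lock_t) > sizeof(unsigned long))

static bool demo_ptlock_alloc(struct demo_page *page)
{
	demo_lock_t *ptl;

	if (!DEMO_PTL_IS_POINTER)
		return true;	/* lock is embedded, nothing to allocate */

	ptl = calloc(1, sizeof(*ptl));
	if (!ptl)
		return false;
	page->ptl = (unsigned long)ptl;
	return true;
}

static demo_lock_t *demo_ptlock_ptr(struct demo_page *page)
{
	if (DEMO_PTL_IS_POINTER)
		return (demo_lock_t *)page->ptl;
	return (demo_lock_t *)&page->ptl;
}

static void demo_ptlock_free(struct demo_page *page)
{
	if (DEMO_PTL_IS_POINTER)
		free((void *)page->ptl);
	page->ptl = 0;
}
```

Because DEMO_PTL_IS_POINTER is a constant expression, an optimizing compiler can drop whichever branch is dead, which is why the allocation path is simply never reached when the lock fits into the slot.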
* Re: [PATCH] mm: create a separate slab for page->ptl allocation
2013-10-22 11:53 [PATCH] mm: create a separate slab for page->ptl allocation Kirill A. Shutemov
2013-10-22 12:55 ` Fengguang Wu
2013-11-04 10:42 ` Kirill A. Shutemov
@ 2013-11-05 23:01 ` Andrew Morton
2013-11-05 22:42 ` Kirill A. Shutemov
2 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2013-11-05 23:01 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, linux-mm, linux-arch
On Tue, 22 Oct 2013 14:53:59 +0300 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled spinlock_t on x86_64
> is 72 bytes. For page->ptl they will be allocated from kmalloc-96 slab,
> so we loose 24 on each. An average system can easily allocate few tens
> thousands of page->ptl and overhead is significant.
>
> Let's create a separate slab for page->ptl allocation to solve this.
>
> ...
>
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4332,11 +4332,19 @@ void copy_user_huge_page(struct page *dst, struct page *src,
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
>
> #if USE_SPLIT_PTE_PTLOCKS
> +struct kmem_cache *page_ptl_cachep;
> +void __init ptlock_cache_init(void)
> +{
> + if (sizeof(spinlock_t) > sizeof(long))
> + page_ptl_cachep = kmem_cache_create("page->ptl",
> + sizeof(spinlock_t), 0, SLAB_PANIC, NULL);
> +}
Confused. If (sizeof(spinlock_t) > sizeof(long)) happens to be false
then the kernel will later crash. It would be better to use BUILD_BUG_ON()
here, if that works. Otherwise BUG_ON.
Also, we have the somewhat silly KMEM_CACHE() macro, but it looks
inapplicable here?
> bool __ptlock_alloc(struct page *page)
> {
> spinlock_t *ptl;
>
> - ptl = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
> + ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
> if (!ptl)
> return false;
> page->ptl = (unsigned long)ptl;
> @@ -4346,6 +4354,6 @@ bool __ptlock_alloc(struct page *page)
> void __ptlock_free(struct page *page)
> {
> if (sizeof(spinlock_t) > sizeof(page->ptl))
> - kfree((spinlock_t *)page->ptl);
> + kmem_cache_free(page_ptl_cachep, (spinlock_t *)page->ptl);
A void* cast would suffice here, but I suppose the spinlock_t* cast has
some documentation value.
* Re: [PATCH] mm: create a separate slab for page->ptl allocation
2013-11-05 23:56 ` Andrew Morton
@ 2013-11-05 23:13 ` Kirill A. Shutemov
2013-11-06 0:43 ` Andrew Morton
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Kirill A. Shutemov @ 2013-11-05 23:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Kirill A. Shutemov, Peter Zijlstra, Ingo Molnar, linux-kernel,
linux-mm, linux-arch
On Tue, Nov 05, 2013 at 03:56:19PM -0800, Andrew Morton wrote:
> On Wed, 6 Nov 2013 00:42:17 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
>
> > > > #if USE_SPLIT_PTE_PTLOCKS
> > > > +struct kmem_cache *page_ptl_cachep;
> > > > +void __init ptlock_cache_init(void)
> > > > +{
> > > > + if (sizeof(spinlock_t) > sizeof(long))
> > > > + page_ptl_cachep = kmem_cache_create("page->ptl",
> > > > + sizeof(spinlock_t), 0, SLAB_PANIC, NULL);
> > > > +}
> > >
> > > Confused. If (sizeof(spinlock_t) > sizeof(long)) happens to be false
> > > then the kernel will later crash. It would be better to use BUILD_BUG_ON()
> > > here, if that works. Otherwise BUG_ON.
> >
> > if (sizeof(spinlock_t) > sizeof(long)) is false, we don't need dynamicly
> > allocate page->ptl. It's embedded to struct page itself. __ptlock_alloc()
> > never called in this case.
>
> OK. Please add a comment explaining this so the next reader doesn't get
> tripped up like I was.
Okay, I will tomorrow.
> Really the function shouldn't exist in this case. It is __init so the
> sin is not terrible, but can this be arranged?
I would like to get rid of __ptlock_alloc()/__ptlock_free() too, but I
don't see a way within C: we need to know sizeof(spinlock_t) at the
preprocessor stage.
We could have a hack at the kbuild level: write a small helper program to find
out sizeof(spinlock_t) before the build starts and turn it into a define.
But it's overkill from my POV. And cross-compilation will be fun.
--
Kirill A. Shutemov
* Re: [PATCH] mm: create a separate slab for page->ptl allocation
2013-11-05 22:42 ` Kirill A. Shutemov
@ 2013-11-05 23:56 ` Andrew Morton
2013-11-05 23:13 ` Kirill A. Shutemov
0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2013-11-05 23:56 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Kirill A. Shutemov, Peter Zijlstra, Ingo Molnar, linux-kernel,
linux-mm, linux-arch
On Wed, 6 Nov 2013 00:42:17 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > > #if USE_SPLIT_PTE_PTLOCKS
> > > +struct kmem_cache *page_ptl_cachep;
> > > +void __init ptlock_cache_init(void)
> > > +{
> > > + if (sizeof(spinlock_t) > sizeof(long))
> > > + page_ptl_cachep = kmem_cache_create("page->ptl",
> > > + sizeof(spinlock_t), 0, SLAB_PANIC, NULL);
> > > +}
> >
> > Confused. If (sizeof(spinlock_t) > sizeof(long)) happens to be false
> > then the kernel will later crash. It would be better to use BUILD_BUG_ON()
> > here, if that works. Otherwise BUG_ON.
>
> if (sizeof(spinlock_t) > sizeof(long)) is false, we don't need dynamicly
> allocate page->ptl. It's embedded to struct page itself. __ptlock_alloc()
> never called in this case.
OK. Please add a comment explaining this so the next reader doesn't get
tripped up like I was.
Really the function shouldn't exist in this case. It is __init so the
sin is not terrible, but can this be arranged?
* Re: [PATCH] mm: create a separate slab for page->ptl allocation
2013-11-05 23:13 ` Kirill A. Shutemov
@ 2013-11-06 0:43 ` Andrew Morton
2013-11-06 9:31 ` Peter Zijlstra
2013-11-06 10:34 ` Will Deacon
2 siblings, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2013-11-06 0:43 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Kirill A. Shutemov, Peter Zijlstra, Ingo Molnar, linux-kernel,
linux-mm, linux-arch
On Wed, 6 Nov 2013 01:13:11 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> > Really the function shouldn't exist in this case. It is __init so the
> > sin is not terrible, but can this be arranged?
>
> I would like to get rid of __ptlock_alloc()/__ptlock_free() too, but I
> don't see a way within C: we need to know sizeof(spinlock_t) on
> preprocessor stage.
>
> We can have a hack on kbuild level: write small helper program to find out
> sizeof(spinlock_t) before start building and turn it into define.
> But it's overkill from my POV. And cross-compilation will be a fun.
Yes, it doesn't seem worth the fuss. The compiler will remove all this
code anyway, so for example ptlock_cache_init() becomes an empty function.
* Re: [PATCH] mm: create a separate slab for page->ptl allocation
2013-11-05 23:13 ` Kirill A. Shutemov
2013-11-06 0:43 ` Andrew Morton
@ 2013-11-06 9:31 ` Peter Zijlstra
2013-11-06 11:18 ` Peter Zijlstra
2013-11-06 13:21 ` [PATCH] mm: create a separate slab for page->ptl allocation Kirill A. Shutemov
2013-11-06 10:34 ` Will Deacon
2 siblings, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2013-11-06 9:31 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrew Morton, Kirill A. Shutemov, Ingo Molnar, linux-kernel,
linux-mm, linux-arch
On Wed, Nov 06, 2013 at 01:13:11AM +0200, Kirill A. Shutemov wrote:
> I would like to get rid of __ptlock_alloc()/__ptlock_free() too, but I
> don't see a way within C: we need to know sizeof(spinlock_t) on
> preprocessor stage.
>
> We can have a hack on kbuild level: write small helper program to find out
> sizeof(spinlock_t) before start building and turn it into define.
> But it's overkill from my POV. And cross-compilation will be a fun.
Ah, I just remembered, we have such a thing!
---
Subject: mm: Properly separate the bloated ptl from the regular case
Use kernel/bounds.c to convert build-time spinlock_t size into a
preprocessor symbol and apply that to properly separate the page::ptl
situation.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
include/linux/mm.h | 24 +++++++++++++-----------
include/linux/mm_types.h | 9 +++++----
kernel/bounds.c | 2 ++
mm/memory.c | 11 +++++------
4 files changed, 25 insertions(+), 21 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d0339741b6ce..6ab26704671b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1317,27 +1317,29 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
#endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */
#if USE_SPLIT_PTE_PTLOCKS
-bool __ptlock_alloc(struct page *page);
-void __ptlock_free(struct page *page);
+#if BLOATED_SPINLOCKS
+extern bool ptlock_alloc(struct page *page);
+extern void ptlock_free(struct page *page);
+
+static inline spinlock_t *ptlock_ptr(struct page *page)
+{
+ return page->ptl;
+}
+#else /* BLOATED_SPINLOCKS */
static inline bool ptlock_alloc(struct page *page)
{
- if (sizeof(spinlock_t) > sizeof(page->ptl))
- return __ptlock_alloc(page);
return true;
}
+
static inline void ptlock_free(struct page *page)
{
- if (sizeof(spinlock_t) > sizeof(page->ptl))
- __ptlock_free(page);
}
static inline spinlock_t *ptlock_ptr(struct page *page)
{
- if (sizeof(spinlock_t) > sizeof(page->ptl))
- return (spinlock_t *) page->ptl;
- else
- return (spinlock_t *) &page->ptl;
+ return &page->ptl;
}
+#endif /* BLOATED_SPINLOCKS */
static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
{
@@ -1354,7 +1356,7 @@ static inline bool ptlock_init(struct page *page)
* slab code uses page->slab_cache and page->first_page (for tail
* pages), which share storage with page->ptl.
*/
- VM_BUG_ON(page->ptl);
+ VM_BUG_ON(*(unsigned long *)&page->ptl);
if (!ptlock_alloc(page))
return false;
spin_lock_init(ptlock_ptr(page));
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5bee515c4505..f706743b63bb 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -155,10 +155,11 @@ struct page {
* system if PG_buddy is set.
*/
#if USE_SPLIT_PTE_PTLOCKS
- unsigned long ptl; /* It's spinlock_t if it fits to long,
- * otherwise it's pointer to dynamicaly
- * allocated spinlock_t.
- */
+#if BLOATED_SPINLOCKS
+ spinlock_t *ptl;
+#else
+ spinlock_t ptl;
+#endif
#endif
struct kmem_cache *slab_cache; /* SL[AU]B: Pointer to slab */
struct page *first_page; /* Compound tail pages */
diff --git a/kernel/bounds.c b/kernel/bounds.c
index e8ca97b5c386..5982437eca2c 100644
--- a/kernel/bounds.c
+++ b/kernel/bounds.c
@@ -11,6 +11,7 @@
#include <linux/kbuild.h>
#include <linux/page_cgroup.h>
#include <linux/log2.h>
+#include <linux/spinlock.h>
void foo(void)
{
@@ -21,5 +22,6 @@ void foo(void)
#ifdef CONFIG_SMP
DEFINE(NR_CPUS_BITS, ilog2(CONFIG_NR_CPUS));
#endif
+ DEFINE(BLOATED_SPINLOCKS, sizeof(spinlock_t) > sizeof(int));
/* End of constants */
}
diff --git a/mm/memory.c b/mm/memory.c
index 6f7bdee617e2..8356eac27d0a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4271,21 +4271,20 @@ void copy_user_huge_page(struct page *dst, struct page *src,
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
-#if USE_SPLIT_PTE_PTLOCKS
-bool __ptlock_alloc(struct page *page)
+#if USE_SPLIT_PTE_PTLOCKS && BLOATED_SPINLOCKS
+bool ptlock_alloc(struct page *page)
{
spinlock_t *ptl;
ptl = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
if (!ptl)
return false;
- page->ptl = (unsigned long)ptl;
+ page->ptl = ptl;
return true;
}
-void __ptlock_free(struct page *page)
+void ptlock_free(struct page *page)
{
- if (sizeof(spinlock_t) > sizeof(page->ptl))
- kfree((spinlock_t *)page->ptl);
+ kfree(page->ptl);
}
#endif
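
As a rough userspace illustration of the split in the patch above (not kernel code): once the size test is available to the preprocessor, the two cases become separate definitions instead of runtime-looking sizeof() checks. Here `BLOATED_SPINLOCKS` is set by hand, whereas the patch derives it from sizeof(spinlock_t) through kernel/bounds.c; the `sketch_` names, the 72-byte debug-sized lock, and calloc()/free() are stand-ins chosen for this sketch.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Assumption: set by hand here; the patch generates it at build time. */
#ifndef BLOATED_SPINLOCKS
#define BLOATED_SPINLOCKS 1
#endif

#if BLOATED_SPINLOCKS
typedef struct { unsigned char raw[72]; } sketch_spinlock_t;	/* debug-sized */
#else
typedef struct { unsigned int slock; } sketch_spinlock_t;	/* int-sized */
#endif

/* Hypothetical stand-in for the relevant part of struct page. */
struct sketch_page {
#if BLOATED_SPINLOCKS
	sketch_spinlock_t *ptl;	/* allocated separately */
#else
	sketch_spinlock_t ptl;	/* embedded in the page */
#endif
};

#if BLOATED_SPINLOCKS
static bool sketch_ptlock_alloc(struct sketch_page *page)
{
	page->ptl = calloc(1, sizeof(*page->ptl));
	return page->ptl != NULL;
}

static void sketch_ptlock_free(struct sketch_page *page)
{
	free(page->ptl);
	page->ptl = NULL;
}

static sketch_spinlock_t *sketch_ptlock_ptr(struct sketch_page *page)
{
	return page->ptl;
}
#else /* !BLOATED_SPINLOCKS */
static bool sketch_ptlock_alloc(struct sketch_page *page)
{
	(void)page;	/* nothing to allocate: the lock is embedded */
	return true;
}

static void sketch_ptlock_free(struct sketch_page *page)
{
	(void)page;
}

static sketch_spinlock_t *sketch_ptlock_ptr(struct sketch_page *page)
{
	return &page->ptl;
}
#endif /* BLOATED_SPINLOCKS */
```

The design point is that the field has its real type in both configurations, so the casts to and from unsigned long go away and the allocation helpers need not exist at all in the small-lock case.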
* Re: [PATCH] mm: create a separate slab for page->ptl allocation
2013-11-05 23:13 ` Kirill A. Shutemov
2013-11-06 0:43 ` Andrew Morton
2013-11-06 9:31 ` Peter Zijlstra
@ 2013-11-06 10:34 ` Will Deacon
2013-11-06 10:49 ` Geert Uytterhoeven
2013-11-06 11:02 ` Peter Zijlstra
2 siblings, 2 replies; 17+ messages in thread
From: Will Deacon @ 2013-11-06 10:34 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrew Morton, Kirill A. Shutemov, Peter Zijlstra, Ingo Molnar,
linux-kernel, linux-mm, linux-arch
On Tue, Nov 05, 2013 at 11:13:11PM +0000, Kirill A. Shutemov wrote:
> On Tue, Nov 05, 2013 at 03:56:19PM -0800, Andrew Morton wrote:
> > On Wed, 6 Nov 2013 00:42:17 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> >
> > > > > #if USE_SPLIT_PTE_PTLOCKS
> > > > > +struct kmem_cache *page_ptl_cachep;
> > > > > +void __init ptlock_cache_init(void)
> > > > > +{
> > > > > + if (sizeof(spinlock_t) > sizeof(long))
> > > > > + page_ptl_cachep = kmem_cache_create("page->ptl",
> > > > > + sizeof(spinlock_t), 0, SLAB_PANIC, NULL);
> > > > > +}
> > > >
> > > > Confused. If (sizeof(spinlock_t) > sizeof(long)) happens to be false
> > > > then the kernel will later crash. It would be better to use BUILD_BUG_ON()
> > > > here, if that works. Otherwise BUG_ON.
> > >
> > > if (sizeof(spinlock_t) > sizeof(long)) is false, we don't need dynamicly
> > > allocate page->ptl. It's embedded to struct page itself. __ptlock_alloc()
> > > never called in this case.
> >
> > OK. Please add a comment explaining this so the next reader doesn't get
> > tripped up like I was.
>
> Okay, I will tomorrow.
>
> > Really the function shouldn't exist in this case. It is __init so the
> > sin is not terrible, but can this be arranged?
>
> I would like to get rid of __ptlock_alloc()/__ptlock_free() too, but I
> don't see a way within C: we need to know sizeof(spinlock_t) on
> preprocessor stage.
FWIW: if the architecture selects ARCH_USE_CMPXCHG_LOCKREF, then a spinlock_t
is 32-bit (assuming that unsigned int is also 32-bit).
Will
* Re: [PATCH] mm: create a separate slab for page->ptl allocation
2013-11-06 10:34 ` Will Deacon
@ 2013-11-06 10:49 ` Geert Uytterhoeven
2013-11-06 11:02 ` Peter Zijlstra
1 sibling, 0 replies; 17+ messages in thread
From: Geert Uytterhoeven @ 2013-11-06 10:49 UTC (permalink / raw)
To: Will Deacon
Cc: Kirill A. Shutemov, Andrew Morton, Kirill A. Shutemov,
Peter Zijlstra, Ingo Molnar, linux-kernel, linux-mm, linux-arch
On Wed, Nov 6, 2013 at 11:34 AM, Will Deacon <will.deacon@arm.com> wrote:
> FWIW: if the architecture selects ARCH_USE_CMPXCHG_LOCKREF, then a spinlock_t
> is 32-bit (assuming that unsigned int is also 32-bit).
Linux already assumes (unsigned) int is 32-bit.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
* Re: [PATCH] mm: create a separate slab for page->ptl allocation
2013-11-06 10:34 ` Will Deacon
2013-11-06 10:49 ` Geert Uytterhoeven
@ 2013-11-06 11:02 ` Peter Zijlstra
1 sibling, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2013-11-06 11:02 UTC (permalink / raw)
To: Will Deacon
Cc: Kirill A. Shutemov, Andrew Morton, Kirill A. Shutemov,
Ingo Molnar, linux-kernel, linux-mm, linux-arch
On Wed, Nov 06, 2013 at 10:34:03AM +0000, Will Deacon wrote:
> FWIW: if the architecture selects ARCH_USE_CMPXCHG_LOCKREF, then a spinlock_t
> is 32-bit (assuming that unsigned int is also 32-bit).
Egads, talk about fragile. That thing relies on someone actually keeping
lib/Kconfig up-to-date.
* Re: [PATCH] mm: create a separate slab for page->ptl allocation
2013-11-06 9:31 ` Peter Zijlstra
@ 2013-11-06 11:18 ` Peter Zijlstra
2013-11-06 13:31 ` lockref: Use bloated_spinlocks to avoid explicit config dependencies Kirill A. Shutemov
2013-11-06 13:21 ` [PATCH] mm: create a separate slab for page->ptl allocation Kirill A. Shutemov
1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2013-11-06 11:18 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrew Morton, Kirill A. Shutemov, Ingo Molnar, linux-kernel,
linux-mm, linux-arch
On Wed, Nov 06, 2013 at 10:31:31AM +0100, Peter Zijlstra wrote:
> Subject: mm: Properly separate the bloated ptl from the regular case
>
> Use kernel/bounds.c to convert build-time spinlock_t size into a
> preprocessor symbol and apply that to properly separate the page::ptl
> situation.
>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
> include/linux/mm.h | 24 +++++++++++++-----------
> include/linux/mm_types.h | 9 +++++----
> kernel/bounds.c | 2 ++
> mm/memory.c | 11 +++++------
> 4 files changed, 25 insertions(+), 21 deletions(-)
>
> diff --git a/kernel/bounds.c b/kernel/bounds.c
> index e8ca97b5c386..5982437eca2c 100644
> --- a/kernel/bounds.c
> +++ b/kernel/bounds.c
> @@ -11,6 +11,7 @@
> #include <linux/kbuild.h>
> #include <linux/page_cgroup.h>
> #include <linux/log2.h>
> +#include <linux/spinlock.h>
>
> void foo(void)
> {
> @@ -21,5 +22,6 @@ void foo(void)
> #ifdef CONFIG_SMP
> DEFINE(NR_CPUS_BITS, ilog2(CONFIG_NR_CPUS));
> #endif
> + DEFINE(BLOATED_SPINLOCKS, sizeof(spinlock_t) > sizeof(int));
> /* End of constants */
> }
Using that, we could also do the following (it has not been near a compiler).
---
Subject: lockref: Use bloated_spinlocks to avoid explicit config dependencies
Avoid the fragile Kconfig construct guesstimating spinlock_t sizes; use a
friendly compile-time test to determine this.
Not-Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
lib/Kconfig | 3 ---
lib/lockref.c | 2 +-
2 files changed, 1 insertion(+), 4 deletions(-)
diff --git a/lib/Kconfig b/lib/Kconfig
index b3c8be0da17f..254af289d1d0 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -54,9 +54,6 @@ config ARCH_USE_CMPXCHG_LOCKREF
config CMPXCHG_LOCKREF
def_bool y if ARCH_USE_CMPXCHG_LOCKREF
depends on SMP
- depends on !GENERIC_LOCKBREAK
- depends on !DEBUG_SPINLOCK
- depends on !DEBUG_LOCK_ALLOC
config CRC_CCITT
tristate "CRC-CCITT functions"
diff --git a/lib/lockref.c b/lib/lockref.c
index 6f9d434c1521..a158fd86aa1a 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -1,7 +1,7 @@
#include <linux/export.h>
#include <linux/lockref.h>
-#ifdef CONFIG_CMPXCHG_LOCKREF
+#if defined(CONFIG_CMPXCHG_LOCKREF) && !BLOATED_SPINLOCKS
/*
* Allow weakly-ordered memory architectures to provide barrier-less
* Re: [PATCH] mm: create a separate slab for page->ptl allocation
2013-11-06 9:31 ` Peter Zijlstra
2013-11-06 11:18 ` Peter Zijlstra
@ 2013-11-06 13:21 ` Kirill A. Shutemov
2013-11-06 14:30 ` Peter Zijlstra
1 sibling, 1 reply; 17+ messages in thread
From: Kirill A. Shutemov @ 2013-11-06 13:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrew Morton, Kirill A. Shutemov, Ingo Molnar, linux-kernel,
linux-mm, linux-arch
On Wed, Nov 06, 2013 at 10:31:31AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 06, 2013 at 01:13:11AM +0200, Kirill A. Shutemov wrote:
> > I would like to get rid of __ptlock_alloc()/__ptlock_free() too, but I
> > don't see a way within C: we need to know sizeof(spinlock_t) on
> > preprocessor stage.
> >
> > We can have a hack on kbuild level: write small helper program to find out
> > sizeof(spinlock_t) before start building and turn it into define.
> > But it's overkill from my POV. And cross-compilation will be a fun.
>
> Ah, I just remembered, we have such a thing!
Great!
> @@ -1354,7 +1356,7 @@ static inline bool ptlock_init(struct page *page)
> * slab code uses page->slab_cache and page->first_page (for tail
> * pages), which share storage with page->ptl.
> */
> - VM_BUG_ON(page->ptl);
> + VM_BUG_ON(*(unsigned long *)&page->ptl);
Huh? Why not direct cast to unsigned long?
VM_BUG_ON((unsigned long)page->ptl);
Otherwise:
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
* lockref: Use bloated_spinlocks to avoid explicit config dependencies
2013-11-06 11:18 ` Peter Zijlstra
@ 2013-11-06 13:31 ` Kirill A. Shutemov
2013-11-06 14:32 ` Peter Zijlstra
0 siblings, 1 reply; 17+ messages in thread
From: Kirill A. Shutemov @ 2013-11-06 13:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrew Morton, Kirill A. Shutemov, Ingo Molnar, linux-kernel,
linux-mm, linux-arch, Linus Torvalds
On Wed, Nov 06, 2013 at 12:18:45PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 06, 2013 at 10:31:31AM +0100, Peter Zijlstra wrote:
> > Subject: mm: Properly separate the bloated ptl from the regular case
> >
> > Use kernel/bounds.c to convert build-time spinlock_t size into a
> > preprocessor symbol and apply that to properly separate the page::ptl
> > situation.
> >
> > Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> > ---
> > include/linux/mm.h | 24 +++++++++++++-----------
> > include/linux/mm_types.h | 9 +++++----
> > kernel/bounds.c | 2 ++
> > mm/memory.c | 11 +++++------
> > 4 files changed, 25 insertions(+), 21 deletions(-)
> >
> > diff --git a/kernel/bounds.c b/kernel/bounds.c
> > index e8ca97b5c386..5982437eca2c 100644
> > --- a/kernel/bounds.c
> > +++ b/kernel/bounds.c
> > @@ -11,6 +11,7 @@
> > #include <linux/kbuild.h>
> > #include <linux/page_cgroup.h>
> > #include <linux/log2.h>
> > +#include <linux/spinlock.h>
> >
> > void foo(void)
> > {
> > @@ -21,5 +22,6 @@ void foo(void)
> > #ifdef CONFIG_SMP
> > DEFINE(NR_CPUS_BITS, ilog2(CONFIG_NR_CPUS));
> > #endif
> > + DEFINE(BLOATED_SPINLOCKS, sizeof(spinlock_t) > sizeof(int));
> > /* End of constants */
> > }
>
> Using that we could also do.. not been near a compiler.
>
[ Subject adjusted, CC: +Linus ]
> ---
> Subject: lockref: Use bloated_spinlocks to avoid explicit config dependencies
>
> Avoid the fragile Kconfig construct guestimating spinlock_t sizes; use a
> friendly compile-time test to determine this.
>
> Not-Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
> lib/Kconfig | 3 ---
> lib/lockref.c | 2 +-
> 2 files changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/lib/Kconfig b/lib/Kconfig
> index b3c8be0da17f..254af289d1d0 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -54,9 +54,6 @@ config ARCH_USE_CMPXCHG_LOCKREF
> config CMPXCHG_LOCKREF
> def_bool y if ARCH_USE_CMPXCHG_LOCKREF
> depends on SMP
> - depends on !GENERIC_LOCKBREAK
> - depends on !DEBUG_SPINLOCK
> - depends on !DEBUG_LOCK_ALLOC
>
> config CRC_CCITT
> tristate "CRC-CCITT functions"
> diff --git a/lib/lockref.c b/lib/lockref.c
> index 6f9d434c1521..a158fd86aa1a 100644
> --- a/lib/lockref.c
> +++ b/lib/lockref.c
> @@ -1,7 +1,7 @@
> #include <linux/export.h>
> #include <linux/lockref.h>
>
> -#ifdef CONFIG_CMPXCHG_LOCKREF
> +#if defined(CONFIG_CMPXCHG_LOCKREF) && !BLOATED_SPINLOCKS
Having CONFIG_CMPXCHG_LOCKREF=y, but not really using it could be
misleading.
Should we get rid of CONFIG_CMPXCHG_LOCKREF completely and have here:
#if defined(CONFIG_ARCH_USE_CMPXCHG_LOCKREF) && \
defined(CONFIG_SMP) && !BLOATED_SPINLOCKS
?
--
Kirill A. Shutemov
* Re: [PATCH] mm: create a separate slab for page->ptl allocation
2013-11-06 13:21 ` [PATCH] mm: create a separate slab for page->ptl allocation Kirill A. Shutemov
@ 2013-11-06 14:30 ` Peter Zijlstra
0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2013-11-06 14:30 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrew Morton, Kirill A. Shutemov, Ingo Molnar, linux-kernel,
linux-mm, linux-arch
On Wed, Nov 06, 2013 at 03:21:55PM +0200, Kirill A. Shutemov wrote:
> On Wed, Nov 06, 2013 at 10:31:31AM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 06, 2013 at 01:13:11AM +0200, Kirill A. Shutemov wrote:
> > > I would like to get rid of __ptlock_alloc()/__ptlock_free() too, but I
> > > don't see a way within C: we need to know sizeof(spinlock_t) on
> > > preprocessor stage.
> > >
> > > We can have a hack on kbuild level: write small helper program to find out
> > > sizeof(spinlock_t) before start building and turn it into define.
> > > But it's overkill from my POV. And cross-compilation will be a fun.
> >
> > Ah, I just remembered, we have such a thing!
>
> Great!
>
> > @@ -1354,7 +1356,7 @@ static inline bool ptlock_init(struct page *page)
> > * slab code uses page->slab_cache and page->first_page (for tail
> > * pages), which share storage with page->ptl.
> > */
> > - VM_BUG_ON(page->ptl);
> > + VM_BUG_ON(*(unsigned long *)&page->ptl);
>
> Huh? Why not direct cast to unsigned long?
>
> VM_BUG_ON((unsigned long)page->ptl);
I tried, GCC didn't dig that. I think because spinlock_t is a composite
type and you cannot cast that to a primitive type.
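
A small userspace sketch of the point above, assuming a hypothetical lock type that is a struct the size of one long: C only allows casts between scalar types, so a struct value cannot be cast to unsigned long, but its storage can still be read as a word through its address. The `demo_` names are illustrative and not kernel APIs.

```c
#include <string.h>

/* Hypothetical stand-in: a lock that is a struct the size of one long. */
typedef struct { unsigned long word; } demo_lock_t;

/*
 * Writing (unsigned long)*lock here would not compile: the operand of a
 * cast must have scalar type, and demo_lock_t is a struct.
 */
static unsigned long demo_lock_as_word(const demo_lock_t *lock)
{
	unsigned long word;

	/* Copy the object representation instead of casting the struct. */
	memcpy(&word, lock, sizeof(word));
	return word;
}

/* Same shape as the form used in the patch: cast the address, then load. */
static unsigned long demo_lock_as_word_cast(demo_lock_t *lock)
{
	return *(unsigned long *)lock;
}
```

Both helpers return the same value for this type. The pointer-cast form is well defined in this sketch only because the struct's first member is itself an unsigned long; the memcpy() form does not depend on that.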
* Re: lockref: Use bloated_spinlocks to avoid explicit config dependencies
2013-11-06 13:31 ` lockref: Use bloated_spinlocks to avoid explicit config dependencies Kirill A. Shutemov
@ 2013-11-06 14:32 ` Peter Zijlstra
0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2013-11-06 14:32 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrew Morton, Kirill A. Shutemov, Ingo Molnar, linux-kernel,
linux-mm, linux-arch, Linus Torvalds
On Wed, Nov 06, 2013 at 03:31:12PM +0200, Kirill A. Shutemov wrote:
> Should we get rid of CONFIG_CMPXCHG_LOCKREF completely and have here:
>
> #if defined(CONFIG_ARCH_USE_CMPXCHG_LOCKREF) && \
> defined(CONFIG_SMP) && !BLOATED_SPINLOCKS
>
Yeah, that might make more sense.