linux-kernel.vger.kernel.org archive mirror
* [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
@ 2013-04-26 12:13 Zhang Yi
  2013-04-26 18:26 ` Thomas Gleixner
  0 siblings, 1 reply; 20+ messages in thread
From: Zhang Yi @ 2013-04-26 12:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: 'Peter Zijlstra', 'Darren Hart',
	'Thomas Gleixner', 'Ingo Molnar',
	'Dave Hansen',
	zhang.yi20, wetpzy

Hi,

At 2013-04-26 04:52:31,"Thomas Gleixner" <tglx@linutronix.de> wrote:
>
>Unfortunately this did not work out very well.
>
>1. Your patch now lacks a proper changelog which explains the change
>
>2. Your patch lacks any newline characters as you can see below
>

I am so sorry for my mistakes. : )




The futex key of a process-shared futex is determined by the page-offset, mapping-host, and
mapping-index of the user space address.
User applications using hugepages for futexes may run into futex-key conflicts.
Assume there are two or more futexes in different normal pages of the hugepage,
and each futex has the same offset in its normal page, causing all the futexes to have the same futex-key.
In that case, futex may not work well.

This patch adds the normal page index in the compound page into the offset of futex-key.

Steps to reproduce the bug:
1. The 1st thread maps a file on hugetlbfs, uses the returned address as the 1st mutex's
address, and uses the returned address plus PAGE_SIZE as the 2nd mutex's address.
2. The 1st thread initializes the two mutexes with the pshared attribute and locks both.
3. The 1st thread creates the 2nd thread, which blocks on the 1st mutex.
4. The 1st thread creates the 3rd thread, which blocks on the 2nd mutex.
5. The 1st thread unlocks the 2nd mutex; the 3rd thread cannot take the 2nd mutex and
may block forever.


Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
Reviewed-by: Liu Dong <liu.dong3@zte.com.cn>
Reviewed-by: Cui Yunfeng <cui.yunfeng@zte.com.cn>
Reviewed-by: Lu Zhongjun <lu.zhongjun@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>


diff -uprN orig/linux3.9-rc7/include/linux/futex.h new/linux3.9-rc7/include/linux/futex.h
--- orig/linux3.9-rc7/include/linux/futex.h	2013-04-15 00:45:16.000000000 +0000
+++ new/linux3.9-rc7/include/linux/futex.h	2013-04-19 16:33:58.725880000 +0000
@@ -19,7 +19,7 @@ handle_futex_death(u32 __user *uaddr, st
  * The key type depends on whether it's a shared or private mapping.
  * Don't rearrange members without looking at hash_futex().
  *
- * offset is aligned to a multiple of sizeof(u32) (== 4) by definition.
+ * There are three components in offset:
  * We use the two low order bits of offset to tell what is the kind of key :
  *  00 : Private process futex (PTHREAD_PROCESS_PRIVATE)
  *       (no reference on an inode or mm)
@@ -27,6 +27,9 @@ handle_futex_death(u32 __user *uaddr, st
  *	mapped on a file (reference on the underlying inode)
  *  10 : Shared futex (PTHREAD_PROCESS_SHARED)
  *       (but private mapping on an mm, and reference taken on it)
+ * Bits 2 to (PAGE_SHIFT-1) indicate the offset of the futex in its page.
+ * The remaining high order bits indicate the index if the page is a
+ * subpage of a compound page.
 */

 #define FUT_OFF_INODE    1 /* We set bit 0 if key has a reference on inode */
@@ -36,17 +39,17 @@ union futex_key {
 	struct {
 		unsigned long pgoff;
 		struct inode *inode;
-		int offset;
+		long offset;
 	} shared;
 	struct {
 		unsigned long address;
 		struct mm_struct *mm;
-		int offset;
+		long offset;
 	} private;
 	struct {
 		unsigned long word;
 		void *ptr;
-		int offset;
+		long offset;
 	} both;
 };

diff -uprN orig/linux3.9-rc7/kernel/futex.c new/linux3.9-rc7/kernel/futex.c
--- orig/linux3.9-rc7/kernel/futex.c	2013-04-15 00:45:16.000000000 +0000
+++ new/linux3.9-rc7/kernel/futex.c	2013-04-19 16:24:05.629143000 +0000
@@ -215,6 +215,22 @@ static void drop_futex_key_refs(union fu
 	}
 }

+/*
+* Get subpage index in compound page, for futex_key.
+*/
+static inline int get_page_compound_index(struct page *page)
+{
+	struct page *head_page;
+	if (PageHead(page))
+		return 0;
+
+	head_page = compound_head(page);
+	if (compound_order(head_page) >= MAX_ORDER)
+		return page_to_pfn(page) - page_to_pfn(head_page);
+	else
+		return page - compound_head(page);
+}
+
 /**
  * get_futex_key() - Get parameters which are the keys for a futex
  * @uaddr:	virtual address of the futex
@@ -228,7 +244,8 @@ static void drop_futex_key_refs(union fu
  * The key words are stored in *key on success.
  *
  * For shared mappings, it's (page->index, file_inode(vma->vm_file),
- * offset_within_page).  For private mappings, it's (uaddr, current->mm).
+ * page_compound_index and offset_within_page).
+ * For private mappings, it's (uaddr, current->mm).
  * We can usually work out the index without swapping in the page.
  *
  * lock_page() might sleep, the caller should not hold a spinlock.
@@ -239,7 +256,7 @@ get_futex_key(u32 __user *uaddr, int fsh
 	unsigned long address = (unsigned long)uaddr;
 	struct mm_struct *mm = current->mm;
 	struct page *page, *page_head;
-	int err, ro = 0;
+	int err, ro = 0, comp_idx = 0;

 	/*
 	 * The futex address must be "naturally" aligned.
@@ -299,6 +316,7 @@ again:
 			 * freed from under us.
 			 */
 			if (page != page_head) {
+				comp_idx = get_page_compound_index(page);
 				get_page(page_head);
 				put_page(page);
 			}
@@ -311,6 +329,7 @@ again:
 #else
 	page_head = compound_head(page);
 	if (page != page_head) {
+		comp_idx = get_page_compound_index(page);
 		get_page(page_head);
 		put_page(page);
 	}
@@ -363,7 +382,9 @@ again:
 		key->private.mm = mm;
 		key->private.address = address;
 	} else {
-		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
+		/* inode-based key */
+		key->both.offset |= ((long)comp_idx << PAGE_SHIFT)
+				   | FUT_OFF_INODE;
 		key->shared.inode = page_head->mapping->host;
 		key->shared.pgoff = page_head->index;
 	}




* Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-04-26 12:13 [PATCH] futex: bugfix for futex-key conflict when futex use hugepage Zhang Yi
@ 2013-04-26 18:26 ` Thomas Gleixner
  2013-05-07 12:23   ` Zhang Yi
  2013-05-07 12:34   ` Zhang Yi
  0 siblings, 2 replies; 20+ messages in thread
From: Thomas Gleixner @ 2013-04-26 18:26 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-kernel, 'Peter Zijlstra', 'Darren Hart',
	'Ingo Molnar', 'Dave Hansen',
	zhang.yi20, wetpzy

Zhang,

On Fri, 26 Apr 2013, Zhang Yi wrote:
> At 2013-04-26 04:52:31,"Thomas Gleixner" <tglx@linutronix.de> wrote:
> >
> >Unfortunately this did not work out very well.
> >
> >1. Your patch now lacks a proper changelog which explains the change
> >
> >2. Your patch lacks any newline characters as you can see below
> >
> 
> I am so sorry for my mistakes. : )

Nothing to worry about. We all make mistakes! :)

> The futex key of a process-shared futex is determined by the page-offset, mapping-host, and
> mapping-index of the user space address.
> User applications using hugepages for futexes may run into futex-key conflicts.
> Assume there are two or more futexes in different normal pages of the hugepage,
> and each futex has the same offset in its normal page, causing all the futexes to have the same futex-key.

Nit-pick: Please format changelog text with a linebreak around 78
characters. So it looks like this:

The futex key of a process-shared futex is determined by the page-offset,
mapping-host, and mapping-index of the user space address. User
applications using hugepages for futexes may run into futex-key conflicts.

Assume there are two or more futexes in different normal pages of the
hugepage, and each futex has the same offset in its normal page,
causing all the futexes to have the same futex-key.

> In that case, futex may not work well.

Very nice detective work!
 
> diff -uprN orig/linux3.9-rc7/include/linux/futex.h new/linux3.9-rc7/include/linux/futex.h
> --- orig/linux3.9-rc7/include/linux/futex.h	2013-04-15 00:45:16.000000000 +0000
> +++ new/linux3.9-rc7/include/linux/futex.h	2013-04-19 16:33:58.725880000 +0000

The canonical diff for patch submission is

  diff -uprN linux3.9-rc7.orig/ linux3.9-rc7/

That results in a patch which can be applied with "patch -p1" from the
kernel base directory and that's how all our scripts work.

Yours needs to be applied with -p2, so it requires manual
interaction.

You can verify that by cd'ing into the kernel tree base directory and
running "patch -p1 < your.patch".

You might have a look at quilt or simply use git, which will do the
right thing for you and in both cases you do not need a separate
kernel tree to diff against.

>  #define FUT_OFF_INODE    1 /* We set bit 0 if key has a reference on inode */
> @@ -36,17 +39,17 @@ union futex_key {
>  	struct {
>  		unsigned long pgoff;
>  		struct inode *inode;
> -		int offset;
> +		long offset;

unsigned long please, offset can't be negative. The "int" type of
offset was silly already.

> +/*
> +* Get subpage index in compound page, for futex_key.
> +*/
> +static inline int get_page_compound_index(struct page *page)
> +{
> +	struct page *head_page;
> +	if (PageHead(page))
> +		return 0;

If you look at the callsite, then you'll see that this is only called
when page != page_head. And page_head = compound_head(page). So you
don't need to double check that.

> +
> +	head_page = compound_head(page);

Again. The head page is already known, so you can hand it into the
function.

> +	if (compound_order(head_page) >= MAX_ORDER)
> +		return page_to_pfn(page) - page_to_pfn(head_page);
> +	else
> +		return page - compound_head(page);
> +}
> +

Now instead of returning that value, I'd rather hand the futex key
pointer to the function and let the function add the index
value. Something like:

static void key_add_compound_idx(key, page, page_head)
{
	...
}

That makes the code simpler and easier to read.

Thanks,

	tglx


* RE: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-04-26 18:26 ` Thomas Gleixner
@ 2013-05-07 12:23   ` Zhang Yi
  2013-05-07 15:20     ` Mel Gorman
  2013-05-07 12:34   ` Zhang Yi
  1 sibling, 1 reply; 20+ messages in thread
From: Zhang Yi @ 2013-05-07 12:23 UTC (permalink / raw)
  To: 'Thomas Gleixner'
  Cc: linux-kernel, 'Peter Zijlstra', 'Darren Hart',
	'Ingo Molnar', 'Dave Hansen',
	zhang.yi20, wetpzy


The futex key of a process-shared futex is determined by the page-offset,
mapping-host, and mapping-index of the user space address. User
applications using hugepages for futexes may run into futex-key conflicts.

Assume there are two or more futexes in different normal pages of the
hugepage, and each futex has the same offset in its normal page,
causing all the futexes to have the same futex-key.


Steps to reproduce the bug:
1. The 1st thread maps a file on hugetlbfs, uses the returned address
as the 1st mutex's address, and uses the returned address plus
PAGE_SIZE as the 2nd mutex's address.
2. The 1st thread initializes the two mutexes with the pshared attribute
and locks both.
3. The 1st thread creates the 2nd thread, which blocks on the 1st mutex.
4. The 1st thread creates the 3rd thread, which blocks on the 2nd mutex.
5. The 1st thread unlocks the 2nd mutex; the 3rd thread cannot take
the 2nd mutex and may block forever.


Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Liu Dong <liu.dong3@zte.com.cn>
Reviewed-by: Cui Yunfeng <cui.yunfeng@zte.com.cn>
Reviewed-by: Lu Zhongjun <lu.zhongjun@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>


diff -uprN linux3.9-orig/include/linux/futex.h linux3.9/include/linux/futex.h
--- linux3.9-orig/include/linux/futex.h	2013-04-15 00:45:16.000000000 +0000
+++ linux3.9/include/linux/futex.h	2013-04-27 08:59:58.932078000 +0000
@@ -19,7 +19,7 @@ handle_futex_death(u32 __user *uaddr, st
  * The key type depends on whether it's a shared or private mapping.
  * Don't rearrange members without looking at hash_futex().
  *
- * offset is aligned to a multiple of sizeof(u32) (== 4) by definition.
+ * There are three components in offset:
  * We use the two low order bits of offset to tell what is the kind of key :
  *  00 : Private process futex (PTHREAD_PROCESS_PRIVATE)
  *       (no reference on an inode or mm)
@@ -27,6 +27,9 @@ handle_futex_death(u32 __user *uaddr, st
  *	mapped on a file (reference on the underlying inode)
  *  10 : Shared futex (PTHREAD_PROCESS_SHARED)
  *       (but private mapping on an mm, and reference taken on it)
+ * Bits 2 to (PAGE_SHIFT-1) indicate the offset of the futex in its page.
+ * The remaining high order bits indicate the index if the page is a
+ * subpage of a compound page.
 */

 #define FUT_OFF_INODE    1 /* We set bit 0 if key has a reference on inode */
@@ -36,17 +39,17 @@ union futex_key {
 	struct {
 		unsigned long pgoff;
 		struct inode *inode;
-		int offset;
+		unsigned long offset;
 	} shared;
 	struct {
 		unsigned long address;
 		struct mm_struct *mm;
-		int offset;
+		unsigned long offset;
 	} private;
 	struct {
 		unsigned long word;
 		void *ptr;
-		int offset;
+		unsigned long offset;
 	} both;
 };

diff -uprN linux3.9-orig/kernel/futex.c linux3.9/kernel/futex.c
--- linux3.9-orig/kernel/futex.c	2013-04-15 00:45:16.000000000 +0000
+++ linux3.9/kernel/futex.c	2013-05-06 16:24:40.403525000 +0000
@@ -215,6 +215,22 @@ static void drop_futex_key_refs(union fu
 	}
 }

+/*
+* Get subpage index in compound page, and add it into futex_key.
+*/
+static void key_add_compound_idx(union futex_key *key,
+				 struct page *head_page, struct page *page)
+{
+	int compound_idx;
+
+	if (compound_order(head_page) >= MAX_ORDER)
+		compound_idx = page_to_pfn(page) - page_to_pfn(head_page);
+	else
+		compound_idx = page - head_page;
+
+	key->both.offset |= compound_idx << PAGE_SHIFT;
+}
+
 /**
  * get_futex_key() - Get parameters which are the keys for a futex
  * @uaddr:	virtual address of the futex
@@ -228,7 +244,8 @@ static void drop_futex_key_refs(union fu
  * The key words are stored in *key on success.
  *
  * For shared mappings, it's (page->index, file_inode(vma->vm_file),
- * offset_within_page).  For private mappings, it's (uaddr, current->mm).
+ * page_compound_index and offset_within_page).
+ * For private mappings, it's (uaddr, current->mm).
  * We can usually work out the index without swapping in the page.
  *
  * lock_page() might sleep, the caller should not hold a spinlock.
@@ -366,6 +383,8 @@ again:
 		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
 		key->shared.inode = page_head->mapping->host;
 		key->shared.pgoff = page_head->index;
+		if (page != page_head)
+			key_add_compound_idx(key, page_head, page);
 	}

 	get_futex_key_refs(key);





* RE: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-04-26 18:26 ` Thomas Gleixner
  2013-05-07 12:23   ` Zhang Yi
@ 2013-05-07 12:34   ` Zhang Yi
  1 sibling, 0 replies; 20+ messages in thread
From: Zhang Yi @ 2013-05-07 12:34 UTC (permalink / raw)
  To: 'Zhang Yi', 'Thomas Gleixner'
  Cc: linux-kernel, 'Peter Zijlstra', 'Darren Hart',
	'Ingo Molnar', 'Dave Hansen',
	zhang.yi20, wetpzy

It looked fine when I sent the mail to myself, but something went wrong when sending it to you.
Please ignore this mail; I will check and send it again.






* Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-05-07 12:23   ` Zhang Yi
@ 2013-05-07 15:20     ` Mel Gorman
  2013-05-07 15:24       ` Thomas Gleixner
  2013-05-10  9:08       ` zhang.yi20
  0 siblings, 2 replies; 20+ messages in thread
From: Mel Gorman @ 2013-05-07 15:20 UTC (permalink / raw)
  To: Zhang Yi
  Cc: 'Thomas Gleixner', linux-kernel, 'Peter Zijlstra',
	'Darren Hart', 'Ingo Molnar',
	'Dave Hansen',
	zhang.yi20, wetpzy

On Tue, May 07, 2013 at 08:23:48PM +0800, Zhang Yi wrote:
> diff -uprN linux3.9-orig/kernel/futex.c linux3.9/kernel/futex.c
> --- linux3.9-orig/kernel/futex.c	2013-04-15 00:45:16.000000000 +0000
> +++ linux3.9/kernel/futex.c	2013-05-06 16:24:40.403525000 +0000
> @@ -215,6 +215,22 @@ static void drop_futex_key_refs(union fu
>  	}
>  }
> 
> +/*
> +* Get subpage index in compound page, and add it into futex_key.
> +*/
> +static void key_add_compound_idx(union futex_key *key,
> +				 struct page *head_page, struct page *page)
> +{
> +	int compound_idx;
> +
> +	if (compound_order(head_page) >= MAX_ORDER)
> +		compound_idx = page_to_pfn(page) - page_to_pfn(head_page);
> +	else
> +		compound_idx = page - head_page;
> +
> +	key->both.offset |= compound_idx << PAGE_SHIFT;
> +}
> +

This implicitly assumes it is dealing with a hugetlbfs page. Today, it
is the case that an inode-based futex with PageCompound is a hugetlbfs
page, but that could change in the future if THP ever backs files. This
would then break again, except it would be harder to fix because THP pages
can be collapsed underneath you after the futex key has been generated.

As this problem is hugetlbfs-specific, should the fix be firmly in hugetlbfs
land? Something like the following untested and only partial diff? Is the
use of PageCompound in the futex path like this going to be problematic?

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 16e4e9a..f9c33d3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -348,6 +348,17 @@ static inline int hstate_index(struct hstate *h)
 	return h - hstates;
 }
 
+pgoff_t __basepage_index(struct page *page);
+
+/* Return page->index in PAGE_SIZE units */
+static inline pgoff_t basepage_index(struct page *page)
+{
+	if (!PageCompound(page))
+		return page->index;
+
+	return __basepage_index(page);
+}
+
 #else
 struct hstate {};
 #define alloc_huge_page_node(h, nid) NULL
@@ -365,6 +376,10 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
 {
 	return 1;
 }
+static inline pgoff_t basepage_index(struct page *page)
+{
+	return page->index;
+}
 #define hstate_index_to_shift(index) 0
 #define hstate_index(h) 0
 #endif
diff --git a/kernel/futex.c b/kernel/futex.c
index b26dcfc..97beb5d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -61,6 +61,7 @@
 #include <linux/nsproxy.h>
 #include <linux/ptrace.h>
 #include <linux/sched/rt.h>
+#include <linux/hugetlb.h>
 
 #include <asm/futex.h>
 
@@ -365,7 +366,7 @@ again:
 	} else {
 		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
 		key->shared.inode = page_head->mapping->host;
-		key->shared.pgoff = page_head->index;
+		key->shared.pgoff = basepage_index(page_head);
 	}
 
 	get_futex_key_refs(key);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1a12f5b..ddbad35 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -690,6 +690,23 @@ int PageHuge(struct page *page)
 }
 EXPORT_SYMBOL_GPL(PageHuge);
 
+pgoff_t __basepage_index(struct page *page)
+{
+	struct page *page_head = compound_head(page);
+	pgoff_t index = page_index(page_head);
+	int compound_idx;
+
+	if (!PageHuge(page_head))
+		return page_index(page);
+
+	if (compound_order(page_head) >= MAX_ORDER)
+		compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
+	else
+		compound_idx = page - page_head;
+
+	return (index << page_hstate(page_head)->order) + compound_idx;
+}
+
 static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
 {
 	struct page *page;

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-05-07 15:20     ` Mel Gorman
@ 2013-05-07 15:24       ` Thomas Gleixner
  2013-05-07 15:54         ` Mel Gorman
  2013-05-10  9:08       ` zhang.yi20
  1 sibling, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2013-05-07 15:24 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Zhang Yi, linux-kernel, 'Peter Zijlstra',
	'Darren Hart', 'Ingo Molnar',
	'Dave Hansen',
	zhang.yi20, wetpzy



On Tue, 7 May 2013, Mel Gorman wrote:

> On Tue, May 07, 2013 at 08:23:48PM +0800, Zhang Yi wrote:
> > diff -uprN linux3.9-orig/kernel/futex.c linux3.9/kernel/futex.c
> > --- linux3.9-orig/kernel/futex.c	2013-04-15 00:45:16.000000000 +0000
> > +++ linux3.9/kernel/futex.c	2013-05-06 16:24:40.403525000 +0000
> > @@ -215,6 +215,22 @@ static void drop_futex_key_refs(union fu
> >  	}
> >  }
> > 
> > +/*
> > +* Get subpage index in compound page, and add it into futex_key.
> > +*/
> > +static void key_add_compound_idx(union futex_key *key,
> > +				 struct page *head_page, struct page *page)
> > +{
> > +	int compound_idx;
> > +
> > +	if (compound_order(head_page) >= MAX_ORDER)
> > +		compound_idx = page_to_pfn(page) - page_to_pfn(head_page);
> > +	else
> > +		compound_idx = page - head_page;
> > +
> > +	key->both.offset |= compound_idx << PAGE_SHIFT;
> > +}
> > +
> 
> This implicitly assumes it is dealing with a hugetlbfs page. Today, it
> is the case that an inode-based futex with PageCompound is a hugetlbfs
> page but that could change in the future if THP ever backs files. This
> would then break again except it would be harder to fix because THP pages
> can be collapsed underneath you after the futex key has been generated.
> 
> As this problem is hugetlbfs-specific should the fix be firmly in hugetlbfs
> land? Something like the following untested and only partial diff? Is the
> use of PageCompound in the futex path like this going to be problematic?

Why should it ?
 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 16e4e9a..f9c33d3 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -348,6 +348,17 @@ static inline int hstate_index(struct hstate *h)
>  	return h - hstates;
>  }
>  
> +pgoff_t __basepage_index(struct page *page);
> +
> +/* Return page->index in PAGE_SIZE units */
> +static inline pgoff_t basepage_index(struct page *page)
> +{
> +	if (!PageCompound(page))
> +		return page->index;
> +
> +	return __basepage_index(page);
> +}
> +
>  #else
>  struct hstate {};
>  #define alloc_huge_page_node(h, nid) NULL
> @@ -365,6 +376,10 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
>  {
>  	return 1;
>  }
> +static inline pgoff_t basepage_index(struct page *page)
> +{
> +	return page->index;
> +}
>  #define hstate_index_to_shift(index) 0
>  #define hstate_index(h) 0
>  #endif
> diff --git a/kernel/futex.c b/kernel/futex.c
> index b26dcfc..97beb5d 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -61,6 +61,7 @@
>  #include <linux/nsproxy.h>
>  #include <linux/ptrace.h>
>  #include <linux/sched/rt.h>
> +#include <linux/hugetlb.h>
>  
>  #include <asm/futex.h>
>  
> @@ -365,7 +366,7 @@ again:
>  	} else {
>  		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
>  		key->shared.inode = page_head->mapping->host;
> -		key->shared.pgoff = page_head->index;
> +		key->shared.pgoff = basepage_index(page_head);

  That wants to be basepage_index(page), right?

>  	}
>  
>  	get_futex_key_refs(key);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1a12f5b..ddbad35 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -690,6 +690,23 @@ int PageHuge(struct page *page)
>  }
>  EXPORT_SYMBOL_GPL(PageHuge);
>  
> +pgoff_t __basepage_index(struct page *page)
> +{
> +	struct page *page_head = compound_head(page);
> +	pgoff_t index = page_index(page_head);
> +	int compound_idx;
> +
> +	if (!PageHuge(page_head))
> +		return page_index(page);
> +
> +	if (compound_order(page_head) >= MAX_ORDER)
> +		compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
> +	else
> +		compound_idx = page - head_page;
> +
> +	return (index << page_hstate(page_head)->order) + compound_idx;
> +}
> +
>  static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
>  {
>  	struct page *page;
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-05-07 15:24       ` Thomas Gleixner
@ 2013-05-07 15:54         ` Mel Gorman
  0 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2013-05-07 15:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Zhang Yi, linux-kernel, 'Peter Zijlstra',
	'Darren Hart', 'Ingo Molnar',
	'Dave Hansen',
	zhang.yi20, wetpzy

On Tue, May 07, 2013 at 05:24:57PM +0200, Thomas Gleixner wrote:
> 
> 
> On Tue, 7 May 2013, Mel Gorman wrote:
> 
> > On Tue, May 07, 2013 at 08:23:48PM +0800, Zhang Yi wrote:
> > > diff -uprN linux3.9-orig/kernel/futex.c linux3.9/kernel/futex.c
> > > --- linux3.9-orig/kernel/futex.c	2013-04-15 00:45:16.000000000 +0000
> > > +++ linux3.9/kernel/futex.c	2013-05-06 16:24:40.403525000 +0000
> > > @@ -215,6 +215,22 @@ static void drop_futex_key_refs(union fu
> > >  	}
> > >  }
> > > 
> > > +/*
> > > +* Get subpage index in compound page, and add it into futex_key.
> > > +*/
> > > +static void key_add_compound_idx(union futex_key *key,
> > > +				 struct page *head_page, struct page *page)
> > > +{
> > > +	int compound_idx;
> > > +
> > > +	if (compound_order(head_page) >= MAX_ORDER)
> > > +		compound_idx = page_to_pfn(page) - page_to_pfn(head_page);
> > > +	else
> > > +		compound_idx = page - head_page;
> > > +
> > > +	key->both.offset |= compound_idx << PAGE_SHIFT;
> > > +}
> > > +
> > 
> > This implicitly assumes it is dealing with a hugetlbfs page. Today, it
> > is the case that an inode-based futex with PageCompound is a hugetlbfs
> > page but that could change in the future if THP ever backs files. This
> > would then break again except it would be harder to fix because THP pages
> > can be collapsed underneath you after the futex key has been generated.
> > 
> > As this problem is hugetlbfs-specific should the fix be firmly in hugetlbfs
> > land? Something like the following untested and only partial diff? Is the
> > use of PageCompound in the futex path like this going to be problematic?
> 
> Why should it ?
>  

The comment for it states that it is "generally not used in hot code
paths" but it's a light-weight check that the cache lines should already
be fetched for. I doubt that the overhead of this check versus
page_head == page is noticeable.

> > @@ -365,7 +366,7 @@ again:
> >  	} else {
> >  		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
> >  		key->shared.inode = page_head->mapping->host;
> > -		key->shared.pgoff = page_head->index;
> > +		key->shared.pgoff = basepage_index(page_head);
> 
>   That wants to be basepage_index(page), right?
> 

BAH, yes.
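
A minimal userspace sketch of why the argument matters. It is not kernel
code: struct model_page and model_basepage_index() are hypothetical
stand-ins for struct page and __basepage_index(), and the huge page order
of 9 is an assumption. Calling it with the head page as the page models
basepage_index(page_head): the sub-page index is then always 0, so every
futex in the huge page would still get the same pgoff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the index field is modelled. */
struct model_page {
	unsigned long index;	/* head page index, in huge page units */
};

#define MODEL_ORDER 9UL		/* assumed order: 512 base pages per huge page */

/*
 * Mirrors the arithmetic of __basepage_index() for the non-gigantic case.
 * The kernel derives the head via compound_head(page); here the caller
 * passes it explicitly.
 */
static unsigned long model_basepage_index(const struct model_page *head,
					  const struct model_page *page)
{
	ptrdiff_t compound_idx = page - head;

	/* The page must lie inside the compound page that starts at head. */
	assert(compound_idx >= 0 &&
	       compound_idx < (ptrdiff_t)(1UL << MODEL_ORDER));

	return (head->index << MODEL_ORDER) + (unsigned long)compound_idx;
}
```

With an array of 512 model pages whose head has index 5, passing the head
as the page gives 5 << 9 whichever sub-page the futex lives in, while
passing the second sub-page gives (5 << 9) + 1.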

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-05-07 15:20     ` Mel Gorman
  2013-05-07 15:24       ` Thomas Gleixner
@ 2013-05-10  9:08       ` zhang.yi20
  2013-05-10  9:42         ` Mel Gorman
  1 sibling, 1 reply; 20+ messages in thread
From: zhang.yi20 @ 2013-05-10  9:08 UTC (permalink / raw)
  To: Mel Gorman
  Cc: 'Dave Hansen', 'Darren Hart',
	linux-kernel, 'Ingo Molnar', 'Peter Zijlstra',
	'Thomas Gleixner',
	wetpzy, Zhang Yi



Mel Gorman <mgorman@suse.de> wrote on 2013/05/07 23:20:07:

>
> Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
>
> On Tue, May 07, 2013 at 08:23:48PM +0800, Zhang Yi wrote:
> > diff -uprN linux3.9-orig/kernel/futex.c linux3.9/kernel/futex.c
> > --- linux3.9-orig/kernel/futex.c   2013-04-15 00:45:16.000000000 +0000
> > +++ linux3.9/kernel/futex.c   2013-05-06 16:24:40.403525000 +0000
> > @@ -215,6 +215,22 @@ static void drop_futex_key_refs(union fu
> >     }
> >  }
> >
> > +/*
> > +* Get subpage index in compound page, and add it into futex_key.
> > +*/
> > +static void key_add_compound_idx(union futex_key *key,
> > +             struct page *head_page, struct page *page)
> > +{
> > +   int compound_idx;
> > +
> > +   if (compound_order(head_page) >= MAX_ORDER)
> > +      compound_idx = page_to_pfn(page) - page_to_pfn(head_page);
> > +   else
> > +      compound_idx = page - head_page;
> > +
> > +   key->both.offset |= compound_idx << PAGE_SHIFT;
> > +}
> > +
>
> This implicitly assumes it is dealing with a hugetlbfs page. Today, it
> is the case that an inode-based futex with PageCompound is a hugetlbfs
> page but that could change in the future if THP ever backs files. This
> would then break again except it would be harder to fix because THP pages
> can be collapsed underneath you after the futex key has been generated.
>
> As this problem is hugetlbfs-specific should the fix be firmly in hugetlbfs

I think we should. For example, user applications that want high performance
may use DPDK, which uses hugetlbfs.


Should I rework the patch like the following code, and test it?

> land? Something like the following untested and only partial diff? Is the
> use of PageCompound in the futex path like this going to be problematic?
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 16e4e9a..f9c33d3 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -348,6 +348,17 @@ static inline int hstate_index(struct hstate *h)
>     return h - hstates;
>  }
>
> +pgoff_t __basepage_index(struct page *page);
> +
> +/* Return page->index in PAGE_SIZE units */
> +static inline pgoff_t basepage_index(struct page *page)
> +{
> +   if (!PageCompound(page))
> +      return page->index;
> +
> +   return __basepage_index(page);
> +}
> +
>  #else
>  struct hstate {};
>  #define alloc_huge_page_node(h, nid) NULL
> @@ -365,6 +376,10 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
>  {
>     return 1;
>  }
> +static inline pgoff_t basepage_index(struct page *page)
> +{
> +   return page->index;
> +}
>  #define hstate_index_to_shift(index) 0
>  #define hstate_index(h) 0
>  #endif
> diff --git a/kernel/futex.c b/kernel/futex.c
> index b26dcfc..97beb5d 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -61,6 +61,7 @@
>  #include <linux/nsproxy.h>
>  #include <linux/ptrace.h>
>  #include <linux/sched/rt.h>
> +#include <linux/hugetlb.h>
>
>  #include <asm/futex.h>
>
> @@ -365,7 +366,7 @@ again:
>     } else {
>        key->both.offset |= FUT_OFF_INODE; /* inode-based key */
>        key->shared.inode = page_head->mapping->host;
> -      key->shared.pgoff = page_head->index;
> +      key->shared.pgoff = basepage_index(page_head);
>     }
>
>     get_futex_key_refs(key);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1a12f5b..ddbad35 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -690,6 +690,23 @@ int PageHuge(struct page *page)
>  }
>  EXPORT_SYMBOL_GPL(PageHuge);
>
> +pgoff_t __basepage_index(struct page *page)
> +{
> +   struct page *page_head = compound_head(page);
> +   pgoff_t index = page_index(page_head);
> +   int compound_idx;
> +
> +   if (!PageHuge(page_head))
> +      return page_index(page);
> +
> +   if (compound_order(page_head) >= MAX_ORDER)
> +      compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
> +   else
> +      compound_idx = page - head_page;
> +
> +   return (index << page_hstate(page_head)->order) + compound_idx;
> +}
> +
>  static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
>  {
>     struct page *page;


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-05-10  9:08       ` zhang.yi20
@ 2013-05-10  9:42         ` Mel Gorman
  0 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2013-05-10  9:42 UTC (permalink / raw)
  To: zhang.yi20
  Cc: 'Dave Hansen', 'Darren Hart',
	linux-kernel, 'Ingo Molnar', 'Peter Zijlstra',
	'Thomas Gleixner',
	wetpzy, Zhang Yi

On Fri, May 10, 2013 at 05:08:30PM +0800, zhang.yi20@zte.com.cn wrote:
> 
> 
> Mel Gorman <mgorman@suse.de> wrote on 2013/05/07 23:20:07:
> 
> >
> > Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
> >
> > On Tue, May 07, 2013 at 08:23:48PM +0800, Zhang Yi wrote:
> > > diff -uprN linux3.9-orig/kernel/futex.c linux3.9/kernel/futex.c
> > > --- linux3.9-orig/kernel/futex.c   2013-04-15 00:45:16.000000000 +0000
> > > +++ linux3.9/kernel/futex.c   2013-05-06 16:24:40.403525000 +0000
> > > @@ -215,6 +215,22 @@ static void drop_futex_key_refs(union fu
> > >     }
> > >  }
> > >
> > > +/*
> > > +* Get subpage index in compound page, and add it into futex_key.
> > > +*/
> > > +static void key_add_compound_idx(union futex_key *key,
> > > +             struct page *head_page, struct page *page)
> > > +{
> > > +   int compound_idx;
> > > +
> > > +   if (compound_order(head_page) >= MAX_ORDER)
> > > +      compound_idx = page_to_pfn(page) - page_to_pfn(head_page);
> > > +   else
> > > +      compound_idx = page - head_page;
> > > +
> > > +   key->both.offset |= compound_idx << PAGE_SHIFT;
> > > +}
> > > +
> >
> > This implicitly assumes it is dealing with a hugetlbfs page. Today, it
> > is the case that an inode-based futex with PageCompound is a hugetlbfs
> > page but that could change in the future if THP ever backs files. This
> > would then break again except it would be harder to fix because THP pages
> > can be collapsed underneath you after the futex key has been generated.
> >
> > As this problem is hugetlbfs-specific should the fix be firmly in hugetlbfs
> 
> I think we should. For example, user applications that want high performance
> may use DPDK, which uses hugetlbfs.
> 
> 
> Should I rework the patch like the following code, and test it?
> 

Yes please, making sure to fix the bug Thomas pointed out.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-06-25 18:23             ` Darren Hart
@ 2013-06-25 19:41               ` Thomas Gleixner
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2013-06-25 19:41 UTC (permalink / raw)
  To: Darren Hart
  Cc: Zhang Yi, linux-kernel, 'Mel Gorman',
	'Ingo Molnar', 'Peter Zijlstra',
	zhang.yi20

On Tue, 25 Jun 2013, Darren Hart wrote:
> Otherwise this looks ready to me. Thomas, do you want a resend with
> commit message corrections or do you prefer to integrate those 
> yourself?
> 
> With the above fixes:
> 
> Acked-by: Darren Hart <dvhart@linux.intel.com>

I'll pick it up and fix it. We've delayed it long enough already.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-06-25 13:19           ` Zhang Yi
@ 2013-06-25 18:23             ` Darren Hart
  2013-06-25 19:41               ` Thomas Gleixner
  0 siblings, 1 reply; 20+ messages in thread
From: Darren Hart @ 2013-06-25 18:23 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-kernel, 'Mel Gorman', 'Ingo Molnar',
	'Peter Zijlstra', 'Thomas Gleixner',
	zhang.yi20

Hi Zhang Yi,

Thanks for turning around an update so quickly. The code itself looks
ready to me. We try to maintain a high level of quality in the commit
message as well to help with understanding complex systems such as
futexes.

On Tue, 2013-06-25 at 21:19 +0800, Zhang Yi wrote:
> The futex-keys of processes share futex determined by page-offset,

"share futex" is rather confusing to me. Maybe:

The futex-key is determined by page-offset, ...

> mapping-host, and mapping-index of the user space address. User
> appications using hugepage for futex may lead to futex-key conflict.

Please take care with spelling. Grammar issues aren't as critical, but
please enable spell checking in your editor.

applications
hugepages for futexes
conflicts

> 
> Assume there are two or more futexes in diffrent normal pages of the

different

> hugepage, and each futex has the same offset in its normal page,
> causing all the futexes have the same futex-key.

then all the futexes will have the same futex-key.

> 
> This patch adds the normal page index in the compound page into
> the pgoff of futex-key.


of the futex_key.

> 
> Steps to reproduce the bug:
> 1. The 1st thread map a file of hugetlbfs, and use the return address

maps
uses

> as the 1st mutex's address, and use the return address with PAGE_SIZE

uses

> added as the 2nd mutex's address.
> 2. The 1st thread initialize the two mutexes with pshared attribute,

initializes
the pshared attribute,

> and lock the two mutexes.

locks

> 3. The 1st thread create the 2nd thread, and the 2nd thread block on

creates
blocks

> the 1st mutex.
> 4. The 1st thread create the 3rd thread, and the 3rd thread block on

creates
blocks

> the 2nd mutex.
> 5. The 1st thread unlock the 2nd mutex, the 3rd thread cannot take

unlocks

> the 2nd mutex, and may block forever.
> 
> Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
> Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
> 

Otherwise this looks ready to me. Thomas, do you want a resend with
commit message corrections or do you prefer to integrate those 
yourself?

With the above fixes:

Acked-by: Darren Hart <dvhart@linux.intel.com>


> diff -uprN linux-3.10-rc7.org/include/linux/hugetlb.h linux-3.10-rc7/include/linux/hugetlb.h
> --- linux-3.10-rc7.org/include/linux/hugetlb.h	2013-06-22 19:47:31.000000000 +0000
> +++ linux-3.10-rc7/include/linux/hugetlb.h	2013-06-25 09:40:06.256556000 +0000
> @@ -358,6 +358,17 @@ static inline int hstate_index(struct hs
>  	return h - hstates;
>  }
> 
> +pgoff_t __basepage_index(struct page *page);
> +
> +/* Return page->index in PAGE_SIZE units */
> +static inline pgoff_t basepage_index(struct page *page)
> +{
> +	if (!PageCompound(page))
> +		return page->index;
> +
> +	return __basepage_index(page);
> +}
> +
>  #else	/* CONFIG_HUGETLB_PAGE */
>  struct hstate {};
>  #define alloc_huge_page_node(h, nid) NULL
> @@ -378,6 +389,11 @@ static inline unsigned int pages_per_hug
>  }
>  #define hstate_index_to_shift(index) 0
>  #define hstate_index(h) 0
> +
> +static inline pgoff_t basepage_index(struct page *page)
> +{
> +	return page->index;
> +}
>  #endif	/* CONFIG_HUGETLB_PAGE */
> 
>  #endif /* _LINUX_HUGETLB_H */
> diff -uprN linux-3.10-rc7.org/kernel/futex.c linux-3.10-rc7/kernel/futex.c
> --- linux-3.10-rc7.org/kernel/futex.c	2013-06-22 19:47:31.000000000 +0000
> +++ linux-3.10-rc7/kernel/futex.c	2013-06-25 09:35:59.615425000 +0000
> @@ -61,6 +61,7 @@
>  #include <linux/nsproxy.h>
>  #include <linux/ptrace.h>
>  #include <linux/sched/rt.h>
> +#include <linux/hugetlb.h>
> 
>  #include <asm/futex.h>
> 
> @@ -365,7 +366,7 @@ again:
>  	} else {
>  		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
>  		key->shared.inode = page_head->mapping->host;
> -		key->shared.pgoff = page_head->index;
> +		key->shared.pgoff = basepage_index(page);
>  	}
> 
>  	get_futex_key_refs(key);
> diff -uprN linux-3.10-rc7.org/mm/hugetlb.c linux-3.10-rc7/mm/hugetlb.c
> --- linux-3.10-rc7.org/mm/hugetlb.c	2013-06-25 09:38:53.435151000 +0000
> +++ linux-3.10-rc7/mm/hugetlb.c	2013-06-25 09:39:30.375701000 +0000
> @@ -690,6 +690,23 @@ int PageHuge(struct page *page)
>  }
>  EXPORT_SYMBOL_GPL(PageHuge);
> 
> +pgoff_t __basepage_index(struct page *page)
> +{
> +	struct page *page_head = compound_head(page);
> +	pgoff_t index = page_index(page_head);
> +	unsigned long compound_idx;
> +
> +	if (!PageHuge(page_head))
> +		return page_index(page);
> +
> +	if (compound_order(page_head) >= MAX_ORDER)
> +		compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
> +	else
> +		compound_idx = page - page_head;
> +
> +	return (index << compound_order(page_head)) + compound_idx;
> +}
> +
>  static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
>  {
>  	struct page *page;
> 
> 

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Technical Lead - Linux Kernel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-06-24 21:02         ` Darren Hart
@ 2013-06-25 13:19           ` Zhang Yi
  2013-06-25 18:23             ` Darren Hart
  0 siblings, 1 reply; 20+ messages in thread
From: Zhang Yi @ 2013-06-25 13:19 UTC (permalink / raw)
  To: 'Darren Hart'
  Cc: linux-kernel, 'Mel Gorman', 'Ingo Molnar',
	'Peter Zijlstra', 'Thomas Gleixner',
	zhang.yi20

The futex-keys of processes share futex determined by page-offset,
mapping-host, and mapping-index of the user space address. User
appications using hugepage for futex may lead to futex-key conflict.

Assume there are two or more futexes in diffrent normal pages of the
hugepage, and each futex has the same offset in its normal page,
causing all the futexes have the same futex-key.

This patch adds the normal page index in the compound page into
the pgoff of futex-key.

Steps to reproduce the bug:
1. The 1st thread map a file of hugetlbfs, and use the return address
as the 1st mutex's address, and use the return address with PAGE_SIZE
added as the 2nd mutex's address.
2. The 1st thread initialize the two mutexes with pshared attribute,
and lock the two mutexes.
3. The 1st thread create the 2nd thread, and the 2nd thread block on
the 1st mutex.
4. The 1st thread create the 3rd thread, and the 3rd thread block on
the 2nd mutex.
5. The 1st thread unlock the 2nd mutex, the 3rd thread cannot take
the 2nd mutex, and may block forever.

Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>

diff -uprN linux-3.10-rc7.org/include/linux/hugetlb.h linux-3.10-rc7/include/linux/hugetlb.h
--- linux-3.10-rc7.org/include/linux/hugetlb.h	2013-06-22 19:47:31.000000000 +0000
+++ linux-3.10-rc7/include/linux/hugetlb.h	2013-06-25 09:40:06.256556000 +0000
@@ -358,6 +358,17 @@ static inline int hstate_index(struct hs
 	return h - hstates;
 }

+pgoff_t __basepage_index(struct page *page);
+
+/* Return page->index in PAGE_SIZE units */
+static inline pgoff_t basepage_index(struct page *page)
+{
+	if (!PageCompound(page))
+		return page->index;
+
+	return __basepage_index(page);
+}
+
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 #define alloc_huge_page_node(h, nid) NULL
@@ -378,6 +389,11 @@ static inline unsigned int pages_per_hug
 }
 #define hstate_index_to_shift(index) 0
 #define hstate_index(h) 0
+
+static inline pgoff_t basepage_index(struct page *page)
+{
+	return page->index;
+}
 #endif	/* CONFIG_HUGETLB_PAGE */

 #endif /* _LINUX_HUGETLB_H */
diff -uprN linux-3.10-rc7.org/kernel/futex.c linux-3.10-rc7/kernel/futex.c
--- linux-3.10-rc7.org/kernel/futex.c	2013-06-22 19:47:31.000000000 +0000
+++ linux-3.10-rc7/kernel/futex.c	2013-06-25 09:35:59.615425000 +0000
@@ -61,6 +61,7 @@
 #include <linux/nsproxy.h>
 #include <linux/ptrace.h>
 #include <linux/sched/rt.h>
+#include <linux/hugetlb.h>

 #include <asm/futex.h>

@@ -365,7 +366,7 @@ again:
 	} else {
 		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
 		key->shared.inode = page_head->mapping->host;
-		key->shared.pgoff = page_head->index;
+		key->shared.pgoff = basepage_index(page);
 	}

 	get_futex_key_refs(key);
diff -uprN linux-3.10-rc7.org/mm/hugetlb.c linux-3.10-rc7/mm/hugetlb.c
--- linux-3.10-rc7.org/mm/hugetlb.c	2013-06-25 09:38:53.435151000 +0000
+++ linux-3.10-rc7/mm/hugetlb.c	2013-06-25 09:39:30.375701000 +0000
@@ -690,6 +690,23 @@ int PageHuge(struct page *page)
 }
 EXPORT_SYMBOL_GPL(PageHuge);

+pgoff_t __basepage_index(struct page *page)
+{
+	struct page *page_head = compound_head(page);
+	pgoff_t index = page_index(page_head);
+	unsigned long compound_idx;
+
+	if (!PageHuge(page_head))
+		return page_index(page);
+
+	if (compound_order(page_head) >= MAX_ORDER)
+		compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
+	else
+		compound_idx = page - page_head;
+
+	return (index << compound_order(page_head)) + compound_idx;
+}
+
 static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
 {
 	struct page *page;
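
The effect of the patch can be modelled in userspace. The sketch below is
not kernel code: struct model_key, key_before() and key_after() are
hypothetical stand-ins for the inode-based part of union futex_key, and
the 4 KiB base page and order-9 (2 MiB) huge page are assumptions. It shows
that two futexes at the same in-page offset in different base pages of one
huge page collide when pgoff is the head page's index, and become distinct
once pgoff is counted in PAGE_SIZE units.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12UL			/* assumed 4 KiB base pages */
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)
#define MODEL_HUGE_ORDER 9UL			/* assumed 2 MiB huge pages */

/* Simplified stand-in for the shared (inode-based) futex key. */
struct model_key {
	unsigned long inode;	/* stands in for mapping->host */
	unsigned long pgoff;	/* page index stored in the key */
	unsigned int offset;	/* byte offset within the base page */
};

/* Before the patch: pgoff is the head page's index (huge page units). */
static struct model_key key_before(unsigned long inode,
				   unsigned long head_index,
				   unsigned long byte_off)
{
	struct model_key k = { inode, head_index,
		(unsigned int)(byte_off & (MODEL_PAGE_SIZE - 1)) };
	return k;
}

/* After the patch: pgoff is (index << order) + sub-page index. */
static struct model_key key_after(unsigned long inode,
				  unsigned long head_index,
				  unsigned long byte_off)
{
	unsigned long compound_idx = byte_off >> MODEL_PAGE_SHIFT;

	/* byte_off must stay inside one huge page. */
	assert(compound_idx < (1UL << MODEL_HUGE_ORDER));

	struct model_key k = { inode,
		(head_index << MODEL_HUGE_ORDER) + compound_idx,
		(unsigned int)(byte_off & (MODEL_PAGE_SIZE - 1)) };
	return k;
}

/* Two futexes share a hash bucket entry only if all fields match. */
static bool key_equal(struct model_key a, struct model_key b)
{
	return a.inode == b.inode && a.pgoff == b.pgoff &&
	       a.offset == b.offset;
}
```

With these definitions, key_before(1, 0, 0) and
key_before(1, 0, MODEL_PAGE_SIZE) compare equal, which matches the two
mutexes in the reproduction steps, while the corresponding key_after()
keys differ in pgoff (0 versus 1).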



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-05-16  2:00       ` zhang.yi20
@ 2013-06-24 21:02         ` Darren Hart
  2013-06-25 13:19           ` Zhang Yi
  0 siblings, 1 reply; 20+ messages in thread
From: Darren Hart @ 2013-06-24 21:02 UTC (permalink / raw)
  To: zhang.yi20
  Cc: linux-kernel, Mel Gorman, 'Ingo Molnar',
	'Peter Zijlstra', 'Thomas Gleixner',
	Zhang Yi

On Thu, 2013-05-16 at 10:00 +0800, zhang.yi20@zte.com.cn wrote:
> 
> Darren Hart <dvhart@linux.intel.com> wrote on 2013/05/16 09:30:31:
> 
> >
> > pgoff_t is an unsigned long, and page_to_pfn() returns an unsigned long.
> > Since compound_idx can be assigned from page_to_pfn() and it is added
> > with index in the return value, unsigned long seems like a better choice
> > to me. Is there a specific reason you prefer an int? It might be "fine"
> > but it is likely to raise eyebrows whenever someone read through it.
> >
> No other specific  reason. I just think that int is enough.

Hi Yi,

I believe this patch is still pending a final version from you. Mel and
I both asked after the change of compound_idx from int to unsigned long.
While "int" may be adequate, for consistency with the return type
assigned and such, the unsigned long is preferred. Could you respin and
send along the patch so Thomas can pull it in?

Thanks,

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Technical Lead - Linux Kernel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-05-16  1:30     ` Darren Hart
@ 2013-05-16  2:00       ` zhang.yi20
  2013-06-24 21:02         ` Darren Hart
  0 siblings, 1 reply; 20+ messages in thread
From: zhang.yi20 @ 2013-05-16  2:00 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Mel Gorman, 'Ingo Molnar',
	'Peter Zijlstra', 'Thomas Gleixner',
	Zhang Yi



Darren Hart <dvhart@linux.intel.com> wrote on 2013/05/16 09:30:31:

>
> pgoff_t is an unsigned long, and page_to_pfn() returns an unsigned long.
> Since compound_idx can be assigned from page_to_pfn() and it is added
> with index in the return value, unsigned long seems like a better choice
> to me. Is there a specific reason you prefer an int? It might be "fine"
> but it is likely to raise eyebrows whenever someone read through it.
>
No other specific reason. I just think that int is enough.
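
As a rough check of the numeric side of this exchange, the sketch below
computes the largest sub-page index for a given huge page size and tests
whether it fits in an int. The 4 KiB base page size and the helper names
max_compound_idx() and fits_in_int() are assumptions for illustration
only. The preference for unsigned long voiced in this thread is about
matching pgoff_t and page_to_pfn(), not about overflow.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_PAGE_SHIFT 12UL	/* assumed 4 KiB base pages */

/* Largest sub-page index inside a huge page of 2^huge_shift bytes. */
static unsigned long max_compound_idx(unsigned long huge_shift)
{
	assert(huge_shift >= MODEL_PAGE_SHIFT &&
	       huge_shift < sizeof(unsigned long) * CHAR_BIT);

	return (1UL << (huge_shift - MODEL_PAGE_SHIFT)) - 1;
}

/* Nonzero if idx can be stored in an int without narrowing. */
static int fits_in_int(unsigned long idx)
{
	return idx <= (unsigned long)INT_MAX;
}
```

Under these assumptions a 2 MiB huge page (shift 21) has a maximum index
of 511 and a 1 GiB huge page (shift 30) has 262143, both of which fit in
an int.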


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-05-15 14:20 ` Mel Gorman
@ 2013-05-16  1:16   ` zhang.yi20
  2013-05-16  1:30     ` Darren Hart
  0 siblings, 1 reply; 20+ messages in thread
From: zhang.yi20 @ 2013-05-16  1:16 UTC (permalink / raw)
  To: Mel Gorman
  Cc: 'Darren Hart', linux-kernel, 'Ingo Molnar',
	'Peter Zijlstra', 'Thomas Gleixner',
	Zhang Yi



Mel Gorman <mgorman@suse.de> wrote on 2013/05/15 22:20:35:


> Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
>
> On Wed, May 15, 2013 at 09:57:03PM +0800, Zhang Yi wrote:
> > The futex-keys of processes share futex determined by page-offset,
> > mapping-host, and mapping-index of the user space address. User
> > appications using hugepage for futex may lead to futex-key conflict.
> >
> > Assume there are two or more futexes in diffrent normal pages of the
> > hugepage, and each futex has the same offset in its normal page,
> > causing all the futexes have the same futex-key.
> >
> > This patch adds the normal page index in the compound page into
> > the pgoff of futex-key.
> >
> > Steps to reproduce the bug:
> > 1. The 1st thread map a file of hugetlbfs, and use the return address
> > as the 1st mutex's address, and use the return address with PAGE_SIZE
> > added as the 2nd mutex's address.
> > 2. The 1st thread initialize the two mutexes with pshared attribute,
> > and lock the two mutexes.
> > 3. The 1st thread create the 2nd thread, and the 2nd thread block on
> > the 1st mutex.
> > 4. The 1st thread create the 3rd thread, and the 3rd thread block on
> > the 2nd mutex.
> > 5. The 1st thread unlock the 2nd mutex, the 3rd thread cannot take
> > the 2nd mutex, and may block forever.
> >
> >
> > Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
> > Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
> > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> > Reviewed-by: Darren Hart <dvhart@linux.intel.com>
> > Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Reviewed-by: Mel Gorman <mgorman@suse.de>
> > Reviewed-by: Liu Dong <liu.dong3@zte.com.cn>
> > Reviewed-by: Cui Yunfeng <cui.yunfeng@zte.com.cn>
> > Reviewed-by: Lu Zhongjun <lu.zhongjun@zte.com.cn>
> > Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
> >
>
> Did all these people really review it? I just whinged about the last patch
> and didn't put a Reviewed-by on it. That said, I don't actually have a
> problem with this patch and I assumed it passed your testing so
>
I mistakenly thought that I should list all the people here. : )
Shall I clean up the name list and send the patch again?

> Reviewed-by: Mel Gorman <mgorman@suse.de>
>
> The others might not agree though.
>
> I note the conversion from int offset to long offset in futex_key appears
> to have gotten lost. Is that in a separate cleanup patch now?

In the old patch, I added the compound index into the offset, so I changed
the offset from int to long. That is unnecessary for this patch.

>
> --
> Mel Gorman
> SUSE Labs


BTW, does anyone have any other advice on the patch?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-04-19  2:45               ` Darren Hart
@ 2013-04-19  7:03                 ` zhang.yi20
  0 siblings, 0 replies; 20+ messages in thread
From: zhang.yi20 @ 2013-04-19  7:03 UTC (permalink / raw)
  To: Darren Hart
  Cc: Dave Hansen, Dave Hansen, linux-kernel, linux-mm, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner

Darren Hart <dvhart@linux.intel.com> wrote on 2013/04/19 10:45:00:

> > 
> > BTW, have you seen the testcase in my other mail?  It seems to be 
> > rejected by LKML.
> > 
> 
> I did not receive it, did you also CC me?
> 
> -- 
> Darren Hart
> Intel Open Source Technology Center
> Yocto Project - Technical Lead - Linux Kernel


OK, I found that the previous mail was rejected because it had Chinese
characters.
I paste it below:

diff -uprN functional/futex_hugepage.c functional/futex_hugepage.c
--- functional/futex_hugepage.c 1970-01-01 00:00:00.000000000 +0000
+++ functional/futex_hugepage.c 2013-04-18 16:55:44.119239404 +0000
@@ -0,0 +1,188 @@
+/*********************************************************************
+ *   This program is free software;  you can redistribute it and/or 
+ *   modify it under the terms of the GNU General Public License as 
+ *   published by the Free Software Foundation; either version 2 of 
+ *   the License, or (at your option) any later version.
+ *
+ *   This program is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY;  without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
+ *   the GNU General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program;  if not, write to the Free Software
+ *   Foundation, Inc., 59 Temple Place, Suite 330, Boston,
+ *   MA 02111-1307 USA
+ * NAME
+ *      futex_hugepage.c
+ *
+ * DESCRIPTION
+ *      Testing futex when using huge page
+ *
+ * AUTHOR
+ *      Zhang Yi <zhang.yi20@zte.com.cn>
+ *
+ * HISTORY
+ *      2013-4-18: Initial version by Zhang Yi <zhang.yi20@zte.com.cn>
+ *
+ ********************************************************************/
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <errno.h>
+#include <sys/time.h>
+#include <signal.h>
+
+#include "futextest.h"
+#include "logging.h"
+
+#define DEFAULT_FILE_NAME "/mnt/hugepagefile"
+#define MAX_FILENAME_LEN 128
+
+#define DEFAULT_HUGE_SIZE (2 * 1024 * 1024)
+
+#define PROTECTION (PROT_READ | PROT_WRITE)
+
+/* Only ia64 requires this */
+#ifdef __ia64__
+#define ADDR (void *)(0x8000000000000000UL)
+#define FLAGS (MAP_SHARED | MAP_FIXED)
+#else
+#define ADDR (void *)(0x0UL)
+#define FLAGS (MAP_SHARED)
+#endif
+
+
+futex_t *futex1, *futex2;
+
+unsigned long th2_wait_time;
+int th2_wait_done;
+
+void usage(char *prog)
+{
+       printf("Usage: %s\n", prog);
+       printf("  -f    hugetlbfs file path\n");
+       printf("  -l    hugepage size\n");
+}
+
+int gettid(void)
+{
+       return syscall(SYS_gettid);
+}
+
+void *wait_thread1(void *arg)
+{
+       futex_wait(futex1, *futex1, NULL, 0);
+       return NULL;
+}
+
+
+void *wait_thread2(void *arg)
+{
+       struct timeval tv;
+
+       gettimeofday(&tv, NULL);
+       th2_wait_time = tv.tv_sec;
+       futex_wait(futex2, *futex2, NULL, 0);
+       th2_wait_done = 1;
+
+       return NULL;
+}
+
+int huge_futex_test(char *file_path, unsigned long huge_size)
+{
+       void *addr;
+       int fd, pgsz, wait_max_time = 30;
+       int ret = RET_PASS;
+       pthread_t th1, th2;
+       struct timeval tv;
+ 
+       fd = open(file_path, O_CREAT | O_RDWR, 0755);
+       if (fd < 0) {
+               perror("Open failed");
+               exit(1);
+       }
+ 
+       /*map hugetlbfs file*/
+       addr = mmap(ADDR, huge_size, PROTECTION, FLAGS, fd, 0);
+       if (addr == MAP_FAILED) {
+               perror("mmap");
+               unlink(file_path);
+               exit(1);
+       }
+
+       pgsz = getpagesize();
+       printf("page size is %d\n", pgsz);
+ 
+       /*apply the first subpage to futex1*/
+       futex1 = addr;
+       *futex1 = FUTEX_INITIALIZER ;
+       /*apply the second subpage to futex2*/
+       futex2 = addr + pgsz;
+       *futex2 = FUTEX_INITIALIZER ;
+ 
+
+       /*thread1 block on futex1 first,then thread2 block on futex2*/
+       pthread_create(&th1, NULL, wait_thread1, NULL);
+       sleep(2);
+       pthread_create(&th2, NULL, wait_thread2, NULL);
+       sleep(2);
+
+       /*try to wake up thread2*/
+       futex_wake(futex2, 1, 0);
+
+       /*see if thread2 can be woke up*/
+       while(!th2_wait_done) {
+               gettimeofday(&tv, NULL);
+               /*thread2 block over 30 secs, test fail*/
+               if(tv.tv_sec > (th2_wait_time + wait_max_time)) {
+                       printf("wait_thread2 wait for %ld secs\n", 
+                                   tv.tv_sec - th2_wait_time);
+                       ret = RET_FAIL;
+                       break;
+               }
+               sleep(2);
+       }
+
+       munmap(addr, huge_size);
+       close(fd);
+       unlink(file_path);
+
+       return ret;
+}
+
+int main(int argc, char *argv[])
+{
+       unsigned long huge_size = DEFAULT_HUGE_SIZE;
+       char file_path[MAX_FILENAME_LEN];
+       int ret, c;
+
+       strcpy(file_path, DEFAULT_FILE_NAME);
+
+       while ((c = getopt(argc, argv, "cf:l:")) != -1) {
+               switch(c) {
+               case 'c':
+                       log_color(1);
+                       break;
+               case 'f':
+                       strcpy(file_path, optarg);
+                       break;
+               case 'l':
+                       huge_size = strtoul(optarg, NULL, 10) * 1024 * 1024;
+                       break;
+               default:
+                       usage(basename(argv[0]));
+                       exit(1);
+               }
+       }
+ 
+       ret = huge_futex_test(file_path, huge_size);
+
+       print_result(ret);
+
+       return ret;
+}
+
diff -uprN functional/run.sh functional/run.sh
--- functional/run.sh   2013-04-18 06:39:56.000000000 +0000
+++ functional/run.sh   2013-04-18 16:55:59.447240286 +0000
@@ -89,3 +89,6 @@ echo
 echo
 ./futex_wait_uninitialized_heap $COLOR
 ./futex_wait_private_mapped_file $COLOR
+
+echo
+./futex_hugepage $COLOR
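
The testcase above calls futex_wait() and futex_wake() from futextest.h with opflags 0, that is, without FUTEX_PRIVATE_FLAG, so the kernel takes the shared-key path where the hugepage conflict shows up. A minimal sketch of what such wrappers boil down to, assuming Linux and the raw futex system call; the names sketch_futex_wait and sketch_futex_wake are hypothetical and are not the futextest API:

```c
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Block until woken, but only if *uaddr still equals 'expected'.
 * Returns 0 when woken; returns -1 with errno set to EAGAIN when the
 * value has already changed. No timeout is used (NULL).
 */
static long sketch_futex_wait(uint32_t *uaddr, uint32_t expected)
{
	/* FUTEX_WAIT without FUTEX_PRIVATE_FLAG: shared futex key */
	return syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/*
 * Wake at most 'nr' waiters queued on the key derived from uaddr.
 * Returns the number of waiters actually woken.
 */
static long sketch_futex_wake(uint32_t *uaddr, int nr)
{
	return syscall(SYS_futex, uaddr, FUTEX_WAKE, nr, NULL, NULL, 0);
}
```

With the key conflict described in this thread, a wake on futex2 can pick the waiter queued on futex1 because both addresses map to the same key, which is why wait_thread2 may never return.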

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-04-18 14:34           ` Darren Hart
@ 2013-04-19  2:13             ` zhang.yi20
  2013-04-19  2:45               ` Darren Hart
  0 siblings, 1 reply; 20+ messages in thread
From: zhang.yi20 @ 2013-04-19  2:13 UTC (permalink / raw)
  To: Darren Hart
  Cc: Dave Hansen, Dave Hansen, linux-kernel, linux-mm, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="GB2312", Size: 2312 bytes --]

Darren Hart <dvhart@linux.intel.com> wrote on 2013/04/18 22:34:29:

> On 04/18/2013 01:05 AM, zhang.yi20@zte.com.cn wrote:
> > 
> > I have run futextest/performance/futex_wait for testing, 
> >  5 times before make it long:
> > futex_wait: Measure FUTEX_WAIT operations per second
> >         Arguments: iterations=100000000 threads=256
> > Result: 10215 Kiter/s
> > 
> > futex_wait: Measure FUTEX_WAIT operations per second
> >         Arguments: iterations=100000000 threads=256
> > Result: 9862 Kiter/s
> > 
> > futex_wait: Measure FUTEX_WAIT operations per second
> >         Arguments: iterations=100000000 threads=256
> > Result: 10081 Kiter/s
> > 
> > futex_wait: Measure FUTEX_WAIT operations per second
> >         Arguments: iterations=100000000 threads=256
> > Result: 10060 Kiter/s
> > 
> > futex_wait: Measure FUTEX_WAIT operations per second
> >         Arguments: iterations=100000000 threads=256
> > Result: 10081 Kiter/s
> > 
> > 
> > And 5 times after make it long:
> > futex_wait: Measure FUTEX_WAIT operations per second
> >         Arguments: iterations=100000000 threads=256
> > Result: 9940 Kiter/s
> > 
> > futex_wait: Measure FUTEX_WAIT operations per second
> >         Arguments: iterations=100000000 threads=256
> > Result: 10204 Kiter/s
> > 
> > futex_wait: Measure FUTEX_WAIT operations per second
> >         Arguments: iterations=100000000 threads=256
> > Result: 9901 Kiter/s
> > 
> > futex_wait: Measure FUTEX_WAIT operations per second
> >         Arguments: iterations=100000000 threads=256
> > Result: 10152 Kiter/s
> > 
> > futex_wait: Measure FUTEX_WAIT operations per second
> >         Arguments: iterations=100000000 threads=256
> > Result: 10060 Kiter/s
> > 
> > 
> > Seems OK, is it?
> > 
> 
> Changes appear to be in the noise, no impact with this load 
> anyway.
> How many CPUs on your test machine? I presume not 256?
> 
> -- 

There are 16 CPUs, and the model is:
Intel(R) Xeon(R) CPU           C5528  @ 2.13GHz

Should I set the number of threads to the number of CPUs? I tested again with 
the argument '-n 16', and the result is similar.

BTW, have you seen the testcase in my other mail?  It seems to have been 
rejected by LKML.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-04-17 15:51       ` Darren Hart
@ 2013-04-18  8:05         ` zhang.yi20
  2013-04-18 14:34           ` Darren Hart
  0 siblings, 1 reply; 20+ messages in thread
From: zhang.yi20 @ 2013-04-18  8:05 UTC (permalink / raw)
  To: Darren Hart
  Cc: Dave Hansen, Dave Hansen, linux-kernel, linux-mm, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner

Darren Hart <dvhart@linux.intel.com> wrote on 2013/04/17 23:51:36:

> On 04/17/2013 08:26 AM, Dave Hansen wrote:
> > On 04/17/2013 07:18 AM, Darren Hart wrote:
> >>>> This also needs a comment in futex.h describing the usage of the
> >>>> offset field in union futex_key as well as above get_futex_key
> >>>> describing the key for shared mappings.
> >>>>
> >>> As far as I know , the max size of one hugepage is 1 GBytes for 
> >>> x86 cpu. Can some other cpus support greater hugepage even more 
> >>> than 4 GBytes? If so, we can change the type of 'offset' from int 
> >>>  to long to avoid truncating.
> >>
> >> I discussed this with Dave Hansen, on CC, and he thought we needed
> >> 9 bits, so even on x86 32b we should be covered.
> > 
> > I think the problem is actually on 64-bit since you still only have
> > 32-bits in an 'int' there.
> > 
> > I guess it's remotely possible that we could have some
> > mega-super-huge-gigantic pages show up in hardware some day, or that
> > somebody would come up with software-only one.  I bet there's a lot
> > more code that will break in the kernel than this futex code, though.
> > 
> > The other option would be to start #defining some build-time constant
> > for what the largest possible huge page size is, then BUILD_BUG_ON()
> > it.
> >
> > Or you can just make it a long ;)
> 
> If we make it a long I'd want to see futextest performance tests before
> and after. Messing with the futex_key has been known to have bad results
> in the past :-)
> 
> -- 
 
I ran futextest/performance/futex_wait 5 times before changing the offset 
to long:
futex_wait: Measure FUTEX_WAIT operations per second
        Arguments: iterations=100000000 threads=256
Result: 10215 Kiter/s

futex_wait: Measure FUTEX_WAIT operations per second
        Arguments: iterations=100000000 threads=256
Result: 9862 Kiter/s

futex_wait: Measure FUTEX_WAIT operations per second
        Arguments: iterations=100000000 threads=256
Result: 10081 Kiter/s

futex_wait: Measure FUTEX_WAIT operations per second
        Arguments: iterations=100000000 threads=256
Result: 10060 Kiter/s

futex_wait: Measure FUTEX_WAIT operations per second
        Arguments: iterations=100000000 threads=256
Result: 10081 Kiter/s


And 5 times after changing it to long:
futex_wait: Measure FUTEX_WAIT operations per second
        Arguments: iterations=100000000 threads=256
Result: 9940 Kiter/s

futex_wait: Measure FUTEX_WAIT operations per second
        Arguments: iterations=100000000 threads=256
Result: 10204 Kiter/s

futex_wait: Measure FUTEX_WAIT operations per second
        Arguments: iterations=100000000 threads=256
Result: 9901 Kiter/s

futex_wait: Measure FUTEX_WAIT operations per second
        Arguments: iterations=100000000 threads=256
Result: 10152 Kiter/s

futex_wait: Measure FUTEX_WAIT operations per second
        Arguments: iterations=100000000 threads=256
Result: 10060 Kiter/s


This seems OK, doesn't it?
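
To put a number on the comparison, the ten results above can be averaged. This is only a sketch: the array and function names are hypothetical, and five runs per side are too few for a real statistical test.

```c
#include <stddef.h>

/* Results in Kiter/s, copied from the ten futex_wait runs above. */
static const double before_long[] = { 10215, 9862, 10081, 10060, 10081 };
static const double after_long[]  = { 9940, 10204, 9901, 10152, 10060 };

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Change of the mean from 'before' to 'after', in percent of 'before'. */
static double mean_change_percent(const double *before,
				  const double *after, size_t n)
{
	double b = mean(before, n);

	return (mean(after, n) - b) / b * 100.0;
}
```

The means come out at 10059.8 Kiter/s before and 10051.4 Kiter/s after, a drop of less than 0.1%, while the individual runs before the change already range from 9862 to 10215 Kiter/s.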

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-04-16 17:57 Darren Hart
@ 2013-04-17  9:55 ` zhang.yi20
  2013-04-17 14:18   ` Darren Hart
  0 siblings, 1 reply; 20+ messages in thread
From: zhang.yi20 @ 2013-04-17  9:55 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, linux-mm, Ingo Molnar, Peter Zijlstra, Thomas Gleixner

Darren Hart <dvhart@linux.intel.com> wrote on 2013/04/17 01:57:10:

> Again, a functional testcase in futextest would be a good idea. This
> helps validate the patch and also can be used to identify regressions in
> the future.

I will post the testcase code later.

> 
> What is the max value of comp_idx? Are we at risk of truncating it?
> Looks like not really from my initial look.
> 
> This also needs a comment in futex.h describing the usage of the offset
> field in union futex_key as well as above get_futex_key describing the
> key for shared mappings.
> 
> 

As far as I know, the maximum size of one hugepage is 1 GB on x86 CPUs.
Can any other CPUs support larger hugepages, even larger than 4 GB? If 
so, we can change the type of 'offset' from int to long to avoid 
truncation.
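
The truncation question can be checked with a little arithmetic. The sketch below assumes the compound index is stored directly above the in-page byte offset in the 'offset' field; that layout and the helper names are assumptions for illustration, not taken from the patch.

```c
#include <stdint.h>

/* Smallest number of bits b such that 2^b >= n, for n >= 1. */
static unsigned int bits_for(uint64_t n)
{
	unsigned int bits = 0;

	while (bits < 63 && (1ULL << bits) < n)
		bits++;
	return bits;
}

/* Bits needed to index every normal page inside one huge page. */
static unsigned int comp_idx_bits(uint64_t huge_size, uint64_t page_size)
{
	return bits_for(huge_size / page_size);
}

/* Bits needed for the compound index plus the in-page byte offset. */
static unsigned int combined_offset_bits(uint64_t huge_size,
					 uint64_t page_size)
{
	return comp_idx_bits(huge_size, page_size) + bits_for(page_size);
}
```

With 4 KB normal pages, a 2 MB huge page needs 9 index bits and a 1 GB huge page needs 18, so index plus in-page offset is 30 bits and still fits in an int. A 4 GB huge page would need 32 bits in total, which no longer fits in the positive range of a 32-bit int.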

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
  2013-04-16 18:37 Dave Hansen
@ 2013-04-17  7:47 ` zhang.yi20
  0 siblings, 0 replies; 20+ messages in thread
From: zhang.yi20 @ 2013-04-17  7:47 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Darren Hart, linux-kernel, linux-mm, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner

Dave Hansen <dave.hansen@linux.intel.com> wrote on 2013/04/17 02:37:40:

> Instead of bothering to store the index, why not just calculate it, like:
> 
> On 04/15/2013 08:37 PM, zhang.yi20@zte.com.cn wrote:
> > +static inline int get_page_compound_index(struct page *page)
> > +{
> > +       if (PageHead(page))
> > +               return 0;
> > +       return compound_head(page) - page;
> > +}
> 
> BTW, you've really got to get your mail client fixed.  Your patch is
> still line-wrapped.


I agree that I should calculate the compound index, but referring to 
prep_compound_gigantic_page, I think it should look like this:

+static inline int get_page_compound_index(struct page *page)
+{
+       struct page *head_page;
+       if (PageHead(page))
+               return 0;
+
+       head_page = compound_head(page);
+       if (compound_order(head_page) >= MAX_ORDER)
+               return page_to_pfn(page) - page_to_pfn(head_page);
+       else
+               return page - head_page;
+}
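
To see why the compound index matters, here is a small user-space model of the key conflict. It is a sketch under stated assumptions: struct model_key is a simplified stand-in for the kernel's union futex_key, every subpage is assumed to report the head page's index, and folding the index into the offset with a shift is only one possible encoding.

```c
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12UL
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Simplified stand-in for a shared futex key (not the kernel's type). */
struct model_key {
	unsigned long inode;	/* stands in for the mapping host */
	unsigned long pgoff;	/* page index of the head page */
	unsigned long offset;	/* offset part of the key */
};

/* Without the fix: only the byte offset inside the normal page is used. */
static struct model_key key_without_fix(unsigned long inode,
					unsigned long head_pgoff,
					unsigned long byte_in_hugepage)
{
	struct model_key k = { inode, head_pgoff,
			       byte_in_hugepage % MODEL_PAGE_SIZE };
	return k;
}

/* With the fix: the subpage (compound) index is folded into the offset. */
static struct model_key key_with_fix(unsigned long inode,
				     unsigned long head_pgoff,
				     unsigned long byte_in_hugepage)
{
	unsigned long comp_idx = byte_in_hugepage >> MODEL_PAGE_SHIFT;
	struct model_key k = { inode, head_pgoff,
			       (comp_idx << MODEL_PAGE_SHIFT) |
			       (byte_in_hugepage % MODEL_PAGE_SIZE) };
	return k;
}

static bool keys_equal(struct model_key a, struct model_key b)
{
	return a.inode == b.inode && a.pgoff == b.pgoff &&
	       a.offset == b.offset;
}
```

In this model, two futexes at byte 0 and byte 4096 of the same huge page get equal keys from key_without_fix() and distinct keys from key_with_fix(), which mirrors the two-mutex reproduction steps in the original report.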

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2013-06-25 19:41 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-26 12:13 [PATCH] futex: bugfix for futex-key conflict when futex use hugepage Zhang Yi
2013-04-26 18:26 ` Thomas Gleixner
2013-05-07 12:23   ` Zhang Yi
2013-05-07 15:20     ` Mel Gorman
2013-05-07 15:24       ` Thomas Gleixner
2013-05-07 15:54         ` Mel Gorman
2013-05-10  9:08       ` zhang.yi20
2013-05-10  9:42         ` Mel Gorman
2013-05-07 12:34   ` Zhang Yi
  -- strict thread matches above, loose matches on Subject: below --
2013-05-15 13:57 Zhang Yi
2013-05-15 14:20 ` Mel Gorman
2013-05-16  1:16   ` zhang.yi20
2013-05-16  1:30     ` Darren Hart
2013-05-16  2:00       ` zhang.yi20
2013-06-24 21:02         ` Darren Hart
2013-06-25 13:19           ` Zhang Yi
2013-06-25 18:23             ` Darren Hart
2013-06-25 19:41               ` Thomas Gleixner
2013-04-16 18:37 Dave Hansen
2013-04-17  7:47 ` zhang.yi20
2013-04-16 17:57 Darren Hart
2013-04-17  9:55 ` zhang.yi20
2013-04-17 14:18   ` Darren Hart
2013-04-17 15:26     ` Dave Hansen
2013-04-17 15:51       ` Darren Hart
2013-04-18  8:05         ` zhang.yi20
2013-04-18 14:34           ` Darren Hart
2013-04-19  2:13             ` zhang.yi20
2013-04-19  2:45               ` Darren Hart
2013-04-19  7:03                 ` zhang.yi20
