linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
@ 2013-05-15 13:57 Zhang Yi
  2013-05-15 14:20 ` Mel Gorman
  0 siblings, 1 reply; 17+ messages in thread
From: Zhang Yi @ 2013-05-15 13:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: 'Thomas Gleixner', 'Darren Hart',
	'Peter Zijlstra', 'Ingo Molnar',
	mgorman, zhang.yi20

The futex-keys of processes share futex determined by page-offset,
mapping-host, and mapping-index of the user space address. User
appications using hugepage for futex may lead to futex-key conflict.

Assume there are two or more futexes in diffrent normal pages of the
hugepage, and each futex has the same offset in its normal page,
causing all the futexes have the same futex-key.

This patch adds the normal page index in the compound page into
the pgoff of futex-key.

Steps to reproduce the bug:
1. The 1st thread map a file of hugetlbfs, and use the return address
as the 1st mutex's address, and use the return address with PAGE_SIZE
added as the 2nd mutex's address.
2. The 1st thread initialize the two mutexes with pshared attribute,
and lock the two mutexes.
3. The 1st thread create the 2nd thread, and the 2nd thread block on
the 1st mutex.
4. The 1st thread create the 3rd thread, and the 3rd thread block on
the 2nd mutex.
5. The 1st thread unlock the 2nd mutex, the 3rd thread cannot take
the 2nd mutex, and may block forever.


Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Liu Dong <liu.dong3@zte.com.cn>
Reviewed-by: Cui Yunfeng <cui.yunfeng@zte.com.cn>
Reviewed-by: Lu Zhongjun <lu.zhongjun@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>


diff -uprN linux-3.10-rc1.org/include/linux/hugetlb.h linux-3.10-rc1/include/linux/hugetlb.h
--- linux-3.10-rc1.org/include/linux/hugetlb.h	2013-05-12 00:14:08.000000000 +0000
+++ linux-3.10-rc1/include/linux/hugetlb.h	2013-05-14 10:15:49.849389000 +0000
@@ -358,6 +358,17 @@ static inline int hstate_index(struct hs
 	return h - hstates;
 }

+pgoff_t __basepage_index(struct page *page);
+
+/* Return page->index in PAGE_SIZE units */
+static inline pgoff_t basepage_index(struct page *page)
+{
+	if (!PageCompound(page))
+		return page->index;
+
+	return __basepage_index(page);
+}
+
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 #define alloc_huge_page_node(h, nid) NULL
@@ -378,6 +389,11 @@ static inline unsigned int pages_per_hug
 }
 #define hstate_index_to_shift(index) 0
 #define hstate_index(h) 0
+
+static inline pgoff_t basepage_index(struct page *page)
+{
+	return page->index;
+}
 #endif	/* CONFIG_HUGETLB_PAGE */

 #endif /* _LINUX_HUGETLB_H */
diff -uprN linux-3.10-rc1.org/kernel/futex.c linux-3.10-rc1/kernel/futex.c
--- linux-3.10-rc1.org/kernel/futex.c	2013-05-12 00:14:08.000000000 +0000
+++ linux-3.10-rc1/kernel/futex.c	2013-05-14 10:15:04.804350000 +0000
@@ -61,6 +61,7 @@
 #include <linux/nsproxy.h>
 #include <linux/ptrace.h>
 #include <linux/sched/rt.h>
+#include <linux/hugetlb.h>

 #include <asm/futex.h>

@@ -365,7 +366,7 @@ again:
 	} else {
 		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
 		key->shared.inode = page_head->mapping->host;
-		key->shared.pgoff = page_head->index;
+		key->shared.pgoff = basepage_index(page);
 	}

 	get_futex_key_refs(key);
diff -uprN linux-3.10-rc1.org/mm/hugetlb.c linux-3.10-rc1/mm/hugetlb.c
--- linux-3.10-rc1.org/mm/hugetlb.c	2013-05-12 00:14:08.000000000 +0000
+++ linux-3.10-rc1/mm/hugetlb.c	2013-05-14 10:15:37.470175000 +0000
@@ -690,6 +690,23 @@ int PageHuge(struct page *page)
 }
 EXPORT_SYMBOL_GPL(PageHuge);

+pgoff_t __basepage_index(struct page *page)
+{
+	struct page *page_head = compound_head(page);
+	pgoff_t index = page_index(page_head);
+	int compound_idx;
+
+	if (!PageHuge(page_head))
+		return page_index(page);
+
+	if (compound_order(page_head) >= MAX_ORDER)
+		compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
+	else
+		compound_idx = page - page_head;
+
+	return (index << compound_order(page_head)) + compound_idx;
+}
+
 static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
 {
 	struct page *page;



^ permalink raw reply	[flat|nested] 17+ messages in thread
* [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
@ 2013-04-26 12:13 Zhang Yi
  2013-04-26 18:26 ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Zhang Yi @ 2013-04-26 12:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: 'Peter Zijlstra', 'Darren Hart',
	'Thomas Gleixner', 'Ingo Molnar',
	'Dave Hansen',
	zhang.yi20, wetpzy

Hi ,

At 2013-04-26 04:52:31,"Thomas Gleixner" <tglx@linutronix.de> wrote:
>
>Unfortunately this did not work out very well.
>
>1. Your patch now lacks a proper changelog which explains the change
>
>2. Your patch lacks any newline characters as you can see below
>

I am so sorry for my mistakes. : )




The futex-keys of processes share futex determined by page-offset, mapping-host, and
mapping-index of the user space address.
User appications using hugepage for futex may lead to futex-key conflict.
Assume there are two or more futexes in diffrent normal pages of the hugepage,
and each futex has the same offset in its normal page, causing all the futexes have the same futex-key.
In that case, futex may not work well.

This patch adds the normal page index in the compound page into the offset of futex-key.

Steps to reproduce the bug:
1. The 1st thread map a file of hugetlbfs, and use the return address as the 1st mutex's
address, and use the return address with PAGE_SIZE added as the 2nd mutex's address;
2. The 1st thread initialize the two mutexes with pshared attribute, and lock the two mutexes.
3. The 1st thread create the 2nd thread, and the 2nd thread block on the 1st mutex.
4. The 1st thread create the 3rd thread, and the 3rd thread block on the 2nd mutex.
5. The 1st thread unlock the 2nd mutex, the 3rd thread can not take the 2nd mutex, and
may block forever.


Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
Reviewed-by: Liu Dong <liu.dong3@zte.com.cn>
Reviewed-by: Cui Yunfeng <cui.yunfeng@zte.com.cn>
Reviewed-by: Lu Zhongjun <lu.zhongjun@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>


diff -uprN orig/linux3.9-rc7/include/linux/futex.h new/linux3.9-rc7/include/linux/futex.h
--- orig/linux3.9-rc7/include/linux/futex.h	2013-04-15 00:45:16.000000000 +0000
+++ new/linux3.9-rc7/include/linux/futex.h	2013-04-19 16:33:58.725880000 +0000
@@ -19,7 +19,7 @@ handle_futex_death(u32 __user *uaddr, st
  * The key type depends on whether it's a shared or private mapping.
  * Don't rearrange members without looking at hash_futex().
  *
- * offset is aligned to a multiple of sizeof(u32) (== 4) by definition.
+ * There are three components in offset:
  * We use the two low order bits of offset to tell what is the kind of key :
  *  00 : Private process futex (PTHREAD_PROCESS_PRIVATE)
  *       (no reference on an inode or mm)
@@ -27,6 +27,9 @@ handle_futex_death(u32 __user *uaddr, st
  *	mapped on a file (reference on the underlying inode)
  *  10 : Shared futex (PTHREAD_PROCESS_SHARED)
  *       (but private mapping on an mm, and reference taken on it)
+ * Bits 2 to (PAGE_SHIFT-1) indicates the offset of futex in its page.
+ * The rest hign order bits indicates the index if the page is a
+ * subpage of a compound page.
 */

 #define FUT_OFF_INODE    1 /* We set bit 0 if key has a reference on inode */
@@ -36,17 +39,17 @@ union futex_key {
 	struct {
 		unsigned long pgoff;
 		struct inode *inode;
-		int offset;
+		long offset;
 	} shared;
 	struct {
 		unsigned long address;
 		struct mm_struct *mm;
-		int offset;
+		long offset;
 	} private;
 	struct {
 		unsigned long word;
 		void *ptr;
-		int offset;
+		long offset;
 	} both;
 };

diff -uprN orig/linux3.9-rc7/kernel/futex.c new/linux3.9-rc7/kernel/futex.c
--- orig/linux3.9-rc7/kernel/futex.c	2013-04-15 00:45:16.000000000 +0000
+++ new/linux3.9-rc7/kernel/futex.c	2013-04-19 16:24:05.629143000 +0000
@@ -215,6 +215,22 @@ static void drop_futex_key_refs(union fu
 	}
 }

+/*
+* Get subpage index in compound page, for futex_key.
+*/
+static inline int get_page_compound_index(struct page *page)
+{
+	struct page *head_page;
+	if (PageHead(page))
+		return 0;
+
+	head_page = compound_head(page);
+	if (compound_order(head_page) >= MAX_ORDER)
+		return page_to_pfn(page) - page_to_pfn(head_page);
+	else
+		return page - compound_head(page);
+}
+
 /**
  * get_futex_key() - Get parameters which are the keys for a futex
  * @uaddr:	virtual address of the futex
@@ -228,7 +244,8 @@ static void drop_futex_key_refs(union fu
  * The key words are stored in *key on success.
  *
  * For shared mappings, it's (page->index, file_inode(vma->vm_file),
- * offset_within_page).  For private mappings, it's (uaddr, current->mm).
+ * page_compound_index and offset_within_page).
+ * For private mappings, it's (uaddr, current->mm).
  * We can usually work out the index without swapping in the page.
  *
  * lock_page() might sleep, the caller should not hold a spinlock.
@@ -239,7 +256,7 @@ get_futex_key(u32 __user *uaddr, int fsh
 	unsigned long address = (unsigned long)uaddr;
 	struct mm_struct *mm = current->mm;
 	struct page *page, *page_head;
-	int err, ro = 0;
+	int err, ro = 0, comp_idx = 0;

 	/*
 	 * The futex address must be "naturally" aligned.
@@ -299,6 +316,7 @@ again:
 			 * freed from under us.
 			 */
 			if (page != page_head) {
+				comp_idx = get_page_compound_index(page);
 				get_page(page_head);
 				put_page(page);
 			}
@@ -311,6 +329,7 @@ again:
 #else
 	page_head = compound_head(page);
 	if (page != page_head) {
+		comp_idx = get_page_compound_index(page);
 		get_page(page_head);
 		put_page(page);
 	}
@@ -363,7 +382,9 @@ again:
 		key->private.mm = mm;
 		key->private.address = address;
 	} else {
-		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
+		/* inode-based key */
+		key->both.offset |= ((long)comp_idx << PAGE_SHIFT)
+				   | FUT_OFF_INODE;
 		key->shared.inode = page_head->mapping->host;
 		key->shared.pgoff = page_head->index;
 	}



^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
@ 2013-04-16 18:37 Dave Hansen
  2013-04-17  7:47 ` zhang.yi20
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Hansen @ 2013-04-16 18:37 UTC (permalink / raw)
  To: zhang.yi20
  Cc: linux-kernel, linux-mm, Peter Zijlstra, Darren Hart,
	Thomas Gleixner, Ingo Molnar

Instead of bothering to store the index, why not just calculate it, like:

On 04/15/2013 08:37 PM, zhang.yi20@zte.com.cn wrote:
> +static inline int get_page_compound_index(struct page *page)
> +{
> +       if (PageHead(page))
> +               return 0;
> +       return compound_head(page) - page;
> +}

BTW, you've really got to get your mail client fixed.  Your patch is
still line-wrapped.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH] futex: bugfix for futex-key conflict when futex use hugepage
@ 2013-04-16 17:57 Darren Hart
  2013-04-17  9:55 ` zhang.yi20
  0 siblings, 1 reply; 17+ messages in thread
From: Darren Hart @ 2013-04-16 17:57 UTC (permalink / raw)
  To: zhang.yi20
  Cc: linux-kernel, linux-mm, Peter Zijlstra, Thomas Gleixner, Ingo Molnar

On 04/15/2013 08:37 PM, zhang.yi20@zte.com.cn wrote:
> Hello,
>

Hi Zhang,

I've rewrapped your plain text here for legibility, please adjust your
mail client accordingly.

> The futex-keys of processes share futex determined by page-offset,
> mapping-host, and mapping-index of the user space address.  User appications
> using hugepage for futex may lead to futex-key conflict.  Assume there are two
> or more futexes in diffrent normal pages of the hugepage, and each futex has
> the same offset in its normal page, causing all the futexes have the same
> futex-key.  In that case, futex may not work well.
>
> This patch adds the normal page index in the compound page into the offset of
> futex-key.

It also modifies the mm prep_compound*page() routines to set the page
compound index. You didn't modify the structure itself, I'm curious why
this information wasn't set before? Something for the MM folks I guess..

>
> Steps to reproduce the bug:
> 1. The 1st thread map a file of hugetlbfs, and use the return address as the
>    1st mutex's address, and use the return address with PAGE_SIZE added as the
>    2nd mutex's address;
> 2. The 1st thread initialize the two mutexes with pshared attribute, and lock
>    the two mutexes.
> 3. The 1st thread create the 2nd thread, and the 2nd thread block on the 1st
>    mutex.
> 4. The 1st thread create the 3rd thread, and the 3rd thread block on the 2nd
>    mutex.
> 5. The 1st thread unlock the 2nd mutex, the 3rd thread can not take the 2nd
>    mutex, and may block forever.


Again, a functional testcase in futextest would be a good idea. This
helps validate the patch and also can be used to identify regressions in
the future.


> Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
> Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
> Reviewed-by: Liu Dong <liu.dong3@zte.com.cn>
> Reviewed-by: Cui Yunfeng <cui.yunfeng@zte.com.cn>
> Reviewed-by: Lu Zhongjun <lu.zhongjun@zte.com.cn>
> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
>
> diff -uprN orig/linux-3.9-rc7/include/linux/mm.h
> new/linux-3.9-rc7/include/linux/mm.h
> --- orig/linux-3.9-rc7/include/linux/mm.h       2013-04-15
> 00:45:16.000000000 +0000
> +++ new/linux-3.9-rc7/include/linux/mm.h        2013-04-16
> 11:21:59.573458000 +0000
> @@ -502,6 +502,20 @@ static inline void set_compound_order(st
>         page[1].lru.prev = (void *)order;
>  }
>
> +static inline void set_page_compound_index(struct page *page, int index)
> +{
> +       if (PageHead(page))
> +               return;
> +       page->index = index;
> +}

I presume the spaces instead of tabs is a result of your mailer mangling
whitespace?

> +
> +static inline int get_page_compound_index(struct page *page)
> +{
> +       if (PageHead(page))
> +               return 0;
> +       return page->index;
> +}
> +
>  #ifdef CONFIG_MMU
>  /*
>   * Do pte_mkwrite, but only if the vma says VM_WRITE.  We do this when
> diff -uprN orig/linux-3.9-rc7/kernel/futex.c
> new/linux-3.9-rc7/kernel/futex.c
> --- orig/linux-3.9-rc7/kernel/futex.c   2013-04-15 00:45:16.000000000
> +0000
> +++ new/linux-3.9-rc7/kernel/futex.c    2013-04-16 11:13:30.069887000
> +0000
> @@ -239,7 +239,7 @@ get_futex_key(u32 __user *uaddr, int fsh
>         unsigned long address = (unsigned long)uaddr;
>         struct mm_struct *mm = current->mm;
>         struct page *page, *page_head;
> -       int err, ro = 0;
> +       int err, ro = 0, comp_idx = 0;
>
>         /*
>          * The futex address must be "naturally" aligned.
> @@ -299,6 +299,7 @@ again:
>                          * freed from under us.
>                          */
>                         if (page != page_head) {
> +                               comp_idx = get_page_compound_index(page);
>                                 get_page(page_head);
>                                 put_page(page);
>                         }
> @@ -311,6 +312,7 @@ again:
>  #else
>         page_head = compound_head(page);
>         if (page != page_head) {
> +               comp_idx = get_page_compound_index(page);
>                 get_page(page_head);
>                 put_page(page);
>         }
> @@ -363,7 +365,8 @@ again:
>                 key->private.mm = mm;
>                 key->private.address = address;
>         } else {
> -               key->both.offset |= FUT_OFF_INODE; /* inode-based key */
> +               key->both.offset |= (comp_idx << PAGE_SHIFT)
> +                                   | FUT_OFF_INODE; /* inode-based key */

Comments at the end of lines are bad form already, when moving to
multi-line, please move the comment just above the statements.

What is the max value of comp_idx? Are we at risk of truncating it?
Looks like not really from my initial look.

This also needs a comment in futex.h describing the usage of the offset
field in union futex_key as well as above get_futex_key describing the
key for shared mappings.


Thanks,

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Technical Lead - Linux Kernel

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-06-25 21:16 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-15 13:57 [PATCH] futex: bugfix for futex-key conflict when futex use hugepage Zhang Yi
2013-05-15 14:20 ` Mel Gorman
2013-05-16  1:16   ` zhang.yi20
2013-05-16  1:30     ` Darren Hart
2013-05-16  2:00       ` zhang.yi20
2013-06-24 21:02         ` Darren Hart
2013-06-25 13:19           ` Zhang Yi
2013-06-25 18:23             ` Darren Hart
2013-06-25 19:41               ` Thomas Gleixner
2013-06-25 21:15             ` [tip:core/locking] futex: Take hugepages into account when generating futex_key tip-bot for Zhang Yi
  -- strict thread matches above, loose matches on Subject: below --
2013-04-26 12:13 [PATCH] futex: bugfix for futex-key conflict when futex use hugepage Zhang Yi
2013-04-26 18:26 ` Thomas Gleixner
2013-05-07 12:23   ` Zhang Yi
2013-05-07 15:20     ` Mel Gorman
2013-05-10  9:08       ` zhang.yi20
2013-05-10  9:42         ` Mel Gorman
2013-04-16 18:37 Dave Hansen
2013-04-17  7:47 ` zhang.yi20
2013-04-16 17:57 Darren Hart
2013-04-17  9:55 ` zhang.yi20
2013-04-17 14:18   ` Darren Hart
2013-04-17 15:26     ` Dave Hansen
2013-04-17 15:51       ` Darren Hart
2013-04-18  8:05         ` zhang.yi20
2013-04-18 14:34           ` Darren Hart
2013-04-19  2:13             ` zhang.yi20
2013-04-19  2:45               ` Darren Hart
2013-04-19  7:03                 ` zhang.yi20

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).