* [RFC 1/2] mm: additional page lock debugging
@ 2013-12-29  1:45 ` Sasha Levin
  0 siblings, 0 replies; 22+ messages in thread
From: Sasha Levin @ 2013-12-29  1:45 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel, kirill, Sasha Levin

We've recently stumbled on several issues with the page lock which
triggered BUG_ON()s.

While working on them, it became clear that due to the complexity of
the locking it's pretty hard to figure out whether something is
supposed to be locked or not, and when we encountered a race it was
quite a pain narrowing it down.

This is an attempt at improving that situation. This patch adds
simple asserts to catch cases where someone tries to take the page
lock while it's already locked, and cases where someone unlocks the
lock without it being held.

My initial patch attempted to use lockdep to get further coverage,
but that attempt uncovered so many issues that it was impossible to
debug the lockdep integration without first clearing out a large
portion of the existing bugs.

This patch adds a new option since it will horribly break any system
booting with it due to the number of issues that it uncovers. This is
more of a "call for help" to other mm/ hackers to help clean it up.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 include/linux/pagemap.h | 11 +++++++++++
 lib/Kconfig.debug       |  9 +++++++++
 mm/filemap.c            |  4 +++-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 1710d1b..da24939 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -321,6 +321,14 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
 	return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);
 }
 
+#ifdef CONFIG_DEBUG_VM_PAGE_LOCKS
+#define VM_ASSERT_LOCKED(page) VM_BUG_ON_PAGE(!PageLocked(page), (page))
+#define VM_ASSERT_UNLOCKED(page) VM_BUG_ON_PAGE(PageLocked(page), (page))
+#else
+#define VM_ASSERT_LOCKED(page) do { } while (0)
+#define VM_ASSERT_UNLOCKED(page) do { } while (0)
+#endif
+
 extern void __lock_page(struct page *page);
 extern int __lock_page_killable(struct page *page);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
@@ -329,16 +337,19 @@ extern void unlock_page(struct page *page);
 
 static inline void __set_page_locked(struct page *page)
 {
+	VM_ASSERT_UNLOCKED(page);
 	__set_bit(PG_locked, &page->flags);
 }
 
 static inline void __clear_page_locked(struct page *page)
 {
+	VM_ASSERT_LOCKED(page);
 	__clear_bit(PG_locked, &page->flags);
 }
 
 static inline int trylock_page(struct page *page)
 {
+	VM_ASSERT_UNLOCKED(page);
 	return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
 }
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 48d32cd..ae4b60d 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -510,6 +510,15 @@ config DEBUG_VM_RB
 
 	  If unsure, say N.
 
+config DEBUG_VM_PAGE_LOCKS
+	bool "Debug VM page locking"
+	depends on DEBUG_VM
+	help
+	  Debug page locking by catching double locks and double unlocks. These
+	  checks may impact performance.
+
+	  If unsure, say N.
+
 config DEBUG_VIRTUAL
 	bool "Debug VM translations"
 	depends on DEBUG_KERNEL && X86
diff --git a/mm/filemap.c b/mm/filemap.c
index 7a7f3e0..665addc 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -607,7 +607,7 @@ EXPORT_SYMBOL_GPL(add_page_wait_queue);
  */
 void unlock_page(struct page *page)
 {
-	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_ASSERT_LOCKED(page);
 	clear_bit_unlock(PG_locked, &page->flags);
 	smp_mb__after_clear_bit();
 	wake_up_page(page, PG_locked);
@@ -639,6 +639,7 @@ void __lock_page(struct page *page)
 {
 	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
 
+	VM_ASSERT_UNLOCKED(page);
 	__wait_on_bit_lock(page_waitqueue(page), &wait, sleep_on_page,
 							TASK_UNINTERRUPTIBLE);
 }
@@ -648,6 +649,7 @@ int __lock_page_killable(struct page *page)
 {
 	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
 
+	VM_ASSERT_UNLOCKED(page);
 	return __wait_on_bit_lock(page_waitqueue(page), &wait,
 					sleep_on_page_killable, TASK_KILLABLE);
 }
-- 
1.8.3.2


* [RFC 2/2] mm: additional checks to page flag set/clear
  2013-12-29  1:45 ` Sasha Levin
@ 2013-12-29  1:45   ` Sasha Levin
  -1 siblings, 0 replies; 22+ messages in thread
From: Sasha Levin @ 2013-12-29  1:45 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel, kirill, Sasha Levin

Check if the flag is already set before setting it, and vice versa
for clearing.

Obviously setting or clearing a flag twice isn't a problem on its
own, but it implies that there's an issue where some piece of code
assumed the opposite state of the flag.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 include/linux/page-flags.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index d1fe1a7..36b0bef 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -130,6 +130,12 @@ enum pageflags {
 
 #ifndef __GENERATING_BOUNDS_H
 
+#ifdef CONFIG_DEBUG_VM_PAGE_FLAGS
+#define VM_ASSERT_FLAG(assert, page) VM_BUG_ON_PAGE(assert, page)
+#else
+#define VM_ASSERT_FLAG(assert, page) do { } while (0)
+#endif
+
 /*
  * Macros to create function definitions for page flags
  */
@@ -139,11 +145,13 @@ static inline int Page##uname(const struct page *page)			\
 
 #define SETPAGEFLAG(uname, lname)					\
 static inline void SetPage##uname(struct page *page)			\
-			{ set_bit(PG_##lname, &page->flags); }
+			{ VM_ASSERT_FLAG(Page##uname(page), page);	\
+			set_bit(PG_##lname, &page->flags); }
 
 #define CLEARPAGEFLAG(uname, lname)					\
 static inline void ClearPage##uname(struct page *page)			\
-			{ clear_bit(PG_##lname, &page->flags); }
+			{ VM_ASSERT_FLAG(!Page##uname(page), page);	\
+			clear_bit(PG_##lname, &page->flags); }
 
 #define __SETPAGEFLAG(uname, lname)					\
 static inline void __SetPage##uname(struct page *page)			\
-- 
1.8.3.2


* Re: [RFC 1/2] mm: additional page lock debugging
  2013-12-29  1:45 ` Sasha Levin
@ 2013-12-30 11:43   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2013-12-30 11:43 UTC (permalink / raw)
  To: Sasha Levin; +Cc: akpm, linux-mm, linux-kernel

On Sat, Dec 28, 2013 at 08:45:03PM -0500, Sasha Levin wrote:
> We've recently stumbled on several issues with the page lock which
> triggered BUG_ON()s.
> 
> While working on them, it was clear that due to the complexity of
> locking its pretty hard to figure out if something is supposed
> to be locked or not, and if we encountered a race it was quite a
> pain narrowing it down.
> 
> This is an attempt at solving this situation. This patch adds simple
> asserts to catch cases where someone is trying to lock the page lock
> while it's already locked, and cases to catch someone unlocking the
> lock without it being held.
> 
> My initial patch attempted to use lockdep to get further coverege,
> but that attempt uncovered the amount of issues triggered and made
> it impossible to debug the lockdep integration without clearing out
> a large portion of existing bugs.
> 
> This patch adds a new option since it will horribly break any system
> booting with it due to the amount of issues that it uncovers. This is
> more of a "call for help" to other mm/ hackers to help clean it up.
> 
> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
> ---
>  include/linux/pagemap.h | 11 +++++++++++
>  lib/Kconfig.debug       |  9 +++++++++
>  mm/filemap.c            |  4 +++-
>  3 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 1710d1b..da24939 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -321,6 +321,14 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
>  	return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);
>  }
>  
> +#ifdef CONFIG_DEBUG_VM_PAGE_LOCKS
> +#define VM_ASSERT_LOCKED(page) VM_BUG_ON_PAGE(!PageLocked(page), (page))
> +#define VM_ASSERT_UNLOCKED(page) VM_BUG_ON_PAGE(PageLocked(page), (page))
> +#else
> +#define VM_ASSERT_LOCKED(page) do { } while (0)
> +#define VM_ASSERT_UNLOCKED(page) do { } while (0)
> +#endif
> +
>  extern void __lock_page(struct page *page);
>  extern int __lock_page_killable(struct page *page);
>  extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
> @@ -329,16 +337,19 @@ extern void unlock_page(struct page *page);
>  
>  static inline void __set_page_locked(struct page *page)
>  {
> +	VM_ASSERT_UNLOCKED(page);
>  	__set_bit(PG_locked, &page->flags);
>  }
>  
>  static inline void __clear_page_locked(struct page *page)
>  {
> +	VM_ASSERT_LOCKED(page);
>  	__clear_bit(PG_locked, &page->flags);
>  }
>  
>  static inline int trylock_page(struct page *page)
>  {
> +	VM_ASSERT_UNLOCKED(page);

This is not correct. It's perfectly fine if the page is locked here:
that's why trylock is needed in the first place.

IIUC, what we want to catch is the case where the page has already
been locked by the same task.
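
Something like the sketch below would do it (purely hypothetical: it
assumes a debug-only ->lock_owner field added to struct page, which
doesn't exist):

#ifdef CONFIG_DEBUG_VM_PAGE_LOCKS
/*
 * Hypothetical: remember which task took the page lock so that a
 * re-lock attempt by the same task can be caught, while contention
 * from another task stays perfectly legal.
 */
static inline void page_lock_set_owner(struct page *page)
{
	page->lock_owner = current;
}

static inline void page_assert_not_held_by_me(struct page *page)
{
	VM_BUG_ON_PAGE(PageLocked(page) && page->lock_owner == current,
		       page);
}
#endif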

I don't think it's reasonable to re-implement this functionality. We
really need to hook up lockdep.

>  	return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
>  }
>  
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 48d32cd..ae4b60d 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -510,6 +510,15 @@ config DEBUG_VM_RB
>  
>  	  If unsure, say N.
>  
> +config DEBUG_VM_PAGE_LOCKS
> +	bool "Debug VM page locking"
> +	depends on DEBUG_VM
> +	help
> +	  Debug page locking by catching double locks and double frees. These
> +	  checks may impact performance.
> +
> +	  If unsure, say N.
> +
>  config DEBUG_VIRTUAL
>  	bool "Debug VM translations"
>  	depends on DEBUG_KERNEL && X86
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 7a7f3e0..665addc 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -607,7 +607,7 @@ EXPORT_SYMBOL_GPL(add_page_wait_queue);
>   */
>  void unlock_page(struct page *page)
>  {
> -	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +	VM_ASSERT_LOCKED(page);
>  	clear_bit_unlock(PG_locked, &page->flags);
>  	smp_mb__after_clear_bit();
>  	wake_up_page(page, PG_locked);
> @@ -639,6 +639,7 @@ void __lock_page(struct page *page)
>  {
>  	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
>  
> +	VM_ASSERT_UNLOCKED(page);

This is not correct either: __lock_page() is usually called when the
page is already locked and we have to sleep. See lock_page().

>  	__wait_on_bit_lock(page_waitqueue(page), &wait, sleep_on_page,
>  							TASK_UNINTERRUPTIBLE);
>  }
> @@ -648,6 +649,7 @@ int __lock_page_killable(struct page *page)
>  {
>  	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
>  
> +	VM_ASSERT_UNLOCKED(page);

The same here.

>  	return __wait_on_bit_lock(page_waitqueue(page), &wait,
>  					sleep_on_page_killable, TASK_KILLABLE);
>  }

-- 
 Kirill A. Shutemov

* Re: [RFC 2/2] mm: additional checks to page flag set/clear
  2013-12-29  1:45   ` Sasha Levin
@ 2013-12-30 12:12     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2013-12-30 12:12 UTC (permalink / raw)
  To: Sasha Levin; +Cc: akpm, linux-mm, linux-kernel

On Sat, Dec 28, 2013 at 08:45:04PM -0500, Sasha Levin wrote:
> Check if the flag is already set before setting it, and vice versa
> for clearing.
> 
> Obviously setting or clearing a flag twice isn't a problem on it's
> own, but it implies that there's an issue where some piece of code
> assumed an opposite state of the flag.

BUG() is overkill. WARN_ONCE is more than enough.
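
Roughly (a sketch only, keeping the macro name from your patch):

#ifdef CONFIG_DEBUG_VM_PAGE_FLAGS
/* Warn once on a double set/clear instead of killing the machine. */
#define VM_ASSERT_FLAG(cond, page) \
	WARN_ONCE(cond, "unexpected page flag state on page %p\n", (page))
#else
#define VM_ASSERT_FLAG(cond, page) do { } while (0)
#endif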

And I don't think this kind of check makes sense for all flags.

Have you seen any obviously broken case which these checks could catch?

> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
> ---
>  include/linux/page-flags.h | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index d1fe1a7..36b0bef 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -130,6 +130,12 @@ enum pageflags {
>  
>  #ifndef __GENERATING_BOUNDS_H
>  
> +#ifdef CONFIG_DEBUG_VM_PAGE_FLAGS
> +#define VM_ASSERT_FLAG(assert, page) VM_BUG_ON_PAGE(assert, page)
> +#else
> +#define VM_ASSERT_FLAG(assert, page) do { } while (0)
> +#endif
> +
>  /*
>   * Macros to create function definitions for page flags
>   */
> @@ -139,11 +145,13 @@ static inline int Page##uname(const struct page *page)			\
>  
>  #define SETPAGEFLAG(uname, lname)					\
>  static inline void SetPage##uname(struct page *page)			\
> -			{ set_bit(PG_##lname, &page->flags); }
> +			{ VM_ASSERT_FLAG(Page##uname(page), page);	\
> +			set_bit(PG_##lname, &page->flags); }
>  
>  #define CLEARPAGEFLAG(uname, lname)					\
>  static inline void ClearPage##uname(struct page *page)			\
> -			{ clear_bit(PG_##lname, &page->flags); }
> +			{ VM_ASSERT_FLAG(!Page##uname(page), page);	\
> +			clear_bit(PG_##lname, &page->flags); }
>  
>  #define __SETPAGEFLAG(uname, lname)					\
>  static inline void __SetPage##uname(struct page *page)			\
> -- 
> 1.8.3.2

-- 
 Kirill A. Shutemov

* Re: [RFC 1/2] mm: additional page lock debugging
  2013-12-30 11:43   ` Kirill A. Shutemov
@ 2013-12-30 16:33     ` Sasha Levin
  -1 siblings, 0 replies; 22+ messages in thread
From: Sasha Levin @ 2013-12-30 16:33 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: akpm, linux-mm, linux-kernel

On 12/30/2013 06:43 AM, Kirill A. Shutemov wrote:
> On Sat, Dec 28, 2013 at 08:45:03PM -0500, Sasha Levin wrote:
>> We've recently stumbled on several issues with the page lock which
>> triggered BUG_ON()s.
>>
>> While working on them, it was clear that due to the complexity of
>> locking its pretty hard to figure out if something is supposed
>> to be locked or not, and if we encountered a race it was quite a
>> pain narrowing it down.
>>
>> This is an attempt at solving this situation. This patch adds simple
>> asserts to catch cases where someone is trying to lock the page lock
>> while it's already locked, and cases to catch someone unlocking the
>> lock without it being held.
>>
>> My initial patch attempted to use lockdep to get further coverege,
>> but that attempt uncovered the amount of issues triggered and made
>> it impossible to debug the lockdep integration without clearing out
>> a large portion of existing bugs.
>>
>> This patch adds a new option since it will horribly break any system
>> booting with it due to the amount of issues that it uncovers. This is
>> more of a "call for help" to other mm/ hackers to help clean it up.
>>
>> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
>> ---
>>   include/linux/pagemap.h | 11 +++++++++++
>>   lib/Kconfig.debug       |  9 +++++++++
>>   mm/filemap.c            |  4 +++-
>>   3 files changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>> index 1710d1b..da24939 100644
>> --- a/include/linux/pagemap.h
>> +++ b/include/linux/pagemap.h
>> @@ -321,6 +321,14 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
>>   	return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);
>>   }
>>
>> +#ifdef CONFIG_DEBUG_VM_PAGE_LOCKS
>> +#define VM_ASSERT_LOCKED(page) VM_BUG_ON_PAGE(!PageLocked(page), (page))
>> +#define VM_ASSERT_UNLOCKED(page) VM_BUG_ON_PAGE(PageLocked(page), (page))
>> +#else
>> +#define VM_ASSERT_LOCKED(page) do { } while (0)
>> +#define VM_ASSERT_UNLOCKED(page) do { } while (0)
>> +#endif
>> +
>>   extern void __lock_page(struct page *page);
>>   extern int __lock_page_killable(struct page *page);
>>   extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
>> @@ -329,16 +337,19 @@ extern void unlock_page(struct page *page);
>>
>>   static inline void __set_page_locked(struct page *page)
>>   {
>> +	VM_ASSERT_UNLOCKED(page);
>>   	__set_bit(PG_locked, &page->flags);
>>   }
>>
>>   static inline void __clear_page_locked(struct page *page)
>>   {
>> +	VM_ASSERT_LOCKED(page);
>>   	__clear_bit(PG_locked, &page->flags);
>>   }
>>
>>   static inline int trylock_page(struct page *page)
>>   {
>> +	VM_ASSERT_UNLOCKED(page);
>
> This is not correct. It's perfectly fine if the page is locked here: it's
> why trylock needed.
>
> IIUC, what we want to catch is the case when the page has already locked
> by the task.

Frankly, we shouldn't have trylock_page() at all.

Look at page_referenced() for example. Instead of assuming that it has to be
called with the page lock held, it tries to acquire the lock itself and
releases it only if it was the one that took it.

Why isn't there a VM_BUG_ON() there to test whether the page is locked, and let
the callers handle that?
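
The pattern I'm objecting to looks roughly like this (paraphrased from
memory, not the literal page_referenced() code):

	int we_locked = 0;

	if (!PageLocked(page)) {
		we_locked = trylock_page(page);
		if (!we_locked)
			return 1;	/* couldn't lock it, give up */
	}
	/* ... do the real work on the locked page ... */
	if (we_locked)
		unlock_page(page);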

>
> I don't think it's reasonable to re-implement this functionality. We
> really need to hook up lockdep.

The issue with adding lockdep straight away is that we need to deal with
long held page locks somehow nicely. Unlike regular locks, these may be
held for a rather long while, triggering really long locking chains which
lockdep doesn't really like.

Many places lock a long list of pages in bulk - we could allow that with
nesting, but then you lose your ability to detect trivial deadlocks.


Thanks,
Sasha

* Re: [RFC 1/2] mm: additional page lock debugging
  2013-12-30 16:33     ` Sasha Levin
@ 2013-12-30 22:48       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2013-12-30 22:48 UTC (permalink / raw)
  To: Sasha Levin; +Cc: akpm, linux-mm, linux-kernel

On Mon, Dec 30, 2013 at 11:33:47AM -0500, Sasha Levin wrote:
> On 12/30/2013 06:43 AM, Kirill A. Shutemov wrote:
> >On Sat, Dec 28, 2013 at 08:45:03PM -0500, Sasha Levin wrote:
> >>We've recently stumbled on several issues with the page lock which
> >>triggered BUG_ON()s.
> >>
> >>While working on them, it was clear that due to the complexity of
> >>locking its pretty hard to figure out if something is supposed
> >>to be locked or not, and if we encountered a race it was quite a
> >>pain narrowing it down.
> >>
> >>This is an attempt at solving this situation. This patch adds simple
> >>asserts to catch cases where someone is trying to lock the page lock
> >>while it's already locked, and cases to catch someone unlocking the
> >>lock without it being held.
> >>
> >>My initial patch attempted to use lockdep to get further coverege,
> >>but that attempt uncovered the amount of issues triggered and made
> >>it impossible to debug the lockdep integration without clearing out
> >>a large portion of existing bugs.
> >>
> >>This patch adds a new option since it will horribly break any system
> >>booting with it due to the amount of issues that it uncovers. This is
> >>more of a "call for help" to other mm/ hackers to help clean it up.
> >>
> >>Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
> >>---
> >>  include/linux/pagemap.h | 11 +++++++++++
> >>  lib/Kconfig.debug       |  9 +++++++++
> >>  mm/filemap.c            |  4 +++-
> >>  3 files changed, 23 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> >>index 1710d1b..da24939 100644
> >>--- a/include/linux/pagemap.h
> >>+++ b/include/linux/pagemap.h
> >>@@ -321,6 +321,14 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
> >>  	return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> >>  }
> >>
> >>+#ifdef CONFIG_DEBUG_VM_PAGE_LOCKS
> >>+#define VM_ASSERT_LOCKED(page) VM_BUG_ON_PAGE(!PageLocked(page), (page))
> >>+#define VM_ASSERT_UNLOCKED(page) VM_BUG_ON_PAGE(PageLocked(page), (page))
> >>+#else
> >>+#define VM_ASSERT_LOCKED(page) do { } while (0)
> >>+#define VM_ASSERT_UNLOCKED(page) do { } while (0)
> >>+#endif
> >>+
> >>  extern void __lock_page(struct page *page);
> >>  extern int __lock_page_killable(struct page *page);
> >>  extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
> >>@@ -329,16 +337,19 @@ extern void unlock_page(struct page *page);
> >>
> >>  static inline void __set_page_locked(struct page *page)
> >>  {
> >>+	VM_ASSERT_UNLOCKED(page);
> >>  	__set_bit(PG_locked, &page->flags);
> >>  }
> >>
> >>  static inline void __clear_page_locked(struct page *page)
> >>  {
> >>+	VM_ASSERT_LOCKED(page);
> >>  	__clear_bit(PG_locked, &page->flags);
> >>  }
> >>
> >>  static inline int trylock_page(struct page *page)
> >>  {
> >>+	VM_ASSERT_UNLOCKED(page);
> >
> >This is not correct. It's perfectly fine if the page is locked here: it's
> >why trylock needed.
> >
> >IIUC, what we want to catch is the case when the page has already locked
> >by the task.
> 
> Frankly, we shouldn't have trylock_page() at all.

It has valid use cases: if you don't want to sleep on the lock, just give
up right away. Like grab_cache_page_nowait().

> Look at page_referenced() for example. Instead of assuming that it has to be
> called with page lock held, it's trying to acquire the lock and to free it only
> if it's the one that allocated it.

I agree here. page_referenced() looks bad.

> Why isn't there a VM_BUG_ON() there to test whether the page is locked, and let
> the callers handle that?

At the moment trylock_page() is part of lock_page(), so you'll hit it all
the time: on any contention.
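
For reference, it's just (quoting from memory):

static inline void lock_page(struct page *page)
{
	might_sleep();
	if (!trylock_page(page))
		__lock_page(page);
}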

> >I don't think it's reasonable to re-implement this functionality. We
> >really need to hook up lockdep.
> 
> The issue with adding lockdep straight away is that we need to deal with
> long held page locks somehow nicely. Unlike regular locks, these may be
> held for a rather long while, triggering really long locking chains which
> lockdep doesn't really like.
> 
> Many places lock a long list of pages in bulk - we could allow that with
> nesting, but then you lose your ability to detect trivial deadlocks.

I see. But we need something better than a plain VM_BUG() to be able to
analyze what happened.

-- 
 Kirill A. Shutemov

* Re: [RFC 1/2] mm: additional page lock debugging
  2013-12-30 22:48       ` Kirill A. Shutemov
@ 2013-12-31  3:22         ` Sasha Levin
  -1 siblings, 0 replies; 22+ messages in thread
From: Sasha Levin @ 2013-12-31  3:22 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: akpm, linux-mm, LKML, Ingo Molnar, Peter Zijlstra

On 12/30/2013 05:48 PM, Kirill A. Shutemov wrote:
> On Mon, Dec 30, 2013 at 11:33:47AM -0500, Sasha Levin wrote:
>> On 12/30/2013 06:43 AM, Kirill A. Shutemov wrote:
>>> On Sat, Dec 28, 2013 at 08:45:03PM -0500, Sasha Levin wrote:
>>>> We've recently stumbled on several issues with the page lock which
>>>> triggered BUG_ON()s.
>>>>
>>>> While working on them, it was clear that due to the complexity of
>>>> locking its pretty hard to figure out if something is supposed
>>>> to be locked or not, and if we encountered a race it was quite a
>>>> pain narrowing it down.
>>>>
>>>> This is an attempt at solving this situation. This patch adds simple
>>>> asserts to catch cases where someone is trying to lock the page lock
>>>> while it's already locked, and cases to catch someone unlocking the
>>>> lock without it being held.
>>>>
>>>> My initial patch attempted to use lockdep to get further coverege,
>>>> but that attempt uncovered the amount of issues triggered and made
>>>> it impossible to debug the lockdep integration without clearing out
>>>> a large portion of existing bugs.
>>>>
>>>> This patch adds a new option since it will horribly break any system
>>>> booting with it due to the amount of issues that it uncovers. This is
>>>> more of a "call for help" to other mm/ hackers to help clean it up.
>>>>
>>>> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
>>>> ---
>>>>   include/linux/pagemap.h | 11 +++++++++++
>>>>   lib/Kconfig.debug       |  9 +++++++++
>>>>   mm/filemap.c            |  4 +++-
>>>>   3 files changed, 23 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>>>> index 1710d1b..da24939 100644
>>>> --- a/include/linux/pagemap.h
>>>> +++ b/include/linux/pagemap.h
>>>> @@ -321,6 +321,14 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
>>>>   	return pgoff >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);
>>>>   }
>>>>
>>>> +#ifdef CONFIG_DEBUG_VM_PAGE_LOCKS
>>>> +#define VM_ASSERT_LOCKED(page) VM_BUG_ON_PAGE(!PageLocked(page), (page))
>>>> +#define VM_ASSERT_UNLOCKED(page) VM_BUG_ON_PAGE(PageLocked(page), (page))
>>>> +#else
>>>> +#define VM_ASSERT_LOCKED(page) do { } while (0)
>>>> +#define VM_ASSERT_UNLOCKED(page) do { } while (0)
>>>> +#endif
>>>> +
>>>>   extern void __lock_page(struct page *page);
>>>>   extern int __lock_page_killable(struct page *page);
>>>>   extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
>>>> @@ -329,16 +337,19 @@ extern void unlock_page(struct page *page);
>>>>
>>>>   static inline void __set_page_locked(struct page *page)
>>>>   {
>>>> +	VM_ASSERT_UNLOCKED(page);
>>>>   	__set_bit(PG_locked, &page->flags);
>>>>   }
>>>>
>>>>   static inline void __clear_page_locked(struct page *page)
>>>>   {
>>>> +	VM_ASSERT_LOCKED(page);
>>>>   	__clear_bit(PG_locked, &page->flags);
>>>>   }
>>>>
>>>>   static inline int trylock_page(struct page *page)
>>>>   {
>>>> +	VM_ASSERT_UNLOCKED(page);
>>>
>>> This is not correct. It's perfectly fine if the page is locked here: it's
>>> why trylock needed.
>>>
>>> IIUC, what we want to catch is the case when the page has already locked
>>> by the task.
>>
>> Frankly, we shouldn't have trylock_page() at all.
>
> It has valid use cases: if you don't want to sleep on lock, just give up
> right away. Like grab_cache_page_nowait().

Agreed. In general we should be checking whether the same process is the one
already holding the lock. I'll address that in the next version.

[snip]

>> Many places lock a long list of pages in bulk - we could allow that with
>> nesting, but then you lose your ability to detect trivial deadlocks.
>
> I see. But we need something better then plain VM_BUG() to be able to
> analyze what happened.

I really want to use lockdep here, but I'm not really sure how to handle locks which live
for a rather long while instead of being locked and unlocked in the same function like
most of the rest of the kernel. (Cc Ingo, PeterZ).
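
For the record, the rough shape I had in mind was a single lockdep map
shared by all page locks, something like this sketch (real integration
clearly needs more than that, e.g. for locks held across a return to
userspace):

#ifdef CONFIG_DEBUG_VM_PAGE_LOCKS
static struct lock_class_key page_lock_key;
static struct lockdep_map page_lock_map =
	STATIC_LOCKDEP_MAP_INIT("page_lock", &page_lock_key);

/* Hooked into lock_page()/trylock_page() on successful acquire. */
static inline void page_lock_lockdep_acquire(struct page *page)
{
	lock_map_acquire(&page_lock_map);
}

/* Hooked into unlock_page(). */
static inline void page_lock_lockdep_release(struct page *page)
{
	lock_map_release(&page_lock_map);
}
#endif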

Thanks,
Sasha


* Re: [RFC 1/2] mm: additional page lock debugging
  2013-12-31  3:22         ` Sasha Levin
@ 2013-12-31 16:26           ` Peter Zijlstra
  -1 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2013-12-31 16:26 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Kirill A. Shutemov, akpm, linux-mm, LKML, Ingo Molnar

On Mon, Dec 30, 2013 at 10:22:02PM -0500, Sasha Levin wrote:

> I really want to use lockdep here, but I'm not really sure how to handle locks which live
> for a rather long while instead of being locked and unlocked in the same function like
> most of the rest of the kernel. (Cc Ingo, PeterZ).

Uh what? Lockdep doesn't care about which function locks and unlocks a
particular lock. Nor does it care how long it's held for.

* Re: [RFC 1/2] mm: additional page lock debugging
  2013-12-31 16:26           ` Peter Zijlstra
@ 2013-12-31 16:42             ` Sasha Levin
  0 siblings, 0 replies; 22+ messages in thread
From: Sasha Levin @ 2013-12-31 16:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Kirill A. Shutemov, akpm, linux-mm, LKML, Ingo Molnar

On 12/31/2013 11:26 AM, Peter Zijlstra wrote:
> On Mon, Dec 30, 2013 at 10:22:02PM -0500, Sasha Levin wrote:
>
>> I really want to use lockdep here, but I'm not really sure how to handle locks which live
>> for a rather long while instead of being locked and unlocked in the same function like
>> most of the rest of the kernel. (Cc Ingo, PeterZ).
>
> Uh what? Lockdep doesn't care about which function locks and unlocks a
> particular lock. Nor does it care how long it's held for.

Sorry, I messed up trying to explain that.

There are several places in the code which lock a large number of pages, something like:

	for (i = 0; i < (1 << order); i++)
		lock_page(&pages[i]);


This triggers two problems:

  - lockdep complains about deadlock since we try to lock another page while one is already
locked. I can silence that by allowing page locks to nest within each other, but that seems
wrong and we'd miss actual deadlock cases.

  - We may return to userspace with pages still locked. This is valid behaviour, but lockdep
doesn't like it.


Thanks,
Sasha


* Re: [RFC 1/2] mm: additional page lock debugging
  2013-12-31 16:42             ` Sasha Levin
@ 2014-01-06 10:04               ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2014-01-06 10:04 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Kirill A. Shutemov, akpm, linux-mm, LKML, Ingo Molnar

On Tue, Dec 31, 2013 at 11:42:04AM -0500, Sasha Levin wrote:
> On 12/31/2013 11:26 AM, Peter Zijlstra wrote:
> >On Mon, Dec 30, 2013 at 10:22:02PM -0500, Sasha Levin wrote:
> >
> >>I really want to use lockdep here, but I'm not really sure how to handle locks which live
> >>for a rather long while instead of being locked and unlocked in the same function like
> >>most of the rest of the kernel. (Cc Ingo, PeterZ).
> >
> >Uh what? Lockdep doesn't care about which function locks and unlocks a
> >particular lock. Nor does it care how long it's held for.
> 
> Sorry, I messed up trying to explain that.
> 
> There are several places in the code which lock a large number of pages, something like:
> 
> 	for (i = 0; i < (1 << order); i++)
> 		lock_page(&pages[i]);
> 
> 
> This triggers two problems:
> 
>  - lockdep complains about deadlock since we try to lock another page while one is already
> locked. I can silence that by allowing page locks to nest within each other, but that seems
> wrong and we'd miss actual deadlock cases.

Right... I think we can cobble something together, like requiring that we
always lock pages in pfn order or somesuch.
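
Caller-side, that could look something like this (a rough sketch;
page_to_pfn() and lib/sort are real, lock_pages_sorted() and cmp_page_pfn()
are made up here for illustration):

	#include <linux/mm.h>
	#include <linux/sort.h>

	static int cmp_page_pfn(const void *a, const void *b)
	{
		unsigned long pa = page_to_pfn(*(struct page * const *)a);
		unsigned long pb = page_to_pfn(*(struct page * const *)b);

		if (pa != pb)
			return pa < pb ? -1 : 1;
		return 0;
	}

	/* Sort the batch by pfn so every caller locks in one global order. */
	static void lock_pages_sorted(struct page **pages, int nr)
	{
		int i;

		sort(pages, nr, sizeof(*pages), cmp_page_pfn, NULL);
		for (i = 0; i < nr; i++)
			lock_page(pages[i]);
	}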

>  - We may return to userspace with pages still locked. This is valid behaviour, but lockdep
> doesn't like it.

Where do we actually do this? BTW it's not only lockdep not liking that;
Linus was actually a big fan of that check.

ISTR there being some filesystem freezer issues with that too, where the
freeze ioctl would return to userspace with 'locks' held and that's been
cobbled around (or maybe gone by now -- who knows).

My initial guess would be that this is AIO/DIO again; those two seem to
be responsible for the majority of the ugliness around there.

* Re: [RFC 1/2] mm: additional page lock debugging
  2014-01-06 10:04               ` Peter Zijlstra
@ 2014-02-10 23:19                 ` Sasha Levin
  0 siblings, 0 replies; 22+ messages in thread
From: Sasha Levin @ 2014-02-10 23:19 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Kirill A. Shutemov, akpm, linux-mm, LKML, Ingo Molnar

>> This triggers two problems:
>>
>>   - lockdep complains about deadlock since we try to lock another page while one is already
>> locked. I can silence that by allowing page locks to nest within each other, but that seems
>> wrong and we'd miss actual deadlock cases.
>
> Right,.. I think we can cobble something together like requiring we
> always lock pages in pfn order or somesuch.
>

Sorry, I went ahead and dug into mm/lockdep based on your comments and only noticed I had
forgotten to reply to this mail.

Getting them to lock in pfn order seems to be a bit of a mess, since we'd need to keep the free
lists sorted. I didn't find a nice way of doing that without an insertion sort, which slows
everything down.

>>   - We may return to userspace with pages still locked. This is valid behaviour, but lockdep
>> doesn't like it.
>
> Where do we actually do this? BTW its not only lockdep not liking that,
> Linus was actually a big fan of that check.
>
> ISTR there being some filesystem freezer issues with that too, where the
> freeze ioctl would return to userspace with 'locks' held and that's
> cobbled around (or maybe gone by now -- who knows).
>
> My initial guess would be that this is AIO/DIO again, those two seem to
> be responsible for the majority of ugly around there.

Indeed, the block layer has multiple "violations". In the AIO case, we lock pages in one task
and return to userspace, and those pages get unlocked by a completion thread that runs at
some point later.

Right now I've given up on getting lockdep fully integrated and am instead trying to fix as
many of these issues as possible by detecting the trivial cases. I feel that wiring lockdep
in at this point is far more complex than what we actually need; we don't need lockdep to
detect trivial cases of double locking on the very same lock...

Once we've gotten rid of everything we can easily spot, lockdep can move in to detect anything
more complex.
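
Concretely, the trivial check could live in lock_page() itself -- again only
a sketch, reusing the made-up owner helpers from earlier in the thread; only
the VM_ASSERT_NOT_OWNER()/page_lock_set_owner() lines differ from the real
inline:

	static inline void lock_page(struct page *page)
	{
		might_sleep();
		/* Same task taking a lock it already holds? */
		VM_ASSERT_NOT_OWNER(page);
		if (!trylock_page(page))
			__lock_page(page);
		page_lock_set_owner(page);
	}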

Thanks,
Sasha

