linux-kernel.vger.kernel.org archive mirror
* [RFC 0/2] Potential race condition with page lock
@ 2019-02-11 12:53 Chintan Pandya
  2019-02-11 12:53 ` [RFC 1/2] page-flags: Make page lock operation atomic Chintan Pandya
  2019-02-11 12:53 ` [RFC 2/2] page-flags: Catch the double setter of page flags Chintan Pandya
  0 siblings, 2 replies; 11+ messages in thread
From: Chintan Pandya @ 2019-02-11 12:53 UTC (permalink / raw)
  To: Linux Upstream, hughd, peterz, jack, mawilcox, akpm
  Cc: linux-kernel, linux-mm, Chintan Pandya

On a 4.14 kernel, we observed the following two BUG_ON(!PageLocked(page))
scenarios. Both look to have a similar cause.

Case: 1
[127823.176076] try_to_free_buffers+0xfc/0x108 (BUG_ON(), page lock was cleared
                                               somehow)
[127823.176079] jbd2_journal_try_to_free_buffers+0x15c/0x194 (page lock was
                                              available till this function)
[127823.176083] ext4_releasepage+0xe0/0x110 
[127823.176087] try_to_release_page+0x68/0x90 (page lock was available till
                                              this function)
[127823.176090] invalidate_inode_page+0x94/0xa8
[127823.176093] invalidate_mapping_pages_without_uidlru+0xec/0x1a4 (page lock
                                              taken here)
...
...

Case: 2
[<ffffff9547a82fb0>] el1_dbg+0x18
[<ffffff9547bfb544>] __remove_mapping+0x160  (BUG_ON(), page lock is not
                                             available. Someone might have
                                             released it.)
[<ffffff9547bfb3c8>] remove_mapping+0x28
[<ffffff9547bf8404>] invalidate_inode_page+0xa4
[<ffffff9547bf8bcc>] invalidate_mapping_pages+0xd4  (acquired the page lock)
[<ffffff9547c7f934>] inode_lru_isolate+0x128
[<ffffff9547c1b500>] __list_lru_walk_one+0x10c
[<ffffff9547c1b3e0>] list_lru_walk_one+0x58
[<ffffff9547c7f7d4>] prune_icache_sb+0x50
[<ffffff9547c64fbc>] super_cache_scan+0xfc
[<ffffff9547bfb17c>] shrink_slab+0x304
[<ffffff9547bffb38>] shrink_node+0x254
[<ffffff9547bfd4fc>] do_try_to_free_pages+0x144
[<ffffff9547bfd2d8>] try_to_free_pages+0x390
[<ffffff9547bebb80>] __alloc_pages_nodemask+0x940
[<ffffff9547becedc>] __get_free_pages+0x28
[<ffffff9547cd6870>] proc_pid_readlink+0x6c
[<ffffff9547c7075c>] vfs_readlink+0x124
[<ffffff9547c66374>] SyS_readlinkat+0xc8
[<ffffff9547a83818>] __sys_trace_return+0x0

Both scenarios suggest that the current stack had taken the page lock, but it
was released in the meantime by someone else. There are two possibilities here.

1) Someone updating the page flags non-atomically raced with us, so the
   PG_locked bit got cleared unintentionally.
2) Someone else took the lock without checking whether the page was already
   locked, since there are explicit APIs that set PG_locked directly.

I could not find the history of why PG_locked is set non-atomically in these
paths. I believe it is for performance reasons, though I am not sure.
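
To illustrate possibility 1, the suspected kind of lost update looks roughly
like this (an illustrative sketch, not taken from the actual disassembly;
PG_referenced is just an example bit):

    CPU0 (takes the page lock)          CPU1 (non-atomic flag update)
    --------------------------          -----------------------------
                                        old = page->flags;
    lock_page(page);
      /* PG_locked set atomically */
                                        old |= 1UL << PG_referenced;
                                        page->flags = old;
                                        /* plain store writes back the stale
                                           value read above, clearing
                                           PG_locked */
    ...
    BUG_ON(!PageLocked(page));          /* fires */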

Chintan Pandya (2):
  page-flags: Make page lock operation atomic
  page-flags: Catch the double setter of page flags

 fs/cifs/file.c             | 8 ++++----
 fs/pipe.c                  | 2 +-
 include/linux/page-flags.h | 4 ++--
 include/linux/pagemap.h    | 6 +++---
 mm/filemap.c               | 4 ++--
 mm/khugepaged.c            | 2 +-
 mm/ksm.c                   | 2 +-
 mm/memory-failure.c        | 2 +-
 mm/memory.c                | 2 +-
 mm/migrate.c               | 2 +-
 mm/shmem.c                 | 6 +++---
 mm/swap_state.c            | 4 ++--
 mm/vmscan.c                | 2 +-
 13 files changed, 23 insertions(+), 23 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC 1/2] page-flags: Make page lock operation atomic
  2019-02-11 12:53 [RFC 0/2] Potential race condition with page lock Chintan Pandya
@ 2019-02-11 12:53 ` Chintan Pandya
  2019-02-11 13:46   ` Peter Zijlstra
  2019-02-11 12:53 ` [RFC 2/2] page-flags: Catch the double setter of page flags Chintan Pandya
  1 sibling, 1 reply; 11+ messages in thread
From: Chintan Pandya @ 2019-02-11 12:53 UTC (permalink / raw)
  To: Linux Upstream, hughd, peterz, jack, mawilcox, akpm
  Cc: linux-kernel, linux-mm, Chintan Pandya

Currently, the page lock operation is non-atomic. This leaves
some scope for a race condition: if two threads access the
same page's flags concurrently, our thread's page lock bit
(PG_locked) can be overwritten by the other thread, leaving
the page unlocked. This can cause issues later when some
code expects the page to be locked but it is not.

Make the page lock/unlock operations use the atomic version of
the set_bit API. There are other flag set operations which still
use the non-atomic version of set_bit; those might be changed
in the future.
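
Roughly, ignoring the PF_NO_TAIL/compound-head handling, the generated
helper changes from the non-atomic to the atomic bitop (sketch):

	/* Before the patch, __PAGEFLAG(Locked, locked, ...) only provides: */
	static __always_inline void __SetPageLocked(struct page *page)
	{
		__set_bit(PG_locked, &page->flags);	/* plain, non-atomic RMW */
	}

	/* After the patch, PAGEFLAG(Locked, locked, ...) provides: */
	static __always_inline void SetPageLocked(struct page *page)
	{
		set_bit(PG_locked, &page->flags);	/* atomic RMW, e.g. a
							   LOCK-prefixed insn on x86 */
	}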

Change-Id: I13bdbedc2b198af014d885e1925c93b83ed6660e
Signed-off-by: Chintan Pandya <chintan.pandya@oneplus.com>
---
 fs/cifs/file.c             | 8 ++++----
 fs/pipe.c                  | 2 +-
 include/linux/page-flags.h | 2 +-
 include/linux/pagemap.h    | 6 +++---
 mm/filemap.c               | 4 ++--
 mm/khugepaged.c            | 2 +-
 mm/ksm.c                   | 2 +-
 mm/memory-failure.c        | 2 +-
 mm/memory.c                | 2 +-
 mm/migrate.c               | 2 +-
 mm/shmem.c                 | 6 +++---
 mm/swap_state.c            | 4 ++--
 mm/vmscan.c                | 2 +-
 13 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 7d6539a04fac..23bcdee37239 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3661,13 +3661,13 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
 	 * should have access to this page, we're safe to simply set
 	 * PG_locked without checking it first.
 	 */
-	__SetPageLocked(page);
+	SetPageLocked(page);
 	rc = add_to_page_cache_locked(page, mapping,
 				      page->index, gfp);
 
 	/* give up if we can't stick it in the cache */
 	if (rc) {
-		__ClearPageLocked(page);
+		ClearPageLocked(page);
 		return rc;
 	}
 
@@ -3688,9 +3688,9 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
 		if (*bytes + PAGE_SIZE > rsize)
 			break;
 
-		__SetPageLocked(page);
+		SetPageLocked(page);
 		if (add_to_page_cache_locked(page, mapping, page->index, gfp)) {
-			__ClearPageLocked(page);
+			ClearPageLocked(page);
 			break;
 		}
 		list_move_tail(&page->lru, tmplist);
diff --git a/fs/pipe.c b/fs/pipe.c
index 8ef7d7bef775..1bab40a2ca44 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -147,7 +147,7 @@ static int anon_pipe_buf_steal(struct pipe_inode_info *pipe,
 	if (page_count(page) == 1) {
 		if (memcg_kmem_enabled())
 			memcg_kmem_uncharge(page, 0);
-		__SetPageLocked(page);
+		SetPageLocked(page);
 		return 0;
 	}
 	return 1;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 5af67406b9c9..a56a9bd4bc6b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -268,7 +268,7 @@ static inline int TestClearPage##uname(struct page *page) { return 0; }
 #define TESTSCFLAG_FALSE(uname)						\
 	TESTSETFLAG_FALSE(uname) TESTCLEARFLAG_FALSE(uname)
 
-__PAGEFLAG(Locked, locked, PF_NO_TAIL)
+PAGEFLAG(Locked, locked, PF_NO_TAIL)
 PAGEFLAG(Waiters, waiters, PF_ONLY_HEAD) __CLEARPAGEFLAG(Waiters, waiters, PF_ONLY_HEAD)
 PAGEFLAG(Error, error, PF_NO_COMPOUND) TESTCLEARFLAG(Error, error, PF_NO_COMPOUND)
 PAGEFLAG(Referenced, referenced, PF_HEAD)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 51a9a0af3281..87a0447cfbe0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -619,17 +619,17 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
 
 /*
  * Like add_to_page_cache_locked, but used to add newly allocated pages:
- * the page is new, so we can just run __SetPageLocked() against it.
+ * the page is new, so we can just run SetPageLocked() against it.
  */
 static inline int add_to_page_cache(struct page *page,
 		struct address_space *mapping, pgoff_t offset, gfp_t gfp_mask)
 {
 	int error;
 
-	__SetPageLocked(page);
+	SetPageLocked(page);
 	error = add_to_page_cache_locked(page, mapping, offset, gfp_mask);
 	if (unlikely(error))
-		__ClearPageLocked(page);
+		ClearPageLocked(page);
 	return error;
 }
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 8e09304af1ec..14284726cf3a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -807,11 +807,11 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 	void *shadow = NULL;
 	int ret;
 
-	__SetPageLocked(page);
+	SetPageLocked(page);
 	ret = __add_to_page_cache_locked(page, mapping, offset,
 					 gfp_mask, &shadow);
 	if (unlikely(ret))
-		__ClearPageLocked(page);
+		ClearPageLocked(page);
 	else {
 		/*
 		 * The page might have been evicted from cache only
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index aaae33402d61..2e8f5bfa066d 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1341,7 +1341,7 @@ static void collapse_shmem(struct mm_struct *mm,
 	new_page->index = start;
 	new_page->mapping = mapping;
 	__SetPageSwapBacked(new_page);
-	__SetPageLocked(new_page);
+	SetPageLocked(new_page);
 	BUG_ON(!page_ref_freeze(new_page, 1));
 
 
diff --git a/mm/ksm.c b/mm/ksm.c
index 31e6420c209b..115091798a6d 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2531,7 +2531,7 @@ struct page *ksm_might_need_to_copy(struct page *page,
 
 		SetPageDirty(new_page);
 		__SetPageUptodate(new_page);
-		__SetPageLocked(new_page);
+		SetPageLocked(new_page);
 	}
 
 	return new_page;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 42d8fa64cebc..1a7c31b7d7e3 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1209,7 +1209,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 	/*
 	 * We ignore non-LRU pages for good reasons.
 	 * - PG_locked is only well defined for LRU pages and a few others
-	 * - to avoid races with __SetPageLocked()
+	 * - to avoid races with SetPageLocked()
 	 * - to avoid races with __SetPageSlab*() (and more non-atomic ops)
 	 * The check (unnecessarily) ignores LRU pages being isolated and
 	 * walked by the page reclaim code, however that's not a big loss.
diff --git a/mm/memory.c b/mm/memory.c
index 8b9e5dd20d0c..9d7b107025e7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3102,7 +3102,7 @@ int do_swap_page(struct vm_fault *vmf)
 			page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma,
 							vmf->address);
 			if (page) {
-				__SetPageLocked(page);
+				SetPageLocked(page);
 				__SetPageSwapBacked(page);
 				set_page_private(page, entry.val);
 				lru_cache_add_anon(page);
diff --git a/mm/migrate.c b/mm/migrate.c
index 12d821ff8401..1b9ed5ca5e8e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2037,7 +2037,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	}
 
 	/* Prepare a page as a migration target */
-	__SetPageLocked(new_page);
+	SetPageLocked(new_page);
 	if (PageSwapBacked(page))
 		__SetPageSwapBacked(new_page);
 
diff --git a/mm/shmem.c b/mm/shmem.c
index 8c8af1440184..3305312c7557 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1501,7 +1501,7 @@ static struct page *shmem_alloc_and_acct_page(gfp_t gfp,
 	else
 		page = shmem_alloc_page(gfp, info, index);
 	if (page) {
-		__SetPageLocked(page);
+		SetPageLocked(page);
 		__SetPageSwapBacked(page);
 		return page;
 	}
@@ -1554,7 +1554,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
 	copy_highpage(newpage, oldpage);
 	flush_dcache_page(newpage);
 
-	__SetPageLocked(newpage);
+	SetPageLocked(newpage);
 	__SetPageSwapBacked(newpage);
 	SetPageUptodate(newpage);
 	set_page_private(newpage, swap_index);
@@ -2277,7 +2277,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 	}
 
 	VM_BUG_ON(PageLocked(page) || PageSwapBacked(page));
-	__SetPageLocked(page);
+	SetPageLocked(page);
 	__SetPageSwapBacked(page);
 	__SetPageUptodate(page);
 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index bec3d214084b..caa652f1d8a6 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -480,7 +480,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		}
 
 		/* May fail (-ENOMEM) if radix-tree node allocation failed. */
-		__SetPageLocked(new_page);
+		SetPageLocked(new_page);
 		__SetPageSwapBacked(new_page);
 		err = __add_to_swap_cache(new_page, entry);
 		if (likely(!err)) {
@@ -498,7 +498,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 			return new_page;
 		}
 		radix_tree_preload_end();
-		__ClearPageLocked(new_page);
+		ClearPageLocked(new_page);
 		/*
 		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
 		 * clear SWAP_HAS_CACHE flag.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ead2c52008fa..c01aa130b9ba 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1577,7 +1577,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * we obviously don't have to worry about waking up a process
 		 * waiting on the page lock, because there are no references.
 		 */
-		__ClearPageLocked(page);
+		ClearPageLocked(page);
 free_it:
 		nr_reclaimed++;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC 2/2] page-flags: Catch the double setter of page flags
  2019-02-11 12:53 [RFC 0/2] Potential race condition with page lock Chintan Pandya
  2019-02-11 12:53 ` [RFC 1/2] page-flags: Make page lock operation atomic Chintan Pandya
@ 2019-02-11 12:53 ` Chintan Pandya
  2019-02-11 13:47   ` Peter Zijlstra
  1 sibling, 1 reply; 11+ messages in thread
From: Chintan Pandya @ 2019-02-11 12:53 UTC (permalink / raw)
  To: Linux Upstream, hughd, peterz, jack, mawilcox, akpm
  Cc: linux-kernel, linux-mm, Chintan Pandya

Some of the page flags, like PG_locked, are not supposed to
be set twice. Currently, there is no protection around this
and many callers try to set this bit directly. Others use
trylock_page(), which is a much safer version of the same.
But, for performance reasons, we may not want to implement
wait-until-set. So, at least, find out who is doing the
double setting and fix them.
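
For example, with this change a sequence like the following (illustrative)
would now be flagged:

	lock_page(page);		/* PG_locked already set */
	...
	SetPageLocked(page);		/* second setter: test_and_set_bit()
					 * returns 1, so the WARN_ON() fires
					 * and points at the offender */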

Change-Id: I1295fcb8527ce4b54d5d11c11287fc7516006cf0
Signed-off-by: Chintan Pandya <chintan.pandya@oneplus.com>
---
 include/linux/page-flags.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index a56a9bd4bc6b..e307775c2b4a 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -208,7 +208,7 @@ static __always_inline int Page##uname(struct page *page)		\
 
 #define SETPAGEFLAG(uname, lname, policy)				\
 static __always_inline void SetPage##uname(struct page *page)		\
-	{ set_bit(PG_##lname, &policy(page, 1)->flags); }
+	{ WARN_ON(test_and_set_bit(PG_##lname, &policy(page, 1)->flags)); }
 
 #define CLEARPAGEFLAG(uname, lname, policy)				\
 static __always_inline void ClearPage##uname(struct page *page)		\
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC 1/2] page-flags: Make page lock operation atomic
  2019-02-11 12:53 ` [RFC 1/2] page-flags: Make page lock operation atomic Chintan Pandya
@ 2019-02-11 13:46   ` Peter Zijlstra
  2019-02-11 13:59     ` Linux Upstream
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2019-02-11 13:46 UTC (permalink / raw)
  To: Chintan Pandya
  Cc: Linux Upstream, hughd, jack, mawilcox, akpm, linux-kernel, linux-mm

On Mon, Feb 11, 2019 at 12:53:53PM +0000, Chintan Pandya wrote:
> Currently, the page lock operation is non-atomic. This leaves
> some scope for a race condition: if two threads access the
> same page's flags concurrently, our thread's page lock bit
> (PG_locked) can be overwritten by the other thread, leaving
> the page unlocked. This can cause issues later when some
> code expects the page to be locked but it is not.
> 
> Make the page lock/unlock operations use the atomic version of
> the set_bit API. There are other flag set operations which still
> use the non-atomic version of set_bit; those might be changed
> in the future.
> 
> Change-Id: I13bdbedc2b198af014d885e1925c93b83ed6660e

That doesn't belong in patches.

> Signed-off-by: Chintan Pandya <chintan.pandya@oneplus.com>

NAK.

This is bound to regress some stuff. Now agreed that using non-atomic
ops is tricky, but many are in places where we 'know' there can't be
concurrency.

If you can show any single one is wrong, we can fix that one, but we're
not going to blanket remove all this just because.
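
The canonical case is a freshly allocated page that nobody else can observe
yet, e.g. (sketch):

	page = alloc_page(GFP_KERNEL);	/* we hold the only reference */
	__SetPageLocked(page);		/* non-atomic is fine, nobody can race */
	__SetPageSwapBacked(page);
	/* only once the page is published (page cache, LRU, page tables)
	 * must flag updates be atomic or otherwise serialized */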

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 2/2] page-flags: Catch the double setter of page flags
  2019-02-11 12:53 ` [RFC 2/2] page-flags: Catch the double setter of page flags Chintan Pandya
@ 2019-02-11 13:47   ` Peter Zijlstra
  2019-02-11 14:01     ` Linux Upstream
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2019-02-11 13:47 UTC (permalink / raw)
  To: Chintan Pandya
  Cc: Linux Upstream, hughd, jack, mawilcox, akpm, linux-kernel, linux-mm

On Mon, Feb 11, 2019 at 12:53:55PM +0000, Chintan Pandya wrote:
> Some of the page flags, like PG_locked, are not supposed to
> be set twice. Currently, there is no protection around this
> and many callers try to set this bit directly. Others use
> trylock_page(), which is a much safer version of the same.
> But, for performance reasons, we may not want to implement
> wait-until-set. So, at least, find out who is doing the
> double setting and fix them.
> 
> Change-Id: I1295fcb8527ce4b54d5d11c11287fc7516006cf0
> Signed-off-by: Chintan Pandya <chintan.pandya@oneplus.com>
> ---
>  include/linux/page-flags.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index a56a9bd4bc6b..e307775c2b4a 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -208,7 +208,7 @@ static __always_inline int Page##uname(struct page *page)		\
>  
>  #define SETPAGEFLAG(uname, lname, policy)				\
>  static __always_inline void SetPage##uname(struct page *page)		\
> -	{ set_bit(PG_##lname, &policy(page, 1)->flags); }
> +	{ WARN_ON(test_and_set_bit(PG_##lname, &policy(page, 1)->flags)); }

You forgot to make this depend on CONFIG_DEBUG_VM. Also, I'm not
convinced this is always wrong, inefficient sure, but not wrong in
general.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 1/2] page-flags: Make page lock operation atomic
  2019-02-11 13:46   ` Peter Zijlstra
@ 2019-02-11 13:59     ` Linux Upstream
  2019-02-11 17:48       ` Jan Kara
  0 siblings, 1 reply; 11+ messages in thread
From: Linux Upstream @ 2019-02-11 13:59 UTC (permalink / raw)
  To: Peter Zijlstra, Chintan Pandya
  Cc: hughd, jack, mawilcox, akpm, linux-kernel, linux-mm



On 11/02/19 7:16 PM, Peter Zijlstra wrote:
> On Mon, Feb 11, 2019 at 12:53:53PM +0000, Chintan Pandya wrote:
>> Currently, the page lock operation is non-atomic. This leaves
>> some scope for a race condition: if two threads access the
>> same page's flags concurrently, our thread's page lock bit
>> (PG_locked) can be overwritten by the other thread, leaving
>> the page unlocked. This can cause issues later when some
>> code expects the page to be locked but it is not.
>>
>> Make the page lock/unlock operations use the atomic version of
>> the set_bit API. There are other flag set operations which still
>> use the non-atomic version of set_bit; those might be changed
>> in the future.
>>
>> Change-Id: I13bdbedc2b198af014d885e1925c93b83ed6660e
> 
> That doesn't belong in patches.

Sure. That's a miss. Will fix this.

> 
>> Signed-off-by: Chintan Pandya <chintan.pandya@oneplus.com>
> 
> NAK.
> 
> This is bound to regress some stuff. Now agreed that using non-atomic
> ops is tricky, but many are in places where we 'know' there can't be
> concurrency.
> 
> If you can show any single one is wrong, we can fix that one, but we're
> not going to blanket remove all this just because.

Not quite familiar with the stack below, but from the crash dump I found
that this was another stack running on some other CPU at the same time,
which also updates the page cache LRU and manipulates page locks.

[84415.344577] [20190123_21:27:50.786264]@1 preempt_count_add+0xdc/0x184
[84415.344588] [20190123_21:27:50.786276]@1 workingset_refault+0xdc/0x268
[84415.344600] [20190123_21:27:50.786288]@1 add_to_page_cache_lru+0x84/0x11c
[84415.344612] [20190123_21:27:50.786301]@1 ext4_mpage_readpages+0x178/0x714
[84415.344625] [20190123_21:27:50.786313]@1 ext4_readpages+0x50/0x60
[84415.344636] [20190123_21:27:50.786324]@1 
__do_page_cache_readahead+0x16c/0x280
[84415.344646] [20190123_21:27:50.786334]@1 filemap_fault+0x41c/0x588
[84415.344655] [20190123_21:27:50.786343]@1 ext4_filemap_fault+0x34/0x50
[84415.344664] [20190123_21:27:50.786353]@1 __do_fault+0x28/0x88

Not entirely sure if it's racing with the crashing stack or if it simply
overrides the bit set by case 2 (mentioned in 0/2).
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 2/2] page-flags: Catch the double setter of page flags
  2019-02-11 13:47   ` Peter Zijlstra
@ 2019-02-11 14:01     ` Linux Upstream
  0 siblings, 0 replies; 11+ messages in thread
From: Linux Upstream @ 2019-02-11 14:01 UTC (permalink / raw)
  To: Peter Zijlstra, Chintan Pandya
  Cc: hughd, jack, mawilcox, akpm, linux-kernel, linux-mm



On 11/02/19 7:17 PM, Peter Zijlstra wrote:
> On Mon, Feb 11, 2019 at 12:53:55PM +0000, Chintan Pandya wrote:
>> Some of the page flags, like PG_locked, are not supposed to
>> be set twice. Currently, there is no protection around this
>> and many callers try to set this bit directly. Others use
>> trylock_page(), which is a much safer version of the same.
>> But, for performance reasons, we may not want to implement
>> wait-until-set. So, at least, find out who is doing the
>> double setting and fix them.
>>
>> Change-Id: I1295fcb8527ce4b54d5d11c11287fc7516006cf0
>> Signed-off-by: Chintan Pandya <chintan.pandya@oneplus.com>
>> ---
>>   include/linux/page-flags.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> index a56a9bd4bc6b..e307775c2b4a 100644
>> --- a/include/linux/page-flags.h
>> +++ b/include/linux/page-flags.h
>> @@ -208,7 +208,7 @@ static __always_inline int Page##uname(struct page *page)		\
>>   
>>   #define SETPAGEFLAG(uname, lname, policy)				\
>>   static __always_inline void SetPage##uname(struct page *page)		\
>> -	{ set_bit(PG_##lname, &policy(page, 1)->flags); }
>> +	{ WARN_ON(test_and_set_bit(PG_##lname, &policy(page, 1)->flags)); }
> 
> You forgot to make this depend on CONFIG_DEBUG_VM. Also, I'm not
> convinced this is always wrong, inefficient sure, but not wrong in
> general.

Okay. Will protect this under CONFIG_DEBUG_VM.
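
Something like the below (untested sketch), so the check is only compiled
in with CONFIG_DEBUG_VM:

	#define SETPAGEFLAG(uname, lname, policy)				\
	static __always_inline void SetPage##uname(struct page *page)		\
	{									\
		if (IS_ENABLED(CONFIG_DEBUG_VM))				\
			WARN_ON(test_and_set_bit(PG_##lname,			\
						 &policy(page, 1)->flags));	\
		else								\
			set_bit(PG_##lname, &policy(page, 1)->flags);		\
	}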


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 1/2] page-flags: Make page lock operation atomic
  2019-02-11 13:59     ` Linux Upstream
@ 2019-02-11 17:48       ` Jan Kara
  2019-02-11 17:56         ` Matthew Wilcox
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2019-02-11 17:48 UTC (permalink / raw)
  To: Linux Upstream
  Cc: Peter Zijlstra, Chintan Pandya, hughd, jack, mawilcox, akpm,
	linux-kernel, linux-mm

On Mon 11-02-19 13:59:24, Linux Upstream wrote:
> > 
> >> Signed-off-by: Chintan Pandya <chintan.pandya@oneplus.com>
> > 
> > NAK.
> > 
> > This is bound to regress some stuff. Now agreed that using non-atomic
> > ops is tricky, but many are in places where we 'know' there can't be
> > concurrency.
> > 
> > If you can show any single one is wrong, we can fix that one, but we're
> > not going to blanket remove all this just because.
> 
> Not quite familiar with the stack below, but from the crash dump I found
> that this was another stack running on some other CPU at the same time,
> which also updates the page cache LRU and manipulates page locks.
> 
> [84415.344577] [20190123_21:27:50.786264]@1 preempt_count_add+0xdc/0x184
> [84415.344588] [20190123_21:27:50.786276]@1 workingset_refault+0xdc/0x268
> [84415.344600] [20190123_21:27:50.786288]@1 add_to_page_cache_lru+0x84/0x11c
> [84415.344612] [20190123_21:27:50.786301]@1 ext4_mpage_readpages+0x178/0x714
> [84415.344625] [20190123_21:27:50.786313]@1 ext4_readpages+0x50/0x60
> [84415.344636] [20190123_21:27:50.786324]@1 
> __do_page_cache_readahead+0x16c/0x280
> [84415.344646] [20190123_21:27:50.786334]@1 filemap_fault+0x41c/0x588
> [84415.344655] [20190123_21:27:50.786343]@1 ext4_filemap_fault+0x34/0x50
> [84415.344664] [20190123_21:27:50.786353]@1 __do_fault+0x28/0x88
> 
> Not entirely sure if it's racing with the crashing stack or if it simply
> overrides the bit set by case 2 (mentioned in 0/2).

So this is interesting. Looking at __add_to_page_cache_locked(), nothing
seems to prevent the __SetPageLocked(page) in add_to_page_cache_lru() from
getting reordered into __add_to_page_cache_locked(), past the point where
the page is actually added to the xarray. So that one particular instance
might benefit from an atomic SetPageLocked() or a barrier somewhere between
__SetPageLocked() and the actual addition of the entry into the xarray.
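
I.e., conceptually something like this (just a sketch) would close that
window:

	__SetPageLocked(page);		/* plain store to page->flags */
	...
	smp_wmb();			/* order the PG_locked store before the
					 * store publishing the page in the tree;
					 * pairs with the address dependency on
					 * the lookup side */
	/* ...entry actually stored into the xarray here... */

unless the xarray insertion itself already provides the needed ordering.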

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 1/2] page-flags: Make page lock operation atomic
  2019-02-11 17:48       ` Jan Kara
@ 2019-02-11 17:56         ` Matthew Wilcox
  2019-02-12  7:45           ` Jan Kara
  0 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2019-02-11 17:56 UTC (permalink / raw)
  To: Jan Kara
  Cc: Linux Upstream, Peter Zijlstra, Chintan Pandya, hughd, mawilcox,
	akpm, linux-kernel, linux-mm

On Mon, Feb 11, 2019 at 06:48:46PM +0100, Jan Kara wrote:
> On Mon 11-02-19 13:59:24, Linux Upstream wrote:
> > > 
> > >> Signed-off-by: Chintan Pandya <chintan.pandya@oneplus.com>
> > > 
> > > NAK.
> > > 
> > > This is bound to regress some stuff. Now agreed that using non-atomic
> > > ops is tricky, but many are in places where we 'know' there can't be
> > > concurrency.
> > > 
> > > If you can show any single one is wrong, we can fix that one, but we're
> > > not going to blanket remove all this just because.
> > 
> > Not quite familiar with the stack below, but from the crash dump I found
> > that this was another stack running on some other CPU at the same time,
> > which also updates the page cache LRU and manipulates page locks.
> > 
> > [84415.344577] [20190123_21:27:50.786264]@1 preempt_count_add+0xdc/0x184
> > [84415.344588] [20190123_21:27:50.786276]@1 workingset_refault+0xdc/0x268
> > [84415.344600] [20190123_21:27:50.786288]@1 add_to_page_cache_lru+0x84/0x11c
> > [84415.344612] [20190123_21:27:50.786301]@1 ext4_mpage_readpages+0x178/0x714
> > [84415.344625] [20190123_21:27:50.786313]@1 ext4_readpages+0x50/0x60
> > [84415.344636] [20190123_21:27:50.786324]@1 
> > __do_page_cache_readahead+0x16c/0x280
> > [84415.344646] [20190123_21:27:50.786334]@1 filemap_fault+0x41c/0x588
> > [84415.344655] [20190123_21:27:50.786343]@1 ext4_filemap_fault+0x34/0x50
> > [84415.344664] [20190123_21:27:50.786353]@1 __do_fault+0x28/0x88
> > 
> > Not entirely sure if it's racing with the crashing stack or if it simply
> > overrides the bit set by case 2 (mentioned in 0/2).
> 
> So this is interesting. Looking at __add_to_page_cache_locked(), nothing
> seems to prevent the __SetPageLocked(page) in add_to_page_cache_lru() from
> getting reordered into __add_to_page_cache_locked(), past the point where
> the page is actually added to the xarray. So that one particular instance
> might benefit from an atomic SetPageLocked() or a barrier somewhere between
> __SetPageLocked() and the actual addition of the entry into the xarray.

There's a write barrier when you add something to the XArray, by virtue
of the call to rcu_assign_pointer().

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 1/2] page-flags: Make page lock operation atomic
  2019-02-11 17:56         ` Matthew Wilcox
@ 2019-02-12  7:45           ` Jan Kara
  2019-02-12 12:29             ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2019-02-12  7:45 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jan Kara, Linux Upstream, Peter Zijlstra, Chintan Pandya, hughd,
	mawilcox, akpm, linux-kernel, linux-mm

On Mon 11-02-19 09:56:53, Matthew Wilcox wrote:
> On Mon, Feb 11, 2019 at 06:48:46PM +0100, Jan Kara wrote:
> > On Mon 11-02-19 13:59:24, Linux Upstream wrote:
> > > > 
> > > >> Signed-off-by: Chintan Pandya <chintan.pandya@oneplus.com>
> > > > 
> > > > NAK.
> > > > 
> > > > This is bound to regress some stuff. Now agreed that using non-atomic
> > > > ops is tricky, but many are in places where we 'know' there can't be
> > > > concurrency.
> > > > 
> > > > If you can show any single one is wrong, we can fix that one, but we're
> > > > not going to blanket remove all this just because.
> > > 
> > > Not quite familiar with the stack below, but from the crash dump I found
> > > that this was another stack running on some other CPU at the same time,
> > > which also updates the page cache LRU and manipulates page locks.
> > > 
> > > [84415.344577] [20190123_21:27:50.786264]@1 preempt_count_add+0xdc/0x184
> > > [84415.344588] [20190123_21:27:50.786276]@1 workingset_refault+0xdc/0x268
> > > [84415.344600] [20190123_21:27:50.786288]@1 add_to_page_cache_lru+0x84/0x11c
> > > [84415.344612] [20190123_21:27:50.786301]@1 ext4_mpage_readpages+0x178/0x714
> > > [84415.344625] [20190123_21:27:50.786313]@1 ext4_readpages+0x50/0x60
> > > [84415.344636] [20190123_21:27:50.786324]@1 
> > > __do_page_cache_readahead+0x16c/0x280
> > > [84415.344646] [20190123_21:27:50.786334]@1 filemap_fault+0x41c/0x588
> > > [84415.344655] [20190123_21:27:50.786343]@1 ext4_filemap_fault+0x34/0x50
> > > [84415.344664] [20190123_21:27:50.786353]@1 __do_fault+0x28/0x88
> > > 
> > > Not entirely sure if it's racing with the crashing stack or if it simply
> > > overrides the bit set by case 2 (mentioned in 0/2).
> > 
> > So this is interesting. Looking at __add_to_page_cache_locked(), nothing
> > seems to prevent the __SetPageLocked(page) in add_to_page_cache_lru() from
> > getting reordered into __add_to_page_cache_locked(), past the point where
> > the page is actually added to the xarray. So that one particular instance
> > might benefit from an atomic SetPageLocked() or a barrier somewhere between
> > __SetPageLocked() and the actual addition of the entry into the xarray.
> 
> There's a write barrier when you add something to the XArray, by virtue
> of the call to rcu_assign_pointer().

OK, I missed rcu_assign_pointer(). Thanks for the correction... but...
rcu_assign_pointer() is __smp_store_release(&p, v) and that on x86 seems to
be:

        barrier();                                                      \
        WRITE_ONCE(*p, v);                                              \

which seems to provide a compiler barrier but not an SMP barrier? So is x86
store ordering strong enough to make writes appear in the right order? So far
I didn't think so... What am I missing?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 1/2] page-flags: Make page lock operation atomic
  2019-02-12  7:45           ` Jan Kara
@ 2019-02-12 12:29             ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2019-02-12 12:29 UTC (permalink / raw)
  To: Jan Kara
  Cc: Matthew Wilcox, Linux Upstream, Chintan Pandya, hughd, mawilcox,
	akpm, linux-kernel, linux-mm

On Tue, Feb 12, 2019 at 08:45:35AM +0100, Jan Kara wrote:
> On Mon 11-02-19 09:56:53, Matthew Wilcox wrote:
> > On Mon, Feb 11, 2019 at 06:48:46PM +0100, Jan Kara wrote:
> > > On Mon 11-02-19 13:59:24, Linux Upstream wrote:
> > > > > 
> > > > >> Signed-off-by: Chintan Pandya <chintan.pandya@oneplus.com>
> > > > > 
> > > > > NAK.
> > > > > 
> > > > > This is bound to regress some stuff. Now agreed that using non-atomic
> > > > > ops is tricky, but many are in places where we 'know' there can't be
> > > > > concurrency.
> > > > > 
> > > > > If you can show any single one is wrong, we can fix that one, but we're
> > > > > not going to blanket remove all this just because.
> > > > 
> > > > Not quite familiar with the stack below, but from the crash dump I found
> > > > that this was another stack running on some other CPU at the same time,
> > > > which also updates the page cache LRU and manipulates page locks.
> > > > 
> > > > [84415.344577] [20190123_21:27:50.786264]@1 preempt_count_add+0xdc/0x184
> > > > [84415.344588] [20190123_21:27:50.786276]@1 workingset_refault+0xdc/0x268
> > > > [84415.344600] [20190123_21:27:50.786288]@1 add_to_page_cache_lru+0x84/0x11c
> > > > [84415.344612] [20190123_21:27:50.786301]@1 ext4_mpage_readpages+0x178/0x714
> > > > [84415.344625] [20190123_21:27:50.786313]@1 ext4_readpages+0x50/0x60
> > > > [84415.344636] [20190123_21:27:50.786324]@1 
> > > > __do_page_cache_readahead+0x16c/0x280
> > > > [84415.344646] [20190123_21:27:50.786334]@1 filemap_fault+0x41c/0x588
> > > > [84415.344655] [20190123_21:27:50.786343]@1 ext4_filemap_fault+0x34/0x50
> > > > [84415.344664] [20190123_21:27:50.786353]@1 __do_fault+0x28/0x88
> > > > 
> > > > Not entirely sure if it's racing with the crashing stack or if it simply
> > > > overrides the bit set by case 2 (mentioned in 0/2).
> > > 
> > > So this is interesting. Looking at __add_to_page_cache_locked(), nothing
> > > seems to prevent the __SetPageLocked(page) in add_to_page_cache_lru() from
> > > getting reordered into __add_to_page_cache_locked(), past the point where
> > > the page is actually added to the xarray. So that one particular instance
> > > might benefit from an atomic SetPageLocked() or a barrier somewhere between
> > > __SetPageLocked() and the actual addition of the entry into the xarray.
> > 
> > There's a write barrier when you add something to the XArray, by virtue
> > of the call to rcu_assign_pointer().
> 
> OK, I missed rcu_assign_pointer(). Thanks for the correction... but...
> rcu_assign_pointer() is __smp_store_release(&p, v) and that on x86 seems to
> be:
> 
>         barrier();                                                      \
>         WRITE_ONCE(*p, v);                                              \
> 
> which seems to provide a compiler barrier but not an SMP barrier? So is x86
> store ordering strong enough to make writes appear in the right order? So far
> I didn't think so... What am I missing?

X86 is TSO, and that is strong enough for this. The only visible
reordering on TSO is due to the store-buffer, that is: writes may happen
after later reads.
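
I.e. the classic store-buffer litmus test, the only reordering TSO permits
(sketch):

	CPU0				CPU1
	----				----
	WRITE_ONCE(x, 1);		WRITE_ONCE(y, 1);
	r0 = READ_ONCE(y);		r1 = READ_ONCE(x);

	/* r0 == 0 && r1 == 0 is allowed on x86: each load may complete
	 * before the other CPU's store has drained from its store buffer.
	 * Store->store order is preserved, which is why barrier() +
	 * WRITE_ONCE() is a sufficient smp_store_release() there. */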


^ permalink raw reply	[flat|nested] 11+ messages in thread

