[RFC 0/2] Potential race condition with page lock

* [RFC 0/2] Potential race condition with page lock
@ 2019-02-11 12:53 Chintan Pandya
  2019-02-11 12:53 ` [RFC 1/2] page-flags: Make page lock operation atomic Chintan Pandya
  2019-02-11 12:53 ` [RFC 2/2] page-flags: Catch the double setter of page flags Chintan Pandya
  0 siblings, 2 replies; 11+ messages in thread
From: Chintan Pandya @ 2019-02-11 12:53 UTC (permalink / raw)
  To: Linux Upstream, hughd, peterz, jack, mawilcox, akpm
  Cc: linux-kernel, linux-mm, Chintan Pandya

In 4.14 kernel, observed following 2 BUG_ON(!PageLocked(page)) scenarios.
Both looks to be having similar cause.

Case: 1
[127823.176076] try_to_free_buffers+0xfc/0x108 (BUG_ON(), page lock was freed
                                               somehow)
[127823.176079] jbd2_journal_try_to_free_buffers+0x15c/0x194 (page lock was
                                              available till this function)
[127823.176083] ext4_releasepage+0xe0/0x110 
[127823.176087] try_to_release_page+0x68/0x90 (page lock was available till
                                              this function)
[127823.176090] invalidate_inode_page+0x94/0xa8
[127823.176093] invalidate_mapping_pages_without_uidlru+0xec/0x1a4 (page lock
                                              taken here)
...
...

Case: 2
[<ffffff9547a82fb0>] el1_dbg+0x18
[<ffffff9547bfb544>] __remove_mapping+0x160  (BUG_ON(), page lock is not
                                             available. Some one might have
                                             free'd that.)
[<ffffff9547bfb3c8>] remove_mapping+0x28
[<ffffff9547bf8404>] invalidate_inode_page+0xa4
[<ffffff9547bf8bcc>] invalidate_mapping_pages+0xd4  (acquired the page lock)
[<ffffff9547c7f934>] inode_lru_isolate+0x128
[<ffffff9547c1b500>] __list_lru_walk_one+0x10c
[<ffffff9547c1b3e0>] list_lru_walk_one+0x58
[<ffffff9547c7f7d4>] prune_icache_sb+0x50
[<ffffff9547c64fbc>] super_cache_scan+0xfc
[<ffffff9547bfb17c>] shrink_slab+0x304
[<ffffff9547bffb38>] shrink_node+0x254
[<ffffff9547bfd4fc>] do_try_to_free_pages+0x144
[<ffffff9547bfd2d8>] try_to_free_pages+0x390
[<ffffff9547bebb80>] __alloc_pages_nodemask+0x940
[<ffffff9547becedc>] __get_free_pages+0x28
[<ffffff9547cd6870>] proc_pid_readlink+0x6c
[<ffffff9547c7075c>] vfs_readlink+0x124
[<ffffff9547c66374>] SyS_readlinkat+0xc8
[<ffffff9547a83818>] __sys_trace_return+0x0

Both the scenarios say that current stack tried taking page lock but got
released in meantime by someone else. There could be 2 possiblities here.

1) Someone trying to update page flags and due to race condition, PG_locked
   bit got cleared, unwantedly.
2) Someone else took the lock without checking if it is really locked or not
   as there are explicit APIs to set PG_locked.

I didn't get traces of history for having PG_locked being set non-atomically.
I believe it could be because of performance reasons. Not sure though.

Chintan Pandya (2):
  page-flags: Make page lock operation atomic
  page-flags: Catch the double setter of page flags

 fs/cifs/file.c             | 8 ++++----
 fs/pipe.c                  | 2 +-
 include/linux/page-flags.h | 4 ++--
 include/linux/pagemap.h    | 6 +++---
 mm/filemap.c               | 4 ++--
 mm/khugepaged.c            | 2 +-
 mm/ksm.c                   | 2 +-
 mm/memory-failure.c        | 2 +-
 mm/memory.c                | 2 +-
 mm/migrate.c               | 2 +-
 mm/shmem.c                 | 6 +++---
 mm/swap_state.c            | 4 ++--
 mm/vmscan.c                | 2 +-
 13 files changed, 23 insertions(+), 23 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 11+ messages in thread