From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
To: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Cc: Michal Hocko <mhocko@suse.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Nicholas Piggin <npiggin@gmail.com>
Subject: [PATCH] mm/huge_memory.c: split should clone page flags before unfreezing pageref
Date: Sun, 11 Feb 2018 13:35:17 +0300
Message-ID: <151834531706.176342.14968581451762734122.stgit@buzz>

THP split makes a non-atomic change to tail page flags. This is almost OK
because tail pages are locked and isolated, but it breaks recent changes
in page locking: the non-atomic operation could clear the PG_waiters bit.

As a result, a concurrent get_page_unless_zero() -> lock_page() sequence
might block forever, especially if the page is truncated later.

The fix is trivial: clone the flags before unfreezing the page reference
counter.

This race has existed since commit 62906027091f ("mm: add PageWaiters
indicating tasks are waiting for a page bit"), while the unsafe unfreeze
itself was added in commit 8df651c7059e ("thp: cleanup split_huge_page()").

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 mm/huge_memory.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 87ab9b8f56b5..2b38d9f2f262 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2357,6 +2357,19 @@ static void __split_huge_page_tail(struct page *head, int tail,
 	VM_BUG_ON_PAGE(atomic_read(&page_tail->_mapcount) != -1, page_tail);
 	VM_BUG_ON_PAGE(page_ref_count(page_tail) != 0, page_tail);
 
+	/* Clone page flags before unfreezing refcount. */
+	page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+	page_tail->flags |= (head->flags &
+			((1L << PG_referenced) |
+			 (1L << PG_swapbacked) |
+			 (1L << PG_swapcache) |
+			 (1L << PG_mlocked) |
+			 (1L << PG_uptodate) |
+			 (1L << PG_active) |
+			 (1L << PG_locked) |
+			 (1L << PG_unevictable) |
+			 (1L << PG_dirty)));
+
 	/*
 	 * tail_page->_refcount is zero and not changing from under us. But
 	 * get_page_unless_zero() may be running from under us on the
@@ -2375,18 +2388,6 @@ static void __split_huge_page_tail(struct page *head, int tail,
 		page_ref_add(page_tail, 2);
 	}
 
-	page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
-	page_tail->flags |= (head->flags &
-			((1L << PG_referenced) |
-			 (1L << PG_swapbacked) |
-			 (1L << PG_swapcache) |
-			 (1L << PG_mlocked) |
-			 (1L << PG_uptodate) |
-			 (1L << PG_active) |
-			 (1L << PG_locked) |
-			 (1L << PG_unevictable) |
-			 (1L << PG_dirty)));
-
 	/*
 	 * After clearing PageTail the gup refcount can be released.
 	 * Page flags also must be visible before we make the page non-compound.
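
The lost-bit interleaving described in the changelog can be illustrated with a small user-space model. This is only a sketch and not kernel code: every `demo_` name, the `DEMO_PG_*` bit positions, and the single-threaded replay of the interleaving are assumptions made for illustration. It splits the non-atomic `flags |= ...` into its load and store halves and lets a modelled waiter run in between, which it can only do once the refcount is non-zero.

```c
#include <stdbool.h>

/* Hypothetical bit positions for illustration; not the kernel's values. */
enum { DEMO_PG_LOCKED = 0, DEMO_PG_WAITERS = 1, DEMO_PG_DIRTY = 2 };
#define DEMO_BIT(n) (1UL << (n))

struct demo_page {
	unsigned long flags;
	int refcount;
};

/*
 * Model of get_page_unless_zero() -> lock_page() on a locked page:
 * only a task that obtains a reference can mark itself as a waiter.
 */
static bool demo_waiter_arrives(struct demo_page *p)
{
	if (p->refcount == 0)
		return false;		/* refcount still frozen: no access */
	p->refcount++;
	p->flags |= DEMO_BIT(DEMO_PG_WAITERS);
	return true;
}

/*
 * Replays one tail-page split. Returns true if a waiter registered and
 * its DEMO_PG_WAITERS bit is still set afterwards, false if it was lost.
 */
static bool demo_split_tail(unsigned long head_flags, bool clone_before_unfreeze)
{
	struct demo_page tail = { .flags = 0, .refcount = 0 };
	unsigned long snapshot;
	bool waiting;

	if (!clone_before_unfreeze)
		tail.refcount = 1;		/* old order: unfreeze first */

	snapshot = tail.flags;			/* load half of "flags |= ..." */
	waiting = demo_waiter_arrives(&tail);	/* concurrent task runs here */
	tail.flags = snapshot | head_flags;	/* store half overwrites flags */

	if (clone_before_unfreeze) {
		tail.refcount = 1;		/* new order: unfreeze after clone */
		waiting = demo_waiter_arrives(&tail);
	}

	return waiting && (tail.flags & DEMO_BIT(DEMO_PG_WAITERS)) != 0;
}
```

In this model, calling `demo_split_tail()` with `clone_before_unfreeze` set to false reports that the waiter bit was lost, which corresponds to the task that would block forever. With it set to true, the waiter cannot touch the flags until the clone has finished, so its bit survives.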