linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yang Shi <yang.shi@linux.alibaba.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: kirill.shutemov@linux.intel.com, hughd@google.com,
	aarcange@redhat.com, akpm@linux-foundation.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: khugepaged: fix potential page state corruption
Date: Wed, 18 Mar 2020 16:47:42 -0700	[thread overview]
Message-ID: <b7891211-a08a-453f-83c4-76889d6ec9fe@linux.alibaba.com> (raw)
In-Reply-To: <20200318234036.aw3awl25gxrjd2jl@box>



On 3/18/20 4:40 PM, Kirill A. Shutemov wrote:
> On Thu, Mar 19, 2020 at 07:19:42AM +0800, Yang Shi wrote:
>> When khugepaged collapses anonymous pages, the base pages would be freed
>> via pagevec or free_page_and_swap_cache().  But, the anonymous page may
>> be added back to LRU, then it might result in the below race:
>>
>> 	CPU A				CPU B
>> khugepaged:
>>    unlock page
>>    putback_lru_page
>>      add to lru
>> 				page reclaim:
>> 				  isolate this page
>> 				  try_to_unmap
>>    page_remove_rmap <-- corrupt _mapcount
>>
>> It looks nothing would prevent the pages from isolating by reclaimer.
>>
>> The other problem is the page's active or unevictable flag might be
>> still set when freeing the page via free_page_and_swap_cache().  The
>> putback_lru_page() would not clear those two flags if the pages are
>> released via pagevec, it sounds nothing prevents from isolating active
>> or unevictable pages.
>>
>> However I didn't really run into these problems, just in theory by visual
>> inspection.
>>
>> And, it also seems unnecessary to have the pages add back to LRU again since
>> they are about to be freed when reaching this point.  So, clearing active
>> and unevictable flags, unlocking and dropping refcount from isolate
>> instead of calling putback_lru_page() as what page cache collapse does.
>>
>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>> ---
>>   mm/khugepaged.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index b679908..f42fa4e 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -673,7 +673,6 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>>   			src_page = pte_page(pteval);
>>   			copy_user_highpage(page, src_page, address, vma);
>>   			VM_BUG_ON_PAGE(page_mapcount(src_page) != 1, src_page);
>> -			release_pte_page(src_page);
>>   			/*
>>   			 * ptl mostly unnecessary, but preempt has to
>>   			 * be disabled to update the per-cpu stats
>> @@ -687,6 +686,15 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>>   			pte_clear(vma->vm_mm, address, _pte);
>>   			page_remove_rmap(src_page, false);
>>   			spin_unlock(ptl);
>> +
>> +			dec_node_page_state(src_page,
>> +				NR_ISOLATED_ANON + page_is_file_cache(src_page));
>> +			ClearPageActive(src_page);
>> +			ClearPageUnevictable(src_page);
>> +			unlock_page(src_page);
>> +			/* Drop refcount from isolate */
>> +			put_page(src_page);
>> +
>>   			free_page_and_swap_cache(src_page);
>>   		}
>>   	}
> I will look at this closer tomorrow, but looks like an easier fix is to
> move release_pte_page() under ptl that we take just after it.

Thanks, Kirill. However, it looks putting the page back to lru is not 
necessary. And, it also sounds not very efficient to do this under ptl 
IMHO since it may spins to wait for lru lock.

>



  reply	other threads:[~2020-03-18 23:47 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-18 23:19 [PATCH] mm: khugepaged: fix potential page state corruption Yang Shi
2020-03-18 23:40 ` Kirill A. Shutemov
2020-03-18 23:47   ` Yang Shi [this message]
2020-03-19  0:12 ` Kirill A. Shutemov
2020-03-19  0:55   ` Yang Shi
2020-03-19  5:39     ` Yang Shi
2020-03-19 10:49       ` Kirill A. Shutemov
2020-03-19 16:57         ` Yang Shi
2020-03-19 17:22           ` Yang Shi
2020-03-20 11:45           ` Kirill A. Shutemov
2020-03-20 16:34             ` Yang Shi
2020-03-24 17:17         ` Yang Shi
2020-03-25 11:26           ` Kirill A. Shutemov
2020-03-25 18:42             ` Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b7891211-a08a-453f-83c4-76889d6ec9fe@linux.alibaba.com \
    --to=yang.shi@linux.alibaba.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).