From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933121AbcFBXPM (ORCPT ); Thu, 2 Jun 2016 19:15:12 -0400 Received: from mail-pf0-f178.google.com ([209.85.192.178]:34749 "EHLO mail-pf0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932090AbcFBXPJ (ORCPT ); Thu, 2 Jun 2016 19:15:09 -0400 Subject: Re: [PATCH] mm: check the return value of lookup_page_ext for all call sites To: Minchan Kim References: <1464023768-31025-1-git-send-email-yang.shi@linaro.org> <20160524025811.GA29094@bbox> <20160526003719.GB9661@bbox> <8ae0197c-47b7-e5d2-20c3-eb9d01e6b65c@linaro.org> <20160527051432.GF2322@bbox> <20160527060839.GC13661@js1304-P5Q-DELUXE> <20160527081108.GG2322@bbox> <20160530061117.GB28624@bbox> <20160602050039.GA3304@bbox> Cc: Joonsoo Kim , akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linaro-kernel@lists.linaro.org, Tang Chen , Yasuaki Ishimatsu , Kamezawa Hiroyuki From: "Shi, Yang" Message-ID: Date: Thu, 2 Jun 2016 16:15:02 -0700 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 MIME-Version: 1.0 In-Reply-To: <20160602050039.GA3304@bbox> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 6/1/2016 10:00 PM, Minchan Kim wrote: > On Wed, Jun 01, 2016 at 01:40:48PM -0700, Shi, Yang wrote: >> On 5/29/2016 11:11 PM, Minchan Kim wrote: >>> On Fri, May 27, 2016 at 11:16:41AM -0700, Shi, Yang wrote: >>> >>> >>> >>>>> >>>>> If we goes this way, how to guarantee this race? >>>> >>>> Thanks for pointing out this. It sounds reasonable. However, this >>>> should be only possible to happen on 32 bit since just 32 bit >>>> version page_is_idle() calls lookup_page_ext(), it doesn't do it on >>>> 64 bit. >>>> >>>> And, such race condition should exist regardless of whether DEBUG_VM >>>> is enabled or not, right? >>>> >>>> rcu might be good enough to protect it. >>>> >>>> A quick fix may look like: >>>> >>>> diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h >>>> index 8f5d4ad..bf0cd6a 100644 >>>> --- a/include/linux/page_idle.h >>>> +++ b/include/linux/page_idle.h >>>> @@ -77,8 +77,12 @@ static inline bool >>>> test_and_clear_page_young(struct page *page) >>>> static inline bool page_is_idle(struct page *page) >>>> { >>>> struct page_ext *page_ext; >>>> + >>>> + rcu_read_lock(); >>>> page_ext = lookup_page_ext(page); >>>> + rcu_read_unlock(); >>>> + >>>> if (unlikely(!page_ext)) >>>> return false; >>>> >>>> diff --git a/mm/page_ext.c b/mm/page_ext.c >>>> index 56b160f..94927c9 100644 >>>> --- a/mm/page_ext.c >>>> +++ b/mm/page_ext.c >>>> @@ -183,7 +183,6 @@ struct page_ext *lookup_page_ext(struct page *page) >>>> { >>>> unsigned long pfn = page_to_pfn(page); >>>> struct mem_section *section = __pfn_to_section(pfn); >>>> -#if defined(CONFIG_DEBUG_VM) || defined(CONFIG_PAGE_POISONING) >>>> /* >>>> * The sanity checks the page allocator does upon freeing a >>>> * page can reach here before the page_ext arrays are >>>> @@ -195,7 +194,7 @@ struct page_ext *lookup_page_ext(struct page *page) >>>> */ >>>> if (!section->page_ext) >>>> return NULL; >>>> -#endif >>>> + >>>> return section->page_ext + pfn; >>>> } >>>> >>>> @@ -279,7 +278,8 @@ static void __free_page_ext(unsigned long pfn) >>>> return; >>>> base = ms->page_ext + pfn; >>>> free_page_ext(base); >>>> - ms->page_ext = NULL; >>>> + rcu_assign_pointer(ms->page_ext, NULL); >>>> + synchronize_rcu(); >>> >>> How does it fix the problem? >>> I cannot understand your point. >> >> Assigning NULL pointer to page_Ext will be blocked until >> rcu_read_lock critical section is done, so the lookup and writing >> operations will be serialized. And, rcu_read_lock disables preempt >> too. > > I meant your rcu_read_lock in page_idle should cover test_bit op. Yes, definitely. Thanks for catching it. > One more thing, you should use rcu_dereference. I will check which one is the best since I saw some use rcu_assign_pointer. > > As well, please cover memory onlining case I mentioned in another > thread as well as memory offlining. I will look into it too. Thanks, Yang > > Anyway, to me, every caller of page_ext should prepare lookup_page_ext > can return NULL anytime and they should use rcu_read_[un]lock, which > is not good. :( >