From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751446AbcFAUxG (ORCPT ); Wed, 1 Jun 2016 16:53:06 -0400 Received: from mail-pf0-f177.google.com ([209.85.192.177]:32922 "EHLO mail-pf0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751104AbcFAUxD (ORCPT ); Wed, 1 Jun 2016 16:53:03 -0400 Subject: Re: [PATCH] mm: check the return value of lookup_page_ext for all call sites To: Minchan Kim , Joonsoo Kim , Kamezawa Hiroyuki References: <1464023768-31025-1-git-send-email-yang.shi@linaro.org> <20160524025811.GA29094@bbox> <20160526003719.GB9661@bbox> <8ae0197c-47b7-e5d2-20c3-eb9d01e6b65c@linaro.org> <20160527051432.GF2322@bbox> <20160527060839.GC13661@js1304-P5Q-DELUXE> <20160527081108.GG2322@bbox> <20160530053906.GA25079@js1304-P5Q-DELUXE> <20160530060803.GA28624@bbox> Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linaro-kernel@lists.linaro.org, Tang Chen , Yasuaki Ishimatsu From: "Shi, Yang" Message-ID: Date: Wed, 1 Jun 2016 13:52:59 -0700 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 MIME-Version: 1.0 In-Reply-To: <20160530060803.GA28624@bbox> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 5/29/2016 11:08 PM, Minchan Kim wrote: > On Mon, May 30, 2016 at 02:39:06PM +0900, Joonsoo Kim wrote: >> On Fri, May 27, 2016 at 05:11:08PM +0900, Minchan Kim wrote: >>> On Fri, May 27, 2016 at 03:08:39PM +0900, Joonsoo Kim wrote: >>>> On Fri, May 27, 2016 at 02:14:32PM +0900, Minchan Kim wrote: >>>>> On Thu, May 26, 2016 at 04:15:28PM -0700, Shi, Yang wrote: >>>>>> On 5/25/2016 5:37 PM, Minchan Kim wrote: >>>>>>> On Tue, May 24, 2016 at 11:58:11AM +0900, Minchan Kim wrote: >>>>>>>> On Mon, May 23, 2016 at 10:16:08AM -0700, Yang Shi wrote: >>>>>>>>> Per the discussion with Joonsoo Kim [1], we need check the return value of >>>>>>>>> lookup_page_ext() for all call sites since it might return NULL in some cases, >>>>>>>>> although it is unlikely, i.e. memory hotplug. >>>>>>>>> >>>>>>>>> Tested with ltp with "page_owner=0". >>>>>>>>> >>>>>>>>> [1] http://lkml.kernel.org/r/20160519002809.GA10245@js1304-P5Q-DELUXE >>>>>>>>> >>>>>>>>> Signed-off-by: Yang Shi >>>>>>>> >>>>>>>> I didn't read code code in detail to see how page_ext memory space >>>>>>>> allocated in boot code and memory hotplug but to me, it's not good >>>>>>>> to check NULL whenever we calls lookup_page_ext. >>>>>>>> >>>>>>>> More dangerous thing is now page_ext is used by optionable feature(ie, not >>>>>>>> critical for system stability) but if we want to use page_ext as >>>>>>>> another important tool for the system in future, >>>>>>>> it could be a serious problem. >>>> >>>> Hello, Minchan. >>> >>> Hi Joonsoo, >>> >>>> >>>> I wonder how pages that isn't managed by kernel yet will cause serious >>>> problem. Until onlining, these pages are out of our scope. Any >>>> information about them would be useless until it is actually >>>> activated. I guess that returning NULL for those pages will not hurt >>>> any functionality. Do you have any possible scenario that this causes the >>>> serious problem? >>> >>> I don't have any specific usecase now. That's why I said "in future". >>> And I don't want to argue whether there is possible scenario or not >>> to make the feature useful but if you want, I should write novel. >>> One of example, pop up my mind, xen, hv and even memory_hotplug itself >>> might want to use page_ext for their functionality extension to hook >>> guest pages. >> >> There is no detail so I can't guess how to use it and how it causes >> the serious problem. But, we can do it when it is really needed. >> >>> >>> My opinion is that page_ext is extension of struct page so it would >>> be better to allow any operation on struct page without any limitation >>> if we can do it. Whether it's useful or useless depend on random >>> usecase and we don't need to limit that way from the beginning. >> >> If there is no drawback, it would be a better design. But, we have >> trade-off that for some case that the memory is added but not >> onlined, there is memory saving if we allocate page_ext later. >> So, in current situation that there is no user to require such >> guarantee, I don't think it's worth doing right now. >> >>> However, current design allows deferred page_ext population so any user >>> of page_ext should keep it in mind and should either make fallback plan >>> or don't use page_ext for those cases. If we decide go this way through >>> discussion, at least, we should make such limitation more clear to >>> somewhere in this chance, maybe around page_ext_operation->need comment. >> >> Agreed. > > Okay, We realized from this discussion that by design, guest of page_ext > at the meoment should know his page_ext access from the page can be failed > so every caller should prepare for it. > > Shi, Yang, Please include some comment about that in your patch to > prevent further reviewer waste his time with repeating same discussion > and client of page_ext can know the limitation. > >> >>> My comment's point is that we should consider that way at least. It's >>> worth to discuss pros and cons, what's the best and what makes that way >>> hesitate if we can't. >> >> Yes, your suggestion would be good for future direction, but, for now, >> I think that inserting NULL to all callers is right fix. >> >> 1) Current design that page_ext is allocated when online is design >> decision of page_ext to save memory as much as possible. Fixing >> possible problem within this design decision looks good to me. > > Okay. Shi Yang, please include this comment in your patch, too. There is already such comment in lookup_page_ext: * The sanity checks the page allocator does upon freeing a * page can reach here before the page_ext arrays are * allocated when feeding a range of pages to the allocator * for the first time during bootup or memory hotplug. I will add more details, i.e. newly added but not onlined memory could reach here too. > >> >> 2) Maybe, we need to backport fixes because it would crash older >> kernels. In this case, fix with NULL is easy to backport. > > Agreed. > > Then, Shi Yang need to mark the page as stable. Agreed. > Shi, Please resend your patch with hard testing and more better > description with marking it as stable. Will come with an incremental patch to remove CONFIG_DEBUG #ifdef and fix the improve the comments since the comments are included by it. Thanks, Yang > And I know another race problem about Shi's patch. > I will reply to the thread. > >> >>>> >>>> And, allocation such memory space doesn't come from free. If someone >>>> just add the memory device and don't online it, these memory will be >>> >>> Here goes several questions. >>> Cced hotplug guys >>> >>> 1. >>> If someone just add the memory device without onlining, kernel code >>> can return pfn_valid == true on the offlined page? >> >> AFAIK, yes. >>> >>> 2. >>> If so, it means memmap on offline memory is already populated somewhere. >>> Where is the memmap allocated? part of offlined memory space or other memory? >> >> Other memory. >> >>> 3. Could we allocate page_ext in part of offline memory space so that >>> it doesn't consume online memory. >>> >>>> wasted. I don't know if there is such a usecase but it's possible >>>> scenario. >>> >>>> >>>>>>>> >>>>>>>> Can we put some hooks of page_ext into memory-hotplug so guarantee >>>>>>>> that page_ext memory space is allocated with memmap space at the >>>>>>>> same time? IOW, once every PFN wakers find a page is valid, page_ext >>>>>>>> is valid, too so lookup_page_ext never returns NULL on valid page >>>>>>>> by design. >>>>>>>> >>>>>>>> I hope we consider this direction, too. >>>>>>> >>>>>>> Yang, Could you think about this? >>>>>> >>>>>> Thanks a lot for the suggestion. Sorry for the late reply, I was >>>>>> busy on preparing patches. I do agree this is a direction we should >>>>>> look into, but I haven't got time to think about it deeper. I hope >>>>>> Joonsoo could chime in too since he is the original author for page >>>>>> extension. >>>>>> >>>>>>> >>>>>>> Even, your patch was broken, I think. >>>>>>> It doesn't work with !CONFIG_DEBUG_VM && !CONFIG_PAGE_POISONING because >>>>>>> lookup_page_ext doesn't return NULL in that case. >>>>>> >>>>>> Actually, I think the #ifdef should be removed if lookup_page_ext() >>>>>> is possible to return NULL. It sounds not make sense returning NULL >>>>>> only when DEBUG_VM is enabled. It should return NULL no matter what >>>>>> debug config is selected. If Joonsoo agrees with me I'm going to >>>>>> come up with a patch to fix it. >>>> >>>> Agreed but let's wait for Minchan's response. >>> >>> If we goes this way, how to guarantee this race? >>> >>> kpageflags_read >>> stable_page_flags >>> page_is_idle >>> lookup_page_ext >>> section = __pfn_to_section(pfn) >>> offline_pages >>> memory_notify(MEM_OFFLINE) >>> offline_page_ext >>> ms->page_ext = NULL >>> section->page_ext + pfn >> >> I think that it is a fundamental problem of memory hotplug. >> There is similar race with struct page for offlined memory. >> >> >> >> kpageflags_read >> pfn_valid >> remove_memory >> stable_page_flags >> crash! >> >> I already reported similar race problem to memory hotplug guys but >> didn't get any answer. >> >> lkml.kernel.org/r/20151221031501.GA32524@js1304-P5Q-DELUXE > > Who's in charge of memory-hotplug? Kame, Could you nudge him? > >> >> Thanks.