Date: Wed, 6 May 2020 14:23:27 +0900
From: Joonsoo Kim
To: "Eric W. Biederman"
Cc: Andrew Morton, Linux Memory Management List, LKML, Vlastimil Babka,
 Laura Abbott, "Aneesh Kumar K . V", Mel Gorman, Michal Hocko,
 Johannes Weiner, Roman Gushchin, Minchan Kim, Rik van Riel,
 Christian Koenig, Huang Rui, "Rafael J . Wysocki", Pavel Machek,
 kernel-team@lge.com, Christoph Hellwig, Kexec Mailing List
Subject: Re: [PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case
Message-ID: <20200506052327.GA25974@js1304-desktop>
References: <1588130803-20527-1-git-send-email-iamjoonsoo.kim@lge.com>
 <1588130803-20527-4-git-send-email-iamjoonsoo.kim@lge.com>
 <87h7wzvjko.fsf@x220.int.ebiederm.org>
 <87ftcfpzjn.fsf@x220.int.ebiederm.org>
In-Reply-To: <87ftcfpzjn.fsf@x220.int.ebiederm.org>

On Mon, May 04, 2020 at 09:03:56AM -0500, Eric W. Biederman wrote:
>
> I have added in the kexec mailing list.
>
> Looking at the patch we are discussing it appears that the kexec code
> could be doing much better in highmem situations today but is not.

Sounds great!

>
>
> Joonsoo Kim writes:
>
> > On Fri, May 1, 2020 at 11:06 PM, Eric W.
> > Biederman wrote:
> >>
> >> js1304@gmail.com writes:
> >>
> >> > From: Joonsoo Kim
> >> >
> >> > Until now, PageHighMem() is used for two different cases. One is to check
> >> > if there is a direct mapping for this page or not. The other is to check
> >> > the zone of this page, that is, whether it is the highmem type zone or not.
> >> >
> >> > Now, we have separate functions, PageHighMem() and PageHighMemZone() for
> >> > each case. Use the appropriate one.
> >> >
> >> > Note that there are some rules to determine the proper macro.
> >> >
> >> > 1. If PageHighMem() is called for checking if the direct mapping exists
> >> > or not, use PageHighMem().
> >> > 2. If PageHighMem() is used to predict the previous gfp_flags for
> >> > this page, use PageHighMemZone(). The zone of the page is related to
> >> > the gfp_flags.
> >> > 3. If the purpose of calling PageHighMem() is to count highmem pages and
> >> > to interact with the system by using this count, use PageHighMemZone().
> >> > This counter is usually used to calculate the available memory for a
> >> > kernel allocation and pages on the highmem zone cannot be available
> >> > for a kernel allocation.
> >> > 4. Otherwise, use PageHighMemZone(). It's safe since its implementation
> >> > is just a copy of the previous PageHighMem() implementation and won't
> >> > be changed.
> >> >
> >> > I apply the rule #2 for this patch.
> >>
> >> Hmm.
> >>
> >> What happened to the notion of deprecating and reducing the usage of
> >> highmem? I know that we have some embedded architectures where it is
> >> still important but this feels like it flies in the face of that.
> >
> > AFAIK, deprecating highmem requires some more time and, before then,
> > we need to support it.
>
> But it at least makes sense to look at what we are doing with highmem
> and ask if it makes sense.
>
> >> This part of kexec would be much more maintainable if it had a proper
> >> mm layer helper that tested to see if the page matched the passed in
> >> gfp flags. That way the mm layer could keep changing and doing weird
> >> gyrations and this code would not care.
> >
> > Good idea! I will do it.
> >
> >>
> >> What would be really helpful is if there was a straightforward way to
> >> allocate memory whose physical address fits in the native word size.
> >>
> >>
> >> All I know for certain about this patch is that it takes a piece of code
> >> that looked like it made sense, and transforms it into something I can
> >> not easily verify, and can not maintain.
> >
> > Although I decided to make a helper as you described above, I don't
> > understand why you think that the new code isn't maintainable. It is just
> > the same thing with a different name. Could you elaborate more on why
> > you think so?
>
> Because the current code is already wrong. It does not handle
> the general case of what it claims to handle. When the only distinction
> that needs to be drawn is highmem or not highmem that is likely fine.
> But now you are making it possible to draw more distinctions. At which
> point I have no idea which distinction needs to be drawn.
>
>
> The code and the logic is about 20 years old. When it was written I
> don't recall taking numa seriously and the kernel only had 3 zones
> as I recall (DMA aka the now deprecated GFP_DMA, NORMAL, and HIGH).
>
> The code attempts to work around limitations of those old zones and play
> nice in a highmem world by allocating HIGH memory and not using
> it if the memory was above 4G ( on 32bit ).
>
> Looking the kernel now has GFP_DMA32 so on 32bit with highmem we should
> probably be using that, when allocating memory.
>

From a quick investigation, unfortunately, ZONE_DMA32 isn't available on
x86 32bit now, so using GFP_DMA32 to allocate memory below 4G would not
work.
Enabling ZONE_DMA32 on x86 32bit would not be simple, so, IMHO, it would
be better to leave the code as it is.

>
>
> Further in dealing with this memory management situation we only
> have two situations we call kimage_alloc_page.
>
> For an indirect page which must have a valid page_address(page).
> We could probably relax that if we cared to.
>
> For a general kexec page to store the next kernel in until we switch.
> The general pages can be in high memory.
>
> In a highmem world all of those pages should be below 32bit.
>
>
>
> Given that we fundamentally have two situations my sense is that we
> should just refactor the code so that we never have to deal with:
>
>
> 	/* The old page I have found cannot be a
> 	 * destination page, so return it if it's
> 	 * gfp_flags honor the ones passed in.
> 	 */
> 	if (!(gfp_mask & __GFP_HIGHMEM) &&
> 	    PageHighMem(old_page)) {
> 		kimage_free_pages(old_page);
> 		continue;
> 	}
>
> Either we teach kimage_add_entry how to work with high memory pages
> (still 32bit accessible) or we teach kimage_alloc_page to notice it is
> an indirect page allocation and to always skip trying to reuse the page
> it found in that case.
>
> That way the code does not need to know about forever changing mm internals.

Nice! I have already seen your patch and found that the two lines above
related to HIGHMEM are removed. Thanks for your help.

Thanks.
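
For reference, the "mm layer helper" idea discussed earlier in this thread
(a helper that tests whether a page matches the passed in gfp flags) could
look roughly like the sketch below. It is a userspace model only, not kernel
code and not the patch that was eventually posted. Every name in it
(model_zone, model_page, MODEL_GFP_HIGHMEM, model_page_matches_gfp) is
hypothetical, and it models only the highmem distinction, under the
assumption that pages from lower zones can satisfy any request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
enum model_zone { MODEL_ZONE_DMA, MODEL_ZONE_NORMAL, MODEL_ZONE_HIGHMEM };

/* Models __GFP_HIGHMEM; the bit value is arbitrary (assumption). */
#define MODEL_GFP_HIGHMEM 0x02u

struct model_page {
	enum model_zone zone;	/* zone the page was allocated from */
};

/*
 * Hypothetical helper: returns true if a page sitting in page->zone could
 * have been returned by an allocation made with gfp_mask. Only the highmem
 * distinction is modelled here.
 */
static bool model_page_matches_gfp(const struct model_page *page,
				   unsigned int gfp_mask)
{
	assert(page != NULL);

	/* A highmem page is acceptable only if the caller allowed highmem. */
	if (page->zone == MODEL_ZONE_HIGHMEM)
		return (gfp_mask & MODEL_GFP_HIGHMEM) != 0;

	/* Simplification: lower zones satisfy any request in this model. */
	return true;
}
```

With such a helper, the reuse check quoted above would no longer test the
zone directly; the caller would free and skip the old page whenever the
helper returns false for (old_page, gfp_mask), and the zone knowledge would
stay on the mm side.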