From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.4 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8D5D6C2BB1D for ; Wed, 15 Apr 2020 02:35:39 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 373342076C for ; Wed, 15 Apr 2020 02:35:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="GEOc5S8I" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 373342076C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id CD76C8E0005; Tue, 14 Apr 2020 22:35:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C87EA8E0001; Tue, 14 Apr 2020 22:35:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B9DAB8E0005; Tue, 14 Apr 2020 22:35:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0163.hostedemail.com [216.40.44.163]) by kanga.kvack.org (Postfix) with ESMTP id A334D8E0001 for ; Tue, 14 Apr 2020 22:35:38 -0400 (EDT) Received: from smtpin04.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 6404C181AEF1D for ; Wed, 15 Apr 2020 02:35:38 +0000 (UTC) X-FDA: 76708523556.04.pets55_1416e9a7e1950 X-HE-Tag: pets55_1416e9a7e1950 X-Filterd-Recvd-Size: 7323 Received: from us-smtp-delivery-1.mimecast.com (us-smtp-1.mimecast.com [207.211.31.81]) by imf38.hostedemail.com (Postfix) with ESMTP for ; Wed, 15 Apr 2020 02:35:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586918137; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3o4+0HgEBD0h2Go+GpB2trM6HPRj5m5yAn/gvrKMuxQ=; b=GEOc5S8Ib2mYegA3YQsNeCv0uhD4PFXunbfCTX/curKREx5Colh98+Rq2vsb3Fjxir1MGU 6VJdoIYAq6jO8rtg00KLqsWc7MgdUFvg/G7KyU7yiLW05p2354B7mzHzXpHynFiFuz1XR6 ghiALZblgG8o1YmVkjJ4EGv/skgcTz0= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-321-WOza4ysLPQyg0KcN4VE8gg-1; Tue, 14 Apr 2020 22:35:33 -0400 X-MC-Unique: WOza4ysLPQyg0KcN4VE8gg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 890218017F3; Wed, 15 Apr 2020 02:35:30 +0000 (UTC) Received: from localhost (ovpn-12-27.pek2.redhat.com [10.72.12.27]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 7B9A899DEE; Wed, 15 Apr 2020 02:35:26 +0000 (UTC) Date: Wed, 15 Apr 2020 10:35:24 +0800 From: Baoquan He To: David Hildenbrand Cc: "Eric W. Biederman" , Russell King - ARM Linux admin , Anshuman Khandual , Catalin Marinas , Bhupesh Sharma , kexec@lists.infradead.org, linux-mm@kvack.org, James Morse , Andrew Morton , Will Deacon , linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, piliu@redhat.com Subject: Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image Message-ID: <20200415023524.GG4247@MiWiFi-R3L-srv> References: <20200412080836.GM25745@shell.armlinux.org.uk> <87wo6klbw0.fsf@x220.int.ebiederm.org> <20200413023701.GA20265@MiWiFi-R3L-srv> <871rorjzmc.fsf@x220.int.ebiederm.org> <20200414064031.GB4247@MiWiFi-R3L-srv> <86e96214-7053-340b-5c1a-ff97fb94d8e0@redhat.com> <20200414092201.GD4247@MiWiFi-R3L-srv> <20200414143912.GE4247@MiWiFi-R3L-srv> <0085f460-b0c7-b25f-36a7-fa3bafaab6fe@redhat.com> MIME-Version: 1.0 In-Reply-To: <0085f460-b0c7-b25f-36a7-fa3bafaab6fe@redhat.com> User-Agent: Mutt/1.10.1 (2018-07-13) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Content-Disposition: inline X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 04/14/20 at 04:49pm, David Hildenbrand wrote: > >>>>> The root cause is kexec-ed kernel is targeted at hotpluggable memor= y > >>>>> region. Just avoiding the movable area can fix it. In kexec_file_lo= ad(), > >>>>> just checking or picking those unmovable region to put kernel/initr= d in > >>>>> function locate_mem_hole_callback() can fix it. The page or pageblo= ck's > >>>>> zone is movable or not, it's easy to know. This fix doesn't need to > >>>>> bother other component. > >>>> > >>>> I don't fully agree. E.g., just because memory is onlined to ZONE_NO= RMAL > >>>> does not imply that it cannot get offlined and removed e.g., this is > >>>> heavily used on ppc64, with 16MB sections. > >>> > >>> Really? I just know there are two kinds of mem hoplug in ppc, but don= 't > >>> know the details. So in this case, is there any flag or a way to know > >>> those memory block are hotpluggable? I am curious how those kernel da= ta > >>> is avoided to be put in this area. Or ppc just freely uses it for ker= nel > >>> data or user space data, then try to migrate when hot remove? > >> > >> See > >> arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by= _count() > >> > >> Under DLAPR, it can remove memory in LMB granularity, which is usually > >> 16MB (=3D=3D single section on ppc64). DLPAR will directly online all > >> hotplugged memory (LMBs) from the kernel using device_online(), which > >> will go to ZONE_NORMAL. > >> > >> When trying to remove memory, it simply scans for offlineable 16MB > >> memory blocks (=3D=3Dsection =3D=3D LMB), offlines and removes them. N= o need for > >> the movable zone and all the involved issues. > >=20 > > Yes, this is a different one, thanks for pointing it out. It sounds lik= e > > balloon driver in virt platform, doesn't it? >=20 > With DLPAR there is a hypervisor involved (which manages the actual HW > DIMMs), so yes. >=20 > >=20 > > Avoiding to put kexec kernel into movable zone can't solve this DLPAR > > case as you said. > >=20 > >> > >> Now, the interesting question is, can we have LMBs added during boot > >> (not via add_memory()), that will later be removed via remove_memory()= . > >> IIRC, we had BUGs related to that, so I think yes. If a section contai= ns > >> no unmovable allocations (after boot), it can get removed. > >=20 > > I do want to ask this question. If we can add LMB into system RAM, then > > reload kexec can solve it.=20 > >=20 > > Another better way is adding a common function to filter out the > > movable zone when search position for kexec kernel, use a arch specific > > funciton to filter out DLPAR memory blocks for ppc only. Over there, > > we can simply use for_each_drmem_lmb() to do that. >=20 > I was thinking about something similar. Maybe something like a notifier > that can be used to test if selected memory can be used for kexec Not sure if I get the notifier idea clearly. If you mean=20 1) Add a common function to pick memory in unmovable zone; 2) Let DLPAR, balloon register with notifier; 3) In the common function, ask notified part to check if the picked unmovable memory is available for locating kexec kernel; Sounds doable to me, and not complicated. > images. It would apply to >=20 > - arm64 and filter out all hotadded memory (IIRC, only boot memory can > be used). Do you mean hot added memory after boot can't be recognized and added into system RAM on arm64? > - powerpc to filter out all LMBs that can be removed (assuming not all > memory corresponds to LMBs that can be removed, otherwise we're in > trouble ... :) ) > - virtio-mem to filter out all memory it added. > - hyper-v to filter out partially backed memory blocks (esp. the last > memory block it added and only partially backed it by memory). >=20 > This would make it work for kexec_file_load(), however, I do wonder how > we would want to approach that from userspace kexec-tools when handling > it from kexec_load(). Let's make kexec_file_load work firstly. Since this work is only first step to make kexec-ed kernel not break memory hotplug. After kexec rebooting, the KASLR may locate kernel into hotpluggable area too. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.1 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9999AC2BB1D for ; Wed, 15 Apr 2020 02:37:28 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id F210F20774 for ; Wed, 15 Apr 2020 02:37:27 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="APFa9xqE" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F210F20774 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 49262Y4WvkzDqyX for ; Wed, 15 Apr 2020 12:37:25 +1000 (AEST) Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=redhat.com (client-ip=207.211.31.81; helo=us-smtp-delivery-1.mimecast.com; envelope-from=bhe@redhat.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: lists.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=APFa9xqE; dkim-atps=neutral Received: from us-smtp-delivery-1.mimecast.com (us-smtp-1.mimecast.com [207.211.31.81]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 49260Y0qWkzDqxQ for ; Wed, 15 Apr 2020 12:35:39 +1000 (AEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586918135; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3o4+0HgEBD0h2Go+GpB2trM6HPRj5m5yAn/gvrKMuxQ=; b=APFa9xqEU7VJnr5z77+7garh9xZJicWbh8t09e0TJZFnTGntQzcGi4Th+/DAUx5vGA4RkQ NQbqq9RHjfyYw70vVPhL18FLZayMSnEyEFAFNYBU6hUHP7HKqcFNwd9AApluUdfl5rGICQ 6dPv0Btuetb95sqrC+JZ3xtdNH9Sv1A= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-321-WOza4ysLPQyg0KcN4VE8gg-1; Tue, 14 Apr 2020 22:35:33 -0400 X-MC-Unique: WOza4ysLPQyg0KcN4VE8gg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 890218017F3; Wed, 15 Apr 2020 02:35:30 +0000 (UTC) Received: from localhost (ovpn-12-27.pek2.redhat.com [10.72.12.27]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 7B9A899DEE; Wed, 15 Apr 2020 02:35:26 +0000 (UTC) Date: Wed, 15 Apr 2020 10:35:24 +0800 From: Baoquan He To: David Hildenbrand Subject: Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image Message-ID: <20200415023524.GG4247@MiWiFi-R3L-srv> References: <20200412080836.GM25745@shell.armlinux.org.uk> <87wo6klbw0.fsf@x220.int.ebiederm.org> <20200413023701.GA20265@MiWiFi-R3L-srv> <871rorjzmc.fsf@x220.int.ebiederm.org> <20200414064031.GB4247@MiWiFi-R3L-srv> <86e96214-7053-340b-5c1a-ff97fb94d8e0@redhat.com> <20200414092201.GD4247@MiWiFi-R3L-srv> <20200414143912.GE4247@MiWiFi-R3L-srv> <0085f460-b0c7-b25f-36a7-fa3bafaab6fe@redhat.com> MIME-Version: 1.0 In-Reply-To: <0085f460-b0c7-b25f-36a7-fa3bafaab6fe@redhat.com> User-Agent: Mutt/1.10.1 (2018-07-13) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Content-Disposition: inline X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: piliu@redhat.com, Anshuman Khandual , Catalin Marinas , Bhupesh Sharma , linuxppc-dev@lists.ozlabs.org, kexec@lists.infradead.org, Russell King - ARM Linux admin , linux-mm@kvack.org, James Morse , "Eric W. Biederman" , Andrew Morton , Will Deacon , linux-arm-kernel@lists.infradead.org Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On 04/14/20 at 04:49pm, David Hildenbrand wrote: > >>>>> The root cause is kexec-ed kernel is targeted at hotpluggable memor= y > >>>>> region. Just avoiding the movable area can fix it. In kexec_file_lo= ad(), > >>>>> just checking or picking those unmovable region to put kernel/initr= d in > >>>>> function locate_mem_hole_callback() can fix it. The page or pageblo= ck's > >>>>> zone is movable or not, it's easy to know. This fix doesn't need to > >>>>> bother other component. > >>>> > >>>> I don't fully agree. E.g., just because memory is onlined to ZONE_NO= RMAL > >>>> does not imply that it cannot get offlined and removed e.g., this is > >>>> heavily used on ppc64, with 16MB sections. > >>> > >>> Really? I just know there are two kinds of mem hoplug in ppc, but don= 't > >>> know the details. So in this case, is there any flag or a way to know > >>> those memory block are hotpluggable? I am curious how those kernel da= ta > >>> is avoided to be put in this area. Or ppc just freely uses it for ker= nel > >>> data or user space data, then try to migrate when hot remove? > >> > >> See > >> arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by= _count() > >> > >> Under DLAPR, it can remove memory in LMB granularity, which is usually > >> 16MB (=3D=3D single section on ppc64). DLPAR will directly online all > >> hotplugged memory (LMBs) from the kernel using device_online(), which > >> will go to ZONE_NORMAL. > >> > >> When trying to remove memory, it simply scans for offlineable 16MB > >> memory blocks (=3D=3Dsection =3D=3D LMB), offlines and removes them. N= o need for > >> the movable zone and all the involved issues. > >=20 > > Yes, this is a different one, thanks for pointing it out. It sounds lik= e > > balloon driver in virt platform, doesn't it? >=20 > With DLPAR there is a hypervisor involved (which manages the actual HW > DIMMs), so yes. >=20 > >=20 > > Avoiding to put kexec kernel into movable zone can't solve this DLPAR > > case as you said. > >=20 > >> > >> Now, the interesting question is, can we have LMBs added during boot > >> (not via add_memory()), that will later be removed via remove_memory()= . > >> IIRC, we had BUGs related to that, so I think yes. If a section contai= ns > >> no unmovable allocations (after boot), it can get removed. > >=20 > > I do want to ask this question. If we can add LMB into system RAM, then > > reload kexec can solve it.=20 > >=20 > > Another better way is adding a common function to filter out the > > movable zone when search position for kexec kernel, use a arch specific > > funciton to filter out DLPAR memory blocks for ppc only. Over there, > > we can simply use for_each_drmem_lmb() to do that. >=20 > I was thinking about something similar. Maybe something like a notifier > that can be used to test if selected memory can be used for kexec Not sure if I get the notifier idea clearly. If you mean=20 1) Add a common function to pick memory in unmovable zone; 2) Let DLPAR, balloon register with notifier; 3) In the common function, ask notified part to check if the picked unmovable memory is available for locating kexec kernel; Sounds doable to me, and not complicated. > images. It would apply to >=20 > - arm64 and filter out all hotadded memory (IIRC, only boot memory can > be used). Do you mean hot added memory after boot can't be recognized and added into system RAM on arm64? > - powerpc to filter out all LMBs that can be removed (assuming not all > memory corresponds to LMBs that can be removed, otherwise we're in > trouble ... :) ) > - virtio-mem to filter out all memory it added. > - hyper-v to filter out partially backed memory blocks (esp. the last > memory block it added and only partially backed it by memory). >=20 > This would make it work for kexec_file_load(), however, I do wonder how > we would want to approach that from userspace kexec-tools when handling > it from kexec_load(). Let's make kexec_file_load work firstly. Since this work is only first step to make kexec-ed kernel not break memory hotplug. After kexec rebooting, the KASLR may locate kernel into hotpluggable area too. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.3 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8B5A5C2BB1D for ; Wed, 15 Apr 2020 02:35:43 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 541712076C for ; Wed, 15 Apr 2020 02:35:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="IPEUhFpq"; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="APFa9xqE" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 541712076C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=uDCTHgVAhGGhp3nYtJnU3rsGPPkU1MpXif8UHDI6Aoc=; b=IPEUhFpqfzyEGG IElK51Y4Z9S12QRnBmA/tkGa36epFVBx9hDPgClEZF4m0urjUrCxPG4mwZduPVRpa0fMVN2gKsTEl RG7AKjJsjp4SDFr+UaL89+e31nzgPn4NSYqkRata4fLs+p7UVuEu41Ej5KyOi1ZLuN4M0gYrY9g+O d1bm7TzjyAQuVJ5JudFos3W3S1QA9i8NmG89L7NjOa/f5Muz7fcMc71R+gyvUTBqeq+JfwRLG0iDA CsXNYoj1HLV60f93E0mUAz33eLLP3+q+bBvuUQducZobf6GL7AfpgtdyxYLR+SCPZaxG845c2NHDU HNYr+dIKhvwKXPTMGoHw==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1jOXts-0004Xl-Bi; Wed, 15 Apr 2020 02:35:40 +0000 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120] helo=us-smtp-1.mimecast.com) by bombadil.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1jOXto-0004Wv-UO for linux-arm-kernel@lists.infradead.org; Wed, 15 Apr 2020 02:35:38 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586918135; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3o4+0HgEBD0h2Go+GpB2trM6HPRj5m5yAn/gvrKMuxQ=; b=APFa9xqEU7VJnr5z77+7garh9xZJicWbh8t09e0TJZFnTGntQzcGi4Th+/DAUx5vGA4RkQ NQbqq9RHjfyYw70vVPhL18FLZayMSnEyEFAFNYBU6hUHP7HKqcFNwd9AApluUdfl5rGICQ 6dPv0Btuetb95sqrC+JZ3xtdNH9Sv1A= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-321-WOza4ysLPQyg0KcN4VE8gg-1; Tue, 14 Apr 2020 22:35:33 -0400 X-MC-Unique: WOza4ysLPQyg0KcN4VE8gg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 890218017F3; Wed, 15 Apr 2020 02:35:30 +0000 (UTC) Received: from localhost (ovpn-12-27.pek2.redhat.com [10.72.12.27]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 7B9A899DEE; Wed, 15 Apr 2020 02:35:26 +0000 (UTC) Date: Wed, 15 Apr 2020 10:35:24 +0800 From: Baoquan He To: David Hildenbrand Subject: Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image Message-ID: <20200415023524.GG4247@MiWiFi-R3L-srv> References: <20200412080836.GM25745@shell.armlinux.org.uk> <87wo6klbw0.fsf@x220.int.ebiederm.org> <20200413023701.GA20265@MiWiFi-R3L-srv> <871rorjzmc.fsf@x220.int.ebiederm.org> <20200414064031.GB4247@MiWiFi-R3L-srv> <86e96214-7053-340b-5c1a-ff97fb94d8e0@redhat.com> <20200414092201.GD4247@MiWiFi-R3L-srv> <20200414143912.GE4247@MiWiFi-R3L-srv> <0085f460-b0c7-b25f-36a7-fa3bafaab6fe@redhat.com> MIME-Version: 1.0 In-Reply-To: <0085f460-b0c7-b25f-36a7-fa3bafaab6fe@redhat.com> User-Agent: Mutt/1.10.1 (2018-07-13) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Disposition: inline X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20200414_193537_059027_DED6AFD6 X-CRM114-Status: GOOD ( 27.24 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: piliu@redhat.com, Anshuman Khandual , Catalin Marinas , Bhupesh Sharma , linuxppc-dev@lists.ozlabs.org, kexec@lists.infradead.org, Russell King - ARM Linux admin , linux-mm@kvack.org, James Morse , "Eric W. Biederman" , Andrew Morton , Will Deacon , linux-arm-kernel@lists.infradead.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@lists.infradead.org On 04/14/20 at 04:49pm, David Hildenbrand wrote: > >>>>> The root cause is kexec-ed kernel is targeted at hotpluggable memory > >>>>> region. Just avoiding the movable area can fix it. In kexec_file_load(), > >>>>> just checking or picking those unmovable region to put kernel/initrd in > >>>>> function locate_mem_hole_callback() can fix it. The page or pageblock's > >>>>> zone is movable or not, it's easy to know. This fix doesn't need to > >>>>> bother other component. > >>>> > >>>> I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL > >>>> does not imply that it cannot get offlined and removed e.g., this is > >>>> heavily used on ppc64, with 16MB sections. > >>> > >>> Really? I just know there are two kinds of mem hoplug in ppc, but don't > >>> know the details. So in this case, is there any flag or a way to know > >>> those memory block are hotpluggable? I am curious how those kernel data > >>> is avoided to be put in this area. Or ppc just freely uses it for kernel > >>> data or user space data, then try to migrate when hot remove? > >> > >> See > >> arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count() > >> > >> Under DLAPR, it can remove memory in LMB granularity, which is usually > >> 16MB (== single section on ppc64). DLPAR will directly online all > >> hotplugged memory (LMBs) from the kernel using device_online(), which > >> will go to ZONE_NORMAL. > >> > >> When trying to remove memory, it simply scans for offlineable 16MB > >> memory blocks (==section == LMB), offlines and removes them. No need for > >> the movable zone and all the involved issues. > > > > Yes, this is a different one, thanks for pointing it out. It sounds like > > balloon driver in virt platform, doesn't it? > > With DLPAR there is a hypervisor involved (which manages the actual HW > DIMMs), so yes. > > > > > Avoiding to put kexec kernel into movable zone can't solve this DLPAR > > case as you said. > > > >> > >> Now, the interesting question is, can we have LMBs added during boot > >> (not via add_memory()), that will later be removed via remove_memory(). > >> IIRC, we had BUGs related to that, so I think yes. If a section contains > >> no unmovable allocations (after boot), it can get removed. > > > > I do want to ask this question. If we can add LMB into system RAM, then > > reload kexec can solve it. > > > > Another better way is adding a common function to filter out the > > movable zone when search position for kexec kernel, use a arch specific > > funciton to filter out DLPAR memory blocks for ppc only. Over there, > > we can simply use for_each_drmem_lmb() to do that. > > I was thinking about something similar. Maybe something like a notifier > that can be used to test if selected memory can be used for kexec Not sure if I get the notifier idea clearly. If you mean 1) Add a common function to pick memory in unmovable zone; 2) Let DLPAR, balloon register with notifier; 3) In the common function, ask notified part to check if the picked unmovable memory is available for locating kexec kernel; Sounds doable to me, and not complicated. > images. It would apply to > > - arm64 and filter out all hotadded memory (IIRC, only boot memory can > be used). Do you mean hot added memory after boot can't be recognized and added into system RAM on arm64? > - powerpc to filter out all LMBs that can be removed (assuming not all > memory corresponds to LMBs that can be removed, otherwise we're in > trouble ... :) ) > - virtio-mem to filter out all memory it added. > - hyper-v to filter out partially backed memory blocks (esp. the last > memory block it added and only partially backed it by memory). > > This would make it work for kexec_file_load(), however, I do wonder how > we would want to approach that from userspace kexec-tools when handling > it from kexec_load(). Let's make kexec_file_load work firstly. Since this work is only first step to make kexec-ed kernel not break memory hotplug. After kexec rebooting, the KASLR may locate kernel into hotpluggable area too. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel From mboxrd@z Thu Jan 1 00:00:00 1970 Return-path: Received: from us-smtp-2.mimecast.com ([207.211.31.81] helo=us-smtp-delivery-1.mimecast.com) by bombadil.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1jOXtq-0004XF-VL for kexec@lists.infradead.org; Wed, 15 Apr 2020 02:35:40 +0000 Date: Wed, 15 Apr 2020 10:35:24 +0800 From: Baoquan He Subject: Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image Message-ID: <20200415023524.GG4247@MiWiFi-R3L-srv> References: <20200412080836.GM25745@shell.armlinux.org.uk> <87wo6klbw0.fsf@x220.int.ebiederm.org> <20200413023701.GA20265@MiWiFi-R3L-srv> <871rorjzmc.fsf@x220.int.ebiederm.org> <20200414064031.GB4247@MiWiFi-R3L-srv> <86e96214-7053-340b-5c1a-ff97fb94d8e0@redhat.com> <20200414092201.GD4247@MiWiFi-R3L-srv> <20200414143912.GE4247@MiWiFi-R3L-srv> <0085f460-b0c7-b25f-36a7-fa3bafaab6fe@redhat.com> MIME-Version: 1.0 In-Reply-To: <0085f460-b0c7-b25f-36a7-fa3bafaab6fe@redhat.com> Content-Disposition: inline List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "kexec" Errors-To: kexec-bounces+dwmw2=infradead.org@lists.infradead.org To: David Hildenbrand Cc: piliu@redhat.com, Anshuman Khandual , Catalin Marinas , Bhupesh Sharma , linuxppc-dev@lists.ozlabs.org, kexec@lists.infradead.org, Russell King - ARM Linux admin , linux-mm@kvack.org, James Morse , "Eric W. Biederman" , Andrew Morton , Will Deacon , linux-arm-kernel@lists.infradead.org On 04/14/20 at 04:49pm, David Hildenbrand wrote: > >>>>> The root cause is kexec-ed kernel is targeted at hotpluggable memory > >>>>> region. Just avoiding the movable area can fix it. In kexec_file_load(), > >>>>> just checking or picking those unmovable region to put kernel/initrd in > >>>>> function locate_mem_hole_callback() can fix it. The page or pageblock's > >>>>> zone is movable or not, it's easy to know. This fix doesn't need to > >>>>> bother other component. > >>>> > >>>> I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL > >>>> does not imply that it cannot get offlined and removed e.g., this is > >>>> heavily used on ppc64, with 16MB sections. > >>> > >>> Really? I just know there are two kinds of mem hoplug in ppc, but don't > >>> know the details. So in this case, is there any flag or a way to know > >>> those memory block are hotpluggable? I am curious how those kernel data > >>> is avoided to be put in this area. Or ppc just freely uses it for kernel > >>> data or user space data, then try to migrate when hot remove? > >> > >> See > >> arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count() > >> > >> Under DLAPR, it can remove memory in LMB granularity, which is usually > >> 16MB (== single section on ppc64). DLPAR will directly online all > >> hotplugged memory (LMBs) from the kernel using device_online(), which > >> will go to ZONE_NORMAL. > >> > >> When trying to remove memory, it simply scans for offlineable 16MB > >> memory blocks (==section == LMB), offlines and removes them. No need for > >> the movable zone and all the involved issues. > > > > Yes, this is a different one, thanks for pointing it out. It sounds like > > balloon driver in virt platform, doesn't it? > > With DLPAR there is a hypervisor involved (which manages the actual HW > DIMMs), so yes. > > > > > Avoiding to put kexec kernel into movable zone can't solve this DLPAR > > case as you said. > > > >> > >> Now, the interesting question is, can we have LMBs added during boot > >> (not via add_memory()), that will later be removed via remove_memory(). > >> IIRC, we had BUGs related to that, so I think yes. If a section contains > >> no unmovable allocations (after boot), it can get removed. > > > > I do want to ask this question. If we can add LMB into system RAM, then > > reload kexec can solve it. > > > > Another better way is adding a common function to filter out the > > movable zone when search position for kexec kernel, use a arch specific > > funciton to filter out DLPAR memory blocks for ppc only. Over there, > > we can simply use for_each_drmem_lmb() to do that. > > I was thinking about something similar. Maybe something like a notifier > that can be used to test if selected memory can be used for kexec Not sure if I get the notifier idea clearly. If you mean 1) Add a common function to pick memory in unmovable zone; 2) Let DLPAR, balloon register with notifier; 3) In the common function, ask notified part to check if the picked unmovable memory is available for locating kexec kernel; Sounds doable to me, and not complicated. > images. It would apply to > > - arm64 and filter out all hotadded memory (IIRC, only boot memory can > be used). Do you mean hot added memory after boot can't be recognized and added into system RAM on arm64? > - powerpc to filter out all LMBs that can be removed (assuming not all > memory corresponds to LMBs that can be removed, otherwise we're in > trouble ... :) ) > - virtio-mem to filter out all memory it added. > - hyper-v to filter out partially backed memory blocks (esp. the last > memory block it added and only partially backed it by memory). > > This would make it work for kexec_file_load(), however, I do wonder how > we would want to approach that from userspace kexec-tools when handling > it from kexec_load(). Let's make kexec_file_load work firstly. Since this work is only first step to make kexec-ed kernel not break memory hotplug. After kexec rebooting, the KASLR may locate kernel into hotpluggable area too. _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec