From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.4 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4BE6EC5518A for ; Wed, 22 Apr 2020 10:36:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 042422075A for ; Wed, 22 Apr 2020 10:36:37 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="YeWrJz15" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 042422075A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 911718E000A; Wed, 22 Apr 2020 06:36:37 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8C3C88E0003; Wed, 22 Apr 2020 06:36:37 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7D7E68E000A; Wed, 22 Apr 2020 06:36:37 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0115.hostedemail.com [216.40.44.115]) by kanga.kvack.org (Postfix) with ESMTP id 6791E8E0003 for ; Wed, 22 Apr 2020 06:36:37 -0400 (EDT) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 34460824934B for ; Wed, 22 Apr 2020 10:36:37 +0000 (UTC) X-FDA: 76735137234.23.chalk44_73f2aa7729816 X-HE-Tag: chalk44_73f2aa7729816 X-Filterd-Recvd-Size: 8046 Received: from us-smtp-delivery-1.mimecast.com (us-smtp-1.mimecast.com [205.139.110.61]) by imf34.hostedemail.com (Postfix) with ESMTP for ; Wed, 22 Apr 2020 10:36:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1587551796; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vi1WDtY0BZGKxdQdcdOvRzo61jzFKRgMVI1AV+nvsO0=; b=YeWrJz15nkVNl20by9sHQarMUWchENQ63QpjFB4Jd3FJxLVqzhWwcWTTtICe9QE+bQ4Kyi A2UdqHFXmuEqQNJHuyfbRV+NX5Z5gjuqJIw3gMGj9cdHIF95o3RKg5NBycPntqifQjAaEc dsZUGmpwBr0uEn7JQ5YPSpW/tXjsZcY= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-374-vKdB4GUsPkaQRYP4GmMljw-1; Wed, 22 Apr 2020 06:36:32 -0400 X-MC-Unique: vKdB4GUsPkaQRYP4GmMljw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id EA3BF800FF0; Wed, 22 Apr 2020 10:36:30 +0000 (UTC) Received: from localhost (ovpn-12-47.pek2.redhat.com [10.72.12.47]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 3D69B5D710; Wed, 22 Apr 2020 10:36:27 +0000 (UTC) Date: Wed, 22 Apr 2020 18:36:23 +0800 From: Baoquan He To: David Hildenbrand Cc: Andrew Morton , "Eric W. Biederman" , Russell King - ARM Linux admin , Anshuman Khandual , Catalin Marinas , Bhupesh Sharma , kexec@lists.infradead.org, linux-mm@kvack.org, James Morse , Will Deacon , linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, piliu@redhat.com Subject: Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image Message-ID: <20200422103623.GV4247@MiWiFi-R3L-srv> References: <18cf6afd-c651-25c7-aca3-3ca3c0e07547@redhat.com> <20200416140247.GA12723@MiWiFi-R3L-srv> <4e1546eb-4416-dc6d-d549-62d1cecccbc8@redhat.com> <20200416143634.GH4247@MiWiFi-R3L-srv> <2525cc9c-3566-6275-105b-7f4af8f980bc@redhat.com> <9a4eb1d7-33bf-8707-9c0c-1ca657c3e502@redhat.com> <20200422091718.GT4247@MiWiFi-R3L-srv> <20200422095733.GU4247@MiWiFi-R3L-srv> MIME-Version: 1.0 In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Content-Disposition: inline X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 04/22/20 at 12:05pm, David Hildenbrand wrote: > On 22.04.20 11:57, Baoquan He wrote: > > On 04/22/20 at 11:24am, David Hildenbrand wrote: > >> On 22.04.20 11:17, Baoquan He wrote: > >>> On 04/21/20 at 03:29pm, David Hildenbrand wrote: > >>>>>> ACPI SRAT is embeded into efi, need read out the rsdp pointer. If = we don't > >>>>>> pass the efi, it won't get the SRAT table correctly, if I remember > >>>>>> correctly. Yeah, I remeber kvm guest can get memory hotplugged wit= h > >>>>>> ACPI only, this won't happen on bare metal though. Need check care= fully.=20 > >>>>>> I have been using kvm guest with uefi firmwire recently. > >>>>> > >>>>> Yeah, I can imagine that bare metal is different. kvm only uses ACP= I. > >>>>> > >>>>> I'm also asking because of virtio-mem. Memory added via virtio-mem = is > >>>>> not part of any efi tables or whatsoever. So I assume the kexec ker= nel > >>>>> will not detect it automatically (good!), instead load the virtio-m= em > >>>>> driver and let it add memory back to the system. > >>>>> > >>>>> I should probably play with kexec and virtio-mem once I have some s= pare > >>>>> cycles ... to find out what's broken and needs to be addressed :) > >>>> > >>>> FWIW, I just gave virtio-mem and kexec/kdump a try. > >>>> > >>>> a) kdump seems to work. Memory added by virtio-mem is getting dumped= . > >>>> The kexec kernel only uses memory in the crash region. The virtio-me= m > >>>> driver properly bails out due to is_kdump_kernel(). > >>> > >>> Right, kdump is not impacted later added memory. > >>> > >>>> > >>>> b) "kexec -s -l" seems to work fine. For now, the kernel does not se= em > >>>> to get placed on virtio-mem memory (pure luck due to the left-to-rig= ht > >>>> search). Memory added by virtio-mem is not getting added to the e820 > >>>> map. Once the virtio-mem driver comes back up in the kexec kernel, t= he > >>>> right memory is readded. > >>> > >>> kexec_file_load just behaves as you tested. It doesn't collect later > >>> added memory to e820 because it uses e820_table_kexec directly to pas= s > >>> e820 to kexec-ed kernel. However, this e820_table_kexec is only updat= ed > >>> during boot stage. I tried hot adding DIMM after boot, kexec-ed kerne= l > >>> doesn't have it in e820 during bootup, but it's recoginized and added > >>> when ACPI scanning. I think we should update e820_table_kexec when ho= t > >>> add/remove memory, at least for DIMM. Not sure if DLPAR, virtio-mem, > >>> balloon will need be added into e820_table_kexec too, and if this is > >>> expected behaviour. > >>> > >>> But whatever we do, it won't impact the kexec file_loading, because o= f > >>> the searching strategy bottom up. Just adding them into e820_table_ke= xec > >>> will make it consistent with cold reboot which get recognizes and get > >>> them into e820 during bootup. > >> > >> Yeah, I think whatever a cold-booted kernel will see is what kexec-ed > >> kernel should see. Not more, not less. > >> > >> Regarding virtio-mem: Not in e820 on cold-boot. > >> Regarding DIMMs: DIMMs under KVM will never show up in the e820 map > >> IIRC. I think on real HW it can be different. > >=20 > > Yeah, DIMMs under KVM won't show up in e820 map. While this is not feat= ure > > of QEMU/KVM, but a defect of it. I ever asked Igor who is developer of > > QEMU/KVM guest in this area, why we don't make kvm guest recognize > > hotpluggable DIMM and add it into e820 map, he said he had tried to mak= e > > it, but this will corrupt guest on HyperV. So he had to revert the >=20 > Yeah, I remember that this had to be reverted due to something breaking. > But OTOH, it allows us to online coldplugged DIMMs online_movable > easily, so I'd say it's even a feature (although, does not behave like > real HW we have). >=20 > I use this extensively when testing memory hot(un)plug via coldplugged > DIMMs. >=20 > I do wonder if there is real HW, where this is also the case. None for what I know. Hotplug on real HW includes two parts, the boot mem being hotpluggable is more flexiable one. It allows people to replace bad DIMM. And you can see code in boot stage has been adjusted a lot on this purpose, at that time, people haven't thought about kvm guest. >=20 > > commit on qemu. So I think we can leave it for now for both real HW and > > kvm, or update the e820_table_kexec to include added DIMM for both real > > HW and KVM. I hope one day KVM dev will find a way to conquer the defec= t > > on HyperV and make the e820map consistent with bare metal. After all, > > kvm guest is trying to imitate real HW for the most part. > >=20 > > Anyway, I will think about the e820_table_kexec updating. See if we can > > do something about it. >=20 > Yeah, for DIMMs on real HW it might definitely make sense. We might be > able to hook into updates of /sys/firmware/memmap on memory add/remove. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.0 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EC578C55186 for ; Wed, 22 Apr 2020 10:39:45 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 20AD820656 for ; Wed, 22 Apr 2020 10:39:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="XWnN3fmR"; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="XWnN3fmR" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 20AD820656 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 496cPp2kBJzDqlp for ; Wed, 22 Apr 2020 20:39:42 +1000 (AEST) Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=redhat.com (client-ip=205.139.110.61; helo=us-smtp-delivery-1.mimecast.com; envelope-from=bhe@redhat.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: lists.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=XWnN3fmR; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=XWnN3fmR; dkim-atps=neutral Received: from us-smtp-delivery-1.mimecast.com (us-smtp-2.mimecast.com [205.139.110.61]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 496cLH5DtmzDqfG for ; Wed, 22 Apr 2020 20:36:37 +1000 (AEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1587551794; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vi1WDtY0BZGKxdQdcdOvRzo61jzFKRgMVI1AV+nvsO0=; b=XWnN3fmR12FNcgUD64JGwJqzw7w5QnzzXLwx2KGH0oUdg5o82WQtf7R2KBinv74yzyPmFy X3XmHLqgqDNgG51oH/qfFoZIHKpH27Q6n2+lvnT9RCaUKQsvCzABI9de0kTd43/NTNAcK0 XHM8urAgBMa3BgakBA/6eGfvwENKi1U= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1587551794; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vi1WDtY0BZGKxdQdcdOvRzo61jzFKRgMVI1AV+nvsO0=; b=XWnN3fmR12FNcgUD64JGwJqzw7w5QnzzXLwx2KGH0oUdg5o82WQtf7R2KBinv74yzyPmFy X3XmHLqgqDNgG51oH/qfFoZIHKpH27Q6n2+lvnT9RCaUKQsvCzABI9de0kTd43/NTNAcK0 XHM8urAgBMa3BgakBA/6eGfvwENKi1U= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-374-vKdB4GUsPkaQRYP4GmMljw-1; Wed, 22 Apr 2020 06:36:32 -0400 X-MC-Unique: vKdB4GUsPkaQRYP4GmMljw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id EA3BF800FF0; Wed, 22 Apr 2020 10:36:30 +0000 (UTC) Received: from localhost (ovpn-12-47.pek2.redhat.com [10.72.12.47]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 3D69B5D710; Wed, 22 Apr 2020 10:36:27 +0000 (UTC) Date: Wed, 22 Apr 2020 18:36:23 +0800 From: Baoquan He To: David Hildenbrand Subject: Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image Message-ID: <20200422103623.GV4247@MiWiFi-R3L-srv> References: <18cf6afd-c651-25c7-aca3-3ca3c0e07547@redhat.com> <20200416140247.GA12723@MiWiFi-R3L-srv> <4e1546eb-4416-dc6d-d549-62d1cecccbc8@redhat.com> <20200416143634.GH4247@MiWiFi-R3L-srv> <2525cc9c-3566-6275-105b-7f4af8f980bc@redhat.com> <9a4eb1d7-33bf-8707-9c0c-1ca657c3e502@redhat.com> <20200422091718.GT4247@MiWiFi-R3L-srv> <20200422095733.GU4247@MiWiFi-R3L-srv> MIME-Version: 1.0 In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Content-Disposition: inline X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: piliu@redhat.com, Anshuman Khandual , Catalin Marinas , Bhupesh Sharma , linuxppc-dev@lists.ozlabs.org, kexec@lists.infradead.org, Russell King - ARM Linux admin , linux-mm@kvack.org, James Morse , "Eric W. Biederman" , Andrew Morton , Will Deacon , linux-arm-kernel@lists.infradead.org Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On 04/22/20 at 12:05pm, David Hildenbrand wrote: > On 22.04.20 11:57, Baoquan He wrote: > > On 04/22/20 at 11:24am, David Hildenbrand wrote: > >> On 22.04.20 11:17, Baoquan He wrote: > >>> On 04/21/20 at 03:29pm, David Hildenbrand wrote: > >>>>>> ACPI SRAT is embeded into efi, need read out the rsdp pointer. If = we don't > >>>>>> pass the efi, it won't get the SRAT table correctly, if I remember > >>>>>> correctly. Yeah, I remeber kvm guest can get memory hotplugged wit= h > >>>>>> ACPI only, this won't happen on bare metal though. Need check care= fully.=20 > >>>>>> I have been using kvm guest with uefi firmwire recently. > >>>>> > >>>>> Yeah, I can imagine that bare metal is different. kvm only uses ACP= I. > >>>>> > >>>>> I'm also asking because of virtio-mem. Memory added via virtio-mem = is > >>>>> not part of any efi tables or whatsoever. So I assume the kexec ker= nel > >>>>> will not detect it automatically (good!), instead load the virtio-m= em > >>>>> driver and let it add memory back to the system. > >>>>> > >>>>> I should probably play with kexec and virtio-mem once I have some s= pare > >>>>> cycles ... to find out what's broken and needs to be addressed :) > >>>> > >>>> FWIW, I just gave virtio-mem and kexec/kdump a try. > >>>> > >>>> a) kdump seems to work. Memory added by virtio-mem is getting dumped= . > >>>> The kexec kernel only uses memory in the crash region. The virtio-me= m > >>>> driver properly bails out due to is_kdump_kernel(). > >>> > >>> Right, kdump is not impacted later added memory. > >>> > >>>> > >>>> b) "kexec -s -l" seems to work fine. For now, the kernel does not se= em > >>>> to get placed on virtio-mem memory (pure luck due to the left-to-rig= ht > >>>> search). Memory added by virtio-mem is not getting added to the e820 > >>>> map. Once the virtio-mem driver comes back up in the kexec kernel, t= he > >>>> right memory is readded. > >>> > >>> kexec_file_load just behaves as you tested. It doesn't collect later > >>> added memory to e820 because it uses e820_table_kexec directly to pas= s > >>> e820 to kexec-ed kernel. However, this e820_table_kexec is only updat= ed > >>> during boot stage. I tried hot adding DIMM after boot, kexec-ed kerne= l > >>> doesn't have it in e820 during bootup, but it's recoginized and added > >>> when ACPI scanning. I think we should update e820_table_kexec when ho= t > >>> add/remove memory, at least for DIMM. Not sure if DLPAR, virtio-mem, > >>> balloon will need be added into e820_table_kexec too, and if this is > >>> expected behaviour. > >>> > >>> But whatever we do, it won't impact the kexec file_loading, because o= f > >>> the searching strategy bottom up. Just adding them into e820_table_ke= xec > >>> will make it consistent with cold reboot which get recognizes and get > >>> them into e820 during bootup. > >> > >> Yeah, I think whatever a cold-booted kernel will see is what kexec-ed > >> kernel should see. Not more, not less. > >> > >> Regarding virtio-mem: Not in e820 on cold-boot. > >> Regarding DIMMs: DIMMs under KVM will never show up in the e820 map > >> IIRC. I think on real HW it can be different. > >=20 > > Yeah, DIMMs under KVM won't show up in e820 map. While this is not feat= ure > > of QEMU/KVM, but a defect of it. I ever asked Igor who is developer of > > QEMU/KVM guest in this area, why we don't make kvm guest recognize > > hotpluggable DIMM and add it into e820 map, he said he had tried to mak= e > > it, but this will corrupt guest on HyperV. So he had to revert the >=20 > Yeah, I remember that this had to be reverted due to something breaking. > But OTOH, it allows us to online coldplugged DIMMs online_movable > easily, so I'd say it's even a feature (although, does not behave like > real HW we have). >=20 > I use this extensively when testing memory hot(un)plug via coldplugged > DIMMs. >=20 > I do wonder if there is real HW, where this is also the case. None for what I know. Hotplug on real HW includes two parts, the boot mem being hotpluggable is more flexiable one. It allows people to replace bad DIMM. And you can see code in boot stage has been adjusted a lot on this purpose, at that time, people haven't thought about kvm guest. >=20 > > commit on qemu. So I think we can leave it for now for both real HW and > > kvm, or update the e820_table_kexec to include added DIMM for both real > > HW and KVM. I hope one day KVM dev will find a way to conquer the defec= t > > on HyperV and make the e820map consistent with bare metal. After all, > > kvm guest is trying to imitate real HW for the most part. > >=20 > > Anyway, I will think about the e820_table_kexec updating. See if we can > > do something about it. >=20 > Yeah, for DIMMs on real HW it might definitely make sense. We might be > able to hook into updates of /sys/firmware/memmap on memory add/remove. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.3 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0FC9CC55186 for ; Wed, 22 Apr 2020 10:36:41 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DB8252075A for ; Wed, 22 Apr 2020 10:36:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="fde5WP9o"; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="XWnN3fmR" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DB8252075A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=MMb5d7TM26tnC7yJLpJju7TfZgRalkpHGqsP7yiynSs=; b=fde5WP9oE+N/T8 Lz9LVdfEZeLLuf5x0+8bIe24MrGfl4ZwqHtA+vobwoQxP2Y0sOxLU+wstOg6J+AOcoYJb9fnVUaqA Vv/YQFRuj1tF0T8Z8UQ7GSEPDVbAEr4DyuTqR2J2R9pi+dxQM/Pl8IiRHxkuM4lf4Mx8U1B3+ZFbk JxVcFFJ3tdUiynSt1ckKQYARDtizdD59MsT3CwBIzwakDztlTVEsochXHY9QPdeHzf8y1rBcE2t3o H0yiNRJ+m9s+Q+P0XqsO2bMmJ28FeNGl09C6olpQQfarXdvt5Ypnu7KdhGCp7PP1zSFbxoW0vZpu+ j+Cvt/IoMJWtXjWy5cwA==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1jRCkB-00010r-Ec; Wed, 22 Apr 2020 10:36:39 +0000 Received: from us-smtp-2.mimecast.com ([205.139.110.61] helo=us-smtp-delivery-1.mimecast.com) by bombadil.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1jRCk7-0000yf-74 for linux-arm-kernel@lists.infradead.org; Wed, 22 Apr 2020 10:36:36 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1587551794; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vi1WDtY0BZGKxdQdcdOvRzo61jzFKRgMVI1AV+nvsO0=; b=XWnN3fmR12FNcgUD64JGwJqzw7w5QnzzXLwx2KGH0oUdg5o82WQtf7R2KBinv74yzyPmFy X3XmHLqgqDNgG51oH/qfFoZIHKpH27Q6n2+lvnT9RCaUKQsvCzABI9de0kTd43/NTNAcK0 XHM8urAgBMa3BgakBA/6eGfvwENKi1U= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-374-vKdB4GUsPkaQRYP4GmMljw-1; Wed, 22 Apr 2020 06:36:32 -0400 X-MC-Unique: vKdB4GUsPkaQRYP4GmMljw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id EA3BF800FF0; Wed, 22 Apr 2020 10:36:30 +0000 (UTC) Received: from localhost (ovpn-12-47.pek2.redhat.com [10.72.12.47]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 3D69B5D710; Wed, 22 Apr 2020 10:36:27 +0000 (UTC) Date: Wed, 22 Apr 2020 18:36:23 +0800 From: Baoquan He To: David Hildenbrand Subject: Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image Message-ID: <20200422103623.GV4247@MiWiFi-R3L-srv> References: <18cf6afd-c651-25c7-aca3-3ca3c0e07547@redhat.com> <20200416140247.GA12723@MiWiFi-R3L-srv> <4e1546eb-4416-dc6d-d549-62d1cecccbc8@redhat.com> <20200416143634.GH4247@MiWiFi-R3L-srv> <2525cc9c-3566-6275-105b-7f4af8f980bc@redhat.com> <9a4eb1d7-33bf-8707-9c0c-1ca657c3e502@redhat.com> <20200422091718.GT4247@MiWiFi-R3L-srv> <20200422095733.GU4247@MiWiFi-R3L-srv> MIME-Version: 1.0 In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Disposition: inline X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20200422_033635_334677_E90D0123 X-CRM114-Status: GOOD ( 28.89 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: piliu@redhat.com, Anshuman Khandual , Catalin Marinas , Bhupesh Sharma , linuxppc-dev@lists.ozlabs.org, kexec@lists.infradead.org, Russell King - ARM Linux admin , linux-mm@kvack.org, James Morse , "Eric W. Biederman" , Andrew Morton , Will Deacon , linux-arm-kernel@lists.infradead.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@lists.infradead.org On 04/22/20 at 12:05pm, David Hildenbrand wrote: > On 22.04.20 11:57, Baoquan He wrote: > > On 04/22/20 at 11:24am, David Hildenbrand wrote: > >> On 22.04.20 11:17, Baoquan He wrote: > >>> On 04/21/20 at 03:29pm, David Hildenbrand wrote: > >>>>>> ACPI SRAT is embeded into efi, need read out the rsdp pointer. If we don't > >>>>>> pass the efi, it won't get the SRAT table correctly, if I remember > >>>>>> correctly. Yeah, I remeber kvm guest can get memory hotplugged with > >>>>>> ACPI only, this won't happen on bare metal though. Need check carefully. > >>>>>> I have been using kvm guest with uefi firmwire recently. > >>>>> > >>>>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI. > >>>>> > >>>>> I'm also asking because of virtio-mem. Memory added via virtio-mem is > >>>>> not part of any efi tables or whatsoever. So I assume the kexec kernel > >>>>> will not detect it automatically (good!), instead load the virtio-mem > >>>>> driver and let it add memory back to the system. > >>>>> > >>>>> I should probably play with kexec and virtio-mem once I have some spare > >>>>> cycles ... to find out what's broken and needs to be addressed :) > >>>> > >>>> FWIW, I just gave virtio-mem and kexec/kdump a try. > >>>> > >>>> a) kdump seems to work. Memory added by virtio-mem is getting dumped. > >>>> The kexec kernel only uses memory in the crash region. The virtio-mem > >>>> driver properly bails out due to is_kdump_kernel(). > >>> > >>> Right, kdump is not impacted later added memory. > >>> > >>>> > >>>> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem > >>>> to get placed on virtio-mem memory (pure luck due to the left-to-right > >>>> search). Memory added by virtio-mem is not getting added to the e820 > >>>> map. Once the virtio-mem driver comes back up in the kexec kernel, the > >>>> right memory is readded. > >>> > >>> kexec_file_load just behaves as you tested. It doesn't collect later > >>> added memory to e820 because it uses e820_table_kexec directly to pass > >>> e820 to kexec-ed kernel. However, this e820_table_kexec is only updated > >>> during boot stage. I tried hot adding DIMM after boot, kexec-ed kernel > >>> doesn't have it in e820 during bootup, but it's recoginized and added > >>> when ACPI scanning. I think we should update e820_table_kexec when hot > >>> add/remove memory, at least for DIMM. Not sure if DLPAR, virtio-mem, > >>> balloon will need be added into e820_table_kexec too, and if this is > >>> expected behaviour. > >>> > >>> But whatever we do, it won't impact the kexec file_loading, because of > >>> the searching strategy bottom up. Just adding them into e820_table_kexec > >>> will make it consistent with cold reboot which get recognizes and get > >>> them into e820 during bootup. > >> > >> Yeah, I think whatever a cold-booted kernel will see is what kexec-ed > >> kernel should see. Not more, not less. > >> > >> Regarding virtio-mem: Not in e820 on cold-boot. > >> Regarding DIMMs: DIMMs under KVM will never show up in the e820 map > >> IIRC. I think on real HW it can be different. > > > > Yeah, DIMMs under KVM won't show up in e820 map. While this is not feature > > of QEMU/KVM, but a defect of it. I ever asked Igor who is developer of > > QEMU/KVM guest in this area, why we don't make kvm guest recognize > > hotpluggable DIMM and add it into e820 map, he said he had tried to make > > it, but this will corrupt guest on HyperV. So he had to revert the > > Yeah, I remember that this had to be reverted due to something breaking. > But OTOH, it allows us to online coldplugged DIMMs online_movable > easily, so I'd say it's even a feature (although, does not behave like > real HW we have). > > I use this extensively when testing memory hot(un)plug via coldplugged > DIMMs. > > I do wonder if there is real HW, where this is also the case. None for what I know. Hotplug on real HW includes two parts, the boot mem being hotpluggable is more flexiable one. It allows people to replace bad DIMM. And you can see code in boot stage has been adjusted a lot on this purpose, at that time, people haven't thought about kvm guest. > > > commit on qemu. So I think we can leave it for now for both real HW and > > kvm, or update the e820_table_kexec to include added DIMM for both real > > HW and KVM. I hope one day KVM dev will find a way to conquer the defect > > on HyperV and make the e820map consistent with bare metal. After all, > > kvm guest is trying to imitate real HW for the most part. > > > > Anyway, I will think about the e820_table_kexec updating. See if we can > > do something about it. > > Yeah, for DIMMs on real HW it might definitely make sense. We might be > able to hook into updates of /sys/firmware/memmap on memory add/remove. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel From mboxrd@z Thu Jan 1 00:00:00 1970 Return-path: Received: from us-smtp-1.mimecast.com ([205.139.110.61] helo=us-smtp-delivery-1.mimecast.com) by bombadil.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1jRCk7-0000ye-75 for kexec@lists.infradead.org; Wed, 22 Apr 2020 10:36:36 +0000 Date: Wed, 22 Apr 2020 18:36:23 +0800 From: Baoquan He Subject: Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image Message-ID: <20200422103623.GV4247@MiWiFi-R3L-srv> References: <18cf6afd-c651-25c7-aca3-3ca3c0e07547@redhat.com> <20200416140247.GA12723@MiWiFi-R3L-srv> <4e1546eb-4416-dc6d-d549-62d1cecccbc8@redhat.com> <20200416143634.GH4247@MiWiFi-R3L-srv> <2525cc9c-3566-6275-105b-7f4af8f980bc@redhat.com> <9a4eb1d7-33bf-8707-9c0c-1ca657c3e502@redhat.com> <20200422091718.GT4247@MiWiFi-R3L-srv> <20200422095733.GU4247@MiWiFi-R3L-srv> MIME-Version: 1.0 In-Reply-To: Content-Disposition: inline List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "kexec" Errors-To: kexec-bounces+dwmw2=infradead.org@lists.infradead.org To: David Hildenbrand Cc: piliu@redhat.com, Anshuman Khandual , Catalin Marinas , Bhupesh Sharma , linuxppc-dev@lists.ozlabs.org, kexec@lists.infradead.org, Russell King - ARM Linux admin , linux-mm@kvack.org, James Morse , "Eric W. Biederman" , Andrew Morton , Will Deacon , linux-arm-kernel@lists.infradead.org On 04/22/20 at 12:05pm, David Hildenbrand wrote: > On 22.04.20 11:57, Baoquan He wrote: > > On 04/22/20 at 11:24am, David Hildenbrand wrote: > >> On 22.04.20 11:17, Baoquan He wrote: > >>> On 04/21/20 at 03:29pm, David Hildenbrand wrote: > >>>>>> ACPI SRAT is embeded into efi, need read out the rsdp pointer. If we don't > >>>>>> pass the efi, it won't get the SRAT table correctly, if I remember > >>>>>> correctly. Yeah, I remeber kvm guest can get memory hotplugged with > >>>>>> ACPI only, this won't happen on bare metal though. Need check carefully. > >>>>>> I have been using kvm guest with uefi firmwire recently. > >>>>> > >>>>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI. > >>>>> > >>>>> I'm also asking because of virtio-mem. Memory added via virtio-mem is > >>>>> not part of any efi tables or whatsoever. So I assume the kexec kernel > >>>>> will not detect it automatically (good!), instead load the virtio-mem > >>>>> driver and let it add memory back to the system. > >>>>> > >>>>> I should probably play with kexec and virtio-mem once I have some spare > >>>>> cycles ... to find out what's broken and needs to be addressed :) > >>>> > >>>> FWIW, I just gave virtio-mem and kexec/kdump a try. > >>>> > >>>> a) kdump seems to work. Memory added by virtio-mem is getting dumped. > >>>> The kexec kernel only uses memory in the crash region. The virtio-mem > >>>> driver properly bails out due to is_kdump_kernel(). > >>> > >>> Right, kdump is not impacted later added memory. > >>> > >>>> > >>>> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem > >>>> to get placed on virtio-mem memory (pure luck due to the left-to-right > >>>> search). Memory added by virtio-mem is not getting added to the e820 > >>>> map. Once the virtio-mem driver comes back up in the kexec kernel, the > >>>> right memory is readded. > >>> > >>> kexec_file_load just behaves as you tested. It doesn't collect later > >>> added memory to e820 because it uses e820_table_kexec directly to pass > >>> e820 to kexec-ed kernel. However, this e820_table_kexec is only updated > >>> during boot stage. I tried hot adding DIMM after boot, kexec-ed kernel > >>> doesn't have it in e820 during bootup, but it's recoginized and added > >>> when ACPI scanning. I think we should update e820_table_kexec when hot > >>> add/remove memory, at least for DIMM. Not sure if DLPAR, virtio-mem, > >>> balloon will need be added into e820_table_kexec too, and if this is > >>> expected behaviour. > >>> > >>> But whatever we do, it won't impact the kexec file_loading, because of > >>> the searching strategy bottom up. Just adding them into e820_table_kexec > >>> will make it consistent with cold reboot which get recognizes and get > >>> them into e820 during bootup. > >> > >> Yeah, I think whatever a cold-booted kernel will see is what kexec-ed > >> kernel should see. Not more, not less. > >> > >> Regarding virtio-mem: Not in e820 on cold-boot. > >> Regarding DIMMs: DIMMs under KVM will never show up in the e820 map > >> IIRC. I think on real HW it can be different. > > > > Yeah, DIMMs under KVM won't show up in e820 map. While this is not feature > > of QEMU/KVM, but a defect of it. I ever asked Igor who is developer of > > QEMU/KVM guest in this area, why we don't make kvm guest recognize > > hotpluggable DIMM and add it into e820 map, he said he had tried to make > > it, but this will corrupt guest on HyperV. So he had to revert the > > Yeah, I remember that this had to be reverted due to something breaking. > But OTOH, it allows us to online coldplugged DIMMs online_movable > easily, so I'd say it's even a feature (although, does not behave like > real HW we have). > > I use this extensively when testing memory hot(un)plug via coldplugged > DIMMs. > > I do wonder if there is real HW, where this is also the case. None for what I know. Hotplug on real HW includes two parts, the boot mem being hotpluggable is more flexiable one. It allows people to replace bad DIMM. And you can see code in boot stage has been adjusted a lot on this purpose, at that time, people haven't thought about kvm guest. > > > commit on qemu. So I think we can leave it for now for both real HW and > > kvm, or update the e820_table_kexec to include added DIMM for both real > > HW and KVM. I hope one day KVM dev will find a way to conquer the defect > > on HyperV and make the e820map consistent with bare metal. After all, > > kvm guest is trying to imitate real HW for the most part. > > > > Anyway, I will think about the e820_table_kexec updating. See if we can > > do something about it. > > Yeah, for DIMMs on real HW it might definitely make sense. We might be > able to hook into updates of /sys/firmware/memmap on memory add/remove. _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec