From mboxrd@z Thu Jan 1 00:00:00 1970
From: ebiederm@xmission.com (Eric W. Biederman)
To: David Hildenbrand
Cc: Baoquan He, Andrew Morton, Russell King - ARM Linux admin,
 Anshuman Khandual, Catalin Marinas, Bhupesh Sharma,
 kexec@lists.infradead.org, linux-mm@kvack.org, James Morse, Will Deacon,
 linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org,
 piliu@redhat.com
Subject: Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
Date: Tue, 21 Apr 2020 08:59:27 -0500
Message-ID: <87a735548w.fsf@x220.int.ebiederm.org>
In-Reply-To: <9a4eb1d7-33bf-8707-9c0c-1ca657c3e502@redhat.com>
 (David Hildenbrand's message of "Tue, 21 Apr 2020 15:29:37 +0200")
References: <20200414064031.GB4247@MiWiFi-R3L-srv>
 <86e96214-7053-340b-5c1a-ff97fb94d8e0@redhat.com>
 <20200414092201.GD4247@MiWiFi-R3L-srv>
 <20200414143912.GE4247@MiWiFi-R3L-srv>
 <0085f460-b0c7-b25f-36a7-fa3bafaab6fe@redhat.com>
 <20200415023524.GG4247@MiWiFi-R3L-srv>
 <18cf6afd-c651-25c7-aca3-3ca3c0e07547@redhat.com>
 <20200416140247.GA12723@MiWiFi-R3L-srv>
 <4e1546eb-4416-dc6d-d549-62d1cecccbc8@redhat.com>
 <20200416143634.GH4247@MiWiFi-R3L-srv>
 <2525cc9c-3566-6275-105b-7f4af8f980bc@redhat.com>
 <9a4eb1d7-33bf-8707-9c0c-1ca657c3e502@redhat.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Sender: owner-linux-mm@kvack.org

David Hildenbrand writes:

>>> ACPI SRAT is embedded into efi, need read out the rsdp pointer. If we don't
>>> pass the efi, it won't get the SRAT table correctly, if I remember
>>> correctly. Yeah, I remember kvm guest can get memory hotplugged with
>>> ACPI only, this won't happen on bare metal though. Need check carefully.
>>> I have been using kvm guest with uefi firmware recently.
>>
>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
>>
>> I'm also asking because of virtio-mem. Memory added via virtio-mem is
>> not part of any efi tables or whatsoever. So I assume the kexec kernel
>> will not detect it automatically (good!), instead load the virtio-mem
>> driver and let it add memory back to the system.
>>
>> I should probably play with kexec and virtio-mem once I have some spare
>> cycles ... to find out what's broken and needs to be addressed :)
>
> FWIW, I just gave virtio-mem and kexec/kdump a try.
>
> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
> The kexec kernel only uses memory in the crash region. The virtio-mem
> driver properly bails out due to is_kdump_kernel().
>
> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
> to get placed on virtio-mem memory (pure luck due to the left-to-right
> search). Memory added by virtio-mem is not getting added to the e820
> map. Once the virtio-mem driver comes back up in the kexec kernel, the
> right memory is readded.

This sounds like a bug.

> c) "kexec -c -l" does not work properly. All memory added by virtio-mem
> is added to the e820 map, which is wrong. Memory that should not be
> touched will be touched by the kexec kernel.
> I assume kexec-tools just
> goes ahead and adds anything it can find in /proc/iomem (or
> /sys/firmware/memmap/) to the e820 map of the new kernel.
>
> Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is
> similarly added to the e820 map and, therefore, won't be able to be
> onlined MOVABLE easily.

This sounds like correct behavior to me. If you add memory to the
system it is treated as memory to the system.

If we need to make it a special kind of memory with special rules we can
have some kind of special marking for the memory. But hotplugged is not
in itself a sufficient criterion to say don't use this as normal memory.

If I take a huge server and plug in an extra DIMM it is just memory.

For a similarly huge server I might want the memory that the system
booted with to be unpluggable, in case hardware error reporting notices
a DIMM generating a lot of memory errors.

Now perhaps virtualization needs a special tier of memory that should
only be used for cases where the memory is easily movable.

I am not familiar with virtio-mem but my skim of the initial design
is that virtio-mem was not designed to be such a special tier of memory.
Perhaps something has changed?
https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg03870.html

> At least for virtio-mem, I would either have to
> a) Not support "kexec -c -l". A viable option if we would be planning on
> not supporting it either way in the long term. I could block this
> in-kernel somehow eventually.

No.

> b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by
> indicating it in /proc/iomem in a special way ("System RAM
> (hotplugged)"/"System RAM (virtio-mem)").

How does the kernel memory allocator treat this memory?

The logic is simple. If the kernel memory allocator treats that memory as
ordinary memory available for all uses it should be presented as
ordinary memory available for all uses.
If the kernel memory allocator treats that memory as special memory
that is only available for uses that we can easily free later and give
back to the system, AKA it is special and not ordinary memory, we should
mark it as such.

Eric

p.s. Please excuse me for jumping in. I may be missing some important
context, but what I read when I saw this message in my inbox just seemed
very wrong.
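
As an illustration of option b) quoted above, here is a minimal Python
sketch of how a userspace loader could tell ordinary "System RAM" entries
apart from specially marked ones when reading text in the /proc/iomem
format. This is not kexec-tools code. The function name classify_ram, the
sample ranges, and the assumption that a marked range would show up exactly
as "System RAM (hotplugged)" or "System RAM (virtio-mem)" are hypothetical
and taken only from the naming proposed in the quoted text.

```python
import re

# One /proc/iomem-style line: optional indent, "start-end : name" (hex addresses).
IOMEM_LINE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")

# Hypothetical markers, following the labels proposed in the quoted mail.
SPECIAL_NAMES = ("System RAM (hotplugged)", "System RAM (virtio-mem)")


def classify_ram(iomem_text):
    """Split top-level RAM ranges into (ordinary, special) lists of (start, end)."""
    ordinary, special = [], []
    for line in iomem_text.splitlines():
        m = IOMEM_LINE.match(line)
        if not m:
            continue
        indent, start, end, name = m.groups()
        if indent:
            # Nested child resource (for example "Kernel code"); skip it.
            continue
        rng = (int(start, 16), int(end, 16))
        if name == "System RAM":
            ordinary.append(rng)   # would be passed on as normal memory
        elif name in SPECIAL_NAMES:
            special.append(rng)    # would be left out of the new memory map
    return ordinary, special


# Made-up sample input; real contents differ from system to system.
SAMPLE = """\
00001000-0009fbff : System RAM
000a0000-000bffff : PCI Bus 0000:00
00100000-bffdffff : System RAM
  01000000-01c00000 : Kernel code
100000000-13fffffff : System RAM (virtio-mem)
"""

if __name__ == "__main__":
    ordinary, special = classify_ram(SAMPLE)
    for start, end in ordinary:
        print(f"ordinary: {start:#x}-{end:#x}")
    for start, end in special:
        print(f"special:  {start:#x}-{end:#x}")
```

Under this sketch only the exact name "System RAM" counts as ordinary
memory, so any marked range is excluded by default. That matches the rule
stated in the reply: memory the allocator treats as ordinary is presented
as ordinary, and memory with special rules carries an explicit marking.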