From mboxrd@z Thu Jan 1 00:00:00 1970
From: ebiederm@xmission.com (Eric W. Biederman)
To: David Hildenbrand
Cc: James Morse, kexec@lists.infradead.org, linux-mm@kvack.org,
 linux-arm-kernel@lists.infradead.org, Anshuman Khandual,
 Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon
Subject: Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
Date: Thu, 23 Apr 2020 11:29:21 -0500
Message-ID: <87ftcuxj1a.fsf@x220.int.ebiederm.org>
In-Reply-To: (David Hildenbrand's message of "Wed, 22 Apr 2020 18:40:22 +0200")
References: <20200326180730.4754-1-james.morse@arm.com>
 <20200326180730.4754-2-james.morse@arm.com>
 <87d088h4k8.fsf@x220.int.ebiederm.org>
 <87y2qn1r18.fsf@x220.int.ebiederm.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain

David Hildenbrand writes:

>> The confusing part was talking about memory being still in use,
>> that is actually scheduled for use in the future.
>
> +1
>
>>
>>>> Usually somewhere in the loaded image
>>>> is a copy of the memory map at the time the kexec kernel was loaded.
>>>> That will invalidate the memory map as well.
>>>
>>> Ah, unconditionally. Sure, x86 needs this.
>>> (arm64 re-discovers the memory map from firmware tables after kexec)
>
> Does this include hotplugged DIMMs e.g., under KVM?
> [...]

As far as I know. If the memory map changes we need to drop the loaded
image.

Having thought about it a little more, I suspect it would be better to
go the other way and just block all hotplug actions after a kexec_load,
as all we expect to happen is running shutdown scripts.

If blocking the hotplug action uses printk to print a nice message
saying something like: "Hotplug blocked because of a loaded kexec
image", then people will be able to figure out what is going on and
call kexec -u if they haven't started the shutdown scripts yet.

Either way it is something simple and unconditional that will make
things work.

>>>> All of this should be for a very brief window of a few seconds, as
>>>> the loaded kexec image is quite short.
>>>
>>> It seems I'm the outlier anticipating anything could happen between
>>> those syscalls.
>>
>> The design is:
>>         sys_kexec_load()
>>         shutdown scripts
>>         sys_reboot(LINUX_REBOOT_CMD_KEXEC);
>>
>> There are two system calls simply so that the shutdown scripts can run.
>> Now maybe someone somewhere does something different but that is not
>> expected.
>>
>> Only the kexec on panic kernel is expected to persist somewhat
>> indefinitely. But that should be in memory that is reserved from boot
>> time, and so the memory hotplug should have enough visibility to not
>> allow that memory to be given up.
>
> Yes, and AFAIK, memory blocks which hold the reserved crashkernel area
> can usually not get offlined and, therefore, the memory cannot get removed.
>
> Interestingly, s390x even has a hotplug notifier for that
>
> arch/s390/kernel/setup.c:kdump_mem_notifier()
>
> (offlining of memory on s390x can result in memory getting depopulated
> in the hypervisor, so after it would have been offlined, it would no
> longer be accessible. I somewhat doubt that this notifier is really
> needed - all pages in the crashkernel area should look like ordinary
> allocated pages when the area is reserved early during boot via the
> memblock allocator, and therefore offlining cannot succeed. But that's a
> different story - and I suspect this is a leftover from pre-memblock times.)

It might be worth seeing if that is true, or if we need to generalize
the s390x code.

Eric
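
The "block all hotplug actions after a kexec_load" idea from the message
above can be illustrated with a small userspace toy model. This is only a
sketch in Python, not kernel code: the class and method names (ToySystem,
kexec_load, kexec_unload, hotplug) are hypothetical and chosen purely for
illustration, and the list named log merely stands in for printk output.
It assumes the simplest possible rule: while an image is loaded, every
hotplug request is refused and the message is logged.

```python
class ToySystem:
    """Toy model (hypothetical): hotplug is refused while a kexec image is loaded."""

    BLOCK_MSG = "Hotplug blocked because of a loaded kexec image"

    def __init__(self):
        self.image_loaded = False
        self.log = []                      # stands in for the kernel log (printk)
        self.memory_blocks = {0, 1, 2, 3}  # made-up set of present memory blocks

    def kexec_load(self):
        # Models the state after a successful kexec_load.
        self.image_loaded = True

    def kexec_unload(self):
        # Models "kexec -u": the loaded image is dropped again.
        self.image_loaded = False

    def hotplug(self, action, block):
        """Try to 'add' or 'remove' a block; return True if it went through."""
        if self.image_loaded:
            # Simple and unconditional: refuse and say why.
            self.log.append(self.BLOCK_MSG)
            return False
        if action == "add":
            self.memory_blocks.add(block)
        elif action == "remove":
            self.memory_blocks.discard(block)
        else:
            raise ValueError("unknown hotplug action: %r" % (action,))
        return True
```

In this model an administrator who sees the logged message can call
kexec_unload() and retry the hotplug action, which mirrors the
"kexec -u" escape hatch described in the message.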