Subject: Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
From: James Morse
To: David Hildenbrand
Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma
Date: Mon, 30 Mar 2020 14:00:47 +0100
Message-ID: <34274b02-60ba-eb78-eacd-6dc1146ed3cd@arm.com>

Hi David,

On 3/27/20 6:52 PM, David Hildenbrand wrote:
>>> 2. You do the kexec. The kexec kernel will only operate on a reserved
>>> memory region (reserved via e.g., kernel cmdline crashkernel=128M).
>>
>> I think you are merging the kexec and kdump behaviours.
>> (Wrong terminology? The things behind 'kexec -l Image' and 'kexec -p Image')
>
> Oh, I see - I think your example below clarifies things. Something like
> that should go in the cover letter if we end up in this patch being
> required :)

Do you mean the commit message?
I think it's far too long... Adding a sentence about the way kexec load works
may help; the first paragraph would read:

| Kexec allows user-space to specify the address that the kexec image should be
| loaded to. Because this memory may be in use, an image loaded for kexec is not
| stored in place; instead, its segments are scattered through memory and are
| re-assembled when needed. In the meantime, the target memory may have been
| removed.

Do you think that's clearer?

> (I missed that the problematic part is "random" addresses passed by user
> space to the kernel, where it wants data to be loaded to on kexec -e)

[...]

>> Load kexec:
>> | root@vm:/sys/devices/system/memory# kexec -l /root/bzImage --reuse-cmdline
>>
>
> I assume this will trigger
>
> kexec_load -> do_kexec_load -> kimage_load_segment ->
> kimage_load_normal_segment -> kimage_alloc_page -> kimage_alloc_pages
>
> Which will just allocate a bunch of pages and mark them reserved.
>
> Now, AFAIKs, all allocations will be unmovable. So none of the kexec
> segment allocations will actually end up on your DIMM (as it is onlined
> online_movable).
>
>
> So, the loaded image (with its segments) from user won't be problematic
> and not get placed on your DIMM.
>
>
> Now, the problematic part is (via man kexec_load) "mem and memsz specify
> a physical address range that is the target of the copy."
>
> So the place where the image will be "assembled" at when doing the
> reboot. Understood :)

Yup.

[...]

> I wonder if we should instead make the "kexec -e" fail. It tries to
> touch random system memory.

Heh, isn't touching random system memory what kexec does?! It's all described
to user-space as 'System RAM'. Teaching it to probe /sys/devices/system/memory/...
would require a user-space change.

> Denying to offline MOVABLE memory should be avoided - and what kexec
> does here sounds dangerous to me (allowing it to write random system
> memory).
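To make the failure mode concrete: per kexec_load(2), each segment's `mem`/`memsz` name the physical range it is copied to on 'kexec -e', independently of where its pages were allocated at load time. A minimal user-space sketch of the overlap test a "don't offline this block" check needs, assuming only that model; `struct seg_dest` and `offline_hits_kexec_dest()` are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one kexec segment's destination: the physical
 * range [mem, mem + memsz) it is copied to on 'kexec -e'. */
struct seg_dest {
	uint64_t mem;   /* physical destination address */
	uint64_t memsz; /* destination size in bytes */
};

/* Half-open ranges [a, a + alen) and [b, b + blen) overlap iff each one
 * starts before the other ends. Empty ranges never overlap. */
bool ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
	/* This sketch does not handle ranges that wrap past 2^64. */
	assert(a + alen >= a && b + blen >= b);
	return alen && blen && a < b + blen && b < a + alen;
}

/* True if offlining [start, start + size) would remove memory that the
 * loaded image still needs as a copy destination. */
bool offline_hits_kexec_dest(const struct seg_dest *segs, size_t nr,
			     uint64_t start, uint64_t size)
{
	for (size_t i = 0; i < nr; i++)
		if (ranges_overlap(segs[i].mem, segs[i].memsz, start, size))
			return true;
	return false;
}
```

For example, a segment destined for 0x100200000 collides with a 128MiB block at 0x100000000, even though (as described for kimage_alloc_pages) the segment's own pages were allocated unmovable, somewhere else.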
> Roughly what I am thinking is this:
>
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index ba1d91e868ca..70c39a5307e5 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1135,6 +1135,10 @@ int kernel_kexec(void)
>  		error = -EINVAL;
>  		goto Unlock;
>  	}
> +	if (!kexec_image_validate()) {
> +		error = -EINVAL;
> +		goto Unlock;
> +	}
>
>  #ifdef CONFIG_KEXEC_JUMP
>  	if (kexec_image->preserve_context) {
>
>
> kexec_image_validate() would go over all segments and validate that the
> involved pages are actual valid memory (pfn_to_online_page()).
>
> All we have to do is protect from memory hotplug until we switch to the
> new kernel. (migrate_to_reboot_cpu() can sleep),

I think you'd end up with something like this patch, but only while
kexec_in_progress. I don't think letting kexec fail if the events occur in a
different order is good for user-space.

> Will probably need some thought. But it will actually also bail out when
> user space passes wrong physical memory addresses, instead of
> triple-faulting silently.

With this change, the reboot(LINUX_REBOOT_CMD_KEXEC) call would fail. This call
doesn't usually return, so we're likely to trigger error-handling that has never
run before. (Last time I debugged one of these, it turned out kexec had taken the
network interfaces down, meaning the nfsroot was no longer accessible.)

How can user-space know whether kexec is going to succeed, or fail like this?
Any loaded kexec kernel could secretly be in this broken state.

Can user-space know what caused this to become unreliable (without reading the
kernel source)?

Given kexec can be unloaded by user-space, I think it's better to prevent us
getting into the broken state, preferably giving the hint that kexec is using
that memory. The user can 'kexec -u', then retry removing the memory.

I think forbidding the memory-offline is simpler for user-space to deal with.
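The two options discussed above differ mainly in *when* the conflict is detected. A toy user-space model may help compare them. `struct sim`, `image_validate()`, `offline_block()` and `unload_image()` are hypothetical names; the `online[]` array merely stands in for `pfn_to_online_page()`, and the 128MiB block size and `-EBUSY` return value are arbitrary choices for the sketch. Neither function is the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SHIFT 27 /* hypothetical 128MiB memory blocks */
#define NR_BLOCKS   64

/* Hypothetical destination range of one loaded segment. */
struct seg_dest {
	uint64_t mem;
	uint64_t memsz;
};

/* Toy machine state: which blocks are online, and the loaded image. */
struct sim {
	bool online[NR_BLOCKS];      /* stand-in for pfn_to_online_page() */
	const struct seg_dest *segs; /* NULL when no image is loaded */
	size_t nr_segs;
};

/* Option 1 (validate at 'kexec -e'): walk every block each segment
 * touches and fail if any of it is missing. */
bool image_validate(const struct sim *s)
{
	assert(s->nr_segs == 0 || s->segs != NULL);
	for (size_t i = 0; i < s->nr_segs; i++) {
		uint64_t end = s->segs[i].mem + s->segs[i].memsz;

		/* Step from the segment start to the start of each following block. */
		for (uint64_t a = s->segs[i].mem; a < end;
		     a = ((a >> BLOCK_SHIFT) + 1) << BLOCK_SHIFT) {
			uint64_t blk = a >> BLOCK_SHIFT;

			if (blk >= NR_BLOCKS || !s->online[blk])
				return false;
		}
	}
	return true;
}

/* Option 2 (refuse the offline): fail while a loaded image targets
 * any part of the block. */
int offline_block(struct sim *s, uint64_t blk)
{
	if (blk >= NR_BLOCKS)
		return -EINVAL;

	uint64_t start = blk << BLOCK_SHIFT, size = 1ULL << BLOCK_SHIFT;

	for (size_t i = 0; i < s->nr_segs; i++) {
		const struct seg_dest *d = &s->segs[i];

		if (d->memsz && d->mem < start + size && start < d->mem + d->memsz)
			return -EBUSY;
	}
	s->online[blk] = false;
	return 0;
}

/* Models 'kexec -u': forget the loaded image. */
void unload_image(struct sim *s)
{
	s->segs = NULL;
	s->nr_segs = 0;
}
```

In this model, option 2 rejects the offline request immediately, and the user has an obvious way out ('kexec -u', then retry). Option 1 only notices the missing block once the reboot call has been made.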
Thanks,

James