From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3D453C43331 for ; Fri, 27 Mar 2020 15:47:05 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D709120748 for ; Fri, 27 Mar 2020 15:47:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D709120748 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=arm.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5FC526B0008; Fri, 27 Mar 2020 11:47:04 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5ACAC6B0010; Fri, 27 Mar 2020 11:47:04 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4C20A6B0032; Fri, 27 Mar 2020 11:47:04 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0025.hostedemail.com [216.40.44.25]) by kanga.kvack.org (Postfix) with ESMTP id 330426B0008 for ; Fri, 27 Mar 2020 11:47:04 -0400 (EDT) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 062AC180AD815 for ; Fri, 27 Mar 2020 15:47:04 +0000 (UTC) X-FDA: 76641570768.14.swim31_3702ccfae504 X-HE-Tag: swim31_3702ccfae504 X-Filterd-Recvd-Size: 6087 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf35.hostedemail.com (Postfix) with ESMTP for ; Fri, 27 Mar 2020 15:47:03 +0000 (UTC) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E86051FB; Fri, 27 Mar 2020 08:47:02 -0700 (PDT) Received: from [172.16.1.108] (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 6411E3F71F; Fri, 27 Mar 2020 08:47:01 -0700 (PDT) Subject: Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image To: Anshuman Khandual Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, Eric Biederman , Andrew Morton , Catalin Marinas , Will Deacon , Bhupesh Sharma References: <20200326180730.4754-1-james.morse@arm.com> <20200326180730.4754-2-james.morse@arm.com> From: James Morse Openpgp: preference=signencrypt Message-ID: Date: Fri, 27 Mar 2020 15:46:56 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi Anshuman, On 3/27/20 12:43 AM, Anshuman Khandual wrote: > On 03/26/2020 11:37 PM, James Morse wrote: >> An image loaded for kexec is not stored in place, instead its segments >> are scattered through memory, and are re-assembled when needed. In the >> meantime, the target memory may have been removed. >> >> Because mm is not aware that this memory is still in use, it allows it >> to be removed. > Why the isolation process does not fail when these pages are currently > being used by kexec ? Kexec isn't using them right now, but it will once kexec is triggered. Those physical addresses are held in some internal kexec data structures until kexec is triggered. >> Add a memory notifier to prevent the removal of memory regions that >> overlap with a loaded kexec image segment. e.g., when triggered from the >> Qemu console: >> | kexec_core: memory region in use >> | memory memory32: Offline failed. > > Yes this is definitely an added protection for these kexec loaded kernels > memory areas from being offlined but I would have expected the preceding > offlining to have failed as well. kexec hasn't allocate the memory, part of the regions user-space may specify for the next kernel may be in use. There is nothing to stop the memory being used in the meantime. Any other way of doing this would prevent us saying why it failed. Like this, the user can spot the 'kexec: Memory region in use message', and unload kexec. >> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c >> index c19c0dad1ebe..ba1d91e868ca 100644 >> --- a/kernel/kexec_core.c >> +++ b/kernel/kexec_core.c >> @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void) >> >> void __weak arch_kexec_unprotect_crashkres(void) >> {} >> + >> +/* >> + * If user-space wants to offline memory that is in use by a loaded kexec >> + * image, it should unload the image first. >> + */ > Probably this would need kexec user manual and related system call man pages > update as well. I can't see anything relevant under Documentation. (kdump yes, kexec no...) >> +static int mem_remove_cb(struct notifier_block *nb, unsigned long action, >> + void *data) >> +{ >> + int rv = NOTIFY_OK, i; >> + struct memory_notify *arg = data; >> + unsigned long pfn = arg->start_pfn; >> + unsigned long nr_segments, sstart, send; >> + unsigned long end_pfn = arg->start_pfn + arg->nr_pages; >> + >> + might_sleep(); > > Required ? Habit, and I think best practice. We take a mutex, so might_sleep(), but we also conditionally return before lockdep would see the mutex. Having this annotation means a dangerous change to the way this is called triggers a warning without having to test memory hotplug explicitly. >> + >> + if (action != MEM_GOING_OFFLINE) >> + return NOTIFY_DONE; >> + >> + mutex_lock(&kexec_mutex); >> + if (kexec_image) { >> + nr_segments = kexec_image->nr_segments; >> + >> + for (i = 0; i < nr_segments; i++) { >> + sstart = PFN_DOWN(kexec_image->segment[i].mem); >> + send = PFN_UP(kexec_image->segment[i].mem + >> + kexec_image->segment[i].memsz); >> + >> + if ((pfn <= sstart && sstart < end_pfn) || >> + (pfn <= send && send < end_pfn)) { >> + pr_warn("Memory region in use\n"); >> + rv = NOTIFY_BAD; >> + break; >> + } >> + } >> + } >> + mutex_unlock(&kexec_mutex); >> + >> + return rv; > > Variable 'rv' is redundant, should use NOTIFY_[BAD|OK] directly instead. You'd prefer a mutex_unlock() in the middle of the loop? ... or goto? (I'm not convinced) >> +} >> + >> +static struct notifier_block mem_remove_nb = { >> + .notifier_call = mem_remove_cb, >> +}; >> + >> +static int __init register_mem_remove_cb(void) >> +{ >> + if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) > > Should not all these new code here be wrapped with CONFIG_MEMORY_HOTREMOVE > to reduce the scope as well as final code size when the config is disabled. The compiler is really good at this. "if (false)" means this is all dead code, and static means its not exported, so the compiler is free to remove it. Not #ifdef-ing it makes it much more readable, and means the compiler checks its valid C before it removes it. This avoids weird header include problems that only show up on some rand-config builds. Thanks, James >> + return register_memory_notifier(&mem_remove_nb); >> + >> + return 0; >> +} >> +device_initcall(register_mem_remove_cb); >> > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BDCEFC43331 for ; Fri, 27 Mar 2020 15:47:09 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 881DA2078B for ; Fri, 27 Mar 2020 15:47:09 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="LKb9h5ye" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 881DA2078B Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=arm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:Date: Message-ID:From:References:To:Subject:Reply-To:Content-ID:Content-Description :Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=YgXSlc1J+7VrB855i/vtu0jywvQ+D1li3Z03PsTb+K8=; b=LKb9h5ye8mBrsR xbQEBWKlew427/+MLqBMkoExh8+ze924n7hcrwhCC76AObEBXE8o0KHbqX8e2jztKSLMGtGSMDeu6 aqHOvx8Z+vJYy4AUABZKxR1AFoJsHCQlinsYW0/ya67gUp/x4+AmFRg3KFmDfJ4UHw9Uap2C/beN+ kvzB7LAL2nAw3FoHJszmLRUazCkTakjhLICgAdLrk+NPTE6lvxAmvAVfBlc4Sbd9tnJ0pYZ711mu/ jTdBT0IB4UX66A1HyPva7lEYQCV8/UZvAumzxfHXcA9PmCnGE+MFCskYTMD/Ro/uFYdFkKZYbaPVe 5yA3V8vUTnBS7ywCizYw==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1jHrCO-00023M-23; Fri, 27 Mar 2020 15:47:08 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1jHrCJ-000227-PJ; Fri, 27 Mar 2020 15:47:05 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E86051FB; Fri, 27 Mar 2020 08:47:02 -0700 (PDT) Received: from [172.16.1.108] (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 6411E3F71F; Fri, 27 Mar 2020 08:47:01 -0700 (PDT) Subject: Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image To: Anshuman Khandual References: <20200326180730.4754-1-james.morse@arm.com> <20200326180730.4754-2-james.morse@arm.com> From: James Morse Openpgp: preference=signencrypt Message-ID: Date: Fri, 27 Mar 2020 15:46:56 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20200327_084703_911169_30B53218 X-CRM114-Status: GOOD ( 25.88 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Catalin Marinas , Bhupesh Sharma , kexec@lists.infradead.org, linux-mm@kvack.org, Eric Biederman , Andrew Morton , Will Deacon , linux-arm-kernel@lists.infradead.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@lists.infradead.org Hi Anshuman, On 3/27/20 12:43 AM, Anshuman Khandual wrote: > On 03/26/2020 11:37 PM, James Morse wrote: >> An image loaded for kexec is not stored in place, instead its segments >> are scattered through memory, and are re-assembled when needed. In the >> meantime, the target memory may have been removed. >> >> Because mm is not aware that this memory is still in use, it allows it >> to be removed. > Why the isolation process does not fail when these pages are currently > being used by kexec ? Kexec isn't using them right now, but it will once kexec is triggered. Those physical addresses are held in some internal kexec data structures until kexec is triggered. >> Add a memory notifier to prevent the removal of memory regions that >> overlap with a loaded kexec image segment. e.g., when triggered from the >> Qemu console: >> | kexec_core: memory region in use >> | memory memory32: Offline failed. > > Yes this is definitely an added protection for these kexec loaded kernels > memory areas from being offlined but I would have expected the preceding > offlining to have failed as well. kexec hasn't allocate the memory, part of the regions user-space may specify for the next kernel may be in use. There is nothing to stop the memory being used in the meantime. Any other way of doing this would prevent us saying why it failed. Like this, the user can spot the 'kexec: Memory region in use message', and unload kexec. >> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c >> index c19c0dad1ebe..ba1d91e868ca 100644 >> --- a/kernel/kexec_core.c >> +++ b/kernel/kexec_core.c >> @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void) >> >> void __weak arch_kexec_unprotect_crashkres(void) >> {} >> + >> +/* >> + * If user-space wants to offline memory that is in use by a loaded kexec >> + * image, it should unload the image first. >> + */ > Probably this would need kexec user manual and related system call man pages > update as well. I can't see anything relevant under Documentation. (kdump yes, kexec no...) >> +static int mem_remove_cb(struct notifier_block *nb, unsigned long action, >> + void *data) >> +{ >> + int rv = NOTIFY_OK, i; >> + struct memory_notify *arg = data; >> + unsigned long pfn = arg->start_pfn; >> + unsigned long nr_segments, sstart, send; >> + unsigned long end_pfn = arg->start_pfn + arg->nr_pages; >> + >> + might_sleep(); > > Required ? Habit, and I think best practice. We take a mutex, so might_sleep(), but we also conditionally return before lockdep would see the mutex. Having this annotation means a dangerous change to the way this is called triggers a warning without having to test memory hotplug explicitly. >> + >> + if (action != MEM_GOING_OFFLINE) >> + return NOTIFY_DONE; >> + >> + mutex_lock(&kexec_mutex); >> + if (kexec_image) { >> + nr_segments = kexec_image->nr_segments; >> + >> + for (i = 0; i < nr_segments; i++) { >> + sstart = PFN_DOWN(kexec_image->segment[i].mem); >> + send = PFN_UP(kexec_image->segment[i].mem + >> + kexec_image->segment[i].memsz); >> + >> + if ((pfn <= sstart && sstart < end_pfn) || >> + (pfn <= send && send < end_pfn)) { >> + pr_warn("Memory region in use\n"); >> + rv = NOTIFY_BAD; >> + break; >> + } >> + } >> + } >> + mutex_unlock(&kexec_mutex); >> + >> + return rv; > > Variable 'rv' is redundant, should use NOTIFY_[BAD|OK] directly instead. You'd prefer a mutex_unlock() in the middle of the loop? ... or goto? (I'm not convinced) >> +} >> + >> +static struct notifier_block mem_remove_nb = { >> + .notifier_call = mem_remove_cb, >> +}; >> + >> +static int __init register_mem_remove_cb(void) >> +{ >> + if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) > > Should not all these new code here be wrapped with CONFIG_MEMORY_HOTREMOVE > to reduce the scope as well as final code size when the config is disabled. The compiler is really good at this. "if (false)" means this is all dead code, and static means its not exported, so the compiler is free to remove it. Not #ifdef-ing it makes it much more readable, and means the compiler checks its valid C before it removes it. This avoids weird header include problems that only show up on some rand-config builds. Thanks, James >> + return register_memory_notifier(&mem_remove_nb); >> + >> + return 0; >> +} >> +device_initcall(register_mem_remove_cb); >> > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel