From: David Hildenbrand <david@redhat.com>
To: James Morse <james.morse@arm.com>
Cc: kexec@lists.infradead.org, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, Eric Biederman <ebiederm@xmission.com>, Andrew Morton <akpm@linux-foundation.org>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Anshuman Khandual <anshuman.khandual@arm.com>, Bhupesh Sharma <bhsharma@redhat.com>
Subject: Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
Date: Mon, 30 Mar 2020 15:13:28 +0200
Message-ID: <80e4d1d7-f493-3f66-f700-86f18002d692@redhat.com>
In-Reply-To: <34274b02-60ba-eb78-eacd-6dc1146ed3cd@arm.com>

> Adding a sentence about the way kexec load works may help, the first paragraph
> would read:
>
> | Kexec allows user-space to specify the address that the kexec image should be
> | loaded to. Because this memory may be in use, an image loaded for kexec is not
> | stored in place, instead its segments are scattered through memory, and are
> | re-assembled when needed. In the meantime, the target memory may have been
> | removed.
>
> Do you think that's clearer?

Yes, very much. Maybe add that the target is described by user space during
kexec_load(), and that user space (right now) parses /proc/iomem to find
applicable system memory.

> [...]
>
>>> Load kexec:
>>> | root@vm:/sys/devices/system/memory# kexec -l /root/bzImage --reuse-cmdline
>>>
>>
>> I assume this will trigger
>>
>> kexec_load -> do_kexec_load -> kimage_load_segment ->
>> kimage_load_normal_segment -> kimage_alloc_page -> kimage_alloc_pages
>>
>> Which will just allocate a bunch of pages and mark them reserved.
>>
>> Now, AFAIKs, all allocations will be unmovable. So none of the kexec
>> segment allocations will actually end up on your DIMM (as it is onlined
>> online_movable).
>>
>> So, the loaded image (with its segments) from user space won't be problematic
>> and won't get placed on your DIMM.
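The /proc/iomem parsing mentioned above can be sketched roughly as follows. This is a minimal C illustration, not kexec-tools' actual code: `struct mem_range`, `parse_system_ram()` and `collect_system_ram()` are hypothetical names. It keeps only top-level entries named exactly "System RAM", so any resource given a different name (as proposed for hotplugged memory) would simply be skipped. As far as I know, unprivileged readers see zeroed addresses in /proc/iomem, so real use needs root.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type: one inclusive physical range as printed in /proc/iomem. */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Parse a single /proc/iomem line such as "00100000-bffdffff : System RAM".
 * Returns 1 and fills *out only for top-level entries named exactly
 * "System RAM"; nested (indented) entries and other names return 0.
 */
int parse_system_ram(const char *line, struct mem_range *out)
{
    uint64_t start, end;
    int consumed = 0;
    const char *name;
    size_t len;

    assert(line != NULL && out != NULL);
    if (line[0] == ' ')
        return 0; /* child resource, e.g. "  01000000-01e0267e : Kernel code" */
    if (sscanf(line, "%" SCNx64 "-%" SCNx64 " : %n", &start, &end, &consumed) != 2 ||
        consumed == 0)
        return 0; /* not in "start-end : name" form */

    name = line + consumed;
    len = strcspn(name, "\n");
    if (len != strlen("System RAM") || strncmp(name, "System RAM", len) != 0)
        return 0;

    out->start = start;
    out->end = end;
    return 1;
}

/* Collect up to 'max' top-level "System RAM" ranges from an open stream. */
size_t collect_system_ram(FILE *f, struct mem_range *out, size_t max)
{
    char line[256];
    size_t n = 0;

    while (n < max && fgets(line, sizeof(line), f) != NULL)
        if (parse_system_ram(line, &out[n]))
            n++;
    return n;
}
```

With root privileges, a caller could pass the stream from `fopen("/proc/iomem", "r")` to `collect_system_ram()`.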
>>
>>
>> Now, the problematic part is (via man kexec_load) "mem and memsz specify
>> a physical address range that is the target of the copy."
>>
>> So the place where the image will be "assembled" at when doing the
>> reboot. Understood :)
>
> Yup.
>
> [...]
>
>> I wonder if we should instead make the "kexec -e" fail. It tries to
>> touch random system memory.
>
> Heh, isn't touching random system memory what kexec does?!

Having a racy user interface that can trigger kernel crashes feels very
wrong. We should limit the impact.

>
> It's all described to user-space as 'System RAM'. Teaching it to probe
> /sys/devices/memory/... would require a user-space change.

I think we should really rename hotplugged memory on all architectures.
This is especially relevant for virtio-mem/hyper-v balloon, where some
pieces of (hotplugged) memory blocks are partially unavailable and should
not be touched: accessing them results in unpredictable behavior (e.g.,
crashes or discarded writes).

[...]

>> Will probably need some thought. But it will actually also bail out when
>> user space passes wrong physical memory addresses, instead of
>> triple-faulting silently.
>
> With this change, the reboot(LINUX_REBOOT_CMD_KEXEC) call would fail. This
> thing doesn't usually return, so we're likely to trigger error-handling that has
> never run before.
>
> (Last time I debugged one of these, it turned out kexec had taken the network
> interfaces down, meaning the nfsroot was no longer accessible)
>
> How can user-space know whether kexec is going to succeed, or fail like this?
> Any loaded kexec kernel could secretly be in this broken state.
>
> Can user-space know what caused this to become unreliable? (without reading the
> kernel source)
>
>
> Given kexec can be unloaded by user-space, I think it's better to prevent us
> getting into the broken state, preferably giving the hint that kexec is using
> that memory. The user can 'kexec -u', then retry removing the memory.
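Probing sysfs, as discussed above, could start by mapping a segment's target range (the `mem`/`memsz` pair from kexec_load(2)) onto memory block devices. A rough C sketch, assuming memory block N covers physical addresses [N * block_size, (N + 1) * block_size) and that /sys/devices/system/memory/block_size_bytes holds that size in hex; `parse_block_size()`, `segment_blocks()`, `block_dir_path()` and `blocks_present()` are hypothetical helpers. It only tests whether each memoryN directory exists: offline blocks keep their directory while removed ones lose it, which fits the point made in this thread that kexec needs present memory rather than online memory. It cannot detect holes inside a block, such as partially unplugged virtio-mem blocks.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* block_size_bytes is printed in hex without a "0x" prefix, e.g. "8000000". */
uint64_t parse_block_size(const char *text)
{
    return strtoull(text, NULL, 16);
}

/* First and last memory block index touched by [mem, mem + memsz). */
void segment_blocks(uint64_t mem, uint64_t memsz, uint64_t block_size,
                    uint64_t *first, uint64_t *last)
{
    assert(block_size != 0 && memsz != 0);
    *first = mem / block_size;
    *last = (mem + memsz - 1) / block_size;
}

/* Format the sysfs directory path of memory block 'id'; returns snprintf's result. */
int block_dir_path(char *buf, size_t len, uint64_t id)
{
    return snprintf(buf, len, "/sys/devices/system/memory/memory%" PRIu64, id);
}

/*
 * Return 1 if blocks first..last all have a sysfs directory, else 0.
 * The directory exists for added memory whether it is online or offline,
 * and disappears once the memory is removed.
 */
int blocks_present(uint64_t first, uint64_t last)
{
    char path[96];

    for (uint64_t id = first; id <= last; id++) {
        if (block_dir_path(path, sizeof(path), id) >= (int)sizeof(path))
            return 0;
        if (access(path, F_OK) != 0)
            return 0;
        if (id == UINT64_MAX)
            break; /* avoid wrapping the loop counter */
    }
    return 1;
}
```

For example, with a 128 MiB block size ("8000000"), a 128 KiB target starting at 0x47ff0000 touches blocks 8 and 9.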
>
> I think forbidding the memory-offline is simpler for user-space to deal with.

I thought about this over the weekend, and I don't think it's the right
approach.

1. It's racy. If memory is getting offlined/unplugged just while user
space is about to trigger the kexec_load(), you end up with the very same
triple-fault.

2. It's semantically wrong. kexec does not need online memory ("managed
by the buddy"), but still you disallow offlining memory.

I would much rather see user space choosing boot memory (e.g., by renaming
hotplugged memory on all architectures), and a check during "kexec -e"
whether the selected memory is actually "there" before trying to write
to it.

-- 
Thanks,

David / dhildenb
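The check proposed above (refusing "kexec -e" when a segment's target is not actually there) can be modeled in user-space C. A minimal sketch under these assumptions: present memory is available as a sorted, non-overlapping list of inclusive ranges, for instance top-level "System RAM" entries; `struct mem_range`, `struct target`, `target_is_present()` and `all_targets_present()` are hypothetical and are not kernel or kexec-tools APIs. A real implementation would presumably have to run in the kernel while preventing concurrent hot-unplug, otherwise it would suffer from the same race described in point 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: inclusive physical range of memory known to be present. */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/* Hypothetical: the mem/memsz target of one kexec segment. */
struct target {
    uint64_t mem;
    uint64_t memsz;
};

/*
 * Return 1 if [mem, mem + memsz) lies entirely within 'present', which must
 * be sorted by start and non-overlapping; adjacent ranges may be chained.
 */
int target_is_present(const struct mem_range *present, size_t n,
                      uint64_t mem, uint64_t memsz)
{
    uint64_t cur = mem, last;

    assert(present != NULL || n == 0);
    if (memsz == 0)
        return 1;
    last = mem + memsz - 1;
    if (last < mem)
        return 0; /* range wraps around the address space */

    for (size_t i = 0; i < n; i++) {
        if (present[i].end < cur)
            continue; /* entirely below the part still unchecked */
        if (present[i].start > cur)
            return 0; /* hole at 'cur' */
        if (present[i].end >= last)
            return 1; /* remainder covered */
        cur = present[i].end + 1;
    }
    return 0;
}

/* Return 1 only if every segment target is fully present. */
int all_targets_present(const struct mem_range *present, size_t n,
                        const struct target *t, size_t nt)
{
    for (size_t i = 0; i < nt; i++)
        if (!target_is_present(present, n, t[i].mem, t[i].memsz))
            return 0;
    return 1;
}
```

Adjacent ranges are chained on purpose: a physically contiguous region can show up as several neighbouring entries, for example boot memory followed by separately added hotplug memory.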