* [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
From: Dave Young @ 2016-11-09 3:01 UTC
To: wency, qiaonuohan; +Cc: lersek, anderson, qemu-devel, bhe

Hi,

The latest Linux kernel enables KASLR to randomize physical and virtual
memory addresses. We have put some effort into kexec/kdump support so
that the crash utility still works when the crashed kernel has KASLR
enabled.

But according to Dave Anderson, virsh dump does not work. His message is
quoted below:

"""
with virsh dump, there's no way of even knowing that KASLR
has randomized the kernel __START_KERNEL_map region, because there is no
virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
vmcoreinfo data to compare against the vmlinux file symbol value.
Unless virsh dump can export some basic virtual memory data, which
they say it can't, I don't see how KASLR can ever be supported.
"""

I assume virsh dump uses the qemu guest memory dump facility, so this
should be addressed in qemu first; hence this query to the qemu devel
list. If that is not correct, please let me know.

Could you qemu dump people make it work? Otherwise we cannot support
virt dump as long as KASLR is enabled. The latest Fedora kernel has
enabled it on x86_64.

Thanks
Dave
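The comparison Dave Anderson describes can be sketched roughly as
follows. This is only an illustration, assuming the vmcoreinfo note is
available as "KEY=value" text with hexadecimal symbol values. The helper
names (parse_vmcoreinfo, kaslr_offset) and the sample addresses are made
up for the example and are not taken from crash or makedumpfile.

```python
def parse_vmcoreinfo(text):
    """Parse the 'KEY=value' lines of a vmcoreinfo note into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip()] = value.strip()
    return info


def kaslr_offset(vmcoreinfo_text, vmlinux_stext):
    """Runtime _stext (from vmcoreinfo) minus link-time _stext (from vmlinux)."""
    info = parse_vmcoreinfo(vmcoreinfo_text)
    runtime_stext = int(info["SYMBOL(_stext)"], 16)
    return runtime_stext - vmlinux_stext


# Made-up sample data, for illustration only.
sample = "OSRELEASE=4.8.0\nSYMBOL(_stext)=ffffffff9a000000\n"
offset = kaslr_offset(sample, 0xFFFFFFFF81000000)
print(hex(offset))
```

A memory-only dump carries no such "SYMBOL(_stext)" entry, so there is
nothing to subtract from; that is the gap this thread is about.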
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
From: Dave Young @ 2016-11-09 3:17 UTC
To: wency; +Cc: lersek, anderson, qemu-devel, bhe

Drop qiaonuohan, seems the mail address is wrong..

On 11/09/16 at 11:01am, Dave Young wrote:
> [...]
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
From: Wen Congyang @ 2016-11-09 3:58 UTC
To: Dave Young; +Cc: lersek, anderson, qemu-devel, bhe

On 11/09/2016 11:17 AM, Dave Young wrote:
> On 11/09/16 at 11:01am, Dave Young wrote:
>> [...]
>> I assume virsh dump is using qemu guest memory dump facility so it
>> should be first addressed in qemu. Thus post this query to qemu devel
>> list. If this is not correct please let me know.

IIRC, 'virsh dump --memory-only' uses dump-guest-memory, and 'virsh dump'
uses migration to dump.

I think I should study kaslr first...

Thanks
Wen Congyang
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
From: Dave Young @ 2016-11-09 5:02 UTC
To: Wen Congyang, anderson; +Cc: lersek, qemu-devel, bhe

On 11/09/16 at 11:58am, Wen Congyang wrote:
> [...]
> IIRC, 'virsh dump --memory-only' uses dump-guest-memory, and 'virsh dump'
> uses migration to dump.

Do they need different fixes? Dave, I guess you mean --memory-only, but
could you clarify and confirm it?

> I think I should study kaslr first...

Thanks for taking care of it.
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
From: Wen Congyang @ 2016-11-09 7:42 UTC
To: Dave Young, anderson; +Cc: lersek, qemu-devel, bhe

On 11/09/2016 01:02 PM, Dave Young wrote:
> On 11/09/16 at 11:58am, Wen Congyang wrote:
>> [...]
>> I think I should study kaslr first...
>
> Thanks for taking care of it.

Can you give me the patches for kexec/kdump? I want to know what I need
to do for dump-guest-memory.

Thanks
Wen Congyang
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
From: Dave Young @ 2016-11-09 8:25 UTC
To: Wen Congyang; +Cc: anderson, lersek, qemu-devel, bhe

On 11/09/16 at 03:42pm, Wen Congyang wrote:
> [...]
> Can you give me the patch for kexec/kdump. I want to know what I need to do
> for dump-guest-memory.

AFAIK, these are the patches for kexec/kdump userspace:

kexec-tools, git commit:

commit 9f62cbddddfc93d78d9aafbddf3e1208cb242f7b
Author: Thomas Garnier <thgarnie@google.com>
Date:   Tue Sep 13 15:10:05 2016 +0800

    kexec/arch/i386: Add support for KASLR memory randomization

Originally Baoquan He posted the patches below to export some kernel
fields through vmcoreinfo:
http://lists.infradead.org/pipermail/kexec/2016-September/017191.html

But that was later dropped; we finally do it in userspace with several
makedumpfile patches:
http://lists.infradead.org/pipermail/kexec/2016-October/017540.html
http://lists.infradead.org/pipermail/kexec/2016-October/017539.html
http://lists.infradead.org/pipermail/kexec/2016-October/017541.html

A virsh-dumped vmcore should manage to export some information that the
crash utility can use. I will leave it to Dave to say more about what he
needs, because the goal is that userspace utilities like crash can
analyze the vmcore correctly.
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
From: Dave Anderson @ 2016-11-09 14:36 UTC
To: Dave Young; +Cc: Wen Congyang, lersek, qemu-devel, bhe

----- Original Message -----
> On 11/09/16 at 11:58am, Wen Congyang wrote:
> > [...]
> > IIRC, 'virsh dump --memory-only' uses dump-guest-memory, and 'virsh dump'
> > uses migration to dump.
>
> Do they need different fixes? Dave, I guess you mean --memory-only, but
> could you clarify and confirm it?

As I understand it, the "--memory-only" option uses a new "dump-guest-memory"
QEMU monitor command that creates an ELF kdump vmcore clone.

Dave
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
From: Daniel P. Berrange @ 2016-11-09 14:42 UTC
To: Dave Anderson; +Cc: Dave Young, bhe, lersek, qemu-devel

On Wed, Nov 09, 2016 at 09:36:08AM -0500, Dave Anderson wrote:
> [...]
> As I understand it, the "--memory-only" option uses a new "dump-guest-memory"
> QEMU monitor command that creates an ELF kdump vmcore clone.

IIRC, the use of the traditional 'virsh dump' (which just splats out the
QEMU migration data stream) is no longer supported with crash and
everyone should be using the --memory-only flag to ensure the ELF format
core.

IOW, I think we can just ignore the historical migration based dump
and focus exclusively on the dump-guest-memory based impl.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
From: Andrew Jones @ 2016-11-09 10:40 UTC
To: Dave Young; +Cc: wency, qiaonuohan, anderson, lersek, qemu-devel, bhe

On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> [...]
> Could you qemu dump people make it work? Or we can not support virt dump
> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
>

When the -kernel command line option is used, then it may be possible
to extract some information that could be used to supplement the memory
dump that dump-guest-memory provides. However, that would be a specific
use. In general, QEMU knows nothing about the guest kernel. It doesn't
know where it is in the disk image, and it doesn't even know if it's
Linux.

Is there anything a guest userspace application could probe from e.g.
/proc that would work? If so, then the guest agent could gain a new
feature providing that.

Thanks,
drew
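One thing a privileged guest process could probe is /proc/kallsyms,
which lists runtime symbol addresses in "address type name" form. The
sketch below is only an assumption about what such a probe might look
like; find_symbol is a made-up helper and the sample lines are invented.
Whether this value alone is enough for crash is for the crash
maintainers to say.

```python
def find_symbol(kallsyms_lines, name):
    """Return the address of `name` from /proc/kallsyms-style lines, or None."""
    for line in kallsyms_lines:
        fields = line.split()
        # Expected layout: "<hex address> <type> <symbol> [module]"
        if len(fields) >= 3 and fields[2] == name:
            return int(fields[0], 16)
    return None


# Invented sample lines, for illustration only.
sample_lines = [
    "ffffffff9a000000 T startup_64",
    "ffffffff9a000000 T _stext",
    "ffffffff9a0001f0 T start_cpu0",
]
runtime_stext = find_symbol(sample_lines, "_stext")
print(hex(runtime_stext))

# Inside a guest this would be called on the real file, e.g.:
#     with open("/proc/kallsyms") as f:
#         runtime_stext = find_symbol(f, "_stext")
```

An unprivileged reader may see all-zero addresses in /proc/kallsyms, so
the probe would have to run with sufficient privileges and treat 0 as
"not available".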
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
From: Laszlo Ersek @ 2016-11-09 11:26 UTC
To: Andrew Jones, Dave Young; +Cc: wency, qiaonuohan, anderson, qemu-devel, bhe

On 11/09/16 11:40, Andrew Jones wrote:
> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
>> [...]
>
> When the -kernel command line option is used, then it may be possible
> to extract some information that could be used to supplement the memory
> dump that dump-guest-memory provides. However, that would be a specific
> use. In general, QEMU knows nothing about the guest kernel. It doesn't
> know where it is in the disk image, and it doesn't even know if it's
> Linux.
>
> Is there anything a guest userspace application could probe from e.g.
> /proc that would work? If so, then the guest agent could gain a new
> feature providing that.

I fully agree. This is exactly what I suggested too, independently, in
the downstream thread, before arriving at this upstream thread. Let me
quote that email:

On 11/09/16 12:09, Laszlo Ersek wrote:
> [...] the dump-guest-memory QEMU command supports an option called
> "paging". Here's its documentation, from the "qapi-schema.json" source
> file:
>
>> # @paging: if true, do paging to get guest's memory mapping. This allows
>> #          using gdb to process the core file.
>> #
>> #          IMPORTANT: this option can make QEMU allocate several gigabytes
>> #                     of RAM. This can happen for a large guest, or a
>> #                     malicious guest pretending to be large.
>> #
>> #          Also, paging=true has the following limitations:
>> #
>> #             1. The guest may be in a catastrophic state or can have corrupted
>> #                memory, which cannot be trusted
>> #             2. The guest can be in real-mode even if paging is enabled. For
>> #                example, the guest uses ACPI to sleep, and ACPI sleep state
>> #                goes in real-mode
>> #             3. Currently only supported on i386 and x86_64.
>> #
>
> "virsh dump --memory-only" sets paging=false, for obvious reasons.
>
> [...] the dump-guest-memory command provides a raw snapshot of the
> virtual machine's memory (and of the registers of the VCPUs); it is
> not enlightened about the guest.
>
> If the additional information you are looking for can be retrieved
> within the running Linux guest, using an appropriately privileged
> userspace process, then I would recommend considering an extension to
> the qemu guest agent. The management layer (libvirt, [...]) could
> first invoke the guest agent (a process with root privileges running
> in the guest) from the host side, through virtio-serial. The new guest
> agent command would return the information necessary to deal with
> KASLR. Then the management layer would initiate the dump like always.
> Finally, the extra information would be combined with (or placed
> beside) the dump file in some way.
>
> So, this proposal would affect the guest agent and the management
> layer (= libvirt).

Given that we already dislike "paging=true", enlightening
dump-guest-memory with even more guest-specific insight is the wrong
approach, IMO. That kind of knowledge belongs to the guest agent.

Thanks
Laszlo
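For reference, the paging=false dump discussed above corresponds to a
QMP request of roughly the following shape. "paging" and "protocol" are
arguments of the dump-guest-memory command; the helper name and the
output path are made up for the example, and this is not a description
of how libvirt builds the request internally.

```python
import json


def dump_guest_memory_command(path, paging=False):
    """Build a QMP dump-guest-memory request that writes an ELF core to `path`."""
    return {
        "execute": "dump-guest-memory",
        "arguments": {
            # paging=False: raw guest-physical memory, no guest page-table walk.
            "paging": paging,
            # "file:" prefix tells QEMU to write the dump to a host file.
            "protocol": "file:" + path,
        },
    }


# Hypothetical output path, for illustration only.
cmd = dump_guest_memory_command("/tmp/guest.core")
print(json.dumps(cmd))
```

With paging=false the resulting core holds only physical memory and
VCPU registers, which is why any KASLR information would have to come
from somewhere else.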
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
From: Daniel P. Berrange @ 2016-11-09 11:37 UTC
To: Laszlo Ersek
Cc: Andrew Jones, Dave Young, qiaonuohan, bhe, anderson, qemu-devel

On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
> On 11/09/16 11:40, Andrew Jones wrote:
> > [...]
> > Is there anything a guest userspace application could probe from e.g.
> > /proc that would work? If so, then the guest agent could gain a new
> > feature providing that.
>
> I fully agree. This is exactly what I suggested too, independently, in
> the downstream thread, before arriving at this upstream thread.
>
> [...]
>
> Given that we already dislike "paging=true", enlightening
> dump-guest-memory with even more guest-specific insight is the wrong
> approach, IMO. That kind of knowledge belongs to the guest agent.

If you're trying to debug a hung/panicked guest, then using a guest
agent to fetch info is a complete non-starter as it'll be dead.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-09 11:37 ` Daniel P. Berrange @ 2016-11-09 11:48 ` Andrew Jones 2016-11-09 11:58 ` Daniel P. Berrange 0 siblings, 1 reply; 30+ messages in thread From: Andrew Jones @ 2016-11-09 11:48 UTC (permalink / raw) To: Daniel P. Berrange Cc: Laszlo Ersek, Dave Young, qiaonuohan, bhe, anderson, qemu-devel On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote: > On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote: > > On 11/09/16 11:40, Andrew Jones wrote: > > > On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote: > > >> Hi, > > >> > > >> Latest linux kernel enabled kaslr to randomiz phys/virt memory > > >> addresses, we had some effort to support kexec/kdump so that crash > > >> utility can still works in case crashed kernel has kaslr enabled. > > >> > > >> But according to Dave Anderson virsh dump does not work, quoted messages > > >> from Dave below: > > >> > > >> """ > > >> with virsh dump, there's no way of even knowing that KASLR > > >> has randomized the kernel __START_KERNEL_map region, because there is no > > >> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump > > >> vmcoreinfo data to compare against the vmlinux file symbol value. > > >> Unless virsh dump can export some basic virtual memory data, which > > >> they say it can't, I don't see how KASLR can ever be supported. > > >> """ > > >> > > >> I assume virsh dump is using qemu guest memory dump facility so it > > >> should be first addressed in qemu. Thus post this query to qemu devel > > >> list. If this is not correct please let me know. > > >> > > >> Could you qemu dump people make it work? Or we can not support virt dump > > >> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64. 
> > >> > > > > > > When the -kernel command line option is used, then it may be possible > > > to extract some information that could be used to supplement the memory > > > dump that dump-guest-memory provides. However, that would be a specific > > > use. In general, QEMU knows nothing about the guest kernel. It doesn't > > > know where it is in the disk image, and it doesn't even know if it's > > > Linux. > > > > > > Is there anything a guest userspace application could probe from e.g. > > > /proc that would work? If so, then the guest agent could gain a new > > > feature providing that. > > > > I fully agree. This is exactly what I suggested too, independently, in > > the downstream thread, before arriving at this upstream thread. Let me > > quote that email: > > > > On 11/09/16 12:09, Laszlo Ersek wrote: > > > [...] the dump-guest-memory QEMU command supports an option called > > > "paging". Here's its documentation, from the "qapi-schema.json" source > > > file: > > > > > >> # @paging: if true, do paging to get guest's memory mapping. This allows > > >> # using gdb to process the core file. > > >> # > > >> # IMPORTANT: this option can make QEMU allocate several gigabytes > > >> # of RAM. This can happen for a large guest, or a > > >> # malicious guest pretending to be large. > > >> # > > >> # Also, paging=true has the following limitations: > > >> # > > >> # 1. The guest may be in a catastrophic state or can have corrupted > > >> # memory, which cannot be trusted > > >> # 2. The guest can be in real-mode even if paging is enabled. For > > >> # example, the guest uses ACPI to sleep, and ACPI sleep state > > >> # goes in real-mode > > >> # 3. Currently only supported on i386 and x86_64. > > >> # > > > > > > "virsh dump --memory-only" sets paging=false, for obvious reasons. > > > > > > [...] the dump-guest-memory command provides a raw snapshot of the > > > virtual machine's memory (and of the registers of the VCPUs); it is > > > not enlightened about the guest. 
> > > > > > If the additional information you are looking for can be retrieved > > > within the running Linux guest, using an appropriately privieleged > > > userspace process, then I would recommend considering an extension to > > > the qemu guest agent. The management layer (libvirt, [...]) could > > > first invoke the guest agent (a process with root privileges running > > > in the guest) from the host side, through virtio-serial. The new guest > > > agent command would return the information necessary to deal with > > > KASLR. Then the management layer would initiate the dump like always. > > > Finally, the extra information would be combined with (or placed > > > beside) the dump file in some way. > > > > > > So, this proposal would affect the guest agent and the management > > > layer (= libvirt). > > > > Given that we already dislike "paging=true", enlightening > > dump-guest-memory with even more guest-specific insight is the wrong > > approach, IMO. That kind of knowledge belongs to the guest agent. > > If you're trying to debug a hung/panicked guest, then using a guest > agent to fetch info is a complete non-starter as it'll be dead. So don't wait. Management software can make this query immediately after the guest agent goes live. The information needed won't change. drew ^ permalink raw reply [flat|nested] 30+ messages in thread
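As a rough illustration of the ordering suggested above (query the agent once, early, then dump as usual), here is a minimal Python sketch of what a management layer might do. It only builds the JSON messages and does not talk to a real socket. The QMP command dump-guest-memory and its "paging" and "protocol" arguments are real; the guest agent command name guest-get-kaslr-info, its reply layout, and the DumpPlanner class are hypothetical, invented purely for this sketch.

```python
import json

# Hypothetical guest agent command name; qemu-ga has no such command today.
HYPOTHETICAL_QGA_COMMAND = "guest-get-kaslr-info"


def build_qga_query():
    """Build the (hypothetical) guest agent request as a JSON string."""
    return json.dumps({"execute": HYPOTHETICAL_QGA_COMMAND})


def build_dump_command(path, paging=False):
    """Build a QMP dump-guest-memory request that writes to a host file.

    paging defaults to False, matching what "virsh dump --memory-only" does.
    """
    return json.dumps({
        "execute": "dump-guest-memory",
        "arguments": {"paging": paging, "protocol": "file:" + path},
    })


class DumpPlanner:
    """Caches guest-provided data early so a later dump needs no live guest."""

    def __init__(self):
        self.cached_info = None

    def on_agent_live(self, agent_reply):
        # Called once, right after the agent first answers; the reply layout
        # ({"return": {...}}) is an assumption for this sketch.
        self.cached_info = json.loads(agent_reply).get("return")

    def plan_dump(self, path):
        # Returns the QMP command plus the side-car data to store next to the
        # dump file. The side-car data is None if the agent never answered.
        return build_dump_command(path), self.cached_info
```

If the guest dies before the agent ever answers, plan_dump() still returns a usable dump command; only the side-car data is missing, which is the best-effort behaviour discussed in this thread.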
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-09 11:48 ` Andrew Jones @ 2016-11-09 11:58 ` Daniel P. Berrange 2016-11-09 12:20 ` Andrew Jones 0 siblings, 1 reply; 30+ messages in thread From: Daniel P. Berrange @ 2016-11-09 11:58 UTC (permalink / raw) To: Andrew Jones Cc: Laszlo Ersek, Dave Young, qiaonuohan, bhe, anderson, qemu-devel On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote: > On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote: > > On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote: > > > On 11/09/16 11:40, Andrew Jones wrote: > > > > On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote: > > > >> Hi, > > > >> > > > >> Latest linux kernel enabled kaslr to randomiz phys/virt memory > > > >> addresses, we had some effort to support kexec/kdump so that crash > > > >> utility can still works in case crashed kernel has kaslr enabled. > > > >> > > > >> But according to Dave Anderson virsh dump does not work, quoted messages > > > >> from Dave below: > > > >> > > > >> """ > > > >> with virsh dump, there's no way of even knowing that KASLR > > > >> has randomized the kernel __START_KERNEL_map region, because there is no > > > >> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump > > > >> vmcoreinfo data to compare against the vmlinux file symbol value. > > > >> Unless virsh dump can export some basic virtual memory data, which > > > >> they say it can't, I don't see how KASLR can ever be supported. > > > >> """ > > > >> > > > >> I assume virsh dump is using qemu guest memory dump facility so it > > > >> should be first addressed in qemu. Thus post this query to qemu devel > > > >> list. If this is not correct please let me know. > > > >> > > > >> Could you qemu dump people make it work? Or we can not support virt dump > > > >> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64. 
> > > >> > > > > > > > > When the -kernel command line option is used, then it may be possible > > > > to extract some information that could be used to supplement the memory > > > > dump that dump-guest-memory provides. However, that would be a specific > > > > use. In general, QEMU knows nothing about the guest kernel. It doesn't > > > > know where it is in the disk image, and it doesn't even know if it's > > > > Linux. > > > > > > > > Is there anything a guest userspace application could probe from e.g. > > > > /proc that would work? If so, then the guest agent could gain a new > > > > feature providing that. > > > > > > I fully agree. This is exactly what I suggested too, independently, in > > > the downstream thread, before arriving at this upstream thread. Let me > > > quote that email: > > > > > > On 11/09/16 12:09, Laszlo Ersek wrote: > > > > [...] the dump-guest-memory QEMU command supports an option called > > > > "paging". Here's its documentation, from the "qapi-schema.json" source > > > > file: > > > > > > > >> # @paging: if true, do paging to get guest's memory mapping. This allows > > > >> # using gdb to process the core file. > > > >> # > > > >> # IMPORTANT: this option can make QEMU allocate several gigabytes > > > >> # of RAM. This can happen for a large guest, or a > > > >> # malicious guest pretending to be large. > > > >> # > > > >> # Also, paging=true has the following limitations: > > > >> # > > > >> # 1. The guest may be in a catastrophic state or can have corrupted > > > >> # memory, which cannot be trusted > > > >> # 2. The guest can be in real-mode even if paging is enabled. For > > > >> # example, the guest uses ACPI to sleep, and ACPI sleep state > > > >> # goes in real-mode > > > >> # 3. Currently only supported on i386 and x86_64. > > > >> # > > > > > > > > "virsh dump --memory-only" sets paging=false, for obvious reasons. > > > > > > > > [...] 
the dump-guest-memory command provides a raw snapshot of the > > > > virtual machine's memory (and of the registers of the VCPUs); it is > > > > not enlightened about the guest. > > > > > > > > If the additional information you are looking for can be retrieved > > > > within the running Linux guest, using an appropriately privieleged > > > > userspace process, then I would recommend considering an extension to > > > > the qemu guest agent. The management layer (libvirt, [...]) could > > > > first invoke the guest agent (a process with root privileges running > > > > in the guest) from the host side, through virtio-serial. The new guest > > > > agent command would return the information necessary to deal with > > > > KASLR. Then the management layer would initiate the dump like always. > > > > Finally, the extra information would be combined with (or placed > > > > beside) the dump file in some way. > > > > > > > > So, this proposal would affect the guest agent and the management > > > > layer (= libvirt). > > > > > > Given that we already dislike "paging=true", enlightening > > > dump-guest-memory with even more guest-specific insight is the wrong > > > approach, IMO. That kind of knowledge belongs to the guest agent. > > > > If you're trying to debug a hung/panicked guest, then using a guest > > agent to fetch info is a complete non-starter as it'll be dead. > > So don't wait. Management software can make this query immediately > after the guest agent goes live. The information needed won't change. That doesn't help with trying to diagnose a crash during boot up, since the guest agent isn't running till fairly late. I'm also concerned that the QEMU guest agent is likely to be far from widely deployed in guests, so reliance on the guest agent will mean the dump facility is no longer reliably available. 
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :| ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-09 11:58 ` Daniel P. Berrange @ 2016-11-09 12:20 ` Andrew Jones 2016-11-09 14:47 ` Daniel P. Berrange 0 siblings, 1 reply; 30+ messages in thread From: Andrew Jones @ 2016-11-09 12:20 UTC (permalink / raw) To: Daniel P. Berrange Cc: Laszlo Ersek, Dave Young, qiaonuohan, bhe, anderson, qemu-devel On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote: > On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote: > > On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote: > > > On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote: > > > > On 11/09/16 11:40, Andrew Jones wrote: > > > > > On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote: > > > > >> Hi, > > > > >> > > > > >> Latest linux kernel enabled kaslr to randomiz phys/virt memory > > > > >> addresses, we had some effort to support kexec/kdump so that crash > > > > >> utility can still works in case crashed kernel has kaslr enabled. > > > > >> > > > > >> But according to Dave Anderson virsh dump does not work, quoted messages > > > > >> from Dave below: > > > > >> > > > > >> """ > > > > >> with virsh dump, there's no way of even knowing that KASLR > > > > >> has randomized the kernel __START_KERNEL_map region, because there is no > > > > >> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump > > > > >> vmcoreinfo data to compare against the vmlinux file symbol value. > > > > >> Unless virsh dump can export some basic virtual memory data, which > > > > >> they say it can't, I don't see how KASLR can ever be supported. > > > > >> """ > > > > >> > > > > >> I assume virsh dump is using qemu guest memory dump facility so it > > > > >> should be first addressed in qemu. Thus post this query to qemu devel > > > > >> list. If this is not correct please let me know. > > > > >> > > > > >> Could you qemu dump people make it work? 
Or we can not support virt dump > > > > >> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64. > > > > >> > > > > > > > > > > When the -kernel command line option is used, then it may be possible > > > > > to extract some information that could be used to supplement the memory > > > > > dump that dump-guest-memory provides. However, that would be a specific > > > > > use. In general, QEMU knows nothing about the guest kernel. It doesn't > > > > > know where it is in the disk image, and it doesn't even know if it's > > > > > Linux. > > > > > > > > > > Is there anything a guest userspace application could probe from e.g. > > > > > /proc that would work? If so, then the guest agent could gain a new > > > > > feature providing that. > > > > > > > > I fully agree. This is exactly what I suggested too, independently, in > > > > the downstream thread, before arriving at this upstream thread. Let me > > > > quote that email: > > > > > > > > On 11/09/16 12:09, Laszlo Ersek wrote: > > > > > [...] the dump-guest-memory QEMU command supports an option called > > > > > "paging". Here's its documentation, from the "qapi-schema.json" source > > > > > file: > > > > > > > > > >> # @paging: if true, do paging to get guest's memory mapping. This allows > > > > >> # using gdb to process the core file. > > > > >> # > > > > >> # IMPORTANT: this option can make QEMU allocate several gigabytes > > > > >> # of RAM. This can happen for a large guest, or a > > > > >> # malicious guest pretending to be large. > > > > >> # > > > > >> # Also, paging=true has the following limitations: > > > > >> # > > > > >> # 1. The guest may be in a catastrophic state or can have corrupted > > > > >> # memory, which cannot be trusted > > > > >> # 2. The guest can be in real-mode even if paging is enabled. For > > > > >> # example, the guest uses ACPI to sleep, and ACPI sleep state > > > > >> # goes in real-mode > > > > >> # 3. Currently only supported on i386 and x86_64. 
> > > > >> # > > > > > > > > > > "virsh dump --memory-only" sets paging=false, for obvious reasons. > > > > > > > > > > [...] the dump-guest-memory command provides a raw snapshot of the > > > > > virtual machine's memory (and of the registers of the VCPUs); it is > > > > > not enlightened about the guest. > > > > > > > > > > If the additional information you are looking for can be retrieved > > > > > within the running Linux guest, using an appropriately privieleged > > > > > userspace process, then I would recommend considering an extension to > > > > > the qemu guest agent. The management layer (libvirt, [...]) could > > > > > first invoke the guest agent (a process with root privileges running > > > > > in the guest) from the host side, through virtio-serial. The new guest > > > > > agent command would return the information necessary to deal with > > > > > KASLR. Then the management layer would initiate the dump like always. > > > > > Finally, the extra information would be combined with (or placed > > > > > beside) the dump file in some way. > > > > > > > > > > So, this proposal would affect the guest agent and the management > > > > > layer (= libvirt). > > > > > > > > Given that we already dislike "paging=true", enlightening > > > > dump-guest-memory with even more guest-specific insight is the wrong > > > > approach, IMO. That kind of knowledge belongs to the guest agent. > > > > > > If you're trying to debug a hung/panicked guest, then using a guest > > > agent to fetch info is a complete non-starter as it'll be dead. > > > > So don't wait. Management software can make this query immediately > > after the guest agent goes live. The information needed won't change. > > That doesn't help with trying to diagnose a crash during boot up, since > the guest agent isn't running till fairly late. 
I'm also concerned that > the QEMU guest agent is likely to be far from widely deployed in guests, > so reliance on the guest agent will mean the dump facility is no longer > reliably available. > It'd still be reliably available and useable during early boot, just like it is now, for kernels that don't use KASLR. This proposal is only attempting to *also* address KASLR kernels, for which there is currently no support whatsoever. Call it a best-effort. Of course we can get support for [probably] early boot and guest-agent-less guests using KASLR too if we introduce a paravirt solution, requiring guest kernel and KVM changes. Is it worth it? Thanks, drew ^ permalink raw reply [flat|nested] 30+ messages in thread
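For reference, the comparison described in the quoted crash-utility explanation comes down to a single subtraction once one runtime symbol address is known. A minimal Python sketch, assuming the guest can export a vmcoreinfo-style line of the form SYMBOL(_stext)=<hex>; the sample addresses and the helper names are made up for illustration and are not taken from any real guest:

```python
import re


def parse_vmcoreinfo_symbol(vmcoreinfo_text, name):
    """Return the address recorded for SYMBOL(name), or None if it is absent.

    vmcoreinfo stores addresses as bare hexadecimal, without a 0x prefix.
    """
    pattern = re.compile(
        r"^SYMBOL\(" + re.escape(name) + r"\)=([0-9a-fA-F]+)$",
        re.MULTILINE,
    )
    match = pattern.search(vmcoreinfo_text)
    return int(match.group(1), 16) if match else None


def kaslr_offset(runtime_addr, link_addr):
    """Offset between where a symbol lives at runtime and where vmlinux says it is."""
    return runtime_addr - link_addr


# Made-up sample data: a vmcoreinfo fragment and the _stext value from vmlinux.
sample_vmcoreinfo = "OSRELEASE=4.8.0\nSYMBOL(_stext)=ffffffff9a000000\n"
vmlinux_stext = 0xFFFFFFFF81000000

runtime_stext = parse_vmcoreinfo_symbol(sample_vmcoreinfo, "_stext")
offset = kaslr_offset(runtime_stext, vmlinux_stext)
```

An offset of zero would mean the kernel text was not randomized. Whatever channel ends up carrying the data (guest agent, pstore, or a paravirt device), one such symbol address is the minimum the analysis side needs for the virtual-address part of KASLR.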
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-09 12:20 ` Andrew Jones @ 2016-11-09 14:47 ` Daniel P. Berrange 2016-11-09 15:38 ` Laszlo Ersek 0 siblings, 1 reply; 30+ messages in thread From: Daniel P. Berrange @ 2016-11-09 14:47 UTC (permalink / raw) To: Andrew Jones Cc: Laszlo Ersek, Dave Young, qiaonuohan, bhe, anderson, qemu-devel On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote: > On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote: > > On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote: > > > On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote: > > > > On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote: > > > > > On 11/09/16 11:40, Andrew Jones wrote: > > > > > > On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote: > > > > > >> Hi, > > > > > >> > > > > > >> Latest linux kernel enabled kaslr to randomiz phys/virt memory > > > > > >> addresses, we had some effort to support kexec/kdump so that crash > > > > > >> utility can still works in case crashed kernel has kaslr enabled. > > > > > >> > > > > > >> But according to Dave Anderson virsh dump does not work, quoted messages > > > > > >> from Dave below: > > > > > >> > > > > > >> """ > > > > > >> with virsh dump, there's no way of even knowing that KASLR > > > > > >> has randomized the kernel __START_KERNEL_map region, because there is no > > > > > >> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump > > > > > >> vmcoreinfo data to compare against the vmlinux file symbol value. > > > > > >> Unless virsh dump can export some basic virtual memory data, which > > > > > >> they say it can't, I don't see how KASLR can ever be supported. > > > > > >> """ > > > > > >> > > > > > >> I assume virsh dump is using qemu guest memory dump facility so it > > > > > >> should be first addressed in qemu. Thus post this query to qemu devel > > > > > >> list. 
If this is not correct please let me know. > > > > > >> > > > > > >> Could you qemu dump people make it work? Or we can not support virt dump > > > > > >> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64. > > > > > >> > > > > > > > > > > > > When the -kernel command line option is used, then it may be possible > > > > > > to extract some information that could be used to supplement the memory > > > > > > dump that dump-guest-memory provides. However, that would be a specific > > > > > > use. In general, QEMU knows nothing about the guest kernel. It doesn't > > > > > > know where it is in the disk image, and it doesn't even know if it's > > > > > > Linux. > > > > > > > > > > > > Is there anything a guest userspace application could probe from e.g. > > > > > > /proc that would work? If so, then the guest agent could gain a new > > > > > > feature providing that. > > > > > > > > > > I fully agree. This is exactly what I suggested too, independently, in > > > > > the downstream thread, before arriving at this upstream thread. Let me > > > > > quote that email: > > > > > > > > > > On 11/09/16 12:09, Laszlo Ersek wrote: > > > > > > [...] the dump-guest-memory QEMU command supports an option called > > > > > > "paging". Here's its documentation, from the "qapi-schema.json" source > > > > > > file: > > > > > > > > > > > >> # @paging: if true, do paging to get guest's memory mapping. This allows > > > > > >> # using gdb to process the core file. > > > > > >> # > > > > > >> # IMPORTANT: this option can make QEMU allocate several gigabytes > > > > > >> # of RAM. This can happen for a large guest, or a > > > > > >> # malicious guest pretending to be large. > > > > > >> # > > > > > >> # Also, paging=true has the following limitations: > > > > > >> # > > > > > >> # 1. The guest may be in a catastrophic state or can have corrupted > > > > > >> # memory, which cannot be trusted > > > > > >> # 2. The guest can be in real-mode even if paging is enabled. 
For > > > > > >> # example, the guest uses ACPI to sleep, and ACPI sleep state > > > > > >> # goes in real-mode > > > > > >> # 3. Currently only supported on i386 and x86_64. > > > > > >> # > > > > > > > > > > > > "virsh dump --memory-only" sets paging=false, for obvious reasons. > > > > > > > > > > > > [...] the dump-guest-memory command provides a raw snapshot of the > > > > > > virtual machine's memory (and of the registers of the VCPUs); it is > > > > > > not enlightened about the guest. > > > > > > > > > > > > If the additional information you are looking for can be retrieved > > > > > > within the running Linux guest, using an appropriately privieleged > > > > > > userspace process, then I would recommend considering an extension to > > > > > > the qemu guest agent. The management layer (libvirt, [...]) could > > > > > > first invoke the guest agent (a process with root privileges running > > > > > > in the guest) from the host side, through virtio-serial. The new guest > > > > > > agent command would return the information necessary to deal with > > > > > > KASLR. Then the management layer would initiate the dump like always. > > > > > > Finally, the extra information would be combined with (or placed > > > > > > beside) the dump file in some way. > > > > > > > > > > > > So, this proposal would affect the guest agent and the management > > > > > > layer (= libvirt). > > > > > > > > > > Given that we already dislike "paging=true", enlightening > > > > > dump-guest-memory with even more guest-specific insight is the wrong > > > > > approach, IMO. That kind of knowledge belongs to the guest agent. > > > > > > > > If you're trying to debug a hung/panicked guest, then using a guest > > > > agent to fetch info is a complete non-starter as it'll be dead. > > > > > > So don't wait. Management software can make this query immediately > > > after the guest agent goes live. The information needed won't change. 
> > > > That doesn't help with trying to diagnose a crash during boot up, since > > the guest agent isn't running till fairly late. I'm also concerned that > > the QEMU guest agent is likely to be far from widely deployed in guests, > > so reliance on the guest agent will mean the dump facility is no longer > > reliably available. > > > > It'd still be reliably available and useable during early boot, just like > it is now, for kernels that don't use KASLR. This proposal is only > attempting to *also* address KASLR kernels, for which there is currently > no support whatsoever. Call it a best-effort. > > Of course we can get support for [probably] early boot and > guest-agent-less guests using KASLR too if we introduce a paravirt > solution, requiring guest kernel and KVM changes. Is it worth it? There's a standard for persistent storage that is intended to allow the kernel to dump out data at time of crash: https://lwn.net/Articles/434821/ and there are some recent patches to provide a QEMU backend. Could we leverage that facility to get the data we need from the guest kernel? Instead of only using pstore at time of crash, the kernel could see that it's running on KVM, and write out the paging data to pstore. So when QEMU later generates a core dump, it can grab the corresponding data from the pstore backend? This still requires an extra device to be configured, but at least we would not have to invent yet another paravirt device ourselves, just use the existing framework. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :| ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-09 14:47 ` Daniel P. Berrange @ 2016-11-09 15:38 ` Laszlo Ersek 2016-11-09 16:01 ` Daniel P. Berrange 2016-11-14 5:32 ` Dave Young 0 siblings, 2 replies; 30+ messages in thread From: Laszlo Ersek @ 2016-11-09 15:38 UTC (permalink / raw) To: Daniel P. Berrange, Andrew Jones Cc: Dave Young, qiaonuohan, bhe, anderson, qemu-devel On 11/09/16 15:47, Daniel P. Berrange wrote: > On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote: >> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote: >>> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote: >>>> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote: >>>>> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote: >>>>>> On 11/09/16 11:40, Andrew Jones wrote: >>>>>>> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Latest linux kernel enabled kaslr to randomiz phys/virt memory >>>>>>>> addresses, we had some effort to support kexec/kdump so that crash >>>>>>>> utility can still works in case crashed kernel has kaslr enabled. >>>>>>>> >>>>>>>> But according to Dave Anderson virsh dump does not work, quoted messages >>>>>>>> from Dave below: >>>>>>>> >>>>>>>> """ >>>>>>>> with virsh dump, there's no way of even knowing that KASLR >>>>>>>> has randomized the kernel __START_KERNEL_map region, because there is no >>>>>>>> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump >>>>>>>> vmcoreinfo data to compare against the vmlinux file symbol value. >>>>>>>> Unless virsh dump can export some basic virtual memory data, which >>>>>>>> they say it can't, I don't see how KASLR can ever be supported. >>>>>>>> """ >>>>>>>> >>>>>>>> I assume virsh dump is using qemu guest memory dump facility so it >>>>>>>> should be first addressed in qemu. Thus post this query to qemu devel >>>>>>>> list. 
If this is not correct please let me know. >>>>>>>> >>>>>>>> Could you qemu dump people make it work? Or we can not support virt dump >>>>>>>> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64. >>>>>>>> >>>>>>> >>>>>>> When the -kernel command line option is used, then it may be possible >>>>>>> to extract some information that could be used to supplement the memory >>>>>>> dump that dump-guest-memory provides. However, that would be a specific >>>>>>> use. In general, QEMU knows nothing about the guest kernel. It doesn't >>>>>>> know where it is in the disk image, and it doesn't even know if it's >>>>>>> Linux. >>>>>>> >>>>>>> Is there anything a guest userspace application could probe from e.g. >>>>>>> /proc that would work? If so, then the guest agent could gain a new >>>>>>> feature providing that. >>>>>> >>>>>> I fully agree. This is exactly what I suggested too, independently, in >>>>>> the downstream thread, before arriving at this upstream thread. Let me >>>>>> quote that email: >>>>>> >>>>>> On 11/09/16 12:09, Laszlo Ersek wrote: >>>>>>> [...] the dump-guest-memory QEMU command supports an option called >>>>>>> "paging". Here's its documentation, from the "qapi-schema.json" source >>>>>>> file: >>>>>>> >>>>>>>> # @paging: if true, do paging to get guest's memory mapping. This allows >>>>>>>> # using gdb to process the core file. >>>>>>>> # >>>>>>>> # IMPORTANT: this option can make QEMU allocate several gigabytes >>>>>>>> # of RAM. This can happen for a large guest, or a >>>>>>>> # malicious guest pretending to be large. >>>>>>>> # >>>>>>>> # Also, paging=true has the following limitations: >>>>>>>> # >>>>>>>> # 1. The guest may be in a catastrophic state or can have corrupted >>>>>>>> # memory, which cannot be trusted >>>>>>>> # 2. The guest can be in real-mode even if paging is enabled. For >>>>>>>> # example, the guest uses ACPI to sleep, and ACPI sleep state >>>>>>>> # goes in real-mode >>>>>>>> # 3. 
Currently only supported on i386 and x86_64. >>>>>>>> # >>>>>>> >>>>>>> "virsh dump --memory-only" sets paging=false, for obvious reasons. >>>>>>> >>>>>>> [...] the dump-guest-memory command provides a raw snapshot of the >>>>>>> virtual machine's memory (and of the registers of the VCPUs); it is >>>>>>> not enlightened about the guest. >>>>>>> >>>>>>> If the additional information you are looking for can be retrieved >>>>>>> within the running Linux guest, using an appropriately privieleged >>>>>>> userspace process, then I would recommend considering an extension to >>>>>>> the qemu guest agent. The management layer (libvirt, [...]) could >>>>>>> first invoke the guest agent (a process with root privileges running >>>>>>> in the guest) from the host side, through virtio-serial. The new guest >>>>>>> agent command would return the information necessary to deal with >>>>>>> KASLR. Then the management layer would initiate the dump like always. >>>>>>> Finally, the extra information would be combined with (or placed >>>>>>> beside) the dump file in some way. >>>>>>> >>>>>>> So, this proposal would affect the guest agent and the management >>>>>>> layer (= libvirt). >>>>>> >>>>>> Given that we already dislike "paging=true", enlightening >>>>>> dump-guest-memory with even more guest-specific insight is the wrong >>>>>> approach, IMO. That kind of knowledge belongs to the guest agent. >>>>> >>>>> If you're trying to debug a hung/panicked guest, then using a guest >>>>> agent to fetch info is a complete non-starter as it'll be dead. Yes, I realized this a while after posting... >>>> So don't wait. Management software can make this query immediately >>>> after the guest agent goes live. The information needed won't change. ... and then figured this would solve the problem. >>> That doesn't help with trying to diagnose a crash during boot up, since >>> the guest agent isn't running till fairly late. 
I'm also concerned that >>> the QEMU guest agent is likely to be far from widely deployed in guests, I have no hard data, but from the recent Fedora and RHEL-7 guest installations I've done, it seems like qga is installed automatically. (Not sure if that's because Anaconda realizes it's installing the OS in a VM.) Once I made sure there was an appropriate virtio-serial config in the domain XMLs, I could talk to the agents (mainly for fstrim's sake) immediately. >>> so reliance on the guest agent will mean the dump facility is no longer >>> reliably available. >>> >> >> It'd still be reliably available and useable during early boot, just like >> it is now, for kernels that don't use KASLR. This proposal is only >> attempting to *also* address KASLR kernels, for which there is currently >> no support whatsoever. Call it a best-effort. >> >> Of course we can get support for [probably] early boot and >> guest-agent-less guests using KASLR too if we introduce a paravirt >> solution, requiring guest kernel and KVM changes. Is it worth it? > > There's a standard for persistent storage that is intended to allow > the kernel to dump out data at time of crash: > > https://lwn.net/Articles/434821/ > > and there are some recent patches to provide a QEMU backend. Could we > leverage that facility to get the data we need from the guest kernel? > > Instead of only using pstore at time of crash, the kernel could see > that it's running on KVM, and write out the paging data to pstore. So > when QEMU later generates a core dump, it can grab the corresponding > data from the pstore backend? > > This still requires an extra device to be configured, but at least we > would not have to invent yet another paravirt device ourselves, just > use the existing framework. Not disagreeing, I'd just like to point out that the kernel can also crash before the extra device (the pstore driver) is configured (especially if the driver is built as a module). 
Laszlo ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-09 15:38 ` Laszlo Ersek @ 2016-11-09 16:01 ` Daniel P. Berrange 2016-11-14 10:27 ` Paolo Bonzini 2016-11-14 5:32 ` Dave Young 1 sibling, 1 reply; 30+ messages in thread From: Daniel P. Berrange @ 2016-11-09 16:01 UTC (permalink / raw) To: Laszlo Ersek Cc: Andrew Jones, Dave Young, qiaonuohan, bhe, anderson, qemu-devel On Wed, Nov 09, 2016 at 04:38:36PM +0100, Laszlo Ersek wrote: > On 11/09/16 15:47, Daniel P. Berrange wrote: > >>> That doesn't help with trying to diagnose a crash during boot up, since > >>> the guest agent isn't running till fairly late. I'm also concerned that > >>> the QEMU guest agent is likely to be far from widely deployed in guests, > > I have no hard data, but from the recent Fedora and RHEL-7 guest > installations I've done, it seems like qga is installed automatically. > (Not sure if that's because Anaconda realizes it's installing the OS in > a VM.) Once I made sure there was an appropriate virtio-serial config in > the domain XMLs, I could talk to the agents (mainly for fstrim's sake) > immediately. I'm thinking about cloud deployments where people rarely use Anaconda directly - they'll use a pre-built cloud image, or customize the basic cloud image. Neither Fedora nor Ubuntu includes the qemu guest agent in their cloud images AFAICT, so very few OpenStack deployments will have the QEMU guest agent installed at this time. Of course we could try to get distros to embed the qemu guest agent by default, but it's not clear if we'd succeed, given how aggressive they are at stripping stuff out to create the smallest practical images. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :| ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-09 16:01 ` Daniel P. Berrange @ 2016-11-14 10:27 ` Paolo Bonzini 0 siblings, 0 replies; 30+ messages in thread From: Paolo Bonzini @ 2016-11-14 10:27 UTC (permalink / raw) To: Daniel P. Berrange, Laszlo Ersek Cc: Andrew Jones, bhe, qemu-devel, qiaonuohan, anderson, Dave Young On 09/11/2016 17:01, Daniel P. Berrange wrote: > On Wed, Nov 09, 2016 at 04:38:36PM +0100, Laszlo Ersek wrote: >> On 11/09/16 15:47, Daniel P. Berrange wrote: >>>>> That doesn't help with trying to diagnose a crash during boot up, since >>>>> the guest agent isn't running till fairly late. I'm also concerned that >>>>> the QEMU guest agent is likely to be far from widely deployed in guests, >> >> I have no hard data, but from the recent Fedora and RHEL-7 guest >> installations I've done, it seems like qga is installed automatically. >> (Not sure if that's because Anaconda realizes it's installing the OS in >> a VM.) Once I made sure there was an appropriate virtio-serial config in >> the domain XMLs, I could talk to the agents (mainly for fstrim's sake) >> immediately. > > I'm thinking about cloud deployments where people rarely use Anaconda > directly - they'll use a pre-built cloud image, or customize the basic > cloud image. Neither Fedora nor Ubuntu includes the qemu guest agent in > their cloud images AFAICT, so very few OpenStack deployments will have > the QEMU guest agent installed at this time. That's a bug in my opinion. The guest agent is necessary in order to avoid losing data in snapshots. We should not only get distros to embed the guest agent, but also to include the relevant freeze/thaw hooks. Paolo > Of course we could try to get distros to embed the qemu guest agent by > default, but it's not clear if we'd succeed, given how aggressive > they are at stripping stuff out to create the smallest practical > images. > > Regards, > Daniel > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-09 15:38 ` Laszlo Ersek 2016-11-09 16:01 ` Daniel P. Berrange @ 2016-11-14 5:32 ` Dave Young 2016-11-14 9:47 ` Andrew Jones 2016-11-14 10:10 ` Daniel P. Berrange 1 sibling, 2 replies; 30+ messages in thread From: Dave Young @ 2016-11-14 5:32 UTC (permalink / raw) To: Laszlo Ersek Cc: Daniel P. Berrange, Andrew Jones, qiaonuohan, bhe, anderson, qemu-devel On 11/09/16 at 04:38pm, Laszlo Ersek wrote: > On 11/09/16 15:47, Daniel P. Berrange wrote: > > On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote: > >> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote: > >>> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote: > >>>> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote: > >>>>> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote: > >>>>>> On 11/09/16 11:40, Andrew Jones wrote: > >>>>>>> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote: > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> Latest linux kernel enabled kaslr to randomiz phys/virt memory > >>>>>>>> addresses, we had some effort to support kexec/kdump so that crash > >>>>>>>> utility can still works in case crashed kernel has kaslr enabled. > >>>>>>>> > >>>>>>>> But according to Dave Anderson virsh dump does not work, quoted messages > >>>>>>>> from Dave below: > >>>>>>>> > >>>>>>>> """ > >>>>>>>> with virsh dump, there's no way of even knowing that KASLR > >>>>>>>> has randomized the kernel __START_KERNEL_map region, because there is no > >>>>>>>> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump > >>>>>>>> vmcoreinfo data to compare against the vmlinux file symbol value. > >>>>>>>> Unless virsh dump can export some basic virtual memory data, which > >>>>>>>> they say it can't, I don't see how KASLR can ever be supported. 
> >>>>>>>> """ > >>>>>>>> > >>>>>>>> I assume virsh dump is using qemu guest memory dump facility so it > >>>>>>>> should be first addressed in qemu. Thus post this query to qemu devel > >>>>>>>> list. If this is not correct please let me know. > >>>>>>>> > >>>>>>>> Could you qemu dump people make it work? Or we can not support virt dump > >>>>>>>> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64. > >>>>>>>> > >>>>>>> > >>>>>>> When the -kernel command line option is used, then it may be possible > >>>>>>> to extract some information that could be used to supplement the memory > >>>>>>> dump that dump-guest-memory provides. However, that would be a specific > >>>>>>> use. In general, QEMU knows nothing about the guest kernel. It doesn't > >>>>>>> know where it is in the disk image, and it doesn't even know if it's > >>>>>>> Linux. > >>>>>>> > >>>>>>> Is there anything a guest userspace application could probe from e.g. > >>>>>>> /proc that would work? If so, then the guest agent could gain a new > >>>>>>> feature providing that. > >>>>>> > >>>>>> I fully agree. This is exactly what I suggested too, independently, in > >>>>>> the downstream thread, before arriving at this upstream thread. Let me > >>>>>> quote that email: > >>>>>> > >>>>>> On 11/09/16 12:09, Laszlo Ersek wrote: > >>>>>>> [...] the dump-guest-memory QEMU command supports an option called > >>>>>>> "paging". Here's its documentation, from the "qapi-schema.json" source > >>>>>>> file: > >>>>>>> > >>>>>>>> # @paging: if true, do paging to get guest's memory mapping. This allows > >>>>>>>> # using gdb to process the core file. > >>>>>>>> # > >>>>>>>> # IMPORTANT: this option can make QEMU allocate several gigabytes > >>>>>>>> # of RAM. This can happen for a large guest, or a > >>>>>>>> # malicious guest pretending to be large. > >>>>>>>> # > >>>>>>>> # Also, paging=true has the following limitations: > >>>>>>>> # > >>>>>>>> # 1. 
The guest may be in a catastrophic state or can have corrupted > >>>>>>>> # memory, which cannot be trusted > >>>>>>>> # 2. The guest can be in real-mode even if paging is enabled. For > >>>>>>>> # example, the guest uses ACPI to sleep, and ACPI sleep state > >>>>>>>> # goes in real-mode > >>>>>>>> # 3. Currently only supported on i386 and x86_64. > >>>>>>>> # > >>>>>>> > >>>>>>> "virsh dump --memory-only" sets paging=false, for obvious reasons. > >>>>>>> > >>>>>>> [...] the dump-guest-memory command provides a raw snapshot of the > >>>>>>> virtual machine's memory (and of the registers of the VCPUs); it is > >>>>>>> not enlightened about the guest. > >>>>>>> > >>>>>>> If the additional information you are looking for can be retrieved > >>>>>>> within the running Linux guest, using an appropriately privieleged > >>>>>>> userspace process, then I would recommend considering an extension to > >>>>>>> the qemu guest agent. The management layer (libvirt, [...]) could > >>>>>>> first invoke the guest agent (a process with root privileges running > >>>>>>> in the guest) from the host side, through virtio-serial. The new guest > >>>>>>> agent command would return the information necessary to deal with > >>>>>>> KASLR. Then the management layer would initiate the dump like always. > >>>>>>> Finally, the extra information would be combined with (or placed > >>>>>>> beside) the dump file in some way. > >>>>>>> > >>>>>>> So, this proposal would affect the guest agent and the management > >>>>>>> layer (= libvirt). > >>>>>> > >>>>>> Given that we already dislike "paging=true", enlightening > >>>>>> dump-guest-memory with even more guest-specific insight is the wrong > >>>>>> approach, IMO. That kind of knowledge belongs to the guest agent. > >>>>> > >>>>> If you're trying to debug a hung/panicked guest, then using a guest > >>>>> agent to fetch info is a complete non-starter as it'll be dead. > > Yes, I realized this a while after posting... > > >>>> So don't wait. 
Management software can make this query immediately > >>>> after the guest agent goes live. The information needed won't change. > > ... and then figured this would solve the problem. > > >>> That doesn't help with trying to diagnose a crash during boot up, since > >>> the guest agent isn't running till fairly late. I'm also concerned that > >>> the QEMU guest agent is likely to be far from widely deployed in guests, > > I have no hard data, but from the recent Fedora and RHEL-7 guest > installations I've done, it seems like qga is installed automatically. > (Not sure if that's because Anaconda realizes it's installing the OS in > a VM.) Once I made sure there was an appropriate virtio-serial config in > the domain XMLs, I could talk to the agents (mainly for fstrim's sake) > immediately. > > >>> so reliance on the guest agent will mean the dump facility is no longer > >>> reliably available. > >>> > >> > >> It'd still be reliably available and useable during early boot, just like > >> it is now, for kernels that don't use KASLR. This proposal is only > >> attempting to *also* address KASLR kernels, for which there is currently > >> no support whatsoever. Call it a best-effort. > >> > >> Of course we can get support for [probably] early boot and > >> guest-agent-less guests using KASLR too if we introduce a paravirt > >> solution, requiring guest kernel and KVM changes. Is it worth it? > > > > There's a standard for persistent storage that is intended to allow > > the kernel to dump out data at time of crash: > > > > https://lwn.net/Articles/434821/ > > > > and there's some recent patches to provide a QEMU backend. Could we > > leverage that facility to get the data we need from the guest kernel ? > > > > Instead of only using pstore at time of crash, the kernel could see > > that its running on KVM, and write out the paging data to pstore. So > > when QEMU later generates a core dump, it can grab the corresponding > > data from pstore backend ? 
> >
> > Still requires an extra device, to be configured, but at lesat we
> > would not have to invent yet another paravirt device ourselves, just
> > use the existing framework.
>
> Not disagreeing, I'd just like to point out that the kernel can also
> crash before the extra device (the pstore driver) is configured
> (especially if the driver is built as a module).

A boot-phase crash is also a problem for kdump, but hopefully boot-phase
crashes will be found and fixed early. Run-time problems are harder, so
a solution that covers only them will still be helpful.

I'm not a virt expert, but comparing the guest agent and pstore, I would
vote for the guest agent; it is ready to work on now, no? For pstore,
I'm not sure how to provide a pstore device for all guests. I know a
UEFI guest can use its NVRAM, but introducing a general pstore device
sounds hard.

Thanks
Dave

^ permalink raw reply [flat|nested] 30+ messages in thread
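The piece of information the quoted crash-utility complaint asks for, the runtime value of `_stext` to compare against the vmlinux symbol value, can be sketched from guest userspace as follows. This is a minimal sketch, not anything proposed in the thread: it assumes the usual `address type name` layout of /proc/kallsyms, assumes the static `_stext` value is supplied by the caller (for example read out of the matching vmlinux), and treats a zero address as "hidden from this reader". The function names and the sample addresses are made up for illustration.

```python
from typing import Dict, Optional


def parse_kallsyms(text: str) -> Dict[str, int]:
    """Map symbol names to addresses from /proc/kallsyms-style text."""
    symbols: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        address, _kind, name = fields[:3]
        # Keep the first occurrence if a name appears more than once.
        symbols.setdefault(name, int(address, 16))
    return symbols


def kaslr_offset(kallsyms_text: str, static_stext: int) -> Optional[int]:
    """Return runtime _stext minus the vmlinux _stext, or None if unknown."""
    runtime_stext = parse_kallsyms(kallsyms_text).get("_stext")
    if not runtime_stext:
        # Symbol missing, or addresses hidden (shown as zero) from this reader.
        return None
    return runtime_stext - static_stext


# Made-up sample values, for illustration only.
SAMPLE = "ffffffff9a000000 T _stext\nffffffff9a0001c0 T do_one_initcall\n"
STATIC_STEXT = 0xFFFFFFFF81000000
print(hex(kaslr_offset(SAMPLE, STATIC_STEXT)))
```

In a real guest the text would come from reading /proc/kallsyms as a sufficiently privileged process, which is the role the thread assigns to the guest agent.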
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-14 5:32 ` Dave Young @ 2016-11-14 9:47 ` Andrew Jones 2016-11-16 2:48 ` Dave Young 2016-11-14 10:10 ` Daniel P. Berrange 1 sibling, 1 reply; 30+ messages in thread From: Andrew Jones @ 2016-11-14 9:47 UTC (permalink / raw) To: Dave Young; +Cc: Laszlo Ersek, bhe, qemu-devel, qiaonuohan, anderson On Mon, Nov 14, 2016 at 01:32:56PM +0800, Dave Young wrote: > On 11/09/16 at 04:38pm, Laszlo Ersek wrote: > > On 11/09/16 15:47, Daniel P. Berrange wrote: > > > On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote: > > >> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote: > > >>> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote: > > >>>> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote: > > >>>>> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote: > > >>>>>> On 11/09/16 11:40, Andrew Jones wrote: > > >>>>>>> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote: > > >>>>>>>> Hi, > > >>>>>>>> > > >>>>>>>> Latest linux kernel enabled kaslr to randomiz phys/virt memory > > >>>>>>>> addresses, we had some effort to support kexec/kdump so that crash > > >>>>>>>> utility can still works in case crashed kernel has kaslr enabled. > > >>>>>>>> > > >>>>>>>> But according to Dave Anderson virsh dump does not work, quoted messages > > >>>>>>>> from Dave below: > > >>>>>>>> > > >>>>>>>> """ > > >>>>>>>> with virsh dump, there's no way of even knowing that KASLR > > >>>>>>>> has randomized the kernel __START_KERNEL_map region, because there is no > > >>>>>>>> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump > > >>>>>>>> vmcoreinfo data to compare against the vmlinux file symbol value. > > >>>>>>>> Unless virsh dump can export some basic virtual memory data, which > > >>>>>>>> they say it can't, I don't see how KASLR can ever be supported. 
> > >>>>>>>> """ > > >>>>>>>> > > >>>>>>>> I assume virsh dump is using qemu guest memory dump facility so it > > >>>>>>>> should be first addressed in qemu. Thus post this query to qemu devel > > >>>>>>>> list. If this is not correct please let me know. > > >>>>>>>> > > >>>>>>>> Could you qemu dump people make it work? Or we can not support virt dump > > >>>>>>>> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64. > > >>>>>>>> > > >>>>>>> > > >>>>>>> When the -kernel command line option is used, then it may be possible > > >>>>>>> to extract some information that could be used to supplement the memory > > >>>>>>> dump that dump-guest-memory provides. However, that would be a specific > > >>>>>>> use. In general, QEMU knows nothing about the guest kernel. It doesn't > > >>>>>>> know where it is in the disk image, and it doesn't even know if it's > > >>>>>>> Linux. > > >>>>>>> > > >>>>>>> Is there anything a guest userspace application could probe from e.g. > > >>>>>>> /proc that would work? If so, then the guest agent could gain a new > > >>>>>>> feature providing that. > > >>>>>> > > >>>>>> I fully agree. This is exactly what I suggested too, independently, in > > >>>>>> the downstream thread, before arriving at this upstream thread. Let me > > >>>>>> quote that email: > > >>>>>> > > >>>>>> On 11/09/16 12:09, Laszlo Ersek wrote: > > >>>>>>> [...] the dump-guest-memory QEMU command supports an option called > > >>>>>>> "paging". Here's its documentation, from the "qapi-schema.json" source > > >>>>>>> file: > > >>>>>>> > > >>>>>>>> # @paging: if true, do paging to get guest's memory mapping. This allows > > >>>>>>>> # using gdb to process the core file. > > >>>>>>>> # > > >>>>>>>> # IMPORTANT: this option can make QEMU allocate several gigabytes > > >>>>>>>> # of RAM. This can happen for a large guest, or a > > >>>>>>>> # malicious guest pretending to be large. 
> > >>>>>>>> # > > >>>>>>>> # Also, paging=true has the following limitations: > > >>>>>>>> # > > >>>>>>>> # 1. The guest may be in a catastrophic state or can have corrupted > > >>>>>>>> # memory, which cannot be trusted > > >>>>>>>> # 2. The guest can be in real-mode even if paging is enabled. For > > >>>>>>>> # example, the guest uses ACPI to sleep, and ACPI sleep state > > >>>>>>>> # goes in real-mode > > >>>>>>>> # 3. Currently only supported on i386 and x86_64. > > >>>>>>>> # > > >>>>>>> > > >>>>>>> "virsh dump --memory-only" sets paging=false, for obvious reasons. > > >>>>>>> > > >>>>>>> [...] the dump-guest-memory command provides a raw snapshot of the > > >>>>>>> virtual machine's memory (and of the registers of the VCPUs); it is > > >>>>>>> not enlightened about the guest. > > >>>>>>> > > >>>>>>> If the additional information you are looking for can be retrieved > > >>>>>>> within the running Linux guest, using an appropriately privieleged > > >>>>>>> userspace process, then I would recommend considering an extension to > > >>>>>>> the qemu guest agent. The management layer (libvirt, [...]) could > > >>>>>>> first invoke the guest agent (a process with root privileges running > > >>>>>>> in the guest) from the host side, through virtio-serial. The new guest > > >>>>>>> agent command would return the information necessary to deal with > > >>>>>>> KASLR. Then the management layer would initiate the dump like always. > > >>>>>>> Finally, the extra information would be combined with (or placed > > >>>>>>> beside) the dump file in some way. > > >>>>>>> > > >>>>>>> So, this proposal would affect the guest agent and the management > > >>>>>>> layer (= libvirt). > > >>>>>> > > >>>>>> Given that we already dislike "paging=true", enlightening > > >>>>>> dump-guest-memory with even more guest-specific insight is the wrong > > >>>>>> approach, IMO. That kind of knowledge belongs to the guest agent. 
> > >>>>> > > >>>>> If you're trying to debug a hung/panicked guest, then using a guest > > >>>>> agent to fetch info is a complete non-starter as it'll be dead. > > > > Yes, I realized this a while after posting... > > > > >>>> So don't wait. Management software can make this query immediately > > >>>> after the guest agent goes live. The information needed won't change. > > > > ... and then figured this would solve the problem. > > > > >>> That doesn't help with trying to diagnose a crash during boot up, since > > >>> the guest agent isn't running till fairly late. I'm also concerned that > > >>> the QEMU guest agent is likely to be far from widely deployed in guests, > > > > I have no hard data, but from the recent Fedora and RHEL-7 guest > > installations I've done, it seems like qga is installed automatically. > > (Not sure if that's because Anaconda realizes it's installing the OS in > > a VM.) Once I made sure there was an appropriate virtio-serial config in > > the domain XMLs, I could talk to the agents (mainly for fstrim's sake) > > immediately. > > > > >>> so reliance on the guest agent will mean the dump facility is no longer > > >>> reliably available. > > >>> > > >> > > >> It'd still be reliably available and useable during early boot, just like > > >> it is now, for kernels that don't use KASLR. This proposal is only > > >> attempting to *also* address KASLR kernels, for which there is currently > > >> no support whatsoever. Call it a best-effort. > > >> > > >> Of course we can get support for [probably] early boot and > > >> guest-agent-less guests using KASLR too if we introduce a paravirt > > >> solution, requiring guest kernel and KVM changes. Is it worth it? > > > > > > There's a standard for persistent storage that is intended to allow > > > the kernel to dump out data at time of crash: > > > > > > https://lwn.net/Articles/434821/ > > > > > > and there's some recent patches to provide a QEMU backend. 
Could we
> > > leverage that facility to get the data we need from the guest kernel ?
> > >
> > > Instead of only using pstore at time of crash, the kernel could see
> > > that its running on KVM, and write out the paging data to pstore. So
> > > when QEMU later generates a core dump, it can grab the corresponding
> > > data from pstore backend ?
> > >
> > > Still requires an extra device, to be configured, but at lesat we
> > > would not have to invent yet another paravirt device ourselves, just
> > > use the existing framework.
> >
> > Not disagreeing, I'd just like to point out that the kernel can also
> > crash before the extra device (the pstore driver) is configured
> > (especially if the driver is built as a module).
>
> Boot phase crash is also a problem for kdump, but hopefully the boot
> phase crash will be found early and get fixed early. The run time
> problems are harder, it will still be helpful.
>
> I'm not a virt expert, but from my feeling comparint guest agent and
> pstore I would vote for guest agent, it is ready to work on now, no?
> For pstore I'm not sure how to make a pstore device for all guests. I
> know uefi guest can use its nvram, but introducing some general pstore
> sounds hard..
>

Nothing is stopping us from doing both, eventually. Care should be taken
on the management side to make it general enough. It should be designed
such that it can use guest-agent now, but in no way is bound to
guest-agent. We can decide later if we want to replace guest-agent with
some paravirt solution.

Nothing is blocking guest-agent patches now, that I know of.

Thanks,
drew

^ permalink raw reply [flat|nested] 30+ messages in thread
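One way to read "can use guest-agent now, but in no way is bound to guest-agent" is that the management side asks an ordered list of interchangeable sources for the guest-specific metadata and takes the first answer. The following is only a sketch of that idea under that assumption; `collect_metadata`, `guest_agent_source` and `paravirt_source` are hypothetical names, the returned values are placeholders, and nothing here reflects an existing libvirt or QEMU interface.

```python
from typing import Callable, Optional, Sequence

# A source returns guest-specific dump metadata, or None if it has nothing.
MetadataSource = Callable[[], Optional[dict]]


def collect_metadata(sources: Sequence[MetadataSource]) -> Optional[dict]:
    """Return the result of the first source that yields data."""
    for source in sources:
        try:
            result = source()
        except OSError:
            continue  # transport unavailable; try the next source
        if result is not None:
            return result
    return None


def guest_agent_source() -> Optional[dict]:
    """Hypothetical stand-in for data cached from an earlier guest agent query."""
    return {"kaslr_offset": 0x19000000, "via": "guest-agent"}


def paravirt_source() -> Optional[dict]:
    """Hypothetical stand-in for a future paravirt channel; yields nothing yet."""
    return None


print(collect_metadata([paravirt_source, guest_agent_source]))
```

With this shape, adding or preferring a paravirt channel later means changing the list of sources, not the code that consumes the metadata.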
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-14 9:47 ` Andrew Jones @ 2016-11-16 2:48 ` Dave Young 0 siblings, 0 replies; 30+ messages in thread From: Dave Young @ 2016-11-16 2:48 UTC (permalink / raw) To: Andrew Jones; +Cc: Laszlo Ersek, bhe, qemu-devel, qiaonuohan, anderson On 11/14/16 at 10:47am, Andrew Jones wrote: > On Mon, Nov 14, 2016 at 01:32:56PM +0800, Dave Young wrote: > > On 11/09/16 at 04:38pm, Laszlo Ersek wrote: > > > On 11/09/16 15:47, Daniel P. Berrange wrote: > > > > On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote: > > > >> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote: > > > >>> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote: > > > >>>> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote: > > > >>>>> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote: > > > >>>>>> On 11/09/16 11:40, Andrew Jones wrote: > > > >>>>>>> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote: > > > >>>>>>>> Hi, > > > >>>>>>>> > > > >>>>>>>> Latest linux kernel enabled kaslr to randomiz phys/virt memory > > > >>>>>>>> addresses, we had some effort to support kexec/kdump so that crash > > > >>>>>>>> utility can still works in case crashed kernel has kaslr enabled. > > > >>>>>>>> > > > >>>>>>>> But according to Dave Anderson virsh dump does not work, quoted messages > > > >>>>>>>> from Dave below: > > > >>>>>>>> > > > >>>>>>>> """ > > > >>>>>>>> with virsh dump, there's no way of even knowing that KASLR > > > >>>>>>>> has randomized the kernel __START_KERNEL_map region, because there is no > > > >>>>>>>> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump > > > >>>>>>>> vmcoreinfo data to compare against the vmlinux file symbol value. > > > >>>>>>>> Unless virsh dump can export some basic virtual memory data, which > > > >>>>>>>> they say it can't, I don't see how KASLR can ever be supported. 
> > > >>>>>>>> """ > > > >>>>>>>> > > > >>>>>>>> I assume virsh dump is using qemu guest memory dump facility so it > > > >>>>>>>> should be first addressed in qemu. Thus post this query to qemu devel > > > >>>>>>>> list. If this is not correct please let me know. > > > >>>>>>>> > > > >>>>>>>> Could you qemu dump people make it work? Or we can not support virt dump > > > >>>>>>>> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64. > > > >>>>>>>> > > > >>>>>>> > > > >>>>>>> When the -kernel command line option is used, then it may be possible > > > >>>>>>> to extract some information that could be used to supplement the memory > > > >>>>>>> dump that dump-guest-memory provides. However, that would be a specific > > > >>>>>>> use. In general, QEMU knows nothing about the guest kernel. It doesn't > > > >>>>>>> know where it is in the disk image, and it doesn't even know if it's > > > >>>>>>> Linux. > > > >>>>>>> > > > >>>>>>> Is there anything a guest userspace application could probe from e.g. > > > >>>>>>> /proc that would work? If so, then the guest agent could gain a new > > > >>>>>>> feature providing that. > > > >>>>>> > > > >>>>>> I fully agree. This is exactly what I suggested too, independently, in > > > >>>>>> the downstream thread, before arriving at this upstream thread. Let me > > > >>>>>> quote that email: > > > >>>>>> > > > >>>>>> On 11/09/16 12:09, Laszlo Ersek wrote: > > > >>>>>>> [...] the dump-guest-memory QEMU command supports an option called > > > >>>>>>> "paging". Here's its documentation, from the "qapi-schema.json" source > > > >>>>>>> file: > > > >>>>>>> > > > >>>>>>>> # @paging: if true, do paging to get guest's memory mapping. This allows > > > >>>>>>>> # using gdb to process the core file. > > > >>>>>>>> # > > > >>>>>>>> # IMPORTANT: this option can make QEMU allocate several gigabytes > > > >>>>>>>> # of RAM. This can happen for a large guest, or a > > > >>>>>>>> # malicious guest pretending to be large. 
> > > >>>>>>>> # > > > >>>>>>>> # Also, paging=true has the following limitations: > > > >>>>>>>> # > > > >>>>>>>> # 1. The guest may be in a catastrophic state or can have corrupted > > > >>>>>>>> # memory, which cannot be trusted > > > >>>>>>>> # 2. The guest can be in real-mode even if paging is enabled. For > > > >>>>>>>> # example, the guest uses ACPI to sleep, and ACPI sleep state > > > >>>>>>>> # goes in real-mode > > > >>>>>>>> # 3. Currently only supported on i386 and x86_64. > > > >>>>>>>> # > > > >>>>>>> > > > >>>>>>> "virsh dump --memory-only" sets paging=false, for obvious reasons. > > > >>>>>>> > > > >>>>>>> [...] the dump-guest-memory command provides a raw snapshot of the > > > >>>>>>> virtual machine's memory (and of the registers of the VCPUs); it is > > > >>>>>>> not enlightened about the guest. > > > >>>>>>> > > > >>>>>>> If the additional information you are looking for can be retrieved > > > >>>>>>> within the running Linux guest, using an appropriately privieleged > > > >>>>>>> userspace process, then I would recommend considering an extension to > > > >>>>>>> the qemu guest agent. The management layer (libvirt, [...]) could > > > >>>>>>> first invoke the guest agent (a process with root privileges running > > > >>>>>>> in the guest) from the host side, through virtio-serial. The new guest > > > >>>>>>> agent command would return the information necessary to deal with > > > >>>>>>> KASLR. Then the management layer would initiate the dump like always. > > > >>>>>>> Finally, the extra information would be combined with (or placed > > > >>>>>>> beside) the dump file in some way. > > > >>>>>>> > > > >>>>>>> So, this proposal would affect the guest agent and the management > > > >>>>>>> layer (= libvirt). > > > >>>>>> > > > >>>>>> Given that we already dislike "paging=true", enlightening > > > >>>>>> dump-guest-memory with even more guest-specific insight is the wrong > > > >>>>>> approach, IMO. That kind of knowledge belongs to the guest agent. 
> > > >>>>> > > > >>>>> If you're trying to debug a hung/panicked guest, then using a guest > > > >>>>> agent to fetch info is a complete non-starter as it'll be dead. > > > > > > Yes, I realized this a while after posting... > > > > > > >>>> So don't wait. Management software can make this query immediately > > > >>>> after the guest agent goes live. The information needed won't change. > > > > > > ... and then figured this would solve the problem. > > > > > > >>> That doesn't help with trying to diagnose a crash during boot up, since > > > >>> the guest agent isn't running till fairly late. I'm also concerned that > > > >>> the QEMU guest agent is likely to be far from widely deployed in guests, > > > > > > I have no hard data, but from the recent Fedora and RHEL-7 guest > > > installations I've done, it seems like qga is installed automatically. > > > (Not sure if that's because Anaconda realizes it's installing the OS in > > > a VM.) Once I made sure there was an appropriate virtio-serial config in > > > the domain XMLs, I could talk to the agents (mainly for fstrim's sake) > > > immediately. > > > > > > >>> so reliance on the guest agent will mean the dump facility is no longer > > > >>> reliably available. > > > >>> > > > >> > > > >> It'd still be reliably available and useable during early boot, just like > > > >> it is now, for kernels that don't use KASLR. This proposal is only > > > >> attempting to *also* address KASLR kernels, for which there is currently > > > >> no support whatsoever. Call it a best-effort. > > > >> > > > >> Of course we can get support for [probably] early boot and > > > >> guest-agent-less guests using KASLR too if we introduce a paravirt > > > >> solution, requiring guest kernel and KVM changes. Is it worth it? 
> > > >
> > > > There's a standard for persistent storage that is intended to allow
> > > > the kernel to dump out data at time of crash:
> > > >
> > > > https://lwn.net/Articles/434821/
> > > >
> > > > and there's some recent patches to provide a QEMU backend. Could we
> > > > leverage that facility to get the data we need from the guest kernel ?
> > > >
> > > > Instead of only using pstore at time of crash, the kernel could see
> > > > that its running on KVM, and write out the paging data to pstore. So
> > > > when QEMU later generates a core dump, it can grab the corresponding
> > > > data from pstore backend ?
> > > >
> > > > Still requires an extra device, to be configured, but at lesat we
> > > > would not have to invent yet another paravirt device ourselves, just
> > > > use the existing framework.
> > >
> > > Not disagreeing, I'd just like to point out that the kernel can also
> > > crash before the extra device (the pstore driver) is configured
> > > (especially if the driver is built as a module).
> >
> > Boot phase crash is also a problem for kdump, but hopefully the boot
> > phase crash will be found early and get fixed early. The run time
> > problems are harder, it will still be helpful.
> >
> > I'm not a virt expert, but from my feeling comparint guest agent and
> > pstore I would vote for guest agent, it is ready to work on now, no?
> > For pstore I'm not sure how to make a pstore device for all guests. I
> > know uefi guest can use its nvram, but introducing some general pstore
> > sounds hard..
> >
>
> Nothing is stopping us from doing both, eventually. Care should be taken
> on the management side to make it general enough. It should be designed
> such that it can use guest-agent now, but in no way is bound to guest-
> agent. We can decide later if we want to replace guest-agent with some
> paravirt solution.
>
> Nothing is blocking guest-agent patches now, that I know of.

Sounds like a good idea, Drew.

Thanks
Dave

>
> Thanks,
> drew

^ permalink raw reply [flat|nested] 30+ messages in thread
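The flow quoted above (query the guest agent first, then start the dump as usual with paging disabled, then place the extra information beside the dump file) could look roughly like this on the management side. It is a sketch under stated assumptions: only the QMP command name `dump-guest-memory` and its `paging` and `protocol` arguments are taken from QEMU; the helper names, the `.guestinfo.json` sidecar suffix and the sidecar layout are invented for illustration, and no QMP connection is opened here.

```python
import json
from typing import Optional, Tuple


def build_dump_command(dump_path: str) -> str:
    """Serialize a QMP dump-guest-memory request with paging disabled."""
    command = {
        "execute": "dump-guest-memory",
        "arguments": {
            # paging=false: raw guest memory, no walk of guest page tables.
            "paging": False,
            "protocol": "file:" + dump_path,
        },
    }
    return json.dumps(command)


def build_sidecar(dump_path: str, kaslr_offset: Optional[int]) -> Tuple[str, str]:
    """Return (path, contents) for a hypothetical metadata file beside the dump."""
    sidecar_path = dump_path + ".guestinfo.json"  # assumed naming, not a standard
    contents = json.dumps({"dump": dump_path, "kaslr_offset": kaslr_offset})
    return sidecar_path, contents


print(build_dump_command("/var/tmp/guest.core"))
print(build_sidecar("/var/tmp/guest.core", 0x19000000))
```

The offset would have been fetched while the guest agent was still alive, so the sidecar can be written even if the guest has since hung or panicked.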
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-14 5:32 ` Dave Young 2016-11-14 9:47 ` Andrew Jones @ 2016-11-14 10:10 ` Daniel P. Berrange 2016-11-14 10:28 ` Paolo Bonzini 1 sibling, 1 reply; 30+ messages in thread From: Daniel P. Berrange @ 2016-11-14 10:10 UTC (permalink / raw) To: Dave Young Cc: Laszlo Ersek, Andrew Jones, qiaonuohan, bhe, anderson, qemu-devel On Mon, Nov 14, 2016 at 01:32:56PM +0800, Dave Young wrote: > On 11/09/16 at 04:38pm, Laszlo Ersek wrote: > > On 11/09/16 15:47, Daniel P. Berrange wrote: > > > On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote: > > >> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote: > > >>> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote: > > >>>> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote: > > >>>>> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote: > > >>>>>> On 11/09/16 11:40, Andrew Jones wrote: > > >>>>>>> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote: > > >>>>>>>> Hi, > > >>>>>>>> > > >>>>>>>> Latest linux kernel enabled kaslr to randomiz phys/virt memory > > >>>>>>>> addresses, we had some effort to support kexec/kdump so that crash > > >>>>>>>> utility can still works in case crashed kernel has kaslr enabled. > > >>>>>>>> > > >>>>>>>> But according to Dave Anderson virsh dump does not work, quoted messages > > >>>>>>>> from Dave below: > > >>>>>>>> > > >>>>>>>> """ > > >>>>>>>> with virsh dump, there's no way of even knowing that KASLR > > >>>>>>>> has randomized the kernel __START_KERNEL_map region, because there is no > > >>>>>>>> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump > > >>>>>>>> vmcoreinfo data to compare against the vmlinux file symbol value. > > >>>>>>>> Unless virsh dump can export some basic virtual memory data, which > > >>>>>>>> they say it can't, I don't see how KASLR can ever be supported. 
> > >>>>>>>> """ > > >>>>>>>> > > >>>>>>>> I assume virsh dump is using qemu guest memory dump facility so it > > >>>>>>>> should be first addressed in qemu. Thus post this query to qemu devel > > >>>>>>>> list. If this is not correct please let me know. > > >>>>>>>> > > >>>>>>>> Could you qemu dump people make it work? Or we can not support virt dump > > >>>>>>>> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64. > > >>>>>>>> > > >>>>>>> > > >>>>>>> When the -kernel command line option is used, then it may be possible > > >>>>>>> to extract some information that could be used to supplement the memory > > >>>>>>> dump that dump-guest-memory provides. However, that would be a specific > > >>>>>>> use. In general, QEMU knows nothing about the guest kernel. It doesn't > > >>>>>>> know where it is in the disk image, and it doesn't even know if it's > > >>>>>>> Linux. > > >>>>>>> > > >>>>>>> Is there anything a guest userspace application could probe from e.g. > > >>>>>>> /proc that would work? If so, then the guest agent could gain a new > > >>>>>>> feature providing that. > > >>>>>> > > >>>>>> I fully agree. This is exactly what I suggested too, independently, in > > >>>>>> the downstream thread, before arriving at this upstream thread. Let me > > >>>>>> quote that email: > > >>>>>> > > >>>>>> On 11/09/16 12:09, Laszlo Ersek wrote: > > >>>>>>> [...] the dump-guest-memory QEMU command supports an option called > > >>>>>>> "paging". Here's its documentation, from the "qapi-schema.json" source > > >>>>>>> file: > > >>>>>>> > > >>>>>>>> # @paging: if true, do paging to get guest's memory mapping. This allows > > >>>>>>>> # using gdb to process the core file. > > >>>>>>>> # > > >>>>>>>> # IMPORTANT: this option can make QEMU allocate several gigabytes > > >>>>>>>> # of RAM. This can happen for a large guest, or a > > >>>>>>>> # malicious guest pretending to be large. 
> > >>>>>>>> # > > >>>>>>>> # Also, paging=true has the following limitations: > > >>>>>>>> # > > >>>>>>>> # 1. The guest may be in a catastrophic state or can have corrupted > > >>>>>>>> # memory, which cannot be trusted > > >>>>>>>> # 2. The guest can be in real-mode even if paging is enabled. For > > >>>>>>>> # example, the guest uses ACPI to sleep, and ACPI sleep state > > >>>>>>>> # goes in real-mode > > >>>>>>>> # 3. Currently only supported on i386 and x86_64. > > >>>>>>>> # > > >>>>>>> > > >>>>>>> "virsh dump --memory-only" sets paging=false, for obvious reasons. > > >>>>>>> > > >>>>>>> [...] the dump-guest-memory command provides a raw snapshot of the > > >>>>>>> virtual machine's memory (and of the registers of the VCPUs); it is > > >>>>>>> not enlightened about the guest. > > >>>>>>> > > >>>>>>> If the additional information you are looking for can be retrieved > > >>>>>>> within the running Linux guest, using an appropriately privieleged > > >>>>>>> userspace process, then I would recommend considering an extension to > > >>>>>>> the qemu guest agent. The management layer (libvirt, [...]) could > > >>>>>>> first invoke the guest agent (a process with root privileges running > > >>>>>>> in the guest) from the host side, through virtio-serial. The new guest > > >>>>>>> agent command would return the information necessary to deal with > > >>>>>>> KASLR. Then the management layer would initiate the dump like always. > > >>>>>>> Finally, the extra information would be combined with (or placed > > >>>>>>> beside) the dump file in some way. > > >>>>>>> > > >>>>>>> So, this proposal would affect the guest agent and the management > > >>>>>>> layer (= libvirt). > > >>>>>> > > >>>>>> Given that we already dislike "paging=true", enlightening > > >>>>>> dump-guest-memory with even more guest-specific insight is the wrong > > >>>>>> approach, IMO. That kind of knowledge belongs to the guest agent. 
> > >>>>> > > >>>>> If you're trying to debug a hung/panicked guest, then using a guest > > >>>>> agent to fetch info is a complete non-starter as it'll be dead. > > > > Yes, I realized this a while after posting... > > > > >>>> So don't wait. Management software can make this query immediately > > >>>> after the guest agent goes live. The information needed won't change. > > > > ... and then figured this would solve the problem. > > > > >>> That doesn't help with trying to diagnose a crash during boot up, since > > >>> the guest agent isn't running till fairly late. I'm also concerned that > > >>> the QEMU guest agent is likely to be far from widely deployed in guests, > > > > I have no hard data, but from the recent Fedora and RHEL-7 guest > > installations I've done, it seems like qga is installed automatically. > > (Not sure if that's because Anaconda realizes it's installing the OS in > > a VM.) Once I made sure there was an appropriate virtio-serial config in > > the domain XMLs, I could talk to the agents (mainly for fstrim's sake) > > immediately. > > > > >>> so reliance on the guest agent will mean the dump facility is no longer > > >>> reliably available. > > >>> > > >> > > >> It'd still be reliably available and useable during early boot, just like > > >> it is now, for kernels that don't use KASLR. This proposal is only > > >> attempting to *also* address KASLR kernels, for which there is currently > > >> no support whatsoever. Call it a best-effort. > > >> > > >> Of course we can get support for [probably] early boot and > > >> guest-agent-less guests using KASLR too if we introduce a paravirt > > >> solution, requiring guest kernel and KVM changes. Is it worth it? > > > > > > There's a standard for persistent storage that is intended to allow > > > the kernel to dump out data at time of crash: > > > > > > https://lwn.net/Articles/434821/ > > > > > > and there's some recent patches to provide a QEMU backend. 
Could we
> > > leverage that facility to get the data we need from the guest kernel ?
> > >
> > > Instead of only using pstore at time of crash, the kernel could see
> > > that it's running on KVM, and write out the paging data to pstore. So
> > > when QEMU later generates a core dump, it can grab the corresponding
> > > data from pstore backend ?
> > >
> > > Still requires an extra device, to be configured, but at least we
> > > would not have to invent yet another paravirt device ourselves, just
> > > use the existing framework.
> >
> > Not disagreeing, I'd just like to point out that the kernel can also
> > crash before the extra device (the pstore driver) is configured
> > (especially if the driver is built as a module).
>
> Boot phase crash is also a problem for kdump, but hopefully the boot
> phase crash will be found early and get fixed early. The run time
> problems are harder, it will still be helpful.
>
> I'm not a virt expert, but from my feeling comparing guest agent and
> pstore I would vote for guest agent, it is ready to work on now, no?
> For pstore I'm not sure how to make a pstore device for all guests. I
> know uefi guest can use its nvram, but introducing some general pstore
> sounds hard..

There are already patches posted to create a virtio-pstore device for
QEMU, which is what led me to suggest this as an option:

  https://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg00381.html

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-14 10:10 ` Daniel P. Berrange @ 2016-11-14 10:28 ` Paolo Bonzini 2016-11-14 10:33 ` Daniel P. Berrange 0 siblings, 1 reply; 30+ messages in thread From: Paolo Bonzini @ 2016-11-14 10:28 UTC (permalink / raw) To: Daniel P. Berrange, Dave Young Cc: Andrew Jones, bhe, qemu-devel, qiaonuohan, anderson, Laszlo Ersek On 14/11/2016 11:10, Daniel P. Berrange wrote: > There's already patches posted to create a virtio-pstore device for > QEMU, which is what led me to suggest this as an option: > > https://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg00381.html It's also possible to use UEFI as a pstore backend. Paolo ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-14 10:28 ` Paolo Bonzini @ 2016-11-14 10:33 ` Daniel P. Berrange 2016-11-14 11:08 ` Laszlo Ersek 2016-11-14 11:55 ` Paolo Bonzini 0 siblings, 2 replies; 30+ messages in thread From: Daniel P. Berrange @ 2016-11-14 10:33 UTC (permalink / raw) To: Paolo Bonzini Cc: Dave Young, Andrew Jones, bhe, qemu-devel, qiaonuohan, anderson, Laszlo Ersek On Mon, Nov 14, 2016 at 11:28:04AM +0100, Paolo Bonzini wrote: > > > On 14/11/2016 11:10, Daniel P. Berrange wrote: > > There's already patches posted to create a virtio-pstore device for > > QEMU, which is what led me to suggest this as an option: > > > > https://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg00381.html > > It's also possible to use UEFI as a pstore backend. Presumably that'll also require some QEMU patches to provide storage for UEFI's pstore ? Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :| ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-14 10:33 ` Daniel P. Berrange @ 2016-11-14 11:08 ` Laszlo Ersek 2016-11-14 11:55 ` Paolo Bonzini 1 sibling, 0 replies; 30+ messages in thread From: Laszlo Ersek @ 2016-11-14 11:08 UTC (permalink / raw) To: Daniel P. Berrange, Paolo Bonzini Cc: Dave Young, Andrew Jones, bhe, qemu-devel, qiaonuohan, anderson On 11/14/16 11:33, Daniel P. Berrange wrote: > On Mon, Nov 14, 2016 at 11:28:04AM +0100, Paolo Bonzini wrote: >> >> >> On 14/11/2016 11:10, Daniel P. Berrange wrote: >>> There's already patches posted to create a virtio-pstore device for >>> QEMU, which is what led me to suggest this as an option: >>> >>> https://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg00381.html >> >> It's also possible to use UEFI as a pstore backend. > > Presumably that'll also require some QEMU patches to provide storage > for UEFI's pstore ? Using UEFI non-volatile variables as a pstore backend is a guest kernel feature, and it already works transparently with OVMF utilizing QEMU's pflash device. If memory serves, the data to be written are broken into 1KB chunks, and saved as separate UEFI variables under a dedicated namespace GUID. https://bugzilla.redhat.com/show_bug.cgi?id=828497 (Private BZ -- I apologize to the non-RedHatter subscribers that read this.) (Also, not everyone has been enthusiastic about this feature: <https://bugzilla.redhat.com/show_bug.cgi?id=919485>.) Anyway, when I say "it works", I mean it works for the direct purpose of storing data (like saving dmesg at panic), and for retrieving data, from within the guest. (At a subsequent guest boot, possibly.) This is the scope of pstore in general, AIUI (see "Documentation/ABI/testing/pstore"). However, host-side insight into the OVMF/edk2 varstore format remains something we don't, and shouldn't, implement. 
In this regard, the UEFI variables that happen to contain pstore data are no different from other kinds of UEFI variables; they are equally opaque from the host side. (Unless we want to implement and maintain a large utility that reflects and tracks the multi-layer variable driver stack in edk2. "Unless" is rhetorical, we don't want that.) If host-side access is needed to the guest's phys-base / virt-base, then my first preference would be the guest agent (interrogated at guest startup), and my second preference would be virtio-pstore. I reckon virtio-pstore will take a new guest driver, and I suppose the host-side on-disk format is being designed for easy parsing. Thanks Laszlo ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-14 10:33 ` Daniel P. Berrange 2016-11-14 11:08 ` Laszlo Ersek @ 2016-11-14 11:55 ` Paolo Bonzini 1 sibling, 0 replies; 30+ messages in thread From: Paolo Bonzini @ 2016-11-14 11:55 UTC (permalink / raw) To: Daniel P. Berrange Cc: Dave Young, Andrew Jones, bhe, qemu-devel, qiaonuohan, anderson, Laszlo Ersek On 14/11/2016 11:33, Daniel P. Berrange wrote: > On Mon, Nov 14, 2016 at 11:28:04AM +0100, Paolo Bonzini wrote: >> >> >> On 14/11/2016 11:10, Daniel P. Berrange wrote: >>> There's already patches posted to create a virtio-pstore device for >>> QEMU, which is what led me to suggest this as an option: >>> >>> https://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg00381.html >> >> It's also possible to use UEFI as a pstore backend. > > Presumably that'll also require some QEMU patches to provide storage > for UEFI's pstore ? That's just the UEFI variable store. But for some reason Fedora doesn't set CONFIG_EFI_VARS, so the next possibility is to use ACPI ERST. This would not require any change to guests, unlike virtio-pstore. Paolo ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-09 10:40 ` Andrew Jones 2016-11-09 11:26 ` Laszlo Ersek @ 2016-11-09 15:28 ` Dave Anderson 2016-11-14 10:41 ` Paolo Bonzini 1 sibling, 1 reply; 30+ messages in thread From: Dave Anderson @ 2016-11-09 15:28 UTC (permalink / raw) To: Andrew Jones; +Cc: Dave Young, wency, qiaonuohan, lersek, qemu-devel, bhe ----- Original Message ----- > On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote: > > Hi, > > > > Latest linux kernel enabled kaslr to randomiz phys/virt memory > > addresses, we had some effort to support kexec/kdump so that crash > > utility can still works in case crashed kernel has kaslr enabled. > > > > But according to Dave Anderson virsh dump does not work, quoted messages > > from Dave below: > > > > """ > > with virsh dump, there's no way of even knowing that KASLR > > has randomized the kernel __START_KERNEL_map region, because there is no > > virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump > > vmcoreinfo data to compare against the vmlinux file symbol value. > > Unless virsh dump can export some basic virtual memory data, which > > they say it can't, I don't see how KASLR can ever be supported. > > """ > > > > I assume virsh dump is using qemu guest memory dump facility so it > > should be first addressed in qemu. Thus post this query to qemu devel > > list. If this is not correct please let me know. > > > > Could you qemu dump people make it work? Or we can not support virt dump > > as long as KASLR being enabled. Latest Fedora kernel has enabled it in > > x86_64. > > > > When the -kernel command line option is used, then it may be possible > to extract some information that could be used to supplement the memory > dump that dump-guest-memory provides. However, that would be a specific > use. In general, QEMU knows nothing about the guest kernel. 
It doesn't
> know where it is in the disk image, and it doesn't even know if it's
> Linux.
>
> Is there anything a guest userspace application could probe from e.g.
> /proc that would work? If so, then the guest agent could gain a new
> feature providing that.
>
> Thanks,
> drew

I'm not sure whether this "guest userspace agent" is still in play here,
but if there were such a thing, it could theoretically do the same
thing that crash currently does when running on a live system.

Two basic items are needed, whether running live or against a dumpfile:

 (1) the CONFIG_RANDOMIZE_BASE relocation value that modifies the kernel
     virtual address range compiled into the vmlinux file, which starts at
     the hardwired __START_KERNEL_map address.
 (2) the contents of the kernel's "phys_base" symbol.

Both of those are available or can be calculated from the contents of
a kdump header. However, on a live system, it's done like this:

 - /proc/kallsyms is queried for the symbol value of "_text", which would
   be relocated if KASLR is in play. That value is compared against the
   "_text" symbol value compiled into the vmlinux file to determine the
   relocation value generated by CONFIG_RANDOMIZE_BASE.

Given that relocation value, and before any kernel memory is accessed,
crash goes through a backdoor into its embedded gdb module, and modifies
the data structures of all kernel symbols, applying the relocation value.

Once that's done, in order to read kernel symbols from the
statically-mapped kernel region based at __START_KERNEL_map, it
translates a (possibly relocated) kernel virtual address into a
physical address like this:

  physical-address = virtual-address - __START_KERNEL_map + phys_base

But it's a chicken-and-egg deal, because the contents of the "phys_base"
symbol are needed to calculate the physical address, but it can't
read the "phys_base" symbol contents without first knowing its contents.
So on a live system, the "phys_base" is calculated by reading the "Kernel Code:" value from /proc/iomem, and then doing this: phys_base = [Kernel Code: value] - ["_text" symbol value] - __START_KERNEL_map So theoretically, the guest agent could read /proc/iomem and /proc/kallsyms for the information required. (I think...) Dave ^ permalink raw reply [flat|nested] 30+ messages in thread
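The live-system procedure described in the message above can be sketched roughly as follows. This is only an illustration, not code from crash or from the guest agent: the function names are hypothetical, the parsers take the file contents as strings so they can be tried on sample text, and the line layouts assumed for /proc/kallsyms ("address type name") and /proc/iomem ("start-end : Kernel code") are the usual ones on x86_64 Linux. The subtraction is grouped as (_text - __START_KERNEL_map), which is the grouping confirmed later in this thread.

```python
import re

# x86_64 __START_KERNEL_map, the hardwired base of the kernel text mapping.
START_KERNEL_MAP = 0xffffffff80000000


def text_virtual(kallsyms_text):
    """Return the (possibly KASLR-relocated) address of "_text".

    Expects the contents of /proc/kallsyms, one "address type name" per line.
    """
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "_text":
            return int(fields[0], 16)
    raise LookupError("_text not found in kallsyms data")


def kernel_code_physical(iomem_text):
    """Return the physical start of the "Kernel code" range.

    Expects the contents of /proc/iomem, e.g. "  b6000000-b6bfffff : Kernel code".
    """
    pattern = re.compile(r"^\s*([0-9a-f]+)-([0-9a-f]+)\s*:\s*kernel code\s*$",
                         re.IGNORECASE)
    for line in iomem_text.splitlines():
        match = pattern.match(line)
        if match:
            return int(match.group(1), 16)
    raise LookupError("Kernel code range not found in iomem data")


def phys_base(kernel_code_start, text_vaddr):
    """phys_base = [Kernel code start] - (["_text" value] - __START_KERNEL_map)."""
    return kernel_code_start - (text_vaddr - START_KERNEL_MAP)


if __name__ == "__main__":
    # Both files only show real addresses to a privileged reader.
    with open("/proc/kallsyms") as f:
        vaddr = text_virtual(f.read())
    with open("/proc/iomem") as f:
        paddr = kernel_code_physical(f.read())
    print(hex(vaddr), hex(paddr), hex(phys_base(paddr, vaddr)))
```

Keeping the parsing separate from the file access is a design choice for the sketch only: it lets the same functions be exercised against captured text from a guest without running inside it.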
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-09 15:28 ` Dave Anderson @ 2016-11-14 10:41 ` Paolo Bonzini 2016-11-15 14:41 ` Dave Anderson 0 siblings, 1 reply; 30+ messages in thread From: Paolo Bonzini @ 2016-11-14 10:41 UTC (permalink / raw) To: Dave Anderson, Andrew Jones Cc: bhe, Dave Young, qemu-devel, qiaonuohan, lersek On 09/11/2016 16:28, Dave Anderson wrote: > I'm not sure whether this "guest userspace agent" is still in play here, > but if there were such a thing, it could theoretically do the same > thing that crash currently does when running on a live system. > > Both of those are available or calculatable from the contents of > a kdump header. However, on a live system, it's done like this: > > - /proc/kallsyms is queried for the symbol value of "_text", which would > be relocated if KASLR is in play. That value is compared against the > "_text" symbol value compiled into the vmlinux file to determine the > relocation value generated by CONFIG_RANDOMIZE_BASE. > > [...] in order to read kernel symbols from the > statically-mapped kernel region based at __START_KERNEL_map, it > translates a (possibly relocated) kernel virtual address into a > physical address like this: > > physical-address = virtual-address - __START_KERNEL_map + phys_base > > But it's a chicken-and-egg deal, because the contents of the "phys_base" > symbol are needed to calculate the physical address, but it can't > read the "phys_base" symbol contents without first knowing its contents. > > So on a live system, the "phys_base" is calculated by reading > the "Kernel Code:" value from /proc/iomem, and then doing this: > > phys_base = [Kernel Code: value] - ["_text" symbol value] - __START_KERNEL_map ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Should there be parentheses around this? 
The physical-address formula above is equivalent to phys_base = physical-address - (virtual-address - __START_KERNEL_map) > > So theoretically, the guest agent could read /proc/iomem and /proc/kallsyms > for the information required. (I think...) Then yes, the guest-agent could add a command get-kernel-text-start with an output like: { 'virtual': 0xffffffff86000000, 'physical': 0xb6000000 } and libvirt can expose it to crash. In this case, phys_base would be 0xb0000000 if I did the math right, and the relocation value is obtained by comparing the "virtual" address with the vmlinux "_text". IIRC the guest agent runs as root, so reading /proc/iomem is not a problem. Paolo ^ permalink raw reply [flat|nested] 30+ messages in thread
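The arithmetic in the message above can be checked with a few lines. This is a sketch under stated assumptions: the reply dictionary mirrors the example output proposed for the (not yet existing) get-kernel-text-start command, the function names are hypothetical, and VMLINUX_TEXT is an assumed "_text" value for a non-relocated x86_64 vmlinux; the real value has to be read from the vmlinux file that matches the guest kernel.

```python
# x86_64 __START_KERNEL_map, the hardwired base of the kernel text mapping.
START_KERNEL_MAP = 0xffffffff80000000


def phys_base_from_reply(reply):
    """phys_base = physical - (virtual - __START_KERNEL_map)."""
    return reply["physical"] - (reply["virtual"] - START_KERNEL_MAP)


def kaslr_relocation(reply, vmlinux_text):
    """Offset between the running "_text" and the one compiled into vmlinux."""
    return reply["virtual"] - vmlinux_text


def virt_to_phys(vaddr, phys_base):
    """Translate an address in the __START_KERNEL_map region to physical."""
    return vaddr - START_KERNEL_MAP + phys_base


# Example values taken from the message above.
reply = {"virtual": 0xffffffff86000000, "physical": 0xb6000000}

# ASSUMPTION: "_text" as compiled into vmlinux; not stated in the thread.
VMLINUX_TEXT = 0xffffffff81000000

pb = phys_base_from_reply(reply)
print(hex(pb))                                     # 0xb0000000
print(hex(kaslr_relocation(reply, VMLINUX_TEXT)))  # 0x5000000 (under the assumption)
print(hex(virt_to_phys(reply["virtual"], pb)))     # 0xb6000000
```

The last line is a round-trip check: translating the reported virtual address with the derived phys_base gives back the reported physical address.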
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-14 10:41 ` Paolo Bonzini @ 2016-11-15 14:41 ` Dave Anderson 0 siblings, 0 replies; 30+ messages in thread From: Dave Anderson @ 2016-11-15 14:41 UTC (permalink / raw) To: Paolo Bonzini Cc: Andrew Jones, bhe, Dave Young, qemu-devel, qiaonuohan, lersek ----- Original Message ----- > > > On 09/11/2016 16:28, Dave Anderson wrote: > > I'm not sure whether this "guest userspace agent" is still in play here, > > but if there were such a thing, it could theoretically do the same > > thing that crash currently does when running on a live system. > > > > Both of those are available or calculatable from the contents of > > a kdump header. However, on a live system, it's done like this: > > > > - /proc/kallsyms is queried for the symbol value of "_text", which would > > be relocated if KASLR is in play. That value is compared against the > > "_text" symbol value compiled into the vmlinux file to determine the > > relocation value generated by CONFIG_RANDOMIZE_BASE. > > > > [...] in order to read kernel symbols from the > > statically-mapped kernel region based at __START_KERNEL_map, it > > translates a (possibly relocated) kernel virtual address into a > > physical address like this: > > > > physical-address = virtual-address - __START_KERNEL_map + phys_base > > > > But it's a chicken-and-egg deal, because the contents of the "phys_base" > > symbol are needed to calculate the physical address, but it can't > > read the "phys_base" symbol contents without first knowing its contents. > > > > So on a live system, the "phys_base" is calculated by reading > > the "Kernel Code:" value from /proc/iomem, and then doing this: > > > > phys_base = [Kernel Code: value] - ["_text" symbol value] - __START_KERNEL_map > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Should there be parentheses around this? Yes, sorry, that's correct -- that's what the code does, and what I meant to express... 
Dave > The physical-address formula above is equivalent to > > phys_base = physical-address - (virtual-address - __START_KERNEL_map) > > > > > So theoretically, the guest agent could read /proc/iomem and /proc/kallsyms > > for the information required. (I think...) > > Then yes, the guest-agent could add a command get-kernel-text-start with an output like: > > { 'virtual': 0xffffffff86000000, 'physical': 0xb6000000 } > > and libvirt can expose it to crash. In this case, phys_base would be 0xb0000000 > if I did the math right, and the relocation value is obtained by comparing the > "virtual" address with the vmlinux "_text". > > IIRC the guest agent runs as root, so reading /proc/iomem is not a problem. > > Paolo > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support 2016-11-09 3:01 [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support Dave Young 2016-11-09 3:17 ` Dave Young 2016-11-09 10:40 ` Andrew Jones @ 2016-11-09 14:32 ` Dave Anderson 2 siblings, 0 replies; 30+ messages in thread From: Dave Anderson @ 2016-11-09 14:32 UTC (permalink / raw) To: Dave Young; +Cc: wency, qiaonuohan, lersek, qemu-devel, bhe ----- Original Message ----- > Hi, > > Latest linux kernel enabled kaslr to randomiz phys/virt memory > addresses, we had some effort to support kexec/kdump so that crash > utility can still works in case crashed kernel has kaslr enabled. > > But according to Dave Anderson virsh dump does not work, quoted messages > from Dave below: > > """ > with virsh dump, there's no way of even knowing that KASLR > has randomized the kernel __START_KERNEL_map region, because there is no > virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump > vmcoreinfo data to compare against the vmlinux file symbol value. > Unless virsh dump can export some basic virtual memory data, which > they say it can't, I don't see how KASLR can ever be supported. > """ We also need the x86_64 phys_base value. As it is right now, virsh dump vmcores work by luck. It is presumed that the __START_KERNEL_map region is unmodified (i.e., what's in the vmlinux file), and the phys_base value is guessed by checking phys_base values from -16MB to +16MB in 1MB chunks. If the phys_base value is not one of those 32 possible values, the crash session will fail. Dave > > I assume virsh dump is using qemu guest memory dump facility so it > should be first addressed in qemu. Thus post this query to qemu devel > list. If this is not correct please let me know. > > Could you qemu dump people make it work? Or we can not support virt dump > as long as KASLR being enabled. Latest Fedora kernel has enabled it in > x86_64. 
> > Thanks > Dave > ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2016-11-16 2:48 UTC]

Thread overview: 30+ messages
2016-11-09 3:01 [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support Dave Young
2016-11-09 3:17 ` Dave Young
2016-11-09 3:58 ` Wen Congyang
2016-11-09 5:02 ` Dave Young
2016-11-09 7:42 ` Wen Congyang
2016-11-09 8:25 ` Dave Young
2016-11-09 14:36 ` Dave Anderson
2016-11-09 14:42 ` Daniel P. Berrange
2016-11-09 10:40 ` Andrew Jones
2016-11-09 11:26 ` Laszlo Ersek
2016-11-09 11:37 ` Daniel P. Berrange
2016-11-09 11:48 ` Andrew Jones
2016-11-09 11:58 ` Daniel P. Berrange
2016-11-09 12:20 ` Andrew Jones
2016-11-09 14:47 ` Daniel P. Berrange
2016-11-09 15:38 ` Laszlo Ersek
2016-11-09 16:01 ` Daniel P. Berrange
2016-11-14 10:27 ` Paolo Bonzini
2016-11-14 5:32 ` Dave Young
2016-11-14 9:47 ` Andrew Jones
2016-11-16 2:48 ` Dave Young
2016-11-14 10:10 ` Daniel P. Berrange
2016-11-14 10:28 ` Paolo Bonzini
2016-11-14 10:33 ` Daniel P. Berrange
2016-11-14 11:08 ` Laszlo Ersek
2016-11-14 11:55 ` Paolo Bonzini
2016-11-09 15:28 ` Dave Anderson
2016-11-14 10:41 ` Paolo Bonzini
2016-11-15 14:41 ` Dave Anderson
2016-11-09 14:32 ` Dave Anderson