* [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
@ 2016-11-09  3:01 Dave Young
  2016-11-09  3:17 ` Dave Young
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Dave Young @ 2016-11-09  3:01 UTC (permalink / raw)
  To: wency, qiaonuohan; +Cc: lersek, anderson, qemu-devel, bhe

Hi,

The latest Linux kernel enables KASLR to randomize physical/virtual memory
addresses. We have done some work on kexec/kdump so that the crash
utility still works when the crashed kernel has KASLR enabled.

But according to Dave Anderson, virsh dump does not work. His message is
quoted below:

"""
with virsh dump, there's no way of even knowing that KASLR
has randomized the kernel __START_KERNEL_map region, because there is no
virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
vmcoreinfo data to compare against the vmlinux file symbol value.
Unless virsh dump can export some basic virtual memory data, which
they say it can't, I don't see how KASLR can ever be supported.
"""
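
As a rough illustration of the comparison described in the quote above, the
sketch below parses a "SYMBOL(_stext)" line out of vmcoreinfo-style text and
subtracts the link-time value from vmlinux to get the randomization offset.
The function names and the sample addresses are assumptions made up for this
example; they are not taken from crash or makedumpfile.

```python
def parse_vmcoreinfo_symbols(text):
    """Collect SYMBOL(name)=hexvalue lines from vmcoreinfo-style text."""
    symbols = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("SYMBOL(") and "=" in line:
            key, _, value = line.partition("=")
            # key looks like "SYMBOL(_stext)"; strip the wrapper.
            symbols[key[len("SYMBOL("):-1]] = int(value, 16)
    return symbols


def kaslr_offset(vmcoreinfo_text, vmlinux_stext):
    """Runtime _stext minus the link-time _stext from the vmlinux file."""
    runtime_stext = parse_vmcoreinfo_symbols(vmcoreinfo_text)["_stext"]
    return runtime_stext - vmlinux_stext


# Illustrative values only (not from a real dump).
sample = "OSRELEASE=4.8.0\nSYMBOL(_stext)=ffffffff9a000000\n"
offset = kaslr_offset(sample, 0xFFFFFFFF81000000)
print(hex(offset))
```

A dump with no such line gives the analysis tool nothing to subtract from,
which is the gap described for virsh dump.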

I assume virsh dump uses the QEMU guest memory dump facility, so this
should be addressed in QEMU first; hence I am posting this query to the
qemu-devel list. If that is not correct, please let me know.

Could the QEMU dump maintainers make it work? Otherwise we cannot support
virt dumps as long as KASLR is enabled. The latest Fedora kernel has enabled
it on x86_64.

Thanks
Dave


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09  3:01 [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support Dave Young
@ 2016-11-09  3:17 ` Dave Young
  2016-11-09  3:58   ` Wen Congyang
  2016-11-09 10:40 ` Andrew Jones
  2016-11-09 14:32 ` Dave Anderson
  2 siblings, 1 reply; 30+ messages in thread
From: Dave Young @ 2016-11-09  3:17 UTC (permalink / raw)
  To: wency; +Cc: lersek, anderson, qemu-devel, bhe

Dropping qiaonuohan; the mail address seems to be wrong.

On 11/09/16 at 11:01am, Dave Young wrote:
> Hi,
> 
> The latest Linux kernel enables KASLR to randomize physical/virtual memory
> addresses. We have done some work on kexec/kdump so that the crash
> utility still works when the crashed kernel has KASLR enabled.
> 
> But according to Dave Anderson virsh dump does not work, quoted messages
> from Dave below:
> 
> """
> with virsh dump, there's no way of even knowing that KASLR
> has randomized the kernel __START_KERNEL_map region, because there is no
> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> vmcoreinfo data to compare against the vmlinux file symbol value.
> Unless virsh dump can export some basic virtual memory data, which
> they say it can't, I don't see how KASLR can ever be supported.
> """
> 
> I assume virsh dump is using qemu guest memory dump facility so it
> should be first addressed in qemu. Thus post this query to qemu devel
> list. If this is not correct please let me know.
> 
> Could you qemu dump people make it work? Or we can not support virt dump
> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
> 
> Thanks
> Dave


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09  3:17 ` Dave Young
@ 2016-11-09  3:58   ` Wen Congyang
  2016-11-09  5:02     ` Dave Young
  0 siblings, 1 reply; 30+ messages in thread
From: Wen Congyang @ 2016-11-09  3:58 UTC (permalink / raw)
  To: Dave Young; +Cc: lersek, anderson, qemu-devel, bhe

On 11/09/2016 11:17 AM, Dave Young wrote:
> Drop qiaonuohan, seems the mail address is wrong..
> 
> On 11/09/16 at 11:01am, Dave Young wrote:
>> Hi,
>>
>> The latest Linux kernel enables KASLR to randomize physical/virtual memory
>> addresses. We have done some work on kexec/kdump so that the crash
>> utility still works when the crashed kernel has KASLR enabled.
>>
>> But according to Dave Anderson virsh dump does not work, quoted messages
>> from Dave below:
>>
>> """
>> with virsh dump, there's no way of even knowing that KASLR
>> has randomized the kernel __START_KERNEL_map region, because there is no
>> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
>> vmcoreinfo data to compare against the vmlinux file symbol value.
>> Unless virsh dump can export some basic virtual memory data, which
>> they say it can't, I don't see how KASLR can ever be supported.
>> """
>>
>> I assume virsh dump is using qemu guest memory dump facility so it
>> should be first addressed in qemu. Thus post this query to qemu devel
>> list. If this is not correct please let me know.

IIRC, 'virsh dump --memory-only' uses dump-guest-memory, and 'virsh dump'
uses migration to dump.
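
For reference, a minimal sketch of the QMP message behind the
dump-guest-memory path, with paging left off. The helper name and the output
path are assumptions for illustration; only the command name and the
"paging" and "protocol" arguments come from the QMP schema.

```python
import json


def build_dump_command(path, paging=False):
    """Build a QMP dump-guest-memory request that writes to a file."""
    return {
        "execute": "dump-guest-memory",
        "arguments": {
            "paging": paging,            # False: raw physical memory only
            "protocol": "file:" + path,  # hypothetical output location
        },
    }


cmd = build_dump_command("/var/tmp/guest.core")
wire = json.dumps(cmd)
print(wire)
```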

I think I should study kaslr first...

Thanks
Wen Congyang

>>
>> Could you qemu dump people make it work? Or we can not support virt dump
>> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
>>
>> Thanks
>> Dave
> 
> 
> 


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09  3:58   ` Wen Congyang
@ 2016-11-09  5:02     ` Dave Young
  2016-11-09  7:42       ` Wen Congyang
  2016-11-09 14:36       ` Dave Anderson
  0 siblings, 2 replies; 30+ messages in thread
From: Dave Young @ 2016-11-09  5:02 UTC (permalink / raw)
  To: Wen Congyang, anderson; +Cc: lersek, qemu-devel, bhe

On 11/09/16 at 11:58am, Wen Congyang wrote:
> On 11/09/2016 11:17 AM, Dave Young wrote:
> > Drop qiaonuohan, seems the mail address is wrong..
> > 
> > On 11/09/16 at 11:01am, Dave Young wrote:
> >> Hi,
> >>
> >> The latest Linux kernel enables KASLR to randomize physical/virtual memory
> >> addresses. We have done some work on kexec/kdump so that the crash
> >> utility still works when the crashed kernel has KASLR enabled.
> >>
> >> But according to Dave Anderson virsh dump does not work, quoted messages
> >> from Dave below:
> >>
> >> """
> >> with virsh dump, there's no way of even knowing that KASLR
> >> has randomized the kernel __START_KERNEL_map region, because there is no
> >> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> >> vmcoreinfo data to compare against the vmlinux file symbol value.
> >> Unless virsh dump can export some basic virtual memory data, which
> >> they say it can't, I don't see how KASLR can ever be supported.
> >> """
> >>
> >> I assume virsh dump is using qemu guest memory dump facility so it
> >> should be first addressed in qemu. Thus post this query to qemu devel
> >> list. If this is not correct please let me know.
> 
> IIRC, 'virsh dump --memory-only' uses dump-guest-memory, and 'virsh dump'
> uses migration to dump.

Do they need different fixes? Dave, I guess you mean --memory-only, but
could you clarify and confirm it?

> 
> I think I should study kaslr first...

Thanks for taking care of it.

> 
> Thanks
> Wen Congyang
> 
> >>
> >> Could you qemu dump people make it work? Or we can not support virt dump
> >> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
> >>
> >> Thanks
> >> Dave
> > 
> > 
> > 
> 
> 
> 


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09  5:02     ` Dave Young
@ 2016-11-09  7:42       ` Wen Congyang
  2016-11-09  8:25         ` Dave Young
  2016-11-09 14:36       ` Dave Anderson
  1 sibling, 1 reply; 30+ messages in thread
From: Wen Congyang @ 2016-11-09  7:42 UTC (permalink / raw)
  To: Dave Young, anderson; +Cc: lersek, qemu-devel, bhe

On 11/09/2016 01:02 PM, Dave Young wrote:
> On 11/09/16 at 11:58am, Wen Congyang wrote:
>> On 11/09/2016 11:17 AM, Dave Young wrote:
>>> Drop qiaonuohan, seems the mail address is wrong..
>>>
>>> On 11/09/16 at 11:01am, Dave Young wrote:
>>>> Hi,
>>>>
>>>> The latest Linux kernel enables KASLR to randomize physical/virtual memory
>>>> addresses. We have done some work on kexec/kdump so that the crash
>>>> utility still works when the crashed kernel has KASLR enabled.
>>>>
>>>> But according to Dave Anderson virsh dump does not work, quoted messages
>>>> from Dave below:
>>>>
>>>> """
>>>> with virsh dump, there's no way of even knowing that KASLR
>>>> has randomized the kernel __START_KERNEL_map region, because there is no
>>>> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
>>>> vmcoreinfo data to compare against the vmlinux file symbol value.
>>>> Unless virsh dump can export some basic virtual memory data, which
>>>> they say it can't, I don't see how KASLR can ever be supported.
>>>> """
>>>>
>>>> I assume virsh dump is using qemu guest memory dump facility so it
>>>> should be first addressed in qemu. Thus post this query to qemu devel
>>>> list. If this is not correct please let me know.
>>
>> IIRC, 'virsh dump --memory-only' uses dump-guest-memory, and 'virsh dump'
>> uses migration to dump.
> 
> Do they need different fixes? Dave, I guess you mean --memory-only, but
> could you clarify and confirm it?
> 
>>
>> I think I should study kaslr first...
> 
> Thanks for taking care of it.

Can you point me to the patches for kexec/kdump? I want to know what I need
to do for dump-guest-memory.

Thanks
Wen Congyang

> 
>>
>> Thanks
>> Wen Congyang
>>
>>>>
>>>> Could you qemu dump people make it work? Or we can not support virt dump
>>>> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
>>>>
>>>> Thanks
>>>> Dave
>>>
>>>
>>>
>>
>>
>>
> 
> 
> .
> 


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09  7:42       ` Wen Congyang
@ 2016-11-09  8:25         ` Dave Young
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Young @ 2016-11-09  8:25 UTC (permalink / raw)
  To: Wen Congyang; +Cc: anderson, lersek, qemu-devel, bhe

On 11/09/16 at 03:42pm, Wen Congyang wrote:
> On 11/09/2016 01:02 PM, Dave Young wrote:
> > On 11/09/16 at 11:58am, Wen Congyang wrote:
> >> On 11/09/2016 11:17 AM, Dave Young wrote:
> >>> Drop qiaonuohan, seems the mail address is wrong..
> >>>
> >>> On 11/09/16 at 11:01am, Dave Young wrote:
> >>>> Hi,
> >>>>
> >>>> The latest Linux kernel enables KASLR to randomize physical/virtual memory
> >>>> addresses. We have done some work on kexec/kdump so that the crash
> >>>> utility still works when the crashed kernel has KASLR enabled.
> >>>>
> >>>> But according to Dave Anderson virsh dump does not work, quoted messages
> >>>> from Dave below:
> >>>>
> >>>> """
> >>>> with virsh dump, there's no way of even knowing that KASLR
> >>>> has randomized the kernel __START_KERNEL_map region, because there is no
> >>>> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> >>>> vmcoreinfo data to compare against the vmlinux file symbol value.
> >>>> Unless virsh dump can export some basic virtual memory data, which
> >>>> they say it can't, I don't see how KASLR can ever be supported.
> >>>> """
> >>>>
> >>>> I assume virsh dump is using qemu guest memory dump facility so it
> >>>> should be first addressed in qemu. Thus post this query to qemu devel
> >>>> list. If this is not correct please let me know.
> >>
> >> IIRC, 'virsh dump --memory-only' uses dump-guest-memory, and 'virsh dump'
> >> uses migration to dump.
> > 
> > Do they need different fixes? Dave, I guess you mean --memory-only, but
> > could you clarify and confirm it?
> > 
> >>
> >> I think I should study kaslr first...
> > 
> > Thanks for taking care of it.
> 
> Can you give me the patch for kexec/kdump. I want to know what I need to do
> for dump-guest-memory.

AFAIK, the following patches exist for kexec/kdump userspace.
kexec-tools, git commit:
commit 9f62cbddddfc93d78d9aafbddf3e1208cb242f7b
Author: Thomas Garnier <thgarnie@google.com>
Date:   Tue Sep 13 15:10:05 2016 +0800

    kexec/arch/i386: Add support for KASLR memory randomization

Originally, Baoquan He posted the patches below to export some kernel
fields through vmcoreinfo:
http://lists.infradead.org/pipermail/kexec/2016-September/017191.html
But that approach was later dropped, and we ended up doing it in userspace
with several makedumpfile patches:
http://lists.infradead.org/pipermail/kexec/2016-October/017540.html
http://lists.infradead.org/pipermail/kexec/2016-October/017539.html
http://lists.infradead.org/pipermail/kexec/2016-October/017541.html

For a vmcore dumped by virsh, the dump should export some information that
the crash utility can use. I will leave it to Dave to say what exactly he
needs, because the goal is for userspace utilities like crash to analyze
the vmcore correctly.

> 
> Thanks
> Wen Congyang
> 
> > 
> >>
> >> Thanks
> >> Wen Congyang
> >>
> >>>>
> >>>> Could you qemu dump people make it work? Or we can not support virt dump
> >>>> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
> >>>>
> >>>> Thanks
> >>>> Dave
> >>>
> >>>
> >>>
> >>
> >>
> >>
> > 
> > 
> > .
> > 
> 
> 
> 


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09  3:01 [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support Dave Young
  2016-11-09  3:17 ` Dave Young
@ 2016-11-09 10:40 ` Andrew Jones
  2016-11-09 11:26   ` Laszlo Ersek
  2016-11-09 15:28   ` Dave Anderson
  2016-11-09 14:32 ` Dave Anderson
  2 siblings, 2 replies; 30+ messages in thread
From: Andrew Jones @ 2016-11-09 10:40 UTC (permalink / raw)
  To: Dave Young; +Cc: wency, qiaonuohan, anderson, lersek, qemu-devel, bhe

On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> Hi,
> 
> The latest Linux kernel enables KASLR to randomize physical/virtual memory
> addresses. We have done some work on kexec/kdump so that the crash
> utility still works when the crashed kernel has KASLR enabled.
> 
> But according to Dave Anderson virsh dump does not work, quoted messages
> from Dave below:
> 
> """
> with virsh dump, there's no way of even knowing that KASLR
> has randomized the kernel __START_KERNEL_map region, because there is no
> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> vmcoreinfo data to compare against the vmlinux file symbol value.
> Unless virsh dump can export some basic virtual memory data, which
> they say it can't, I don't see how KASLR can ever be supported.
> """
> 
> I assume virsh dump is using qemu guest memory dump facility so it
> should be first addressed in qemu. Thus post this query to qemu devel
> list. If this is not correct please let me know.
> 
> Could you qemu dump people make it work? Or we can not support virt dump
> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
>

When the -kernel command line option is used, then it may be possible
to extract some information that could be used to supplement the memory
dump that dump-guest-memory provides. However, that would be a specific
use. In general, QEMU knows nothing about the guest kernel. It doesn't
know where it is in the disk image, and it doesn't even know if it's
Linux.

Is there anything a guest userspace application could probe from e.g.
/proc that would work? If so, then the guest agent could gain a new
feature providing that.
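
One candidate for such a probe, sketched here only as an assumption, is the
_stext entry in /proc/kallsyms, read by a privileged process inside the
guest. The helper names are made up for this example; an unprivileged reader
typically sees zeroed addresses, which the sketch reports as "not available".

```python
def find_symbol(kallsyms_text, name):
    """Return the address of `name` from kallsyms-style text, or None."""
    for line in kallsyms_text.splitlines():
        fields = line.split()
        # Each line is "<hex address> <type> <symbol> [module]".
        if len(fields) >= 3 and fields[2] == name:
            return int(fields[0], 16)
    return None


def read_runtime_stext(path="/proc/kallsyms"):
    """Read the running kernel's _stext; None if missing or hidden."""
    with open(path) as f:
        addr = find_symbol(f.read(), "_stext")
    return addr or None  # a zero address means the kernel hid the value


# Illustrative sample, not taken from a real guest.
sample = (
    "ffffffff9a000000 T _stext\n"
    "ffffffff9a001000 T do_one_initcall\n"
)
print(hex(find_symbol(sample, "_stext")))
```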

Thanks,
drew


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09 10:40 ` Andrew Jones
@ 2016-11-09 11:26   ` Laszlo Ersek
  2016-11-09 11:37     ` Daniel P. Berrange
  2016-11-09 15:28   ` Dave Anderson
  1 sibling, 1 reply; 30+ messages in thread
From: Laszlo Ersek @ 2016-11-09 11:26 UTC (permalink / raw)
  To: Andrew Jones, Dave Young; +Cc: wency, qiaonuohan, anderson, qemu-devel, bhe

On 11/09/16 11:40, Andrew Jones wrote:
> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
>> Hi,
>>
>> The latest Linux kernel enables KASLR to randomize physical/virtual memory
>> addresses. We have done some work on kexec/kdump so that the crash
>> utility still works when the crashed kernel has KASLR enabled.
>>
>> But according to Dave Anderson virsh dump does not work, quoted messages
>> from Dave below:
>>
>> """
>> with virsh dump, there's no way of even knowing that KASLR
>> has randomized the kernel __START_KERNEL_map region, because there is no
>> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
>> vmcoreinfo data to compare against the vmlinux file symbol value.
>> Unless virsh dump can export some basic virtual memory data, which
>> they say it can't, I don't see how KASLR can ever be supported.
>> """
>>
>> I assume virsh dump is using qemu guest memory dump facility so it
>> should be first addressed in qemu. Thus post this query to qemu devel
>> list. If this is not correct please let me know.
>>
>> Could you qemu dump people make it work? Or we can not support virt dump
>> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
>>
> 
> When the -kernel command line option is used, then it may be possible
> to extract some information that could be used to supplement the memory
> dump that dump-guest-memory provides. However, that would be a specific
> use. In general, QEMU knows nothing about the guest kernel. It doesn't
> know where it is in the disk image, and it doesn't even know if it's
> Linux.
> 
> Is there anything a guest userspace application could probe from e.g.
> /proc that would work? If so, then the guest agent could gain a new
> feature providing that.

I fully agree. This is exactly what I suggested too, independently, in
the downstream thread, before arriving at this upstream thread. Let me
quote that email:

On 11/09/16 12:09, Laszlo Ersek wrote:
> [...] the dump-guest-memory QEMU command supports an option called
> "paging". Here's its documentation, from the "qapi-schema.json" source
> file:
>
>> # @paging: if true, do paging to get guest's memory mapping. This allows
>> #          using gdb to process the core file.
>> #
>> #          IMPORTANT: this option can make QEMU allocate several gigabytes
>> #                     of RAM. This can happen for a large guest, or a
>> #                     malicious guest pretending to be large.
>> #
>> #          Also, paging=true has the following limitations:
>> #
>> #             1. The guest may be in a catastrophic state or can have corrupted
>> #                memory, which cannot be trusted
>> #             2. The guest can be in real-mode even if paging is enabled. For
>> #                example, the guest uses ACPI to sleep, and ACPI sleep state
>> #                goes in real-mode
>> #             3. Currently only supported on i386 and x86_64.
>> #
>
> "virsh dump --memory-only" sets paging=false, for obvious reasons.
>
> [...] the dump-guest-memory command provides a raw snapshot of the
> virtual machine's memory (and of the registers of the VCPUs); it is
> not enlightened about the guest.
>
> If the additional information you are looking for can be retrieved
> within the running Linux guest, using an appropriately privileged
> userspace process, then I would recommend considering an extension to
> the qemu guest agent. The management layer (libvirt, [...]) could
> first invoke the guest agent (a process with root privileges running
> in the guest) from the host side, through virtio-serial. The new guest
> agent command would return the information necessary to deal with
> KASLR. Then the management layer would initiate the dump like always.
> Finally, the extra information would be combined with (or placed
> beside) the dump file in some way.
>
> So, this proposal would affect the guest agent and the management
> layer (= libvirt).

Given that we already dislike "paging=true", enlightening
dump-guest-memory with even more guest-specific insight is the wrong
approach, IMO. That kind of knowledge belongs to the guest agent.
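
To make the "placed beside the dump file" idea concrete, here is a minimal
sketch in which the management layer stores whatever the agent returned in a
JSON file next to the dump. The ".kaslr.json" suffix, the function names and
the field layout are all assumptions for illustration; nothing like this
exists in libvirt today.

```python
import json
import os
import tempfile


def write_kaslr_sidecar(dump_path, info):
    """Store guest-provided KASLR data in a file next to the dump."""
    sidecar_path = dump_path + ".kaslr.json"  # hypothetical naming scheme
    with open(sidecar_path, "w") as f:
        json.dump(info, f, indent=2, sort_keys=True)
    return sidecar_path


def read_kaslr_sidecar(dump_path):
    """Load the sidecar that belongs to the given dump file."""
    with open(dump_path + ".kaslr.json") as f:
        return json.load(f)


# Round-trip demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    dump = os.path.join(tmp, "guest.core")
    write_kaslr_sidecar(dump, {"_stext": "0xffffffff9a000000"})
    restored = read_kaslr_sidecar(dump)

print(restored)
```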

Thanks
Laszlo


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09 11:26   ` Laszlo Ersek
@ 2016-11-09 11:37     ` Daniel P. Berrange
  2016-11-09 11:48       ` Andrew Jones
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel P. Berrange @ 2016-11-09 11:37 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Andrew Jones, Dave Young, qiaonuohan, bhe, anderson, qemu-devel

On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
> On 11/09/16 11:40, Andrew Jones wrote:
> > On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> >> Hi,
> >>
> >> The latest Linux kernel enables KASLR to randomize physical/virtual memory
> >> addresses. We have done some work on kexec/kdump so that the crash
> >> utility still works when the crashed kernel has KASLR enabled.
> >>
> >> But according to Dave Anderson virsh dump does not work, quoted messages
> >> from Dave below:
> >>
> >> """
> >> with virsh dump, there's no way of even knowing that KASLR
> >> has randomized the kernel __START_KERNEL_map region, because there is no
> >> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> >> vmcoreinfo data to compare against the vmlinux file symbol value.
> >> Unless virsh dump can export some basic virtual memory data, which
> >> they say it can't, I don't see how KASLR can ever be supported.
> >> """
> >>
> >> I assume virsh dump is using qemu guest memory dump facility so it
> >> should be first addressed in qemu. Thus post this query to qemu devel
> >> list. If this is not correct please let me know.
> >>
> >> Could you qemu dump people make it work? Or we can not support virt dump
> >> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
> >>
> > 
> > When the -kernel command line option is used, then it may be possible
> > to extract some information that could be used to supplement the memory
> > dump that dump-guest-memory provides. However, that would be a specific
> > use. In general, QEMU knows nothing about the guest kernel. It doesn't
> > know where it is in the disk image, and it doesn't even know if it's
> > Linux.
> > 
> > Is there anything a guest userspace application could probe from e.g.
> > /proc that would work? If so, then the guest agent could gain a new
> > feature providing that.
> 
> I fully agree. This is exactly what I suggested too, independently, in
> the downstream thread, before arriving at this upstream thread. Let me
> quote that email:
> 
> On 11/09/16 12:09, Laszlo Ersek wrote:
> > [...] the dump-guest-memory QEMU command supports an option called
> > "paging". Here's its documentation, from the "qapi-schema.json" source
> > file:
> >
> >> # @paging: if true, do paging to get guest's memory mapping. This allows
> >> #          using gdb to process the core file.
> >> #
> >> #          IMPORTANT: this option can make QEMU allocate several gigabytes
> >> #                     of RAM. This can happen for a large guest, or a
> >> #                     malicious guest pretending to be large.
> >> #
> >> #          Also, paging=true has the following limitations:
> >> #
> >> #             1. The guest may be in a catastrophic state or can have corrupted
> >> #                memory, which cannot be trusted
> >> #             2. The guest can be in real-mode even if paging is enabled. For
> >> #                example, the guest uses ACPI to sleep, and ACPI sleep state
> >> #                goes in real-mode
> >> #             3. Currently only supported on i386 and x86_64.
> >> #
> >
> > "virsh dump --memory-only" sets paging=false, for obvious reasons.
> >
> > [...] the dump-guest-memory command provides a raw snapshot of the
> > virtual machine's memory (and of the registers of the VCPUs); it is
> > not enlightened about the guest.
> >
> > If the additional information you are looking for can be retrieved
> > within the running Linux guest, using an appropriately privileged
> > userspace process, then I would recommend considering an extension to
> > the qemu guest agent. The management layer (libvirt, [...]) could
> > first invoke the guest agent (a process with root privileges running
> > in the guest) from the host side, through virtio-serial. The new guest
> > agent command would return the information necessary to deal with
> > KASLR. Then the management layer would initiate the dump like always.
> > Finally, the extra information would be combined with (or placed
> > beside) the dump file in some way.
> >
> > So, this proposal would affect the guest agent and the management
> > layer (= libvirt).
> 
> Given that we already dislike "paging=true", enlightening
> dump-guest-memory with even more guest-specific insight is the wrong
> approach, IMO. That kind of knowledge belongs to the guest agent.

If you're trying to debug a hung/panicked guest, then using a guest
agent to fetch info is a complete non-starter as it'll be dead.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09 11:37     ` Daniel P. Berrange
@ 2016-11-09 11:48       ` Andrew Jones
  2016-11-09 11:58         ` Daniel P. Berrange
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Jones @ 2016-11-09 11:48 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Laszlo Ersek, Dave Young, qiaonuohan, bhe, anderson, qemu-devel

On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote:
> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
> > On 11/09/16 11:40, Andrew Jones wrote:
> > > On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> > >> Hi,
> > >>
> > >> The latest Linux kernel enables KASLR to randomize physical/virtual memory
> > >> addresses. We have done some work on kexec/kdump so that the crash
> > >> utility still works when the crashed kernel has KASLR enabled.
> > >>
> > >> But according to Dave Anderson virsh dump does not work, quoted messages
> > >> from Dave below:
> > >>
> > >> """
> > >> with virsh dump, there's no way of even knowing that KASLR
> > >> has randomized the kernel __START_KERNEL_map region, because there is no
> > >> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> > >> vmcoreinfo data to compare against the vmlinux file symbol value.
> > >> Unless virsh dump can export some basic virtual memory data, which
> > >> they say it can't, I don't see how KASLR can ever be supported.
> > >> """
> > >>
> > >> I assume virsh dump is using qemu guest memory dump facility so it
> > >> should be first addressed in qemu. Thus post this query to qemu devel
> > >> list. If this is not correct please let me know.
> > >>
> > >> Could you qemu dump people make it work? Or we can not support virt dump
> > >> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
> > >>
> > > 
> > > When the -kernel command line option is used, then it may be possible
> > > to extract some information that could be used to supplement the memory
> > > dump that dump-guest-memory provides. However, that would be a specific
> > > use. In general, QEMU knows nothing about the guest kernel. It doesn't
> > > know where it is in the disk image, and it doesn't even know if it's
> > > Linux.
> > > 
> > > Is there anything a guest userspace application could probe from e.g.
> > > /proc that would work? If so, then the guest agent could gain a new
> > > feature providing that.
> > 
> > I fully agree. This is exactly what I suggested too, independently, in
> > the downstream thread, before arriving at this upstream thread. Let me
> > quote that email:
> > 
> > On 11/09/16 12:09, Laszlo Ersek wrote:
> > > [...] the dump-guest-memory QEMU command supports an option called
> > > "paging". Here's its documentation, from the "qapi-schema.json" source
> > > file:
> > >
> > >> # @paging: if true, do paging to get guest's memory mapping. This allows
> > >> #          using gdb to process the core file.
> > >> #
> > >> #          IMPORTANT: this option can make QEMU allocate several gigabytes
> > >> #                     of RAM. This can happen for a large guest, or a
> > >> #                     malicious guest pretending to be large.
> > >> #
> > >> #          Also, paging=true has the following limitations:
> > >> #
> > >> #             1. The guest may be in a catastrophic state or can have corrupted
> > >> #                memory, which cannot be trusted
> > >> #             2. The guest can be in real-mode even if paging is enabled. For
> > >> #                example, the guest uses ACPI to sleep, and ACPI sleep state
> > >> #                goes in real-mode
> > >> #             3. Currently only supported on i386 and x86_64.
> > >> #
> > >
> > > "virsh dump --memory-only" sets paging=false, for obvious reasons.
> > >
> > > [...] the dump-guest-memory command provides a raw snapshot of the
> > > virtual machine's memory (and of the registers of the VCPUs); it is
> > > not enlightened about the guest.
> > >
> > > If the additional information you are looking for can be retrieved
> > > within the running Linux guest, using an appropriately privileged
> > > userspace process, then I would recommend considering an extension to
> > > the qemu guest agent. The management layer (libvirt, [...]) could
> > > first invoke the guest agent (a process with root privileges running
> > > in the guest) from the host side, through virtio-serial. The new guest
> > > agent command would return the information necessary to deal with
> > > KASLR. Then the management layer would initiate the dump like always.
> > > Finally, the extra information would be combined with (or placed
> > > beside) the dump file in some way.
> > >
> > > So, this proposal would affect the guest agent and the management
> > > layer (= libvirt).
> > 
> > Given that we already dislike "paging=true", enlightening
> > dump-guest-memory with even more guest-specific insight is the wrong
> > approach, IMO. That kind of knowledge belongs to the guest agent.
> 
> If you're trying to debug a hung/panicked guest, then using a guest
> agent to fetch info is a complete non-starter as it'll be dead.

So don't wait. Management software can make this query immediately
after the guest agent goes live. The information needed won't change.
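
A minimal sketch of that "query early, use later" idea, assuming the
management software is notified when the agent comes up. The class, the
callback and the returned fields are hypothetical; the query would have to
be repeated whenever the agent comes back after a guest reboot, since KASLR
picks a new offset on every boot.

```python
class GuestInfoCache:
    """Remember per-guest data fetched while the guest agent was alive."""

    def __init__(self):
        self._info = {}

    def on_agent_ready(self, domain, query_agent):
        # query_agent is a hypothetical callable wrapping the new
        # guest-agent command; it runs while the guest is still healthy.
        self._info[domain] = query_agent(domain)

    def info_for_dump(self, domain):
        # Needs no agent round trip, so it works for a hung guest too.
        return self._info.get(domain)


def fake_query(domain):
    """Stand-in for the real agent call; returns illustrative data."""
    return {"_stext": 0xFFFFFFFF9A000000}


cache = GuestInfoCache()
cache.on_agent_ready("guest1", fake_query)
saved = cache.info_for_dump("guest1")
missing = cache.info_for_dump("guest2")
print(saved, missing)
```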

drew


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09 11:48       ` Andrew Jones
@ 2016-11-09 11:58         ` Daniel P. Berrange
  2016-11-09 12:20           ` Andrew Jones
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel P. Berrange @ 2016-11-09 11:58 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Laszlo Ersek, Dave Young, qiaonuohan, bhe, anderson, qemu-devel

On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote:
> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote:
> > On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
> > > On 11/09/16 11:40, Andrew Jones wrote:
> > > > On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> > > >> Hi,
> > > >>
> > > >> The latest Linux kernel enables KASLR to randomize physical/virtual memory
> > > >> addresses. We have done some work on kexec/kdump so that the crash
> > > >> utility still works when the crashed kernel has KASLR enabled.
> > > >>
> > > >> But according to Dave Anderson virsh dump does not work, quoted messages
> > > >> from Dave below:
> > > >>
> > > >> """
> > > >> with virsh dump, there's no way of even knowing that KASLR
> > > >> has randomized the kernel __START_KERNEL_map region, because there is no
> > > >> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> > > >> vmcoreinfo data to compare against the vmlinux file symbol value.
> > > >> Unless virsh dump can export some basic virtual memory data, which
> > > >> they say it can't, I don't see how KASLR can ever be supported.
> > > >> """
> > > >>
> > > >> I assume virsh dump is using qemu guest memory dump facility so it
> > > >> should be first addressed in qemu. Thus post this query to qemu devel
> > > >> list. If this is not correct please let me know.
> > > >>
> > > >> Could you qemu dump people make it work? Or we can not support virt dump
> > > >> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
> > > >>
> > > > 
> > > > When the -kernel command line option is used, then it may be possible
> > > > to extract some information that could be used to supplement the memory
> > > > dump that dump-guest-memory provides. However, that would be a specific
> > > > use. In general, QEMU knows nothing about the guest kernel. It doesn't
> > > > know where it is in the disk image, and it doesn't even know if it's
> > > > Linux.
> > > > 
> > > > Is there anything a guest userspace application could probe from e.g.
> > > > /proc that would work? If so, then the guest agent could gain a new
> > > > feature providing that.
> > > 
> > > I fully agree. This is exactly what I suggested too, independently, in
> > > the downstream thread, before arriving at this upstream thread. Let me
> > > quote that email:
> > > 
> > > On 11/09/16 12:09, Laszlo Ersek wrote:
> > > > [...] the dump-guest-memory QEMU command supports an option called
> > > > "paging". Here's its documentation, from the "qapi-schema.json" source
> > > > file:
> > > >
> > > >> # @paging: if true, do paging to get guest's memory mapping. This allows
> > > >> #          using gdb to process the core file.
> > > >> #
> > > >> #          IMPORTANT: this option can make QEMU allocate several gigabytes
> > > >> #                     of RAM. This can happen for a large guest, or a
> > > >> #                     malicious guest pretending to be large.
> > > >> #
> > > >> #          Also, paging=true has the following limitations:
> > > >> #
> > > >> #             1. The guest may be in a catastrophic state or can have corrupted
> > > >> #                memory, which cannot be trusted
> > > >> #             2. The guest can be in real-mode even if paging is enabled. For
> > > >> #                example, the guest uses ACPI to sleep, and ACPI sleep state
> > > >> #                goes in real-mode
> > > >> #             3. Currently only supported on i386 and x86_64.
> > > >> #
> > > >
> > > > "virsh dump --memory-only" sets paging=false, for obvious reasons.
> > > >
> > > > [...] the dump-guest-memory command provides a raw snapshot of the
> > > > virtual machine's memory (and of the registers of the VCPUs); it is
> > > > not enlightened about the guest.
> > > >
> > > > If the additional information you are looking for can be retrieved
> > > > within the running Linux guest, using an appropriately privieleged
> > > > userspace process, then I would recommend considering an extension to
> > > > the qemu guest agent. The management layer (libvirt, [...]) could
> > > > first invoke the guest agent (a process with root privileges running
> > > > in the guest) from the host side, through virtio-serial. The new guest
> > > > agent command would return the information necessary to deal with
> > > > KASLR. Then the management layer would initiate the dump like always.
> > > > Finally, the extra information would be combined with (or placed
> > > > beside) the dump file in some way.
> > > >
> > > > So, this proposal would affect the guest agent and the management
> > > > layer (= libvirt).
> > > 
> > > Given that we already dislike "paging=true", enlightening
> > > dump-guest-memory with even more guest-specific insight is the wrong
> > > approach, IMO. That kind of knowledge belongs to the guest agent.
> > 
> > If you're trying to debug a hung/panicked guest, then using a guest
> > agent to fetch info is a complete non-starter as it'll be dead.
> 
> So don't wait. Management software can make this query immediately
> after the guest agent goes live. The information needed won't change.

That doesn't help with trying to diagnose a crash during boot up, since
the guest agent isn't running till fairly late. I'm also concerned that
the QEMU guest agent is likely to be far from widely deployed in guests,
so reliance on the guest agent will mean the dump facility is no longer
reliably available.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09 11:58         ` Daniel P. Berrange
@ 2016-11-09 12:20           ` Andrew Jones
  2016-11-09 14:47             ` Daniel P. Berrange
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Jones @ 2016-11-09 12:20 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Laszlo Ersek, Dave Young, qiaonuohan, bhe, anderson, qemu-devel

On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote:
> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote:
> > On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote:
> > > On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
> > > > On 11/09/16 11:40, Andrew Jones wrote:
> > > > > On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> > > > >> Hi,
> > > > >>
> > > > >> Latest linux kernel enabled kaslr to randomiz phys/virt memory
> > > > >> addresses, we had some effort to support kexec/kdump so that crash
> > > > >> utility can still works in case crashed kernel has kaslr enabled.
> > > > >>
> > > > >> But according to Dave Anderson virsh dump does not work, quoted messages
> > > > >> from Dave below:
> > > > >>
> > > > >> """
> > > > >> with virsh dump, there's no way of even knowing that KASLR
> > > > >> has randomized the kernel __START_KERNEL_map region, because there is no
> > > > >> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> > > > >> vmcoreinfo data to compare against the vmlinux file symbol value.
> > > > >> Unless virsh dump can export some basic virtual memory data, which
> > > > >> they say it can't, I don't see how KASLR can ever be supported.
> > > > >> """
> > > > >>
> > > > >> I assume virsh dump is using qemu guest memory dump facility so it
> > > > >> should be first addressed in qemu. Thus post this query to qemu devel
> > > > >> list. If this is not correct please let me know.
> > > > >>
> > > > >> Could you qemu dump people make it work? Or we can not support virt dump
> > > > >> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
> > > > >>
> > > > > 
> > > > > When the -kernel command line option is used, then it may be possible
> > > > > to extract some information that could be used to supplement the memory
> > > > > dump that dump-guest-memory provides. However, that would be a specific
> > > > > use. In general, QEMU knows nothing about the guest kernel. It doesn't
> > > > > know where it is in the disk image, and it doesn't even know if it's
> > > > > Linux.
> > > > > 
> > > > > Is there anything a guest userspace application could probe from e.g.
> > > > > /proc that would work? If so, then the guest agent could gain a new
> > > > > feature providing that.
> > > > 
> > > > I fully agree. This is exactly what I suggested too, independently, in
> > > > the downstream thread, before arriving at this upstream thread. Let me
> > > > quote that email:
> > > > 
> > > > On 11/09/16 12:09, Laszlo Ersek wrote:
> > > > > [...] the dump-guest-memory QEMU command supports an option called
> > > > > "paging". Here's its documentation, from the "qapi-schema.json" source
> > > > > file:
> > > > >
> > > > >> # @paging: if true, do paging to get guest's memory mapping. This allows
> > > > >> #          using gdb to process the core file.
> > > > >> #
> > > > >> #          IMPORTANT: this option can make QEMU allocate several gigabytes
> > > > >> #                     of RAM. This can happen for a large guest, or a
> > > > >> #                     malicious guest pretending to be large.
> > > > >> #
> > > > >> #          Also, paging=true has the following limitations:
> > > > >> #
> > > > >> #             1. The guest may be in a catastrophic state or can have corrupted
> > > > >> #                memory, which cannot be trusted
> > > > >> #             2. The guest can be in real-mode even if paging is enabled. For
> > > > >> #                example, the guest uses ACPI to sleep, and ACPI sleep state
> > > > >> #                goes in real-mode
> > > > >> #             3. Currently only supported on i386 and x86_64.
> > > > >> #
> > > > >
> > > > > "virsh dump --memory-only" sets paging=false, for obvious reasons.
> > > > >
> > > > > [...] the dump-guest-memory command provides a raw snapshot of the
> > > > > virtual machine's memory (and of the registers of the VCPUs); it is
> > > > > not enlightened about the guest.
> > > > >
> > > > > If the additional information you are looking for can be retrieved
> > > > > within the running Linux guest, using an appropriately privieleged
> > > > > userspace process, then I would recommend considering an extension to
> > > > > the qemu guest agent. The management layer (libvirt, [...]) could
> > > > > first invoke the guest agent (a process with root privileges running
> > > > > in the guest) from the host side, through virtio-serial. The new guest
> > > > > agent command would return the information necessary to deal with
> > > > > KASLR. Then the management layer would initiate the dump like always.
> > > > > Finally, the extra information would be combined with (or placed
> > > > > beside) the dump file in some way.
> > > > >
> > > > > So, this proposal would affect the guest agent and the management
> > > > > layer (= libvirt).
> > > > 
> > > > Given that we already dislike "paging=true", enlightening
> > > > dump-guest-memory with even more guest-specific insight is the wrong
> > > > approach, IMO. That kind of knowledge belongs to the guest agent.
> > > 
> > > If you're trying to debug a hung/panicked guest, then using a guest
> > > agent to fetch info is a complete non-starter as it'll be dead.
> > 
> > So don't wait. Management software can make this query immediately
> > after the guest agent goes live. The information needed won't change.
> 
> That doesn't help with trying to diagnose a crash during boot up, since
> the guest agent isn't running till fairly late. I'm also concerned that
> the QEMU guest agent is likely to be far from widely deployed in guests,
> so reliance on the guest agent will mean the dump facility is no longer
> reliably available.
>

It'd still be reliably available and useable during early boot, just like
it is now, for kernels that don't use KASLR. This proposal is only
attempting to *also* address KASLR kernels, for which there is currently
no support whatsoever. Call it a best-effort.

Of course we can get support for [probably] early boot and
guest-agent-less guests using KASLR too if we introduce a paravirt
solution, requiring guest kernel and KVM changes. Is it worth it?

Thanks,
drew


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09  3:01 [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support Dave Young
  2016-11-09  3:17 ` Dave Young
  2016-11-09 10:40 ` Andrew Jones
@ 2016-11-09 14:32 ` Dave Anderson
  2 siblings, 0 replies; 30+ messages in thread
From: Dave Anderson @ 2016-11-09 14:32 UTC (permalink / raw)
  To: Dave Young; +Cc: wency, qiaonuohan, lersek, qemu-devel, bhe



----- Original Message -----
> Hi,
> 
> Latest linux kernel enabled kaslr to randomiz phys/virt memory
> addresses, we had some effort to support kexec/kdump so that crash
> utility can still works in case crashed kernel has kaslr enabled.
> 
> But according to Dave Anderson virsh dump does not work, quoted messages
> from Dave below:
> 
> """
> with virsh dump, there's no way of even knowing that KASLR
> has randomized the kernel __START_KERNEL_map region, because there is no
> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> vmcoreinfo data to compare against the vmlinux file symbol value.
> Unless virsh dump can export some basic virtual memory data, which
> they say it can't, I don't see how KASLR can ever be supported.
> """

We also need the x86_64 phys_base value.

As it is right now, virsh dump vmcores work by luck.  It is presumed that
the __START_KERNEL_map region is unmodified (i.e., what's in the vmlinux file),
and the phys_base value is guessed by checking phys_base values from 
-16MB to +16MB in 1MB chunks.  If the phys_base value is not one of those
32 possible values, the crash session will fail.
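
A minimal sketch of the guessing described above, in Python. This is not
the crash utility's actual code: read_phys, sym_vaddr and expected are
hypothetical names, and verifying a candidate by comparing bytes read at
a known symbol is an assumption made for illustration. The range is
treated as half-open so that it yields the 32 candidates mentioned.

```python
MB = 1 << 20
START_KERNEL_MAP = 0xffffffff80000000  # x86_64 __START_KERNEL_map


def candidate_phys_bases():
    """Candidate phys_base values: -16MB up to (not including) +16MB, 1MB apart."""
    return [n * MB for n in range(-16, 16)]


def guess_phys_base(read_phys, sym_vaddr, expected):
    """Return the first candidate whose translation of sym_vaddr reads back
    the expected bytes, or None if no candidate matches.

    read_phys(paddr, length) is a hypothetical callback that returns bytes
    from the dump at a physical address.
    """
    for phys_base in candidate_phys_bases():
        paddr = sym_vaddr - START_KERNEL_MAP + phys_base
        if paddr < 0:
            continue  # not a valid physical address
        if read_phys(paddr, len(expected)) == expected:
            return phys_base
    return None
```

If the real phys_base lies outside the scanned window, the function
returns None, which corresponds to the failing crash session above.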

Dave


> 
> I assume virsh dump is using qemu guest memory dump facility so it
> should be first addressed in qemu. Thus post this query to qemu devel
> list. If this is not correct please let me know.
> 
> Could you qemu dump people make it work? Or we can not support virt dump
> as long as KASLR being enabled. Latest Fedora kernel has enabled it in
> x86_64.
> 
> Thanks
> Dave
> 


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09  5:02     ` Dave Young
  2016-11-09  7:42       ` Wen Congyang
@ 2016-11-09 14:36       ` Dave Anderson
  2016-11-09 14:42         ` Daniel P. Berrange
  1 sibling, 1 reply; 30+ messages in thread
From: Dave Anderson @ 2016-11-09 14:36 UTC (permalink / raw)
  To: Dave Young; +Cc: Wen Congyang, lersek, qemu-devel, bhe



----- Original Message -----
> On 11/09/16 at 11:58am, Wen Congyang wrote:
> > On 11/09/2016 11:17 AM, Dave Young wrote:
> > > Drop qiaonuohan, seems the mail address is wrong..
> > > 
> > > On 11/09/16 at 11:01am, Dave Young wrote:
> > >> Hi,
> > >>
> > >> Latest linux kernel enabled kaslr to randomiz phys/virt memory
> > >> addresses, we had some effort to support kexec/kdump so that crash
> > >> utility can still works in case crashed kernel has kaslr enabled.
> > >>
> > >> But according to Dave Anderson virsh dump does not work, quoted messages
> > >> from Dave below:
> > >>
> > >> """
> > >> with virsh dump, there's no way of even knowing that KASLR
> > >> has randomized the kernel __START_KERNEL_map region, because there is no
> > >> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> > >> vmcoreinfo data to compare against the vmlinux file symbol value.
> > >> Unless virsh dump can export some basic virtual memory data, which
> > >> they say it can't, I don't see how KASLR can ever be supported.
> > >> """
> > >>
> > >> I assume virsh dump is using qemu guest memory dump facility so it
> > >> should be first addressed in qemu. Thus post this query to qemu devel
> > >> list. If this is not correct please let me know.
> > 
> > IIRC, 'virsh dump --memory-only' uses dump-guest-memory, and 'virsh dump'
> > uses migration to dump.
> 
> Do they need different fixes? Dave, I guess you mean --memory-only, but
> could you clarify and confirm it?

As I understand it, the "--memory-only" option uses a new "dump-guest-memory"
QEMU monitor command that creates an ELF kdump vmcore clone. 
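
For reference, a sketch in Python of the QMP request that such a dump
corresponds to. The "paging" and "protocol" arguments belong to the
dump-guest-memory command; the output path is only a placeholder, and
how libvirt builds its own request is not shown here.

```python
import json


def dump_guest_memory_request(path, paging=False):
    """Build the QMP JSON for dump-guest-memory writing an ELF core to a file."""
    return json.dumps({
        "execute": "dump-guest-memory",
        "arguments": {"paging": paging, "protocol": "file:" + path},
    })


# Placeholder path, chosen only for the example.
print(dump_guest_memory_request("/tmp/guest.vmcore"))
```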

Dave


> 
> > 
> > I think I should study kaslr first...
> 
> Thanks for taking care of it.
> 
> > 
> > Thanks
> > Wen Congyang
> > 
> > >>
> > >> Could you qemu dump people make it work? Or we can not support virt dump
> > >> as long as KASLR being enabled. Latest Fedora kernel has enabled it in
> > >> x86_64.
> > >>
> > >> Thanks
> > >> Dave
> > > 
> > > 
> > > 
> > 
> > 
> > 
> 


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09 14:36       ` Dave Anderson
@ 2016-11-09 14:42         ` Daniel P. Berrange
  0 siblings, 0 replies; 30+ messages in thread
From: Daniel P. Berrange @ 2016-11-09 14:42 UTC (permalink / raw)
  To: Dave Anderson; +Cc: Dave Young, bhe, lersek, qemu-devel

On Wed, Nov 09, 2016 at 09:36:08AM -0500, Dave Anderson wrote:
> 
> 
> ----- Original Message -----
> > On 11/09/16 at 11:58am, Wen Congyang wrote:
> > > On 11/09/2016 11:17 AM, Dave Young wrote:
> > > > Drop qiaonuohan, seems the mail address is wrong..
> > > > 
> > > > On 11/09/16 at 11:01am, Dave Young wrote:
> > > >> Hi,
> > > >>
> > > >> Latest linux kernel enabled kaslr to randomiz phys/virt memory
> > > >> addresses, we had some effort to support kexec/kdump so that crash
> > > >> utility can still works in case crashed kernel has kaslr enabled.
> > > >>
> > > >> But according to Dave Anderson virsh dump does not work, quoted messages
> > > >> from Dave below:
> > > >>
> > > >> """
> > > >> with virsh dump, there's no way of even knowing that KASLR
> > > >> has randomized the kernel __START_KERNEL_map region, because there is no
> > > >> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> > > >> vmcoreinfo data to compare against the vmlinux file symbol value.
> > > >> Unless virsh dump can export some basic virtual memory data, which
> > > >> they say it can't, I don't see how KASLR can ever be supported.
> > > >> """
> > > >>
> > > >> I assume virsh dump is using qemu guest memory dump facility so it
> > > >> should be first addressed in qemu. Thus post this query to qemu devel
> > > >> list. If this is not correct please let me know.
> > > 
> > > IIRC, 'virsh dump --memory-only' uses dump-guest-memory, and 'virsh dump'
> > > uses migration to dump.
> > 
> > Do they need different fixes? Dave, I guess you mean --memory-only, but
> > could you clarify and confirm it?
> 
> As I understand it, the "--memory-only" option uses a new "dump-guest-memory"
> QEMU monitor command that creates an ELF kdump vmcore clone.

IIRC, the use of the traditional 'virsh dump' (which just splats out the
QEMU migration data stream) is no longer supported with crash and everyone
should be using the --memory-only flag to ensure the ELF format core.

IOW, I think we can just ignore the historical migration based dump and
focus exclusively on the dump-guest-memory based impl.


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09 12:20           ` Andrew Jones
@ 2016-11-09 14:47             ` Daniel P. Berrange
  2016-11-09 15:38               ` Laszlo Ersek
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel P. Berrange @ 2016-11-09 14:47 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Laszlo Ersek, Dave Young, qiaonuohan, bhe, anderson, qemu-devel

On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote:
> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote:
> > On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote:
> > > On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote:
> > > > On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
> > > > > On 11/09/16 11:40, Andrew Jones wrote:
> > > > > > On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> > > > > >> Hi,
> > > > > >>
> > > > > >> Latest linux kernel enabled kaslr to randomiz phys/virt memory
> > > > > >> addresses, we had some effort to support kexec/kdump so that crash
> > > > > >> utility can still works in case crashed kernel has kaslr enabled.
> > > > > >>
> > > > > >> But according to Dave Anderson virsh dump does not work, quoted messages
> > > > > >> from Dave below:
> > > > > >>
> > > > > >> """
> > > > > >> with virsh dump, there's no way of even knowing that KASLR
> > > > > >> has randomized the kernel __START_KERNEL_map region, because there is no
> > > > > >> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> > > > > >> vmcoreinfo data to compare against the vmlinux file symbol value.
> > > > > >> Unless virsh dump can export some basic virtual memory data, which
> > > > > >> they say it can't, I don't see how KASLR can ever be supported.
> > > > > >> """
> > > > > >>
> > > > > >> I assume virsh dump is using qemu guest memory dump facility so it
> > > > > >> should be first addressed in qemu. Thus post this query to qemu devel
> > > > > >> list. If this is not correct please let me know.
> > > > > >>
> > > > > >> Could you qemu dump people make it work? Or we can not support virt dump
> > > > > >> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
> > > > > >>
> > > > > > 
> > > > > > When the -kernel command line option is used, then it may be possible
> > > > > > to extract some information that could be used to supplement the memory
> > > > > > dump that dump-guest-memory provides. However, that would be a specific
> > > > > > use. In general, QEMU knows nothing about the guest kernel. It doesn't
> > > > > > know where it is in the disk image, and it doesn't even know if it's
> > > > > > Linux.
> > > > > > 
> > > > > > Is there anything a guest userspace application could probe from e.g.
> > > > > > /proc that would work? If so, then the guest agent could gain a new
> > > > > > feature providing that.
> > > > > 
> > > > > I fully agree. This is exactly what I suggested too, independently, in
> > > > > the downstream thread, before arriving at this upstream thread. Let me
> > > > > quote that email:
> > > > > 
> > > > > On 11/09/16 12:09, Laszlo Ersek wrote:
> > > > > > [...] the dump-guest-memory QEMU command supports an option called
> > > > > > "paging". Here's its documentation, from the "qapi-schema.json" source
> > > > > > file:
> > > > > >
> > > > > >> # @paging: if true, do paging to get guest's memory mapping. This allows
> > > > > >> #          using gdb to process the core file.
> > > > > >> #
> > > > > >> #          IMPORTANT: this option can make QEMU allocate several gigabytes
> > > > > >> #                     of RAM. This can happen for a large guest, or a
> > > > > >> #                     malicious guest pretending to be large.
> > > > > >> #
> > > > > >> #          Also, paging=true has the following limitations:
> > > > > >> #
> > > > > >> #             1. The guest may be in a catastrophic state or can have corrupted
> > > > > >> #                memory, which cannot be trusted
> > > > > >> #             2. The guest can be in real-mode even if paging is enabled. For
> > > > > >> #                example, the guest uses ACPI to sleep, and ACPI sleep state
> > > > > >> #                goes in real-mode
> > > > > >> #             3. Currently only supported on i386 and x86_64.
> > > > > >> #
> > > > > >
> > > > > > "virsh dump --memory-only" sets paging=false, for obvious reasons.
> > > > > >
> > > > > > [...] the dump-guest-memory command provides a raw snapshot of the
> > > > > > virtual machine's memory (and of the registers of the VCPUs); it is
> > > > > > not enlightened about the guest.
> > > > > >
> > > > > > If the additional information you are looking for can be retrieved
> > > > > > within the running Linux guest, using an appropriately privieleged
> > > > > > userspace process, then I would recommend considering an extension to
> > > > > > the qemu guest agent. The management layer (libvirt, [...]) could
> > > > > > first invoke the guest agent (a process with root privileges running
> > > > > > in the guest) from the host side, through virtio-serial. The new guest
> > > > > > agent command would return the information necessary to deal with
> > > > > > KASLR. Then the management layer would initiate the dump like always.
> > > > > > Finally, the extra information would be combined with (or placed
> > > > > > beside) the dump file in some way.
> > > > > >
> > > > > > So, this proposal would affect the guest agent and the management
> > > > > > layer (= libvirt).
> > > > > 
> > > > > Given that we already dislike "paging=true", enlightening
> > > > > dump-guest-memory with even more guest-specific insight is the wrong
> > > > > approach, IMO. That kind of knowledge belongs to the guest agent.
> > > > 
> > > > If you're trying to debug a hung/panicked guest, then using a guest
> > > > agent to fetch info is a complete non-starter as it'll be dead.
> > > 
> > > So don't wait. Management software can make this query immediately
> > > after the guest agent goes live. The information needed won't change.
> > 
> > That doesn't help with trying to diagnose a crash during boot up, since
> > the guest agent isn't running till fairly late. I'm also concerned that
> > the QEMU guest agent is likely to be far from widely deployed in guests,
> > so reliance on the guest agent will mean the dump facility is no longer
> > reliably available.
> >
> 
> It'd still be reliably available and useable during early boot, just like
> it is now, for kernels that don't use KASLR. This proposal is only
> attempting to *also* address KASLR kernels, for which there is currently
> no support whatsoever. Call it a best-effort.
> 
> Of course we can get support for [probably] early boot and
> guest-agent-less guests using KASLR too if we introduce a paravirt
> solution, requiring guest kernel and KVM changes. Is it worth it?

There's a standard for persistent storage that is intended to allow
the kernel to dump out data at time of crash:

   https://lwn.net/Articles/434821/

and there are some recent patches to provide a QEMU backend. Could we
leverage that facility to get the data we need from the guest kernel?

Instead of only using pstore at time of crash, the kernel could see
that it's running on KVM, and write out the paging data to pstore. So
when QEMU later generates a core dump, it can grab the corresponding
data from the pstore backend?

It still requires an extra device to be configured, but at least we
would not have to invent yet another paravirt device ourselves, just
use the existing framework.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09 10:40 ` Andrew Jones
  2016-11-09 11:26   ` Laszlo Ersek
@ 2016-11-09 15:28   ` Dave Anderson
  2016-11-14 10:41     ` Paolo Bonzini
  1 sibling, 1 reply; 30+ messages in thread
From: Dave Anderson @ 2016-11-09 15:28 UTC (permalink / raw)
  To: Andrew Jones; +Cc: Dave Young, wency, qiaonuohan, lersek, qemu-devel, bhe



----- Original Message -----
> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> > Hi,
> > 
> > Latest linux kernel enabled kaslr to randomiz phys/virt memory
> > addresses, we had some effort to support kexec/kdump so that crash
> > utility can still works in case crashed kernel has kaslr enabled.
> > 
> > But according to Dave Anderson virsh dump does not work, quoted messages
> > from Dave below:
> > 
> > """
> > with virsh dump, there's no way of even knowing that KASLR
> > has randomized the kernel __START_KERNEL_map region, because there is no
> > virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> > vmcoreinfo data to compare against the vmlinux file symbol value.
> > Unless virsh dump can export some basic virtual memory data, which
> > they say it can't, I don't see how KASLR can ever be supported.
> > """
> > 
> > I assume virsh dump is using qemu guest memory dump facility so it
> > should be first addressed in qemu. Thus post this query to qemu devel
> > list. If this is not correct please let me know.
> > 
> > Could you qemu dump people make it work? Or we can not support virt dump
> > as long as KASLR being enabled. Latest Fedora kernel has enabled it in
> > x86_64.
> >
> 
> When the -kernel command line option is used, then it may be possible
> to extract some information that could be used to supplement the memory
> dump that dump-guest-memory provides. However, that would be a specific
> use. In general, QEMU knows nothing about the guest kernel. It doesn't
> know where it is in the disk image, and it doesn't even know if it's
> Linux.
> 
> Is there anything a guest userspace application could probe from e.g.
> /proc that would work? If so, then the guest agent could gain a new
> feature providing that.
> 
> Thanks,
> drew

I'm not sure whether this "guest userspace agent" is still in play here,
but if there were such a thing, it could theoretically do the same
thing that crash currently does when running on a live system.

Two basic necessities are needed, whether running live or against
a dumpfile:

(1) the CONFIG_RANDOMIZE_BASE relocation value that modifies the
    kernel virtual address range compiled into the vmlinux file, which
    starts at the hardwired __START_KERNEL_map address.

(2) the contents of the kernel's "phys_base" symbol.

Both of those are available or calculable from the contents of
a kdump header.  However, on a live system, it's done like this:

- /proc/kallsyms is queried for the symbol value of "_text", which would
  be relocated if KASLR is in play.  That value is compared against the
  "_text" symbol value compiled into the vmlinux file to determine the
  relocation value generated by CONFIG_RANDOMIZE_BASE.

Given that relocation value, and before any kernel memory is accessed,
crash goes in a backdoor into its embedded gdb module, and modifies
the data structures of all kernel symbols, applying the relocation
value.

Once that's done, in order to read kernel symbols from the 
statically-mapped kernel region based at __START_KERNEL_map, it 
translates a (possibly relocated) kernel virtual address into a
physical address like this:

  physical-address = virtual-address - __START_KERNEL_map + phys_base

But it's a chicken-and-egg deal, because the contents of the "phys_base"
symbol are needed to calculate the physical address, but it can't
read the "phys_base" symbol contents without first knowing its contents.

So on a live system, the "phys_base" is calculated by reading
the "Kernel Code:" value from /proc/iomem, and then doing this:

  phys_base = [Kernel Code: value] - (["_text" symbol value] - __START_KERNEL_map)

So theoretically, the guest agent could read /proc/iomem and /proc/kallsyms
for the information required.  (I think...)
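
A rough Python sketch of that calculation, operating on the text of
/proc/kallsyms and /proc/iomem. It is an illustration under stated
assumptions, not crash's code or an existing guest agent command: the
function names are made up, the vmlinux "_text" value has to be supplied
by the caller, and phys_base is obtained by solving the translation
formula above for phys_base, i.e.
Kernel code start - (_text - __START_KERNEL_map). Reading real addresses
from both files typically requires root inside the guest.

```python
START_KERNEL_MAP = 0xffffffff80000000  # x86_64 __START_KERNEL_map


def text_from_kallsyms(kallsyms):
    """Return the address of "_text" from /proc/kallsyms contents."""
    for line in kallsyms.splitlines():
        fields = line.split()
        # Each line is "<hex address> <type> <name> [module]".
        if len(fields) >= 3 and fields[2] == "_text":
            return int(fields[0], 16)
    raise LookupError("_text not found in kallsyms")


def kernel_code_start(iomem):
    """Return the start of the "Kernel code" range from /proc/iomem contents."""
    for line in iomem.splitlines():
        rng, sep, name = line.partition(" : ")
        if sep and name.strip().lower() == "kernel code":
            return int(rng.strip().split("-")[0], 16)
    raise LookupError("Kernel code range not found in iomem")


def kaslr_info(kallsyms, iomem, vmlinux_text):
    """Return (relocation, phys_base) for a live kernel.

    vmlinux_text is the "_text" value compiled into the vmlinux file.
    """
    live_text = text_from_kallsyms(kallsyms)
    relocation = live_text - vmlinux_text
    phys_base = kernel_code_start(iomem) - (live_text - START_KERNEL_MAP)
    return relocation, phys_base
```

In a guest, the two text arguments would come from reading the files,
for example open("/proc/kallsyms").read(); a relocation of 0 would mean
the kernel text was not moved.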

Dave
 


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09 14:47             ` Daniel P. Berrange
@ 2016-11-09 15:38               ` Laszlo Ersek
  2016-11-09 16:01                 ` Daniel P. Berrange
  2016-11-14  5:32                 ` Dave Young
  0 siblings, 2 replies; 30+ messages in thread
From: Laszlo Ersek @ 2016-11-09 15:38 UTC (permalink / raw)
  To: Daniel P. Berrange, Andrew Jones
  Cc: Dave Young, qiaonuohan, bhe, anderson, qemu-devel

On 11/09/16 15:47, Daniel P. Berrange wrote:
> On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote:
>> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote:
>>> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote:
>>>> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote:
>>>>> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
>>>>>> On 11/09/16 11:40, Andrew Jones wrote:
>>>>>>> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Latest linux kernel enabled kaslr to randomize phys/virt memory
>>>>>>>> addresses, we had some effort to support kexec/kdump so that crash
>>>>>>>> utility can still work in case crashed kernel has kaslr enabled.
>>>>>>>>
>>>>>>>> But according to Dave Anderson virsh dump does not work, quoted messages
>>>>>>>> from Dave below:
>>>>>>>>
>>>>>>>> """
>>>>>>>> with virsh dump, there's no way of even knowing that KASLR
>>>>>>>> has randomized the kernel __START_KERNEL_map region, because there is no
>>>>>>>> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
>>>>>>>> vmcoreinfo data to compare against the vmlinux file symbol value.
>>>>>>>> Unless virsh dump can export some basic virtual memory data, which
>>>>>>>> they say it can't, I don't see how KASLR can ever be supported.
>>>>>>>> """
>>>>>>>>
>>>>>>>> I assume virsh dump is using qemu guest memory dump facility so it
>>>>>>>> should be first addressed in qemu. Thus post this query to qemu devel
>>>>>>>> list. If this is not correct please let me know.
>>>>>>>>
>>>>>>>> Could you qemu dump people make it work? Or we can not support virt dump
>>>>>>>> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
>>>>>>>>
>>>>>>>
>>>>>>> When the -kernel command line option is used, then it may be possible
>>>>>>> to extract some information that could be used to supplement the memory
>>>>>>> dump that dump-guest-memory provides. However, that would be a specific
>>>>>>> use. In general, QEMU knows nothing about the guest kernel. It doesn't
>>>>>>> know where it is in the disk image, and it doesn't even know if it's
>>>>>>> Linux.
>>>>>>>
>>>>>>> Is there anything a guest userspace application could probe from e.g.
>>>>>>> /proc that would work? If so, then the guest agent could gain a new
>>>>>>> feature providing that.
>>>>>>
>>>>>> I fully agree. This is exactly what I suggested too, independently, in
>>>>>> the downstream thread, before arriving at this upstream thread. Let me
>>>>>> quote that email:
>>>>>>
>>>>>> On 11/09/16 12:09, Laszlo Ersek wrote:
>>>>>>> [...] the dump-guest-memory QEMU command supports an option called
>>>>>>> "paging". Here's its documentation, from the "qapi-schema.json" source
>>>>>>> file:
>>>>>>>
>>>>>>>> # @paging: if true, do paging to get guest's memory mapping. This allows
>>>>>>>> #          using gdb to process the core file.
>>>>>>>> #
>>>>>>>> #          IMPORTANT: this option can make QEMU allocate several gigabytes
>>>>>>>> #                     of RAM. This can happen for a large guest, or a
>>>>>>>> #                     malicious guest pretending to be large.
>>>>>>>> #
>>>>>>>> #          Also, paging=true has the following limitations:
>>>>>>>> #
>>>>>>>> #             1. The guest may be in a catastrophic state or can have corrupted
>>>>>>>> #                memory, which cannot be trusted
>>>>>>>> #             2. The guest can be in real-mode even if paging is enabled. For
>>>>>>>> #                example, the guest uses ACPI to sleep, and ACPI sleep state
>>>>>>>> #                goes in real-mode
>>>>>>>> #             3. Currently only supported on i386 and x86_64.
>>>>>>>> #
>>>>>>>
>>>>>>> "virsh dump --memory-only" sets paging=false, for obvious reasons.
>>>>>>>
>>>>>>> [...] the dump-guest-memory command provides a raw snapshot of the
>>>>>>> virtual machine's memory (and of the registers of the VCPUs); it is
>>>>>>> not enlightened about the guest.
>>>>>>>
>>>>>>> If the additional information you are looking for can be retrieved
>>>>>>> within the running Linux guest, using an appropriately privileged
>>>>>>> userspace process, then I would recommend considering an extension to
>>>>>>> the qemu guest agent. The management layer (libvirt, [...]) could
>>>>>>> first invoke the guest agent (a process with root privileges running
>>>>>>> in the guest) from the host side, through virtio-serial. The new guest
>>>>>>> agent command would return the information necessary to deal with
>>>>>>> KASLR. Then the management layer would initiate the dump like always.
>>>>>>> Finally, the extra information would be combined with (or placed
>>>>>>> beside) the dump file in some way.
>>>>>>>
>>>>>>> So, this proposal would affect the guest agent and the management
>>>>>>> layer (= libvirt).
>>>>>>
>>>>>> Given that we already dislike "paging=true", enlightening
>>>>>> dump-guest-memory with even more guest-specific insight is the wrong
>>>>>> approach, IMO. That kind of knowledge belongs to the guest agent.
>>>>>
>>>>> If you're trying to debug a hung/panicked guest, then using a guest
>>>>> agent to fetch info is a complete non-starter as it'll be dead.

Yes, I realized this a while after posting...

>>>> So don't wait. Management software can make this query immediately
>>>> after the guest agent goes live. The information needed won't change.

... and then figured this would solve the problem.

>>> That doesn't help with trying to diagnose a crash during boot up, since
>>> the guest agent isn't running till fairly late. I'm also concerned that
>>> the QEMU guest agent is likely to be far from widely deployed in guests,

I have no hard data, but from the recent Fedora and RHEL-7 guest
installations I've done, it seems like qga is installed automatically.
(Not sure if that's because Anaconda realizes it's installing the OS in
a VM.) Once I made sure there was an appropriate virtio-serial config in
the domain XMLs, I could talk to the agents (mainly for fstrim's sake)
immediately.

>>> so reliance on the guest agent will mean the dump facility is no longer
>>> reliably available.
>>>
>>
>> It'd still be reliably available and useable during early boot, just like
>> it is now, for kernels that don't use KASLR. This proposal is only
>> attempting to *also* address KASLR kernels, for which there is currently
>> no support whatsoever. Call it a best-effort.
>>
>> Of course we can get support for [probably] early boot and
>> guest-agent-less guests using KASLR too if we introduce a paravirt
>> solution, requiring guest kernel and KVM changes. Is it worth it?
> 
> There's a standard for persistent storage that is intended to allow
> the kernel to dump out data at time of crash:
> 
>    https://lwn.net/Articles/434821/
> 
> and there's some recent patches to provide a QEMU backend. Could we
> leverage that facility to get the data we need from the guest kernel ?
> 
> Instead of only using pstore at time of crash, the kernel could see
> that it's running on KVM, and write out the paging data to pstore. So
> when QEMU later generates a core dump, it can grab the corresponding
> data from the pstore backend?
> 
> Still requires an extra device, to be configured, but at least we
> would not have to invent yet another paravirt device ourselves, just
> use the existing framework.

Not disagreeing, I'd just like to point out that the kernel can also
crash before the extra device (the pstore driver) is configured
(especially if the driver is built as a module).

Laszlo

* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09 15:38               ` Laszlo Ersek
@ 2016-11-09 16:01                 ` Daniel P. Berrange
  2016-11-14 10:27                   ` Paolo Bonzini
  2016-11-14  5:32                 ` Dave Young
  1 sibling, 1 reply; 30+ messages in thread
From: Daniel P. Berrange @ 2016-11-09 16:01 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Andrew Jones, Dave Young, qiaonuohan, bhe, anderson, qemu-devel

On Wed, Nov 09, 2016 at 04:38:36PM +0100, Laszlo Ersek wrote:
> On 11/09/16 15:47, Daniel P. Berrange wrote:
> >>> That doesn't help with trying to diagnose a crash during boot up, since
> >>> the guest agent isn't running till fairly late. I'm also concerned that
> >>> the QEMU guest agent is likely to be far from widely deployed in guests,
> 
> I have no hard data, but from the recent Fedora and RHEL-7 guest
> installations I've done, it seems like qga is installed automatically.
> (Not sure if that's because Anaconda realizes it's installing the OS in
> a VM.) Once I made sure there was an appropriate virtio-serial config in
> the domain XMLs, I could talk to the agents (mainly for fstrim's sake)
> immediately.

I'm thinking about cloud deployment where people rarely use Anaconda
directly - they'll use a pre-built cloud image, or customize the basic
cloud image. Neither Fedora nor Ubuntu includes the qemu guest agent in
their cloud images AFAICT, so very few OpenStack deployments will have
the QEMU guest agent installed at this time.

Of course we could try to get distros to embed the qemu guest agent by
default, but it's not clear if we'd succeed, given how aggressive
they are at stripping stuff out to create the smallest practical
images.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|

* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09 15:38               ` Laszlo Ersek
  2016-11-09 16:01                 ` Daniel P. Berrange
@ 2016-11-14  5:32                 ` Dave Young
  2016-11-14  9:47                   ` Andrew Jones
  2016-11-14 10:10                   ` Daniel P. Berrange
  1 sibling, 2 replies; 30+ messages in thread
From: Dave Young @ 2016-11-14  5:32 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Daniel P. Berrange, Andrew Jones, qiaonuohan, bhe, anderson, qemu-devel

On 11/09/16 at 04:38pm, Laszlo Ersek wrote:
> On 11/09/16 15:47, Daniel P. Berrange wrote:
> > On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote:
> >> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote:
> >>> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote:
> >>>> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote:
> >>>>> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
> >>>>>> On 11/09/16 11:40, Andrew Jones wrote:
> >>>>>>> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Latest linux kernel enabled kaslr to randomize phys/virt memory
> >>>>>>>> addresses, we had some effort to support kexec/kdump so that crash
> >>>>>>>> utility can still work in case crashed kernel has kaslr enabled.
> >>>>>>>>
> >>>>>>>> But according to Dave Anderson virsh dump does not work, quoted messages
> >>>>>>>> from Dave below:
> >>>>>>>>
> >>>>>>>> """
> >>>>>>>> with virsh dump, there's no way of even knowing that KASLR
> >>>>>>>> has randomized the kernel __START_KERNEL_map region, because there is no
> >>>>>>>> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> >>>>>>>> vmcoreinfo data to compare against the vmlinux file symbol value.
> >>>>>>>> Unless virsh dump can export some basic virtual memory data, which
> >>>>>>>> they say it can't, I don't see how KASLR can ever be supported.
> >>>>>>>> """
> >>>>>>>>
> >>>>>>>> I assume virsh dump is using qemu guest memory dump facility so it
> >>>>>>>> should be first addressed in qemu. Thus post this query to qemu devel
> >>>>>>>> list. If this is not correct please let me know.
> >>>>>>>>
> >>>>>>>> Could you qemu dump people make it work? Or we can not support virt dump
> >>>>>>>> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
> >>>>>>>>
> >>>>>>>
> >>>>>>> When the -kernel command line option is used, then it may be possible
> >>>>>>> to extract some information that could be used to supplement the memory
> >>>>>>> dump that dump-guest-memory provides. However, that would be a specific
> >>>>>>> use. In general, QEMU knows nothing about the guest kernel. It doesn't
> >>>>>>> know where it is in the disk image, and it doesn't even know if it's
> >>>>>>> Linux.
> >>>>>>>
> >>>>>>> Is there anything a guest userspace application could probe from e.g.
> >>>>>>> /proc that would work? If so, then the guest agent could gain a new
> >>>>>>> feature providing that.
> >>>>>>
> >>>>>> I fully agree. This is exactly what I suggested too, independently, in
> >>>>>> the downstream thread, before arriving at this upstream thread. Let me
> >>>>>> quote that email:
> >>>>>>
> >>>>>> On 11/09/16 12:09, Laszlo Ersek wrote:
> >>>>>>> [...] the dump-guest-memory QEMU command supports an option called
> >>>>>>> "paging". Here's its documentation, from the "qapi-schema.json" source
> >>>>>>> file:
> >>>>>>>
> >>>>>>>> # @paging: if true, do paging to get guest's memory mapping. This allows
> >>>>>>>> #          using gdb to process the core file.
> >>>>>>>> #
> >>>>>>>> #          IMPORTANT: this option can make QEMU allocate several gigabytes
> >>>>>>>> #                     of RAM. This can happen for a large guest, or a
> >>>>>>>> #                     malicious guest pretending to be large.
> >>>>>>>> #
> >>>>>>>> #          Also, paging=true has the following limitations:
> >>>>>>>> #
> >>>>>>>> #             1. The guest may be in a catastrophic state or can have corrupted
> >>>>>>>> #                memory, which cannot be trusted
> >>>>>>>> #             2. The guest can be in real-mode even if paging is enabled. For
> >>>>>>>> #                example, the guest uses ACPI to sleep, and ACPI sleep state
> >>>>>>>> #                goes in real-mode
> >>>>>>>> #             3. Currently only supported on i386 and x86_64.
> >>>>>>>> #
> >>>>>>>
> >>>>>>> "virsh dump --memory-only" sets paging=false, for obvious reasons.
> >>>>>>>
> >>>>>>> [...] the dump-guest-memory command provides a raw snapshot of the
> >>>>>>> virtual machine's memory (and of the registers of the VCPUs); it is
> >>>>>>> not enlightened about the guest.
> >>>>>>>
> >>>>>>> If the additional information you are looking for can be retrieved
> >>>>>>> within the running Linux guest, using an appropriately privileged
> >>>>>>> userspace process, then I would recommend considering an extension to
> >>>>>>> the qemu guest agent. The management layer (libvirt, [...]) could
> >>>>>>> first invoke the guest agent (a process with root privileges running
> >>>>>>> in the guest) from the host side, through virtio-serial. The new guest
> >>>>>>> agent command would return the information necessary to deal with
> >>>>>>> KASLR. Then the management layer would initiate the dump like always.
> >>>>>>> Finally, the extra information would be combined with (or placed
> >>>>>>> beside) the dump file in some way.
> >>>>>>>
> >>>>>>> So, this proposal would affect the guest agent and the management
> >>>>>>> layer (= libvirt).
> >>>>>>
> >>>>>> Given that we already dislike "paging=true", enlightening
> >>>>>> dump-guest-memory with even more guest-specific insight is the wrong
> >>>>>> approach, IMO. That kind of knowledge belongs to the guest agent.
> >>>>>
> >>>>> If you're trying to debug a hung/panicked guest, then using a guest
> >>>>> agent to fetch info is a complete non-starter as it'll be dead.
> 
> Yes, I realized this a while after posting...
> 
> >>>> So don't wait. Management software can make this query immediately
> >>>> after the guest agent goes live. The information needed won't change.
> 
> ... and then figured this would solve the problem.
> 
> >>> That doesn't help with trying to diagnose a crash during boot up, since
> >>> the guest agent isn't running till fairly late. I'm also concerned that
> >>> the QEMU guest agent is likely to be far from widely deployed in guests,
> 
> I have no hard data, but from the recent Fedora and RHEL-7 guest
> installations I've done, it seems like qga is installed automatically.
> (Not sure if that's because Anaconda realizes it's installing the OS in
> a VM.) Once I made sure there was an appropriate virtio-serial config in
> the domain XMLs, I could talk to the agents (mainly for fstrim's sake)
> immediately.
> 
> >>> so reliance on the guest agent will mean the dump facility is no longer
> >>> reliably available.
> >>>
> >>
> >> It'd still be reliably available and useable during early boot, just like
> >> it is now, for kernels that don't use KASLR. This proposal is only
> >> attempting to *also* address KASLR kernels, for which there is currently
> >> no support whatsoever. Call it a best-effort.
> >>
> >> Of course we can get support for [probably] early boot and
> >> guest-agent-less guests using KASLR too if we introduce a paravirt
> >> solution, requiring guest kernel and KVM changes. Is it worth it?
> > 
> > There's a standard for persistent storage that is intended to allow
> > the kernel to dump out data at time of crash:
> > 
> >    https://lwn.net/Articles/434821/
> > 
> > and there's some recent patches to provide a QEMU backend. Could we
> > leverage that facility to get the data we need from the guest kernel ?
> > 
> > Instead of only using pstore at time of crash, the kernel could see
> > that it's running on KVM, and write out the paging data to pstore. So
> > when QEMU later generates a core dump, it can grab the corresponding
> > data from the pstore backend?
> > 
> > Still requires an extra device, to be configured, but at least we
> > would not have to invent yet another paravirt device ourselves, just
> > use the existing framework.
> 
> Not disagreeing, I'd just like to point out that the kernel can also
> crash before the extra device (the pstore driver) is configured
> (especially if the driver is built as a module).

Boot phase crash is also a problem for kdump, but hopefully boot
phase crashes will be found early and get fixed early. Run-time
problems are harder, so this will still be helpful there.

I'm not a virt expert, but comparing guest agent and pstore, my
feeling is that I would vote for guest agent; it is ready to work on
now, no? For pstore I'm not sure how to make a pstore device for all
guests. I know a UEFI guest can use its NVRAM, but introducing some
general pstore sounds hard.

Thanks
Dave

* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-14  5:32                 ` Dave Young
@ 2016-11-14  9:47                   ` Andrew Jones
  2016-11-16  2:48                     ` Dave Young
  2016-11-14 10:10                   ` Daniel P. Berrange
  1 sibling, 1 reply; 30+ messages in thread
From: Andrew Jones @ 2016-11-14  9:47 UTC (permalink / raw)
  To: Dave Young; +Cc: Laszlo Ersek, bhe, qemu-devel, qiaonuohan, anderson

On Mon, Nov 14, 2016 at 01:32:56PM +0800, Dave Young wrote:
> On 11/09/16 at 04:38pm, Laszlo Ersek wrote:
> > On 11/09/16 15:47, Daniel P. Berrange wrote:
> > > On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote:
> > >> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote:
> > >>> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote:
> > >>>> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote:
> > >>>>> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
> > >>>>>> On 11/09/16 11:40, Andrew Jones wrote:
> > >>>>>>> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> Latest linux kernel enabled kaslr to randomize phys/virt memory
> > >>>>>>>> addresses, we had some effort to support kexec/kdump so that crash
> > >>>>>>>> utility can still work in case crashed kernel has kaslr enabled.
> > >>>>>>>>
> > >>>>>>>> But according to Dave Anderson virsh dump does not work, quoted messages
> > >>>>>>>> from Dave below:
> > >>>>>>>>
> > >>>>>>>> """
> > >>>>>>>> with virsh dump, there's no way of even knowing that KASLR
> > >>>>>>>> has randomized the kernel __START_KERNEL_map region, because there is no
> > >>>>>>>> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> > >>>>>>>> vmcoreinfo data to compare against the vmlinux file symbol value.
> > >>>>>>>> Unless virsh dump can export some basic virtual memory data, which
> > >>>>>>>> they say it can't, I don't see how KASLR can ever be supported.
> > >>>>>>>> """
> > >>>>>>>>
> > >>>>>>>> I assume virsh dump is using qemu guest memory dump facility so it
> > >>>>>>>> should be first addressed in qemu. Thus post this query to qemu devel
> > >>>>>>>> list. If this is not correct please let me know.
> > >>>>>>>>
> > >>>>>>>> Could you qemu dump people make it work? Or we can not support virt dump
> > >>>>>>>> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> When the -kernel command line option is used, then it may be possible
> > >>>>>>> to extract some information that could be used to supplement the memory
> > >>>>>>> dump that dump-guest-memory provides. However, that would be a specific
> > >>>>>>> use. In general, QEMU knows nothing about the guest kernel. It doesn't
> > >>>>>>> know where it is in the disk image, and it doesn't even know if it's
> > >>>>>>> Linux.
> > >>>>>>>
> > >>>>>>> Is there anything a guest userspace application could probe from e.g.
> > >>>>>>> /proc that would work? If so, then the guest agent could gain a new
> > >>>>>>> feature providing that.
> > >>>>>>
> > >>>>>> I fully agree. This is exactly what I suggested too, independently, in
> > >>>>>> the downstream thread, before arriving at this upstream thread. Let me
> > >>>>>> quote that email:
> > >>>>>>
> > >>>>>> On 11/09/16 12:09, Laszlo Ersek wrote:
> > >>>>>>> [...] the dump-guest-memory QEMU command supports an option called
> > >>>>>>> "paging". Here's its documentation, from the "qapi-schema.json" source
> > >>>>>>> file:
> > >>>>>>>
> > >>>>>>>> # @paging: if true, do paging to get guest's memory mapping. This allows
> > >>>>>>>> #          using gdb to process the core file.
> > >>>>>>>> #
> > >>>>>>>> #          IMPORTANT: this option can make QEMU allocate several gigabytes
> > >>>>>>>> #                     of RAM. This can happen for a large guest, or a
> > >>>>>>>> #                     malicious guest pretending to be large.
> > >>>>>>>> #
> > >>>>>>>> #          Also, paging=true has the following limitations:
> > >>>>>>>> #
> > >>>>>>>> #             1. The guest may be in a catastrophic state or can have corrupted
> > >>>>>>>> #                memory, which cannot be trusted
> > >>>>>>>> #             2. The guest can be in real-mode even if paging is enabled. For
> > >>>>>>>> #                example, the guest uses ACPI to sleep, and ACPI sleep state
> > >>>>>>>> #                goes in real-mode
> > >>>>>>>> #             3. Currently only supported on i386 and x86_64.
> > >>>>>>>> #
> > >>>>>>>
> > >>>>>>> "virsh dump --memory-only" sets paging=false, for obvious reasons.
> > >>>>>>>
> > >>>>>>> [...] the dump-guest-memory command provides a raw snapshot of the
> > >>>>>>> virtual machine's memory (and of the registers of the VCPUs); it is
> > >>>>>>> not enlightened about the guest.
> > >>>>>>>
> > >>>>>>> If the additional information you are looking for can be retrieved
> > >>>>>>> within the running Linux guest, using an appropriately privileged
> > >>>>>>> userspace process, then I would recommend considering an extension to
> > >>>>>>> the qemu guest agent. The management layer (libvirt, [...]) could
> > >>>>>>> first invoke the guest agent (a process with root privileges running
> > >>>>>>> in the guest) from the host side, through virtio-serial. The new guest
> > >>>>>>> agent command would return the information necessary to deal with
> > >>>>>>> KASLR. Then the management layer would initiate the dump like always.
> > >>>>>>> Finally, the extra information would be combined with (or placed
> > >>>>>>> beside) the dump file in some way.
> > >>>>>>>
> > >>>>>>> So, this proposal would affect the guest agent and the management
> > >>>>>>> layer (= libvirt).
> > >>>>>>
> > >>>>>> Given that we already dislike "paging=true", enlightening
> > >>>>>> dump-guest-memory with even more guest-specific insight is the wrong
> > >>>>>> approach, IMO. That kind of knowledge belongs to the guest agent.
> > >>>>>
> > >>>>> If you're trying to debug a hung/panicked guest, then using a guest
> > >>>>> agent to fetch info is a complete non-starter as it'll be dead.
> > 
> > Yes, I realized this a while after posting...
> > 
> > >>>> So don't wait. Management software can make this query immediately
> > >>>> after the guest agent goes live. The information needed won't change.
> > 
> > ... and then figured this would solve the problem.
> > 
> > >>> That doesn't help with trying to diagnose a crash during boot up, since
> > >>> the guest agent isn't running till fairly late. I'm also concerned that
> > >>> the QEMU guest agent is likely to be far from widely deployed in guests,
> > 
> > I have no hard data, but from the recent Fedora and RHEL-7 guest
> > installations I've done, it seems like qga is installed automatically.
> > (Not sure if that's because Anaconda realizes it's installing the OS in
> > a VM.) Once I made sure there was an appropriate virtio-serial config in
> > the domain XMLs, I could talk to the agents (mainly for fstrim's sake)
> > immediately.
> > 
> > >>> so reliance on the guest agent will mean the dump facility is no longer
> > >>> reliably available.
> > >>>
> > >>
> > >> It'd still be reliably available and useable during early boot, just like
> > >> it is now, for kernels that don't use KASLR. This proposal is only
> > >> attempting to *also* address KASLR kernels, for which there is currently
> > >> no support whatsoever. Call it a best-effort.
> > >>
> > >> Of course we can get support for [probably] early boot and
> > >> guest-agent-less guests using KASLR too if we introduce a paravirt
> > >> solution, requiring guest kernel and KVM changes. Is it worth it?
> > > 
> > > There's a standard for persistent storage that is intended to allow
> > > the kernel to dump out data at time of crash:
> > > 
> > >    https://lwn.net/Articles/434821/
> > > 
> > > and there's some recent patches to provide a QEMU backend. Could we
> > > leverage that facility to get the data we need from the guest kernel ?
> > > 
> > > Instead of only using pstore at time of crash, the kernel could see
> > > that it's running on KVM, and write out the paging data to pstore. So
> > > when QEMU later generates a core dump, it can grab the corresponding
> > > data from the pstore backend?
> > > 
> > > Still requires an extra device, to be configured, but at least we
> > > would not have to invent yet another paravirt device ourselves, just
> > > use the existing framework.
> > 
> > Not disagreeing, I'd just like to point out that the kernel can also
> > crash before the extra device (the pstore driver) is configured
> > (especially if the driver is built as a module).
> 
> Boot phase crash is also a problem for kdump, but hopefully boot
> phase crashes will be found early and get fixed early. Run-time
> problems are harder, so this will still be helpful there.
> 
> I'm not a virt expert, but comparing guest agent and pstore, my
> feeling is that I would vote for guest agent; it is ready to work on
> now, no? For pstore I'm not sure how to make a pstore device for all
> guests. I know a UEFI guest can use its NVRAM, but introducing some
> general pstore sounds hard.
>

Nothing is stopping us from doing both, eventually. Care should be taken
on the management side to make it general enough. It should be designed
such that it can use guest-agent now, but in no way is bound to guest-
agent. We can decide later if we want to replace guest-agent with some
paravirt solution.

Nothing is blocking guest-agent patches now, that I know of.
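
To make "general enough" a bit more concrete, here is a rough sketch of
how the management side could be shaped (Python purely for
illustration; every name below is hypothetical and is not an existing
libvirt or guest-agent interface). The dump helper only knows about an
abstract source of KASLR data, caches the answer while the guest is
healthy, and writes it beside the dump file:

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional, Protocol


@dataclass(frozen=True)
class KaslrInfo:
    text_vaddr: int  # run-time value of the "_text" symbol
    phys_base: int   # value needed for virtual-to-physical translation


class KaslrInfoSource(Protocol):
    """Anything that can report KASLR data: a guest agent today,
    possibly some paravirt channel later."""

    def fetch(self) -> Optional[KaslrInfo]: ...


class DumpHelper:
    def __init__(self, source: KaslrInfoSource) -> None:
        self._source = source
        self._cached: Optional[KaslrInfo] = None

    def refresh(self) -> None:
        """Query the source early; keep the old answer if it is unreachable."""
        info = self._source.fetch()
        if info is not None:
            self._cached = info

    def write_sidecar(self, dump_path: Path) -> Optional[Path]:
        """Place the cached data beside the dump; None if nothing is cached."""
        if self._cached is None:
            return None
        sidecar = dump_path.with_name(dump_path.name + ".kaslr.json")
        sidecar.write_text(json.dumps(asdict(self._cached)))
        return sidecar
```

Swapping the guest agent for something else later would then only mean
providing another object with a fetch() method.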

Thanks,
drew

* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-14  5:32                 ` Dave Young
  2016-11-14  9:47                   ` Andrew Jones
@ 2016-11-14 10:10                   ` Daniel P. Berrange
  2016-11-14 10:28                     ` Paolo Bonzini
  1 sibling, 1 reply; 30+ messages in thread
From: Daniel P. Berrange @ 2016-11-14 10:10 UTC (permalink / raw)
  To: Dave Young
  Cc: Laszlo Ersek, Andrew Jones, qiaonuohan, bhe, anderson, qemu-devel

On Mon, Nov 14, 2016 at 01:32:56PM +0800, Dave Young wrote:
> On 11/09/16 at 04:38pm, Laszlo Ersek wrote:
> > On 11/09/16 15:47, Daniel P. Berrange wrote:
> > > On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote:
> > >> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote:
> > >>> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote:
> > >>>> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote:
> > >>>>> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
> > >>>>>> On 11/09/16 11:40, Andrew Jones wrote:
> > >>>>>>> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> Latest linux kernel enabled kaslr to randomize phys/virt memory
> > >>>>>>>> addresses, we had some effort to support kexec/kdump so that crash
> > >>>>>>>> utility can still work in case crashed kernel has kaslr enabled.
> > >>>>>>>>
> > >>>>>>>> But according to Dave Anderson virsh dump does not work, quoted messages
> > >>>>>>>> from Dave below:
> > >>>>>>>>
> > >>>>>>>> """
> > >>>>>>>> with virsh dump, there's no way of even knowing that KASLR
> > >>>>>>>> has randomized the kernel __START_KERNEL_map region, because there is no
> > >>>>>>>> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> > >>>>>>>> vmcoreinfo data to compare against the vmlinux file symbol value.
> > >>>>>>>> Unless virsh dump can export some basic virtual memory data, which
> > >>>>>>>> they say it can't, I don't see how KASLR can ever be supported.
> > >>>>>>>> """
> > >>>>>>>>
> > >>>>>>>> I assume virsh dump is using qemu guest memory dump facility so it
> > >>>>>>>> should be first addressed in qemu. Thus post this query to qemu devel
> > >>>>>>>> list. If this is not correct please let me know.
> > >>>>>>>>
> > >>>>>>>> Could you qemu dump people make it work? Or we can not support virt dump
> > >>>>>>>> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> When the -kernel command line option is used, then it may be possible
> > >>>>>>> to extract some information that could be used to supplement the memory
> > >>>>>>> dump that dump-guest-memory provides. However, that would be a specific
> > >>>>>>> use. In general, QEMU knows nothing about the guest kernel. It doesn't
> > >>>>>>> know where it is in the disk image, and it doesn't even know if it's
> > >>>>>>> Linux.
> > >>>>>>>
> > >>>>>>> Is there anything a guest userspace application could probe from e.g.
> > >>>>>>> /proc that would work? If so, then the guest agent could gain a new
> > >>>>>>> feature providing that.
> > >>>>>>
> > >>>>>> I fully agree. This is exactly what I suggested too, independently, in
> > >>>>>> the downstream thread, before arriving at this upstream thread. Let me
> > >>>>>> quote that email:
> > >>>>>>
> > >>>>>> On 11/09/16 12:09, Laszlo Ersek wrote:
> > >>>>>>> [...] the dump-guest-memory QEMU command supports an option called
> > >>>>>>> "paging". Here's its documentation, from the "qapi-schema.json" source
> > >>>>>>> file:
> > >>>>>>>
> > >>>>>>>> # @paging: if true, do paging to get guest's memory mapping. This allows
> > >>>>>>>> #          using gdb to process the core file.
> > >>>>>>>> #
> > >>>>>>>> #          IMPORTANT: this option can make QEMU allocate several gigabytes
> > >>>>>>>> #                     of RAM. This can happen for a large guest, or a
> > >>>>>>>> #                     malicious guest pretending to be large.
> > >>>>>>>> #
> > >>>>>>>> #          Also, paging=true has the following limitations:
> > >>>>>>>> #
> > >>>>>>>> #             1. The guest may be in a catastrophic state or can have corrupted
> > >>>>>>>> #                memory, which cannot be trusted
> > >>>>>>>> #             2. The guest can be in real-mode even if paging is enabled. For
> > >>>>>>>> #                example, the guest uses ACPI to sleep, and ACPI sleep state
> > >>>>>>>> #                goes in real-mode
> > >>>>>>>> #             3. Currently only supported on i386 and x86_64.
> > >>>>>>>> #
> > >>>>>>>
> > >>>>>>> "virsh dump --memory-only" sets paging=false, for obvious reasons.
> > >>>>>>>
> > >>>>>>> [...] the dump-guest-memory command provides a raw snapshot of the
> > >>>>>>> virtual machine's memory (and of the registers of the VCPUs); it is
> > >>>>>>> not enlightened about the guest.
> > >>>>>>>
> > >>>>>>> If the additional information you are looking for can be retrieved
> > >>>>>>> within the running Linux guest, using an appropriately privileged
> > >>>>>>> userspace process, then I would recommend considering an extension to
> > >>>>>>> the qemu guest agent. The management layer (libvirt, [...]) could
> > >>>>>>> first invoke the guest agent (a process with root privileges running
> > >>>>>>> in the guest) from the host side, through virtio-serial. The new guest
> > >>>>>>> agent command would return the information necessary to deal with
> > >>>>>>> KASLR. Then the management layer would initiate the dump like always.
> > >>>>>>> Finally, the extra information would be combined with (or placed
> > >>>>>>> beside) the dump file in some way.
> > >>>>>>>
> > >>>>>>> So, this proposal would affect the guest agent and the management
> > >>>>>>> layer (= libvirt).
> > >>>>>>
> > >>>>>> Given that we already dislike "paging=true", enlightening
> > >>>>>> dump-guest-memory with even more guest-specific insight is the wrong
> > >>>>>> approach, IMO. That kind of knowledge belongs to the guest agent.
> > >>>>>
> > >>>>> If you're trying to debug a hung/panicked guest, then using a guest
> > >>>>> agent to fetch info is a complete non-starter as it'll be dead.
> > 
> > Yes, I realized this a while after posting...
> > 
> > >>>> So don't wait. Management software can make this query immediately
> > >>>> after the guest agent goes live. The information needed won't change.
> > 
> > ... and then figured this would solve the problem.
> > 
> > >>> That doesn't help with trying to diagnose a crash during boot up, since
> > >>> the guest agent isn't running till fairly late. I'm also concerned that
> > >>> the QEMU guest agent is likely to be far from widely deployed in guests,
> > 
> > I have no hard data, but from the recent Fedora and RHEL-7 guest
> > installations I've done, it seems like qga is installed automatically.
> > (Not sure if that's because Anaconda realizes it's installing the OS in
> > a VM.) Once I made sure there was an appropriate virtio-serial config in
> > the domain XMLs, I could talk to the agents (mainly for fstrim's sake)
> > immediately.
> > 
> > >>> so reliance on the guest agent will mean the dump facility is no longer
> > >>> reliably available.
> > >>>
> > >>
> > >> It'd still be reliably available and useable during early boot, just like
> > >> it is now, for kernels that don't use KASLR. This proposal is only
> > >> attempting to *also* address KASLR kernels, for which there is currently
> > >> no support whatsoever. Call it a best-effort.
> > >>
> > >> Of course we can get support for [probably] early boot and
> > >> guest-agent-less guests using KASLR too if we introduce a paravirt
> > >> solution, requiring guest kernel and KVM changes. Is it worth it?
> > > 
> > > There's a standard for persistent storage that is intended to allow
> > > the kernel to dump out data at time of crash:
> > > 
> > >    https://lwn.net/Articles/434821/
> > > 
> > > and there's some recent patches to provide a QEMU backend. Could we
> > > leverage that facility to get the data we need from the guest kernel ?
> > > 
> > > Instead of only using pstore at time of crash, the kernel could see
> > > > that it's running on KVM, and write out the paging data to pstore. So
> > > when QEMU later generates a core dump, it can grab the corresponding
> > > data from pstore backend ?
> > > 
> > > > Still requires an extra device, to be configured, but at least we
> > > would not have to invent yet another paravirt device ourselves, just
> > > use the existing framework.
> > 
> > Not disagreeing, I'd just like to point out that the kernel can also
> > crash before the extra device (the pstore driver) is configured
> > (especially if the driver is built as a module).
> 
> Boot phase crash is also a problem for kdump, but hopefully the boot
> phase crash will be found early and get fixed early. The run time
> problems are harder, it will still be helpful.
> 
> I'm not a virt expert, but from my feeling comparing guest agent and
> pstore I would vote for guest agent, it is ready to work on now, no?
> For pstore I'm not sure how to make a pstore device for all guests. I
> know uefi guest can use its nvram, but introducing some general pstore
> sounds hard..

There are already patches posted to create a virtio-pstore device for
QEMU, which is what led me to suggest this as an option:

  https://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg00381.html

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09 16:01                 ` Daniel P. Berrange
@ 2016-11-14 10:27                   ` Paolo Bonzini
  0 siblings, 0 replies; 30+ messages in thread
From: Paolo Bonzini @ 2016-11-14 10:27 UTC (permalink / raw)
  To: Daniel P. Berrange, Laszlo Ersek
  Cc: Andrew Jones, bhe, qemu-devel, qiaonuohan, anderson, Dave Young



On 09/11/2016 17:01, Daniel P. Berrange wrote:
> On Wed, Nov 09, 2016 at 04:38:36PM +0100, Laszlo Ersek wrote:
>> On 11/09/16 15:47, Daniel P. Berrange wrote:
>>>>> That doesn't help with trying to diagnose a crash during boot up, since
>>>>> the guest agent isn't running till fairly late. I'm also concerned that
>>>>> the QEMU guest agent is likely to be far from widely deployed in guests,
>>
>> I have no hard data, but from the recent Fedora and RHEL-7 guest
>> installations I've done, it seems like qga is installed automatically.
>> (Not sure if that's because Anaconda realizes it's installing the OS in
>> a VM.) Once I made sure there was an appropriate virtio-serial config in
>> the domain XMLs, I could talk to the agents (mainly for fstrim's sake)
>> immediately.
> 
> I'm thinking about cloud deployment where people rarely use Anaconda
> directly - they'll use a pre-built cloud image, or customize the basic
> cloud image. Neither Fedora or Ubuntu include the qemu guest agent in
> their cloud images AFAICT, so very few OpenStack deployments will have
> QEMU guest agent in at this time.

That's a bug in my opinion.  The guest agent is necessary in order to
avoid losing data in snapshots.  We should not only get distros to embed
the guest agent, but also to include the relevant freeze/thaw hooks.

Paolo

> Of course we could try to get distros to embed qemu guest agent by
> default, but it's not clear if we'd succeed, given how aggressive
> they are at stripping stuff out to create the smallest practical
> images.
> 
> Regards,
> Daniel
> 


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-14 10:10                   ` Daniel P. Berrange
@ 2016-11-14 10:28                     ` Paolo Bonzini
  2016-11-14 10:33                       ` Daniel P. Berrange
  0 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2016-11-14 10:28 UTC (permalink / raw)
  To: Daniel P. Berrange, Dave Young
  Cc: Andrew Jones, bhe, qemu-devel, qiaonuohan, anderson, Laszlo Ersek



On 14/11/2016 11:10, Daniel P. Berrange wrote:
> There's already patches posted to create a virtio-pstore device for
> QEMU, which is what led me to suggest this as an option:
> 
>   https://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg00381.html

It's also possible to use UEFI as a pstore backend.

Paolo


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-14 10:28                     ` Paolo Bonzini
@ 2016-11-14 10:33                       ` Daniel P. Berrange
  2016-11-14 11:08                         ` Laszlo Ersek
  2016-11-14 11:55                         ` Paolo Bonzini
  0 siblings, 2 replies; 30+ messages in thread
From: Daniel P. Berrange @ 2016-11-14 10:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Dave Young, Andrew Jones, bhe, qemu-devel, qiaonuohan, anderson,
	Laszlo Ersek

On Mon, Nov 14, 2016 at 11:28:04AM +0100, Paolo Bonzini wrote:
> 
> 
> On 14/11/2016 11:10, Daniel P. Berrange wrote:
> > There's already patches posted to create a virtio-pstore device for
> > QEMU, which is what led me to suggest this as an option:
> > 
> >   https://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg00381.html
> 
> It's also possible to use UEFI as a pstore backend.

Presumably that'll also require some QEMU patches to provide storage
for UEFI's pstore ?

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-09 15:28   ` Dave Anderson
@ 2016-11-14 10:41     ` Paolo Bonzini
  2016-11-15 14:41       ` Dave Anderson
  0 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2016-11-14 10:41 UTC (permalink / raw)
  To: Dave Anderson, Andrew Jones
  Cc: bhe, Dave Young, qemu-devel, qiaonuohan, lersek



On 09/11/2016 16:28, Dave Anderson wrote:
> I'm not sure whether this "guest userspace agent" is still in play here,
> but if there were such a thing, it could theoretically do the same
> thing that crash currently does when running on a live system.
> 
> Both of those are available or calculatable from the contents of
> a kdump header.  However, on a live system, it's done like this:
> 
> - /proc/kallsyms is queried for the symbol value of "_text", which would
>   be relocated if KASLR is in play.  That value is compared against the
>   "_text" symbol value compiled into the vmlinux file to determine the
>   relocation value generated by CONFIG_RANDOMIZE_BASE.
> 
> [...] in order to read kernel symbols from the 
> statically-mapped kernel region based at __START_KERNEL_map, it 
> translates a (possibly relocated) kernel virtual address into a
> physical address like this:
> 
>   physical-address = virtual-address - __START_KERNEL_map + phys_base
> 
> But it's a chicken-and-egg deal, because the contents of the "phys_base"
> symbol are needed to calculate the physical address, but it can't
> read the "phys_base" symbol contents without first knowing its contents.
> 
> So on a live system, the "phys_base" is calculated by reading
> the "Kernel Code:" value from /proc/iomem, and then doing this:
> 
>   phys_base = [Kernel Code: value] - ["_text" symbol value] - __START_KERNEL_map
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Should there be parentheses around this?  The physical-address formula
above is equivalent to

    phys_base = physical-address - (virtual-address - __START_KERNEL_map)

> 
> So theoretically, the guest agent could read /proc/iomem and /proc/kallsyms
> for the information required.  (I think...)

Then yes, the guest-agent could add a command get-kernel-text-start with an output like:

{ 'virtual': 0xffffffff86000000, 'physical': 0xb6000000 }

and libvirt can expose it to crash.  In this case, phys_base would be 0xb0000000
if I did the math right, and the relocation value is obtained by comparing the
"virtual" address with the vmlinux "_text".

IIRC the guest agent runs as root, so reading /proc/iomem is not a problem.
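
As a rough sketch of what such a command might do inside the guest (the
command is only proposed here, and every function name below is
hypothetical), the two /proc files could be parsed like this. It assumes
the usual "address type name" layout of /proc/kallsyms and the
"start-end : name" layout of /proc/iomem; note that both files report
zeroed addresses to unprivileged readers on many kernels, which is why
running as root matters:

```python
def kallsyms_text_addr(kallsyms):
    """Return the virtual address of "_text" from /proc/kallsyms content."""
    for line in kallsyms.splitlines():
        fields = line.split()
        # Each line looks like: "ffffffff86000000 T _text"
        if len(fields) >= 3 and fields[2] == "_text":
            return int(fields[0], 16)
    return None


def iomem_kernel_code_start(iomem):
    """Return the physical start of the kernel code range from /proc/iomem content."""
    for line in iomem.splitlines():
        # Each line looks like: "  b6000000-b6c6b5a1 : Kernel code"
        rng, sep, name = line.partition(" : ")
        if sep and name.strip().lower() == "kernel code":
            start, _, _end = rng.strip().partition("-")
            return int(start, 16)
    return None


def kernel_text_start(kallsyms, iomem):
    """Build the reply sketched above; keys follow the proposed output."""
    return {
        "virtual": kallsyms_text_addr(kallsyms),
        "physical": iomem_kernel_code_start(iomem),
    }


def read_kernel_text_start():
    """Read the live files (Linux guest only, needs root for real addresses)."""
    with open("/proc/kallsyms") as k, open("/proc/iomem") as m:
        return kernel_text_start(k.read(), m.read())
```

The parsers take the file contents as strings so they can be exercised
without a live guest; only read_kernel_text_start() touches /proc.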

Paolo


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-14 10:33                       ` Daniel P. Berrange
@ 2016-11-14 11:08                         ` Laszlo Ersek
  2016-11-14 11:55                         ` Paolo Bonzini
  1 sibling, 0 replies; 30+ messages in thread
From: Laszlo Ersek @ 2016-11-14 11:08 UTC (permalink / raw)
  To: Daniel P. Berrange, Paolo Bonzini
  Cc: Dave Young, Andrew Jones, bhe, qemu-devel, qiaonuohan, anderson

On 11/14/16 11:33, Daniel P. Berrange wrote:
> On Mon, Nov 14, 2016 at 11:28:04AM +0100, Paolo Bonzini wrote:
>>
>>
>> On 14/11/2016 11:10, Daniel P. Berrange wrote:
>>> There's already patches posted to create a virtio-pstore device for
>>> QEMU, which is what led me to suggest this as an option:
>>>
>>>   https://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg00381.html
>>
>> It's also possible to use UEFI as a pstore backend.
> 
> Presumably that'll also require some QEMU patches to provide storage
> for UEFI's pstore ?

Using UEFI non-volatile variables as a pstore backend is a guest kernel
feature, and it already works transparently with OVMF utilizing QEMU's
pflash device. If memory serves, the data to be written are broken into
1KB chunks, and saved as separate UEFI variables under a dedicated
namespace GUID.

https://bugzilla.redhat.com/show_bug.cgi?id=828497

(Private BZ -- I apologize to the non-RedHatter subscribers that read this.)

(Also, not everyone has been enthusiastic about this feature:
<https://bugzilla.redhat.com/show_bug.cgi?id=919485>.)

Anyway, when I say "it works", I mean it works for the direct purpose of
storing data (like saving dmesg at panic), and for retrieving data, from
within the guest. (At a subsequent guest boot, possibly.) This is the
scope of pstore in general, AIUI (see "Documentation/ABI/testing/pstore").

However, host-side insight into the OVMF/edk2 varstore format remains
something we don't, and shouldn't, implement. In this regard, the UEFI
variables that happen to contain pstore data are no different from other
kinds of UEFI variables; they are equally opaque from the host side.
(Unless we want to implement and maintain a large utility that reflects
and tracks the multi-layer variable driver stack in edk2. "Unless" is
rhetorical, we don't want that.)

If host-side access is needed to the guest's phys-base / virt-base, then
my first preference would be the guest agent (interrogated at guest
startup), and my second preference would be virtio-pstore. I reckon
virtio-pstore will take a new guest driver, and I suppose the host-side
on-disk format is being designed for easy parsing.

Thanks
Laszlo


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-14 10:33                       ` Daniel P. Berrange
  2016-11-14 11:08                         ` Laszlo Ersek
@ 2016-11-14 11:55                         ` Paolo Bonzini
  1 sibling, 0 replies; 30+ messages in thread
From: Paolo Bonzini @ 2016-11-14 11:55 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Dave Young, Andrew Jones, bhe, qemu-devel, qiaonuohan, anderson,
	Laszlo Ersek



On 14/11/2016 11:33, Daniel P. Berrange wrote:
> On Mon, Nov 14, 2016 at 11:28:04AM +0100, Paolo Bonzini wrote:
>>
>>
>> On 14/11/2016 11:10, Daniel P. Berrange wrote:
>>> There's already patches posted to create a virtio-pstore device for
>>> QEMU, which is what led me to suggest this as an option:
>>>
>>>   https://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg00381.html
>>
>> It's also possible to use UEFI as a pstore backend.
> 
> Presumably that'll also require some QEMU patches to provide storage
> for UEFI's pstore ?

That's just the UEFI variable store.  But for some reason Fedora doesn't
set CONFIG_EFI_VARS, so the next possibility is to use ACPI ERST.  This
would not require any change to guests, unlike virtio-pstore.

Paolo


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-14 10:41     ` Paolo Bonzini
@ 2016-11-15 14:41       ` Dave Anderson
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Anderson @ 2016-11-15 14:41 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andrew Jones, bhe, Dave Young, qemu-devel, qiaonuohan, lersek



----- Original Message -----
> 
> 
> On 09/11/2016 16:28, Dave Anderson wrote:
> > I'm not sure whether this "guest userspace agent" is still in play here,
> > but if there were such a thing, it could theoretically do the same
> > thing that crash currently does when running on a live system.
> > 
> > Both of those are available or calculatable from the contents of
> > a kdump header.  However, on a live system, it's done like this:
> > 
> > - /proc/kallsyms is queried for the symbol value of "_text", which would
> >   be relocated if KASLR is in play.  That value is compared against the
> >   "_text" symbol value compiled into the vmlinux file to determine the
> >   relocation value generated by CONFIG_RANDOMIZE_BASE.
> > 
> > [...] in order to read kernel symbols from the
> > statically-mapped kernel region based at __START_KERNEL_map, it
> > translates a (possibly relocated) kernel virtual address into a
> > physical address like this:
> > 
> >   physical-address = virtual-address - __START_KERNEL_map + phys_base
> > 
> > But it's a chicken-and-egg deal, because the contents of the "phys_base"
> > symbol are needed to calculate the physical address, but it can't
> > read the "phys_base" symbol contents without first knowing its contents.
> > 
> > So on a live system, the "phys_base" is calculated by reading
> > the "Kernel Code:" value from /proc/iomem, and then doing this:
> > 
> >   phys_base = [Kernel Code: value] - ["_text" symbol value] - __START_KERNEL_map
>                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Should there be parentheses around this?  

Yes, sorry, that's correct -- that's what the code does, and what I meant to express...
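
For the record, the parenthesized form can be checked against the example
values quoted further down in this message. The following is only a
minimal sketch: it assumes the x86_64 __START_KERNEL_map value of
0xffffffff80000000, and the vmlinux "_text" value is a hypothetical
placeholder used to illustrate the relocation comparison:

```python
# Assumption: __START_KERNEL_map as defined for x86_64 Linux.
START_KERNEL_MAP = 0xffffffff80000000


def phys_base(text_virtual, text_physical, start_kernel_map=START_KERNEL_MAP):
    """phys_base = physical - (virtual - __START_KERNEL_map)."""
    return text_physical - (text_virtual - start_kernel_map)


def kaslr_relocation(text_virtual, vmlinux_text):
    """Relocation value: runtime "_text" minus the "_text" compiled into vmlinux."""
    return text_virtual - vmlinux_text


# Example values from the proposed guest-agent output.
virtual = 0xffffffff86000000
physical = 0xb6000000

# Hypothetical vmlinux "_text" value, for illustration only.
vmlinux_text = 0xffffffff81000000

base = phys_base(virtual, physical)
reloc = kaslr_relocation(virtual, vmlinux_text)
print(hex(base), hex(reloc))

# Without the parentheses the subtraction associates left to right and
# gives a different (wrong) result.
unparenthesized = physical - virtual - START_KERNEL_MAP
```

With these inputs base works out to 0xb0000000, matching the figure in the
quoted text, while the unparenthesized expression does not.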

Dave


> The physical-address formula above is equivalent to
> 
>     phys_base = physical-address - (virtual-address - __START_KERNEL_map)
> 
> > 
> > So theoretically, the guest agent could read /proc/iomem and /proc/kallsyms
> > for the information required.  (I think...)
> 
> Then yes, the guest-agent could add a command get-kernel-text-start with an output like:
> 
> { 'virtual': 0xffffffff86000000, 'physical': 0xb6000000 }
> 
> and libvirt can expose it to crash.  In this case, phys_base would be 0xb0000000
> if I did the math right, and the relocation value is obtained by comparing the
> "virtual" address with the vmlinux "_text".
> 
> IIRC the guest agent runs as root, so reading /proc/iomem is not a problem.
> 
> Paolo
> 


* Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support
  2016-11-14  9:47                   ` Andrew Jones
@ 2016-11-16  2:48                     ` Dave Young
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Young @ 2016-11-16  2:48 UTC (permalink / raw)
  To: Andrew Jones; +Cc: Laszlo Ersek, bhe, qemu-devel, qiaonuohan, anderson

On 11/14/16 at 10:47am, Andrew Jones wrote:
> On Mon, Nov 14, 2016 at 01:32:56PM +0800, Dave Young wrote:
> > On 11/09/16 at 04:38pm, Laszlo Ersek wrote:
> > > On 11/09/16 15:47, Daniel P. Berrange wrote:
> > > > On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote:
> > > >> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote:
> > > >>> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote:
> > > >>>> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote:
> > > >>>>> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
> > > >>>>>> On 11/09/16 11:40, Andrew Jones wrote:
> > > >>>>>>> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> > > >>>>>>>> Hi,
> > > >>>>>>>>
> > > >>>>>>>> Latest linux kernel enabled kaslr to randomize phys/virt memory
> > > >>>>>>>> addresses, we had some effort to support kexec/kdump so that crash
> > > >>>>>>>> utility can still work in case crashed kernel has kaslr enabled.
> > > >>>>>>>>
> > > >>>>>>>> But according to Dave Anderson virsh dump does not work, quoted messages
> > > >>>>>>>> from Dave below:
> > > >>>>>>>>
> > > >>>>>>>> """
> > > >>>>>>>> with virsh dump, there's no way of even knowing that KASLR
> > > >>>>>>>> has randomized the kernel __START_KERNEL_map region, because there is no
> > > >>>>>>>> virtual address information -- e.g., like "SYMBOL(_stext)" in the kdump
> > > >>>>>>>> vmcoreinfo data to compare against the vmlinux file symbol value.
> > > >>>>>>>> Unless virsh dump can export some basic virtual memory data, which
> > > >>>>>>>> they say it can't, I don't see how KASLR can ever be supported.
> > > >>>>>>>> """
> > > >>>>>>>>
> > > >>>>>>>> I assume virsh dump is using qemu guest memory dump facility so it
> > > >>>>>>>> should be first addressed in qemu. Thus post this query to qemu devel
> > > >>>>>>>> list. If this is not correct please let me know.
> > > >>>>>>>>
> > > >>>>>>>> Could you qemu dump people make it work? Or we can not support virt dump
> > > >>>>>>>> as long as KASLR being enabled. Latest Fedora kernel has enabled it in x86_64.
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>> When the -kernel command line option is used, then it may be possible
> > > >>>>>>> to extract some information that could be used to supplement the memory
> > > >>>>>>> dump that dump-guest-memory provides. However, that would be a specific
> > > >>>>>>> use. In general, QEMU knows nothing about the guest kernel. It doesn't
> > > >>>>>>> know where it is in the disk image, and it doesn't even know if it's
> > > >>>>>>> Linux.
> > > >>>>>>>
> > > >>>>>>> Is there anything a guest userspace application could probe from e.g.
> > > >>>>>>> /proc that would work? If so, then the guest agent could gain a new
> > > >>>>>>> feature providing that.
> > > >>>>>>
> > > >>>>>> I fully agree. This is exactly what I suggested too, independently, in
> > > >>>>>> the downstream thread, before arriving at this upstream thread. Let me
> > > >>>>>> quote that email:
> > > >>>>>>
> > > >>>>>> On 11/09/16 12:09, Laszlo Ersek wrote:
> > > >>>>>>> [...] the dump-guest-memory QEMU command supports an option called
> > > >>>>>>> "paging". Here's its documentation, from the "qapi-schema.json" source
> > > >>>>>>> file:
> > > >>>>>>>
> > > >>>>>>>> # @paging: if true, do paging to get guest's memory mapping. This allows
> > > >>>>>>>> #          using gdb to process the core file.
> > > >>>>>>>> #
> > > >>>>>>>> #          IMPORTANT: this option can make QEMU allocate several gigabytes
> > > >>>>>>>> #                     of RAM. This can happen for a large guest, or a
> > > >>>>>>>> #                     malicious guest pretending to be large.
> > > >>>>>>>> #
> > > >>>>>>>> #          Also, paging=true has the following limitations:
> > > >>>>>>>> #
> > > >>>>>>>> #             1. The guest may be in a catastrophic state or can have corrupted
> > > >>>>>>>> #                memory, which cannot be trusted
> > > >>>>>>>> #             2. The guest can be in real-mode even if paging is enabled. For
> > > >>>>>>>> #                example, the guest uses ACPI to sleep, and ACPI sleep state
> > > >>>>>>>> #                goes in real-mode
> > > >>>>>>>> #             3. Currently only supported on i386 and x86_64.
> > > >>>>>>>> #
> > > >>>>>>>
> > > >>>>>>> "virsh dump --memory-only" sets paging=false, for obvious reasons.
> > > >>>>>>>
> > > >>>>>>> [...] the dump-guest-memory command provides a raw snapshot of the
> > > >>>>>>> virtual machine's memory (and of the registers of the VCPUs); it is
> > > >>>>>>> not enlightened about the guest.
> > > >>>>>>>
> > > >>>>>>> If the additional information you are looking for can be retrieved
> > > >>>>>>> within the running Linux guest, using an appropriately privileged
> > > >>>>>>> userspace process, then I would recommend considering an extension to
> > > >>>>>>> the qemu guest agent. The management layer (libvirt, [...]) could
> > > >>>>>>> first invoke the guest agent (a process with root privileges running
> > > >>>>>>> in the guest) from the host side, through virtio-serial. The new guest
> > > >>>>>>> agent command would return the information necessary to deal with
> > > >>>>>>> KASLR. Then the management layer would initiate the dump like always.
> > > >>>>>>> Finally, the extra information would be combined with (or placed
> > > >>>>>>> beside) the dump file in some way.
> > > >>>>>>>
> > > >>>>>>> So, this proposal would affect the guest agent and the management
> > > >>>>>>> layer (= libvirt).
> > > >>>>>>
> > > >>>>>> Given that we already dislike "paging=true", enlightening
> > > >>>>>> dump-guest-memory with even more guest-specific insight is the wrong
> > > >>>>>> approach, IMO. That kind of knowledge belongs to the guest agent.
> > > >>>>>
> > > >>>>> If you're trying to debug a hung/panicked guest, then using a guest
> > > >>>>> agent to fetch info is a complete non-starter as it'll be dead.
> > > 
> > > Yes, I realized this a while after posting...
> > > 
> > > >>>> So don't wait. Management software can make this query immediately
> > > >>>> after the guest agent goes live. The information needed won't change.
> > > 
> > > ... and then figured this would solve the problem.
> > > 
> > > >>> That doesn't help with trying to diagnose a crash during boot up, since
> > > >>> the guest agent isn't running till fairly late. I'm also concerned that
> > > >>> the QEMU guest agent is likely to be far from widely deployed in guests,
> > > 
> > > I have no hard data, but from the recent Fedora and RHEL-7 guest
> > > installations I've done, it seems like qga is installed automatically.
> > > (Not sure if that's because Anaconda realizes it's installing the OS in
> > > a VM.) Once I made sure there was an appropriate virtio-serial config in
> > > the domain XMLs, I could talk to the agents (mainly for fstrim's sake)
> > > immediately.
> > > 
> > > >>> so reliance on the guest agent will mean the dump facility is no longer
> > > >>> reliably available.
> > > >>>
> > > >>
> > > >> It'd still be reliably available and useable during early boot, just like
> > > >> it is now, for kernels that don't use KASLR. This proposal is only
> > > >> attempting to *also* address KASLR kernels, for which there is currently
> > > >> no support whatsoever. Call it a best-effort.
> > > >>
> > > >> Of course we can get support for [probably] early boot and
> > > >> guest-agent-less guests using KASLR too if we introduce a paravirt
> > > >> solution, requiring guest kernel and KVM changes. Is it worth it?
> > > > 
> > > > There's a standard for persistent storage that is intended to allow
> > > > the kernel to dump out data at time of crash:
> > > > 
> > > >    https://lwn.net/Articles/434821/
> > > > 
> > > > and there's some recent patches to provide a QEMU backend. Could we
> > > > leverage that facility to get the data we need from the guest kernel ?
> > > > 
> > > > Instead of only using pstore at time of crash, the kernel could see
> > > > > that it's running on KVM, and write out the paging data to pstore. So
> > > > when QEMU later generates a core dump, it can grab the corresponding
> > > > data from pstore backend ?
> > > > 
> > > > > Still requires an extra device, to be configured, but at least we
> > > > would not have to invent yet another paravirt device ourselves, just
> > > > use the existing framework.
> > > 
> > > Not disagreeing, I'd just like to point out that the kernel can also
> > > crash before the extra device (the pstore driver) is configured
> > > (especially if the driver is built as a module).
> > 
> > Boot phase crash is also a problem for kdump, but hopefully the boot
> > phase crash will be found early and get fixed early. The run time
> > problems are harder, it will still be helpful.
> > 
> > > I'm not a virt expert, but from my feeling comparing guest agent and
> > pstore I would vote for guest agent, it is ready to work on now, no?
> > For pstore I'm not sure how to make a pstore device for all guests. I
> > know uefi guest can use its nvram, but introducing some general pstore
> > sounds hard..
> >
> 
> Nothing is stopping us from doing both, eventually. Care should be taken
> on the management side to make it general enough. It should be designed
> such that it can use guest-agent now, but in no way is bound to guest-
> agent. We can decide later if we want to replace guest-agent with some
> paravirt solution.
> 
> Nothing is blocking guest-agent patches now, that I know of.

Sounds like a good idea, Drew.

Thanks
Dave
> 
> Thanks,
> drew


end of thread, other threads:[~2016-11-16  2:48 UTC | newest]

Thread overview: 30+ messages
2016-11-09  3:01 [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled linux guest support Dave Young
2016-11-09  3:17 ` Dave Young
2016-11-09  3:58   ` Wen Congyang
2016-11-09  5:02     ` Dave Young
2016-11-09  7:42       ` Wen Congyang
2016-11-09  8:25         ` Dave Young
2016-11-09 14:36       ` Dave Anderson
2016-11-09 14:42         ` Daniel P. Berrange
2016-11-09 10:40 ` Andrew Jones
2016-11-09 11:26   ` Laszlo Ersek
2016-11-09 11:37     ` Daniel P. Berrange
2016-11-09 11:48       ` Andrew Jones
2016-11-09 11:58         ` Daniel P. Berrange
2016-11-09 12:20           ` Andrew Jones
2016-11-09 14:47             ` Daniel P. Berrange
2016-11-09 15:38               ` Laszlo Ersek
2016-11-09 16:01                 ` Daniel P. Berrange
2016-11-14 10:27                   ` Paolo Bonzini
2016-11-14  5:32                 ` Dave Young
2016-11-14  9:47                   ` Andrew Jones
2016-11-16  2:48                     ` Dave Young
2016-11-14 10:10                   ` Daniel P. Berrange
2016-11-14 10:28                     ` Paolo Bonzini
2016-11-14 10:33                       ` Daniel P. Berrange
2016-11-14 11:08                         ` Laszlo Ersek
2016-11-14 11:55                         ` Paolo Bonzini
2016-11-09 15:28   ` Dave Anderson
2016-11-14 10:41     ` Paolo Bonzini
2016-11-15 14:41       ` Dave Anderson
2016-11-09 14:32 ` Dave Anderson
