From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 16 Nov 2016 10:48:00 +0800
From: Dave Young
To: Andrew Jones
Cc: Laszlo Ersek, bhe@redhat.com, qemu-devel@nongnu.org,
	qiaonuohan@cn.fujitsu.com, anderson@redhat.com
Subject: Re: [Qemu-devel] virsh dump (qemu guest memory dump?): KASLR enabled
 linux guest support
Message-ID: <20161116024800.GB3686@dhcp-128-65.nay.redhat.com>
References: <20161109104059.bvw5h4k4v77pw2rl@kamzik.brq.redhat.com>
 <9144d6b1-a1c9-e727-4673-9df10b227fdb@redhat.com>
 <20161109113735.GF22181@redhat.com>
 <20161109114809.cawi6tpsxwn5vfql@kamzik.brq.redhat.com>
 <20161109115819.GG22181@redhat.com>
 <20161109122051.ztllxmhwsalds2qw@kamzik.brq.redhat.com>
 <20161109144740.GI22181@redhat.com>
 <20161114053256.GA16939@dhcp-128-65.nay.redhat.com>
 <20161114094700.7gqldftx5kiu34rn@kamzik.brq.redhat.com>
In-Reply-To: <20161114094700.7gqldftx5kiu34rn@kamzik.brq.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On 11/14/16 at 10:47am, Andrew Jones wrote:
> On Mon, Nov 14, 2016 at 01:32:56PM +0800, Dave Young wrote:
> > On 11/09/16 at 04:38pm, Laszlo Ersek wrote:
> > > On 11/09/16 15:47, Daniel P. Berrange wrote:
> > > > On Wed, Nov 09, 2016 at 01:20:51PM +0100, Andrew Jones wrote:
> > > >> On Wed, Nov 09, 2016 at 11:58:19AM +0000, Daniel P. Berrange wrote:
> > > >>> On Wed, Nov 09, 2016 at 12:48:09PM +0100, Andrew Jones wrote:
> > > >>>> On Wed, Nov 09, 2016 at 11:37:35AM +0000, Daniel P. Berrange wrote:
> > > >>>>> On Wed, Nov 09, 2016 at 12:26:17PM +0100, Laszlo Ersek wrote:
> > > >>>>>> On 11/09/16 11:40, Andrew Jones wrote:
> > > >>>>>>> On Wed, Nov 09, 2016 at 11:01:46AM +0800, Dave Young wrote:
> > > >>>>>>>> Hi,
> > > >>>>>>>>
> > > >>>>>>>> The latest Linux kernel enables KASLR to randomize phys/virt
> > > >>>>>>>> memory addresses; we have done some work on kexec/kdump so that
> > > >>>>>>>> the crash utility still works when the crashed kernel has KASLR
> > > >>>>>>>> enabled.
> > > >>>>>>>>
> > > >>>>>>>> But according to Dave Anderson, virsh dump does not work;
> > > >>>>>>>> quoting Dave's message below:
> > > >>>>>>>>
> > > >>>>>>>> """
> > > >>>>>>>> with virsh dump, there's no way of even knowing that KASLR
> > > >>>>>>>> has randomized the kernel __START_KERNEL_map region, because
> > > >>>>>>>> there is no virtual address information -- e.g., like
> > > >>>>>>>> "SYMBOL(_stext)" in the kdump vmcoreinfo data to compare
> > > >>>>>>>> against the vmlinux file symbol value. Unless virsh dump can
> > > >>>>>>>> export some basic virtual memory data, which they say it
> > > >>>>>>>> can't, I don't see how KASLR can ever be supported.
> > > >>>>>>>> """
> > > >>>>>>>>
> > > >>>>>>>> I assume virsh dump uses QEMU's guest memory dump facility, so
> > > >>>>>>>> this should first be addressed in QEMU; thus I'm posting this
> > > >>>>>>>> query to the qemu-devel list. If that is not correct, please
> > > >>>>>>>> let me know.
> > > >>>>>>>>
> > > >>>>>>>> Could the QEMU dump people make it work? Otherwise we cannot
> > > >>>>>>>> support virsh dump as long as KASLR is enabled; the latest
> > > >>>>>>>> Fedora kernel has enabled it on x86_64.
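
To make the vmcoreinfo point above concrete, here is a minimal Python sketch of the comparison that crash performs: it reads the runtime SYMBOL(_stext) value from a kdump vmcoreinfo note and subtracts the link-time _stext from the vmlinux symbol table to recover the KASLR slide. SYMBOL(_stext) is a real vmcoreinfo key; the addresses used here are hypothetical.

```python
# Derive the KASLR offset the way crash does: runtime _stext (from the
# vmcoreinfo note in the dump) minus link-time _stext (from vmlinux).
# All addresses below are hypothetical examples.

VMLINUX_STEXT = 0xFFFFFFFF81000000  # link-time _stext (hypothetical)

def kaslr_offset(vmcoreinfo: str, link_time_stext: int) -> int:
    """Parse SYMBOL(_stext) from a vmcoreinfo blob and return the
    randomization offset (0 means KASLR did not move the kernel)."""
    for line in vmcoreinfo.splitlines():
        if line.startswith("SYMBOL(_stext)="):
            runtime = int(line.split("=", 1)[1], 16)
            return runtime - link_time_stext
    raise ValueError("no SYMBOL(_stext) in vmcoreinfo")

# Example vmcoreinfo fragment (randomized address is hypothetical):
note = "OSRELEASE=4.8.0\nSYMBOL(_stext)=ffffffff9a200000\nPAGESIZE=4096"
print(hex(kaslr_offset(note, VMLINUX_STEXT)))  # 0x19200000
```

Without SYMBOL(_stext) (or equivalent) in the dump, this subtraction has no runtime operand, which is exactly the gap Dave Anderson describes.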
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>> When the -kernel command line option is used, then it may be possible
> > > >>>>>>> to extract some information that could be used to supplement the memory
> > > >>>>>>> dump that dump-guest-memory provides. However, that would be a specific
> > > >>>>>>> use. In general, QEMU knows nothing about the guest kernel. It doesn't
> > > >>>>>>> know where it is in the disk image, and it doesn't even know if it's
> > > >>>>>>> Linux.
> > > >>>>>>>
> > > >>>>>>> Is there anything a guest userspace application could probe from e.g.
> > > >>>>>>> /proc that would work? If so, then the guest agent could gain a new
> > > >>>>>>> feature providing that.
> > > >>>>>>
> > > >>>>>> I fully agree. This is exactly what I suggested too, independently, in
> > > >>>>>> the downstream thread, before arriving at this upstream thread. Let me
> > > >>>>>> quote that email:
> > > >>>>>>
> > > >>>>>> On 11/09/16 12:09, Laszlo Ersek wrote:
> > > >>>>>>> [...] the dump-guest-memory QEMU command supports an option called
> > > >>>>>>> "paging". Here's its documentation, from the "qapi-schema.json" source
> > > >>>>>>> file:
> > > >>>>>>>
> > > >>>>>>>> # @paging: if true, do paging to get guest's memory mapping. This allows
> > > >>>>>>>> #          using gdb to process the core file.
> > > >>>>>>>> #
> > > >>>>>>>> # IMPORTANT: this option can make QEMU allocate several gigabytes
> > > >>>>>>>> #            of RAM. This can happen for a large guest, or a
> > > >>>>>>>> #            malicious guest pretending to be large.
> > > >>>>>>>> #
> > > >>>>>>>> # Also, paging=true has the following limitations:
> > > >>>>>>>> #
> > > >>>>>>>> #   1. The guest may be in a catastrophic state or can have corrupted
> > > >>>>>>>> #      memory, which cannot be trusted
> > > >>>>>>>> #   2. The guest can be in real-mode even if paging is enabled. For
> > > >>>>>>>> #      example, the guest uses ACPI to sleep, and ACPI sleep state
> > > >>>>>>>> #      goes in real-mode
> > > >>>>>>>> #   3. Currently only supported on i386 and x86_64.
> > > >>>>>>>> #
> > > >>>>>>>
> > > >>>>>>> "virsh dump --memory-only" sets paging=false, for obvious reasons.
> > > >>>>>>>
> > > >>>>>>> [...] the dump-guest-memory command provides a raw snapshot of the
> > > >>>>>>> virtual machine's memory (and of the registers of the VCPUs); it is
> > > >>>>>>> not enlightened about the guest.
> > > >>>>>>>
> > > >>>>>>> If the additional information you are looking for can be retrieved
> > > >>>>>>> within the running Linux guest, using an appropriately privileged
> > > >>>>>>> userspace process, then I would recommend considering an extension to
> > > >>>>>>> the qemu guest agent. The management layer (libvirt, [...]) could
> > > >>>>>>> first invoke the guest agent (a process with root privileges running
> > > >>>>>>> in the guest) from the host side, through virtio-serial. The new guest
> > > >>>>>>> agent command would return the information necessary to deal with
> > > >>>>>>> KASLR. Then the management layer would initiate the dump like always.
> > > >>>>>>> Finally, the extra information would be combined with (or placed
> > > >>>>>>> beside) the dump file in some way.
> > > >>>>>>>
> > > >>>>>>> So, this proposal would affect the guest agent and the management
> > > >>>>>>> layer (= libvirt).
> > > >>>>>>
> > > >>>>>> Given that we already dislike "paging=true", enlightening
> > > >>>>>> dump-guest-memory with even more guest-specific insight is the wrong
> > > >>>>>> approach, IMO. That kind of knowledge belongs to the guest agent.
> > > >>>>>
> > > >>>>> If you're trying to debug a hung/panicked guest, then using a guest
> > > >>>>> agent to fetch info is a complete non-starter as it'll be dead.
> > >
> > > Yes, I realized this a while after posting...
> > >
> > > >>>> So don't wait. Management software can make this query immediately
> > > >>>> after the guest agent goes live. The information needed won't change.
> > >
> > > ... and then figured this would solve the problem.
> > >
> > > >>> That doesn't help with trying to diagnose a crash during boot up, since
> > > >>> the guest agent isn't running till fairly late. I'm also concerned that
> > > >>> the QEMU guest agent is likely to be far from widely deployed in guests,
> > >
> > > I have no hard data, but from the recent Fedora and RHEL-7 guest
> > > installations I've done, it seems like qga is installed automatically.
> > > (Not sure if that's because Anaconda realizes it's installing the OS in
> > > a VM.) Once I made sure there was an appropriate virtio-serial config in
> > > the domain XMLs, I could talk to the agents (mainly for fstrim's sake)
> > > immediately.
> > >
> > > >>> so reliance on the guest agent will mean the dump facility is no longer
> > > >>> reliably available.
> > > >>>
> > > >>
> > > >> It'd still be reliably available and usable during early boot, just like
> > > >> it is now, for kernels that don't use KASLR. This proposal is only
> > > >> attempting to *also* address KASLR kernels, for which there is currently
> > > >> no support whatsoever. Call it a best effort.
> > > >>
> > > >> Of course we can also get support for [probably] early boot and
> > > >> guest-agent-less guests using KASLR if we introduce a paravirt
> > > >> solution, requiring guest kernel and KVM changes. Is it worth it?
> > > >
> > > > There's a standard for persistent storage that is intended to allow
> > > > the kernel to dump out data at time of crash:
> > > >
> > > > https://lwn.net/Articles/434821/
> > > >
> > > > and there are some recent patches to provide a QEMU backend. Could we
> > > > leverage that facility to get the data we need from the guest kernel?
> > > >
> > > > Instead of only using pstore at time of crash, the kernel could see
> > > > that it's running on KVM, and write out the paging data to pstore. So
> > > > when QEMU later generates a core dump, it can grab the corresponding
> > > > data from the pstore backend?
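
As an illustration of the pstore idea above, a small Python sketch of the host side: parse a vmcoreinfo-style key=value record that the guest kernel could have written to persistent storage, so it can be attached to the dump. The record layout and the KERNELOFFSET key are assumptions for illustration, not an existing pstore format.

```python
# Host-side sketch: parse a key=value record (vmcoreinfo style) that a
# KASLR-aware guest kernel might write to pstore, so the dump tooling can
# recover the randomized addresses. Layout and keys are hypothetical.

def parse_record(blob: bytes) -> dict:
    """Parse a key=value-per-line record into a dict of strings."""
    info = {}
    for line in blob.decode("ascii", errors="replace").splitlines():
        if "=" in line:
            key, val = line.split("=", 1)
            info[key.strip()] = val.strip()
    return info

# Hypothetical record the guest could have dropped into pstore:
record = b"SYMBOL(_stext)=ffffffff9a200000\nKERNELOFFSET=19200000\n"
info = parse_record(record)
print(info["KERNELOFFSET"])  # 19200000
```

The appeal of this shape is that the dump side stays guest-agnostic: it only forwards an opaque record alongside the core file.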
> > > >
> > > > Still requires an extra device to be configured, but at least we
> > > > would not have to invent yet another paravirt device ourselves, just
> > > > use the existing framework.
> > >
> > > Not disagreeing, I'd just like to point out that the kernel can also
> > > crash before the extra device (the pstore driver) is configured
> > > (especially if the driver is built as a module).
> >
> > A boot-phase crash is also a problem for kdump, but hopefully boot-phase
> > crashes will be found and fixed early. The run-time problems are harder,
> > so this will still be helpful.
> >
> > I'm not a virt expert, but comparing the guest agent and pstore, I would
> > vote for the guest agent; it is ready to be worked on now, no? For
> > pstore I'm not sure how to provide a pstore device for all guests. I
> > know a UEFI guest can use its NVRAM, but introducing some general
> > pstore sounds hard..
> >
>
> Nothing is stopping us from doing both, eventually. Care should be taken
> on the management side to make it general enough. It should be designed
> such that it can use the guest agent now, but in no way is bound to the
> guest agent. We can decide later if we want to replace the guest agent
> with some paravirt solution.
>
> Nothing is blocking guest-agent patches now, that I know of.

Sounds like a good idea, Drew.

Thanks
Dave

> Thanks,
> drew
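
To round out the guest-agent route discussed in the thread, here is a hypothetical Python sketch of what such a command could gather inside the guest: the runtime _stext address from /proc/kallsyms (root is needed, since unprivileged readers see zeroed addresses). The command name, reply shape, and sample addresses are invented for illustration; no such guest-agent command exists at this point.

```python
# Hypothetical guest-side sketch for a "guest-get-kaslr-info" style
# guest-agent command: find _stext in /proc/kallsyms-format text and
# return it as JSON. Names, reply shape, and addresses are made up.
import json

def find_symbol(kallsyms_text: str, name: str) -> int:
    """Return the address of `name` from kallsyms-format text."""
    for line in kallsyms_text.splitlines():
        addr, _type, sym = line.split()[:3]
        if sym == name:
            return int(addr, 16)
    raise KeyError(name)

def guest_get_kaslr_info(kallsyms_text: str) -> str:
    """Build the JSON reply such a command might send back."""
    return json.dumps({"stext": hex(find_symbol(kallsyms_text, "_stext"))})

# Sample /proc/kallsyms-style input (addresses hypothetical):
sample = "ffffffff9a200000 T _stext\nffffffff9a200040 T do_one_initcall"
print(guest_get_kaslr_info(sample))  # {"stext": "0xffffffff9a200000"}
```

Management software would cache this reply as soon as the agent comes up, so the value is available even if the guest later hangs or panics, which is the "query immediately, don't wait" point made above.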