From mboxrd@z Thu Jan  1 00:00:00 1970
From: Mihai =?UTF-8?Q?Don=C8=9Bu?= <mdontu@bitdefender.com>
Subject: Re: [RFC PATCH 00/19] Guest introspection
Date: Tue, 27 Jun 2017 19:12:48 +0300
Message-ID: <1498579968.10334.37.camel@bitdefender.com>
References: <20170616134348.17725-1-alazar@bitdefender.com>
         <1befe4ae-9c9c-eb3e-589d-775b23ba3152@siemens.com>
         <1497626293.10504.9.camel@bitdefender.com>
         <4c48494d-ee5e-ddd5-ce92-c29c25316ca3@siemens.com>
         <20170619093928.GA17304@stefanha-x1.localdomain>
         <1497970721.139b8aD.12149@host>
         <20170621110407.GE16183@stefanha-x1.localdomain>
         <645181790.10962776.1498051547977.JavaMail.zimbra@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Cc: alazar@bitdefender.com, Jan Kiszka <jan.kiszka@siemens.com>,
        Radim =?UTF-8?Q?Kr=C4=8Dm=C3=A1=C5=99?= <rkrcmar@redhat.com>,
        kvm@vger.kernel.org, Stefan Hajnoczi <stefanha@gmail.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx01.bbu.dsd.mx.bitdefender.com ([91.199.104.161]:57818 "EHLO
        mx01.bbu.dsd.mx.bitdefender.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1751668AbdF0QMu (ORCPT
        <rfc822;kvm@vger.kernel.org>); Tue, 27 Jun 2017 12:12:50 -0400
Received: from smtp02.buh.bitdefender.net (smtp.bitdefender.biz [10.17.80.76])
        by mx-sr.buh.bitdefender.com (Postfix) with ESMTP id 741727FC40
        for <kvm@vger.kernel.org>; Tue, 27 Jun 2017 19:12:48 +0300 (EEST)
In-Reply-To: <645181790.10962776.1498051547977.JavaMail.zimbra@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On Wed, 2017-06-21 at 09:25 -0400, Paolo Bonzini wrote:
> On 21/06/2017 13:04, Stefan Hajnoczi wrote:
> > On Tue, Jun 20, 2017 at 05:58:41PM +0300, alazar@bitdefender.com wrote:
> > > Moving the vsock to userland will change this:
> > > 
> > >                                      -----------------------------
> > >                  /----- /dev/kvm -->| new_tool (guest on/off/list)|<-- vsock -->\
> > >                  |                   -----------------------------              |
> > >                  |                                                              |
> > >  ----------------v-                  -----------------------------              |
> > > |                  |<-- /dev/kvm -->| qemu        VM1             |<-- vsock -->|
> > > |                  |                |-------                      |             |
> > > |                  |                | Linux |                     |             |
> > > | KVM              |                 -----------------------------              |
> > > |                  |<-- /dev/kvm -->| qemu        VM2             |<-- vsock -->|
> > > |                  |                |---------                    |             |
> > > |                  |                | Windows |                   |             |
> > > |                  |                 -----------------------------              |
> > > |                  |<-- /dev/kvm -->| qemu        VM3      /----->|<-- vsock -->/
> > > |           -------|                |---------------------v----   |
> > > |          | kvmi  |                | guest introspection tool |  |
> > > 
> > >  ------------------                  -----------------------------
> > > 
> > > There will be a need for a new tool (and/or libvirt modified) to get
> > > the guest events (on/off/list) and change the VM1, VM2 invocations (to
> > > make them connect with the introspection tool).
> 
> This kind of event should be provided directly by QEMU to the guest
> introspection tool---see below.
> 
> > > This might also be a
> > > problem with products having the host locked down (eg. RHEV).
> > 
> > I think that is desirable in fact.  kvmi should be an explicit feature
> > that is controlled by the management tools.  This way the policy can be
> > decided by the administrator.  Libvirt changes will be necessary.
> > 
> > Some KVM users do not want kvmi.  Think of the new memory encryption
> > hardware support that is coming out - the point is to prevent the
> > hypervisor from looking inside the VMs!  What you are doing is the
> > opposite of that.

Apologies for the late reply.

> I think Stefan has made quite a point here.  The policy manager for
> kvmi should definitely be on the host, not on the introspector machine.
> There can be multiple introspectors, some on the host and some on an
> appliance, though I suppose a limit of one introspector per VM is
> acceptable.

The host should indeed control whether the introspection feature is
made available. I can see this being a checkbox in, say, virt-manager.

Assuming the feature is enabled, the only policy we are interested in
is whether our application should actually try to introspect a given
guest, and that decision is tied to libvirt. For example, our management
solution will query libvirt about running VMs and then, depending on the
configuration set by an administrator, tell our application which VMs to
actually introspect. This is where the UUID comes into play: the
management solution refers to VMs by their UUID, and the application
must in turn be able to convert a UUID into an actual handle (a file
descriptor or something else).

The flow you described below seems to make room for this: during the
initial handshake, qemu could tell our application the UUID of the
guest and we'd keep a map of sorts. No need to put that in the kernel.
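
As a rough illustration only (nothing here comes from the patch series),
such a user-space map could look like the Python sketch below. It
assumes a hypothetical handshake in which qemu sends the raw 16-byte
guest UUID as the first thing on the connection; the wire format and the
names GuestMap, recv_exact and UUID_LEN are made up for the example.

```python
import socket
import uuid

UUID_LEN = 16  # assumption: the handshake payload is the raw 16-byte UUID


def recv_exact(conn, count):
    """Read exactly `count` bytes, or raise if the peer closes early."""
    buf = b""
    while len(buf) < count:
        chunk = conn.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("peer closed during handshake")
        buf += chunk
    return buf


class GuestMap:
    """User-space map between guest UUIDs and connection handles."""

    def __init__(self):
        self._uuid_by_fd = {}
        self._conn_by_uuid = {}

    def handshake(self, conn):
        """Read the guest UUID from a fresh connection and record it."""
        guest = uuid.UUID(bytes=recv_exact(conn, UUID_LEN))
        self._uuid_by_fd[conn.fileno()] = guest
        self._conn_by_uuid[guest] = conn
        return guest

    def handle_for(self, guest):
        """UUID -> connection, as needed by the management solution."""
        return self._conn_by_uuid.get(guest)

    def uuid_for(self, conn):
        """Connection -> UUID, as needed by the event loop."""
        return self._uuid_by_fd.get(conn.fileno())

    def forget(self, conn):
        """Drop a guest; call this before closing the connection."""
        guest = self._uuid_by_fd.pop(conn.fileno(), None)
        if guest is not None:
            del self._conn_by_uuid[guest]
```

The management side would call handle_for() with the UUID it got from
libvirt, while the event loop would use uuid_for() to name the guest
behind a file descriptor.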

> And this should be the starting point of the design.
> 
> Compared to Stefan's proposed command line:
> 
>   qemu --chardev socket,id=chardev0,type=vsock,port=1234,server,nowait \
>        --guest-introspection chardev=chardev0,allowed-cids=10
> 
> I would do it in the opposite direction.  The introspector is the one that
> presents a server socket; QEMU connects to the introspection VM, possibly
> does some handshaking, and passes the file descriptor to KVM.  With another
> small change, replacing --guest-introspection with the generic --object, that
> gives the following:
> 
>   qemu --chardev socket,id=chardev0,type=vsock,cid=10,port=1234,nowait \
>        --object introspection chardev=chardev0,allow=all,id=kvmi \
>        --accel kvm,introspection=kvmi
> 
> The policy is specified via kvmi-{allow,deny} parameters and passed to KVM
> via ioctls together with the socket file descriptor.

I understand from this that the policy controls whether a certain VM
can be introspected. I'd imagine it will default to "false" and be set
to "true" whenever an "introspection" object is specified.

> This lets you reuse common POSIX concepts and simplify the kernel code.
> KVMI_EVENT_GUEST_ON is just POLLIN on the server socket (plus handshaking
> on the client socket); KVMI_EVENT_GUEST_OFF is POLLHUP on the client socket.
> There's no need for KVM to know a UUID, as the introspection application
> can just have your usual poll() event loop or thread, and look up the VM
> from the file descriptor.
> 
> QEMU supports socket reconnection, so you don't need KVMI_GET_GUESTS either.
> If KVM cannot write to the socket, it should exit to userspace with a new
> KVM_EXIT_KVMI vmexit (which can have multiple subcodes, one of them being
> KVM_EXIT_KVMI_SOCKET_ERROR).

If I understand all of the above correctly, qemu will initiate the
connection to the introspection tool and, after a handshake, pass the
file descriptor to KVM, so that further communication takes place only
between the tool and the host kernel (no need to pass through host
user space).
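
To make sure I read the event mapping right, here is a minimal sketch of
the tool side in Python, under stated assumptions: the tool listens on a
stream socket (a Unix socket is enough for testing), a new connection is
treated as "guest on", and a hang-up or EOF on an accepted connection is
treated as "guest off". The function name poll_once and the event
strings are hypothetical, and the handshake and the actual protocol
messages are left out.

```python
import select
import socket


def poll_once(poller, server, clients, timeout_ms=1000):
    """Handle one batch of poll() events; return a list of (event, fd)."""
    events = []
    for fd, mask in poller.poll(timeout_ms):
        if fd == server.fileno():
            # POLLIN on the listening socket: a qemu instance connected.
            conn, _ = server.accept()
            clients[conn.fileno()] = conn
            poller.register(conn.fileno(), select.POLLIN)
            events.append(("guest_on", conn.fileno()))
            continue

        conn = clients[fd]
        # POLLHUP and POLLERR are reported even when not requested.
        hung_up = bool(mask & (select.POLLHUP | select.POLLERR))
        if not hung_up and mask & select.POLLIN:
            # Some transports signal a closed peer as EOF rather than
            # POLLHUP; peek so that real protocol data stays queued.
            hung_up = conn.recv(1, socket.MSG_PEEK) == b""
        if hung_up:
            poller.unregister(fd)
            del clients[fd]
            conn.close()
            events.append(("guest_off", fd))
    return events
```

A caller would create the poller with select.poll(), register the
listening socket for POLLIN, and call poll_once() in a loop, looking up
the VM from the returned file descriptor.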

> Of course the link need not even be VSOCK-based.  It can be a Unix socket
> as Stefan has already mentioned, which is always nice when debugging or
> writing unit tests.  I assume you'll want later some VMFUNC-based access
> to the guest's memory; local introspection tools could use an alternative
> way via file descriptor passing, similar to what is used already by vhost-user.
> And dually, a hypothetical vhost-user server living in a VM could use VMFUNC
> to access guest memory without being able to do all the kind of ugly traps
> that your current usecase does.  This is another reason why policy has to
> be in userspace.
> 
> Also, as a matter of fact: this series does not include either documentation
> or unit tests.  That's seriously bad.
> 
> Patch 1 should explain the socket protocol in English and only affect
> Documentation/ and possibly arch/x86/include/uapi.  There's no way that
> I can review 2000 lines of code without even knowing what it is supposed
> to be like.  In fact, for the next RFC, perhaps you should only submit
> patch 1. :)

Noted! Thank you,

-- 
Mihai Donțu