kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Colp <patrick.colp@oracle.com>
To: "Adalbert Lazăr" <alazar@bitdefender.com>, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, "Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Mihai Donțu" <mdontu@bitdefender.com>,
	"Nicușor Cîțu" <ncitu@bitdefender.com>,
	"Mircea Cîrjaliu" <mcirjaliu@bitdefender.com>,
	"Marian Rotariu" <mrotariu@bitdefender.com>
Subject: Re: [RFC PATCH v4 08/18] kvm: add the VM introspection subsystem
Date: Fri, 22 Dec 2017 10:12:35 -0500	[thread overview]
Message-ID: <6c329fc6-4be8-8119-1516-2e41a106662e@oracle.com> (raw)
In-Reply-To: <1513951900.E02F46f7.12019@host>

On 2017-12-22 09:11 AM, Adalbert Laz����r wrote:
> We've made changes in all the places pointed by you, but read below.
> Thanks again,
> Adalbert
> 
> On Fri, 22 Dec 2017 02:34:45 -0500, Patrick Colp <patrick.colp@oracle.com> wrote:
>> On 2017-12-18 02:06 PM, Adalber Lazăr wrote:
>>> From: Adalbert Lazar <alazar@bitdefender.com>
>>>
>>> This subsystem is split into three source files:
>>>    - kvmi_msg.c - ABI and socket related functions
>>>    - kvmi_mem.c - handle map/unmap requests from the introspector
>>>    - kvmi.c - all the other
>>>
>>> The new data used by this subsystem is attached to the 'kvm' and
>>> 'kvm_vcpu' structures as opaque pointers (to 'kvmi' and 'kvmi_vcpu'
>>> structures).
>>>
>>> Besides the KVMI system, this patch exports the
>>> kvm_vcpu_ioctl_x86_get_xsave() and the mm_find_pmd() functions,
>>> adds a new vCPU request (KVM_REQ_INTROSPECTION) and a new VM ioctl
>>> (KVM_INTROSPECTION) used to pass the connection file handle from QEMU.
>>>
>>> Signed-off-by: Mihai Donțu <mdontu@bitdefender.com>
>>> Signed-off-by: Adalbert Lazăr <alazar@bitdefender.com>
>>> Signed-off-by: Nicușor Cîțu <ncitu@bitdefender.com>
>>> Signed-off-by: Mircea Cîrjaliu <mcirjaliu@bitdefender.com>
>>> Signed-off-by: Marian Rotariu <mrotariu@bitdefender.com>
>>> ---
>>>    arch/x86/include/asm/kvm_host.h |    1 +
>>>    arch/x86/kvm/Makefile           |    1 +
>>>    arch/x86/kvm/x86.c              |    4 +-
>>>    include/linux/kvm_host.h        |    4 +
>>>    include/linux/kvmi.h            |   32 +
>>>    include/linux/mm.h              |    3 +
>>>    include/trace/events/kvmi.h     |  174 +++++
>>>    include/uapi/linux/kvm.h        |    8 +
>>>    mm/internal.h                   |    5 -
>>>    virt/kvm/kvmi.c                 | 1410 +++++++++++++++++++++++++++++++++++++++
>>>    virt/kvm/kvmi_int.h             |  121 ++++
>>>    virt/kvm/kvmi_mem.c             |  730 ++++++++++++++++++++
>>>    virt/kvm/kvmi_msg.c             | 1134 +++++++++++++++++++++++++++++++
>>>    13 files changed, 3620 insertions(+), 7 deletions(-)
>>>    create mode 100644 include/linux/kvmi.h
>>>    create mode 100644 include/trace/events/kvmi.h
>>>    create mode 100644 virt/kvm/kvmi.c
>>>    create mode 100644 virt/kvm/kvmi_int.h
>>>    create mode 100644 virt/kvm/kvmi_mem.c
>>>    create mode 100644 virt/kvm/kvmi_msg.c
>>>
>>> +int kvmi_set_mem_access(struct kvm *kvm, u64 gpa, u8 access)
>>> +{
>>> +	struct kvmi_mem_access *m;
>>> +	struct kvmi_mem_access *__m;
>>> +	struct kvmi *ikvm = IKVM(kvm);
>>> +	gfn_t gfn = gpa_to_gfn(gpa);
>>> +
>>> +	if (kvm_is_error_hva(gfn_to_hva_safe(kvm, gfn)))
>>> +		kvm_err("Invalid gpa %llx (or memslot not available yet)", gpa);
>>
>> If there's an error, should this not return or something instead of
>> continuing as if nothing is wrong?
> 
> It was a debug message masqueraded as an error message to be logged in dmesg.
> The page will be tracked when the memslot becomes available.

I began to wonder if that's what was going on afterwards (I saw 
kvm_err() used in some other places where it was more obvious that it 
was debug messages).

>>> +static bool alloc_kvmi(struct kvm *kvm)
>>> +{
>>> +	bool done;
>>> +
>>> +	mutex_lock(&kvm->lock);
>>> +	done = (
>>> +		maybe_delayed_init() == 0    &&
>>> +		IKVM(kvm)            == NULL &&
>>> +		__alloc_kvmi(kvm)    == true
>>> +	);
>>> +	mutex_unlock(&kvm->lock);
>>> +
>>> +	return done;
>>> +}
>>> +
>>> +static void alloc_all_kvmi_vcpu(struct kvm *kvm)
>>> +{
>>> +	struct kvm_vcpu *vcpu;
>>> +	int i;
>>> +
>>> +	mutex_lock(&kvm->lock);
>>> +	kvm_for_each_vcpu(i, vcpu, kvm)
>>> +		if (!IKVM(vcpu))
>>> +			__alloc_vcpu_kvmi(vcpu);
>>> +	mutex_unlock(&kvm->lock);
>>> +}
>>> +
>>> +static bool setup_socket(struct kvm *kvm, struct kvm_introspection *qemu)
>>> +{
>>> +	struct kvmi *ikvm = IKVM(kvm);
>>> +
>>> +	if (is_introspected(ikvm)) {
>>> +		kvm_err("Guest already introspected\n");
>>> +		return false;
>>> +	}
>>> +
>>> +	if (!kvmi_msg_init(ikvm, qemu->fd))
>>> +		return false;
>>
>> kvmi_msg_init assumes that ikvm is not NULL -- it makes no check and
>> then does "WRITE_ONCE(ikvm->sock, sock)". is_introspected() does check
>> if ikvm is NULL, but if it is, it returns false, which would still end
>> up here. There should be a check that ikvm is not NULL before this if
>> statement.
> 
> setup_socket() is called only when 'ikvm' is not NULL.

Ah, right. Forgot to check that :)

> 
> is_introspected() checks 'ikvm' because it is called from other contexts.
> The real check is ikvm->sock (to see if the 'command channel' is 'active').

Yes, that makes more sense.

> 
>>> +
>>> +	ikvm->cmd_allow_mask = -1; /* TODO: qemu->commands; */
>>> +	ikvm->event_allow_mask = -1; /* TODO: qemu->events; */
>>> +
>>> +	alloc_all_kvmi_vcpu(kvm);
>>> +	queue_work(wq, &ikvm->work);
>>> +
>>> +	return true;
>>> +}
>>> +
>>> +/*
>>> + * When called from outside a page fault handler, this call should
>>> + * return ~0ull
>>> + */
>>> +static u64 kvmi_mmu_fault_gla(struct kvm_vcpu *vcpu, gpa_t gpa)
>>> +{
>>> +	u64 gla;
>>> +	u64 gla_val;
>>> +	u64 v;
>>> +
>>> +	if (!vcpu->arch.gpa_available)
>>> +		return ~0ull;
>>> +
>>> +	gla = kvm_mmu_fault_gla(vcpu);
>>> +	if (gla == ~0ull)
>>> +		return gla;
>>> +	gla_val = gla;
>>> +
>>> +	/* Handle the potential overflow by returning ~0ull */
>>> +	if (vcpu->arch.gpa_val > gpa) {
>>> +		v = vcpu->arch.gpa_val - gpa;
>>> +		if (v > gla)
>>> +			gla = ~0ull;
>>> +		else
>>> +			gla -= v;
>>> +	} else {
>>> +		v = gpa - vcpu->arch.gpa_val;
>>> +		if (v > (U64_MAX - gla))
>>> +			gla = ~0ull;
>>> +		else
>>> +			gla += v;
>>> +	}
>>> +
>>> +	return gla;
>>> +}
>>> +
>>> +static bool kvmi_track_preread(struct kvm_vcpu *vcpu, gpa_t gpa,
>>> +			       u8 *new,
>>> +			       int bytes,
>>> +			       struct kvm_page_track_notifier_node *node,
>>> +			       bool *data_ready)
>>> +{
>>> +	u64 gla;
>>> +	struct kvmi_vcpu *ivcpu = IVCPU(vcpu);
>>> +	bool ret = true;
>>> +
>>> +	if (kvm_mmu_nested_guest_page_fault(vcpu))
>>> +		return ret;
>>> +	gla = kvmi_mmu_fault_gla(vcpu, gpa);
>>> +	ret = kvmi_page_fault_event(vcpu, gpa, gla, KVMI_PAGE_ACCESS_R);
>>
>> Should you not check the value of ret here before proceeding?
>>
> 
> Indeed. These 'track' functions are new additions and aren't integrated
> well with kvmi_page_fault_event(). We'll change this. The code is ugly
> but 'safe' (ctx_size will be non-zero only with ret == true).
> 

Ah, yes, I can see that now. Not the most obvious interaction.

>>> +	if (ivcpu && ivcpu->ctx_size > 0) {
>>> +		int s = min_t(int, bytes, ivcpu->ctx_size);
>>> +
>>> +		memcpy(new, ivcpu->ctx_data, s);
>>> +		ivcpu->ctx_size = 0;
>>> +
>>> +		if (*data_ready)
>>> +			kvm_err("Override custom data");
>>> +
>>> +		*data_ready = true;
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +bool kvmi_hook(struct kvm *kvm, struct kvm_introspection *qemu)
>>> +{
>>> +	kvm_info("Hooking vm with fd: %d\n", qemu->fd);
>>> +
>>> +	kvm_page_track_register_notifier(kvm, &kptn_node);
>>> +
>>> +	return (alloc_kvmi(kvm) && setup_socket(kvm, qemu));
>>
>> Is this safe? It could return false if the alloc fails (in which case
>> the caller has to do nothing) or if setting up the socket fails (in
>> which case the caller needs to free the allocated kvmi).
>>
> 
> If the socket fails for any reason (eg. the introspection tool is
> stopped == socket closed) 'the plan' is to signal QEMU to reconnect
> (and call kvmi_hook() again) or else let the introspected VM continue (and
> try to reconnect asynchronously).
> 
> I see that kvm_page_track_register_notifier() should not be called more
> than once.
> 
> Maybe we should rename this to kvmi_rehook() or kvmi_reconnect().

I assume that a kvmi_rehook() function would then not call 
kvm_page_track_register_notifier() or at least have some check to make 
sure it only calls it once?

One approach would be to have separate kvmi_hook() and kvmi_rehook() 
functions. Another possibility is to have kvmi_hook() take an extra 
argument that's a boolean to specify if it's the first attempt at 
hooking or not.

>>> +bool kvmi_breakpoint_event(struct kvm_vcpu *vcpu, u64 gva)
>>> +{
>>> +	u32 action;
>>> +	u64 gpa;
>>> +
>>> +	if (!is_event_enabled(vcpu->kvm, KVMI_EVENT_BREAKPOINT))
>>> +		/* qemu will automatically reinject the breakpoint */
>>> +		return false;
>>> +
>>> +	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
>>> +
>>> +	if (gpa == UNMAPPED_GVA)
>>> +		kvm_err("%s: invalid gva: %llx", __func__, gva);
>>
>> If the gpa is unmapped, shouldn't it return false rather than proceeding?
>>
> 
> This was just a debug message. I'm not sure if is possible for 'gpa'
> to be unmapped. Even so, the introspection tool should still be notified.
> 
>>> +
>>> +	action = kvmi_msg_send_bp(vcpu, gpa);
>>> +
>>> +	switch (action) {
>>> +	case KVMI_EVENT_ACTION_CONTINUE:
>>> +		break;
>>> +	case KVMI_EVENT_ACTION_RETRY:
>>> +		/* rip was most likely adjusted past the INT 3 instruction */
>>> +		return true;
>>> +	default:
>>> +		handle_common_event_actions(vcpu, action);
>>> +	}
>>> +
>>> +	/* qemu will automatically reinject the breakpoint */
>>> +	return false;
>>> +}
>>> +EXPORT_SYMBOL(kvmi_breakpoint_event);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-12-22 15:12 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-18 19:06 [RFC PATCH v4 00/18] VM introspection Adalber Lazăr
2017-12-18 19:06 ` [RFC PATCH v4 01/18] kvm: add documentation and ABI/API headers for the VM introspection subsystem Adalber Lazăr
2017-12-18 19:06 ` [RFC PATCH v4 02/18] add memory map/unmap support for VM introspection on the guest side Adalber Lazăr
2017-12-21 21:17   ` Patrick Colp
2017-12-22 10:44     ` Mircea CIRJALIU-MELIU
2017-12-22 14:30       ` Patrick Colp
2017-12-18 19:06 ` [RFC PATCH v4 03/18] kvm: x86: add kvm_arch_msr_intercept() Adalber Lazăr
2017-12-18 19:06 ` [RFC PATCH v4 04/18] kvm: x86: add kvm_mmu_nested_guest_page_fault() and kvmi_mmu_fault_gla() Adalber Lazăr
2017-12-21 21:29   ` Patrick Colp
2017-12-22 11:50     ` Mihai Donțu
2017-12-18 19:06 ` [RFC PATCH v4 05/18] kvm: x86: add kvm_arch_vcpu_set_regs() Adalber Lazăr
2017-12-21 21:39   ` Patrick Colp
2017-12-22  9:29     ` alazar
2017-12-18 19:06 ` [RFC PATCH v4 06/18] kvm: vmx: export the availability of EPT views Adalber Lazăr
2017-12-18 19:06 ` [RFC PATCH v4 07/18] kvm: page track: add support for preread, prewrite and preexec Adalber Lazăr
2017-12-21 22:01   ` Patrick Colp
2017-12-22 10:01     ` alazar
2017-12-18 19:06 ` [RFC PATCH v4 08/18] kvm: add the VM introspection subsystem Adalber Lazăr
2017-12-22  7:34   ` Patrick Colp
2017-12-22 14:11     ` Adalbert Lazăr
2017-12-22 15:12       ` Patrick Colp [this message]
2017-12-22 15:51         ` alazar
2017-12-22 16:26           ` Patrick Colp
2017-12-22 16:02   ` Paolo Bonzini
2017-12-22 16:18     ` Mircea CIRJALIU-MELIU
2017-12-22 16:35       ` Paolo Bonzini
2017-12-22 16:09   ` Paolo Bonzini
2017-12-22 16:34     ` Mircea CIRJALIU-MELIU
2017-12-18 19:06 ` [RFC PATCH v4 09/18] kvm: hook in " Adalber Lazăr
2017-12-22 16:36   ` Patrick Colp
2017-12-18 19:06 ` [RFC PATCH v4 10/18] kvm: x86: handle the new vCPU request (KVM_REQ_INTROSPECTION) Adalber Lazăr
2017-12-18 19:06 ` [RFC PATCH v4 11/18] kvm: x86: hook in the page tracking Adalber Lazăr
2017-12-18 19:06 ` [RFC PATCH v4 12/18] kvm: x86: hook in kvmi_breakpoint_event() Adalber Lazăr
2017-12-18 19:06 ` [RFC PATCH v4 13/18] kvm: x86: hook in kvmi_descriptor_event() Adalber Lazăr
2017-12-18 19:06 ` [RFC PATCH v4 14/18] kvm: x86: hook in kvmi_cr_event() Adalber Lazăr
2017-12-18 19:06 ` [RFC PATCH v4 15/18] kvm: x86: hook in kvmi_xsetbv_event() Adalber Lazăr
2017-12-18 19:06 ` [RFC PATCH v4 16/18] kvm: x86: hook in kvmi_msr_event() Adalber Lazăr
2017-12-18 19:06 ` [RFC PATCH v4 17/18] kvm: x86: handle the introspection hypercalls Adalber Lazăr
2017-12-18 19:06 ` [RFC PATCH v4 18/18] kvm: x86: hook in kvmi_trap_event() Adalber Lazăr
2018-01-03  3:34 ` [RFC PATCH v4 00/18] VM introspection Xiao Guangrong
2018-01-03 14:32   ` Mihai Donțu
2018-01-03 18:52 ` Adalbert Lazăr

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6c329fc6-4be8-8119-1516-2e41a106662e@oracle.com \
    --to=patrick.colp@oracle.com \
    --cc=alazar@bitdefender.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mcirjaliu@bitdefender.com \
    --cc=mdontu@bitdefender.com \
    --cc=mrotariu@bitdefender.com \
    --cc=ncitu@bitdefender.com \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).