From mboxrd@z Thu Jan  1 00:00:00 1970
From: George Dunlap <George.Dunlap@eu.citrix.com>
Subject: Re: [PATCH for-4.5 v6 06/16] xen: Convert vmware_port
 to xentrace usage
Date: Thu, 25 Sep 2014 16:14:30 +0100
Message-ID: <CAFLBxZaH-iRLyFTN+7na0a6NxFmQsi9hnTgwhdCSf201-D24Pg@mail.gmail.com>
References: <1411236447-7435-1-git-send-email-dslutz@verizon.com>
	<1411236447-7435-7-git-send-email-dslutz@verizon.com>
	<5422FF1D.8000308@eu.citrix.com> <5423167A.1070901@terremark.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <5423167A.1070901@terremark.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Don Slutz <dslutz@verizon.com>
Cc: Tim Deegan <tim@xen.org>, Kevin Tian <kevin.tian@intel.com>, Keir Fraser <keir@xen.org>, Ian Campbell <ian.campbell@citrix.com>, Stefano Stabellini <stefano.stabellini@eu.citrix.com>, Ian Jackson <ian.jackson@eu.citrix.com>, Eddie Dong <eddie.dong@intel.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Jan Beulich <jbeulich@suse.com>, Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>, Jun Nakajima <jun.nakajima@intel.com>, Andrew Cooper <andrew.cooper3@citrix.com>, Boris Ostrovsky <boris.ostrovsky@oracle.com>, Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
List-Id: xen-devel@lists.xenproject.org

On Wed, Sep 24, 2014 at 8:07 PM, Don Slutz <dslutz@verizon.com> wrote:
>
> On 09/24/14 13:27, George Dunlap wrote:
>>
>> On 09/20/2014 07:07 PM, Don Slutz wrote:
>>>
>>> Reduce the VMPORT_DBG_LOG calls.
>>
>>
>> You should also have mentioned that you added hew HVMTRACE macros which
>> will log the TSC value.
>>
>> The reason the HVMTRACE macros don't log the TSC values is that for the
>> most part you can get all the timing information you need from the TSC on
>> the vmexit and vmenter.  Looking at where you've added the TSC values, I
>> don't really see how it adds anything except bloat to the log.  Is there a
>> reason you need to know exactly when these different things happened,
>> instead of just being able to bracket them between VMENTER and VMEXITs?
>>
>
> I did want a way to know how long the VMware code was taking.  I am
> not sure this is required.
>
> For example:
>
> CPU1  2899550319282 (+    4170)  VMEXIT      [ exitcode = 0x00000000, rIP  =
> 0x00007fad13ffec8c ]
> CPU1  2899550320086 (+     804)  TRAP_GP     [ inst_len = 1 edx = 0x00005658
> exitinfo1 = 0x0000000000000000 exitinfo2 = 0x0000000000000000 ]
> CPU1  2899550325054 (+    4968)  VMPORT_READ_BEFORE  [ eax = 0x564d5868 ebx
> = 0x000001b3 ecx = 0x0003001e edx = 0x00005658 esi = 0x00000000 edi =
> 0x000001b3 ]
> CPU1  2899550326050 (+     996)  VMPORT_READ_AFTER  [ eax = 0x564d5868 ebx =
> 0x000001b3 ecx = 0x0001001e edx = 0x00005658 esi = 0x00000000 edi =
> 0x000001b3 ]
> CPU1  2899550326722 (+     672)  vlapic_accept_pic_intr [ i8259_target = 1,
> accept_pic_int = 0 ]
> CPU1  2899550327454 (+     732)  VMENTRY
>
>
> But I am happy to drop the new log TSC macros.

The tracing function itself is not free -- the trace_var() function
probably executes 5x the amount of code that is actually executed
between the READ_BEFORE and READ_AFTER (given it's a switch statement
and each one is basically a handful of variable assignments).  It's
not unlikely that most of the 996 us there is in the trace function
itself.

>>> diff --git a/xen/arch/x86/hvm/vmware/vmport.c
>>> b/xen/arch/x86/hvm/vmware/vmport.c
>>> index 811c303..962ee32 100644
>>> --- a/xen/arch/x86/hvm/vmware/vmport.c
>>> +++ b/xen/arch/x86/hvm/vmware/vmport.c
>
>> We normally don't log both BEFORE and AFTER states of things like
>> hypercalls -- just logging the outcome of what the hypervisor did should be
>> sufficient, shouldn't it?
>
>
> Not that clear with this poorly build hyper-call.
>
>> Do you really need to know the value of things that got clobbered?
>
>
> When checking on the complex state machine that is the "RPC" code
> it was very helpful.  With that code moving to QEMU, the before and
> after is not so important.
>
> Here is a real example:
>
> CPU2  865821836508576 (+    2562)  VMEXIT      [ exitcode = 0x00000000, rIP
> = 0x00007f68a8b17c8c ]
> CPU2  865821836509362 (+     786)  TRAP_GP     [ inst_len = 1 edx =
> 0x00025658 exitinfo1 = 0x0000000000000000 exitinfo2 = 0x0000000000000000 ]
> CPU2  865821836514132 (+    4770)  VMPORT_READ_BEFORE  [ eax = 0x564d5868
> ebx = 0x00000034 ecx = 0x0002001e edx = 0x00025658 esi = 0x00000000 edi =
> 0x000001be ]
> CPU2  865821836597832 (+   83700)  VMPORT_READ_AFTER  [ eax = 0x564d5868 ebx
> = 0x00000034 ecx = 0x0001001e edx = 0x00025658 esi = 0x00000000 edi =
> 0x000001be ]
> CPU2  865821836598756 (+     924)  vlapic_accept_pic_intr [ i8259_target =
> 1, accept_pic_int = 0 ]
> CPU2  865821836605602 (+    6846)  vlapic_accept_pic_intr [ i8259_target =
> 1, accept_pic_int = 0 ]
> CPU2  865821836606436 (+     834)  VMENTRY
> CPU2  865821836609712 (+    3276)  VMEXIT      [ exitcode = 0x00000000, rIP
> = 0x00007f68a8b17c8c ]
> CPU2  865821836610654 (+     942)  TRAP_GP     [ inst_len = 1 edx =
> 0x00025658 exitinfo1 = 0x0000000000000000 exitinfo2 = 0x0000000000000000 ]
> CPU2  865821836616828 (+    6174)  VMPORT_READ_BEFORE  [ eax = 0x564d5868
> ebx = 0x00000034 ecx = 0x0003001e edx = 0x00025658 esi = 0x00000000 edi =
> 0x000001be ]
> CPU2  865821836617800 (+     972)  VMPORT_READ_AFTER  [ eax = 0x564d5868 ebx
> = 0x00000011 ecx = 0x0003001e edx = 0x00015658 esi = 0x00000000 edi =
> 0x000001be ]
> CPU2  865821836618664 (+     864)  vlapic_accept_pic_intr [ i8259_target =
> 1, accept_pic_int = 0 ]
> CPU2  865821836619444 (+     780)  VMENTRY
>
> Note that in the one "RPC" call,
>
>
> grep VMPORT_READ_BEFORE ~/zz-xentrace-vmware3-0.out | wc
>    1592   39800  256312
>
> It took 1592 #GP traps to handle it, and 9643628760 tsc cycles.

Right, so what I started to say yesterday: It looks like most of the
trace points you're adding here is to help you debug the functionality
of the hypervisor.  That certainly makes sense for you to do during
development.  But what we want in upstream is something that will help
us in production.  For that to be useful, we need the logging to be as
efficient as possible. Every additional HVM trace point potentially
adds more data to someone else's HVM trace. So we don't want
extraneous information, and we don't want to log something that we can
infer from something else.

In general, it's enough to give information about the decisions that
Xen is making to infer what the previous state is; and then giving
information about what Xen did in response (i.e., return values,
injection of traps, &c) to help figure out how the guest responded.
In this case, I'd probably trace:
1. vmport hypercall, handled command
 - cmd
 - return values of modified registers.  Ideally only the registers
that are modified, but just taking a big batch would be OK for now.
 Note: No need for rip here, as it will be collected at the VMEXIT
2. vmport hypercall, unhandled command
 - Just the unimplemented fail
3. In the vmport_gp_check(), if it successfully decodes an IO instruction:
 - direction (read/write)
 - size of the access

I might consider logging something the failure path of
*_vmexit_gp_intercept(), with information that might help you figure
out why it didn't make it to the vmcall; but on the whole I think I'd
probably leave that off.

Hopefully all that would give you enough information to figure out
where the problem was and how to reproduce the behavior locally; and
once you can reproduce it locally, you could add in debugging traces
(which wouldn't be upstreamed) to help you figure out why it wasn't
taking the path you expected.

Does that make sense?

If you really want more traces, I might consider allowing them in the
code but off by default; but I think you probably won't need more
information from running production systems.

 -George