* [PATCH for-4.5 v7 0/7]  Xen VMware tools support
@ 2014-10-02 21:30 Don Slutz
  2014-10-02 21:30 ` [PATCH for-4.5 v7 1/7] xen: Add support for VMware cpuid leaves Don Slutz
                   ` (7 more replies)
  0 siblings, 8 replies; 37+ messages in thread
From: Don Slutz @ 2014-10-02 21:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Eddie Dong, Ian Jackson, Don Slutz, Tim Deegan,
	George Dunlap, Aravind Gopalakrishnan, Jan Beulich,
	Andrew Cooper, Boris Ostrovsky, Suravee Suthikulpanit

Changes v6 to v7:
  summary of changes.

  George Dunlap:
    Any doc about this?
      Added reference to:
        https://sites.google.com/site/chitchatvmback/backdoor
      Last updated: Feb. 2008

  George Dunlap & Jan Beulich
    Too much logging and tracing.
      Dropped a lot of it.  This includes vmport_debug=

  Ian Campbell:
    Any reason RPC code cannot be done in QEMU?
      Not that I know of, so dropped all parts of RPC code.
    Default handling of hvm.vga.kind bad.
      Fixed.
    Default of vmware_port should be based on vmware_hw.
      Done. 

  Tim Deegan:
    CPL check of GETHZ needs to be fixed somewhere.
      Added check for CPL == 0 (assuming this is what VMware is
      checking).  Matches the testing.

  Ian Campbell, Andrew Cooper, George Dunlap, Boris Ostrovsky,
   & Jan Beulich
     Various minor fixes.
    
  Per patch notes:
    #1 "xen: Add support for VMware cpuid leaves":
      Prevent setting of HVM_PARAM_VIRIDIAN if HVM_PARAM_VMWARE_HW set.
    #4 "xen: Add vmware_port support":
      More on AMD in the commit message.
      Switch to changing only the 32-bit part of registers, which is
        what VMware does.
    #6 "Add xentrace to vmware_port":
      Dropped some of the new traces.
      Added HVMTRACE_ND7.
    #7 "Add xen-hvm-param":
       Was a later patch.  Still optional.
       Fixed formatting.
       Adjust for drop of VMware RPC.

Comments on v3, v4, v5, v6:
  George Dunlap:
    Is there any reason not to merge 05/16 with 03/16?
      The reason I have is that v3 03/16 only contains new files. 2
      from VMware and 1 to allow use of the VMware files.  I added
      xen/arch/x86/hvm/vmware/includeCheck.h at the request of
      Konrad Wilk.

      This patch has many style issues and white space issues.  So I
      want it as a separate patch so as to be clear on what files do
      not meet the coding style.  And why and where they came from.

Changes v5 to v6:
  Boris Ostrovsky & Jan Beulich
    #4 "xen: Add vmware_port support":
    #6 "xen: Convert vmware_port to xentrace usage":
    There is an issue with reading instruction bytes more than once.
      Dropped the attempt to use svm_nextrip_insn_length via
      __get_instruction_length (added in v2).  Just always look
      at up to 15 bytes on AMD.

Changes v4 to v5:
  Re tagged the optional patches.

  Added debug=y build checking that vmx is defining
  VM_EXIT_INTR_ERROR_CODE.

  Boris Ostrovsky:
    #1 "xen: Add support for VMware cpuid leaves":
      Given how is_viridian and is_vmware are defined I think '||' is more
      appropriate.
        Fixed.
    #4 "xen: Add is_vmware_port_enabled":
      we should make sure that svm_vmexit_gp_intercept is not executed for
      any other guest.
        Added an ASSERT on is_vmware_port_enabled.
      magic integers?
        Added #define for them.
    #6 "xen: Convert vmware_port to xentrace usage":
      exitinfo1 is used twice.
        Fixed.
    #7 "tools: Convert vmware_port to xentrace usage":
      'bytes = 0x%(2)d' or 'bytes = %(2)d' ?
        Fixed.
    #8 "xen: Add limited support of VMware's hyper-call rpc":
      PV vs. HVM vs. PVH. So probably 'if(is_hvm_vcpu)'?
        I see no reason to exclude PVH.   Will change to has_hvm_container_vcpu
    #11 "Add live migration of VMware's hyper-call":
      You ASSERTed that vg->key_len is 1 so you may not need the 'if'.
        That is an ASSERT(sizeof, not just an ASSERT -- not changed.
      Use real errno, not -1.
        Fixed.
      No ASSERT in vmport_load_domain_ctxt
        Added.

  Jan Beulich & Boris Ostrovsky:
    #8 "xen: Add limited support of VMware's hyper-call rpc":
      The names of all three functions are bogus.
        removed static support routines.
        Also changed in #1.

  Andrew Cooper:
    #2 "tools: Add vmware_hw support":
      Anything looking for Xen according to the Xen cpuid instructions...
        Adjusted doc to new wording.
    #4 "xen: Add is_vmware_port_enabled":
      I am fairly certain that you need some brackets here.
        Added brackets.

  Jan Beulich & Andrew Cooper:
    #1 "xen: Add support for VMware cpuid leaves":
      This hunk is unrelated, but is perhaps something better fixed.
        Added to commit message.
      include <xen/types.h> (IIRC) please.
        Done.
      At least 1 pair of brackets please, especially as the placement of
      brackets affects the result of this particular calculation.
        Switch to "1000000ull / APIC_BUS_CYCLE_NS"      


Changes v3 to v4:
  Ian Campbell:
    Report on both viridian and vmware_hw set.
    Added LIBXL_VGA_INTERFACE_TYPE_VMWARE (vga=vmware).

  Andrew Cooper:
    Add doc for hypervisor-cpuid.

  Boris Ostrovsky:
    Changing regs->error_code may not be a good idea.
      Dropped this.
    
  Jan Beulich & Boris Ostrovsky:
    Only enable vmexit for GP when vmware_port is set.
      Done.


Changes v2 to v3:

  Add optional unit test tools.
  Re-worked split of changes.

  Jan Beulich:
    for #0:
      I don't think you should be adding a new file in hvm/ _and_ a new
      subdirectory.
        Moved all files to hvm/vmware that contain code.
    for old #1 (now #1 & #2):
      Is there really a point in enabling both Viridian and VMware extensions?
        I still think so.
      hvmloader change: This needs an explanation
        Dropped as not needed now.
      Can you make vmware_hw similar to Viridian, returning success when
      setting the value to what it already is.
        Done.
      You don't seem to be using sub_idx: ...
        Dropped.
      Extra changes...
        Dropped.
    for old #2 (now #3):
      ... these guards have the (theoretical at this point) risk of clashing
      ... the patch is obviously incomplete without this header...
        Did not fix any of these issues.  I will stick with this needs
        to be a 2nd patch that changes the include files to better fit
        in Xen coding.  For now these files are in a sub directory
        which is not part of the normal include search.
        Moved the includeCheck.h file into this patch.
    for old #3 (now #4, #5, #6, #7, #8, #9, #10, #11)
      As I think was said on v1 already - this should be split into smaller
      pieces ...
        Done.
      All this would very likely better go into a separate function placed in
      vmport.c.
        Moved most of the code into vmport.c or vmport_rpc.c.
      In any event I'm rather uncomfortable about vmware_port getting
      enabled unconditionally, ...
        Added vmware_port (done in new patches #4, #5) as an xl.cfg
        option.
      You'll have to go through and fix coding style issues.
        I think I have found all these, but since they do not stand out
        for me, let me know of any left.
      "MAKE_INSTR(IN," name is ambiguous.
        Added all 4 opcodes for in and out that can access this port: INB_DX,
        INL_DX, OUTB_DX, OUTL_DX.
      A VMX-specific function shouldn't be named this way...
        Added new common routine vmport_gp_check() that is called from
        both vmx.c and svm.c which is where all the logic about checking
        for IN and OUT is done.
        Also fixed naming and added static.
      Ah, here we go (as to using HVM_DBG_LOG()): Isn't this _way_ too
      fine grained?
        I have reduced the number of bits used.  Partially by switching
        some to xentrace (new patch #6 and #7).
      Right, and zero is an indication that it wasn't found. Also I just
      noticed there's a gdprintk() in that event, which for all other ...
        Made the gdprintk() optional.

End of v3 changes.

This is a small part of the changes needed to allow running Linux
and Windows (and other) guests that were built on VMware
unchanged on Xen.

This small part is the start of Xen support for the VMware backdoor I/O
port, which is how VMware tools (a standard addition installed on a
guest) communicate with the hypervisor.

I picked this subset to start with because it only has changes in
Xen.

Some of this code is already in QEMU and so KVM has some of this
already.  QEMU supported backdoor commands include VMware mouse
support.  A later patch set exists that links these changes, new
code and Xen changes to QEMU to provide VMware mouse support under
Xen.  The important part is that VMware mouse is an absolute
position mouse and so network delays do not affect usage of the
virtual mouse.

For example from the guest:

[root@C63-min-tools ~]# vmtoolsd --cmd "info-get guestinfo.joejoel"
No value found
[root@C63-min-tools ~]# vmtoolsd --cmd "info-set guestinfo.joejoel short"

[root@C63-min-tools ~]# vmtoolsd --cmd "info-get guestinfo.joejoel"
short
[root@C63-min-tools ~]# vmtoolsd --cmd "info-set guestinfo.joejoel long222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000joel"

[root@C63-min-tools ~]# vmtoolsd --cmd "info-get guestinfo.key1"
data1
[root@C63-min-tools ~]# vmtoolsd --cmd "info-get guestinfo.key2"
No value found
[root@C63-min-tools ~]# vmtoolsd --cmd "info-get guestinfo.key2"
data2
[root@C63-min-tools ~]# 


Most of this code has been reverse engineered by looking at
source code for Linux and open VMware tools.

http://open-vm-tools.sourceforge.net


changes RFC to v2:

Jan Beulich:
  Add xen/arch/x86/hvm/vmware.c for cpuid_vmware_leaves
  Fewer patches

Andrew Cooper:
  use the proper constant for apic_khz
  Follow 839b966e3f587bbb1a0d954230fb3904330dccb6 style changes.
  Changed HVM_PARAM_VMWARE_HW to write once (make is_vmware_domain()
    more static).
  Dropped vmport status stuff.
  Added checks for xzalloc() having failed.
  You should include backdoor_def.h ...
     Everything I tried did not work better.  So I did not
     change VMPORT_PORT and BDOOR_PORT being the same value.
     I did not try to adjust VMware's include file backdoor_def.h
     to work in other xen source files.
  Switching to s_time_t is not valid. get_sec() is defined:
    unsigned long get_sec(void);
  and so my uses of it should be using unsigned long.  However
  since that is not a fixed width type, I used the uint64_t
  data type which is almost the same, but does allow the 32 bit
  build of libxc, libxl to do the correct thing.


Konrad Rzeszutek Wilk:
  Please don't include the address. It should be, etc
      about the Vmware provided include files.
    I went with no changes to these files.  Even if the files should
    be changed to match xen coding style, etc I still feel that the
    original ones should be added via a patch, and then adjusted in a
    2nd patch.
  Can you use XenBus?
    I would say no.  XenBus (and XenStore) is about domain to domain
    communication.  This is about VMware's hyper-call and providing
    very low speed access to VMware's guest info.

Olaf Hering:
   Dropped changing of bios-strings.  Still needs some documentation
   noting that this may need to be done in a tool stack or by a set
   of commands.


Boris Ostrovsky:
  Use svm_nextrip_insn_length()
    Looks like __get_instruction_length() does this, so switched to
    __get_instruction_length().
 
RFC:

See

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458

for info on detecting VMware.

Linux does not follow this exactly.  It checks for CPUID 1st.  If
that fails, it checks for SMBIOS containing "VMware" (not VMware- or
VMW).

So this patch set provides:

        SMBIOS -- Add string VMware-
        CPUID -- Add VMware's CPUID (Note: currently HyperV (viridian support) breaks this check.)
        Add the magic VMware port
            Allow VMware tools poweroff and reboot
            Enable access to VMware's guest info
            Provide the VMware tools build number


Don Slutz (7):
  xen: Add support for VMware cpuid leaves
  tools: Add vmware_hw support
  vmware: Add VMware provided include files.
  xen: Add vmware_port support
  tools: Add vmware_port support
  Add xentrace to vmware_port
  Add xen-hvm-param

 .gitignore                             |   1 +
 docs/man/xl.cfg.pod.5                  |  36 +++-
 docs/misc/hypervisor-cpuid.markdown    |  30 ++++
 tools/libxc/xc_domain_restore.c        |  14 ++
 tools/libxc/xc_domain_save.c           |  11 ++
 tools/libxc/xg_save_restore.h          |   2 +
 tools/libxl/libxl.h                    |  15 ++
 tools/libxl/libxl_create.c             |  21 ++-
 tools/libxl/libxl_dm.c                 |  12 +-
 tools/libxl/libxl_dom.c                |   2 +
 tools/libxl/libxl_internal.h           |   3 +-
 tools/libxl/libxl_types.idl            |   3 +
 tools/libxl/xl_cmdimpl.c               |   7 +-
 tools/misc/Makefile                    |   7 +-
 tools/misc/xen-hvm-param.c             | 169 +++++++++++++++++++
 tools/xentrace/formats                 |   5 +
 xen/arch/x86/domain.c                  |   2 +
 xen/arch/x86/hvm/Makefile              |   3 +-
 xen/arch/x86/hvm/hvm.c                 |  36 ++++
 xen/arch/x86/hvm/svm/emulate.c         |   2 +-
 xen/arch/x86/hvm/svm/svm.c             |  30 ++++
 xen/arch/x86/hvm/svm/vmcb.c            |   2 +
 xen/arch/x86/hvm/vmware/Makefile       |   2 +
 xen/arch/x86/hvm/vmware/backdoor_def.h | 167 +++++++++++++++++++
 xen/arch/x86/hvm/vmware/cpuid.c        |  89 ++++++++++
 xen/arch/x86/hvm/vmware/includeCheck.h |  17 ++
 xen/arch/x86/hvm/vmware/vmport.c       | 296 +++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmcs.c            |   2 +
 xen/arch/x86/hvm/vmx/vmx.c             |  63 ++++++-
 xen/arch/x86/hvm/vmx/vvmx.c            |   3 +
 xen/arch/x86/traps.c                   |   8 +-
 xen/common/domctl.c                    |   3 +
 xen/include/asm-x86/hvm/domain.h       |   3 +
 xen/include/asm-x86/hvm/hvm.h          |   3 +
 xen/include/asm-x86/hvm/io.h           |   2 +-
 xen/include/asm-x86/hvm/svm/emulate.h  |   1 +
 xen/include/asm-x86/hvm/trace.h        |  22 +++
 xen/include/asm-x86/hvm/vmport.h       |  52 ++++++
 xen/include/asm-x86/hvm/vmware.h       |  33 ++++
 xen/include/public/domctl.h            |   3 +
 xen/include/public/hvm/params.h        |   5 +-
 xen/include/public/trace.h             |   3 +
 xen/include/xen/sched.h                |   3 +
 43 files changed, 1172 insertions(+), 21 deletions(-)
 create mode 100644 docs/misc/hypervisor-cpuid.markdown
 create mode 100644 tools/misc/xen-hvm-param.c
 create mode 100644 xen/arch/x86/hvm/vmware/Makefile
 create mode 100644 xen/arch/x86/hvm/vmware/backdoor_def.h
 create mode 100644 xen/arch/x86/hvm/vmware/cpuid.c
 create mode 100644 xen/arch/x86/hvm/vmware/includeCheck.h
 create mode 100644 xen/arch/x86/hvm/vmware/vmport.c
 create mode 100644 xen/include/asm-x86/hvm/vmport.h
 create mode 100644 xen/include/asm-x86/hvm/vmware.h

-- 
1.8.4


* [PATCH for-4.5 v7 1/7] xen: Add support for VMware cpuid leaves
  2014-10-02 21:30 [PATCH for-4.5 v7 0/7] Xen VMware tools support Don Slutz
@ 2014-10-02 21:30 ` Don Slutz
  2015-01-15 16:42   ` Jan Beulich
  2014-10-02 21:30 ` [PATCH for-4.5 v7 2/7] tools: Add vmware_hw support Don Slutz
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 37+ messages in thread
From: Don Slutz @ 2014-10-02 21:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Eddie Dong, Ian Jackson, Don Slutz, Tim Deegan,
	George Dunlap, Aravind Gopalakrishnan, Jan Beulich,
	Andrew Cooper, Boris Ostrovsky, Suravee Suthikulpanit

This is done by adding HVM_PARAM_VMWARE_HW. It is set to the VMware
virtual hardware version.

Currently 0, 3-4, 6-11 are good values.  However the
code only checks for == 0 or != 0 or >= 7.

If non-zero then
  Return VMware's cpuid leaves.  If >= 7 return data, else
  return 0.

The support of hypervisor cpuid leaves has not been agreed to.

Microsoft Hyper-V (AKA viridian) currently must be at 0x40000000.

VMware currently must be at 0x40000000.

KVM currently must be at 0x40000000 (from Seabios).

Xen can be found at the first otherwise unused 0x100 aligned
offset between 0x40000000 and 0x40010000.

http://download.microsoft.com/download/F/B/0/FB0D01A3-8E3A-4F5F-AA59-08C8026D3B8A/requirements-for-implementing-microsoft-hypervisor-interface.docx

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458

http://lwn.net/Articles/301888/
  Attempted to get this cleaned up.

So based on this, I picked the order:

Xen at 0x40000000 or
Viridian or VMware at 0x40000000 and Xen at 0x40000100

If both Viridian and VMware selected, report an error.

Since I need to change xen/arch/x86/hvm/Makefile, also add
a newline at end of file.

Signed-off-by: Don Slutz <dslutz@verizon.com>
---
v7:
      Prevent setting of HVM_PARAM_VIRIDIAN if HVM_PARAM_VMWARE_HW set.
v5:
      Given how is_viridian and is_vmware are defined I think '||' is more
      appropriate.
        Fixed.
      The names of all three functions are bogus.
        removed static support routines.
      This hunk is unrelated, but is perhaps something better fixed.
        Added to commit message.
      include <xen/types.h> (IIRC) please.
        Done.
      At least 1 pair of brackets please, especially as the placement of
      brackets affects the result of this particular calculation.
        Switch to "1000000ull / APIC_BUS_CYCLE_NS"      

 xen/arch/x86/hvm/Makefile        |  3 +-
 xen/arch/x86/hvm/hvm.c           | 32 +++++++++++++++
 xen/arch/x86/hvm/vmware/Makefile |  1 +
 xen/arch/x86/hvm/vmware/cpuid.c  | 89 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c             |  8 +++-
 xen/include/asm-x86/hvm/hvm.h    |  3 ++
 xen/include/asm-x86/hvm/vmware.h | 33 +++++++++++++++
 xen/include/public/hvm/params.h  |  5 ++-
 8 files changed, 170 insertions(+), 4 deletions(-)
 create mode 100644 xen/arch/x86/hvm/vmware/Makefile
 create mode 100644 xen/arch/x86/hvm/vmware/cpuid.c
 create mode 100644 xen/include/asm-x86/hvm/vmware.h

diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index eea5555..77598a6 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -1,5 +1,6 @@
 subdir-y += svm
 subdir-y += vmx
+subdir-y += vmware
 
 obj-y += asid.o
 obj-y += emulate.o
@@ -22,4 +23,4 @@ obj-y += vlapic.o
 obj-y += vmsi.o
 obj-y += vpic.o
 obj-y += vpt.o
-obj-y += vpmu.o
\ No newline at end of file
+obj-y += vpmu.o
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 681ae5c..4039061 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -60,6 +60,7 @@
 #include <asm/hvm/cacheattr.h>
 #include <asm/hvm/trace.h>
 #include <asm/hvm/nestedhvm.h>
+#include <asm/hvm/vmware.h>
 #include <asm/mtrr.h>
 #include <asm/apic.h>
 #include <public/sched.h>
@@ -4203,6 +4204,9 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
     if ( cpuid_viridian_leaves(input, eax, ebx, ecx, edx) )
         return;
 
+    if ( cpuid_vmware_leaves(input, eax, ebx, ecx, edx) )
+        return;
+
     if ( cpuid_hypervisor_leaves(input, count, eax, ebx, ecx, edx) )
         return;
 
@@ -5536,6 +5540,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( curr_d == d )
                     break;
 
+                if ( d->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW] )
+                {
+                    rc = -EXDEV;
+                    break;
+                }
                 if ( a.value != d->arch.hvm_domain.params[a.index] )
                 {
                     rc = -EEXIST;
@@ -5684,6 +5693,29 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
                 break;
             }
+            case HVM_PARAM_VMWARE_HW:
+                /*
+                 * This should only ever be set non-zero one time by
+                 * the tools and is read only by the guest.
+                 */
+                if ( curr_d == d )
+                {
+                    rc = -EPERM;
+                    break;
+                }
+                if ( d->arch.hvm_domain.params[HVM_PARAM_VIRIDIAN] )
+                {
+                    rc = -EXDEV;
+                    break;
+                }
+                if ( d->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW] &&
+                     d->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW] !=
+                     a.value )
+                {
+                    rc = -EEXIST;
+                    break;
+                }
+                break;
             }
 
             if ( rc == 0 ) 
diff --git a/xen/arch/x86/hvm/vmware/Makefile b/xen/arch/x86/hvm/vmware/Makefile
new file mode 100644
index 0000000..3fb2e0b
--- /dev/null
+++ b/xen/arch/x86/hvm/vmware/Makefile
@@ -0,0 +1 @@
+obj-y += cpuid.o
diff --git a/xen/arch/x86/hvm/vmware/cpuid.c b/xen/arch/x86/hvm/vmware/cpuid.c
new file mode 100644
index 0000000..29f6213
--- /dev/null
+++ b/xen/arch/x86/hvm/vmware/cpuid.c
@@ -0,0 +1,89 @@
+/*
+ * arch/x86/hvm/vmware/cpuid.c
+ *
+ * Copyright (C) 2012 Verizon Corporation
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License Version 2 (GPLv2)
+ * as published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details. <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/sched.h>
+
+#include <asm/hvm/hvm.h>
+#include <asm/hvm/vmware.h>
+
+/*
+ * VMware hardware version 7 defines some of these cpuid levels,
+ * below is a brief description about those.
+ *
+ *     Leaf 0x40000000, Hypervisor CPUID information
+ * # EAX: The maximum input value for hypervisor CPUID info (0x40000010).
+ * # EBX, ECX, EDX: Hypervisor vendor ID signature. E.g. "VMwareVMware"
+ *
+ *     Leaf 0x40000010, Timing information.
+ * # EAX: (Virtual) TSC frequency in kHz.
+ * # EBX: (Virtual) Bus (local apic timer) frequency in kHz.
+ * # ECX, EDX: RESERVED
+ */
+
+int cpuid_vmware_leaves(uint32_t idx, uint32_t *eax, uint32_t *ebx,
+                        uint32_t *ecx, uint32_t *edx)
+{
+    struct domain *d = current->domain;
+
+    if ( !is_vmware_domain(d) )
+        return 0;
+
+    switch ( idx - 0x40000000 )
+    {
+    case 0x0:
+        if ( d->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW] >= 7 )
+        {
+            *eax = 0x40000010;  /* Largest leaf */
+            *ebx = 0x61774d56;  /* "VMwa" */
+            *ecx = 0x4d566572;  /* "reVM" */
+            *edx = 0x65726177;  /* "ware" */
+            break;
+        }
+        /* fallthrough */
+    case 0x10:
+        if ( d->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW] >= 7 )
+        {
+            /* (Virtual) TSC frequency in kHz. */
+            *eax =  d->arch.tsc_khz;
+            /* (Virtual) Bus (local apic timer) frequency in kHz. */
+            *ebx = 1000000ull / APIC_BUS_CYCLE_NS;
+            *ecx = 0;          /* Reserved */
+            *edx = 0;          /* Reserved */
+            break;
+        }
+        /* fallthrough */
+    case 0x1 ... 0xf:
+        *eax = 0;          /* Reserved */
+        *ebx = 0;          /* Reserved */
+        *ecx = 0;          /* Reserved */
+        *edx = 0;          /* Reserved */
+        break;
+
+    default:
+        return 0;
+    }
+
+    return 1;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 10fc2ca..90542f9 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -685,8 +685,12 @@ int cpuid_hypervisor_leaves( uint32_t idx, uint32_t sub_idx,
                uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
 {
     struct domain *d = current->domain;
-    /* Optionally shift out of the way of Viridian architectural leaves. */
-    uint32_t base = is_viridian_domain(d) ? 0x40000100 : 0x40000000;
+    /*
+     * Optionally shift out of the way of Viridian or VMware
+     * architectural leaves.
+     */
+    uint32_t base = is_viridian_domain(d) || is_vmware_domain(d) ?
+        0x40000100 : 0x40000000;
     uint32_t limit, dummy;
 
     idx -= base;
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 0d94c48..0910147 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -346,6 +346,9 @@ static inline unsigned long hvm_get_shadow_gs_base(struct vcpu *v)
 #define is_viridian_domain(d) \
     (is_hvm_domain(d) && (viridian_feature_mask(d) & HVMPV_base_freq))
 
+#define is_vmware_domain(_d)                                             \
+ (is_hvm_domain(_d) && ((_d)->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW]))
+
 void hvm_hypervisor_cpuid_leaf(uint32_t sub_idx,
                                uint32_t *eax, uint32_t *ebx,
                                uint32_t *ecx, uint32_t *edx);
diff --git a/xen/include/asm-x86/hvm/vmware.h b/xen/include/asm-x86/hvm/vmware.h
new file mode 100644
index 0000000..8390173
--- /dev/null
+++ b/xen/include/asm-x86/hvm/vmware.h
@@ -0,0 +1,33 @@
+/*
+ * asm-x86/hvm/vmware.h
+ *
+ * Copyright (C) 2012 Verizon Corporation
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License Version 2 (GPLv2)
+ * as published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details. <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef ASM_X86_HVM_VMWARE_H__
+#define ASM_X86_HVM_VMWARE_H__
+
+#include <xen/types.h>
+
+int cpuid_vmware_leaves(uint32_t idx, uint32_t *eax, uint32_t *ebx,
+                        uint32_t *ecx, uint32_t *edx);
+
+#endif /* ASM_X86_HVM_VMWARE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 3c51072..c893dc5 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -189,6 +189,9 @@
 /* Location of the VM Generation ID in guest physical address space. */
 #define HVM_PARAM_VM_GENERATION_ID_ADDR 34
 
-#define HVM_NR_PARAMS          35
+/* Params for VMware */
+#define HVM_PARAM_VMWARE_HW                 35
+
+#define HVM_NR_PARAMS          36
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
-- 
1.8.4


* [PATCH for-4.5 v7 2/7] tools: Add vmware_hw support
  2014-10-02 21:30 [PATCH for-4.5 v7 0/7] Xen VMware tools support Don Slutz
  2014-10-02 21:30 ` [PATCH for-4.5 v7 1/7] xen: Add support for VMware cpuid leaves Don Slutz
@ 2014-10-02 21:30 ` Don Slutz
  2014-10-02 22:21   ` Andrew Cooper
  2014-10-02 21:30 ` [PATCH for-4.5 v7 3/7] vmware: Add VMware provided include files Don Slutz
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 37+ messages in thread
From: Don Slutz @ 2014-10-02 21:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Eddie Dong, Ian Jackson, Don Slutz, Tim Deegan,
	George Dunlap, Aravind Gopalakrishnan, Jan Beulich,
	Andrew Cooper, Boris Ostrovsky, Suravee Suthikulpanit

This is used to set HVM_PARAM_VMWARE_HW. It is set to the VMware
virtual hardware version.

Currently 0, 3-4, 6-11 are good values.  However the code only
checks for == 0 or != 0.

If non-zero then
  default VGA to VMware's VGA.

Also now allows vga=vmware

Signed-off-by: Don Slutz <dslutz@verizon.com>
---
v7:
    Default handling of hvm.vga.kind bad.
      Fixed.
    Default of vmware_port should be based on vmware_hw.
      Done. 

v5:
      Anything looking for Xen according to the Xen cpuid instructions...
        Adjusted doc to new wording.

 docs/man/xl.cfg.pod.5               | 25 +++++++++++++++++++++++--
 docs/misc/hypervisor-cpuid.markdown | 30 ++++++++++++++++++++++++++++++
 tools/libxc/xc_domain_restore.c     | 14 ++++++++++++++
 tools/libxc/xc_domain_save.c        | 11 +++++++++++
 tools/libxc/xg_save_restore.h       |  2 ++
 tools/libxl/libxl.h                 | 10 ++++++++++
 tools/libxl/libxl_create.c          | 12 +++++++++---
 tools/libxl/libxl_dm.c              |  8 ++++++++
 tools/libxl/libxl_dom.c             |  2 ++
 tools/libxl/libxl_types.idl         |  2 ++
 tools/libxl/xl_cmdimpl.c            |  6 +++++-
 11 files changed, 116 insertions(+), 6 deletions(-)
 create mode 100644 docs/misc/hypervisor-cpuid.markdown

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 8bba21c..6628cfc 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -1197,6 +1197,23 @@ The viridian option can be specified as a boolean. A value of true (1)
 is equivalent to the list [ "defaults" ], and a value of false (0) is
 equivalent to an empty list.
 
+=item B<vmware_hw=NUMBER>
+
+Turns on or off the exposure of VMware cpuid.  The number is
+VMware's hardware version number, where 0 is off.  If not zero it
+changes the default VGA to VMware's VGA.
+
+The hardware version number (vmware_hw) come from VMware config files.
+
+=over 4
+
+In a .vmx it is virtualHW.version
+
+In a .ovf it is part of the value of vssd:VirtualSystemType.
+For vssd:VirtualSystemType == vmx-07, vmware_hw = 7.
+
+=back
+
 =back
 
 =head3 Emulated VGA Graphics Device
@@ -1233,10 +1250,14 @@ later (e.g. Windows XP onwards) then you should enable this.
 stdvga supports more video ram and bigger resolutions than Cirrus.
 This option is deprecated, use vga="stdvga" instead.
 
+The deprecated B<stdvga=0> prevents the usage of vmware by default
+if B<vmware_hw> is non-zero. 
+
 =item B<vga="STRING">
 
-Selects the emulated video card (none|stdvga|cirrus).
-The default is cirrus.
+Selects the emulated video card (none|stdvga|cirrus|vmware).
+The default is cirrus unless B<vmware_hw> is non-zero in which case it
+is vmware.
 
 =item B<vnc=BOOLEAN>
 
diff --git a/docs/misc/hypervisor-cpuid.markdown b/docs/misc/hypervisor-cpuid.markdown
new file mode 100644
index 0000000..964a5f4
--- /dev/null
+++ b/docs/misc/hypervisor-cpuid.markdown
@@ -0,0 +1,30 @@
+Hypervisor Cpuid
+================
+
+There is no agreed standard for the use of hypervisor cpuid leaves.
+
+Other than the range 0x40000000 to 0x400000ff can be used by
+hypervisors.
+
+MicroSoft Hyper-V (AKA viridian) leaves currently must be at
+0x40000000.
+
+VMware leaves currently must be at 0x40000000.
+
+KVM leaves currently must be at 0x40000000 (from Seabios).
+
+Xen leaves can be found at the first otherwise unused 0x100 aligned
+offset between 0x40000000 and 0x40010000.
+
+http://download.microsoft.com/download/F/B/0/FB0D01A3-8E3A-4F5F-AA59-08C8026D3B8A/requirements-for-implementing-microsoft-hypervisor-interface.docx
+
+http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458
+
+http://lwn.net/Articles/301888/
+  Attempted to get this cleaned up.
+
+So if Viridian or VMware_hw is selected, return their format for the
+range 0x40000000 to 0x400000ff. And return Xen format for the range
+0x40000100 to 0x400001ff.
+
+Otherwise return Xen format for the range 0x40000000 to 0x400000ff.
diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index d8bd9b3..d262fa0 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -743,6 +743,7 @@ typedef struct {
     uint64_t vm_generationid_addr;
     uint64_t ioreq_server_pfn;
     uint64_t nr_ioreq_server_pages;
+    uint64_t vmware_hw;
 
     struct toolstack_data_t tdata;
 } pagebuf_t;
@@ -927,6 +928,16 @@ static int pagebuf_get_one(xc_interface *xch, struct restore_ctx *ctx,
         }
         return pagebuf_get_one(xch, ctx, buf, fd, dom);
 
+    case XC_SAVE_ID_HVM_VMWARE_HW:
+        /* Skip padding 4 bytes then read the vmware hw version. */
+        if ( RDEXACT(fd, &buf->vmware_hw, sizeof(uint32_t)) ||
+             RDEXACT(fd, &buf->vmware_hw, sizeof(uint64_t)) )
+        {
+            PERROR("error reading the vmware_hw value");
+            return -1;
+        }
+        return pagebuf_get_one(xch, ctx, buf, fd, dom);
+
     case XC_SAVE_ID_TOOLSTACK:
         {
             if ( RDEXACT(fd, &buf->tdata.len, sizeof(buf->tdata.len)) )
@@ -1774,6 +1785,9 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
         }
     }
 
+    if (pagebuf.vmware_hw != 0)
+        xc_set_hvm_param(xch, dom, HVM_PARAM_VMWARE_HW, pagebuf.vmware_hw);
+
     if (pagebuf.acpi_ioport_location == 1) {
         DBGPRINTF("Use new firmware ioport from the checkpoint\n");
         xc_hvm_param_set(xch, dom, HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 254fdb3..76dc307 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -1750,6 +1750,17 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
             PERROR("Error when writing the ioreq server gmfn count");
             goto out;
         }
+
+        chunk.id = XC_SAVE_ID_HVM_VMWARE_HW;
+        chunk.data = 0;
+        xc_hvm_param_get(xch, dom, HVM_PARAM_VMWARE_HW, &chunk.data);
+
+        if ( (chunk.data != 0) &&
+             wrexact(io_fd, &chunk, sizeof(chunk)) )
+        {
+            PERROR("Error when writing the vmware_hw value");
+            goto out;
+        }
     }
 
     if ( callbacks != NULL && callbacks->toolstack_save != NULL )
diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
index bdd9009..d185ba9 100644
--- a/tools/libxc/xg_save_restore.h
+++ b/tools/libxc/xg_save_restore.h
@@ -262,6 +262,8 @@
 /* These are a pair; it is an error for one to exist without the other */
 #define XC_SAVE_ID_HVM_IOREQ_SERVER_PFN -19
 #define XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES -20
+/* VMware data */
+#define XC_SAVE_ID_HVM_VMWARE_HW      -21
 
 /*
 ** We process save/restore/migrate in batches of pages; the below
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 2700cc1..09faa04 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -158,6 +158,16 @@
 #define LIBXL_BUILDINFO_HVM_VIRIDIAN_ENABLE_DISABLE_WIDTH 64
 
 /*
+ * The libxl_vga_interface_type has the type for vmware.
+ */
+#define LIBXL_HAVE_LIBXL_VGA_INTERFACE_TYPE_VMWARE 1
+
+/*
+ * libxl_domain_build_info has the u.hvm.vmware_hw field.
+ */
+#define LIBXL_HAVE_BUILDINFO_HVM_VMWARE_HW 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f7f178e..9f4e03c 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -227,8 +227,12 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
         if (b_info->shadow_memkb == LIBXL_MEMKB_DEFAULT)
             b_info->shadow_memkb = 0;
 
-        if (!b_info->u.hvm.vga.kind)
-            b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
+        if (!b_info->u.hvm.vga.kind) {
+            if (b_info->u.hvm.vmware_hw)
+                b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_VMWARE;
+            else
+                b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
+        }
 
         switch (b_info->device_model_version) {
         case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
@@ -428,13 +432,15 @@ int libxl__domain_build(libxl__gc *gc,
         vments[4] = "start_time";
         vments[5] = libxl__sprintf(gc, "%lu.%02d", start_time.tv_sec,(int)start_time.tv_usec/10000);
 
-        localents = libxl__calloc(gc, 7, sizeof(char *));
+        localents = libxl__calloc(gc, 9, sizeof(char *));
         localents[0] = "platform/acpi";
         localents[1] = libxl_defbool_val(info->u.hvm.acpi) ? "1" : "0";
         localents[2] = "platform/acpi_s3";
         localents[3] = libxl_defbool_val(info->u.hvm.acpi_s3) ? "1" : "0";
         localents[4] = "platform/acpi_s4";
         localents[5] = libxl_defbool_val(info->u.hvm.acpi_s4) ? "1" : "0";
+        localents[6] = "platform/vmware_hw";
+        localents[7] = libxl__sprintf(gc, "%"PRIu64, info->u.hvm.vmware_hw);
 
         break;
     case LIBXL_DOMAIN_TYPE_PV:
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 0018113..8bd6414 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -243,6 +243,9 @@ static char ** libxl__build_device_model_args_old(libxl__gc *gc,
         case LIBXL_VGA_INTERFACE_TYPE_NONE:
             flexarray_append_pair(dm_args, "-vga", "none");
             break;
+        case LIBXL_VGA_INTERFACE_TYPE_VMWARE:
+            flexarray_append_pair(dm_args, "-vga", "vmware");
+            break;
         }
 
         if (b_info->u.hvm.boot) {
@@ -555,6 +558,11 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc,
             break;
         case LIBXL_VGA_INTERFACE_TYPE_NONE:
             break;
+        case LIBXL_VGA_INTERFACE_TYPE_VMWARE:
+            flexarray_append_pair(dm_args, "-device",
+                GCSPRINTF("vmware-svga,vgamem_mb=%d",
+                libxl__sizekb_to_mb(b_info->video_memkb)));
+            break;
         }
 
         if (b_info->u.hvm.boot) {
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index d63ae1b..b0f0513 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -290,6 +290,8 @@ static void hvm_set_conf_params(xc_interface *handle, uint32_t domid,
 #if defined(__i386__) || defined(__x86_64__)
     xc_hvm_param_set(handle, domid, HVM_PARAM_HPET_ENABLED,
                     libxl_defbool_val(info->u.hvm.hpet));
+    xc_set_hvm_param(handle, domid, HVM_PARAM_VMWARE_HW,
+                     info->u.hvm.vmware_hw);
 #endif
     xc_hvm_param_set(handle, domid, HVM_PARAM_TIMER_MODE, timer_mode(info));
     xc_hvm_param_set(handle, domid, HVM_PARAM_VPT_ALIGN,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index bbb03e2..5d25b77 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -175,6 +175,7 @@ libxl_vga_interface_type = Enumeration("vga_interface_type", [
     (1, "CIRRUS"),
     (2, "STD"),
     (3, "NONE"),
+    (4, "VMWARE"),
     ], init_val = "LIBXL_VGA_INTERFACE_TYPE_CIRRUS")
 
 libxl_vendor_device = Enumeration("vendor_device", [
@@ -391,6 +392,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("timeoffset",       string),
                                        ("hpet",             libxl_defbool),
                                        ("vpt_align",        libxl_defbool),
+                                       ("vmware_hw",        uint64),
                                        ("timer_mode",       libxl_timer_mode),
                                        ("nested_hvm",       libxl_defbool),
                                        ("smbios_firmware",  string),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c734f79..89d1724 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1111,6 +1111,8 @@ static void parse_config_data(const char *config_source,
             exit(-ERROR_FAIL);
         }
 
+        if (!xlu_cfg_get_long(config, "vmware_hw",  &l, 1))
+            b_info->u.hvm.vmware_hw = l;
         if (!xlu_cfg_get_long(config, "timer_mode", &l, 1)) {
             const char *s = libxl_timer_mode_to_string(l);
             fprintf(stderr, "WARNING: specifying \"timer_mode\" as an integer is deprecated. "
@@ -1730,13 +1732,15 @@ skip_vfb:
                 b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
             } else if (!strcmp(buf, "none")) {
                 b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_NONE;
+            } else if (!strcmp(buf, "vmware")) {
+                b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_VMWARE;
             } else {
                 fprintf(stderr, "Unknown vga \"%s\" specified\n", buf);
                 exit(1);
             }
         } else if (!xlu_cfg_get_long(config, "stdvga", &l, 0))
             b_info->u.hvm.vga.kind = l ? LIBXL_VGA_INTERFACE_TYPE_STD :
-                                         LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
+                                          LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
 
         xlu_cfg_replace_string (config, "keymap", &b_info->u.hvm.keymap, 0);
         xlu_cfg_get_defbool (config, "spice", &b_info->u.hvm.spice.enable, 0);
-- 
1.8.4


* [PATCH for-4.5 v7 3/7] vmware: Add VMware provided include files.
  2014-10-02 21:30 [PATCH for-4.5 v7 0/7] Xen VMware tools support Don Slutz
  2014-10-02 21:30 ` [PATCH for-4.5 v7 1/7] xen: Add support for VMware cpuid leaves Don Slutz
  2014-10-02 21:30 ` [PATCH for-4.5 v7 2/7] tools: Add vmware_hw support Don Slutz
@ 2014-10-02 21:30 ` Don Slutz
  2015-01-15 16:46   ` Jan Beulich
  2014-10-02 21:30 ` [PATCH for-4.5 v7 4/7] xen: Add vmware_port support Don Slutz
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 37+ messages in thread
From: Don Slutz @ 2014-10-02 21:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Eddie Dong, Ian Jackson, Don Slutz, Tim Deegan,
	George Dunlap, Aravind Gopalakrishnan, Jan Beulich,
	Andrew Cooper, Boris Ostrovsky, Suravee Suthikulpanit

These 2 files: backdoor_def.h and guest_msg_def.h come from:

http://packages.vmware.com/tools/esx/3.5latest/rhel4/SRPMS/index.html
 open-vm-tools-kmod-7.4.8-396269.423167.src.rpm
  open-vm-tools-kmod-7.4.8.tar.gz
   vmhgfs/backdoor_def.h
   vmhgfs/guest_msg_def.h

and are unchanged.

Added the badly named include file includeCheck.h also.  It only
has comments and is provided so that backdoor_def.h and
guest_msg_def.h can be used without change.

Signed-off-by: Don Slutz <dslutz@verizon.com>
---
 xen/arch/x86/hvm/vmware/backdoor_def.h | 167 +++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmware/includeCheck.h |  17 ++++
 2 files changed, 184 insertions(+)
 create mode 100644 xen/arch/x86/hvm/vmware/backdoor_def.h
 create mode 100644 xen/arch/x86/hvm/vmware/includeCheck.h

diff --git a/xen/arch/x86/hvm/vmware/backdoor_def.h b/xen/arch/x86/hvm/vmware/backdoor_def.h
new file mode 100644
index 0000000..e76795f
--- /dev/null
+++ b/xen/arch/x86/hvm/vmware/backdoor_def.h
@@ -0,0 +1,167 @@
+/* **********************************************************
+ * Copyright 1998 VMware, Inc.  All rights reserved. 
+ * **********************************************************
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA  02110-1301 USA
+ */
+
+/*
+ * backdoor_def.h --
+ *
+ * This contains backdoor defines that can be included from
+ * an assembly language file.
+ */
+
+
+
+#ifndef _BACKDOOR_DEF_H_
+#define _BACKDOOR_DEF_H_
+
+#define INCLUDE_ALLOW_MODULE
+#define INCLUDE_ALLOW_USERLEVEL
+#define INCLUDE_ALLOW_VMMEXT
+#define INCLUDE_ALLOW_VMCORE
+#define INCLUDE_ALLOW_VMKERNEL
+#include "includeCheck.h"
+
+/*
+ * If you want to add a new low-level backdoor call for a guest userland
+ * application, please consider using the GuestRpc mechanism instead. --hpreg
+ */
+
+#define BDOOR_MAGIC 0x564D5868
+
+/* Low-bandwidth backdoor port. --hpreg */
+
+#define BDOOR_PORT 0x5658
+
+#define BDOOR_CMD_GETMHZ      		   1
+/*
+ * BDOOR_CMD_APMFUNCTION is used by:
+ *
+ * o The FrobOS code, which instead should either program the virtual chipset
+ *   (like the new BIOS code does, matthias offered to implement that), or not
+ *   use any VM-specific code (which requires that we correctly implement
+ *   "power off on CLI HLT" for SMP VMs, boris offered to implement that)
+ *
+ * o The old BIOS code, which will soon be jettisoned
+ *
+ *  --hpreg
+ */
+#define BDOOR_CMD_APMFUNCTION 		   2
+#define BDOOR_CMD_GETDISKGEO  		   3
+#define BDOOR_CMD_GETPTRLOCATION	      4
+#define BDOOR_CMD_SETPTRLOCATION	      5
+#define BDOOR_CMD_GETSELLENGTH		   6
+#define BDOOR_CMD_GETNEXTPIECE		   7
+#define BDOOR_CMD_SETSELLENGTH		   8
+#define BDOOR_CMD_SETNEXTPIECE		   9
+#define BDOOR_CMD_GETVERSION		      10
+#define BDOOR_CMD_GETDEVICELISTELEMENT	11
+#define BDOOR_CMD_TOGGLEDEVICE		   12
+#define BDOOR_CMD_GETGUIOPTIONS		   13
+#define BDOOR_CMD_SETGUIOPTIONS		   14
+#define BDOOR_CMD_GETSCREENSIZE		   15
+#define BDOOR_CMD_MONITOR_CONTROL       16
+#define BDOOR_CMD_GETHWVERSION          17
+#define BDOOR_CMD_OSNOTFOUND            18
+#define BDOOR_CMD_GETUUID               19
+#define BDOOR_CMD_GETMEMSIZE            20
+#define BDOOR_CMD_HOSTCOPY              21 /* Devel only */
+/* BDOOR_CMD_GETOS2INTCURSOR, 22, is very old and defunct. Reuse. */
+#define BDOOR_CMD_GETTIME               23 /* Deprecated. Use GETTIMEFULL. */
+#define BDOOR_CMD_STOPCATCHUP           24
+#define BDOOR_CMD_PUTCHR	        25 /* Devel only */
+#define BDOOR_CMD_ENABLE_MSG	        26 /* Devel only */
+#define BDOOR_CMD_GOTO_TCL	        27 /* Devel only */
+#define BDOOR_CMD_INITPCIOPROM		28
+#define BDOOR_CMD_INT13			29
+#define BDOOR_CMD_MESSAGE               30
+#define BDOOR_CMD_RSVD0                 31
+#define BDOOR_CMD_RSVD1                 32
+#define BDOOR_CMD_RSVD2                 33
+#define BDOOR_CMD_ISACPIDISABLED	34
+#define BDOOR_CMD_TOE			35 /* Not in use */
+/* BDOOR_CMD_INITLSIOPROM, 36, was merged with 28. Reuse. */
+#define BDOOR_CMD_PATCH_SMBIOS_STRUCTS  37
+#define BDOOR_CMD_MAPMEM                38 /* Devel only */
+#define BDOOR_CMD_ABSPOINTER_DATA	39
+#define BDOOR_CMD_ABSPOINTER_STATUS	40
+#define BDOOR_CMD_ABSPOINTER_COMMAND	41
+#define BDOOR_CMD_TIMER_SPONGE          42
+#define BDOOR_CMD_PATCH_ACPI_TABLES	43
+/* Catch-all to allow synchronous tests */
+#define BDOOR_CMD_DEVEL_FAKEHARDWARE	44 /* Debug only - needed in beta */
+#define BDOOR_CMD_GETHZ      		45
+#define BDOOR_CMD_GETTIMEFULL           46
+#define BDOOR_CMD_STATELOGGER           47
+#define BDOOR_CMD_CHECKFORCEBIOSSETUP	48
+#define BDOOR_CMD_LAZYTIMEREMULATION    49
+#define BDOOR_CMD_BIOSBBS               50
+#define BDOOR_CMD_MAX                   51
+
+/* 
+ * IMPORTANT NOTE: When modifying the behavior of an existing backdoor command,
+ * you must adhere to the semantics expected by the oldest Tools who use that
+ * command. Specifically, do not alter the way in which the command modifies 
+ * the registers. Otherwise backwards compatibility will suffer.
+ */
+
+/* High-bandwidth backdoor port. --hpreg */
+
+#define BDOORHB_PORT 0x5659
+
+#define BDOORHB_CMD_MESSAGE 0
+#define BDOORHB_CMD_MAX 1
+
+/*
+ * There is another backdoor which allows access to certain TSC-related
+ * values using otherwise illegal PMC indices when the pseudo_perfctr
+ * control flag is set.
+ */
+
+#define BDOOR_PMC_HW_TSC      0x10000
+#define BDOOR_PMC_REAL_NS     0x10001
+#define BDOOR_PMC_APPARENT_NS 0x10002
+
+#define IS_BDOOR_PMC(index)  (((index) | 3) == 0x10003)
+#define BDOOR_CMD(ecx)       ((ecx) & 0xffff)
+
+
+#ifdef VMM
+/*
+ *----------------------------------------------------------------------
+ *
+ * Backdoor_CmdRequiresFullyValidVCPU --
+ *
+ *    A few backdoor commands require the full VCPU to be valid
+ *    (including GDTR, IDTR, TR and LDTR). The rest get read/write
+ *    access to GPRs and read access to Segment registers (selectors).
+ *
+ * Result:
+ *    True iff VECX contains a command that require the full VCPU to
+ *    be valid.
+ *
+ *----------------------------------------------------------------------
+ */
+static INLINE Bool
+Backdoor_CmdRequiresFullyValidVCPU(unsigned cmd)
+{
+   return cmd == BDOOR_CMD_RSVD0 ||
+          cmd == BDOOR_CMD_RSVD1 ||
+          cmd == BDOOR_CMD_RSVD2;
+}
+#endif
+
+#endif
diff --git a/xen/arch/x86/hvm/vmware/includeCheck.h b/xen/arch/x86/hvm/vmware/includeCheck.h
new file mode 100644
index 0000000..26e0d59
--- /dev/null
+++ b/xen/arch/x86/hvm/vmware/includeCheck.h
@@ -0,0 +1,17 @@
+/*
+ * includeCheck.h
+ *
+ * Copyright (C) 2012 Verizon Corporation
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License Version 2 (GPLv2)
+ * as published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details. <http://www.gnu.org/licenses/>.
+ */
+/*
+ * Nothing here.  Just to use backdoor_def.h without change.
+ */
-- 
1.8.4


* [PATCH for-4.5 v7 4/7] xen: Add vmware_port support
  2014-10-02 21:30 [PATCH for-4.5 v7 0/7] Xen VMware tools support Don Slutz
                   ` (2 preceding siblings ...)
  2014-10-02 21:30 ` [PATCH for-4.5 v7 3/7] vmware: Add VMware provided include files Don Slutz
@ 2014-10-02 21:30 ` Don Slutz
  2014-10-02 21:58   ` Don Slutz
  2014-10-02 21:30 ` [PATCH for-4.5 v7 5/7] tools: " Don Slutz
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 37+ messages in thread
From: Don Slutz @ 2014-10-02 21:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Eddie Dong, Ian Jackson, Don Slutz, Tim Deegan,
	George Dunlap, Aravind Gopalakrishnan, Jan Beulich,
	Andrew Cooper, Boris Ostrovsky, Suravee Suthikulpanit

This includes adding is_vmware_port_enabled.

This is a new domain_create() flag, DOMCRF_vmware_port.  It is
passed to domctl as XEN_DOMCTL_CDF_vmware_port.

This enables limited support of VMware's hyper-call.

This is both more complete support than is currently provided by
QEMU and/or KVM, and less.  The missing part requires QEMU changes
and has been left out until the QEMU patches are accepted upstream.

VMware's hyper-call is also known as VMware Backdoor I/O Port.

Note: this support does not depend on vmware_hw being non-zero.

In summary, VMware treats "in (%dx),%eax" (or "out %eax,(%dx)")
to port 0x5658 specially.  Note: since many operations return data
in EAX, "in (%dx),%eax" is the one to use.  The other lengths like
"in (%dx),%al" will still do things, but only the AL part of EAX
will be changed.  For "out %eax,(%dx)" of all lengths, EAX will
remain unchanged.
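
As an illustration of the access-width rule above, here is a minimal
sketch in C.  The helper name bdoor_merge_eax() and the enum are
hypothetical and exist only for this example; the sketch models the
rule as described here and is not the code in the patch.

```c
#include <stdint.h>

/* Hypothetical names, used only for this illustration. */
enum bdoor_dir { BDOOR_DIR_IN, BDOOR_DIR_OUT };

/*
 * Model of how the guest-visible EAX is formed after a backdoor access.
 *   saved_eax: EAX as the guest set it before the instruction
 *   result:    the 32-bit value the command handler produced
 *   bytes:     access width of the instruction (1, 2 or 4)
 */
static uint32_t bdoor_merge_eax(enum bdoor_dir dir, unsigned int bytes,
                                uint32_t saved_eax, uint32_t result)
{
    /* "out" of any width leaves EAX unchanged. */
    if ( dir == BDOOR_DIR_OUT )
        return saved_eax;

    switch ( bytes )
    {
    case 1:  /* "in (%dx),%al": only AL is replaced */
        return (saved_eax & 0xffffff00u) | (result & 0xffu);
    case 2:  /* "in (%dx),%ax": only AX is replaced */
        return (saved_eax & 0xffff0000u) | (result & 0xffffu);
    default: /* "in (%dx),%eax": all of EAX is replaced */
        return result;
    }
}
```

With EAX holding the magic value 0x564D5868 and a handler result of 6,
a 32-bit "in" yields 6, while a byte-wide "in" keeps the upper three
bytes of the magic value and replaces only AL.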

Also this instruction is allowed to be used from ring 3.  To
support this the vmexit for GP needs to be enabled.  I have not
fully tested that nested HVM is doing the right thing for this.

An open source example of using this is:

http://open-vm-tools.sourceforge.net/

Which only uses "inl (%dx)".  Also

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458

The support included is enough to allow VMware tools to install in a
HVM domU.

For AMD (svm) the max instruction length of 15 is hard coded.  This
is because __get_instruction_length_from_list() has issues: when
called from the #GP handler NRIP is not available, and NRIP may not
be available at all on particular hardware.  That leads to the need
to read the instruction twice, once in
__get_instruction_length_from_list() and then again in
vmport_gp_check(), which is bad because memory may change between
the reads.
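
To make the single-read idea concrete, here is a rough sketch in C of
decoding a port instruction from one local copy of the bytes.  It is
not the vmport_gp_check() from this patch: the struct and function
names are hypothetical.  It assumes a 32-bit or 64-bit code segment and
handles only the 0x66 operand-size prefix; other prefixes are left out
for brevity.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_INST_LEN 15

/* Hypothetical result type for this illustration. */
struct port_insn {
    unsigned int len;    /* total instruction length in bytes */
    unsigned int bytes;  /* access width: 1, 2 or 4 */
    int is_in;           /* non-zero for IN, zero for OUT */
};

/*
 * Decode "in (%dx),..." / "out ...,(%dx)" from a buffer that was fetched
 * from guest memory exactly once.  Returns 0 on success, -1 if the bytes
 * are not one of the four DX-addressed port instructions.
 */
static int decode_port_insn(const uint8_t *buf, size_t avail,
                            struct port_insn *out)
{
    size_t i = 0;
    int opsize16 = 0;

    /* An x86 instruction is never longer than 15 bytes. */
    if ( avail > MAX_INST_LEN )
        avail = MAX_INST_LEN;

    /* Skip operand-size prefixes; each one selects 16-bit operands. */
    while ( i < avail && buf[i] == 0x66 )
    {
        opsize16 = 1;
        i++;
    }
    if ( i >= avail )
        return -1;

    switch ( buf[i] )
    {
    case 0xec: out->is_in = 1; out->bytes = 1; break;                /* in  (%dx),%al  */
    case 0xed: out->is_in = 1; out->bytes = opsize16 ? 2 : 4; break; /* in  (%dx),%eax */
    case 0xee: out->is_in = 0; out->bytes = 1; break;                /* out %al,(%dx)  */
    case 0xef: out->is_in = 0; out->bytes = opsize16 ? 2 : 4; break; /* out %eax,(%dx) */
    default:
        return -1;
    }

    out->len = (unsigned int)(i + 1);
    return 0;
}
```

Because both the length and the access width come from the same local
buffer, the guest cannot change the instruction between the length
calculation and the emulation.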

Signed-off-by: Don Slutz <dslutz@verizon.com>
---
v7:
      More on AMD in the commit message.
      Switch to only change 32bit part of registers, what VMware
        does.
    Too much logging and tracing.
      Dropped a lot of it.  This includes vmport_debug=

v6:
      Dropped the attempt to use svm_nextrip_insn_length via
      __get_instruction_length (added in v2).  Just always look
      at up to 15 bytes on AMD.

v5:
      we should make sure that svm_vmexit_gp_intercept is not executed for
      any other guest.
        Added an ASSERT on is_vmware_port_enabled.
      magic integers?
        Added #define for them.
      I am fairly certain that you need some brackets here.
        Added brackets.

 xen/arch/x86/domain.c                 |   2 +
 xen/arch/x86/hvm/hvm.c                |   4 +
 xen/arch/x86/hvm/svm/emulate.c        |   2 +-
 xen/arch/x86/hvm/svm/svm.c            |  30 ++++
 xen/arch/x86/hvm/svm/vmcb.c           |   2 +
 xen/arch/x86/hvm/vmware/Makefile      |   1 +
 xen/arch/x86/hvm/vmware/vmport.c      | 274 ++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmcs.c           |   2 +
 xen/arch/x86/hvm/vmx/vmx.c            |  63 +++++++-
 xen/arch/x86/hvm/vmx/vvmx.c           |   3 +
 xen/common/domctl.c                   |   3 +
 xen/include/asm-x86/hvm/domain.h      |   3 +
 xen/include/asm-x86/hvm/io.h          |   2 +-
 xen/include/asm-x86/hvm/svm/emulate.h |   1 +
 xen/include/asm-x86/hvm/vmport.h      |  52 +++++++
 xen/include/public/domctl.h           |   3 +
 xen/include/xen/sched.h               |   3 +
 17 files changed, 445 insertions(+), 5 deletions(-)
 create mode 100644 xen/arch/x86/hvm/vmware/vmport.c
 create mode 100644 xen/include/asm-x86/hvm/vmport.h

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 8cfd1ca..a71da52 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -524,6 +524,8 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     d->arch.hvm_domain.mem_sharing_enabled = 0;
 
     d->arch.s3_integrity = !!(domcr_flags & DOMCRF_s3_integrity);
+    d->arch.hvm_domain.is_vmware_port_enabled =
+        !!(domcr_flags & DOMCRF_vmware_port);
 
     INIT_LIST_HEAD(&d->arch.pdev_list);
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 4039061..1357079 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -61,6 +61,7 @@
 #include <asm/hvm/trace.h>
 #include <asm/hvm/nestedhvm.h>
 #include <asm/hvm/vmware.h>
+#include <asm/hvm/vmport.h>
 #include <asm/mtrr.h>
 #include <asm/apic.h>
 #include <public/sched.h>
@@ -1444,6 +1445,9 @@ int hvm_domain_initialise(struct domain *d)
         goto fail1;
     d->arch.hvm_domain.io_handler->num_slot = 0;
 
+    if ( d->arch.hvm_domain.is_vmware_port_enabled )
+        vmport_register(d);
+
     if ( is_pvh_domain(d) )
     {
         register_portio_handler(d, 0, 0x10003, handle_pvh_io);
diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c
index 37a1ece..cfad9ab 100644
--- a/xen/arch/x86/hvm/svm/emulate.c
+++ b/xen/arch/x86/hvm/svm/emulate.c
@@ -50,7 +50,7 @@ static unsigned int is_prefix(u8 opc)
     return 0;
 }
 
-static unsigned long svm_rip2pointer(struct vcpu *v)
+unsigned long svm_rip2pointer(struct vcpu *v)
 {
     struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
     unsigned long p = vmcb->cs.base + guest_cpu_user_regs()->eip;
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index e3e1565..d7f13d9 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -59,6 +59,7 @@
 #include <public/sched.h>
 #include <asm/hvm/vpt.h>
 #include <asm/hvm/trace.h>
+#include <asm/hvm/vmport.h>
 #include <asm/hap.h>
 #include <asm/apic.h>
 #include <asm/debugger.h>
@@ -2111,6 +2112,31 @@ svm_vmexit_do_vmsave(struct vmcb_struct *vmcb,
     return;
 }
 
+static void svm_vmexit_gp_intercept(struct cpu_user_regs *regs,
+                                    struct vcpu *v)
+{
+    struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
+    /*
+     * Just use 15 for the instruction length; vmport_gp_check will
+     * adjust it.  This is because
+     * __get_instruction_length_from_list() has issues, and may
+     * require a double read of the instruction bytes.  At some
+     * point a new routine could be added that is based on the code
+     * in vmport_gp_check with extensions to make it more general.
+     * Since that routine is the only user of this code this can be
+     * done later.
+     */
+    unsigned long inst_len = 15;
+    unsigned long inst_addr = svm_rip2pointer(v);
+    int rc = vmport_gp_check(regs, v, &inst_len, inst_addr,
+                             vmcb->exitinfo1, vmcb->exitinfo2);
+
+    if ( !rc )
+        __update_guest_eip(regs, inst_len);
+    else
+        hvm_inject_hw_exception(TRAP_gp_fault, vmcb->exitinfo1);
+}
+
 static void svm_vmexit_ud_intercept(struct cpu_user_regs *regs)
 {
     struct hvm_emulate_ctxt ctxt;
@@ -2471,6 +2497,10 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
         break;
     }
 
+    case VMEXIT_EXCEPTION_GP:
+        svm_vmexit_gp_intercept(regs, v);
+        break;
+
     case VMEXIT_EXCEPTION_UD:
         svm_vmexit_ud_intercept(regs);
         break;
diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
index 21292bb..45ead61 100644
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -195,6 +195,8 @@ static int construct_vmcb(struct vcpu *v)
         HVM_TRAP_MASK
         | (1U << TRAP_no_device);
 
+    if ( v->domain->arch.hvm_domain.is_vmware_port_enabled )
+        vmcb->_exception_intercepts |= 1U << TRAP_gp_fault;
     if ( paging_mode_hap(v->domain) )
     {
         vmcb->_np_enable = 1; /* enable nested paging */
diff --git a/xen/arch/x86/hvm/vmware/Makefile b/xen/arch/x86/hvm/vmware/Makefile
index 3fb2e0b..cd8815b 100644
--- a/xen/arch/x86/hvm/vmware/Makefile
+++ b/xen/arch/x86/hvm/vmware/Makefile
@@ -1 +1,2 @@
 obj-y += cpuid.o
+obj-y += vmport.o
diff --git a/xen/arch/x86/hvm/vmware/vmport.c b/xen/arch/x86/hvm/vmware/vmport.c
new file mode 100644
index 0000000..e73a5e6
--- /dev/null
+++ b/xen/arch/x86/hvm/vmware/vmport.c
@@ -0,0 +1,274 @@
+/*
+ * HVM VMPORT emulation
+ *
+ * Copyright (C) 2012 Verizon Corporation
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License Version 2 (GPLv2)
+ * as published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details. <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+#include <xen/lib.h>
+#include <asm/hvm/hvm.h>
+#include <asm/hvm/support.h>
+#include <asm/hvm/vmport.h>
+
+#include "backdoor_def.h"
+
+#define MAX_INST_LEN 15
+
+#ifndef NDEBUG
+unsigned int opt_vmport_debug __read_mostly;
+integer_param("vmport_debug", opt_vmport_debug);
+#endif
+
+/* More VMware defines */
+
+#define VMWARE_GUI_AUTO_GRAB              0x001
+#define VMWARE_GUI_AUTO_UNGRAB            0x002
+#define VMWARE_GUI_AUTO_SCROLL            0x004
+#define VMWARE_GUI_AUTO_RAISE             0x008
+#define VMWARE_GUI_EXCHANGE_SELECTIONS    0x010
+#define VMWARE_GUI_WARP_CURSOR_ON_UNGRAB  0x020
+#define VMWARE_GUI_FULL_SCREEN            0x040
+
+#define VMWARE_GUI_TO_FULL_SCREEN         0x080
+#define VMWARE_GUI_TO_WINDOW              0x100
+
+#define VMWARE_GUI_AUTO_RAISE_DISABLED    0x200
+
+#define VMWARE_GUI_SYNC_TIME              0x400
+
+/* When set, toolboxes should not show the cursor options page. */
+#define VMWARE_DISABLE_CURSOR_OPTIONS     0x800
+
+void vmport_register(struct domain *d)
+{
+    register_portio_handler(d, BDOOR_PORT, 4, vmport_ioport);
+}
+
+int vmport_ioport(int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    uint32_t cmd = regs->rcx & 0xffff;
+    uint32_t magic = regs->rax;
+    int rc = X86EMUL_OKAY;
+
+    if ( magic == BDOOR_MAGIC )
+    {
+        uint64_t saved_rax = regs->rax;
+        uint64_t value;
+        struct vcpu *curr = current;
+        struct domain *d = curr->domain;
+        struct segment_register sreg;
+
+        switch ( cmd )
+        {
+        case BDOOR_CMD_GETMHZ:
+            regs->rax = (uint32_t)(d->arch.tsc_khz / 1000);
+            break;
+        case BDOOR_CMD_GETVERSION:
+            /* MAGIC */
+            regs->rbx = (regs->rbx & 0xffffffff00000000ull) | BDOOR_MAGIC;
+            /* VERSION_MAGIC */
+            regs->rax = 6;
+            /* Claim we are an ESX. VMX_TYPE_SCALABLE_SERVER */
+            regs->rcx = (regs->rcx & 0xffffffff00000000ull) | 2;
+            break;
+        case BDOOR_CMD_GETSCREENSIZE:
+            /* We have no screen size */
+            regs->rax = -1;
+            break;
+        case BDOOR_CMD_GETHWVERSION:
+            /* vmware_hw */
+            regs->rax = 0;
+            if ( is_hvm_vcpu(curr) )
+            {
+                struct hvm_domain *hd = &d->arch.hvm_domain;
+
+                regs->rax = (uint32_t)hd->params[HVM_PARAM_VMWARE_HW];
+            }
+            if ( !regs->rax )
+                regs->rax = 4;  /* Act like version 4 */
+            break;
+        case BDOOR_CMD_GETHZ:
+            hvm_get_segment_register(curr, x86_seg_ss, &sreg);
+            if ( sreg.attr.fields.dpl == 0 )
+            {
+                value = d->arch.tsc_khz * 1000ULL;
+                /* apic-frequency (bus speed) */
+                regs->rcx = (regs->rcx & 0xffffffff00000000ull) |
+                    (uint32_t)(1000000000ULL / APIC_BUS_CYCLE_NS);
+                /* High part of tsc-frequency */
+                regs->rbx = (regs->rbx & 0xffffffff00000000ull) |
+                    (uint32_t)(value >> 32);
+                /* Low part of tsc-frequency */
+                regs->rax = (uint32_t)value;
+            }
+            break;
+        case BDOOR_CMD_GETTIME:
+            value = get_localtime_us(d) - d->time_offset_seconds * 1000000ULL;
+            /* hostUsecs */
+            regs->rbx = (regs->rbx & 0xffffffff00000000ull) |
+                (uint32_t)(value % 1000000UL);
+            /* hostSecs */
+            regs->rax = (uint32_t)(value / 1000000ULL);
+            /* maxTimeLag */
+            regs->rcx = (regs->rcx & 0xffffffff00000000ull) | 1000000;
+            /* offset to GMT in minutes */
+            regs->rdx = (regs->rdx & 0xffffffff00000000ull) |
+                d->time_offset_seconds / 60;
+            break;
+        case BDOOR_CMD_GETTIMEFULL:
+            value = get_localtime_us(d) - d->time_offset_seconds * 1000000ULL;
+            /* ... */
+            regs->rax = BDOOR_MAGIC;
+            /* hostUsecs */
+            regs->rbx = (regs->rbx & 0xffffffff00000000ull) |
+                (uint32_t)(value % 1000000UL);
+            /* High part of hostSecs */
+            regs->rsi = (regs->rsi & 0xffffffff00000000ull) |
+                (uint32_t)((value / 1000000ULL) >> 32);
+            /* Low part of hostSecs */
+            regs->rdx = (regs->rdx & 0xffffffff00000000ull) |
+                (uint32_t)(value / 1000000ULL);
+            /* maxTimeLag */
+            regs->rcx = (regs->rcx & 0xffffffff00000000ull) | 1000000;
+            break;
+        case BDOOR_CMD_GETGUIOPTIONS:
+            regs->rax = VMWARE_GUI_AUTO_GRAB | VMWARE_GUI_AUTO_UNGRAB |
+                VMWARE_GUI_AUTO_RAISE_DISABLED | VMWARE_GUI_SYNC_TIME |
+                VMWARE_DISABLE_CURSOR_OPTIONS;
+            break;
+        case BDOOR_CMD_SETGUIOPTIONS:
+            regs->rax = 0x0;
+            break;
+        default:
+            regs->rax = (uint32_t)~0ul;
+            break;
+        }
+        if ( dir == IOREQ_READ )
+        {
+            switch ( bytes )
+            {
+            case 1:
+                regs->rax = (saved_rax & 0xffffff00) | (regs->rax & 0xff);
+                break;
+            case 2:
+                regs->rax = (saved_rax & 0xffff0000) | (regs->rax & 0xffff);
+                break;
+            case 4:
+                regs->rax = (uint32_t)regs->rax;
+                break;
+            }
+            *val = regs->rax;
+        }
+        else
+            regs->rax = saved_rax;
+    }
+    else
+        rc = X86EMUL_UNHANDLEABLE;
+
+    return rc;
+}
+
+int vmport_gp_check(struct cpu_user_regs *regs, struct vcpu *v,
+                    unsigned long *inst_len, unsigned long inst_addr,
+                    unsigned long ei1, unsigned long ei2)
+{
+    if ( !v->domain->arch.hvm_domain.is_vmware_port_enabled )
+        return X86EMUL_VMPORT_NOT_ENABLED;
+
+    if ( *inst_len && *inst_len <= MAX_INST_LEN &&
+         (regs->rdx & 0xffff) == BDOOR_PORT && ei1 == 0 && ei2 == 0 &&
+         (uint32_t)regs->rax == BDOOR_MAGIC )
+    {
+        int i = 0;
+        uint32_t val;
+        uint32_t byte_cnt = hvm_guest_x86_mode(v);
+        unsigned char bytes[MAX_INST_LEN];
+        unsigned int fetch_len;
+        int frc;
+
+        /* in or out are limited to 32bits */
+        if ( byte_cnt > 4 )
+            byte_cnt = 4;
+
+        /*
+         * Fetch up to the next page break; we'll fetch from the
+         * next page later if we have to.
+         */
+        fetch_len = min_t(unsigned int, *inst_len,
+                          PAGE_SIZE - (inst_addr  & ~PAGE_MASK));
+        frc = hvm_fetch_from_guest_virt_nofault(bytes, inst_addr, fetch_len,
+                                                PFEC_page_present);
+        if ( frc != HVMCOPY_okay )
+        {
+            gdprintk(XENLOG_WARNING,
+                     "Bad instruction fetch at %#lx (frc=%d il=%lu fl=%u)\n",
+                     (unsigned long) inst_addr, frc, *inst_len, fetch_len);
+            return X86EMUL_VMPORT_FETCH_ERROR_BYTE1;
+        }
+
+        /* Check for operand size prefix */
+        while ( (i < MAX_INST_LEN) && (bytes[i] == 0x66) )
+        {
+            i++;
+            if ( i >= fetch_len )
+            {
+                frc = hvm_fetch_from_guest_virt_nofault(
+                    &bytes[fetch_len], inst_addr + fetch_len,
+                    MAX_INST_LEN - fetch_len, PFEC_page_present);
+                if ( frc != HVMCOPY_okay )
+                {
+                    gdprintk(XENLOG_WARNING,
+                             "Bad instruction fetch at %#lx + %#x (frc=%d)\n",
+                             inst_addr, fetch_len, frc);
+                    return X86EMUL_VMPORT_FETCH_ERROR_BYTE2;
+                }
+                fetch_len = MAX_INST_LEN;
+            }
+        }
+        *inst_len = i + 1;
+
+        /* Only adjust byte_cnt 1 time */
+        if ( bytes[0] == 0x66 )     /* operand size prefix */
+        {
+            if ( byte_cnt == 4 )
+                byte_cnt = 2;
+            else
+                byte_cnt = 4;
+        }
+        if ( bytes[i] == 0xed )     /* in (%dx),%eax or in (%dx),%ax */
+            return vmport_ioport(IOREQ_READ, BDOOR_PORT, byte_cnt, &val);
+        else if ( bytes[i] == 0xec )     /* in (%dx),%al */
+            return vmport_ioport(IOREQ_READ, BDOOR_PORT, 1, &val);
+        else if ( bytes[i] == 0xef )     /* out %eax,(%dx) or out %ax,(%dx) */
+            return vmport_ioport(IOREQ_WRITE, BDOOR_PORT, byte_cnt, &val);
+        else if ( bytes[i] == 0xee )     /* out %al,(%dx) */
+            return vmport_ioport(IOREQ_WRITE, BDOOR_PORT, 1, &val);
+        else
+        {
+            *inst_len = 0; /* This is unknown. */
+            return X86EMUL_VMPORT_BAD_OPCODE;
+        }
+    }
+    *inst_len = 0; /* This is unknown. */
+    return X86EMUL_VMPORT_BAD_STATE;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 9d8033e..1bab216 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1102,6 +1102,8 @@ static int construct_vmcs(struct vcpu *v)
 
     v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK
               | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault))
+              | (v->domain->arch.hvm_domain.is_vmware_port_enabled ?
+                 (1U << TRAP_gp_fault) : 0)
               | (1U << TRAP_no_device);
     vmx_update_exception_bitmap(v);
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 304aeea..300d804 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -44,6 +44,7 @@
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/vmx/vmcs.h>
+#include <asm/hvm/vmport.h>
 #include <public/sched.h>
 #include <public/hvm/ioreq.h>
 #include <asm/hvm/vpic.h>
@@ -1276,9 +1277,11 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
                         vmx_set_segment_register(
                             v, s, &v->arch.hvm_vmx.vm86_saved_seg[s]);
                 v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK
-                          | (paging_mode_hap(v->domain) ?
-                             0 : (1U << TRAP_page_fault))
-                          | (1U << TRAP_no_device);
+                    | (paging_mode_hap(v->domain) ?
+                       0 : (1U << TRAP_page_fault))
+                    | (v->domain->arch.hvm_domain.is_vmware_port_enabled ?
+                       (1U << TRAP_gp_fault) : 0)
+                    | (1U << TRAP_no_device);
                 vmx_update_exception_bitmap(v);
                 vmx_update_debug_state(v);
             }
@@ -2589,6 +2592,57 @@ static void vmx_idtv_reinject(unsigned long idtv_info)
     }
 }
 
+static unsigned long vmx_rip2pointer(struct cpu_user_regs *regs,
+                                     struct vcpu *v)
+{
+    struct segment_register cs;
+    unsigned long p;
+
+    vmx_get_segment_register(v, x86_seg_cs, &cs);
+    p = cs.base + regs->rip;
+    if ( !(cs.attr.fields.l && hvm_long_mode_enabled(v)) )
+        return (uint32_t)p; /* mask to 32 bits */
+    return p;
+}
+
+static void vmx_vmexit_gp_intercept(struct cpu_user_regs *regs,
+                                    struct vcpu *v)
+{
+    unsigned long exit_qualification;
+    unsigned long inst_len;
+    unsigned long inst_addr = vmx_rip2pointer(regs, v);
+    unsigned long ecode;
+    int rc;
+#ifndef NDEBUG
+    unsigned long orig_inst_len;
+    unsigned long vector;
+
+    __vmread(VM_EXIT_INTR_INFO, &vector);
+    BUG_ON(!(vector & INTR_INFO_VALID_MASK));
+    BUG_ON(!(vector & INTR_INFO_DELIVER_CODE_MASK));
+#endif
+
+    __vmread(EXIT_QUALIFICATION, &exit_qualification);
+    __vmread(VM_EXIT_INSTRUCTION_LEN, &inst_len);
+    __vmread(VM_EXIT_INTR_ERROR_CODE, &ecode);
+
+#ifndef NDEBUG
+    orig_inst_len = inst_len;
+#endif
+    rc = vmport_gp_check(regs, v, &inst_len, inst_addr,
+                         ecode, exit_qualification);
+#ifndef NDEBUG
+    if ( inst_len && orig_inst_len != inst_len )
+        gdprintk(XENLOG_WARNING,
+                 "Unexpected instruction length difference: %lu vs %lu\n",
+                 orig_inst_len, inst_len);
+#endif
+    if ( !rc )
+        update_guest_eip();
+    else
+        hvm_inject_hw_exception(TRAP_gp_fault, ecode);
+}
+
 static int vmx_handle_apic_write(void)
 {
     unsigned long exit_qualification;
@@ -2814,6 +2868,9 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             HVMTRACE_1D(TRAP, vector);
             vmx_fpu_dirty_intercept();
             break;
+        case TRAP_gp_fault:
+            vmx_vmexit_gp_intercept(regs, v);
+            break;
         case TRAP_page_fault:
             __vmread(EXIT_QUALIFICATION, &exit_qualification);
             __vmread(VM_EXIT_INTR_ERROR_CODE, &ecode);
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 9ccc03f..8e07f92 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -24,6 +24,7 @@
 #include <asm/types.h>
 #include <asm/mtrr.h>
 #include <asm/p2m.h>
+#include <asm/hvm/vmport.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/vmx/vvmx.h>
 #include <asm/hvm/nestedhvm.h>
@@ -2182,6 +2183,8 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
             if ( v->fpu_dirtied )
                 nvcpu->nv_vmexit_pending = 1;
         }
+        else if ( vector == TRAP_gp_fault )
+            nvcpu->nv_vmexit_pending = 1;
         else if ( (intr_info & valid_mask) == valid_mask )
         {
             exec_bitmap =__get_vvmcs(nvcpu->nv_vvmcx, EXCEPTION_BITMAP);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 30c9e50..fad55a2 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -543,6 +543,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
              ~(XEN_DOMCTL_CDF_hvm_guest
                | XEN_DOMCTL_CDF_pvh_guest
                | XEN_DOMCTL_CDF_hap
+               | XEN_DOMCTL_CDF_vmware_port
                | XEN_DOMCTL_CDF_s3_integrity
                | XEN_DOMCTL_CDF_oos_off)) )
             break;
@@ -586,6 +587,8 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
             domcr_flags |= DOMCRF_s3_integrity;
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_oos_off )
             domcr_flags |= DOMCRF_oos_off;
+        if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_vmware_port )
+            domcr_flags |= DOMCRF_vmware_port;
 
         d = domain_create(dom, domcr_flags, op->u.createdomain.ssidref);
         if ( IS_ERR(d) )
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 2757c7f..d4718df 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -121,6 +121,9 @@ struct hvm_domain {
     spinlock_t             uc_lock;
     bool_t                 is_in_uc_mode;
 
+    /* VMware backdoor port available */
+    bool_t                 is_vmware_port_enabled;
+
     /* Pass-through */
     struct hvm_iommu       hvm_iommu;
 
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 886a9d6..d257161 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -25,7 +25,7 @@
 #include <public/hvm/ioreq.h>
 #include <public/event_channel.h>
 
-#define MAX_IO_HANDLER             16
+#define MAX_IO_HANDLER             17
 
 #define HVM_PORTIO                  0
 #define HVM_BUFFERED_IO             2
diff --git a/xen/include/asm-x86/hvm/svm/emulate.h b/xen/include/asm-x86/hvm/svm/emulate.h
index ccc2d3c..d9a9dc5 100644
--- a/xen/include/asm-x86/hvm/svm/emulate.h
+++ b/xen/include/asm-x86/hvm/svm/emulate.h
@@ -44,6 +44,7 @@ enum instruction_index {
 
 struct vcpu;
 
+unsigned long svm_rip2pointer(struct vcpu *v);
 int __get_instruction_length_from_list(
     struct vcpu *, const enum instruction_index *, unsigned int list_count);
 
diff --git a/xen/include/asm-x86/hvm/vmport.h b/xen/include/asm-x86/hvm/vmport.h
new file mode 100644
index 0000000..d037d55
--- /dev/null
+++ b/xen/include/asm-x86/hvm/vmport.h
@@ -0,0 +1,52 @@
+/*
+ * asm/hvm/vmport.h: HVM VMPORT emulation
+ *
+ *
+ * Copyright (C) 2012 Verizon Corporation
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License Version 2 (GPLv2)
+ * as published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details. <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef ASM_X86_HVM_VMPORT_H__
+#define ASM_X86_HVM_VMPORT_H__
+
+void vmport_register(struct domain *d);
+int vmport_ioport(int dir, uint32_t port, uint32_t bytes, uint32_t *val);
+int vmport_gp_check(struct cpu_user_regs *regs, struct vcpu *v,
+                    unsigned long *inst_len, unsigned long inst_addr,
+                    unsigned long ei1, unsigned long ei2);
+/*
+ * Additional return values from vmport_gp_check.
+ *
+ * Note: return values include:
+ *   X86EMUL_OKAY
+ *   X86EMUL_UNHANDLEABLE
+ *   X86EMUL_EXCEPTION
+ *   X86EMUL_RETRY
+ *   X86EMUL_CMPXCHG_FAILED
+ *
+ * The additional values do not overlap any of the above.
+ */
+#define X86EMUL_VMPORT_NOT_ENABLED              10
+#define X86EMUL_VMPORT_FETCH_ERROR_BYTE1        11
+#define X86EMUL_VMPORT_FETCH_ERROR_BYTE2        12
+#define X86EMUL_VMPORT_BAD_OPCODE               13
+#define X86EMUL_VMPORT_BAD_STATE                14
+
+#endif /* ASM_X86_HVM_VMPORT_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 61f7555..2b38515 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -63,6 +63,9 @@ struct xen_domctl_createdomain {
  /* Is this a PVH guest (as opposed to an HVM or PV guest)? */
 #define _XEN_DOMCTL_CDF_pvh_guest     4
 #define XEN_DOMCTL_CDF_pvh_guest      (1U<<_XEN_DOMCTL_CDF_pvh_guest)
+ /* Is VMware backdoor port available? */
+#define _XEN_DOMCTL_CDF_vmware_port   5
+#define XEN_DOMCTL_CDF_vmware_port    (1U<<_XEN_DOMCTL_CDF_vmware_port)
     uint32_t flags;
 };
 typedef struct xen_domctl_createdomain xen_domctl_createdomain_t;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index c5157e6..d741978 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -546,6 +546,9 @@ struct domain *domain_create(
  /* DOMCRF_pvh: Create PV domain in HVM container. */
 #define _DOMCRF_pvh             5
 #define DOMCRF_pvh              (1U<<_DOMCRF_pvh)
+ /* DOMCRF_vmware_port: Enable use of VMware backdoor port. */
+#define _DOMCRF_vmware_port     6
+#define DOMCRF_vmware_port      (1U<<_DOMCRF_vmware_port)
 
 /*
  * rcu_lock_domain_by_id() is more efficient than get_domain_by_id().
-- 
1.8.4


* [PATCH for-4.5 v7 5/7] tools: Add vmware_port support
  2014-10-02 21:30 [PATCH for-4.5 v7 0/7] Xen VMware tools support Don Slutz
                   ` (3 preceding siblings ...)
  2014-10-02 21:30 ` [PATCH for-4.5 v7 4/7] xen: Add vmware_port support Don Slutz
@ 2014-10-02 21:30 ` Don Slutz
  2014-10-02 21:30 ` [PATCH for-4.5 v7 6/7] Add xentrace to vmware_port Don Slutz
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 37+ messages in thread
From: Don Slutz @ 2014-10-02 21:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Eddie Dong, Ian Jackson, Don Slutz, Tim Deegan,
	George Dunlap, Aravind Gopalakrishnan, Jan Beulich,
	Andrew Cooper, Boris Ostrovsky, Suravee Suthikulpanit

This new libxl_domain_create_info field is used to set
XEN_DOMCTL_CDF_vmware_port for the xc_domain_create() routine.

In Xen the corresponding flag is is_vmware_port_enabled.  When it
is set, limited support for VMware's hyper-call is enabled.

VMware's hyper-call is also known as VMware Backdoor I/O Port.

If vmware_port is not specified in the config file, it defaults to
"vmware_hw != 0".  This means that only vmware_hw = 7 needs to be
specified to enable both features.

Signed-off-by: Don Slutz <dslutz@verizon.com>
---
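Note: a minimal sketch of the defaulting rule described in the commit
message, for illustration only.  The defbool type and the helper names
below are hypothetical stand-ins for libxl_defbool and its helpers;
this is not the libxl code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state standing in for libxl_defbool. */
typedef enum { DEFBOOL_UNSET, DEFBOOL_FALSE, DEFBOOL_TRUE } defbool;

/* Apply a default only when the config file left the option unset. */
static void defbool_setdefault(defbool *db, bool dflt)
{
    if (*db == DEFBOOL_UNSET)
        *db = dflt ? DEFBOOL_TRUE : DEFBOOL_FALSE;
}

/* Read the value; it must have been set or defaulted beforehand. */
static bool defbool_val(defbool db)
{
    assert(db != DEFBOOL_UNSET);
    return db == DEFBOOL_TRUE;
}

/*
 * vmware_port as given in the config (possibly unset) combined with
 * the domain type and vmware_hw, following the rule
 * "default is HVM && vmware_hw != 0".
 */
static bool effective_vmware_port(defbool vmware_port, bool is_hvm,
                                  unsigned int vmware_hw)
{
    defbool_setdefault(&vmware_port, is_hvm && vmware_hw != 0);
    return defbool_val(vmware_port);
}
```

Under this rule a config with only vmware_hw = 7 ends up with the port
enabled, while an explicit vmware_port = 0 is still honoured.
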
 docs/man/xl.cfg.pod.5        | 11 +++++++++++
 tools/libxl/libxl.h          |  5 +++++
 tools/libxl/libxl_create.c   |  9 +++++++--
 tools/libxl/libxl_dm.c       |  4 +++-
 tools/libxl/libxl_internal.h |  3 ++-
 tools/libxl/libxl_types.idl  |  1 +
 tools/libxl/xl_cmdimpl.c     |  1 +
 7 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 6628cfc..413fe8c 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -1214,6 +1214,17 @@ For vssd:VirtualSystemType == vmx-07, vmware_hw = 7.
 
 =back
 
+=item B<vmware_port=BOOLEAN>
+
+Turns on or off the exposure of the VMware port.  This is known as
+vmport in QEMU.  Also called VMware Backdoor I/O Port.  Not all
+defined VMware backdoor commands are implemented.  All of the
+ones that the Linux kernel uses are implemented.
+
+If vmware_port is not specified in the config file, it defaults to
+vmware_hw != 0.  This means that only vmware_hw = 7 needs to
+be specified to enable both features.
+
 =back
 
 =head3 Emulated VGA Graphics Device
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 09faa04..9307bb4 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -168,6 +168,11 @@
 #define LIBXL_HAVE_BUILDINFO_HVM_VMWARE_HW 1
 
 /*
+ * libxl_domain_create_info has the vmware_port field.
+ */
+#define LIBXL_HAVE_CREATEINFO_VMWARE_PORT 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 9f4e03c..bda9f96 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -25,7 +25,8 @@
 #include <xen/hvm/hvm_info_table.h>
 
 int libxl__domain_create_info_setdefault(libxl__gc *gc,
-                                         libxl_domain_create_info *c_info)
+                                         libxl_domain_create_info *c_info,
+                                         bool vmware_port_default)
 {
     if (!c_info->type)
         return ERROR_INVAL;
@@ -38,6 +39,7 @@ int libxl__domain_create_info_setdefault(libxl__gc *gc,
         libxl_defbool_setdefault(&c_info->hap, libxl_defbool_val(c_info->pvh));
     }
 
+    libxl_defbool_setdefault(&c_info->vmware_port, vmware_port_default);
     libxl_defbool_setdefault(&c_info->run_hotplug_scripts, true);
     libxl_defbool_setdefault(&c_info->driver_domain, false);
 
@@ -505,6 +507,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_create_info *info,
         flags |= XEN_DOMCTL_CDF_hvm_guest;
         flags |= libxl_defbool_val(info->hap) ? XEN_DOMCTL_CDF_hap : 0;
         flags |= libxl_defbool_val(info->oos) ? 0 : XEN_DOMCTL_CDF_oos_off;
+        flags |= libxl_defbool_val(info->vmware_port)? XEN_DOMCTL_CDF_vmware_port : 0;
     } else if (libxl_defbool_val(info->pvh)) {
         flags |= XEN_DOMCTL_CDF_pvh_guest;
         if (!libxl_defbool_val(info->hap)) {
@@ -833,7 +836,9 @@ static void initiate_domain_create(libxl__egc *egc,
         goto error_out;
     }
 
-    ret = libxl__domain_create_info_setdefault(gc, &d_config->c_info);
+    ret = libxl__domain_create_info_setdefault(gc, &d_config->c_info,
+                       d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM &&
+                       d_config->b_info.u.hvm.vmware_hw);
     if (ret) goto error_out;
 
     ret = libxl__domain_make(gc, &d_config->c_info, &domid);
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 8bd6414..bc52957 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -996,7 +996,9 @@ void libxl__spawn_stub_dm(libxl__egc *egc, libxl__stub_dm_spawn_state *sdss)
     dm_config->c_info.run_hotplug_scripts =
         guest_config->c_info.run_hotplug_scripts;
 
-    ret = libxl__domain_create_info_setdefault(gc, &dm_config->c_info);
+    ret = libxl__domain_create_info_setdefault(gc, &dm_config->c_info,
+                       dm_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM &&
+                       dm_config->b_info.u.hvm.vmware_hw);
     if (ret) goto out;
     ret = libxl__domain_build_info_setdefault(gc, &dm_config->b_info);
     if (ret) goto out;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index b87c5e2..d74d9ff 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1063,7 +1063,8 @@ _hidden int libxl__nic_type(libxl__gc *gc, libxl__device *dev,
  *     to be called before using any values within these structures.
  */
 _hidden int libxl__domain_create_info_setdefault(libxl__gc *gc,
-                                        libxl_domain_create_info *c_info);
+                                        libxl_domain_create_info *c_info,
+                                        bool vmware_port_default);
 _hidden int libxl__domain_build_info_setdefault(libxl__gc *gc,
                                         libxl_domain_build_info *b_info);
 _hidden int libxl__device_disk_setdefault(libxl__gc *gc,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 5d25b77..c75e1f6 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -304,6 +304,7 @@ libxl_domain_create_info = Struct("domain_create_info",[
     ("type",         libxl_domain_type),
     ("hap",          libxl_defbool),
     ("oos",          libxl_defbool),
+    ("vmware_port",  libxl_defbool),
     ("ssidref",      uint32),
     ("ssid_label",   string),
     ("name",         string),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 89d1724..b821e63 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -892,6 +892,7 @@ static void parse_config_data(const char *config_source,
     }
 
     xlu_cfg_get_defbool(config, "oos", &c_info->oos, 0);
+    xlu_cfg_get_defbool(config, "vmware_port", &c_info->vmware_port, 0);
 
     if (!xlu_cfg_get_string (config, "pool", &buf, 0))
         xlu_cfg_replace_string(config, "pool", &c_info->pool_name, 0);
-- 
1.8.4


* [PATCH for-4.5 v7 6/7] Add xentrace to vmware_port
  2014-10-02 21:30 [PATCH for-4.5 v7 0/7] Xen VMware tools support Don Slutz
                   ` (4 preceding siblings ...)
  2014-10-02 21:30 ` [PATCH for-4.5 v7 5/7] tools: " Don Slutz
@ 2014-10-02 21:30 ` Don Slutz
  2014-10-02 21:30 ` [OPTIONAL][PATCH for-4.5 v7 7/7] Add xen-hvm-param Don Slutz
  2014-10-16  8:12 ` [PATCH for-4.5 v7 0/7] Xen VMware tools support Jan Beulich
  7 siblings, 0 replies; 37+ messages in thread
From: Don Slutz @ 2014-10-02 21:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Eddie Dong, Ian Jackson, Don Slutz, Tim Deegan,
	George Dunlap, Aravind Gopalakrishnan, Jan Beulich,
	Andrew Cooper, Boris Ostrovsky, Suravee Suthikulpanit

Also added missing TRAP_DEBUG & VLAPIC.

Signed-off-by: Don Slutz <dslutz@verizon.com>
---
v7:
      Dropped some of the new traces.
      Added HVMTRACE_ND7.

v6:
      Dropped the attempt to use svm_nextrip_insn_length via
      __get_instruction_length (added in v2).  Just always look
      at up to 15 bytes on AMD.

v5:
      exitinfo1 is used twice.
        Fixed.
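
Note: a rough sketch of what a seven-word trace record like the one
built by HVMTRACE_ND7 amounts to, assuming each argument is truncated
to 32 bits and the payload size is count words.  The struct and
function names are hypothetical and only illustrate the packing; the
real macro hands the buffer to __trace_var().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VMPORT_TRACE_WORDS 7    /* hypothetical name */

/* Payload layout: seven 32-bit words, as in the macro's local struct. */
struct vmport_trace_rec {
    uint32_t d[VMPORT_TRACE_WORDS];
};

/*
 * Store the low 32 bits of each value and return the number of
 * payload bytes that a record with 'count' words would carry.
 */
static size_t vmport_trace_pack(struct vmport_trace_rec *rec,
                                unsigned int count,
                                const uint64_t vals[VMPORT_TRACE_WORDS])
{
    unsigned int i;

    assert(count <= VMPORT_TRACE_WORDS);
    for ( i = 0; i < VMPORT_TRACE_WORDS; i++ )
        rec->d[i] = (uint32_t)vals[i];   /* upper halves are dropped */

    return sizeof(rec->d[0]) * count;
}
```

Both call sites in this patch pass a count of 7, so under these
assumptions each VMPORT_HANDLED or VMPORT_UNHANDLED record carries
28 bytes of payload.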

 tools/xentrace/formats           |  5 +++++
 xen/arch/x86/hvm/vmware/vmport.c | 22 ++++++++++++++++++++++
 xen/include/asm-x86/hvm/trace.h  | 22 ++++++++++++++++++++++
 xen/include/public/trace.h       |  3 +++
 4 files changed, 52 insertions(+)

diff --git a/tools/xentrace/formats b/tools/xentrace/formats
index da658bf..2c86fe9 100644
--- a/tools/xentrace/formats
+++ b/tools/xentrace/formats
@@ -79,6 +79,11 @@
 0x00082020  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  INTR_WINDOW [ value = 0x%(1)08x ]
 0x00082021  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  NPF         [ gpa = 0x%(2)08x%(1)08x mfn = 0x%(4)08x%(3)08x qual = 0x%(5)04x p2mt = 0x%(6)04x ]
 0x00082023  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  TRAP        [ vector = 0x%(1)02x ]
+0x00082024  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  TRAP_DEBUG  [ exit_qualification = 0x%(1)08x ]
+0x00082025  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  VLAPIC
+0x00082026  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  VMPORT_HANDLED   [ cmd = %(1)d eax = 0x%(2)08x ebx = 0x%(3)08x ecx = 0x%(4)08x edx = 0x%(5)08x esi = 0x%(6)08x edi = 0x%(7)08x ]
+0x00082027  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  VMPORT_UNHANDLED [ bytes << 8 | dir = 0x%(1)03x cmd = 0x%(2)x cmd = %(2)d ebx = 0x%(3)08x ecx = 0x%(4)08x edx = 0x%(5)08x esi = 0x%(6)08x edi = 0x%(7)08x ]
+0x00082028  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  VMPORT_DECODE    [ dir = %(1)d bytes = %(2)d ]
 
 0x0010f001  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  page_grant_map      [ domid = %(1)d ]
 0x0010f002  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  page_grant_unmap    [ domid = %(1)d ]
diff --git a/xen/arch/x86/hvm/vmware/vmport.c b/xen/arch/x86/hvm/vmware/vmport.c
index e73a5e6..2c9452f 100644
--- a/xen/arch/x86/hvm/vmware/vmport.c
+++ b/xen/arch/x86/hvm/vmware/vmport.c
@@ -18,6 +18,7 @@
 #include <asm/hvm/hvm.h>
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmport.h>
+#include <asm/hvm/trace.h>
 
 #include "backdoor_def.h"
 
@@ -67,6 +68,7 @@ int vmport_ioport(int dir, uint32_t port, uint32_t bytes, uint32_t *val)
         struct vcpu *curr = current;
         struct domain *d = curr->domain;
         struct segment_register sreg;
+        int handled = 1;
 
         switch ( cmd )
         {
@@ -151,8 +153,16 @@ int vmport_ioport(int dir, uint32_t port, uint32_t bytes, uint32_t *val)
             break;
         default:
             regs->rax = (uint32_t)~0ul;
+            handled = 0;
+            HVMTRACE_ND7(VMPORT_UNHANDLED, 0, 0/*cycles*/, 7,
+                         (bytes << 8) | dir, cmd, regs->rbx,
+                         regs->rcx, regs->rdx, regs->rsi, regs->rdi);
             break;
         }
+        if ( handled )
+            HVMTRACE_ND7(VMPORT_HANDLED, 0, 0/*cycles*/, 7,
+                         cmd, regs->rax, regs->rbx, regs->rcx,
+                         regs->rdx, regs->rsi, regs->rdi);
         if ( dir == IOREQ_READ )
         {
             switch ( bytes )
@@ -246,13 +256,25 @@ int vmport_gp_check(struct cpu_user_regs *regs, struct vcpu *v,
                 byte_cnt = 4;
         }
         if ( bytes[i] == 0xed )     /* in (%dx),%eax or in (%dx),%ax */
+        {
+            HVMTRACE_2D(VMPORT_DECODE, IOREQ_READ, byte_cnt);
             return vmport_ioport(IOREQ_READ, BDOOR_PORT, byte_cnt, &val);
+        }
         else if ( bytes[i] == 0xec )     /* in (%dx),%al */
+        {
+            HVMTRACE_2D(VMPORT_DECODE, IOREQ_READ, 1);
             return vmport_ioport(IOREQ_READ, BDOOR_PORT, 1, &val);
+        }
         else if ( bytes[i] == 0xef )     /* out %eax,(%dx) or out %ax,(%dx) */
+        {
+            HVMTRACE_2D(VMPORT_DECODE, IOREQ_WRITE, byte_cnt);
             return vmport_ioport(IOREQ_WRITE, BDOOR_PORT, byte_cnt, &val);
+        }
         else if ( bytes[i] == 0xee )     /* out %al,(%dx) */
+        {
+            HVMTRACE_2D(VMPORT_DECODE, IOREQ_WRITE, 1);
             return vmport_ioport(IOREQ_WRITE, BDOOR_PORT, 1, &val);
+        }
         else
         {
             *inst_len = 0; /* This is unknown. */
diff --git a/xen/include/asm-x86/hvm/trace.h b/xen/include/asm-x86/hvm/trace.h
index de802a6..bb25a95 100644
--- a/xen/include/asm-x86/hvm/trace.h
+++ b/xen/include/asm-x86/hvm/trace.h
@@ -54,6 +54,9 @@
 #define DO_TRC_HVM_TRAP             DEFAULT_HVM_MISC
 #define DO_TRC_HVM_TRAP_DEBUG       DEFAULT_HVM_MISC
 #define DO_TRC_HVM_VLAPIC           DEFAULT_HVM_MISC
+#define DO_TRC_HVM_VMPORT_HANDLED   DEFAULT_HVM_IO
+#define DO_TRC_HVM_VMPORT_UNHANDLED DEFAULT_HVM_IO
+#define DO_TRC_HVM_VMPORT_DECODE    DEFAULT_HVM_IO
 
 
 #define TRC_PAR_LONG(par) ((par)&0xFFFFFFFF),((par)>>32)
@@ -83,6 +86,25 @@
         }                                                                 \
     } while(0)
 
+#define HVMTRACE_ND7(evt, modifier, cycles, count, d1, d2, d3, d4, d5, d6, d7) \
+    do {                                                                  \
+        if ( unlikely(tb_init_done) && DO_TRC_HVM_ ## evt )               \
+        {                                                                 \
+            struct {                                                      \
+                u32 d[7];                                                 \
+            } _d;                                                         \
+            _d.d[0]=(d1);                                                 \
+            _d.d[1]=(d2);                                                 \
+            _d.d[2]=(d3);                                                 \
+            _d.d[3]=(d4);                                                 \
+            _d.d[4]=(d5);                                                 \
+            _d.d[5]=(d6);                                                 \
+            _d.d[6]=(d7);                                                 \
+            __trace_var(TRC_HVM_ ## evt | (modifier), cycles,             \
+                        sizeof(*_d.d) * count, &_d);                      \
+        }                                                                 \
+    } while(0)
+
 #define HVMTRACE_6D(evt, d1, d2, d3, d4, d5, d6)    \
     HVMTRACE_ND(evt, 0, 0, 6, d1, d2, d3, d4, d5, d6)
 #define HVMTRACE_5D(evt, d1, d2, d3, d4, d5)        \
diff --git a/xen/include/public/trace.h b/xen/include/public/trace.h
index 5211ae7..a4d51ec 100644
--- a/xen/include/public/trace.h
+++ b/xen/include/public/trace.h
@@ -227,6 +227,9 @@
 #define TRC_HVM_TRAP             (TRC_HVM_HANDLER + 0x23)
 #define TRC_HVM_TRAP_DEBUG       (TRC_HVM_HANDLER + 0x24)
 #define TRC_HVM_VLAPIC           (TRC_HVM_HANDLER + 0x25)
+#define TRC_HVM_VMPORT_HANDLED   (TRC_HVM_HANDLER + 0x26)
+#define TRC_HVM_VMPORT_UNHANDLED (TRC_HVM_HANDLER + 0x27)
+#define TRC_HVM_VMPORT_DECODE    (TRC_HVM_HANDLER + 0x28)
 
 #define TRC_HVM_IOPORT_WRITE    (TRC_HVM_HANDLER + 0x216)
 #define TRC_HVM_IOMEM_WRITE     (TRC_HVM_HANDLER + 0x217)
-- 
1.8.4


* [OPTIONAL][PATCH for-4.5 v7 7/7] Add xen-hvm-param
  2014-10-02 21:30 [PATCH for-4.5 v7 0/7] Xen VMware tools support Don Slutz
                   ` (5 preceding siblings ...)
  2014-10-02 21:30 ` [PATCH for-4.5 v7 6/7] Add xentrace to vmware_port Don Slutz
@ 2014-10-02 21:30 ` Don Slutz
  2014-10-16  8:12 ` [PATCH for-4.5 v7 0/7] Xen VMware tools support Jan Beulich
  7 siblings, 0 replies; 37+ messages in thread
From: Don Slutz @ 2014-10-02 21:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Eddie Dong, Ian Jackson, Don Slutz, Tim Deegan,
	George Dunlap, Aravind Gopalakrishnan, Jan Beulich,
	Andrew Cooper, Boris Ostrovsky, Suravee Suthikulpanit

A tool to get and set HVM params.

Signed-off-by: Don Slutz <dslutz@verizon.com>
---
v7:
       Was a later patch.  Still optional.
       Fixed formatting.
       Adjust for drop of VMware RPC.

 .gitignore                 |   1 +
 tools/misc/Makefile        |   7 +-
 tools/misc/xen-hvm-param.c | 169 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 175 insertions(+), 2 deletions(-)
 create mode 100644 tools/misc/xen-hvm-param.c

diff --git a/.gitignore b/.gitignore
index 7a908c4..938b10f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -180,6 +180,7 @@ tools/misc/xen-tmem-list-parse
 tools/misc/xenperf
 tools/misc/xenpm
 tools/misc/xen-hvmctx
+tools/misc/xen-hvm-param
 tools/misc/gtraceview
 tools/misc/gtracestat
 tools/misc/xenlockprof
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index 266fd16..e7c68df 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -12,7 +12,7 @@ CFLAGS += -I$(XEN_ROOT)/tools/libxc
 HDRS     = $(wildcard *.h)
 
 TARGETS-y := xenperf xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xencov
-TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump
+TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvm-param xen-hvmcrash xen-lowmemd xen-mfndump
 TARGETS-$(CONFIG_MIGRATE) += xen-hptool
 TARGETS := $(TARGETS-y)
 
@@ -24,7 +24,7 @@ INSTALL_BIN := $(INSTALL_BIN-y)
 
 INSTALL_SBIN-y := xen-bugtool xen-python-path xenperf xenpm xen-tmem-list-parse gtraceview \
 	gtracestat xenlockprof xenwatchdogd xen-ringwatch xencov
-INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump
+INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvm-param xen-hvmcrash xen-lowmemd xen-mfndump
 INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool
 INSTALL_SBIN := $(INSTALL_SBIN-y)
 
@@ -59,6 +59,9 @@ clean:
 xen-hvmctx: xen-hvmctx.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xen-hvm-param: xen-hvm-param.o
+	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
+
 xen-hvmcrash: xen-hvmcrash.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
diff --git a/tools/misc/xen-hvm-param.c b/tools/misc/xen-hvm-param.c
new file mode 100644
index 0000000..0496e45
--- /dev/null
+++ b/tools/misc/xen-hvm-param.c
@@ -0,0 +1,169 @@
+/*
+ * tools/misc/xen-hvm-param.c
+ *
+ * Copyright (C) 2014 Verizon Corporation
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License Version 2 (GPLv2)
+ * as published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details. <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <err.h>
+
+#include <xenctrl.h>
+
+
+int
+main(int argc, char **argv)
+{
+    xc_interface *xch;
+    int domid;
+    int start_param = 0;
+    int end_param = HVM_NR_PARAMS;
+    int param;
+    int ret = 0;
+    int i;
+    char hvm_param_name[HVM_NR_PARAMS][80];
+
+    unsigned long hvm_param = -1;
+
+    if ( (argc < 2) || (argc > 4) )
+        errx(1, "usage: %s domid [param [new]]", argv[0]);
+
+    for ( i = 0; i < HVM_NR_PARAMS; i++ )
+        snprintf(hvm_param_name[i], sizeof(hvm_param_name[i]),
+                 "Unknown %d", i);
+
+    snprintf(hvm_param_name[HVM_PARAM_CALLBACK_IRQ],
+             sizeof(hvm_param_name[HVM_PARAM_CALLBACK_IRQ]), "Callback_Irq");
+    snprintf(hvm_param_name[HVM_PARAM_STORE_PFN],
+             sizeof(hvm_param_name[HVM_PARAM_STORE_PFN]), "Store_Pfn");
+    snprintf(hvm_param_name[HVM_PARAM_STORE_EVTCHN],
+             sizeof(hvm_param_name[HVM_PARAM_STORE_EVTCHN]), "Store_Evtchn");
+    snprintf(hvm_param_name[HVM_PARAM_PAE_ENABLED],
+             sizeof(hvm_param_name[HVM_PARAM_PAE_ENABLED]), "Pae_Enabled");
+    snprintf(hvm_param_name[HVM_PARAM_IOREQ_PFN],
+             sizeof(hvm_param_name[HVM_PARAM_IOREQ_PFN]), "Ioreq_Pfn");
+    snprintf(hvm_param_name[HVM_PARAM_BUFIOREQ_PFN],
+             sizeof(hvm_param_name[HVM_PARAM_BUFIOREQ_PFN]), "Bufioreq_Pfn");
+    snprintf(hvm_param_name[HVM_PARAM_VIRIDIAN],
+             sizeof(hvm_param_name[HVM_PARAM_VIRIDIAN]), "Viridian");
+    snprintf(hvm_param_name[HVM_PARAM_TIMER_MODE],
+             sizeof(hvm_param_name[HVM_PARAM_TIMER_MODE]), "Timer_Mode");
+    snprintf(hvm_param_name[HVM_PARAM_HPET_ENABLED],
+             sizeof(hvm_param_name[HVM_PARAM_HPET_ENABLED]), "Hpet_Enabled");
+    snprintf(hvm_param_name[HVM_PARAM_IDENT_PT],
+             sizeof(hvm_param_name[HVM_PARAM_IDENT_PT]), "Ident_Pt");
+    snprintf(hvm_param_name[HVM_PARAM_DM_DOMAIN],
+             sizeof(hvm_param_name[HVM_PARAM_DM_DOMAIN]), "Dm_Domain");
+    snprintf(hvm_param_name[HVM_PARAM_ACPI_S_STATE],
+             sizeof(hvm_param_name[HVM_PARAM_ACPI_S_STATE]), "Acpi_S_State");
+    snprintf(hvm_param_name[HVM_PARAM_VM86_TSS],
+             sizeof(hvm_param_name[HVM_PARAM_VM86_TSS]), "Vm86_Tss");
+    snprintf(hvm_param_name[HVM_PARAM_VPT_ALIGN],
+             sizeof(hvm_param_name[HVM_PARAM_VPT_ALIGN]), "Vpt_Align");
+    snprintf(hvm_param_name[HVM_PARAM_CONSOLE_PFN],
+             sizeof(hvm_param_name[HVM_PARAM_CONSOLE_PFN]), "Console_Pfn");
+    snprintf(hvm_param_name[HVM_PARAM_CONSOLE_EVTCHN],
+             sizeof(hvm_param_name[HVM_PARAM_CONSOLE_EVTCHN]),
+             "Console_Evtchn");
+    snprintf(hvm_param_name[HVM_PARAM_ACPI_IOPORTS_LOCATION],
+             sizeof(hvm_param_name[HVM_PARAM_ACPI_IOPORTS_LOCATION]),
+             "Acpi_Ioports_Location");
+    snprintf(hvm_param_name[HVM_PARAM_MEMORY_EVENT_CR0],
+             sizeof(hvm_param_name[HVM_PARAM_MEMORY_EVENT_CR0]),
+             "Memory_Event_Cr0");
+    snprintf(hvm_param_name[HVM_PARAM_MEMORY_EVENT_CR3],
+             sizeof(hvm_param_name[HVM_PARAM_MEMORY_EVENT_CR3]),
+             "Memory_Event_Cr3");
+    snprintf(hvm_param_name[HVM_PARAM_MEMORY_EVENT_CR4],
+             sizeof(hvm_param_name[HVM_PARAM_MEMORY_EVENT_CR4]),
+             "Memory_Event_Cr4");
+    snprintf(hvm_param_name[HVM_PARAM_MEMORY_EVENT_INT3],
+             sizeof(hvm_param_name[HVM_PARAM_MEMORY_EVENT_INT3]),
+             "Memory_Event_Int3");
+    snprintf(hvm_param_name[HVM_PARAM_NESTEDHVM],
+             sizeof(hvm_param_name[HVM_PARAM_NESTEDHVM]), "Nestedhvm");
+    snprintf(hvm_param_name[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP],
+             sizeof(hvm_param_name[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP]),
+             "Memory_Event_Single_Step");
+    snprintf(hvm_param_name[HVM_PARAM_BUFIOREQ_EVTCHN],
+             sizeof(hvm_param_name[HVM_PARAM_BUFIOREQ_EVTCHN]),
+             "Bufioreq_Evtchn");
+    snprintf(hvm_param_name[HVM_PARAM_PAGING_RING_PFN],
+             sizeof(hvm_param_name[HVM_PARAM_PAGING_RING_PFN]),
+             "Paging_Ring_Pfn");
+    snprintf(hvm_param_name[HVM_PARAM_ACCESS_RING_PFN],
+             sizeof(hvm_param_name[HVM_PARAM_ACCESS_RING_PFN]),
+             "Access_Ring_Pfn");
+    snprintf(hvm_param_name[HVM_PARAM_SHARING_RING_PFN],
+             sizeof(hvm_param_name[HVM_PARAM_SHARING_RING_PFN]),
+             "Sharing_Ring_Pfn");
+    snprintf(hvm_param_name[HVM_PARAM_VMWARE_HW],
+             sizeof(hvm_param_name[HVM_PARAM_VMWARE_HW]), "Vmware_Hw");
+
+    xch = xc_interface_open(0, 0, 0);
+    if ( !xch )
+        err(1, "failed to open control interface");
+
+    domid = atoi(argv[1]);
+    if ( argc > 2 )
+    {
+        start_param = strtol(argv[2], NULL, 0);
+        end_param = start_param + 1;
+    }
+
+    for ( param = start_param; param < end_param; param++ )
+    {
+        ret = xc_get_hvm_param(xch, domid, param, &hvm_param);
+        if ( ret )
+            err(1, "failed to get hvm param %d for domid %d", param, domid);
+        else
+        {
+            if ( argc == 4 )
+            {
+                long new = strtol(argv[3], NULL, 0);
+
+                ret = xc_set_hvm_param(xch, domid, param, new);
+                if ( ret )
+                    err(1, "failed to set hvm param %d for domid %d", param,
+                        domid);
+                else if ( (param >= 0) && (param < HVM_NR_PARAMS) )
+                    printf("hvm_param(%s)=0x%lx(%ld) was 0x%lx(%ld)\n",
+                           hvm_param_name[param], new, new, hvm_param,
+                           hvm_param);
+                else
+                    printf("hvm_param(%d)=0x%lx(%ld) was 0x%lx(%ld)\n",
+                           param, new, new, hvm_param, hvm_param);
+            }
+            else
+            {
+                if ( (param >= 0) && (param < HVM_NR_PARAMS) )
+                    printf("hvm_param(%s)=0x%lx(%ld)\n",
+                           hvm_param_name[param], hvm_param, hvm_param);
+                else
+                    printf("hvm_param(%d)=0x%lx(%ld)\n", param, hvm_param,
+                           hvm_param);
+            }
+        }
+    }
+    xc_interface_close(xch);
+
+    return ret;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.8.4


* Re: [PATCH for-4.5 v7 4/7] xen: Add vmware_port support
  2014-10-02 21:30 ` [PATCH for-4.5 v7 4/7] xen: Add vmware_port support Don Slutz
@ 2014-10-02 21:58   ` Don Slutz
  2014-10-02 22:40     ` [PATCH for-4.5 v8 " Don Slutz
  0 siblings, 1 reply; 37+ messages in thread
From: Don Slutz @ 2014-10-02 21:58 UTC (permalink / raw)
  To: Don Slutz, xen-devel
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Eddie Dong, Ian Jackson, Tim Deegan, George Dunlap,
	Aravind Gopalakrishnan, Jan Beulich, Andrew Cooper,
	Boris Ostrovsky, Suravee Suthikulpanit

Andrew Cooper just pointed me to "regs->_ebx" and similar fields; I will
be changing this patch to use them.
     -Don Slutz
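
For reference, a minimal standalone sketch of why a 32-bit alias can replace
the mask-and-or sequence used in this version of the patch.  The "reg64" union
below is a hypothetical stand-in for an aliased register slot (it is not the
real Xen definition of struct cpu_user_regs), and it assumes a little-endian
host such as x86:

```c
#include <stdint.h>

/* Hypothetical stand-in for one aliased register slot; not the Xen type. */
union reg64 {
    uint64_t r;   /* full 64-bit view, in the spirit of regs->rbx */
    uint32_t e;   /* low 32 bits on little-endian, in the spirit of regs->_ebx */
};

/* Explicit form: keep the upper half, replace the lower half. */
uint64_t set_low32_masked(uint64_t old, uint32_t val)
{
    return (old & 0xffffffff00000000ull) | val;
}

/* Alias form: store through the 32-bit member only. */
uint64_t set_low32_alias(uint64_t old, uint32_t val)
{
    union reg64 reg;

    reg.r = old;
    reg.e = val;   /* upper 32 bits of reg.r are left untouched */
    return reg.r;
}
```

Both helpers return the same value for any input on a little-endian machine,
so the alias form is only a shorter spelling of the same update.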

On 10/02/14 17:30, Don Slutz wrote:
> This includes adding is_vmware_port_enabled
>
> This is a new domain_create() flag, DOMCRF_vmware_port.  It is
> passed to domctl as XEN_DOMCTL_CDF_vmware_port.
>
> This enables limited support of VMware's hyper-call.
>
> This is both more complete than the support currently provided by
> QEMU and/or KVM, and less complete.  The missing part requires QEMU
> changes and has been left out until the QEMU patches are accepted
> upstream.
>
> VMware's hyper-call is also known as VMware Backdoor I/O Port.
>
> Note: this support does not depend on vmware_hw being non-zero.
>
> In summary, VMware treats "in (%dx),%eax" (or "out %eax,(%dx)")
> to port 0x5658 specially.  Note: since many operations return data
> in EAX, "in (%dx),%eax" is the one to use.  Other lengths such as
> "in (%dx),%al" still work, but only the AL part of EAX is
> changed.  For "out %eax,(%dx)" of all lengths, EAX remains
> unchanged.
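
As a reading aid for the paragraph above, the width rule can be modelled in
plain C.  This is a sketch only: "merge_eax" and its parameters are
illustrative names that do not appear in the patch, and it models just the
EAX update, not the port access itself.

```c
#include <stdint.h>

/*
 * Model of the EAX value the guest sees after a backdoor access.
 *   saved   - EAX before the instruction
 *   result  - 32-bit value produced by the command
 *   bytes   - operand size of the in/out instruction (1, 2 or 4)
 *   is_read - nonzero for "in", zero for "out"
 */
uint32_t merge_eax(uint32_t saved, uint32_t result, unsigned int bytes,
                   int is_read)
{
    if ( !is_read )
        return saved;               /* "out" never changes EAX */

    switch ( bytes )
    {
    case 1:                         /* in (%dx),%al: only AL changes */
        return (saved & 0xffffff00u) | (result & 0xffu);
    case 2:                         /* in (%dx),%ax: only AX changes */
        return (saved & 0xffff0000u) | (result & 0xffffu);
    default:                        /* in (%dx),%eax: all of EAX changes */
        return result;
    }
}
```

With EAX holding the magic value 0x564D5868 and a command result of
0x12345678, a one-byte "in" leaves 0x564D5878 in EAX, which is why the
four-byte form is the useful one.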
>
> Also this instruction is allowed to be used from ring 3.  To
> support this the vmexit for GP needs to be enabled.  I have not
> fully tested that nested HVM is doing the right thing for this.
>
> An open source example of using this is:
>
> http://open-vm-tools.sourceforge.net/
>
> Which only uses "inl (%dx)".  Also
>
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458
>
> The support included is enough to allow VMware tools to install in a
> HVM domU.
>
> For AMD (svm) the max instruction length of 15 is hard coded.  This
> is because __get_instruction_length_from_list() has issues: when
> called from the #GP handler NRIP is not available, and NRIP may not
> be available at all on particular hardware, leading to the need to
> read the instruction twice: once in
> __get_instruction_length_from_list() and then again in
> vmport_gp_check().  That is bad because memory may change between
> the reads.
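
To make the fixed length of 15 less abstract: the check only has to recognise
zero or more 0x66 operand-size prefixes followed by one of four single-byte
opcodes, so the real length falls out of the decode.  A simplified sketch
follows; the names "decode_backdoor_insn" and "struct backdoor_insn" are
hypothetical, and the sketch works on a local buffer rather than fetching
guest memory as the patch does.

```c
#include <stddef.h>
#include <stdint.h>

struct backdoor_insn {
    unsigned int len;     /* instruction length in bytes */
    unsigned int bytes;   /* operand size: 1, 2 or 4 */
    int is_read;          /* 1 for "in", 0 for "out" */
};

/*
 * Decode "in"/"out" via %dx from buf[0..avail).  def_size is the default
 * operand size (2 or 4).  Returns 0 on success, -1 if not recognised.
 */
int decode_backdoor_insn(const uint8_t *buf, size_t avail,
                         unsigned int def_size, struct backdoor_insn *out)
{
    size_t i = 0;
    unsigned int size = def_size;

    /* Skip operand-size prefixes; repeats do not toggle the size again. */
    while ( i < avail && buf[i] == 0x66 )
        i++;
    if ( i >= avail )
        return -1;
    if ( i > 0 )
        size = (def_size == 4) ? 2 : 4;

    switch ( buf[i] )
    {
    case 0xec: out->bytes = 1;    out->is_read = 1; break; /* in (%dx),%al */
    case 0xed: out->bytes = size; out->is_read = 1; break; /* in (%dx),%eax or %ax */
    case 0xee: out->bytes = 1;    out->is_read = 0; break; /* out %al,(%dx) */
    case 0xef: out->bytes = size; out->is_read = 0; break; /* out %eax or %ax,(%dx) */
    default:
        return -1;
    }
    out->len = (unsigned int)i + 1;
    return 0;
}
```

Because the length is derived from the bytes actually decoded, the caller can
pass the architectural maximum and let the decoder report the true length in
a single read.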
>
> Signed-off-by: Don Slutz <dslutz@verizon.com>
> ---
> v7:
>        More on AMD in the commit message.
>        Switch to only change 32bit part of registers, what VMware
>          does.
>      Too much logging and tracing.
>        Dropped a lot of it.  This includes vmport_debug=
>
> v6:
>        Dropped the attempt to use svm_nextrip_insn_length via
>        __get_instruction_length (added in v2).  Just always look
>        at up to 15 bytes on AMD.
>
> v5:
>        we should make sure that svm_vmexit_gp_intercept is not executed for
>        any other guest.
>          Added an ASSERT on is_vmware_port_enabled.
>        magic integers?
>          Added #define for them.
>        I am fairly certain that you need some brackets here.
>          Added brackets.
>
>   xen/arch/x86/domain.c                 |   2 +
>   xen/arch/x86/hvm/hvm.c                |   4 +
>   xen/arch/x86/hvm/svm/emulate.c        |   2 +-
>   xen/arch/x86/hvm/svm/svm.c            |  30 ++++
>   xen/arch/x86/hvm/svm/vmcb.c           |   2 +
>   xen/arch/x86/hvm/vmware/Makefile      |   1 +
>   xen/arch/x86/hvm/vmware/vmport.c      | 274 ++++++++++++++++++++++++++++++++++
>   xen/arch/x86/hvm/vmx/vmcs.c           |   2 +
>   xen/arch/x86/hvm/vmx/vmx.c            |  63 +++++++-
>   xen/arch/x86/hvm/vmx/vvmx.c           |   3 +
>   xen/common/domctl.c                   |   3 +
>   xen/include/asm-x86/hvm/domain.h      |   3 +
>   xen/include/asm-x86/hvm/io.h          |   2 +-
>   xen/include/asm-x86/hvm/svm/emulate.h |   1 +
>   xen/include/asm-x86/hvm/vmport.h      |  52 +++++++
>   xen/include/public/domctl.h           |   3 +
>   xen/include/xen/sched.h               |   3 +
>   17 files changed, 445 insertions(+), 5 deletions(-)
>   create mode 100644 xen/arch/x86/hvm/vmware/vmport.c
>   create mode 100644 xen/include/asm-x86/hvm/vmport.h
>
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 8cfd1ca..a71da52 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -524,6 +524,8 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
>       d->arch.hvm_domain.mem_sharing_enabled = 0;
>   
>       d->arch.s3_integrity = !!(domcr_flags & DOMCRF_s3_integrity);
> +    d->arch.hvm_domain.is_vmware_port_enabled =
> +        !!(domcr_flags & DOMCRF_vmware_port);
>   
>       INIT_LIST_HEAD(&d->arch.pdev_list);
>   
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 4039061..1357079 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -61,6 +61,7 @@
>   #include <asm/hvm/trace.h>
>   #include <asm/hvm/nestedhvm.h>
>   #include <asm/hvm/vmware.h>
> +#include <asm/hvm/vmport.h>
>   #include <asm/mtrr.h>
>   #include <asm/apic.h>
>   #include <public/sched.h>
> @@ -1444,6 +1445,9 @@ int hvm_domain_initialise(struct domain *d)
>           goto fail1;
>       d->arch.hvm_domain.io_handler->num_slot = 0;
>   
> +    if ( d->arch.hvm_domain.is_vmware_port_enabled )
> +        vmport_register(d);
> +
>       if ( is_pvh_domain(d) )
>       {
>           register_portio_handler(d, 0, 0x10003, handle_pvh_io);
> diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c
> index 37a1ece..cfad9ab 100644
> --- a/xen/arch/x86/hvm/svm/emulate.c
> +++ b/xen/arch/x86/hvm/svm/emulate.c
> @@ -50,7 +50,7 @@ static unsigned int is_prefix(u8 opc)
>       return 0;
>   }
>   
> -static unsigned long svm_rip2pointer(struct vcpu *v)
> +unsigned long svm_rip2pointer(struct vcpu *v)
>   {
>       struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
>       unsigned long p = vmcb->cs.base + guest_cpu_user_regs()->eip;
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index e3e1565..d7f13d9 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -59,6 +59,7 @@
>   #include <public/sched.h>
>   #include <asm/hvm/vpt.h>
>   #include <asm/hvm/trace.h>
> +#include <asm/hvm/vmport.h>
>   #include <asm/hap.h>
>   #include <asm/apic.h>
>   #include <asm/debugger.h>
> @@ -2111,6 +2112,31 @@ svm_vmexit_do_vmsave(struct vmcb_struct *vmcb,
>       return;
>   }
>   
> +static void svm_vmexit_gp_intercept(struct cpu_user_regs *regs,
> +                                    struct vcpu *v)
> +{
> +    struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
> +    /*
> +     * Just use 15 for the instruction length; vmport_gp_check will
> +     * adjust it.  This is because
> +     * __get_instruction_length_from_list() has issues, and may
> +     * require a double read of the instruction bytes.  At some
> +     * point a new routine could be added that is based on the code
> +     * in vmport_gp_check with extensions to make it more general.
> +     * Since that routine is the only user of this code this can be
> +     * done later.
> +     */
> +    unsigned long inst_len = 15;
> +    unsigned long inst_addr = svm_rip2pointer(v);
> +    int rc = vmport_gp_check(regs, v, &inst_len, inst_addr,
> +                             vmcb->exitinfo1, vmcb->exitinfo2);
> +
> +    if ( !rc )
> +        __update_guest_eip(regs, inst_len);
> +    else
> +        hvm_inject_hw_exception(TRAP_gp_fault, vmcb->exitinfo1);
> +}
> +
>   static void svm_vmexit_ud_intercept(struct cpu_user_regs *regs)
>   {
>       struct hvm_emulate_ctxt ctxt;
> @@ -2471,6 +2497,10 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>           break;
>       }
>   
> +    case VMEXIT_EXCEPTION_GP:
> +        svm_vmexit_gp_intercept(regs, v);
> +        break;
> +
>       case VMEXIT_EXCEPTION_UD:
>           svm_vmexit_ud_intercept(regs);
>           break;
> diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
> index 21292bb..45ead61 100644
> --- a/xen/arch/x86/hvm/svm/vmcb.c
> +++ b/xen/arch/x86/hvm/svm/vmcb.c
> @@ -195,6 +195,8 @@ static int construct_vmcb(struct vcpu *v)
>           HVM_TRAP_MASK
>           | (1U << TRAP_no_device);
>   
> +    if ( v->domain->arch.hvm_domain.is_vmware_port_enabled )
> +        vmcb->_exception_intercepts |= 1U << TRAP_gp_fault;
>       if ( paging_mode_hap(v->domain) )
>       {
>           vmcb->_np_enable = 1; /* enable nested paging */
> diff --git a/xen/arch/x86/hvm/vmware/Makefile b/xen/arch/x86/hvm/vmware/Makefile
> index 3fb2e0b..cd8815b 100644
> --- a/xen/arch/x86/hvm/vmware/Makefile
> +++ b/xen/arch/x86/hvm/vmware/Makefile
> @@ -1 +1,2 @@
>   obj-y += cpuid.o
> +obj-y += vmport.o
> diff --git a/xen/arch/x86/hvm/vmware/vmport.c b/xen/arch/x86/hvm/vmware/vmport.c
> new file mode 100644
> index 0000000..e73a5e6
> --- /dev/null
> +++ b/xen/arch/x86/hvm/vmware/vmport.c
> @@ -0,0 +1,274 @@
> +/*
> + * HVM VMPORT emulation
> + *
> + * Copyright (C) 2012 Verizon Corporation
> + *
> + * This file is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License Version 2 (GPLv2)
> + * as published by the Free Software Foundation.
> + *
> + * This file is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details. <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/config.h>
> +#include <xen/lib.h>
> +#include <asm/hvm/hvm.h>
> +#include <asm/hvm/support.h>
> +#include <asm/hvm/vmport.h>
> +
> +#include "backdoor_def.h"
> +
> +#define MAX_INST_LEN 15
> +
> +#ifndef NDEBUG
> +unsigned int opt_vmport_debug __read_mostly;
> +integer_param("vmport_debug", opt_vmport_debug);
> +#endif
> +
> +/* More VMware defines */
> +
> +#define VMWARE_GUI_AUTO_GRAB              0x001
> +#define VMWARE_GUI_AUTO_UNGRAB            0x002
> +#define VMWARE_GUI_AUTO_SCROLL            0x004
> +#define VMWARE_GUI_AUTO_RAISE             0x008
> +#define VMWARE_GUI_EXCHANGE_SELECTIONS    0x010
> +#define VMWARE_GUI_WARP_CURSOR_ON_UNGRAB  0x020
> +#define VMWARE_GUI_FULL_SCREEN            0x040
> +
> +#define VMWARE_GUI_TO_FULL_SCREEN         0x080
> +#define VMWARE_GUI_TO_WINDOW              0x100
> +
> +#define VMWARE_GUI_AUTO_RAISE_DISABLED    0x200
> +
> +#define VMWARE_GUI_SYNC_TIME              0x400
> +
> +/* When set, toolboxes should not show the cursor options page. */
> +#define VMWARE_DISABLE_CURSOR_OPTIONS     0x800
> +
> +void vmport_register(struct domain *d)
> +{
> +    register_portio_handler(d, BDOOR_PORT, 4, vmport_ioport);
> +}
> +
> +int vmport_ioport(int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> +{
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    uint32_t cmd = regs->rcx & 0xffff;
> +    uint32_t magic = regs->rax;
> +    int rc = X86EMUL_OKAY;
> +
> +    if ( magic == BDOOR_MAGIC )
> +    {
> +        uint64_t saved_rax = regs->rax;
> +        uint64_t value;
> +        struct vcpu *curr = current;
> +        struct domain *d = curr->domain;
> +        struct segment_register sreg;
> +
> +        switch ( cmd )
> +        {
> +        case BDOOR_CMD_GETMHZ:
> +            regs->rax = (uint32_t)(d->arch.tsc_khz / 1000);
> +            break;
> +        case BDOOR_CMD_GETVERSION:
> +            /* MAGIC */
> +            regs->rbx = (regs->rbx & 0xffffffff00000000ull) | BDOOR_MAGIC;
> +            /* VERSION_MAGIC */
> +            regs->rax = 6;
> +            /* Claim we are an ESX. VMX_TYPE_SCALABLE_SERVER */
> +            regs->rcx = (regs->rcx & 0xffffffff00000000ull) | 2;
> +            break;
> +        case BDOOR_CMD_GETSCREENSIZE:
> +            /* We have no screen size */
> +            regs->rax = -1;
> +            break;
> +        case BDOOR_CMD_GETHWVERSION:
> +            /* vmware_hw */
> +            regs->rax = 0;
> +            if ( is_hvm_vcpu(curr) )
> +            {
> +                struct hvm_domain *hd = &d->arch.hvm_domain;
> +
> +                regs->rax = (uint32_t)hd->params[HVM_PARAM_VMWARE_HW];
> +            }
> +            if ( !regs->rax )
> +                regs->rax = 4;  /* Act like version 4 */
> +            break;
> +        case BDOOR_CMD_GETHZ:
> +            hvm_get_segment_register(curr, x86_seg_ss, &sreg);
> +            if ( sreg.attr.fields.dpl == 0 )
> +            {
> +                value = d->arch.tsc_khz * 1000;
> +                /* apic-frequency (bus speed) */
> +                regs->rcx = (regs->rcx & 0xffffffff00000000ull) |
> +                    (uint32_t)(1000000000ULL / APIC_BUS_CYCLE_NS);
> +                /* High part of tsc-frequency */
> +                regs->rbx = (regs->rbx & 0xffffffff00000000ull) |
> +                    (uint32_t)(value >> 32);
> +                /* Low part of tsc-frequency */
> +                regs->rax = (uint32_t)value;
> +            }
> +            break;
> +        case BDOOR_CMD_GETTIME:
> +            value = get_localtime_us(d) - d->time_offset_seconds * 1000000ULL;
> +            /* hostUsecs */
> +            regs->rbx = (regs->rbx & 0xffffffff00000000ull) |
> +                (uint32_t)(value % 1000000UL);
> +            /* hostSecs */
> +            regs->rax = (uint32_t)(value / 1000000ULL);
> +            /* maxTimeLag */
> +            regs->rcx = (regs->rcx & 0xffffffff00000000ull) | 1000000;
> +            /* offset to GMT in minutes */
> +            regs->rdx = (regs->rdx & 0xffffffff00000000ull) |
> +                d->time_offset_seconds / 60;
> +            break;
> +        case BDOOR_CMD_GETTIMEFULL:
> +            value = get_localtime_us(d) - d->time_offset_seconds * 1000000ULL;
> +            /* ... */
> +            regs->rax = BDOOR_MAGIC;
> +            /* hostUsecs */
> +            regs->rbx = (regs->rbx & 0xffffffff00000000ull) |
> +                (uint32_t)(value % 1000000UL);
> +            /* High part of hostSecs */
> +            regs->rsi = (regs->rsi & 0xffffffff00000000ull) |
> +                (uint32_t)((value / 1000000ULL) >> 32);
> +            /* Low part of hostSecs */
> +            regs->rdx = (regs->rdx & 0xffffffff00000000ull) |
> +                (uint32_t)(value / 1000000ULL);
> +            /* maxTimeLag */
> +            regs->rcx = (regs->rcx & 0xffffffff00000000ull) | 1000000;
> +            break;
> +        case BDOOR_CMD_GETGUIOPTIONS:
> +            regs->rax = VMWARE_GUI_AUTO_GRAB | VMWARE_GUI_AUTO_UNGRAB |
> +                VMWARE_GUI_AUTO_RAISE_DISABLED | VMWARE_GUI_SYNC_TIME |
> +                VMWARE_DISABLE_CURSOR_OPTIONS;
> +            break;
> +        case BDOOR_CMD_SETGUIOPTIONS:
> +            regs->rax = 0x0;
> +            break;
> +        default:
> +            regs->rax = (uint32_t)~0ul;
> +            break;
> +        }
> +        if ( dir == IOREQ_READ )
> +        {
> +            switch ( bytes )
> +            {
> +            case 1:
> +                regs->rax = (saved_rax & 0xffffff00) | (regs->rax & 0xff);
> +                break;
> +            case 2:
> +                regs->rax = (saved_rax & 0xffff0000) | (regs->rax & 0xffff);
> +                break;
> +            case 4:
> +                regs->rax = (uint32_t)regs->rax;
> +                break;
> +            }
> +            *val = regs->rax;
> +        }
> +        else
> +            regs->rax = saved_rax;
> +    }
> +    else
> +        rc = X86EMUL_UNHANDLEABLE;
> +
> +    return rc;
> +}
> +
> +int vmport_gp_check(struct cpu_user_regs *regs, struct vcpu *v,
> +                    unsigned long *inst_len, unsigned long inst_addr,
> +                    unsigned long ei1, unsigned long ei2)
> +{
> +    if ( !v->domain->arch.hvm_domain.is_vmware_port_enabled )
> +        return X86EMUL_VMPORT_NOT_ENABLED;
> +
> +    if ( *inst_len && *inst_len <= MAX_INST_LEN &&
> +         (regs->rdx & 0xffff) == BDOOR_PORT && ei1 == 0 && ei2 == 0 &&
> +         (uint32_t)regs->rax == BDOOR_MAGIC )
> +    {
> +        int i = 0;
> +        uint32_t val;
> +        uint32_t byte_cnt = hvm_guest_x86_mode(v);
> +        unsigned char bytes[MAX_INST_LEN];
> +        unsigned int fetch_len;
> +        int frc;
> +
> +        /* in or out are limited to 32bits */
> +        if ( byte_cnt > 4 )
> +            byte_cnt = 4;
> +
> +        /*
> +         * Fetch up to the next page break; we'll fetch from the
> +         * next page later if we have to.
> +         */
> +        fetch_len = min_t(unsigned int, *inst_len,
> +                          PAGE_SIZE - (inst_addr  & ~PAGE_MASK));
> +        frc = hvm_fetch_from_guest_virt_nofault(bytes, inst_addr, fetch_len,
> +                                                PFEC_page_present);
> +        if ( frc != HVMCOPY_okay )
> +        {
> +            gdprintk(XENLOG_WARNING,
> +                     "Bad instruction fetch at %#lx (frc=%d il=%lu fl=%u)\n",
> +                     (unsigned long) inst_addr, frc, *inst_len, fetch_len);
> +            return X86EMUL_VMPORT_FETCH_ERROR_BYTE1;
> +        }
> +
> +        /* Check for operand size prefix */
> +        while ( (i < MAX_INST_LEN) && (bytes[i] == 0x66) )
> +        {
> +            i++;
> +            if ( i >= fetch_len )
> +            {
> +                frc = hvm_fetch_from_guest_virt_nofault(
> +                    &bytes[fetch_len], inst_addr + fetch_len,
> +                    MAX_INST_LEN - fetch_len, PFEC_page_present);
> +                if ( frc != HVMCOPY_okay )
> +                {
> +                    gdprintk(XENLOG_WARNING,
> +                             "Bad instruction fetch at %#lx + %#x (frc=%d)\n",
> +                             inst_addr, fetch_len, frc);
> +                    return X86EMUL_VMPORT_FETCH_ERROR_BYTE2;
> +                }
> +                fetch_len = MAX_INST_LEN;
> +            }
> +        }
> +        *inst_len = i + 1;
> +
> +        /* Only adjust byte_cnt 1 time */
> +        if ( bytes[0] == 0x66 )     /* operand size prefix */
> +        {
> +            if ( byte_cnt == 4 )
> +                byte_cnt = 2;
> +            else
> +                byte_cnt = 4;
> +        }
> +        if ( bytes[i] == 0xed )     /* in (%dx),%eax or in (%dx),%ax */
> +            return vmport_ioport(IOREQ_READ, BDOOR_PORT, byte_cnt, &val);
> +        else if ( bytes[i] == 0xec )     /* in (%dx),%al */
> +            return vmport_ioport(IOREQ_READ, BDOOR_PORT, 1, &val);
> +        else if ( bytes[i] == 0xef )     /* out %eax,(%dx) or out %ax,(%dx) */
> +            return vmport_ioport(IOREQ_WRITE, BDOOR_PORT, byte_cnt, &val);
> +        else if ( bytes[i] == 0xee )     /* out %al,(%dx) */
> +            return vmport_ioport(IOREQ_WRITE, BDOOR_PORT, 1, &val);
> +        else
> +        {
> +            *inst_len = 0; /* This is unknown. */
> +            return X86EMUL_VMPORT_BAD_OPCODE;
> +        }
> +    }
> +    *inst_len = 0; /* This is unknown. */
> +    return X86EMUL_VMPORT_BAD_STATE;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-set-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 9d8033e..1bab216 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1102,6 +1102,8 @@ static int construct_vmcs(struct vcpu *v)
>   
>       v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK
>                 | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault))
> +              | (v->domain->arch.hvm_domain.is_vmware_port_enabled ?
> +                 (1U << TRAP_gp_fault) : 0)
>                 | (1U << TRAP_no_device);
>       vmx_update_exception_bitmap(v);
>   
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 304aeea..300d804 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -44,6 +44,7 @@
>   #include <asm/hvm/support.h>
>   #include <asm/hvm/vmx/vmx.h>
>   #include <asm/hvm/vmx/vmcs.h>
> +#include <asm/hvm/vmport.h>
>   #include <public/sched.h>
>   #include <public/hvm/ioreq.h>
>   #include <asm/hvm/vpic.h>
> @@ -1276,9 +1277,11 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
>                           vmx_set_segment_register(
>                               v, s, &v->arch.hvm_vmx.vm86_saved_seg[s]);
>                   v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK
> -                          | (paging_mode_hap(v->domain) ?
> -                             0 : (1U << TRAP_page_fault))
> -                          | (1U << TRAP_no_device);
> +                    | (paging_mode_hap(v->domain) ?
> +                       0 : (1U << TRAP_page_fault))
> +                    | (v->domain->arch.hvm_domain.is_vmware_port_enabled ?
> +                       (1U << TRAP_gp_fault) : 0)
> +                    | (1U << TRAP_no_device);
>                   vmx_update_exception_bitmap(v);
>                   vmx_update_debug_state(v);
>               }
> @@ -2589,6 +2592,57 @@ static void vmx_idtv_reinject(unsigned long idtv_info)
>       }
>   }
>   
> +static unsigned long vmx_rip2pointer(struct cpu_user_regs *regs,
> +                                     struct vcpu *v)
> +{
> +    struct segment_register cs;
> +    unsigned long p;
> +
> +    vmx_get_segment_register(v, x86_seg_cs, &cs);
> +    p = cs.base + regs->rip;
> +    if ( !(cs.attr.fields.l && hvm_long_mode_enabled(v)) )
> +        return (uint32_t)p; /* mask to 32 bits */
> +    return p;
> +}
> +
> +static void vmx_vmexit_gp_intercept(struct cpu_user_regs *regs,
> +                                    struct vcpu *v)
> +{
> +    unsigned long exit_qualification;
> +    unsigned long inst_len;
> +    unsigned long inst_addr = vmx_rip2pointer(regs, v);
> +    unsigned long ecode;
> +    int rc;
> +#ifndef NDEBUG
> +    unsigned long orig_inst_len;
> +    unsigned long vector;
> +
> +    __vmread(VM_EXIT_INTR_INFO, &vector);
> +    BUG_ON(!(vector & INTR_INFO_VALID_MASK));
> +    BUG_ON(!(vector & INTR_INFO_DELIVER_CODE_MASK));
> +#endif
> +
> +    __vmread(EXIT_QUALIFICATION, &exit_qualification);
> +    __vmread(VM_EXIT_INSTRUCTION_LEN, &inst_len);
> +    __vmread(VM_EXIT_INTR_ERROR_CODE, &ecode);
> +
> +#ifndef NDEBUG
> +    orig_inst_len = inst_len;
> +#endif
> +    rc = vmport_gp_check(regs, v, &inst_len, inst_addr,
> +                         ecode, exit_qualification);
> +#ifndef NDEBUG
> +    if ( inst_len && orig_inst_len != inst_len )
> +        gdprintk(XENLOG_WARNING,
> +                 "Unexpected instruction length difference: %lu vs %lu\n",
> +                 orig_inst_len, inst_len);
> +#endif
> +    if ( !rc )
> +        update_guest_eip();
> +    else
> +        hvm_inject_hw_exception(TRAP_gp_fault, ecode);
> +}
> +
>   static int vmx_handle_apic_write(void)
>   {
>       unsigned long exit_qualification;
> @@ -2814,6 +2868,9 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>               HVMTRACE_1D(TRAP, vector);
>               vmx_fpu_dirty_intercept();
>               break;
> +        case TRAP_gp_fault:
> +            vmx_vmexit_gp_intercept(regs, v);
> +            break;
>           case TRAP_page_fault:
>               __vmread(EXIT_QUALIFICATION, &exit_qualification);
>               __vmread(VM_EXIT_INTR_ERROR_CODE, &ecode);
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 9ccc03f..8e07f92 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -24,6 +24,7 @@
>   #include <asm/types.h>
>   #include <asm/mtrr.h>
>   #include <asm/p2m.h>
> +#include <asm/hvm/vmport.h>
>   #include <asm/hvm/vmx/vmx.h>
>   #include <asm/hvm/vmx/vvmx.h>
>   #include <asm/hvm/nestedhvm.h>
> @@ -2182,6 +2183,8 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
>               if ( v->fpu_dirtied )
>                   nvcpu->nv_vmexit_pending = 1;
>           }
> +        else if ( vector == TRAP_gp_fault )
> +            nvcpu->nv_vmexit_pending = 1;
>           else if ( (intr_info & valid_mask) == valid_mask )
>           {
>               exec_bitmap =__get_vvmcs(nvcpu->nv_vvmcx, EXCEPTION_BITMAP);
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 30c9e50..fad55a2 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -543,6 +543,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>                ~(XEN_DOMCTL_CDF_hvm_guest
>                  | XEN_DOMCTL_CDF_pvh_guest
>                  | XEN_DOMCTL_CDF_hap
> +               | XEN_DOMCTL_CDF_vmware_port
>                  | XEN_DOMCTL_CDF_s3_integrity
>                  | XEN_DOMCTL_CDF_oos_off)) )
>               break;
> @@ -586,6 +587,8 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>               domcr_flags |= DOMCRF_s3_integrity;
>           if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_oos_off )
>               domcr_flags |= DOMCRF_oos_off;
> +        if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_vmware_port )
> +            domcr_flags |= DOMCRF_vmware_port;
>   
>           d = domain_create(dom, domcr_flags, op->u.createdomain.ssidref);
>           if ( IS_ERR(d) )
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
> index 2757c7f..d4718df 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -121,6 +121,9 @@ struct hvm_domain {
>       spinlock_t             uc_lock;
>       bool_t                 is_in_uc_mode;
>   
> +    /* VMware backdoor port available */
> +    bool_t                 is_vmware_port_enabled;
> +
>       /* Pass-through */
>       struct hvm_iommu       hvm_iommu;
>   
> diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
> index 886a9d6..d257161 100644
> --- a/xen/include/asm-x86/hvm/io.h
> +++ b/xen/include/asm-x86/hvm/io.h
> @@ -25,7 +25,7 @@
>   #include <public/hvm/ioreq.h>
>   #include <public/event_channel.h>
>   
> -#define MAX_IO_HANDLER             16
> +#define MAX_IO_HANDLER             17
>   
>   #define HVM_PORTIO                  0
>   #define HVM_BUFFERED_IO             2
> diff --git a/xen/include/asm-x86/hvm/svm/emulate.h b/xen/include/asm-x86/hvm/svm/emulate.h
> index ccc2d3c..d9a9dc5 100644
> --- a/xen/include/asm-x86/hvm/svm/emulate.h
> +++ b/xen/include/asm-x86/hvm/svm/emulate.h
> @@ -44,6 +44,7 @@ enum instruction_index {
>   
>   struct vcpu;
>   
> +unsigned long svm_rip2pointer(struct vcpu *v);
>   int __get_instruction_length_from_list(
>       struct vcpu *, const enum instruction_index *, unsigned int list_count);
>   
> diff --git a/xen/include/asm-x86/hvm/vmport.h b/xen/include/asm-x86/hvm/vmport.h
> new file mode 100644
> index 0000000..d037d55
> --- /dev/null
> +++ b/xen/include/asm-x86/hvm/vmport.h
> @@ -0,0 +1,52 @@
> +/*
> + * asm/hvm/vmport.h: HVM VMPORT emulation
> + *
> + *
> + * Copyright (C) 2012 Verizon Corporation
> + *
> + * This file is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License Version 2 (GPLv2)
> + * as published by the Free Software Foundation.
> + *
> + * This file is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details. <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef ASM_X86_HVM_VMPORT_H__
> +#define ASM_X86_HVM_VMPORT_H__
> +
> +void vmport_register(struct domain *d);
> +int vmport_ioport(int dir, uint32_t port, uint32_t bytes, uint32_t *val);
> +int vmport_gp_check(struct cpu_user_regs *regs, struct vcpu *v,
> +                    unsigned long *inst_len, unsigned long inst_addr,
> +                    unsigned long ei1, unsigned long ei2);
> +/*
> + * Additional return values from vmport_gp_check.
> + *
> + * Note: return values include:
> + *   X86EMUL_OKAY
> + *   X86EMUL_UNHANDLEABLE
> + *   X86EMUL_EXCEPTION
> + *   X86EMUL_RETRY
> + *   X86EMUL_CMPXCHG_FAILED
> + *
> + * The additional do not overlap any of the above.
> + */
> +#define X86EMUL_VMPORT_NOT_ENABLED              10
> +#define X86EMUL_VMPORT_FETCH_ERROR_BYTE1        11
> +#define X86EMUL_VMPORT_FETCH_ERROR_BYTE2        12
> +#define X86EMUL_VMPORT_BAD_OPCODE               13
> +#define X86EMUL_VMPORT_BAD_STATE                14
> +
> +#endif /* ASM_X86_HVM_VMPORT_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 61f7555..2b38515 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -63,6 +63,9 @@ struct xen_domctl_createdomain {
>    /* Is this a PVH guest (as opposed to an HVM or PV guest)? */
>   #define _XEN_DOMCTL_CDF_pvh_guest     4
>   #define XEN_DOMCTL_CDF_pvh_guest      (1U<<_XEN_DOMCTL_CDF_pvh_guest)
> + /* Is VMware backdoor port available? */
> +#define _XEN_DOMCTL_CDF_vmware_port   5
> +#define XEN_DOMCTL_CDF_vmware_port    (1U<<_XEN_DOMCTL_CDF_vmware_port)
>       uint32_t flags;
>   };
>   typedef struct xen_domctl_createdomain xen_domctl_createdomain_t;
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index c5157e6..d741978 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -546,6 +546,9 @@ struct domain *domain_create(
>    /* DOMCRF_pvh: Create PV domain in HVM container. */
>   #define _DOMCRF_pvh             5
>   #define DOMCRF_pvh              (1U<<_DOMCRF_pvh)
> + /* DOMCRF_vmware_port: Enable use of vmware backdoor port. */
> +#define _DOMCRF_vmware_port     6
> +#define DOMCRF_vmware_port      (1U<<_DOMCRF_vmware_port)
>   
>   /*
>    * rcu_lock_domain_by_id() is more efficient than get_domain_by_id().


* Re: [PATCH for-4.5 v7 2/7] tools: Add vmware_hw support
  2014-10-02 21:30 ` [PATCH for-4.5 v7 2/7] tools: Add vmware_hw support Don Slutz
@ 2014-10-02 22:21   ` Andrew Cooper
  2014-10-02 22:56     ` [PATCH for-4.5 v8 " Don Slutz
  0 siblings, 1 reply; 37+ messages in thread
From: Andrew Cooper @ 2014-10-02 22:21 UTC (permalink / raw)
  To: Don Slutz, xen-devel
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Eddie Dong, Ian Jackson, Tim Deegan, George Dunlap,
	Aravind Gopalakrishnan, Jan Beulich, Boris Ostrovsky,
	Suravee Suthikulpanit

On 02/10/2014 22:30, Don Slutz wrote:
> diff --git a/docs/misc/hypervisor-cpuid.markdown b/docs/misc/hypervisor-cpuid.markdown
> new file mode 100644
> index 0000000..964a5f4
> --- /dev/null
> +++ b/docs/misc/hypervisor-cpuid.markdown
> @@ -0,0 +1,30 @@
> +Hypervisor Cpuid
> +================
> +
> +There is no agreed standard for the use of hypervisor cpuid leaves.
> +
> +Other than the range 0x40000000 to 0x400000ff can be used by
> +hypervisors.

This sentence doesn't parse.

Checking in the latest Intel and AMD manuals I can find,

Intel (Vol 2a, 3.2 CPUID), states "Invalid. No existing or future CPU
will return processor identification or feature information if the
initial EAX value is in the range 40000000H to 4FFFFFFFH."

AMD (Vol3, Appendix E.3.9) states "CPUID Fn4000_00[FF:00]: These
function numbers are reserved for use by the virtual machine monitor."

I feel this information needs recording as well.

Perhaps:

"
There is no agreed standard for the use of hypervisor cpuid leaves.

AMD (Vol3, Appendix E.3.9) reserves 0x40000000 to 0x400000ff for
hypervisor use, while Intel (Vol 2a, 3.2 CPUID) guarantees that no
existing or future CPUs will use the range 0x40000000 to 0x4fffffff.

Different hypervisors use the space as follows:
"

> +
> +MicroSoft Hyper-V (AKA viridian) leaves currently must be at
> +0x40000000.
> +
> +VMware leaves currently must be at 0x40000000.
> +
> +KVM leaves currently must be at 0x40000000 (from Seabios).
> +
> +Xen leaves can be found at the first otherwise unused 0x100 aligned
> +offset between 0x40000000 and 0x40010000.
> +
> +http://download.microsoft.com/download/F/B/0/FB0D01A3-8E3A-4F5F-AA59-08C8026D3B8A/requirements-for-implementing-microsoft-hypervisor-interface.docx
> +
> +http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458
> +
> +http://lwn.net/Articles/301888/
> +  Attempted to get this cleaned up.
> +
> +So if Viridian or VMware_hw is selected, return their format for the
> +range 0x40000000 to 0x400000ff. And return Xen format for the range
> +0x40000100 to 0x400001ff.
> +
> +Otherwise return Xen format for the range 0x40000000 to 0x400000ff.
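
To make the layout in the quoted text concrete, here is a minimal
sketch of how a leaf could be routed.  It is not code from the patch:
the enum, the macros and classify_leaf() are hypothetical names, and
only the ranges are taken from the document above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only; not part of the patch. */
enum leaf_owner { LEAF_NONE, LEAF_FOREIGN, LEAF_XEN };

#define HV_LEAF_BASE   0x40000000u
#define HV_LEAF_STRIDE 0x100u

/*
 * foreign_hv is true when Viridian or vmware_hw is selected.  In that
 * case 0x40000000-0x400000ff uses the foreign format and Xen moves up
 * to 0x40000100-0x400001ff; otherwise Xen sits at 0x40000000-0x400000ff.
 */
static enum leaf_owner classify_leaf(uint32_t leaf, bool foreign_hv)
{
    uint32_t xen_base = HV_LEAF_BASE + (foreign_hv ? HV_LEAF_STRIDE : 0);

    if ( leaf >= xen_base && leaf < xen_base + HV_LEAF_STRIDE )
        return LEAF_XEN;
    if ( foreign_hv && leaf >= HV_LEAF_BASE &&
         leaf < HV_LEAF_BASE + HV_LEAF_STRIDE )
        return LEAF_FOREIGN;
    return LEAF_NONE;
}

/* A few checks that double as usage examples. */
static void classify_leaf_selftest(void)
{
    assert(classify_leaf(0x40000000u, false) == LEAF_XEN);
    assert(classify_leaf(0x40000000u, true) == LEAF_FOREIGN);
    assert(classify_leaf(0x40000100u, true) == LEAF_XEN);
}
```

The sketch only covers the two layouts the document spells out; it does
not model a search for the first unused 0x100 aligned offset.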

<snip>

> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -1111,6 +1111,8 @@ static void parse_config_data(const char *config_source,
>              exit(-ERROR_FAIL);
>          }
>  
> +        if (!xlu_cfg_get_long(config, "vmware_hw",  &l, 1))
> +            b_info->u.hvm.vmware_hw = l;
>          if (!xlu_cfg_get_long(config, "timer_mode", &l, 1)) {
>              const char *s = libxl_timer_mode_to_string(l);
>              fprintf(stderr, "WARNING: specifying \"timer_mode\" as an integer is deprecated. "
> @@ -1730,13 +1732,15 @@ skip_vfb:
>                  b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
>              } else if (!strcmp(buf, "none")) {
>                  b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_NONE;
> +            } else if (!strcmp(buf, "vmware")) {
> +                b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_VMWARE;
>              } else {
>                  fprintf(stderr, "Unknown vga \"%s\" specified\n", buf);
>                  exit(1);
>              }
>          } else if (!xlu_cfg_get_long(config, "stdvga", &l, 0))
>              b_info->u.hvm.vga.kind = l ? LIBXL_VGA_INTERFACE_TYPE_STD :
> -                                         LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
> +                                          LIBXL_VGA_INTERFACE_TYPE_CIRRUS;

Spurious whitespace change.

Other than these two comments, the rest of the patch looks ok, so for
what it's worth

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* [PATCH for-4.5 v8 4/7] xen: Add vmware_port support
  2014-10-02 21:58   ` Don Slutz
@ 2014-10-02 22:40     ` Don Slutz
  2015-01-16 10:09       ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Don Slutz @ 2014-10-02 22:40 UTC (permalink / raw)
  To: Don Slutz, xen-devel
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Eddie Dong, Ian Jackson, Tim Deegan, George Dunlap,
	Aravind Gopalakrishnan, Jan Beulich, Andrew Cooper,
	Boris Ostrovsky, Suravee Suthikulpanit

[-- Attachment #1: Type: text/plain, Size: 27441 bytes --]

Attempt to send in text.  Also attached.
     -Don Slutz

--------------------------------------------------------

From 4db1093d0b420cc54258c0db03d991fa3b3acd7f Mon Sep 17 00:00:00 2001
From: Don Slutz <dslutz@verizon.com>
Date: Thu, 21 Nov 2013 15:01:08 -0500
Subject: [PATCH for-4.5 v7 4/7] xen: Add vmware_port support

This includes adding is_vmware_port_enabled.

This is a new domain_create() flag, DOMCRF_vmware_port.  It is
passed to domctl as XEN_DOMCTL_CDF_vmware_port.
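
As a rough illustration of that flag mapping (a sketch, not the patch
itself): the two bit positions are the ones this patch defines in
domctl.h and sched.h, while cdf_to_domcrf_vmware() is a hypothetical
helper.  In the patch the translation is open-coded in do_domctl().

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as defined by this patch (bit 5 and bit 6). */
#define XEN_DOMCTL_CDF_vmware_port (1U << 5)
#define DOMCRF_vmware_port         (1U << 6)

/* Hypothetical helper; do_domctl() does this inline. */
static uint32_t cdf_to_domcrf_vmware(uint32_t cdf_flags)
{
    uint32_t domcr_flags = 0;

    if ( cdf_flags & XEN_DOMCTL_CDF_vmware_port )
        domcr_flags |= DOMCRF_vmware_port;
    return domcr_flags;
}

/* Usage check: only the vmware_port bit is translated here. */
static void cdf_to_domcrf_selftest(void)
{
    assert(cdf_to_domcrf_vmware(XEN_DOMCTL_CDF_vmware_port) ==
           DOMCRF_vmware_port);
    assert(cdf_to_domcrf_vmware(0) == 0);
}
```

Note that the two flags sit at different bit positions, so the value
cannot simply be copied across.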

This enables limited support of VMware's hyper-call.

This support is both more complete than what QEMU and/or KVM
currently provide, and less complete.  The missing part requires QEMU
changes and has been left out until the QEMU patches are accepted
upstream.

VMware's hyper-call is also known as VMware Backdoor I/O Port.

Note: this support does not depend on vmware_hw being non-zero.

Summary is that VMware treats "in (%dx),%eax" (or "out %eax,(%dx)")
to port 0x5658 specially.  Note: since many operations return data
in EAX, "in (%dx),%eax" is the one to use.  The other lengths like
"in (%dx),%al" will still do things, only AL part of EAX will be
changed.  For "out %eax,(%dx)" of all lengths, EAX will remain
unchanged.
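
The width rules above can be sketched as follows.  merge_eax() is a
hypothetical helper written for this note, not a function in the patch;
it mirrors the masking done at the end of vmport_ioport().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: merge a 32-bit backdoor
 * result into the guest's previous EAX according to the access width.
 */
static uint32_t merge_eax(uint32_t saved_eax, uint32_t result,
                          unsigned int bytes, int is_in)
{
    if ( !is_in )
        return saved_eax;           /* out: EAX remains unchanged */

    switch ( bytes )
    {
    case 1:                         /* in (%dx),%al: only AL changes */
        return (saved_eax & 0xffffff00u) | (result & 0xffu);
    case 2:                         /* in (%dx),%ax: only AX changes */
        return (saved_eax & 0xffff0000u) | (result & 0xffffu);
    default:                        /* in (%dx),%eax: all of EAX */
        return result;
    }
}

/* Usage check with an arbitrary previous EAX value. */
static void merge_eax_selftest(void)
{
    assert(merge_eax(0x11223344u, 6, 1, 1) == 0x11223306u);
    assert(merge_eax(0x11223344u, 6, 4, 0) == 0x11223344u);
}
```

This is why callers that want the full result need the 32-bit form:
with the 8-bit form most of the returned value is masked away.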

Also this instruction is allowed to be used from ring 3.  To
support this the vmexit for GP needs to be enabled.  I have not
fully tested that nested HVM is doing the right thing for this.
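
Once #GP is intercepted, the handler has to work out whether the
faulting instruction was one of the four in/out forms.  A simplified,
standalone sketch of that decode step follows.  struct io_insn and
decode_dx_io() are hypothetical names; like vmport_gp_check(), it only
looks for the 0x66 operand-size prefix and ignores all other prefixes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INST_LEN 15

/* Hypothetical result type for this sketch. */
struct io_insn {
    int is_in;            /* 1 for in, 0 for out */
    unsigned int bytes;   /* access width: 1, 2 or 4 */
    size_t len;           /* instruction length, prefixes included */
};

/*
 * Decode in/out via %dx, optionally preceded by 0x66 prefixes.
 * default_bytes is the default operand size (2 or 4) for the current
 * mode.  Returns 0 on success, -1 if no known opcode is found.
 */
static int decode_dx_io(const uint8_t *buf, size_t avail,
                        unsigned int default_bytes, struct io_insn *res)
{
    size_t i = 0;
    unsigned int bytes = default_bytes;

    assert(default_bytes == 2 || default_bytes == 4);
    if ( avail > MAX_INST_LEN )
        avail = MAX_INST_LEN;

    while ( i < avail && buf[i] == 0x66 )
        i++;
    if ( i >= avail )
        return -1;                  /* ran out of bytes before the opcode */
    if ( i > 0 )                    /* prefix toggles 16 <-> 32 bit, once */
        bytes = (default_bytes == 4) ? 2 : 4;

    switch ( buf[i] )
    {
    case 0xec: res->is_in = 1; res->bytes = 1;     break; /* in (%dx),%al */
    case 0xed: res->is_in = 1; res->bytes = bytes; break; /* in (%dx),%eax/%ax */
    case 0xee: res->is_in = 0; res->bytes = 1;     break; /* out %al,(%dx) */
    case 0xef: res->is_in = 0; res->bytes = bytes; break; /* out %eax/%ax,(%dx) */
    default:   return -1;
    }
    res->len = i + 1;
    return 0;
}
```

Anything that does not decode would be treated as an ordinary #GP and
reinjected into the guest, which is what the intercept handlers in this
patch do on a non-zero return.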

An open source example of using this is:

http://open-vm-tools.sourceforge.net/

Which only uses "inl (%dx)".  Also

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458

The support included is enough to allow VMware tools to install in a
HVM domU.

For AMD (svm) the max instruction length of 15 is hard coded.  This
is because __get_instruction_length_from_list() has issues: when
called from the #GP handler NRIP is not available, and NRIP may not
be available at all on particular hardware.  That would mean reading
the instruction twice: once in __get_instruction_length_from_list()
and again in vmport_gp_check().  That is bad because memory may
change between the reads.
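
With a fixed 15-byte window the fetch may run into the next page, so
vmport_gp_check() first fetches only up to the page boundary and reads
the rest later if it has to.  The length computation can be sketched as
below.  first_fetch_len() and the SKETCH_ macros are hypothetical, and
a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed 4 KiB x86 page */
#define SKETCH_PAGE_MASK (~(uint64_t)(SKETCH_PAGE_SIZE - 1))

/* Hypothetical helper: bytes readable from addr without crossing a page. */
static unsigned int first_fetch_len(uint64_t addr, unsigned int want)
{
    unsigned int to_page_end =
        SKETCH_PAGE_SIZE - (unsigned int)(addr & ~SKETCH_PAGE_MASK);

    return want < to_page_end ? want : to_page_end;
}

/* Usage check: two bytes before a page end, only two can be fetched. */
static void first_fetch_len_selftest(void)
{
    assert(first_fetch_len(0x1ffeu, 15) == 2);
    assert(first_fetch_len(0x1000u, 15) == 15);
}
```

Bounding the first read this way avoids touching a following page that
may not be mapped when the instruction itself ends before the boundary.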

Signed-off-by: Don Slutz <dslutz@verizon.com>
---
v8:
     Switch to _ebx etc.

v7:
       More on AMD in the commit message.
       Switch to only change 32bit part of registers, what VMware
         does.
     Too much logging and tracing.
       Dropped a lot of it.  This includes vmport_debug=

v6:
       Dropped the attempt to use svm_nextrip_insn_length via
       __get_instruction_length (added in v2).  Just always look
       at up to 15 bytes on AMD.

v5:
       we should make sure that svm_vmexit_gp_intercept is not executed for
       any other guest.
         Added an ASSERT on is_vmware_port_enabled.
       magic integers?
         Added #define for them.
       I am fairly certain that you need some brackets here.
         Added brackets.

  xen/arch/x86/domain.c                 |   2 +
  xen/arch/x86/hvm/hvm.c                |   4 +
  xen/arch/x86/hvm/svm/emulate.c        |   2 +-
  xen/arch/x86/hvm/svm/svm.c            |  30 ++++
  xen/arch/x86/hvm/svm/vmcb.c           |   2 +
  xen/arch/x86/hvm/vmware/Makefile      |   1 +
  xen/arch/x86/hvm/vmware/vmport.c      | 262 ++++++++++++++++++++++++++++++++++
  xen/arch/x86/hvm/vmx/vmcs.c           |   2 +
  xen/arch/x86/hvm/vmx/vmx.c            |  63 +++++++-
  xen/arch/x86/hvm/vmx/vvmx.c           |   3 +
  xen/common/domctl.c                   |   3 +
  xen/include/asm-x86/hvm/domain.h      |   3 +
  xen/include/asm-x86/hvm/io.h          |   2 +-
  xen/include/asm-x86/hvm/svm/emulate.h |   1 +
  xen/include/asm-x86/hvm/vmport.h      |  52 +++++++
  xen/include/public/domctl.h           |   3 +
  xen/include/xen/sched.h               |   3 +
  17 files changed, 433 insertions(+), 5 deletions(-)
  create mode 100644 xen/arch/x86/hvm/vmware/vmport.c
  create mode 100644 xen/include/asm-x86/hvm/vmport.h

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 8cfd1ca..a71da52 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -524,6 +524,8 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
      d->arch.hvm_domain.mem_sharing_enabled = 0;

      d->arch.s3_integrity = !!(domcr_flags & DOMCRF_s3_integrity);
+    d->arch.hvm_domain.is_vmware_port_enabled =
+        !!(domcr_flags & DOMCRF_vmware_port);

      INIT_LIST_HEAD(&d->arch.pdev_list);

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 4039061..1357079 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -61,6 +61,7 @@
  #include <asm/hvm/trace.h>
  #include <asm/hvm/nestedhvm.h>
  #include <asm/hvm/vmware.h>
+#include <asm/hvm/vmport.h>
  #include <asm/mtrr.h>
  #include <asm/apic.h>
  #include <public/sched.h>
@@ -1444,6 +1445,9 @@ int hvm_domain_initialise(struct domain *d)
          goto fail1;
      d->arch.hvm_domain.io_handler->num_slot = 0;

+    if ( d->arch.hvm_domain.is_vmware_port_enabled )
+        vmport_register(d);
+
      if ( is_pvh_domain(d) )
      {
          register_portio_handler(d, 0, 0x10003, handle_pvh_io);
diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c
index 37a1ece..cfad9ab 100644
--- a/xen/arch/x86/hvm/svm/emulate.c
+++ b/xen/arch/x86/hvm/svm/emulate.c
@@ -50,7 +50,7 @@ static unsigned int is_prefix(u8 opc)
      return 0;
  }

-static unsigned long svm_rip2pointer(struct vcpu *v)
+unsigned long svm_rip2pointer(struct vcpu *v)
  {
      struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
      unsigned long p = vmcb->cs.base + guest_cpu_user_regs()->eip;
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index e3e1565..d7f13d9 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -59,6 +59,7 @@
  #include <public/sched.h>
  #include <asm/hvm/vpt.h>
  #include <asm/hvm/trace.h>
+#include <asm/hvm/vmport.h>
  #include <asm/hap.h>
  #include <asm/apic.h>
  #include <asm/debugger.h>
@@ -2111,6 +2112,31 @@ svm_vmexit_do_vmsave(struct vmcb_struct *vmcb,
      return;
  }

+static void svm_vmexit_gp_intercept(struct cpu_user_regs *regs,
+                                    struct vcpu *v)
+{
+    struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
+    /*
+     * Just use 15 for the instruction length; vmport_gp_check will
+     * adjust it.  This is because
+     * __get_instruction_length_from_list() has issues, and may
+     * require a double read of the instruction bytes.  At some
+     * point a new routine could be added that is based on the code
+     * in vmport_gp_check with extensions to make it more general.
+     * Since that routine is the only user of this code this can be
+     * done later.
+     */
+    unsigned long inst_len = 15;
+    unsigned long inst_addr = svm_rip2pointer(v);
+    int rc = vmport_gp_check(regs, v, &inst_len, inst_addr,
+                             vmcb->exitinfo1, vmcb->exitinfo2);
+
+    if ( !rc )
+        __update_guest_eip(regs, inst_len);
+    else
+        hvm_inject_hw_exception(TRAP_gp_fault, vmcb->exitinfo1);
+}
+
  static void svm_vmexit_ud_intercept(struct cpu_user_regs *regs)
  {
      struct hvm_emulate_ctxt ctxt;
@@ -2471,6 +2497,10 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
          break;
      }

+    case VMEXIT_EXCEPTION_GP:
+        svm_vmexit_gp_intercept(regs, v);
+        break;
+
      case VMEXIT_EXCEPTION_UD:
          svm_vmexit_ud_intercept(regs);
          break;
diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
index 21292bb..45ead61 100644
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -195,6 +195,8 @@ static int construct_vmcb(struct vcpu *v)
          HVM_TRAP_MASK
          | (1U << TRAP_no_device);

+    if ( v->domain->arch.hvm_domain.is_vmware_port_enabled )
+        vmcb->_exception_intercepts |= 1U << TRAP_gp_fault;
      if ( paging_mode_hap(v->domain) )
      {
          vmcb->_np_enable = 1; /* enable nested paging */
diff --git a/xen/arch/x86/hvm/vmware/Makefile b/xen/arch/x86/hvm/vmware/Makefile
index 3fb2e0b..cd8815b 100644
--- a/xen/arch/x86/hvm/vmware/Makefile
+++ b/xen/arch/x86/hvm/vmware/Makefile
@@ -1 +1,2 @@
  obj-y += cpuid.o
+obj-y += vmport.o
diff --git a/xen/arch/x86/hvm/vmware/vmport.c b/xen/arch/x86/hvm/vmware/vmport.c
new file mode 100644
index 0000000..183bb7e
--- /dev/null
+++ b/xen/arch/x86/hvm/vmware/vmport.c
@@ -0,0 +1,262 @@
+/*
+ * HVM VMPORT emulation
+ *
+ * Copyright (C) 2012 Verizon Corporation
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License Version 2 (GPLv2)
+ * as published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details. <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+#include <xen/lib.h>
+#include <asm/hvm/hvm.h>
+#include <asm/hvm/support.h>
+#include <asm/hvm/vmport.h>
+
+#include "backdoor_def.h"
+
+#define MAX_INST_LEN 15
+
+#ifndef NDEBUG
+unsigned int opt_vmport_debug __read_mostly;
+integer_param("vmport_debug", opt_vmport_debug);
+#endif
+
+/* More VMware defines */
+
+#define VMWARE_GUI_AUTO_GRAB              0x001
+#define VMWARE_GUI_AUTO_UNGRAB            0x002
+#define VMWARE_GUI_AUTO_SCROLL            0x004
+#define VMWARE_GUI_AUTO_RAISE             0x008
+#define VMWARE_GUI_EXCHANGE_SELECTIONS    0x010
+#define VMWARE_GUI_WARP_CURSOR_ON_UNGRAB  0x020
+#define VMWARE_GUI_FULL_SCREEN            0x040
+
+#define VMWARE_GUI_TO_FULL_SCREEN         0x080
+#define VMWARE_GUI_TO_WINDOW              0x100
+
+#define VMWARE_GUI_AUTO_RAISE_DISABLED    0x200
+
+#define VMWARE_GUI_SYNC_TIME              0x400
+
+/* When set, toolboxes should not show the cursor options page. */
+#define VMWARE_DISABLE_CURSOR_OPTIONS     0x800
+
+void vmport_register(struct domain *d)
+{
+    register_portio_handler(d, BDOOR_PORT, 4, vmport_ioport);
+}
+
+int vmport_ioport(int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    uint16_t cmd = regs->rcx;
+    int rc = X86EMUL_OKAY;
+
+    if ( regs->_eax == BDOOR_MAGIC )
+    {
+        uint64_t saved_rax = regs->rax;
+        uint64_t value;
+        struct vcpu *curr = current;
+        struct domain *d = curr->domain;
+        struct segment_register sreg;
+
+        switch ( cmd )
+        {
+        case BDOOR_CMD_GETMHZ:
+            regs->_eax = d->arch.tsc_khz / 1000;
+            break;
+        case BDOOR_CMD_GETVERSION:
+            /* MAGIC */
+            regs->_ebx = BDOOR_MAGIC;
+            /* VERSION_MAGIC */
+            regs->_eax = 6;
+            /* Claim we are an ESX. VMX_TYPE_SCALABLE_SERVER */
+            regs->_ecx = 2;
+            break;
+        case BDOOR_CMD_GETSCREENSIZE:
+            /* We have no screen size */
+            regs->_eax = ~0u;
+            break;
+        case BDOOR_CMD_GETHWVERSION:
+            /* vmware_hw */
+            regs->_eax = 0;
+            if ( is_hvm_vcpu(curr) )
+            {
+                struct hvm_domain *hd = &d->arch.hvm_domain;
+
+                regs->_eax = hd->params[HVM_PARAM_VMWARE_HW];
+            }
+            if ( !regs->_eax )
+                regs->_eax = 4;  /* Act like version 4 */
+            break;
+        case BDOOR_CMD_GETHZ:
+            hvm_get_segment_register(curr, x86_seg_ss, &sreg);
+            if ( sreg.attr.fields.dpl == 0 )
+            {
+                value = d->arch.tsc_khz * 1000;
+                /* apic-frequency (bus speed) */
+                regs->_ecx = 1000000000ULL / APIC_BUS_CYCLE_NS;
+                /* High part of tsc-frequency */
+                regs->_ebx = value >> 32;
+                /* Low part of tsc-frequency */
+                regs->_eax = value;
+            }
+            break;
+        case BDOOR_CMD_GETTIME:
+            value = get_localtime_us(d) - d->time_offset_seconds * 1000000ULL;
+            /* hostUsecs */
+            regs->_ebx = value % 1000000UL;
+            /* hostSecs */
+            regs->_eax = value / 1000000ULL;
+            /* maxTimeLag */
+            regs->_ecx = 1000000;
+            /* offset to GMT in minutes */
+            regs->_edx = d->time_offset_seconds / 60;
+            break;
+        case BDOOR_CMD_GETTIMEFULL:
+            value = get_localtime_us(d) - d->time_offset_seconds * 1000000ULL;
+            /* ... */
+            regs->_eax = BDOOR_MAGIC;
+            /* hostUsecs */
+            regs->_ebx =value / 1000000ULL;
+            /* maxTimeLag */
+            regs->_ecx = 1000000;
+            break;
+        case BDOOR_CMD_GETGUIOPTIONS:
+            regs->_eax = VMWARE_GUI_AUTO_GRAB | VMWARE_GUI_AUTO_UNGRAB |
+                VMWARE_GUI_AUTO_RAISE_DISABLED | VMWARE_GUI_SYNC_TIME |
+                VMWARE_DISABLE_CURSOR_OPTIONS;
+            break;
+        case BDOOR_CMD_SETGUIOPTIONS:
+            regs->_eax = 0x0;
+            break;
+        default:
+            regs->_eax = ~0u;
+            break;
+        }
+        if ( dir == IOREQ_READ )
+        {
+            switch ( bytes )
+            {
+            case 1:
+                regs->rax = (saved_rax & 0xffffff00) | (regs->rax & 0xff);
+                break;
+            case 2:
+                regs->rax = (saved_rax & 0xffff0000) | (regs->rax & 0xffff);
+                break;
+            case 4:
+                regs->rax = regs->_eax;
+                break;
+            }
+            *val = regs->rax;
+        }
+        else
+            regs->rax = saved_rax;
+    }
+    else
+        rc = X86EMUL_UNHANDLEABLE;
+
+    return rc;
+}
+
+int vmport_gp_check(struct cpu_user_regs *regs, struct vcpu *v,
+                    unsigned long *inst_len, unsigned long inst_addr,
+                    unsigned long ei1, unsigned long ei2)
+{
+    if ( !v->domain->arch.hvm_domain.is_vmware_port_enabled )
+        return X86EMUL_VMPORT_NOT_ENABLED;
+
+    if ( *inst_len && *inst_len <= MAX_INST_LEN &&
+         (regs->rdx & 0xffff) == BDOOR_PORT && ei1 == 0 && ei2 == 0 &&
+         regs->_eax == BDOOR_MAGIC )
+    {
+        int i = 0;
+        uint32_t val;
+        uint32_t byte_cnt = hvm_guest_x86_mode(v);
+        unsigned char bytes[MAX_INST_LEN];
+        unsigned int fetch_len;
+        int frc;
+
+        /* in or out are limited to 32bits */
+        if ( byte_cnt > 4 )
+            byte_cnt = 4;
+
+        /*
+         * Fetch up to the next page break; we'll fetch from the
+         * next page later if we have to.
+         */
+        fetch_len = min_t(unsigned int, *inst_len,
+                          PAGE_SIZE - (inst_addr  & ~PAGE_MASK));
+        frc = hvm_fetch_from_guest_virt_nofault(bytes, inst_addr, fetch_len,
+                                                PFEC_page_present);
+        if ( frc != HVMCOPY_okay )
+        {
+            gdprintk(XENLOG_WARNING,
+                     "Bad instruction fetch at %#lx (frc=%d il=%lu fl=%u)\n",
+                     (unsigned long) inst_addr, frc, *inst_len, fetch_len);
+            return X86EMUL_VMPORT_FETCH_ERROR_BYTE1;
+        }
+
+        /* Check for operand size prefix */
+        while ( (i < MAX_INST_LEN) && (bytes[i] == 0x66) )
+        {
+            i++;
+            if ( i >= fetch_len )
+            {
+                frc = hvm_fetch_from_guest_virt_nofault(
+                    &bytes[fetch_len], inst_addr + fetch_len,
+                    MAX_INST_LEN - fetch_len, PFEC_page_present);
+                if ( frc != HVMCOPY_okay )
+                {
+                    gdprintk(XENLOG_WARNING,
+                             "Bad instruction fetch at %#lx + %#x (frc=%d)\n",
+                             inst_addr, fetch_len, frc);
+                    return X86EMUL_VMPORT_FETCH_ERROR_BYTE2;
+                }
+                fetch_len = MAX_INST_LEN;
+            }
+        }
+        *inst_len = i + 1;
+
+        /* Only adjust byte_cnt 1 time */
+        if ( bytes[0] == 0x66 )     /* operand size prefix */
+        {
+            if ( byte_cnt == 4 )
+                byte_cnt = 2;
+            else
+                byte_cnt = 4;
+        }
+        if ( bytes[i] == 0xed )     /* in (%dx),%eax or in (%dx),%ax */
+            return vmport_ioport(IOREQ_READ, BDOOR_PORT, byte_cnt, &val);
+        else if ( bytes[i] == 0xec )     /* in (%dx),%al */
+            return vmport_ioport(IOREQ_READ, BDOOR_PORT, 1, &val);
+        else if ( bytes[i] == 0xef )     /* out %eax,(%dx) or out %ax,(%dx) */
+            return vmport_ioport(IOREQ_WRITE, BDOOR_PORT, byte_cnt, &val);
+        else if ( bytes[i] == 0xee )     /* out %al,(%dx) */
+            return vmport_ioport(IOREQ_WRITE, BDOOR_PORT, 1, &val);
+        else
+        {
+            *inst_len = 0; /* This is unknown. */
+            return X86EMUL_VMPORT_BAD_OPCODE;
+        }
+    }
+    *inst_len = 0; /* This is unknown. */
+    return X86EMUL_VMPORT_BAD_STATE;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 9d8033e..1bab216 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1102,6 +1102,8 @@ static int construct_vmcs(struct vcpu *v)

      v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK
                | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault))
+              | (v->domain->arch.hvm_domain.is_vmware_port_enabled ?
+                 (1U << TRAP_gp_fault) : 0)
                | (1U << TRAP_no_device);
      vmx_update_exception_bitmap(v);

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 304aeea..300d804 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -44,6 +44,7 @@
  #include <asm/hvm/support.h>
  #include <asm/hvm/vmx/vmx.h>
  #include <asm/hvm/vmx/vmcs.h>
+#include <asm/hvm/vmport.h>
  #include <public/sched.h>
  #include <public/hvm/ioreq.h>
  #include <asm/hvm/vpic.h>
@@ -1276,9 +1277,11 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
                          vmx_set_segment_register(
                              v, s, &v->arch.hvm_vmx.vm86_saved_seg[s]);
                  v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK
-                          | (paging_mode_hap(v->domain) ?
-                             0 : (1U << TRAP_page_fault))
-                          | (1U << TRAP_no_device);
+                    | (paging_mode_hap(v->domain) ?
+                       0 : (1U << TRAP_page_fault))
+                    | (v->domain->arch.hvm_domain.is_vmware_port_enabled ?
+                       (1U << TRAP_gp_fault) : 0)
+                    | (1U << TRAP_no_device);
                  vmx_update_exception_bitmap(v);
                  vmx_update_debug_state(v);
              }
@@ -2589,6 +2592,57 @@ static void vmx_idtv_reinject(unsigned long idtv_info)
      }
  }

+static unsigned long vmx_rip2pointer(struct cpu_user_regs *regs,
+                                     struct vcpu *v)
+{
+    struct segment_register cs;
+    unsigned long p;
+
+    vmx_get_segment_register(v, x86_seg_cs, &cs);
+    p = cs.base + regs->rip;
+    if ( !(cs.attr.fields.l && hvm_long_mode_enabled(v)) )
+        return (uint32_t)p; /* mask to 32 bits */
+    return p;
+}
+
+static void vmx_vmexit_gp_intercept(struct cpu_user_regs *regs,
+                                    struct vcpu *v)
+{
+    unsigned long exit_qualification;
+    unsigned long inst_len;
+    unsigned long inst_addr = vmx_rip2pointer(regs, v);
+    unsigned long ecode;
+    int rc;
+#ifndef NDEBUG
+    unsigned long orig_inst_len;
+    unsigned long vector;
+
+    __vmread(VM_EXIT_INTR_INFO, &vector);
+    BUG_ON(!(vector & INTR_INFO_VALID_MASK));
+    BUG_ON(!(vector & INTR_INFO_DELIVER_CODE_MASK));
+#endif
+
+    __vmread(EXIT_QUALIFICATION, &exit_qualification);
+    __vmread(VM_EXIT_INSTRUCTION_LEN, &inst_len);
+    __vmread(VM_EXIT_INTR_ERROR_CODE, &ecode);
+
+#ifndef NDEBUG
+    orig_inst_len = inst_len;
+#endif
+    rc = vmport_gp_check(regs, v, &inst_len, inst_addr,
+                         ecode, exit_qualification);
+#ifndef NDEBUG
+    if ( inst_len && orig_inst_len != inst_len )
+        gdprintk(XENLOG_WARNING,
+                 "Unexpected instruction length difference: %lu vs %lu\n",
+                 orig_inst_len, inst_len);
+#endif
+    if ( !rc )
+        update_guest_eip();
+    else
+        hvm_inject_hw_exception(TRAP_gp_fault, ecode);
+}
+
  static int vmx_handle_apic_write(void)
  {
      unsigned long exit_qualification;
@@ -2814,6 +2868,9 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
              HVMTRACE_1D(TRAP, vector);
              vmx_fpu_dirty_intercept();
              break;
+        case TRAP_gp_fault:
+            vmx_vmexit_gp_intercept(regs, v);
+            break;
          case TRAP_page_fault:
              __vmread(EXIT_QUALIFICATION, &exit_qualification);
              __vmread(VM_EXIT_INTR_ERROR_CODE, &ecode);
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 9ccc03f..8e07f92 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -24,6 +24,7 @@
  #include <asm/types.h>
  #include <asm/mtrr.h>
  #include <asm/p2m.h>
+#include <asm/hvm/vmport.h>
  #include <asm/hvm/vmx/vmx.h>
  #include <asm/hvm/vmx/vvmx.h>
  #include <asm/hvm/nestedhvm.h>
@@ -2182,6 +2183,8 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
              if ( v->fpu_dirtied )
                  nvcpu->nv_vmexit_pending = 1;
          }
+        else if ( vector == TRAP_gp_fault )
+            nvcpu->nv_vmexit_pending = 1;
          else if ( (intr_info & valid_mask) == valid_mask )
          {
              exec_bitmap =__get_vvmcs(nvcpu->nv_vvmcx, EXCEPTION_BITMAP);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 30c9e50..fad55a2 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -543,6 +543,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
               ~(XEN_DOMCTL_CDF_hvm_guest
                 | XEN_DOMCTL_CDF_pvh_guest
                 | XEN_DOMCTL_CDF_hap
+               | XEN_DOMCTL_CDF_vmware_port
                 | XEN_DOMCTL_CDF_s3_integrity
                 | XEN_DOMCTL_CDF_oos_off)) )
              break;
@@ -586,6 +587,8 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
              domcr_flags |= DOMCRF_s3_integrity;
          if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_oos_off )
              domcr_flags |= DOMCRF_oos_off;
+        if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_vmware_port )
+            domcr_flags |= DOMCRF_vmware_port;

          d = domain_create(dom, domcr_flags, op->u.createdomain.ssidref);
          if ( IS_ERR(d) )
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 2757c7f..d4718df 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -121,6 +121,9 @@ struct hvm_domain {
      spinlock_t             uc_lock;
      bool_t                 is_in_uc_mode;

+    /* VMware backdoor port available */
+    bool_t                 is_vmware_port_enabled;
+
      /* Pass-through */
      struct hvm_iommu       hvm_iommu;

diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 886a9d6..d257161 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -25,7 +25,7 @@
  #include <public/hvm/ioreq.h>
  #include <public/event_channel.h>

-#define MAX_IO_HANDLER             16
+#define MAX_IO_HANDLER             17

  #define HVM_PORTIO                  0
  #define HVM_BUFFERED_IO             2
diff --git a/xen/include/asm-x86/hvm/svm/emulate.h b/xen/include/asm-x86/hvm/svm/emulate.h
index ccc2d3c..d9a9dc5 100644
--- a/xen/include/asm-x86/hvm/svm/emulate.h
+++ b/xen/include/asm-x86/hvm/svm/emulate.h
@@ -44,6 +44,7 @@ enum instruction_index {

  struct vcpu;

+unsigned long svm_rip2pointer(struct vcpu *v);
  int __get_instruction_length_from_list(
      struct vcpu *, const enum instruction_index *, unsigned int list_count);

diff --git a/xen/include/asm-x86/hvm/vmport.h b/xen/include/asm-x86/hvm/vmport.h
new file mode 100644
index 0000000..d037d55
--- /dev/null
+++ b/xen/include/asm-x86/hvm/vmport.h
@@ -0,0 +1,52 @@
+/*
+ * asm/hvm/vmport.h: HVM VMPORT emulation
+ *
+ *
+ * Copyright (C) 2012 Verizon Corporation
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License Version 2 (GPLv2)
+ * as published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details. <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef ASM_X86_HVM_VMPORT_H__
+#define ASM_X86_HVM_VMPORT_H__
+
+void vmport_register(struct domain *d);
+int vmport_ioport(int dir, uint32_t port, uint32_t bytes, uint32_t *val);
+int vmport_gp_check(struct cpu_user_regs *regs, struct vcpu *v,
+                    unsigned long *inst_len, unsigned long inst_addr,
+                    unsigned long ei1, unsigned long ei2);
+/*
+ * Additional return values from vmport_gp_check.
+ *
+ * Note: return values include:
+ *   X86EMUL_OKAY
+ *   X86EMUL_UNHANDLEABLE
+ *   X86EMUL_EXCEPTION
+ *   X86EMUL_RETRY
+ *   X86EMUL_CMPXCHG_FAILED
+ *
+ * The additional values do not overlap any of the above.
+ */
+#define X86EMUL_VMPORT_NOT_ENABLED              10
+#define X86EMUL_VMPORT_FETCH_ERROR_BYTE1        11
+#define X86EMUL_VMPORT_FETCH_ERROR_BYTE2        12
+#define X86EMUL_VMPORT_BAD_OPCODE               13
+#define X86EMUL_VMPORT_BAD_STATE                14
+
+#endif /* ASM_X86_HVM_VMPORT_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 61f7555..2b38515 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -63,6 +63,9 @@ struct xen_domctl_createdomain {
   /* Is this a PVH guest (as opposed to an HVM or PV guest)? */
  #define _XEN_DOMCTL_CDF_pvh_guest     4
  #define XEN_DOMCTL_CDF_pvh_guest (1U<<_XEN_DOMCTL_CDF_pvh_guest)
+ /* Is VMware backdoor port available? */
+#define _XEN_DOMCTL_CDF_vmware_port   5
+#define XEN_DOMCTL_CDF_vmware_port (1U<<_XEN_DOMCTL_CDF_vmware_port)
      uint32_t flags;
  };
  typedef struct xen_domctl_createdomain xen_domctl_createdomain_t;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index c5157e6..d741978 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -546,6 +546,9 @@ struct domain *domain_create(
   /* DOMCRF_pvh: Create PV domain in HVM container. */
  #define _DOMCRF_pvh             5
  #define DOMCRF_pvh              (1U<<_DOMCRF_pvh)
+ /* DOMCRF_vmware_port: Enable use of vmware backdoor port. */
+#define _DOMCRF_vmware_port     6
+#define DOMCRF_vmware_port      (1U<<_DOMCRF_vmware_port)

  /*
   * rcu_lock_domain_by_id() is more efficient than get_domain_by_id().
-- 
1.8.4


[-- Attachment #2: 0004-xen-Add-vmware_port-support.patch --]
[-- Type: text/x-patch, Size: 27185 bytes --]

From 4db1093d0b420cc54258c0db03d991fa3b3acd7f Mon Sep 17 00:00:00 2001
From: Don Slutz <dslutz@verizon.com>
Date: Thu, 21 Nov 2013 15:01:08 -0500
Subject: [PATCH for-4.5 v7 4/7] xen: Add vmware_port support

This includes adding is_vmware_port_enabled.

This is a new domain_create() flag, DOMCRF_vmware_port.  It is
passed to domctl as XEN_DOMCTL_CDF_vmware_port.
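
As a rough sketch of the flag plumbing described above (not code from
this patch: the helper names are hypothetical, only the two bit
positions are taken from the defines the patch adds), the translation
looks like this:

```c
#include <stdint.h>

/* Bit positions as defined by this patch. */
#define XEN_DOMCTL_CDF_vmware_port (1U << 5)   /* public domctl flag */
#define DOMCRF_vmware_port         (1U << 6)   /* domain_create() flag */

/*
 * Hypothetical helper: do_domctl() performs this translation inline,
 * one "if" per flag.
 */
static uint32_t cdf_to_domcrf_vmware(uint32_t cdf_flags)
{
    uint32_t domcr_flags = 0;

    if ( cdf_flags & XEN_DOMCTL_CDF_vmware_port )
        domcr_flags |= DOMCRF_vmware_port;
    return domcr_flags;
}

/* Hypothetical helper mirroring the assignment in arch_domain_create(). */
static int vmware_port_enabled(uint32_t domcr_flags)
{
    return !!(domcr_flags & DOMCRF_vmware_port);
}
```

Note that the two flags deliberately use different bit numbers, so the
value cannot simply be copied across.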

This enables limited support of VMware's hyper-call.

This support is in some ways more complete than what QEMU and/or KVM
currently provide, and in other ways less.  The missing part requires
QEMU changes and has been left out until the QEMU patches are
accepted upstream.

VMware's hyper-call is also known as VMware Backdoor I/O Port.

Note: this support does not depend on vmware_hw being non-zero.

In summary, VMware treats "in (%dx),%eax" (or "out %eax,(%dx)")
to port 0x5658 specially.  Note: since many operations return data
in EAX, "in (%dx),%eax" is the one to use.  The other lengths, like
"in (%dx),%al", still work, but only the AL part of EAX is changed.
For "out %eax,(%dx)" of any length, EAX remains unchanged.
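
The effect of the access width on EAX can be sketched as follows.
This is an illustration only: merge_in_result() is a hypothetical
helper, not a function in this patch, and it models just the low
32 bits of the register.

```c
#include <stdint.h>

/*
 * Hypothetical helper: combine the EAX value the guest had before the
 * "in" with the 32-bit result the backdoor command produced, according
 * to the width of the "in" instruction.
 */
static uint32_t merge_in_result(uint32_t saved_eax, uint32_t result,
                                unsigned int bytes)
{
    switch ( bytes )
    {
    case 1:  /* in (%dx),%al: only AL changes */
        return (saved_eax & 0xffffff00u) | (result & 0xffu);
    case 2:  /* in (%dx),%ax: only AX changes */
        return (saved_eax & 0xffff0000u) | (result & 0xffffu);
    default: /* in (%dx),%eax: all of EAX is replaced */
        return result;
    }
}
```

For example, with EAX holding the magic value on entry and a command
that returns 6, a byte-wide "in" leaves the upper three bytes of the
magic value in place.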

This instruction is also allowed to be used from ring 3, where a
port access can raise #GP.  To support this, the vmexit for #GP
needs to be enabled.  I have not fully tested that nested HVM is
doing the right thing for this.
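
A minimal sketch of the register test a #GP intercept can apply before
treating the fault as a backdoor access is shown below.  It is an
illustration under assumptions: looks_like_backdoor() is a hypothetical
name, and the BDOOR_MAGIC value is assumed to match backdoor_def.h.

```c
#include <stdint.h>

#define BDOOR_PORT  0x5658u      /* port named in the text above */
#define BDOOR_MAGIC 0x564D5868u  /* "VMXh"; assumed to match backdoor_def.h */

/*
 * Hypothetical filter: non-zero only if the faulting state looks like
 * a backdoor access (DX holds the port, EAX holds the magic value and
 * the #GP error code is 0).  Anything else should be re-injected into
 * the guest as an ordinary #GP.
 */
static int looks_like_backdoor(uint64_t rdx, uint32_t eax,
                               unsigned long ecode)
{
    return (rdx & 0xffff) == BDOOR_PORT &&
           eax == BDOOR_MAGIC &&
           ecode == 0;
}
```

Only the low 16 bits of RDX are compared, since the port number lives
in DX.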

An open source example of using this is:

http://open-vm-tools.sourceforge.net/

Which only uses "inl (%dx)".  Also

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458

The support included is enough to allow VMware tools to install in an
HVM domU.

For AMD (svm) the max instruction length of 15 is hard coded.  This
is because __get_instruction_length_from_list() has issues: when
called from the #GP handler NRIP is not available, and NRIP may not
be available at all on particular hardware.  That leads to the need
to read the instruction twice, once in
__get_instruction_length_from_list() and again in vmport_gp_check(),
which is bad because memory may change between the reads.
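
The single-read approach can be illustrated with a small decoder that
works on one snapshot of up to 15 instruction bytes and derives both
the length and the operand size from it.  This is a sketch, not the
code in vmport_gp_check(): decode_io(), struct io_insn and the
default_size parameter (2 or 4, depending on the guest mode) are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_INST_LEN 15

struct io_insn {
    unsigned int len;    /* total instruction length, 0 if not recognised */
    unsigned int bytes;  /* operand size: 1, 2 or 4 */
    int is_in;           /* 1 for "in", 0 for "out" */
};

/* Decode "in"/"out" via (%dx) from a single copy of the guest bytes. */
static struct io_insn decode_io(const uint8_t *buf, size_t avail,
                                unsigned int default_size)
{
    struct io_insn r = { 0, 0, 0 };
    size_t i = 0;

    if ( avail > MAX_INST_LEN )
        avail = MAX_INST_LEN;

    /* Skip operand size prefixes, never running past the snapshot. */
    while ( i < avail && buf[i] == 0x66 )
        i++;
    if ( i >= avail )
        return r;

    switch ( buf[i] )
    {
    case 0xec: r.bytes = 1; r.is_in = 1; break;   /* in (%dx),%al */
    case 0xee: r.bytes = 1; r.is_in = 0; break;   /* out %al,(%dx) */
    case 0xed:                                    /* in (%dx),%eax or %ax */
    case 0xef:                                    /* out %eax or %ax,(%dx) */
        r.is_in = (buf[i] == 0xed);
        /* A prefix toggles the size once, however often it is repeated. */
        if ( i > 0 )
            r.bytes = (default_size == 4) ? 2 : 4;
        else
            r.bytes = default_size;
        break;
    default:
        return r;                                 /* unknown opcode */
    }
    r.len = (unsigned int)i + 1;
    return r;
}
```

Because the length comes out of the same bytes that are decoded, the
guest cannot change the instruction between a length calculation and
the decode.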

Signed-off-by: Don Slutz <dslutz@verizon.com>
---
v8:
    Switch to _ebx etc.

v7:
      More on AMD in the commit message.
      Switch to only change 32bit part of registers, what VMware
        does.
    Too much logging and tracing.
      Dropped a lot of it.  This includes vmport_debug=

v6:
      Dropped the attempt to use svm_nextrip_insn_length via
      __get_instruction_length (added in v2).  Just always look
      at up to 15 bytes on AMD.

v5:
      we should make sure that svm_vmexit_gp_intercept is not executed for
      any other guest.
        Added an ASSERT on is_vmware_port_enabled.
      magic integers?
        Added #define for them.
      I am fairly certain that you need some brackets here.
        Added brackets.

 xen/arch/x86/domain.c                 |   2 +
 xen/arch/x86/hvm/hvm.c                |   4 +
 xen/arch/x86/hvm/svm/emulate.c        |   2 +-
 xen/arch/x86/hvm/svm/svm.c            |  30 ++++
 xen/arch/x86/hvm/svm/vmcb.c           |   2 +
 xen/arch/x86/hvm/vmware/Makefile      |   1 +
 xen/arch/x86/hvm/vmware/vmport.c      | 262 ++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmcs.c           |   2 +
 xen/arch/x86/hvm/vmx/vmx.c            |  63 +++++++-
 xen/arch/x86/hvm/vmx/vvmx.c           |   3 +
 xen/common/domctl.c                   |   3 +
 xen/include/asm-x86/hvm/domain.h      |   3 +
 xen/include/asm-x86/hvm/io.h          |   2 +-
 xen/include/asm-x86/hvm/svm/emulate.h |   1 +
 xen/include/asm-x86/hvm/vmport.h      |  52 +++++++
 xen/include/public/domctl.h           |   3 +
 xen/include/xen/sched.h               |   3 +
 17 files changed, 433 insertions(+), 5 deletions(-)
 create mode 100644 xen/arch/x86/hvm/vmware/vmport.c
 create mode 100644 xen/include/asm-x86/hvm/vmport.h

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 8cfd1ca..a71da52 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -524,6 +524,8 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     d->arch.hvm_domain.mem_sharing_enabled = 0;
 
     d->arch.s3_integrity = !!(domcr_flags & DOMCRF_s3_integrity);
+    d->arch.hvm_domain.is_vmware_port_enabled =
+        !!(domcr_flags & DOMCRF_vmware_port);
 
     INIT_LIST_HEAD(&d->arch.pdev_list);
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 4039061..1357079 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -61,6 +61,7 @@
 #include <asm/hvm/trace.h>
 #include <asm/hvm/nestedhvm.h>
 #include <asm/hvm/vmware.h>
+#include <asm/hvm/vmport.h>
 #include <asm/mtrr.h>
 #include <asm/apic.h>
 #include <public/sched.h>
@@ -1444,6 +1445,9 @@ int hvm_domain_initialise(struct domain *d)
         goto fail1;
     d->arch.hvm_domain.io_handler->num_slot = 0;
 
+    if ( d->arch.hvm_domain.is_vmware_port_enabled )
+        vmport_register(d);
+
     if ( is_pvh_domain(d) )
     {
         register_portio_handler(d, 0, 0x10003, handle_pvh_io);
diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c
index 37a1ece..cfad9ab 100644
--- a/xen/arch/x86/hvm/svm/emulate.c
+++ b/xen/arch/x86/hvm/svm/emulate.c
@@ -50,7 +50,7 @@ static unsigned int is_prefix(u8 opc)
     return 0;
 }
 
-static unsigned long svm_rip2pointer(struct vcpu *v)
+unsigned long svm_rip2pointer(struct vcpu *v)
 {
     struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
     unsigned long p = vmcb->cs.base + guest_cpu_user_regs()->eip;
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index e3e1565..d7f13d9 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -59,6 +59,7 @@
 #include <public/sched.h>
 #include <asm/hvm/vpt.h>
 #include <asm/hvm/trace.h>
+#include <asm/hvm/vmport.h>
 #include <asm/hap.h>
 #include <asm/apic.h>
 #include <asm/debugger.h>
@@ -2111,6 +2112,31 @@ svm_vmexit_do_vmsave(struct vmcb_struct *vmcb,
     return;
 }
 
+static void svm_vmexit_gp_intercept(struct cpu_user_regs *regs,
+                                    struct vcpu *v)
+{
+    struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
+    /*
+     * Just use 15 for the instruction length; vmport_gp_check will
+     * adjust it.  This is because
+     * __get_instruction_length_from_list() has issues, and may
+     * require a double read of the instruction bytes.  At some
+     * point a new routine could be added that is based on the code
+     * in vmport_gp_check with extensions to make it more general.
+     * Since that routine is the only user of this code this can be
+     * done later.
+     */
+    unsigned long inst_len = 15;
+    unsigned long inst_addr = svm_rip2pointer(v);
+    int rc = vmport_gp_check(regs, v, &inst_len, inst_addr,
+                             vmcb->exitinfo1, vmcb->exitinfo2);
+
+    if ( !rc )
+        __update_guest_eip(regs, inst_len);
+    else
+        hvm_inject_hw_exception(TRAP_gp_fault, vmcb->exitinfo1);
+}
+
 static void svm_vmexit_ud_intercept(struct cpu_user_regs *regs)
 {
     struct hvm_emulate_ctxt ctxt;
@@ -2471,6 +2497,10 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
         break;
     }
 
+    case VMEXIT_EXCEPTION_GP:
+        svm_vmexit_gp_intercept(regs, v);
+        break;
+
     case VMEXIT_EXCEPTION_UD:
         svm_vmexit_ud_intercept(regs);
         break;
diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
index 21292bb..45ead61 100644
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -195,6 +195,8 @@ static int construct_vmcb(struct vcpu *v)
         HVM_TRAP_MASK
         | (1U << TRAP_no_device);
 
+    if ( v->domain->arch.hvm_domain.is_vmware_port_enabled )
+        vmcb->_exception_intercepts |= 1U << TRAP_gp_fault;
     if ( paging_mode_hap(v->domain) )
     {
         vmcb->_np_enable = 1; /* enable nested paging */
diff --git a/xen/arch/x86/hvm/vmware/Makefile b/xen/arch/x86/hvm/vmware/Makefile
index 3fb2e0b..cd8815b 100644
--- a/xen/arch/x86/hvm/vmware/Makefile
+++ b/xen/arch/x86/hvm/vmware/Makefile
@@ -1 +1,2 @@
 obj-y += cpuid.o
+obj-y += vmport.o
diff --git a/xen/arch/x86/hvm/vmware/vmport.c b/xen/arch/x86/hvm/vmware/vmport.c
new file mode 100644
index 0000000..183bb7e
--- /dev/null
+++ b/xen/arch/x86/hvm/vmware/vmport.c
@@ -0,0 +1,262 @@
+/*
+ * HVM VMPORT emulation
+ *
+ * Copyright (C) 2012 Verizon Corporation
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License Version 2 (GPLv2)
+ * as published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details. <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/config.h>
+#include <xen/lib.h>
+#include <asm/hvm/hvm.h>
+#include <asm/hvm/support.h>
+#include <asm/hvm/vmport.h>
+
+#include "backdoor_def.h"
+
+#define MAX_INST_LEN 15
+
+#ifndef NDEBUG
+unsigned int opt_vmport_debug __read_mostly;
+integer_param("vmport_debug", opt_vmport_debug);
+#endif
+
+/* More VMware defines */
+
+#define VMWARE_GUI_AUTO_GRAB              0x001
+#define VMWARE_GUI_AUTO_UNGRAB            0x002
+#define VMWARE_GUI_AUTO_SCROLL            0x004
+#define VMWARE_GUI_AUTO_RAISE             0x008
+#define VMWARE_GUI_EXCHANGE_SELECTIONS    0x010
+#define VMWARE_GUI_WARP_CURSOR_ON_UNGRAB  0x020
+#define VMWARE_GUI_FULL_SCREEN            0x040
+
+#define VMWARE_GUI_TO_FULL_SCREEN         0x080
+#define VMWARE_GUI_TO_WINDOW              0x100
+
+#define VMWARE_GUI_AUTO_RAISE_DISABLED    0x200
+
+#define VMWARE_GUI_SYNC_TIME              0x400
+
+/* When set, toolboxes should not show the cursor options page. */
+#define VMWARE_DISABLE_CURSOR_OPTIONS     0x800
+
+void vmport_register(struct domain *d)
+{
+    register_portio_handler(d, BDOOR_PORT, 4, vmport_ioport);
+}
+
+int vmport_ioport(int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    uint16_t cmd = regs->rcx;
+    int rc = X86EMUL_OKAY;
+
+    if ( regs->_eax == BDOOR_MAGIC )
+    {
+        uint64_t saved_rax = regs->rax;
+        uint64_t value;
+        struct vcpu *curr = current;
+        struct domain *d = curr->domain;
+        struct segment_register sreg;
+
+        switch ( cmd )
+        {
+        case BDOOR_CMD_GETMHZ:
+            regs->_eax = d->arch.tsc_khz / 1000;
+            break;
+        case BDOOR_CMD_GETVERSION:
+            /* MAGIC */
+            regs->_ebx = BDOOR_MAGIC;
+            /* VERSION_MAGIC */
+            regs->_eax = 6;
+            /* Claim we are an ESX. VMX_TYPE_SCALABLE_SERVER */
+            regs->_ecx = 2;
+            break;
+        case BDOOR_CMD_GETSCREENSIZE:
+            /* We have no screen size */
+            regs->_eax = ~0u;
+            break;
+        case BDOOR_CMD_GETHWVERSION:
+            /* vmware_hw */
+            regs->_eax = 0;
+            if ( is_hvm_vcpu(curr) )
+            {
+                struct hvm_domain *hd = &d->arch.hvm_domain;
+
+                regs->_eax = hd->params[HVM_PARAM_VMWARE_HW];
+            }
+            if ( !regs->_eax )
+                regs->_eax = 4;  /* Act like version 4 */
+            break;
+        case BDOOR_CMD_GETHZ:
+            hvm_get_segment_register(curr, x86_seg_ss, &sreg);
+            if ( sreg.attr.fields.dpl == 0 )
+            {
+                value = d->arch.tsc_khz * 1000;
+                /* apic-frequency (bus speed) */
+                regs->_ecx = 1000000000ULL / APIC_BUS_CYCLE_NS;
+                /* High part of tsc-frequency */
+                regs->_ebx = value >> 32;
+                /* Low part of tsc-frequency */
+                regs->_eax = value;
+            }
+            break;
+        case BDOOR_CMD_GETTIME:
+            value = get_localtime_us(d) - d->time_offset_seconds * 1000000ULL;
+            /* hostUsecs */
+            regs->_ebx = value % 1000000UL;
+            /* hostSecs */
+            regs->_eax = value / 1000000ULL;
+            /* maxTimeLag */
+            regs->_ecx = 1000000;
+            /* offset to GMT in minutes */
+            regs->_edx = d->time_offset_seconds / 60;
+            break;
+        case BDOOR_CMD_GETTIMEFULL:
+            value = get_localtime_us(d) - d->time_offset_seconds * 1000000ULL;
+            /* ... */
+            regs->_eax = BDOOR_MAGIC;
+            /* hostSecs (value is in microseconds) */
+            regs->_ebx = value / 1000000ULL;
+            /* maxTimeLag */
+            regs->_ecx = 1000000;
+            break;
+        case BDOOR_CMD_GETGUIOPTIONS:
+            regs->_eax = VMWARE_GUI_AUTO_GRAB | VMWARE_GUI_AUTO_UNGRAB |
+                VMWARE_GUI_AUTO_RAISE_DISABLED | VMWARE_GUI_SYNC_TIME |
+                VMWARE_DISABLE_CURSOR_OPTIONS;
+            break;
+        case BDOOR_CMD_SETGUIOPTIONS:
+            regs->_eax = 0x0;
+            break;
+        default:
+            regs->_eax = ~0u;
+            break;
+        }
+        if ( dir == IOREQ_READ )
+        {
+            switch ( bytes )
+            {
+            case 1:
+                regs->rax = (saved_rax & 0xffffff00) | (regs->rax & 0xff);
+                break;
+            case 2:
+                regs->rax = (saved_rax & 0xffff0000) | (regs->rax & 0xffff);
+                break;
+            case 4:
+                regs->rax = regs->_eax;
+                break;
+            }
+            *val = regs->rax;
+        }
+        else
+            regs->rax = saved_rax;
+    }
+    else
+        rc = X86EMUL_UNHANDLEABLE;
+
+    return rc;
+}
+
+int vmport_gp_check(struct cpu_user_regs *regs, struct vcpu *v,
+                    unsigned long *inst_len, unsigned long inst_addr,
+                    unsigned long ei1, unsigned long ei2)
+{
+    if ( !v->domain->arch.hvm_domain.is_vmware_port_enabled )
+        return X86EMUL_VMPORT_NOT_ENABLED;
+
+    if ( *inst_len && *inst_len <= MAX_INST_LEN &&
+         (regs->rdx & 0xffff) == BDOOR_PORT && ei1 == 0 && ei2 == 0 &&
+         regs->_eax == BDOOR_MAGIC )
+    {
+        int i = 0;
+        uint32_t val;
+        uint32_t byte_cnt = hvm_guest_x86_mode(v);
+        unsigned char bytes[MAX_INST_LEN];
+        unsigned int fetch_len;
+        int frc;
+
+        /* in or out are limited to 32bits */
+        if ( byte_cnt > 4 )
+            byte_cnt = 4;
+
+        /*
+         * Fetch up to the next page break; we'll fetch from the
+         * next page later if we have to.
+         */
+        fetch_len = min_t(unsigned int, *inst_len,
+                          PAGE_SIZE - (inst_addr  & ~PAGE_MASK));
+        frc = hvm_fetch_from_guest_virt_nofault(bytes, inst_addr, fetch_len,
+                                                PFEC_page_present);
+        if ( frc != HVMCOPY_okay )
+        {
+            gdprintk(XENLOG_WARNING,
+                     "Bad instruction fetch at %#lx (frc=%d il=%lu fl=%u)\n",
+                     (unsigned long) inst_addr, frc, *inst_len, fetch_len);
+            return X86EMUL_VMPORT_FETCH_ERROR_BYTE1;
+        }
+
+        /* Check for operand size prefix */
+        while ( (i < MAX_INST_LEN - 1) && (bytes[i] == 0x66) )
+        {
+            i++;
+            if ( i >= fetch_len )
+            {
+                frc = hvm_fetch_from_guest_virt_nofault(
+                    &bytes[fetch_len], inst_addr + fetch_len,
+                    MAX_INST_LEN - fetch_len, PFEC_page_present);
+                if ( frc != HVMCOPY_okay )
+                {
+                    gdprintk(XENLOG_WARNING,
+                             "Bad instruction fetch at %#lx + %#x (frc=%d)\n",
+                             inst_addr, fetch_len, frc);
+                    return X86EMUL_VMPORT_FETCH_ERROR_BYTE2;
+                }
+                fetch_len = MAX_INST_LEN;
+            }
+        }
+        *inst_len = i + 1;
+
+        /* Only adjust byte_cnt 1 time */
+        if ( bytes[0] == 0x66 )     /* operand size prefix */
+        {
+            if ( byte_cnt == 4 )
+                byte_cnt = 2;
+            else
+                byte_cnt = 4;
+        }
+        if ( bytes[i] == 0xed )     /* in (%dx),%eax or in (%dx),%ax */
+            return vmport_ioport(IOREQ_READ, BDOOR_PORT, byte_cnt, &val);
+        else if ( bytes[i] == 0xec )     /* in (%dx),%al */
+            return vmport_ioport(IOREQ_READ, BDOOR_PORT, 1, &val);
+        else if ( bytes[i] == 0xef )     /* out %eax,(%dx) or out %ax,(%dx) */
+            return vmport_ioport(IOREQ_WRITE, BDOOR_PORT, byte_cnt, &val);
+        else if ( bytes[i] == 0xee )     /* out %al,(%dx) */
+            return vmport_ioport(IOREQ_WRITE, BDOOR_PORT, 1, &val);
+        else
+        {
+            *inst_len = 0; /* This is unknown. */
+            return X86EMUL_VMPORT_BAD_OPCODE;
+        }
+    }
+    *inst_len = 0; /* This is unknown. */
+    return X86EMUL_VMPORT_BAD_STATE;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 9d8033e..1bab216 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1102,6 +1102,8 @@ static int construct_vmcs(struct vcpu *v)
 
     v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK
               | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault))
+              | (v->domain->arch.hvm_domain.is_vmware_port_enabled ?
+                 (1U << TRAP_gp_fault) : 0)
               | (1U << TRAP_no_device);
     vmx_update_exception_bitmap(v);
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 304aeea..300d804 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -44,6 +44,7 @@
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/vmx/vmcs.h>
+#include <asm/hvm/vmport.h>
 #include <public/sched.h>
 #include <public/hvm/ioreq.h>
 #include <asm/hvm/vpic.h>
@@ -1276,9 +1277,11 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
                         vmx_set_segment_register(
                             v, s, &v->arch.hvm_vmx.vm86_saved_seg[s]);
                 v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK
-                          | (paging_mode_hap(v->domain) ?
-                             0 : (1U << TRAP_page_fault))
-                          | (1U << TRAP_no_device);
+                    | (paging_mode_hap(v->domain) ?
+                       0 : (1U << TRAP_page_fault))
+                    | (v->domain->arch.hvm_domain.is_vmware_port_enabled ?
+                       (1U << TRAP_gp_fault) : 0)
+                    | (1U << TRAP_no_device);
                 vmx_update_exception_bitmap(v);
                 vmx_update_debug_state(v);
             }
@@ -2589,6 +2592,57 @@ static void vmx_idtv_reinject(unsigned long idtv_info)
     }
 }
 
+static unsigned long vmx_rip2pointer(struct cpu_user_regs *regs,
+                                     struct vcpu *v)
+{
+    struct segment_register cs;
+    unsigned long p;
+
+    vmx_get_segment_register(v, x86_seg_cs, &cs);
+    p = cs.base + regs->rip;
+    if ( !(cs.attr.fields.l && hvm_long_mode_enabled(v)) )
+        return (uint32_t)p; /* mask to 32 bits */
+    return p;
+}
+
+static void vmx_vmexit_gp_intercept(struct cpu_user_regs *regs,
+                                    struct vcpu *v)
+{
+    unsigned long exit_qualification;
+    unsigned long inst_len;
+    unsigned long inst_addr = vmx_rip2pointer(regs, v);
+    unsigned long ecode;
+    int rc;
+#ifndef NDEBUG
+    unsigned long orig_inst_len;
+    unsigned long vector;
+
+    __vmread(VM_EXIT_INTR_INFO, &vector);
+    BUG_ON(!(vector & INTR_INFO_VALID_MASK));
+    BUG_ON(!(vector & INTR_INFO_DELIVER_CODE_MASK));
+#endif
+
+    __vmread(EXIT_QUALIFICATION, &exit_qualification);
+    __vmread(VM_EXIT_INSTRUCTION_LEN, &inst_len);
+    __vmread(VM_EXIT_INTR_ERROR_CODE, &ecode);
+
+#ifndef NDEBUG
+    orig_inst_len = inst_len;
+#endif
+    rc = vmport_gp_check(regs, v, &inst_len, inst_addr,
+                         ecode, exit_qualification);
+#ifndef NDEBUG
+    if ( inst_len && orig_inst_len != inst_len )
+        gdprintk(XENLOG_WARNING,
+                 "Unexpected instruction length difference: %lu vs %lu\n",
+                 orig_inst_len, inst_len);
+#endif
+    if ( !rc )
+        update_guest_eip();
+    else
+        hvm_inject_hw_exception(TRAP_gp_fault, ecode);
+}
+
 static int vmx_handle_apic_write(void)
 {
     unsigned long exit_qualification;
@@ -2814,6 +2868,9 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             HVMTRACE_1D(TRAP, vector);
             vmx_fpu_dirty_intercept();
             break;
+        case TRAP_gp_fault:
+            vmx_vmexit_gp_intercept(regs, v);
+            break;
         case TRAP_page_fault:
             __vmread(EXIT_QUALIFICATION, &exit_qualification);
             __vmread(VM_EXIT_INTR_ERROR_CODE, &ecode);
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 9ccc03f..8e07f92 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -24,6 +24,7 @@
 #include <asm/types.h>
 #include <asm/mtrr.h>
 #include <asm/p2m.h>
+#include <asm/hvm/vmport.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/vmx/vvmx.h>
 #include <asm/hvm/nestedhvm.h>
@@ -2182,6 +2183,8 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
             if ( v->fpu_dirtied )
                 nvcpu->nv_vmexit_pending = 1;
         }
+        else if ( vector == TRAP_gp_fault )
+            nvcpu->nv_vmexit_pending = 1;
         else if ( (intr_info & valid_mask) == valid_mask )
         {
             exec_bitmap =__get_vvmcs(nvcpu->nv_vvmcx, EXCEPTION_BITMAP);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 30c9e50..fad55a2 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -543,6 +543,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
              ~(XEN_DOMCTL_CDF_hvm_guest
                | XEN_DOMCTL_CDF_pvh_guest
                | XEN_DOMCTL_CDF_hap
+               | XEN_DOMCTL_CDF_vmware_port
                | XEN_DOMCTL_CDF_s3_integrity
                | XEN_DOMCTL_CDF_oos_off)) )
             break;
@@ -586,6 +587,8 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
             domcr_flags |= DOMCRF_s3_integrity;
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_oos_off )
             domcr_flags |= DOMCRF_oos_off;
+        if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_vmware_port )
+            domcr_flags |= DOMCRF_vmware_port;
 
         d = domain_create(dom, domcr_flags, op->u.createdomain.ssidref);
         if ( IS_ERR(d) )
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 2757c7f..d4718df 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -121,6 +121,9 @@ struct hvm_domain {
     spinlock_t             uc_lock;
     bool_t                 is_in_uc_mode;
 
+    /* VMware backdoor port available */
+    bool_t                 is_vmware_port_enabled;
+
     /* Pass-through */
     struct hvm_iommu       hvm_iommu;
 
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 886a9d6..d257161 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -25,7 +25,7 @@
 #include <public/hvm/ioreq.h>
 #include <public/event_channel.h>
 
-#define MAX_IO_HANDLER             16
+#define MAX_IO_HANDLER             17
 
 #define HVM_PORTIO                  0
 #define HVM_BUFFERED_IO             2
diff --git a/xen/include/asm-x86/hvm/svm/emulate.h b/xen/include/asm-x86/hvm/svm/emulate.h
index ccc2d3c..d9a9dc5 100644
--- a/xen/include/asm-x86/hvm/svm/emulate.h
+++ b/xen/include/asm-x86/hvm/svm/emulate.h
@@ -44,6 +44,7 @@ enum instruction_index {
 
 struct vcpu;
 
+unsigned long svm_rip2pointer(struct vcpu *v);
 int __get_instruction_length_from_list(
     struct vcpu *, const enum instruction_index *, unsigned int list_count);
 
diff --git a/xen/include/asm-x86/hvm/vmport.h b/xen/include/asm-x86/hvm/vmport.h
new file mode 100644
index 0000000..d037d55
--- /dev/null
+++ b/xen/include/asm-x86/hvm/vmport.h
@@ -0,0 +1,52 @@
+/*
+ * asm/hvm/vmport.h: HVM VMPORT emulation
+ *
+ *
+ * Copyright (C) 2012 Verizon Corporation
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License Version 2 (GPLv2)
+ * as published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details. <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef ASM_X86_HVM_VMPORT_H__
+#define ASM_X86_HVM_VMPORT_H__
+
+void vmport_register(struct domain *d);
+int vmport_ioport(int dir, uint32_t port, uint32_t bytes, uint32_t *val);
+int vmport_gp_check(struct cpu_user_regs *regs, struct vcpu *v,
+                    unsigned long *inst_len, unsigned long inst_addr,
+                    unsigned long ei1, unsigned long ei2);
+/*
+ * Additional return values from vmport_gp_check.
+ *
+ * Note: return values include:
+ *   X86EMUL_OKAY
+ *   X86EMUL_UNHANDLEABLE
+ *   X86EMUL_EXCEPTION
+ *   X86EMUL_RETRY
+ *   X86EMUL_CMPXCHG_FAILED
+ *
+ * The additional do not overlap any of the above.
+ */
+#define X86EMUL_VMPORT_NOT_ENABLED              10
+#define X86EMUL_VMPORT_FETCH_ERROR_BYTE1        11
+#define X86EMUL_VMPORT_FETCH_ERROR_BYTE2        12
+#define X86EMUL_VMPORT_BAD_OPCODE               13
+#define X86EMUL_VMPORT_BAD_STATE                14
+
+#endif /* ASM_X86_HVM_VMPORT_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 61f7555..2b38515 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -63,6 +63,9 @@ struct xen_domctl_createdomain {
  /* Is this a PVH guest (as opposed to an HVM or PV guest)? */
 #define _XEN_DOMCTL_CDF_pvh_guest     4
 #define XEN_DOMCTL_CDF_pvh_guest      (1U<<_XEN_DOMCTL_CDF_pvh_guest)
+ /* Is VMware backdoor port available? */
+#define _XEN_DOMCTL_CDF_vmware_port   5
+#define XEN_DOMCTL_CDF_vmware_port    (1U<<_XEN_DOMCTL_CDF_vmware_port)
     uint32_t flags;
 };
 typedef struct xen_domctl_createdomain xen_domctl_createdomain_t;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index c5157e6..d741978 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -546,6 +546,9 @@ struct domain *domain_create(
  /* DOMCRF_pvh: Create PV domain in HVM container. */
 #define _DOMCRF_pvh             5
 #define DOMCRF_pvh              (1U<<_DOMCRF_pvh)
+ /* DOMCRF_vmware_port: Enable use of vmware backdoor port. */
+#define _DOMCRF_vmware_port     6
+#define DOMCRF_vmware_port      (1U<<_DOMCRF_vmware_port)
 
 /*
  * rcu_lock_domain_by_id() is more efficient than get_domain_by_id().
-- 
1.8.4
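
For readers of vmport.h above: a minimal sketch of how a caller might tell the vmport-only return values apart from the generic X86EMUL_* codes. Only the five numeric values are taken from the header in this patch; the helper names is_vmport_rc() and vmport_rc_name() are hypothetical and do not exist in the series.

```c
#include <stddef.h>

/* Values copied from xen/include/asm-x86/hvm/vmport.h in this patch. */
#define X86EMUL_VMPORT_NOT_ENABLED              10
#define X86EMUL_VMPORT_FETCH_ERROR_BYTE1        11
#define X86EMUL_VMPORT_FETCH_ERROR_BYTE2        12
#define X86EMUL_VMPORT_BAD_OPCODE               13
#define X86EMUL_VMPORT_BAD_STATE                14

/* Hypothetical helper: non-zero if rc is one of the vmport-only codes. */
static int is_vmport_rc(int rc)
{
    return rc >= X86EMUL_VMPORT_NOT_ENABLED &&
           rc <= X86EMUL_VMPORT_BAD_STATE;
}

/* Hypothetical helper: short description, or NULL for any other code. */
static const char *vmport_rc_name(int rc)
{
    switch ( rc )
    {
    case X86EMUL_VMPORT_NOT_ENABLED:       return "vmport not enabled";
    case X86EMUL_VMPORT_FETCH_ERROR_BYTE1: return "fetch error, byte 1";
    case X86EMUL_VMPORT_FETCH_ERROR_BYTE2: return "fetch error, byte 2";
    case X86EMUL_VMPORT_BAD_OPCODE:        return "bad opcode";
    case X86EMUL_VMPORT_BAD_STATE:         return "bad state";
    default:                               return NULL;
    }
}
```

Because the vmport values start at 10, a range check is enough to separate them from the generic codes listed in the header comment.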


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH for-4.5 v8 2/7] tools: Add vmware_hw support
  2014-10-02 22:21   ` Andrew Cooper
@ 2014-10-02 22:56     ` Don Slutz
  0 siblings, 0 replies; 37+ messages in thread
From: Don Slutz @ 2014-10-02 22:56 UTC (permalink / raw)
  To: Andrew Cooper, Don Slutz, xen-devel
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Eddie Dong, Ian Jackson, Tim Deegan, George Dunlap,
	Aravind Gopalakrishnan, Jan Beulich, Boris Ostrovsky,
	Suravee Suthikulpanit

[-- Attachment #1: Type: text/plain, Size: 13202 bytes --]

Attempting to send the patch as inline text.  The same data is attached.
    -Don Slutz

-------------------------------------------------------------------
From c64a364226edfbbc556523ae05bbe69380f2f6b4 Mon Sep 17 00:00:00 2001
From: Don Slutz <dslutz@verizon.com>
Date: Thu, 21 Nov 2013 14:28:00 -0500
Subject: [PATCH for-4.5 v7 2/7] tools: Add vmware_hw support

This is used to set HVM_PARAM_VMWARE_HW. It is set to the VMware
virtual hardware version.

Currently 0, 3-4, 6-11 are good values.  However the code only
checks for == 0 or != 0.

If non-zero, the default VGA becomes VMware's VGA.

Also allow vga=vmware.

Signed-off-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v8:
   Adjusted as Andrew Cooper suggested.
   Added Reviewed-by: Andrew Cooper
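
A minimal sketch of the VGA defaulting rule described in the commit message, for anyone who wants to check the behaviour without reading the libxl hunk. The enum values mirror libxl_vga_interface_type in libxl_types.idl; treating 0 as "unset" and the name vga_kind_setdefault() are assumptions made for this illustration only.

```c
#include <stdint.h>

/* Numeric values mirror libxl_vga_interface_type in libxl_types.idl. */
enum vga_kind {
    VGA_UNSET  = 0,   /* assumption: 0 means the user chose nothing */
    VGA_CIRRUS = 1,
    VGA_STD    = 2,
    VGA_NONE   = 3,
    VGA_VMWARE = 4
};

/* Hypothetical stand-in for the defaulting done in libxl_create.c. */
static enum vga_kind vga_kind_setdefault(enum vga_kind kind,
                                         uint64_t vmware_hw)
{
    if (kind != VGA_UNSET)
        return kind;                 /* an explicit vga= setting wins */
    return vmware_hw ? VGA_VMWARE : VGA_CIRRUS;
}
```

Only the "unset" case consults vmware_hw, so an explicit vga="stdvga" or vga="none" is left alone.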

  docs/man/xl.cfg.pod.5               | 25 +++++++++++++++++++++++--
  docs/misc/hypervisor-cpuid.markdown | 33 +++++++++++++++++++++++++++++++++
  tools/libxc/xc_domain_restore.c     | 14 ++++++++++++++
  tools/libxc/xc_domain_save.c        | 11 +++++++++++
  tools/libxc/xg_save_restore.h       |  2 ++
  tools/libxl/libxl.h                 | 10 ++++++++++
  tools/libxl/libxl_create.c          | 12 +++++++++---
  tools/libxl/libxl_dm.c              |  8 ++++++++
  tools/libxl/libxl_dom.c             |  2 ++
  tools/libxl/libxl_types.idl         |  2 ++
  tools/libxl/xl_cmdimpl.c            |  4 ++++
  11 files changed, 118 insertions(+), 5 deletions(-)
  create mode 100644 docs/misc/hypervisor-cpuid.markdown

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 8bba21c..6628cfc 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -1197,6 +1197,23 @@ The viridian option can be specified as a boolean. A value of true (1)
  is equivalent to the list [ "defaults" ], and a value of false (0) is
  equivalent to an empty list.

+=item B<vmware_hw=NUMBER>
+
+Turns on or off the exposure of VMware cpuid.  The number is
+VMware's hardware version number, where 0 is off.  If not zero it
+changes the default VGA to VMware's VGA.
+
+The hardware version number (vmware_hw) comes from VMware config files.
+
+=over 4
+
+In a .vmx it is virtualHW.version
+
+In a .ovf it is part of the value of vssd:VirtualSystemType.
+For vssd:VirtualSystemType == vmx-07, vmware_hw = 7.
+
+=back
+
  =back

  =head3 Emulated VGA Graphics Device
@@ -1233,10 +1250,14 @@ later (e.g. Windows XP onwards) then you should enable this.
  stdvga supports more video ram and bigger resolutions than Cirrus.
  This option is deprecated, use vga="stdvga" instead.

+Setting the deprecated B<stdvga=0> keeps vmware from becoming the
+default VGA when B<vmware_hw> is non-zero.
+
  =item B<vga="STRING">

-Selects the emulated video card (none|stdvga|cirrus).
-The default is cirrus.
+Selects the emulated video card (none|stdvga|cirrus|vmware).
+The default is cirrus unless B<vmware_hw> is non-zero in which case it
+is vmware.

  =item B<vnc=BOOLEAN>

diff --git a/docs/misc/hypervisor-cpuid.markdown b/docs/misc/hypervisor-cpuid.markdown
new file mode 100644
index 0000000..3d4304c
--- /dev/null
+++ b/docs/misc/hypervisor-cpuid.markdown
@@ -0,0 +1,33 @@
+Hypervisor Cpuid
+================
+
+There is no agreed standard for the use of hypervisor cpuid leaves.
+
+AMD (Vol3, Appendix E.3.9) reserves 0x40000000 to 0x400000ff for
+hypervisor use, while Intel (Vol 2a, 3.2 CPUID) guarantees that no
+existing or future CPUs will use the range 0x40000000 to 0x4fffffff.
+
+Different hypervisors use the space as follows:
+
+Microsoft Hyper-V (AKA viridian) leaves currently must be at
+0x40000000.
+
+VMware leaves currently must be at 0x40000000.
+
+KVM leaves currently must be at 0x40000000 (from Seabios).
+
+Xen leaves can be found at the first otherwise unused 0x100 aligned
+offset between 0x40000000 and 0x40010000.
+
+http://download.microsoft.com/download/F/B/0/FB0D01A3-8E3A-4F5F-AA59-08C8026D3B8A/requirements-for-implementing-microsoft-hypervisor-interface.docx
+
+http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458
+
+http://lwn.net/Articles/301888/
+  Attempted to get this cleaned up.
+
+So if Viridian or VMware_hw is selected, return their format for the
+range 0x40000000 to 0x400000ff. And return Xen format for the range
+0x40000100 to 0x400001ff.
+
+Otherwise return Xen format for the range 0x40000000 to 0x400000ff.
diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index d8bd9b3..d262fa0 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -743,6 +743,7 @@ typedef struct {
      uint64_t vm_generationid_addr;
      uint64_t ioreq_server_pfn;
      uint64_t nr_ioreq_server_pages;
+    uint64_t vmware_hw;

      struct toolstack_data_t tdata;
  } pagebuf_t;
@@ -927,6 +928,16 @@ static int pagebuf_get_one(xc_interface *xch, struct restore_ctx *ctx,
          }
          return pagebuf_get_one(xch, ctx, buf, fd, dom);

+    case XC_SAVE_ID_HVM_VMWARE_HW:
+        /* Skip padding 4 bytes then read the vmware hw version. */
+        if ( RDEXACT(fd, &buf->vmware_hw, sizeof(uint32_t)) ||
+             RDEXACT(fd, &buf->vmware_hw, sizeof(uint64_t)) )
+        {
+            PERROR("error read the vmware_hw value");
+            return -1;
+        }
+        return pagebuf_get_one(xch, ctx, buf, fd, dom);
+
      case XC_SAVE_ID_TOOLSTACK:
          {
              if ( RDEXACT(fd, &buf->tdata.len, sizeof(buf->tdata.len)) )
@@ -1774,6 +1785,9 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
          }
      }

+    if (pagebuf.vmware_hw != 0)
+        xc_set_hvm_param(xch, dom, HVM_PARAM_VMWARE_HW, pagebuf.vmware_hw);
+
      if (pagebuf.acpi_ioport_location == 1) {
          DBGPRINTF("Use new firmware ioport from the checkpoint\n");
          xc_hvm_param_set(xch, dom, HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 254fdb3..76dc307 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -1750,6 +1750,17 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
              PERROR("Error when writing the ioreq server gmfn count");
              goto out;
          }
+
+        chunk.id = XC_SAVE_ID_HVM_VMWARE_HW;
+        chunk.data = 0;
+        xc_hvm_param_get(xch, dom, HVM_PARAM_VMWARE_HW, &chunk.data);
+
+        if ( (chunk.data != 0) &&
+             wrexact(io_fd, &chunk, sizeof(chunk)) )
+        {
+            PERROR("Error when writing the vmware_hw value");
+            goto out;
+        }
      }

      if ( callbacks != NULL && callbacks->toolstack_save != NULL )
diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
index bdd9009..d185ba9 100644
--- a/tools/libxc/xg_save_restore.h
+++ b/tools/libxc/xg_save_restore.h
@@ -262,6 +262,8 @@
  /* These are a pair; it is an error for one to exist without the other */
  #define XC_SAVE_ID_HVM_IOREQ_SERVER_PFN -19
  #define XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES -20
+/* VMware data */
+#define XC_SAVE_ID_HVM_VMWARE_HW      -21

  /*
  ** We process save/restore/migrate in batches of pages; the below
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 2700cc1..09faa04 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -158,6 +158,16 @@
  #define LIBXL_BUILDINFO_HVM_VIRIDIAN_ENABLE_DISABLE_WIDTH 64

  /*
+ * The libxl_vga_interface_type has the type for vmware.
+ */
+#define LIBXL_HAVE_LIBXL_VGA_INTERFACE_TYPE_VMWARE 1
+
+/*
+ * libxl_domain_build_info has the u.hvm.vmware_hw field.
+ */
+#define LIBXL_HAVE_BUILDINFO_HVM_VMWARE_HW 1
+
+/*
   * libxl ABI compatibility
   *
   * The only guarantee which libxl makes regarding ABI compatibility
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f7f178e..9f4e03c 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -227,8 +227,12 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
          if (b_info->shadow_memkb == LIBXL_MEMKB_DEFAULT)
              b_info->shadow_memkb = 0;

-        if (!b_info->u.hvm.vga.kind)
-            b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
+        if (!b_info->u.hvm.vga.kind) {
+            if (b_info->u.hvm.vmware_hw)
+                b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_VMWARE;
+            else
+                b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
+        }

          switch (b_info->device_model_version) {
          case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
@@ -428,13 +432,15 @@ int libxl__domain_build(libxl__gc *gc,
          vments[4] = "start_time";
          vments[5] = libxl__sprintf(gc, "%lu.%02d", start_time.tv_sec,(int)start_time.tv_usec/10000);

-        localents = libxl__calloc(gc, 7, sizeof(char *));
+        localents = libxl__calloc(gc, 9, sizeof(char *));
          localents[0] = "platform/acpi";
          localents[1] = libxl_defbool_val(info->u.hvm.acpi) ? "1" : "0";
          localents[2] = "platform/acpi_s3";
          localents[3] = libxl_defbool_val(info->u.hvm.acpi_s3) ? "1" : "0";
          localents[4] = "platform/acpi_s4";
          localents[5] = libxl_defbool_val(info->u.hvm.acpi_s4) ? "1" : "0";
+        localents[6] = "platform/vmware_hw";
+        localents[7] = libxl__sprintf(gc, "%"PRIu64, info->u.hvm.vmware_hw);

          break;
      case LIBXL_DOMAIN_TYPE_PV:
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 0018113..8bd6414 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -243,6 +243,9 @@ static char ** libxl__build_device_model_args_old(libxl__gc *gc,
          case LIBXL_VGA_INTERFACE_TYPE_NONE:
              flexarray_append_pair(dm_args, "-vga", "none");
              break;
+        case LIBXL_VGA_INTERFACE_TYPE_VMWARE:
+            flexarray_append_pair(dm_args, "-vga", "vmware");
+            break;
          }

          if (b_info->u.hvm.boot) {
@@ -555,6 +558,11 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc,
              break;
          case LIBXL_VGA_INTERFACE_TYPE_NONE:
              break;
+        case LIBXL_VGA_INTERFACE_TYPE_VMWARE:
+            flexarray_append_pair(dm_args, "-device",
+                GCSPRINTF("vmware-svga,vgamem_mb=%d",
+                libxl__sizekb_to_mb(b_info->video_memkb)));
+            break;
          }

          if (b_info->u.hvm.boot) {
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index d63ae1b..b0f0513 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -290,6 +290,8 @@ static void hvm_set_conf_params(xc_interface *handle, uint32_t domid,
  #if defined(__i386__) || defined(__x86_64__)
      xc_hvm_param_set(handle, domid, HVM_PARAM_HPET_ENABLED,
                      libxl_defbool_val(info->u.hvm.hpet));
+    xc_set_hvm_param(handle, domid, HVM_PARAM_VMWARE_HW,
+                     info->u.hvm.vmware_hw);
  #endif
      xc_hvm_param_set(handle, domid, HVM_PARAM_TIMER_MODE, timer_mode(info));
      xc_hvm_param_set(handle, domid, HVM_PARAM_VPT_ALIGN,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index bbb03e2..5d25b77 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -175,6 +175,7 @@ libxl_vga_interface_type = Enumeration("vga_interface_type", [
      (1, "CIRRUS"),
      (2, "STD"),
      (3, "NONE"),
+    (4, "VMWARE"),
      ], init_val = "LIBXL_VGA_INTERFACE_TYPE_CIRRUS")

  libxl_vendor_device = Enumeration("vendor_device", [
@@ -391,6 +392,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                         ("timeoffset", string),
                                         ("hpet", libxl_defbool),
                                         ("vpt_align", libxl_defbool),
+                                       ("vmware_hw", uint64),
                                         ("timer_mode", libxl_timer_mode),
                                         ("nested_hvm", libxl_defbool),
                                         ("smbios_firmware", string),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c734f79..307a9a9 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1111,6 +1111,8 @@ static void parse_config_data(const char *config_source,
              exit(-ERROR_FAIL);
          }

+        if (!xlu_cfg_get_long(config, "vmware_hw",  &l, 1))
+            b_info->u.hvm.vmware_hw = l;
          if (!xlu_cfg_get_long(config, "timer_mode", &l, 1)) {
              const char *s = libxl_timer_mode_to_string(l);
              fprintf(stderr, "WARNING: specifying \"timer_mode\" as an integer is deprecated. "
@@ -1730,6 +1732,8 @@ skip_vfb:
                  b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
              } else if (!strcmp(buf, "none")) {
                  b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_NONE;
+            } else if (!strcmp(buf, "vmware")) {
+                b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_VMWARE;
              } else {
                  fprintf(stderr, "Unknown vga \"%s\" specified\n", buf);
                  exit(1);
-- 
1.8.4


[-- Attachment #2: 0002-tools-Add-vmware_hw-support.patch --]
[-- Type: text/x-patch, Size: 13013 bytes --]

From c64a364226edfbbc556523ae05bbe69380f2f6b4 Mon Sep 17 00:00:00 2001
From: Don Slutz <dslutz@verizon.com>
Date: Thu, 21 Nov 2013 14:28:00 -0500
Subject: [PATCH for-4.5 v7 2/7] tools: Add vmware_hw support

This is used to set HVM_PARAM_VMWARE_HW. It is set to the VMware
virtual hardware version.

Currently 0, 3-4, 6-11 are good values.  However the code only
checks for == 0 or != 0.

If non-zero, the default VGA becomes VMware's VGA.

Also allow vga=vmware.

Signed-off-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v8:
  Adjusted as Andrew Cooper suggested.
  Added Reviewed-by: Andrew Cooper
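
The xl.cfg text in this patch says that for vssd:VirtualSystemType == vmx-07, vmware_hw = 7. A minimal sketch of that mapping follows, assuming the value is a single "vmx-NN" token; the function name vmware_hw_from_system_type() is hypothetical and nothing like it is added by this series.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: turn a VirtualSystemType token such as "vmx-07"
 * into the number to use for vmware_hw.  Returns -1 if the token does
 * not have the assumed "vmx-<decimal digits>" shape.
 */
static long vmware_hw_from_system_type(const char *type)
{
    static const char prefix[] = "vmx-";
    const size_t plen = sizeof(prefix) - 1;
    char *end;
    long v;

    assert(type != NULL);
    if (strncmp(type, prefix, plen) != 0)
        return -1;
    type += plen;
    if (!isdigit((unsigned char)*type))
        return -1;
    v = strtol(type, &end, 10);   /* base 10, so "07" parses as 7 */
    if (*end != '\0')
        return -1;
    return v;
}
```

The result is what would be written as vmware_hw=NUMBER in the guest config.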

 docs/man/xl.cfg.pod.5               | 25 +++++++++++++++++++++++--
 docs/misc/hypervisor-cpuid.markdown | 33 +++++++++++++++++++++++++++++++++
 tools/libxc/xc_domain_restore.c     | 14 ++++++++++++++
 tools/libxc/xc_domain_save.c        | 11 +++++++++++
 tools/libxc/xg_save_restore.h       |  2 ++
 tools/libxl/libxl.h                 | 10 ++++++++++
 tools/libxl/libxl_create.c          | 12 +++++++++---
 tools/libxl/libxl_dm.c              |  8 ++++++++
 tools/libxl/libxl_dom.c             |  2 ++
 tools/libxl/libxl_types.idl         |  2 ++
 tools/libxl/xl_cmdimpl.c            |  4 ++++
 11 files changed, 118 insertions(+), 5 deletions(-)
 create mode 100644 docs/misc/hypervisor-cpuid.markdown

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 8bba21c..6628cfc 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -1197,6 +1197,23 @@ The viridian option can be specified as a boolean. A value of true (1)
 is equivalent to the list [ "defaults" ], and a value of false (0) is
 equivalent to an empty list.
 
+=item B<vmware_hw=NUMBER>
+
+Turns on or off the exposure of VMware cpuid.  The number is
+VMware's hardware version number, where 0 is off.  If not zero it
+changes the default VGA to VMware's VGA.
+
+The hardware version number (vmware_hw) comes from VMware config files.
+
+=over 4
+
+In a .vmx it is virtualHW.version
+
+In a .ovf it is part of the value of vssd:VirtualSystemType.
+For vssd:VirtualSystemType == vmx-07, vmware_hw = 7.
+
+=back
+
 =back
 
 =head3 Emulated VGA Graphics Device
@@ -1233,10 +1250,14 @@ later (e.g. Windows XP onwards) then you should enable this.
 stdvga supports more video ram and bigger resolutions than Cirrus.
 This option is deprecated, use vga="stdvga" instead.
 
+Setting the deprecated B<stdvga=0> keeps vmware from becoming the
+default VGA when B<vmware_hw> is non-zero.
+
 =item B<vga="STRING">
 
-Selects the emulated video card (none|stdvga|cirrus).
-The default is cirrus.
+Selects the emulated video card (none|stdvga|cirrus|vmware).
+The default is cirrus unless B<vmware_hw> is non-zero in which case it
+is vmware.
 
 =item B<vnc=BOOLEAN>
 
diff --git a/docs/misc/hypervisor-cpuid.markdown b/docs/misc/hypervisor-cpuid.markdown
new file mode 100644
index 0000000..3d4304c
--- /dev/null
+++ b/docs/misc/hypervisor-cpuid.markdown
@@ -0,0 +1,33 @@
+Hypervisor Cpuid
+================
+
+There is no agreed standard for the use of hypervisor cpuid leaves.
+
+AMD (Vol3, Appendix E.3.9) reserves 0x40000000 to 0x400000ff for
+hypervisor use, while Intel (Vol 2a, 3.2 CPUID) guarantees that no
+existing or future CPUs will use the range 0x40000000 to 0x4fffffff.
+
+Different hypervisors use the space as follows:
+
+Microsoft Hyper-V (AKA viridian) leaves currently must be at
+0x40000000.
+
+VMware leaves currently must be at 0x40000000.
+
+KVM leaves currently must be at 0x40000000 (from Seabios).
+
+Xen leaves can be found at the first otherwise unused 0x100 aligned
+offset between 0x40000000 and 0x40010000.
+
+http://download.microsoft.com/download/F/B/0/FB0D01A3-8E3A-4F5F-AA59-08C8026D3B8A/requirements-for-implementing-microsoft-hypervisor-interface.docx
+
+http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458
+
+http://lwn.net/Articles/301888/
+  Attempted to get this cleaned up.
+
+So if Viridian or VMware_hw is selected, return their format for the
+range 0x40000000 to 0x400000ff. And return Xen format for the range
+0x40000100 to 0x400001ff.
+
+Otherwise return Xen format for the range 0x40000000 to 0x400000ff.
diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index d8bd9b3..d262fa0 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -743,6 +743,7 @@ typedef struct {
     uint64_t vm_generationid_addr;
     uint64_t ioreq_server_pfn;
     uint64_t nr_ioreq_server_pages;
+    uint64_t vmware_hw;
 
     struct toolstack_data_t tdata;
 } pagebuf_t;
@@ -927,6 +928,16 @@ static int pagebuf_get_one(xc_interface *xch, struct restore_ctx *ctx,
         }
         return pagebuf_get_one(xch, ctx, buf, fd, dom);
 
+    case XC_SAVE_ID_HVM_VMWARE_HW:
+        /* Skip padding 4 bytes then read the vmware hw version. */
+        if ( RDEXACT(fd, &buf->vmware_hw, sizeof(uint32_t)) ||
+             RDEXACT(fd, &buf->vmware_hw, sizeof(uint64_t)) )
+        {
+            PERROR("error read the vmware_hw value");
+            return -1;
+        }
+        return pagebuf_get_one(xch, ctx, buf, fd, dom);
+
     case XC_SAVE_ID_TOOLSTACK:
         {
             if ( RDEXACT(fd, &buf->tdata.len, sizeof(buf->tdata.len)) )
@@ -1774,6 +1785,9 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
         }
     }
 
+    if (pagebuf.vmware_hw != 0)
+        xc_set_hvm_param(xch, dom, HVM_PARAM_VMWARE_HW, pagebuf.vmware_hw);
+
     if (pagebuf.acpi_ioport_location == 1) {
         DBGPRINTF("Use new firmware ioport from the checkpoint\n");
         xc_hvm_param_set(xch, dom, HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 254fdb3..76dc307 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -1750,6 +1750,17 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
             PERROR("Error when writing the ioreq server gmfn count");
             goto out;
         }
+
+        chunk.id = XC_SAVE_ID_HVM_VMWARE_HW;
+        chunk.data = 0;
+        xc_hvm_param_get(xch, dom, HVM_PARAM_VMWARE_HW, &chunk.data);
+
+        if ( (chunk.data != 0) &&
+             wrexact(io_fd, &chunk, sizeof(chunk)) )
+        {
+            PERROR("Error when writing the vmware_hw value");
+            goto out;
+        }
     }
 
     if ( callbacks != NULL && callbacks->toolstack_save != NULL )
diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
index bdd9009..d185ba9 100644
--- a/tools/libxc/xg_save_restore.h
+++ b/tools/libxc/xg_save_restore.h
@@ -262,6 +262,8 @@
 /* These are a pair; it is an error for one to exist without the other */
 #define XC_SAVE_ID_HVM_IOREQ_SERVER_PFN -19
 #define XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES -20
+/* VMware data */
+#define XC_SAVE_ID_HVM_VMWARE_HW      -21
 
 /*
 ** We process save/restore/migrate in batches of pages; the below
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 2700cc1..09faa04 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -158,6 +158,16 @@
 #define LIBXL_BUILDINFO_HVM_VIRIDIAN_ENABLE_DISABLE_WIDTH 64
 
 /*
+ * The libxl_vga_interface_type has the type for vmware.
+ */
+#define LIBXL_HAVE_LIBXL_VGA_INTERFACE_TYPE_VMWARE 1
+
+/*
+ * libxl_domain_build_info has the u.hvm.vmware_hw field.
+ */
+#define LIBXL_HAVE_BUILDINFO_HVM_VMWARE_HW 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f7f178e..9f4e03c 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -227,8 +227,12 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
         if (b_info->shadow_memkb == LIBXL_MEMKB_DEFAULT)
             b_info->shadow_memkb = 0;
 
-        if (!b_info->u.hvm.vga.kind)
-            b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
+        if (!b_info->u.hvm.vga.kind) {
+            if (b_info->u.hvm.vmware_hw)
+                b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_VMWARE;
+            else
+                b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
+        }
 
         switch (b_info->device_model_version) {
         case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
@@ -428,13 +432,15 @@ int libxl__domain_build(libxl__gc *gc,
         vments[4] = "start_time";
         vments[5] = libxl__sprintf(gc, "%lu.%02d", start_time.tv_sec,(int)start_time.tv_usec/10000);
 
-        localents = libxl__calloc(gc, 7, sizeof(char *));
+        localents = libxl__calloc(gc, 9, sizeof(char *));
         localents[0] = "platform/acpi";
         localents[1] = libxl_defbool_val(info->u.hvm.acpi) ? "1" : "0";
         localents[2] = "platform/acpi_s3";
         localents[3] = libxl_defbool_val(info->u.hvm.acpi_s3) ? "1" : "0";
         localents[4] = "platform/acpi_s4";
         localents[5] = libxl_defbool_val(info->u.hvm.acpi_s4) ? "1" : "0";
+        localents[6] = "platform/vmware_hw";
+        localents[7] = libxl__sprintf(gc, "%"PRIu64, info->u.hvm.vmware_hw);
 
         break;
     case LIBXL_DOMAIN_TYPE_PV:
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 0018113..8bd6414 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -243,6 +243,9 @@ static char ** libxl__build_device_model_args_old(libxl__gc *gc,
         case LIBXL_VGA_INTERFACE_TYPE_NONE:
             flexarray_append_pair(dm_args, "-vga", "none");
             break;
+        case LIBXL_VGA_INTERFACE_TYPE_VMWARE:
+            flexarray_append_pair(dm_args, "-vga", "vmware");
+            break;
         }
 
         if (b_info->u.hvm.boot) {
@@ -555,6 +558,11 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc,
             break;
         case LIBXL_VGA_INTERFACE_TYPE_NONE:
             break;
+        case LIBXL_VGA_INTERFACE_TYPE_VMWARE:
+            flexarray_append_pair(dm_args, "-device",
+                GCSPRINTF("vmware-svga,vgamem_mb=%d",
+                libxl__sizekb_to_mb(b_info->video_memkb)));
+            break;
         }
 
         if (b_info->u.hvm.boot) {
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index d63ae1b..b0f0513 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -290,6 +290,8 @@ static void hvm_set_conf_params(xc_interface *handle, uint32_t domid,
 #if defined(__i386__) || defined(__x86_64__)
     xc_hvm_param_set(handle, domid, HVM_PARAM_HPET_ENABLED,
                     libxl_defbool_val(info->u.hvm.hpet));
+    xc_set_hvm_param(handle, domid, HVM_PARAM_VMWARE_HW,
+                     info->u.hvm.vmware_hw);
 #endif
     xc_hvm_param_set(handle, domid, HVM_PARAM_TIMER_MODE, timer_mode(info));
     xc_hvm_param_set(handle, domid, HVM_PARAM_VPT_ALIGN,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index bbb03e2..5d25b77 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -175,6 +175,7 @@ libxl_vga_interface_type = Enumeration("vga_interface_type", [
     (1, "CIRRUS"),
     (2, "STD"),
     (3, "NONE"),
+    (4, "VMWARE"),
     ], init_val = "LIBXL_VGA_INTERFACE_TYPE_CIRRUS")
 
 libxl_vendor_device = Enumeration("vendor_device", [
@@ -391,6 +392,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("timeoffset",       string),
                                        ("hpet",             libxl_defbool),
                                        ("vpt_align",        libxl_defbool),
+                                       ("vmware_hw",        uint64),
                                        ("timer_mode",       libxl_timer_mode),
                                        ("nested_hvm",       libxl_defbool),
                                        ("smbios_firmware",  string),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c734f79..307a9a9 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1111,6 +1111,8 @@ static void parse_config_data(const char *config_source,
             exit(-ERROR_FAIL);
         }
 
+        if (!xlu_cfg_get_long(config, "vmware_hw",  &l, 1))
+            b_info->u.hvm.vmware_hw = l;
         if (!xlu_cfg_get_long(config, "timer_mode", &l, 1)) {
             const char *s = libxl_timer_mode_to_string(l);
             fprintf(stderr, "WARNING: specifying \"timer_mode\" as an integer is deprecated. "
@@ -1730,6 +1732,8 @@ skip_vfb:
                 b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
             } else if (!strcmp(buf, "none")) {
                 b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_NONE;
+            } else if (!strcmp(buf, "vmware")) {
+                b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_VMWARE;
             } else {
                 fprintf(stderr, "Unknown vga \"%s\" specified\n", buf);
                 exit(1);
-- 
1.8.4
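
A minimal sketch of the leaf placement rule stated at the end of docs/misc/hypervisor-cpuid.markdown in this patch: with Viridian or vmware_hw selected, Xen's leaves move to 0x40000100, otherwise they stay at 0x40000000. The macro and function names are hypothetical, and the sketch only models the two cases the document describes.

```c
#include <stdint.h>

#define HYPERVISOR_LEAF_BASE   0x40000000u
#define HYPERVISOR_LEAF_STRIDE 0x100u

/* Base of the Xen leaves for a guest with the given settings. */
static uint32_t xen_cpuid_base(int viridian, uint64_t vmware_hw)
{
    return (viridian || vmware_hw)
           ? HYPERVISOR_LEAF_BASE + HYPERVISOR_LEAF_STRIDE
           : HYPERVISOR_LEAF_BASE;
}

/* Non-zero if leaf falls inside the 0x100-wide Xen window. */
static int is_xen_leaf(uint32_t leaf, int viridian, uint64_t vmware_hw)
{
    uint32_t base = xen_cpuid_base(viridian, vmware_hw);

    return leaf >= base && leaf - base < HYPERVISOR_LEAF_STRIDE;
}
```

A guest that probes for Xen therefore has to scan the 0x100 aligned bases rather than assume 0x40000000.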


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH for-4.5 v7 0/7]  Xen VMware tools support
  2014-10-02 21:30 [PATCH for-4.5 v7 0/7] Xen VMware tools support Don Slutz
                   ` (6 preceding siblings ...)
  2014-10-02 21:30 ` [OPTIONAL][PATCH for-4.5 v7 7/7] Add xen-hvm-param Don Slutz
@ 2014-10-16  8:12 ` Jan Beulich
  2014-10-16 12:10   ` Don Slutz
  7 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2014-10-16  8:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Boris Ostrovsky

>>> On 02.10.14 at 23:30, <dslutz@verizon.com> wrote:

Just for the record - I do not see this as warranting a release exception.
The effort to get this in place was started way too late, and shouldn't
be rushed close to or already beyond the feature freeze point.

Jan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH for-4.5 v7 0/7]  Xen VMware tools support
  2014-10-16  8:12 ` [PATCH for-4.5 v7 0/7] Xen VMware tools support Jan Beulich
@ 2014-10-16 12:10   ` Don Slutz
  2014-10-16 12:17     ` Ian Jackson
  2014-10-16 12:22     ` Jan Beulich
  0 siblings, 2 replies; 37+ messages in thread
From: Don Slutz @ 2014-10-16 12:10 UTC (permalink / raw)
  To: Jan Beulich, Konrad Rzeszutek Wilk, Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Boris Ostrovsky

On 10/16/14 04:12, Jan Beulich wrote:
>>>> On 02.10.14 at 23:30, <dslutz@verizon.com> wrote:
> Just for the record - I do not see this as warranting a release exception.
> The effort to get this in place was started way too late, and shouldn't
> be rushed close to or already beyond the feature freeze point.

Being first posted on Sept 1, 2014:

From: Don Slutz <dslutz@verizon.com>
To: <xen-devel@lists.xen.org>
Date: Mon, 1 Sep 2014 11:33:46 -0400
Message-ID: <1409585629-25840-1-git-send-email-dslutz@verizon.com>
Subject: [Xen-devel] [PATCH v2 0/3] Xen VMware tools support

and a feature freeze of 10th September 2014, I would not expect to
need a release exception.  However since none of the patches have a
Reviewed-by, and a big part of this has been dropped, I have no issue
with pushing this into 4.6

    -Don Slutz



> Jan
>


* Re: [PATCH for-4.5 v7 0/7]  Xen VMware tools support
  2014-10-16 12:10   ` Don Slutz
@ 2014-10-16 12:17     ` Ian Jackson
  2014-10-16 12:22     ` Jan Beulich
  1 sibling, 0 replies; 37+ messages in thread
From: Ian Jackson @ 2014-10-16 12:17 UTC (permalink / raw)
  To: Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	George Dunlap, Andrew Cooper, Stefano Stabellini, Eddie Dong,
	xen-devel, Jan Beulich, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Boris Ostrovsky

Don Slutz writes ("Re: [PATCH for-4.5 v7 0/7]  Xen VMware tools support"):
> Being first posted on Sept 1, 2014:
> 
> From: Don Slutz <dslutz@verizon.com>
> To: <xen-devel@lists.xen.org>
> Date: Mon, 1 Sep 2014 11:33:46 -0400
> Message-ID: <1409585629-25840-1-git-send-email-dslutz@verizon.com>
> Subject: [Xen-devel] [PATCH v2 0/3] Xen VMware tools support
> 
> and a feature freeze of 10th September 2014, I would not expect to
> need a release exception.  However since none of the patches have a
> Reviewed-by, and a big part of this has been dropped, I have no issue
> with pushing this into 4.6

The review and discussion of this series has uncovered a number of
complex technical problems, some relating to design and security
issues.

I'm not at all surprised that these questions took much longer than
nine days to resolve.  Indeed, it would have been astonishing to me if
they could have been sorted out within that timeframe.

Ian.


* Re: [PATCH for-4.5 v7 0/7]  Xen VMware tools support
  2014-10-16 12:10   ` Don Slutz
  2014-10-16 12:17     ` Ian Jackson
@ 2014-10-16 12:22     ` Jan Beulich
  2014-10-16 12:58       ` Don Slutz
  1 sibling, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2014-10-16 12:22 UTC (permalink / raw)
  To: Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Boris Ostrovsky

>>> On 16.10.14 at 14:10, <dslutz@verizon.com> wrote:
> On 10/16/14 04:12, Jan Beulich wrote:
>>>>> On 02.10.14 at 23:30, <dslutz@verizon.com> wrote:
>> Just for the record - I do not see this as warranting a release exception.
>> The effort to get this in place was started way too late, and shouldn't
>> be rushed close to or already beyond the feature freeze point.
> 
> Being first posted on Sept 1, 2014:
> 
> From: Don Slutz <dslutz@verizon.com>
> To: <xen-devel@lists.xen.org>
> Date: Mon, 1 Sep 2014 11:33:46 -0400
> Message-ID: <1409585629-25840-1-git-send-email-dslutz@verizon.com>
> Subject: [Xen-devel] [PATCH v2 0/3] Xen VMware tools support
> 
> and a feature freeze of 10th September 2014, I would not expect to
> need a release exception.

I hope you didn't mean this seriously: That might have been okay
for a feature that can be expected to not need a lot of discussion
and is relatively limited in the amount of code it touches. But
generally, and particularly for anything larger, please get used to
starting the work and discussion early, not near the end of a
development cycle.

Jan


* Re: [PATCH for-4.5 v7 0/7]  Xen VMware tools support
  2014-10-16 12:22     ` Jan Beulich
@ 2014-10-16 12:58       ` Don Slutz
  0 siblings, 0 replies; 37+ messages in thread
From: Don Slutz @ 2014-10-16 12:58 UTC (permalink / raw)
  To: Jan Beulich, Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Boris Ostrovsky

On 10/16/14 08:22, Jan Beulich wrote:
>>>> On 16.10.14 at 14:10, <dslutz@verizon.com> wrote:
>> On 10/16/14 04:12, Jan Beulich wrote:
>>>>>> On 02.10.14 at 23:30, <dslutz@verizon.com> wrote:
>>> Just for the record - I do not see this as warranting a release exception.
>>> The effort to get this in place was started way too late, and shouldn't
>>> be rushed close to or already beyond the feature freeze point.
>> Being first posted on Sept 1, 2014:
>>
>> From: Don Slutz <dslutz@verizon.com>
>> To: <xen-devel@lists.xen.org>
>> Date: Mon, 1 Sep 2014 11:33:46 -0400
>> Message-ID: <1409585629-25840-1-git-send-email-dslutz@verizon.com>
>> Subject: [Xen-devel] [PATCH v2 0/3] Xen VMware tools support
>>
>> and a feature freeze of 10th September 2014, I would not expect to
>> need a release exception.
> I hope you didn't mean this seriously: That might have been okay
> for a feature that can be expected to not need a lot of discussion
> and is relatively limited in the amount of code it touches. But
> generally, and particularly for anything larger, please get used to
> starting the work and discussion early, not near the end of a
> development cycle.

Clearly I did not understand this correctly.  Sorry about that.  I did
attempt to start this discussion early (December 13, 2013), but failed (due
to other things) to post the v2 until Sept 1, 2014.

    -Don Slutz

> Jan
>


* Re: [PATCH for-4.5 v7 1/7] xen: Add support for VMware cpuid leaves
  2014-10-02 21:30 ` [PATCH for-4.5 v7 1/7] xen: Add support for VMware cpuid leaves Don Slutz
@ 2015-01-15 16:42   ` Jan Beulich
  2015-01-15 21:00     ` Don Slutz
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2015-01-15 16:42 UTC (permalink / raw)
  To: Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Boris Ostrovsky

>>> On 02.10.14 at 23:30, <dslutz@verizon.com> wrote:
> @@ -5536,6 +5540,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  if ( curr_d == d )
>                      break;
>  
> +                if ( d->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW] )
> +                {
> +                    rc = -EXDEV;

That's a pretty strange error code here. -EOPNOTSUPP perhaps?

> --- /dev/null
> +++ b/xen/arch/x86/hvm/vmware/cpuid.c

Whether adding another subdirectory here is really the way to go
heavily depends on how much of this new code we really want to
take into the tree. The sheer size of the series makes me
hesitant to consider taking it all.

> +/*
> + * VMware hardware version 7 defines some of these cpuid levels,
> + * below is a brief description about those.
> + *
> + *     Leaf 0x40000000, Hypervisor CPUID information
> + * # EAX: The maximum input value for hypervisor CPUID info (0x40000010).
> + * # EBX, ECX, EDX: Hypervisor vendor ID signature. E.g. "VMwareVMware"
> + *
> + *     Leaf 0x40000010, Timing information.
> + * # EAX: (Virtual) TSC frequency in kHz.
> + * # EBX: (Virtual) Bus (local apic timer) frequency in kHz.
> + * # ECX, EDX: RESERVED
> + */
> +
> +int cpuid_vmware_leaves(uint32_t idx, uint32_t *eax, uint32_t *ebx,
> +                        uint32_t *ecx, uint32_t *edx)
> +{
> +    struct domain *d = current->domain;
> +
> +    if ( !is_vmware_domain(d) )
> +        return 0;
> +
> +    switch ( idx - 0x40000000 )
> +    {
> +    case 0x0:
> +        if ( d->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW] >= 7 )
> +        {
> +            *eax = 0x40000010;  /* Largest leaf */
> +            *ebx = 0x61774d56;  /* "VMwa" */
> +            *ecx = 0x4d566572;  /* "reVM" */
> +            *edx = 0x65726177;  /* "ware" */
> +            break;
> +        }
> +        /* fallthrough */
> +    case 0x10:
> +        if ( d->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW] >= 7 )
> +        {
> +            /* (Virtual) TSC frequency in kHz. */
> +            *eax =  d->arch.tsc_khz;
> +            /* (Virtual) Bus (local apic timer) frequency in kHz. */
> +            *ebx = 1000000ull / APIC_BUS_CYCLE_NS;
> +            *ecx = 0;          /* Reserved */
> +            *edx = 0;          /* Reserved */
> +            break;
> +        }
> +        /* fallthrough */

So for versions < 7 there's effectively no CPUID support at all?
Wouldn't it then make more sense to check for the version together
with the is_vmware_domain() check at the top?

> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -346,6 +346,9 @@ static inline unsigned long hvm_get_shadow_gs_base(struct vcpu *v)
>  #define is_viridian_domain(d) \
>      (is_hvm_domain(d) && (viridian_feature_mask(d) & HVMPV_base_freq))
>  
> +#define is_vmware_domain(_d)                                             \
> + (is_hvm_domain(_d) && ((_d)->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW]))

Indentation. Also please use d, not _d. I.e. take the macro above
as reference.

> --- a/xen/include/public/hvm/params.h
> +++ b/xen/include/public/hvm/params.h
> @@ -189,6 +189,9 @@
>  /* Location of the VM Generation ID in guest physical address space. */
>  #define HVM_PARAM_VM_GENERATION_ID_ADDR 34
>  
> -#define HVM_NR_PARAMS          35
> +/* Params for VMware */
> +#define HVM_PARAM_VMWARE_HW                 35

The comment seems wrong - after all it's the version, not some
arbitrary parameters/flags.

Jan


* [PATCH for-4.5 v7 3/7] vmware: Add VMware provided include files.
  2014-10-02 21:30 ` [PATCH for-4.5 v7 3/7] vmware: Add VMware provided include files Don Slutz
@ 2015-01-15 16:46   ` Jan Beulich
  2015-01-15 21:36     ` Don Slutz
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2015-01-15 16:46 UTC (permalink / raw)
  To: Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Boris Ostrovsky

>>> On 02.10.14 at 23:30, <dslutz@verizon.com> wrote:
> These 2 files: backdoor_def.h and guest_msg_def.h come from:
> 
> http://packages.vmware.com/tools/esx/3.5latest/rhel4/SRPMS/index.html 
>  open-vm-tools-kmod-7.4.8-396269.423167.src.rpm
>   open-vm-tools-kmod-7.4.8.tar.gz
>    vmhgfs/backdoor_def.h
>    vmhgfs/guest_msg_def.h
> 
> and are unchanged.

Either the description is wrong, or the patch is stale - there's no
guest_msg_def.h here.

> Added the badly named include file includeCheck.h also.  It only
> has comments and is provided so that backdoor_def.h and
> guest_msg_def.h can be used without change.

In which case I'd say a file with a single comment line in it would
suffice. Such a comment is hardly copyrightable...

Jan


* Re: [PATCH for-4.5 v7 1/7] xen: Add support for VMware cpuid leaves
  2015-01-15 16:42   ` Jan Beulich
@ 2015-01-15 21:00     ` Don Slutz
  2015-01-16  7:57       ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Don Slutz @ 2015-01-15 21:00 UTC (permalink / raw)
  To: Jan Beulich, Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Boris Ostrovsky

On 01/15/15 11:42, Jan Beulich wrote:
>>>> On 02.10.14 at 23:30, <dslutz@verizon.com> wrote:
>> @@ -5536,6 +5540,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>>                  if ( curr_d == d )
>>                      break;
>>  
>> +                if ( d->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW] )
>> +                {
>> +                    rc = -EXDEV;
> 
> That's a pretty strange error code here. -EOPNOTSUPP perhaps?
> 

Sure.

>> --- /dev/null
>> +++ b/xen/arch/x86/hvm/vmware/cpuid.c
> 
> Whether adding another subdirectory here is really the way to go
> heavily depends on how much of this new code we really want to
> take into the tree. The sheer size of the series makes me
> hesitant to consider taking it all.
> 

Not sure what I can say here to help.

>> +/*
>> + * VMware hardware version 7 defines some of these cpuid levels,
>> + * below is a brief description about those.
>> + *
>> + *     Leaf 0x40000000, Hypervisor CPUID information
>> + * # EAX: The maximum input value for hypervisor CPUID info (0x40000010).
>> + * # EBX, ECX, EDX: Hypervisor vendor ID signature. E.g. "VMwareVMware"
>> + *
>> + *     Leaf 0x40000010, Timing information.
>> + * # EAX: (Virtual) TSC frequency in kHz.
>> + * # EBX: (Virtual) Bus (local apic timer) frequency in kHz.
>> + * # ECX, EDX: RESERVED
>> + */
>> +
>> +int cpuid_vmware_leaves(uint32_t idx, uint32_t *eax, uint32_t *ebx,
>> +                        uint32_t *ecx, uint32_t *edx)
>> +{
>> +    struct domain *d = current->domain;
>> +
>> +    if ( !is_vmware_domain(d) )
>> +        return 0;
>> +
>> +    switch ( idx - 0x40000000 )
>> +    {
>> +    case 0x0:
>> +        if ( d->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW] >= 7 )
>> +        {
>> +            *eax = 0x40000010;  /* Largest leaf */
>> +            *ebx = 0x61774d56;  /* "VMwa" */
>> +            *ecx = 0x4d566572;  /* "reVM" */
>> +            *edx = 0x65726177;  /* "ware" */
>> +            break;
>> +        }
>> +        /* fallthrough */
>> +    case 0x10:
>> +        if ( d->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW] >= 7 )
>> +        {
>> +            /* (Virtual) TSC frequency in kHz. */
>> +            *eax =  d->arch.tsc_khz;
>> +            /* (Virtual) Bus (local apic timer) frequency in kHz. */
>> +            *ebx = 1000000ull / APIC_BUS_CYCLE_NS;
>> +            *ecx = 0;          /* Reserved */
>> +            *edx = 0;          /* Reserved */
>> +            break;
>> +        }
>> +        /* fallthrough */
> 
> So for versions < 7 there's effectively no CPUID support at all?
> Wouldn't it then make more sense to check for the version together
> with the is_vmware_domain() check at the top?
> 

Nope, when version is > 0 & < 7, all zeros are returned for these.
This is why there is a fallthrough comment.

Doing the zero returns is part of making the environment look as much
like VMware as possible.  I feel this is better, but I can be pushed
into including this check at the top (since the VMware KB article
referenced just has an equal statement).

This range of zeros is what I see during testing on a VMware ESX server.

>> --- a/xen/include/asm-x86/hvm/hvm.h
>> +++ b/xen/include/asm-x86/hvm/hvm.h
>> @@ -346,6 +346,9 @@ static inline unsigned long hvm_get_shadow_gs_base(struct vcpu *v)
>>  #define is_viridian_domain(d) \
>>      (is_hvm_domain(d) && (viridian_feature_mask(d) & HVMPV_base_freq))
>>  
>> +#define is_vmware_domain(_d)                                             \
>> + (is_hvm_domain(_d) && ((_d)->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW]))
> 
> Indentation. Also please use d, not _d. I.e. take the macro above
> as reference.
> 

I did, it has changed during the time I started this patch set.  Will
fix.

>> --- a/xen/include/public/hvm/params.h
>> +++ b/xen/include/public/hvm/params.h
>> @@ -189,6 +189,9 @@
>>  /* Location of the VM Generation ID in guest physical address space. */
>>  #define HVM_PARAM_VM_GENERATION_ID_ADDR 34
>>  
>> -#define HVM_NR_PARAMS          35
>> +/* Params for VMware */
>> +#define HVM_PARAM_VMWARE_HW                 35
> 
> The comment seems wrong - after all it's the version, not some
> arbitrary parameters/flags.
> 

It used to be a set of Params.  Will fix, thinking of:

/* VMware Hardware Version */

Did you want me to also rename HVM_PARAM_VMWARE_HW to a new name (maybe
HVM_PARAM_VMWARE_HARDWARE_VERSION)?

   -Don Slutz

> Jan
> 


* Re: [PATCH for-4.5 v7 3/7] vmware: Add VMware provided include files.
  2015-01-15 16:46   ` Jan Beulich
@ 2015-01-15 21:36     ` Don Slutz
  0 siblings, 0 replies; 37+ messages in thread
From: Don Slutz @ 2015-01-15 21:36 UTC (permalink / raw)
  To: Jan Beulich, Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Boris Ostrovsky

On 01/15/15 11:46, Jan Beulich wrote:
>>>> On 02.10.14 at 23:30, <dslutz@verizon.com> wrote:
>> These 2 files: backdoor_def.h and guest_msg_def.h come from:
>>
>> http://packages.vmware.com/tools/esx/3.5latest/rhel4/SRPMS/index.html 
>>  open-vm-tools-kmod-7.4.8-396269.423167.src.rpm
>>   open-vm-tools-kmod-7.4.8.tar.gz
>>    vmhgfs/backdoor_def.h
>>    vmhgfs/guest_msg_def.h
>>
>> and are unchanged.
> 
> Either the description is wrong, or the patch is stale - there's no
> guest_msg_def.h here.
> 

The commit message is stale.  I missed adjusting it for the removal of the
VMware RPC code.  Will fix.

>> Added the badly named include file includeCheck.h also.  It only
>> has comments and is provided so that backdoor_def.h and
>> guest_msg_def.h can be used without change.
> 
> In which case I'd say a file with a single comment line in it would
> suffice. Such a comment is hardly copyrightable...
> 

OK, will make it a single-line comment.

   -Don Slutz

> Jan
> 


* Re: [PATCH for-4.5 v7 1/7] xen: Add support for VMware cpuid leaves
  2015-01-15 21:00     ` Don Slutz
@ 2015-01-16  7:57       ` Jan Beulich
  2015-01-16 19:21         ` Don Slutz
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2015-01-16  7:57 UTC (permalink / raw)
  To: Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Boris Ostrovsky

>>> On 15.01.15 at 22:00, <dslutz@verizon.com> wrote:
> On 01/15/15 11:42, Jan Beulich wrote:
>>>>> On 02.10.14 at 23:30, <dslutz@verizon.com> wrote:
>>> --- /dev/null
>>> +++ b/xen/arch/x86/hvm/vmware/cpuid.c
>> 
>> Whether adding another subdirectory here is really the way to go
>> heavily depends on how much of this new code we really want to
>> take into the tree. The sheer size of the series makes me
>> hesitant to consider taking it all.
>> 
> 
> Not sure what I can say here to help.

Much will depend on the discussion of the subsequent much bigger
patch, so nothing specific to say here for the moment.

>>> +/*
>>> + * VMware hardware version 7 defines some of these cpuid levels,
>>> + * below is a brief description about those.
>>> + *
>>> + *     Leaf 0x40000000, Hypervisor CPUID information
>>> + * # EAX: The maximum input value for hypervisor CPUID info (0x40000010).
>>> + * # EBX, ECX, EDX: Hypervisor vendor ID signature. E.g. "VMwareVMware"
>>> + *
>>> + *     Leaf 0x40000010, Timing information.
>>> + * # EAX: (Virtual) TSC frequency in kHz.
>>> + * # EBX: (Virtual) Bus (local apic timer) frequency in kHz.
>>> + * # ECX, EDX: RESERVED
>>> + */
>>> +
>>> +int cpuid_vmware_leaves(uint32_t idx, uint32_t *eax, uint32_t *ebx,
>>> +                        uint32_t *ecx, uint32_t *edx)
>>> +{
>>> +    struct domain *d = current->domain;
>>> +
>>> +    if ( !is_vmware_domain(d) )
>>> +        return 0;
>>> +
>>> +    switch ( idx - 0x40000000 )
>>> +    {
>>> +    case 0x0:
>>> +        if ( d->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW] >= 7 )
>>> +        {
>>> +            *eax = 0x40000010;  /* Largest leaf */
>>> +            *ebx = 0x61774d56;  /* "VMwa" */
>>> +            *ecx = 0x4d566572;  /* "reVM" */
>>> +            *edx = 0x65726177;  /* "ware" */
>>> +            break;
>>> +        }
>>> +        /* fallthrough */
>>> +    case 0x10:
>>> +        if ( d->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW] >= 7 )
>>> +        {
>>> +            /* (Virtual) TSC frequency in kHz. */
>>> +            *eax =  d->arch.tsc_khz;
>>> +            /* (Virtual) Bus (local apic timer) frequency in kHz. */
>>> +            *ebx = 1000000ull / APIC_BUS_CYCLE_NS;
>>> +            *ecx = 0;          /* Reserved */
>>> +            *edx = 0;          /* Reserved */
>>> +            break;
>>> +        }
>>> +        /* fallthrough */
>> 
>> So for versions < 7 there's effectively no CPUID support at all?
>> Wouldn't it then make more sense to check for the version together
>> with the is_vmware_domain() check at the top?
>> 
> 
> Nope, when version is > 0 & < 7, all zeros are returned for these.
> This is why there is a fallthrough comment.
> 
> Doing the zero returns is part of making the environment look as much
> like VMware as possible.  I feel this is better, but I can be pushed
> into including this check at the top (since the VMware KB article
> referenced just has an equal statement).

But afaict zeros get returned even when you bail from the function
right at the top, thanks to the subsequent domain_cpuid() invocation
in the caller (unless overridden in the guest config, which surely
would be dubious).
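
A minimal sketch of the restructuring under discussion, assuming the caller
zero-fills any leaf the handler declines (the role attributed above to
domain_cpuid()).  The demo_* names and struct demo_domain are hypothetical
stand-ins, not Xen's types; only the leaf values are taken from the quoted
hunk.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-domain state; not Xen's struct domain. */
struct demo_domain {
    unsigned int vmware_hwver;  /* 0 means "not a VMware-style domain" */
    uint32_t tsc_khz;           /* (Virtual) TSC frequency in kHz */
    uint32_t apic_bus_khz;      /* (Virtual) bus frequency in kHz */
};

/* Returns 1 if the leaf was handled, 0 to let the caller fall back. */
static int demo_vmware_leaves(const struct demo_domain *d, uint32_t idx,
                              uint32_t *eax, uint32_t *ebx,
                              uint32_t *ecx, uint32_t *edx)
{
    /* Single up-front check: no leaves are defined below version 7. */
    if ( d->vmware_hwver < 7 )
        return 0;

    switch ( idx - 0x40000000 )
    {
    case 0x0:
        *eax = 0x40000010;  /* Largest leaf */
        *ebx = 0x61774d56;  /* "VMwa" */
        *ecx = 0x4d566572;  /* "reVM" */
        *edx = 0x65726177;  /* "ware" */
        return 1;

    case 0x10:
        *eax = d->tsc_khz;
        *ebx = d->apic_bus_khz;
        *ecx = 0;           /* Reserved */
        *edx = 0;           /* Reserved */
        return 1;
    }

    return 0;
}

/* Hypothetical caller: any leaf the handler declines reads as all zeros. */
static void demo_cpuid(const struct demo_domain *d, uint32_t idx,
                       uint32_t *eax, uint32_t *ebx,
                       uint32_t *ecx, uint32_t *edx)
{
    if ( demo_vmware_leaves(d, idx, eax, ebx, ecx, edx) )
        return;

    *eax = *ebx = *ecx = *edx = 0;
}
```

With this shape a version 4 domain and a version 7 domain differ only in
whether the handler claims the leaf; the guest-visible zeros for the older
version come from the fallback path.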

>>> --- a/xen/include/public/hvm/params.h
>>> +++ b/xen/include/public/hvm/params.h
>>> @@ -189,6 +189,9 @@
>>>  /* Location of the VM Generation ID in guest physical address space. */
>>>  #define HVM_PARAM_VM_GENERATION_ID_ADDR 34
>>>  
>>> -#define HVM_NR_PARAMS          35
>>> +/* Params for VMware */
>>> +#define HVM_PARAM_VMWARE_HW                 35
>> 
>> The comment seems wrong - after all it's the version, not some
>> arbitrary parameters/flags.
>> 
> 
> It used to be a set of Params.  Will fix, thinking of:
> 
> /* VMware Hardware Version */

... emulated ...

> Did you want me to also rename HVM_PARAM_VMWARE_HW to a new name (maybe
> HVM_PARAM_VMWARE_HARDWARE_VERSION)?

Perhaps a good idea; if it turns out too long,
HVM_PARAM_VMWARE_HWVER would also seem fine.

Jan


* Re: [PATCH for-4.5 v8 4/7] xen: Add vmware_port support
  2014-10-02 22:40     ` [PATCH for-4.5 v8 " Don Slutz
@ 2015-01-16 10:09       ` Jan Beulich
  2015-01-21 17:52         ` Don Slutz
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2015-01-16 10:09 UTC (permalink / raw)
  To: Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Boris Ostrovsky

>>> On 03.10.14 at 00:40, <dslutz@verizon.com> wrote:
> This is a new domain_create() flag, DOMCRF_vmware_port.  It is
> passed to domctl as XEN_DOMCTL_CDF_vmware_port.

Can you explain why a HVM param isn't suitable here?

> This is both a more complete support than is currently provided by
> QEMU and/or KVM and less.  The missing part requires QEMU changes
> and has been left out until the QEMU patches are accepted upstream.

I vaguely recall the question having been asked before, but I can't
find the answer to it: If qemu has support for this, why can't
you build on that rather than adding everything in the hypervisor?

> For AMD (svm) the max instruction length of 15 is hard coded.  This
> is because __get_instruction_length_from_list() has issues that when
> called from #GP handler NRIP is not available, or that NRIP may not
> be available at all on a particular HW, leading to the need to read the
> instruction twice --- once in __get_instruction_length_from_list()
> and then again in vmport_gp_check(). Which is bad because memory may
> change between the reads.

I don't get the connection between the first sentence (which just
states an architectural fact) and the rest of this paragraph.

> @@ -2111,6 +2112,31 @@ svm_vmexit_do_vmsave(struct vmcb_struct *vmcb,
>       return;
>   }
> 
> +static void svm_vmexit_gp_intercept(struct cpu_user_regs *regs,
> +                                    struct vcpu *v)
> +{
> +    struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
> +    /*
> +     * Just use 15 for the instruction length; vmport_gp_check will
> +     * adjust it.  This is because
> +     * __get_instruction_length_from_list() has issues, and may
> +     * require a double read of the instruction bytes.  At some
> +     * point a new routine could be added that is based on the code
> +     * in vmport_gp_check with extensions to make it more general.
> +     * Since that routine is the only user of this code this can be
> +     * done later.
> +     */
> +    unsigned long inst_len = 15;

Surely this can be unsigned int? And the value be MAX_INST_LEN?

> --- /dev/null
> +++ b/xen/arch/x86/hvm/vmware/vmport.c
> @@ -0,0 +1,262 @@
> +/*
> + * HVM VMPORT emulation
> + *
> + * Copyright (C) 2012 Verizon Corporation
> + *
> + * This file is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License Version 2 (GPLv2)
> + * as published by the Free Software Foundation.
> + *
> + * This file is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details. <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/config.h>

No need for this.

> +#define MAX_INST_LEN 15

Please move SVM's identical definition into e.g. asm-x86/processor.h
or even x86_emulate/x86_emulate.h (so it can also be used in
x86_emulate/x86_emulate.c), and avoid adding another instance
here.

> +#ifndef NDEBUG
> +unsigned int opt_vmport_debug __read_mostly;
> +integer_param("vmport_debug", opt_vmport_debug);
> +#endif

If this was used anywhere, the variable ought to be static. But
since it seems unused, it ought to be dropped.

> +/* More VMware defines */
> +
> +#define VMWARE_GUI_AUTO_GRAB              0x001
> +#define VMWARE_GUI_AUTO_UNGRAB            0x002
> +#define VMWARE_GUI_AUTO_SCROLL            0x004
> +#define VMWARE_GUI_AUTO_RAISE             0x008
> +#define VMWARE_GUI_EXCHANGE_SELECTIONS    0x010
> +#define VMWARE_GUI_WARP_CURSOR_ON_UNGRAB  0x020
> +#define VMWARE_GUI_FULL_SCREEN            0x040
> +
> +#define VMWARE_GUI_TO_FULL_SCREEN         0x080
> +#define VMWARE_GUI_TO_WINDOW              0x100
> +
> +#define VMWARE_GUI_AUTO_RAISE_DISABLED    0x200
> +
> +#define VMWARE_GUI_SYNC_TIME              0x400

What do all of the above mean? Without any explanation it is
impossible to understand why reporting any of them set below
is correct/acceptable.

> +int vmport_ioport(int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> +{
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    uint16_t cmd = regs->rcx;

As you already have most other variables needed only inside the if()
below declared in that scope, please be consistent with this one.
Albeit the value of this variable is questionable anyway - it's being
used exactly once.

> +    int rc = X86EMUL_OKAY;
> +
> +    if ( regs->_eax == BDOOR_MAGIC )

With this, is handling other than 32-bit in/out really meaningful/
correct?

> +        case BDOOR_CMD_GETHWVERSION:
> +            /* vmware_hw */
> +            regs->_eax = 0;
> +            if ( is_hvm_vcpu(curr) )

Since you can't get here for PV, I can't see what you need this
conditional for.

> +            {
> +                struct hvm_domain *hd = &d->arch.hvm_domain;
> +
> +                regs->_eax = hd->params[HVM_PARAM_VMWARE_HW];
> +            }
> +            if ( !regs->_eax )
> +                regs->_eax = 4;  /* Act like version 4 */

Why version 4?

> +        case BDOOR_CMD_GETTIME:
> +            value = get_localtime_us(d) - d->time_offset_seconds * 1000000ULL;
> +            /* hostUsecs */
> +            regs->_ebx = value % 1000000UL;
> +            /* hostSecs */
> +            regs->_eax = value / 1000000ULL;
> +            /* maxTimeLag */
> +            regs->_ecx = 1000000;
> +            /* offset to GMT in minutes */
> +            regs->_edx = d->time_offset_seconds / 60;
> +            break;
> +        case BDOOR_CMD_GETTIMEFULL:
> +            value = get_localtime_us(d) - d->time_offset_seconds * 1000000ULL;
> +            /* ... */

???

> +            regs->_eax = BDOOR_MAGIC;

regs->_eax already has this value.

> +            /* hostUsecs */
> +            regs->_ebx =value / 1000000ULL;
> +            /* maxTimeLag */
> +            regs->_ecx = 1000000;
> +            break;

Perhaps this should share code with BDOOR_CMD_GETTIME; I have
to admit though that I can't make any sense of why the latter one
has a FULL suffix when it returns _less_ information.
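
One way the two commands could share code, as a sketch only: both start from
the same microsecond count, so a small helper can do the seconds/microseconds
split once.  The names demo_host_time and demo_split_time are hypothetical;
which registers each command then loads is deliberately left out, since the
quoted hunk elides part of the GETTIMEFULL case.

```c
#include <stdint.h>

#define DEMO_USEC_PER_SEC 1000000ULL

/* Hypothetical container for the values both commands report. */
struct demo_host_time {
    uint64_t secs;          /* whole seconds */
    uint32_t usecs;         /* remaining microseconds, always < 1000000 */
    uint32_t max_time_lag;  /* constant taken from the quoted hunk */
};

/* Split a microsecond count once, so both commands agree on the result. */
static struct demo_host_time demo_split_time(uint64_t now_us)
{
    struct demo_host_time t;

    t.secs = now_us / DEMO_USEC_PER_SEC;
    t.usecs = (uint32_t)(now_us % DEMO_USEC_PER_SEC);
    t.max_time_lag = 1000000;

    return t;
}
```

Each case label would then only copy the fields it reports into the guest
registers.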

> +        if ( dir == IOREQ_READ )
> +        {
> +            switch ( bytes )
> +            {
> +            case 1:
> +                regs->rax = (saved_rax & 0xffffff00) | (regs->rax & 0xff);
> +                break;
> +            case 2:
> +                regs->rax = (saved_rax & 0xffff0000) | (regs->rax & 0xffff);
> +                break;

Both of these zero the high 32 bits when they shouldn't. But also see
below.

> +            case 4:
> +                regs->rax = regs->_eax;
> +                break;
> +            }
> +            *val = regs->rax;
> +        }
> +        else
> +            regs->rax = saved_rax;

This is all rather dubious - instead of clobbering regs->rax within the
earlier switch, write the value to a local variable and then merge it
here. But as much as above, the question on what to do with
operand size being other than 32-bit - in particular for the cases
where other registers get modified - is relevant here too. Even more
so that the "port" function parameter isn't even checked (and hence
you'd also handle e.g. "in %dx,%al" with %dx being BDOOR_PORT
+ 1, 2, or 3 afaict).
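
A sketch of the merge described above, assuming the command result has been
kept in a local 32-bit variable rather than written to regs->rax early.  The
demo_* names are hypothetical; 0x5658 is the backdoor port value from
backdoor_def.h.  8- and 16-bit reads preserve all upper bits of RAX, while a
32-bit read zero-extends, matching ordinary x86 register-write behaviour.

```c
#include <stdint.h>

#define DEMO_BDOOR_PORT 0x5658u

/* Merge a result of 'bytes' width into the guest's saved RAX. */
static uint64_t demo_merge_rax(uint64_t saved_rax, uint32_t result,
                               unsigned int bytes)
{
    switch ( bytes )
    {
    case 1:
        /* Only AL changes; bits 8-63 are preserved. */
        return (saved_rax & ~0xffULL) | (result & 0xff);
    case 2:
        /* Only AX changes; bits 16-63 are preserved. */
        return (saved_rax & ~0xffffULL) | (result & 0xffff);
    case 4:
        /* A 32-bit register write zero-extends into the upper half. */
        return result;
    }

    return saved_rax;  /* Unknown width: leave RAX untouched. */
}

/* Reject accesses that merely overlap the port (BDOOR_PORT + 1, 2 or 3). */
static int demo_port_matches(uint32_t port)
{
    return port == DEMO_BDOOR_PORT;
}
```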

> +int vmport_gp_check(struct cpu_user_regs *regs, struct vcpu *v,
> +                    unsigned long *inst_len, unsigned long inst_addr,
> +                    unsigned long ei1, unsigned long ei2)
> +{
> +    if ( !v->domain->arch.hvm_domain.is_vmware_port_enabled )
> +        return X86EMUL_VMPORT_NOT_ENABLED;
> +
> +    if ( *inst_len && *inst_len <= MAX_INST_LEN &&
> +         (regs->rdx & 0xffff) == BDOOR_PORT && ei1 == 0 && ei2 == 0 &&

regs->_edx may yield slightly better code; I wonder whether we
shouldn't extend __DECL_REG() to also give us ->dx (and maybe
even ->dl, ->dh, etc).

These ei1/ei2 checks belong in the callers imo - even if both SVM
and VMX happen to have them be zero in the cases you're
interested in, these are still vendor dependent values which
shouldn't be interpreted by vendor independent code.

> +         regs->_eax == BDOOR_MAGIC )
> +    {
> +        int i = 0;

unsigned int

> +        uint32_t val;
> +        uint32_t byte_cnt = hvm_guest_x86_mode(v);

Not the best variable name - x86emul uses op_bytes. Which gets me
to a fundamental question: With all this custom decoding and
linearizing of CS:RIP, did you investigate using x86emul with suitably
set up callbacks instead? That would e.g. at once make you properly
ignore benign instruction prefixes (you look for 0x66 only below).

> +        unsigned char bytes[MAX_INST_LEN];
> +        unsigned int fetch_len;
> +        int frc;
> +
> +        /* in or out are limited to 32bits */
> +        if ( byte_cnt > 4 )
> +            byte_cnt = 4;
> +
> +        /*
> +         * Fetch up to the next page break; we'll fetch from the
> +         * next page later if we have to.
> +         */
> +        fetch_len = min_t(unsigned int, *inst_len,
> +                          PAGE_SIZE - (inst_addr  & ~PAGE_MASK));
> +        frc = hvm_fetch_from_guest_virt_nofault(bytes, inst_addr, fetch_len,
> +                                                PFEC_page_present);
> +        if ( frc != HVMCOPY_okay )
> +        {
> +            gdprintk(XENLOG_WARNING,
> +                     "Bad instruction fetch at %#lx (frc=%d il=%lu fl=%u)\n",
> +                     (unsigned long) inst_addr, frc, *inst_len, fetch_len);

Pointless cast. But the value of log messages like this one is
questionable anyway.

> +            return X86EMUL_VMPORT_FETCH_ERROR_BYTE1;
> +        }
> +
> +        /* Check for operand size prefix */
> +        while ( (i < MAX_INST_LEN) && (bytes[i] == 0x66) )
> +        {
> +            i++;
> +            if ( i >= fetch_len )
> +            {
> +                frc = hvm_fetch_from_guest_virt_nofault(
> +                    &bytes[fetch_len], inst_addr + fetch_len,
> +                    MAX_INST_LEN - fetch_len, PFEC_page_present);
> +                if ( frc != HVMCOPY_okay )
> +                {
> +                    gdprintk(XENLOG_WARNING,
> +                             "Bad instruction fetch at %#lx + %#x (frc=%d)\n",
> +                             inst_addr, fetch_len, frc);
> +                    return X86EMUL_VMPORT_FETCH_ERROR_BYTE2;
> +                }
> +                fetch_len = MAX_INST_LEN;
> +            }
> +        }
> +        *inst_len = i + 1;

i may be MAX_INST_LEN already when you get here.

> +
> +        /* Only adjust byte_cnt 1 time */
> +        if ( bytes[0] == 0x66 )     /* operand size prefix */
> +        {
> +            if ( byte_cnt == 4 )
> +                byte_cnt = 2;
> +            else
> +                byte_cnt = 4;
> +        }

Iirc REX.W set following 0x66 cancels the effect of the latter. Another
thing x86emul would be taking care of for you if you used it.

Also this byte_cnt handling isn't correct for the real and VM86 mode
cases (where hvm_guest_x86_mode() returns 0/1 respectively).

> +        if ( bytes[i] == 0xed )     /* in (%dx),%eax or in (%dx),%ax */
> +            return vmport_ioport(IOREQ_READ, BDOOR_PORT, byte_cnt, &val);
> +        else if ( bytes[i] == 0xec )     /* in (%dx),%al */
> +            return vmport_ioport(IOREQ_READ, BDOOR_PORT, 1, &val);
> +        else if ( bytes[i] == 0xef )     /* out %eax,(%dx) or out %ax,(%dx) */
> +            return vmport_ioport(IOREQ_WRITE, BDOOR_PORT, byte_cnt, &val);
> +        else if ( bytes[i] == 0xee )     /* out %al,(%dx) */
> +            return vmport_ioport(IOREQ_WRITE, BDOOR_PORT, 1, &val);
> +        else
> +        {
> +            *inst_len = 0; /* This is unknown. */
> +            return X86EMUL_VMPORT_BAD_OPCODE;
> +        }

switch() please.

> +static void vmx_vmexit_gp_intercept(struct cpu_user_regs *regs,
> +                                    struct vcpu *v)
> +{
> +    unsigned long exit_qualification;
> +    unsigned long inst_len;
> +    unsigned long inst_addr = vmx_rip2pointer(regs, v);
> +    unsigned long ecode;
> +    int rc;
> +#ifndef NDEBUG
> +    unsigned long orig_inst_len;
> +    unsigned long vector;
> +
> +    __vmread(VM_EXIT_INTR_INFO, &vector);
> +    BUG_ON(!(vector & INTR_INFO_VALID_MASK));
> +    BUG_ON(!(vector & INTR_INFO_DELIVER_CODE_MASK));
> +#endif

If you use ASSERT() instead of BUG_ON(), I think you can avoid most
of this preprocessor conditional.

> +    __vmread(EXIT_QUALIFICATION, &exit_qualification);
> +    __vmread(VM_EXIT_INSTRUCTION_LEN, &inst_len);

get_instruction_length(). But is it architecturally defined that
#GP intercept vmexits actually set this field?

> @@ -2182,6 +2183,8 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
>               if ( v->fpu_dirtied )
>                   nvcpu->nv_vmexit_pending = 1;
>           }
> +        else if ( vector == TRAP_gp_fault )
> +            nvcpu->nv_vmexit_pending = 1;

Doesn't that mean an unconditional vmexit even if the L1 hypervisor
didn't ask for such?

> --- a/xen/include/asm-x86/hvm/io.h
> +++ b/xen/include/asm-x86/hvm/io.h
> @@ -25,7 +25,7 @@
>   #include <public/hvm/ioreq.h>
>   #include <public/event_channel.h>
> 
> -#define MAX_IO_HANDLER             16
> +#define MAX_IO_HANDLER             17

If you're really getting beyond 16 (which I don't see, I'm counting
14 current users) this should be bumped by more than just 1.

> +/*
> + * Additional return values from vmport_gp_check.
> + *
> + * Note: return values include:
> + *   X86EMUL_OKAY
> + *   X86EMUL_UNHANDLEABLE
> + *   X86EMUL_EXCEPTION
> + *   X86EMUL_RETRY
> + *   X86EMUL_CMPXCHG_FAILED
> + *
> + * The additional do not overlap any of the above.
> + */
> +#define X86EMUL_VMPORT_NOT_ENABLED              10
> +#define X86EMUL_VMPORT_FETCH_ERROR_BYTE1        11
> +#define X86EMUL_VMPORT_FETCH_ERROR_BYTE2        12
> +#define X86EMUL_VMPORT_BAD_OPCODE               13
> +#define X86EMUL_VMPORT_BAD_STATE                14

Going through the patch, you only ever return these, but never
check for any of them. Why do you add these in the first place,
risking future collisions even if there are none now?

Jan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH for-4.5 v7 1/7] xen: Add support for VMware cpuid leaves
  2015-01-16  7:57       ` Jan Beulich
@ 2015-01-16 19:21         ` Don Slutz
  0 siblings, 0 replies; 37+ messages in thread
From: Don Slutz @ 2015-01-16 19:21 UTC (permalink / raw)
  To: Jan Beulich, Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Boris Ostrovsky

On 01/16/15 02:57, Jan Beulich wrote:
>>>> On 15.01.15 at 22:00, <dslutz@verizon.com> wrote:
>> On 01/15/15 11:42, Jan Beulich wrote:
>>>>>> On 02.10.14 at 23:30, <dslutz@verizon.com> wrote:
>>>> --- /dev/null
>>>> +++ b/xen/arch/x86/hvm/vmware/cpuid.c
>>>
>>> Whether adding another subdirectory here is really the way to go
>>> heavily depends on how much of this new code we really want to
>>> take into the tree. The sheer size of the series makes me
>>> hesitant to consider taking it all.
>>>
>>
>> Not sure what I can say here to help.
> 
> Much will depend on the discussion of the subsequent much bigger
> patch, so nothing specific to say here for the moment.
> 
>>>> +/*
>>>> + * VMware hardware version 7 defines some of these cpuid levels,
>>>> + * below is a brief description about those.
>>>> + *
>>>> + *     Leaf 0x40000000, Hypervisor CPUID information
>>>> + * # EAX: The maximum input value for hypervisor CPUID info (0x40000010).
>>>> + * # EBX, ECX, EDX: Hypervisor vendor ID signature. E.g. "VMwareVMware"
>>>> + *
>>>> + *     Leaf 0x40000010, Timing information.
>>>> + * # EAX: (Virtual) TSC frequency in kHz.
>>>> + * # EBX: (Virtual) Bus (local apic timer) frequency in kHz.
>>>> + * # ECX, EDX: RESERVED
>>>> + */
>>>> +
>>>> +int cpuid_vmware_leaves(uint32_t idx, uint32_t *eax, uint32_t *ebx,
>>>> +                        uint32_t *ecx, uint32_t *edx)
>>>> +{
>>>> +    struct domain *d = current->domain;
>>>> +
>>>> +    if ( !is_vmware_domain(d) )
>>>> +        return 0;
>>>> +
>>>> +    switch ( idx - 0x40000000 )
>>>> +    {
>>>> +    case 0x0:
>>>> +        if ( d->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW] >= 7 )
>>>> +        {
>>>> +            *eax = 0x40000010;  /* Largest leaf */
>>>> +            *ebx = 0x61774d56;  /* "VMwa" */
>>>> +            *ecx = 0x4d566572;  /* "reVM" */
>>>> +            *edx = 0x65726177;  /* "ware" */
>>>> +            break;
>>>> +        }
>>>> +        /* fallthrough */
>>>> +    case 0x10:
>>>> +        if ( d->arch.hvm_domain.params[HVM_PARAM_VMWARE_HW] >= 7 )
>>>> +        {
>>>> +            /* (Virtual) TSC frequency in kHz. */
>>>> +            *eax =  d->arch.tsc_khz;
>>>> +            /* (Virtual) Bus (local apic timer) frequency in kHz. */
>>>> +            *ebx = 1000000ull / APIC_BUS_CYCLE_NS;
>>>> +            *ecx = 0;          /* Reserved */
>>>> +            *edx = 0;          /* Reserved */
>>>> +            break;
>>>> +        }
>>>> +        /* fallthrough */
>>>
>>> So for versions < 7 there's effectively no CPUID support at all?
>>> Wouldn't it then make more sense to check for the version together
>>> with the is_vmware_domain() check at the top?
>>>
>>
>> Nope, when version is > 0 & < 7, all zeros are returned for these.
>> This is why there is a fallthrough comment.
>>
>> Doing the zero returns is part of making the environment look as much
>> like VMware as possible.  I feel this is better, but I can be pushed
>> into including this check at the top (since the VMware KB article
>> referenced just has an equal statement).
> 
> But afaict zeros get returned even when you bail from the function
> right at the top, thanks to the subsequent domain_cpuid() invocation
> in the caller (unless overridden in the guest config, which surely
> would be dubious).
> 

I will check this out.
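
For reference, a minimal sketch of how the three signature constants in
the quoted leaf 0x40000000 hunk come about: each register holds four
ASCII characters, lowest byte first.  cpuid_sig_word() is a hypothetical
helper used only for illustration; it is not part of the patch.

```c
#include <stdint.h>

/*
 * Pack four ASCII characters into one register value, lowest byte
 * first, which is how CPUID vendor signatures are laid out.
 * Hypothetical helper, for illustration only.
 */
static uint32_t cpuid_sig_word(const char s[4])
{
    return (uint32_t)(uint8_t)s[0] |
           ((uint32_t)(uint8_t)s[1] << 8) |
           ((uint32_t)(uint8_t)s[2] << 16) |
           ((uint32_t)(uint8_t)s[3] << 24);
}
```

Feeding it "VMwa", "reVM" and "ware" gives the EBX, ECX and EDX values
used in the patch.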

>>>> --- a/xen/include/public/hvm/params.h
>>>> +++ b/xen/include/public/hvm/params.h
>>>> @@ -189,6 +189,9 @@
>>>>  /* Location of the VM Generation ID in guest physical address space. */
>>>>  #define HVM_PARAM_VM_GENERATION_ID_ADDR 34
>>>>  
>>>> -#define HVM_NR_PARAMS          35
>>>> +/* Params for VMware */
>>>> +#define HVM_PARAM_VMWARE_HW                 35
>>>
>>> The comment seems wrong - after all it's the version, not some
>>> arbitrary parameters/flags.
>>>
>>
>> It used to be a set of Params.  Will fix, thinking of:
>>
>> /* VMware Hardware Version */
> 
> ... emulated ...
> 

Sure.

>> Did you want me to also rename HVM_PARAM_VMWARE_HW to a new name (maybe
>> HVM_PARAM_VMWARE_HARDWARE_VERSION)?
> 
> Perhaps a good idea; if it turns out too long,
> HVM_PARAM_VMWARE_HWVER would also seem fine.
> 

Where it is used the lines are already long, so I will go with this.

   -Don Slutz

> Jan
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH for-4.5 v8 4/7] xen: Add vmware_port support
  2015-01-16 10:09       ` Jan Beulich
@ 2015-01-21 17:52         ` Don Slutz
  2015-01-22  8:32           ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Don Slutz @ 2015-01-21 17:52 UTC (permalink / raw)
  To: Jan Beulich, Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Boris Ostrovsky

On 01/16/15 05:09, Jan Beulich wrote:
>>>> On 03.10.14 at 00:40, <dslutz@verizon.com> wrote:
>> This is a new domain_create() flag, DOMCRF_vmware_port.  It is
>> passed to domctl as XEN_DOMCTL_CDF_vmware_port.
> 
> Can you explain why a HVM param isn't suitable here?
> 

The issue is that you need this flag during construct_vmcb() and
construct_vmcs().  While Intel has vmx_update_exception_bitmap()
AMD does not.  So when HVM param's are setup and/or changed there
currently is no way to adjust AMD's exception bitmap.

So this is the simpler way.


>> This is both more complete support than is currently provided by
>> QEMU and/or KVM, and less.  The missing part requires QEMU changes
>> and has been left out until the QEMU patches are accepted upstream.
> 
> I vaguely recall the question having been asked before, but I can't
> find the answer to it: If qemu has support for this, why can't
> you build on that rather than adding everything in the hypervisor?
> 

The v10 version of this patch set (which is waiting for an adjusted
QEMU; the released 2.2.0 is one) does use QEMU for more VMware port
support.  The issues are:

1) QEMU needs access to parts of CPU registers to handle VMware port.
2) You need to allow ring 3 access to this 1 I/O port.
3) There is more state in xen that would need to also be sent to
   QEMU if all support is moved to QEMU.

>> For AMD (svm) the max instruction length of 15 is hard coded.  This
>> is because __get_instruction_length_from_list() has issues that when
>> called from #GP handler NRIP is not available, or that NRIP may not
>> be available at all on a particular HW, leading to the need to read the
>> instruction twice --- once in __get_instruction_length_from_list()
>> and then again in vmport_gp_check(). Which is bad because memory may
>> change between the reads.
> 
> I don't get the connection between the first sentence (which just
> states an architectural fact) and the rest of this paragraph.
> 

This is an AMD thing.  Boris Ostrovsky suggested to use the AMD only
routine __get_instruction_length_from_list(), which I did in v5 and
found out that many (maybe all) AMD chips would still do the hand
decoding.

Now I could make this patch set even bigger by adjusting or adding
AMD only routines that return the decode of the current instruction.

This would also split vmport_gp_check() into AMD and Intel versions.

So I went with the stranger way of saying that the current instruction
length is 15 for AMD in all cases.  Then in vmport_gp_check() I only
fetch bytes to the end of the page (which should be almost the same
overhead between 1 and 15 when all 15 are in the same page).

I.E. the wasted fetching on AMD is not a problem.

I will try and add some of this to the commit message.
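
A minimal sketch of the "fetch only to the end of the page" step
described above.  The SKETCH_* constants and first_fetch_len() are
stand-ins for illustration (assuming 4 KiB pages); the real code uses
PAGE_SIZE, PAGE_MASK and min_t().

```c
#include <stdint.h>

/* Stand-ins for PAGE_SIZE / PAGE_MASK, assuming 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (~(uint64_t)(SKETCH_PAGE_SIZE - 1))

/*
 * Number of bytes to fetch in the first read: at most inst_len, and
 * never past the end of the page that inst_addr lives in.
 */
static unsigned int first_fetch_len(uint64_t inst_addr, unsigned int inst_len)
{
    unsigned int to_page_end =
        SKETCH_PAGE_SIZE - (unsigned int)(inst_addr & ~SKETCH_PAGE_MASK);

    return inst_len < to_page_end ? inst_len : to_page_end;
}
```

With inst_len fixed at 15 on AMD, only an instruction starting in the
last 14 bytes of a page gets a short first fetch.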


>> @@ -2111,6 +2112,31 @@ svm_vmexit_do_vmsave(struct vmcb_struct *vmcb,
>>       return;
>>   }
>>
>> +static void svm_vmexit_gp_intercept(struct cpu_user_regs *regs,
>> +                                    struct vcpu *v)
>> +{
>> +    struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
>> +    /*
>> +     * Just use 15 for the instruction length; vmport_gp_check will
>> +     * adjust it.  This is because
>> +     * __get_instruction_length_from_list() has issues, and may
>> +     * require a double read of the instruction bytes.  At some
>> +     * point a new routine could be added that is based on the code
>> +     * in vmport_gp_check with extensions to make it more general.
>> +     * Since that routine is the only user of this code this can be
>> +     * done later.
>> +     */
>> +    unsigned long inst_len = 15;
> 
> Surely this can be unsigned int?

The code is smaller this way.  In vmx_vmexit_gp_intercept():

    unsigned long inst_len;
...
    __vmread(VM_EXIT_INSTRUCTION_LEN, &inst_len);
...
    rc = vmport_gp_check(regs, v, &inst_len, inst_addr,
...

So changing the argument to vmport_gp_check() to "unsigned int" would
add code there.

> And the value be MAX_INST_LEN?

Yes, will change.


> 
>> --- /dev/null
>> +++ b/xen/arch/x86/hvm/vmware/vmport.c
>> @@ -0,0 +1,262 @@
>> +/*
>> + * HVM VMPORT emulation
>> + *
>> + * Copyright (C) 2012 Verizon Corporation
>> + *
>> + * This file is free software; you can redistribute it and/or modify it
>> + * under the terms of the GNU General Public License Version 2 (GPLv2)
>> + * as published by the Free Software Foundation.
>> + *
>> + * This file is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * General Public License for more details. <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <xen/config.h>
> 
> No need for this.

Dropped.

> 
>> +#define MAX_INST_LEN 15
> 
> Please move SVM's identical definition into e.g. asm-x86/processor.h
> or even x86_emulate/x86_emulate.h (so it can also be used in
> x86_emulate/x86_emulate.c), and avoid adding another instance
> here.
> 

Sure.


>> +#ifndef NDEBUG
>> +unsigned int opt_vmport_debug __read_mostly;
>> +integer_param("vmport_debug", opt_vmport_debug);
>> +#endif
> 
> If this was used anywhere, the variable ought to be static. But
> since it seems unused, it ought to be dropped.
> 

Done.

>> +/* More VMware defines */
>> +
>> +#define VMWARE_GUI_AUTO_GRAB              0x001
>> +#define VMWARE_GUI_AUTO_UNGRAB            0x002
>> +#define VMWARE_GUI_AUTO_SCROLL            0x004
>> +#define VMWARE_GUI_AUTO_RAISE             0x008
>> +#define VMWARE_GUI_EXCHANGE_SELECTIONS    0x010
>> +#define VMWARE_GUI_WARP_CURSOR_ON_UNGRAB  0x020
>> +#define VMWARE_GUI_FULL_SCREEN            0x040
>> +
>> +#define VMWARE_GUI_TO_FULL_SCREEN         0x080
>> +#define VMWARE_GUI_TO_WINDOW              0x100
>> +
>> +#define VMWARE_GUI_AUTO_RAISE_DISABLED    0x200
>> +
>> +#define VMWARE_GUI_SYNC_TIME              0x400
> 
> What do all of the above mean? Without any explanation it is
> impossible to understand why reporting any of them set below
> is correct/acceptable.
> 

Will move to QEMU, so dropping them.

>> +int vmport_ioport(int dir, uint32_t port, uint32_t bytes, uint32_t *val)
>> +{
>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>> +    uint16_t cmd = regs->rcx;
> 
> As you already have most other variables needed only inside the if()
> below declared in that scope, please be consistent with this one.
> Albeit the value of this variable is questionable anyway - it's being
> used exactly once.
> 

Ok, dropping this variable (it was used more in older versions).


>> +    int rc = X86EMUL_OKAY;
>> +
>> +    if ( regs->_eax == BDOOR_MAGIC )
> 
> With this, is handling other than 32-bit in/out really meaningful/
> correct?
> 

Yes. Harder to use, but since VMware allows it, I allow it also.

>> +        case BDOOR_CMD_GETHWVERSION:
>> +            /* vmware_hw */
>> +            regs->_eax = 0;
>> +            if ( is_hvm_vcpu(curr) )
> 
> Since you can't get here for PV, I can't see what you need this
> conditional for.
> 

Since I was not 100% sure, I was being safe.  Would converting
this to be a "debug=y" check be ok?


>> +            {
>> +                struct hvm_domain *hd = &d->arch.hvm_domain;
>> +
>> +                regs->_eax = hd->params[HVM_PARAM_VMWARE_HW];
>> +            }
>> +            if ( !regs->_eax )
>> +                regs->_eax = 4;  /* Act like version 4 */
> 
> Why version 4?
>


That is the 1st version that VMware was more consistent in the handling
of the "VMware hardware version".  Any value between 1 and 6 would be
ok.  This should only happen in strange configs.





>> +        case BDOOR_CMD_GETTIME:
>> +            value = get_localtime_us(d) - d->time_offset_seconds * 1000000ULL;
>> +            /* hostUsecs */
>> +            regs->_ebx = value % 1000000UL;
>> +            /* hostSecs */
>> +            regs->_eax = value / 1000000ULL;
>> +            /* maxTimeLag */
>> +            regs->_ecx = 1000000;
>> +            /* offset to GMT in minutes */
>> +            regs->_edx = d->time_offset_seconds / 60;
>> +            break;
>> +        case BDOOR_CMD_GETTIMEFULL:
>> +            value = get_localtime_us(d) - d->time_offset_seconds * 1000000ULL;
>> +            /* ... */
> 
> ???
> 

There used to be a lot of setting registers to BDOOR_MAGIC, but since

>> +            regs->_eax = BDOOR_MAGIC;
> 
> regs->_eax already has this value.
> 

You are right.  Will drop this setting and comment.


>> +            /* hostUsecs */
>> +            regs->_ebx =value / 1000000ULL;
>> +            /* maxTimeLag */
>> +            regs->_ecx = 1000000;
>> +            break;
> 
> Perhaps this should share code with BDOOR_CMD_GETTIME; I have
> to admit though that I can't make any sense of why the latter one
> has a FULL suffix when it returns _less_ information.
> 

Sharing of code is not simple.
I did not pick the names; VMware did.

Bug found.  The full version returns data in si & dx.
Will fix.  This also makes sharing more complex than not sharing.
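
One piece that could still be shared is the split of the microsecond
value into seconds and microseconds, which both commands start from.
A minimal sketch, assuming a hypothetical helper split_usecs(); which
registers receive the two parts differs per command and is not shown
here.

```c
#include <stdint.h>

/* Seconds and microseconds parts of a microsecond timestamp. */
struct split_time {
    uint32_t secs;
    uint32_t usecs;
};

/*
 * Hypothetical shared helper: split a 64-bit microsecond count the
 * same way the quoted BDOOR_CMD_GETTIME code does with / and %.
 */
static struct split_time split_usecs(uint64_t value_us)
{
    struct split_time t;

    t.secs = (uint32_t)(value_us / 1000000ULL);
    t.usecs = (uint32_t)(value_us % 1000000ULL);
    return t;
}
```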



>> +        if ( dir == IOREQ_READ )
>> +        {
>> +            switch ( bytes )
>> +            {
>> +            case 1:
>> +                regs->rax = (saved_rax & 0xffffff00) | (regs->rax & 0xff);
>> +                break;
>> +            case 2:
>> +                regs->rax = (saved_rax & 0xffff0000) | (regs->rax & 0xffff);
>> +                break;
> 
> Both of these zero the high 32 bits when they shouldn't. But also see
> below.
> 

You are right, will fix.

>> +            case 4:
>> +                regs->rax = regs->_eax;
>> +                break;
>> +            }
>> +            *val = regs->rax;
>> +        }
>> +        else
>> +            regs->rax = saved_rax;
> 
> This is all rather dubious - instead of clobbering reg->rax within the
> earlier switch, write the value to a local variable and then merge it
> here.

Ok, Will adjust that way.
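
A minimal sketch of the merge, assuming the command handler leaves its
32-bit result in a local variable.  merge_in_result() is a hypothetical
name; the point is only that the 1- and 2-byte cases keep the upper
bits of the saved rax, while the 4-byte case zero-extends like a normal
32-bit register write.

```c
#include <stdint.h>

/*
 * Merge the result of an emulated "in" into the saved rax.
 * bytes is the operand size of the instruction (1, 2 or 4).
 */
static uint64_t merge_in_result(uint64_t saved_rax, uint32_t result,
                                unsigned int bytes)
{
    switch ( bytes )
    {
    case 1:
        /* Only %al changes; bits 8..63 are preserved. */
        return (saved_rax & ~0xffULL) | (result & 0xffu);
    case 2:
        /* Only %ax changes; bits 16..63 are preserved. */
        return (saved_rax & ~0xffffULL) | (result & 0xffffu);
    case 4:
        /* A 32-bit register write zero-extends into the upper half. */
        return result;
    default:
        /* Unexpected size: leave the register untouched. */
        return saved_rax;
    }
}
```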

> But as much as above, the question on what to do with
> operand size being other than 32-bit - in particular for the cases
> where other registers get modified - is relevant here too.

I tested what happens on VMware and it "does what I have here" on sizes
other than 32-bit.  The other registers still change.

> Even more
> so that the "port" function parameter isn't even checked (and hence
> you'd also handle e.g. "in %dx,%al" with %dx being BDOOR_PORT
> + 1, 2, or 3 afaict).
> 

Yes, I need to check and reject if not exactly this port.
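
A minimal sketch of the stricter check, assuming the usual VMware
backdoor values for BDOOR_MAGIC and BDOOR_PORT (they are not spelled
out in this thread).  vmport_access_matches() is a hypothetical helper,
not a function from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, taken from VMware's public backdoor definitions. */
#define BDOOR_MAGIC 0x564D5868u
#define BDOOR_PORT  0x5658u

/*
 * Only an access to exactly BDOOR_PORT with the magic value in %eax
 * is treated as a backdoor call; BDOOR_PORT + 1, 2 or 3 is rejected.
 */
static bool vmport_access_matches(uint32_t port, uint32_t eax)
{
    return port == BDOOR_PORT && eax == BDOOR_MAGIC;
}
```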


>> +int vmport_gp_check(struct cpu_user_regs *regs, struct vcpu *v,
>> +                    unsigned long *inst_len, unsigned long inst_addr,
>> +                    unsigned long ei1, unsigned long ei2)
>> +{
>> +    if ( !v->domain->arch.hvm_domain.is_vmware_port_enabled )
>> +        return X86EMUL_VMPORT_NOT_ENABLED;
>> +
>> +    if ( *inst_len && *inst_len <= MAX_INST_LEN &&
>> +         (regs->rdx & 0xffff) == BDOOR_PORT && ei1 == 0 && ei2 == 0 &&
> 
> regs->_edx may yield slightly better code; I wonder whether we
> shouldn't extend __DECL_REG() to also give us ->dx (and maybe
> even ->dl, ->dh, etc).
> 

I will switch to _edx for now.  I see no need to extend __DECL_REG()
in this patch set.  Since QEMU will get involved most of the time and
account for most of the time spent, the overhead of the AND is not that
much.

> These ei1/ei2 checks belong in the callers imo - even if both SVM
> and VMX happen to have them be zero in the cases you're
> interested in, these are still vendor dependent values which
> shouldn't be interpreted by vendor independent code.
> 

Ok, will move the checks.

>> +         regs->_eax == BDOOR_MAGIC )
>> +    {
>> +        int i = 0;
> 
> unsigned int
> 

Sure.

>> +        uint32_t val;
>> +        uint32_t byte_cnt = hvm_guest_x86_mode(v);
> 
> Not the best variable name - x86emul uses op_bytes. Which gets me
> to a fundamental question: With all this custom decoding and
> linearizing of CS:RIP, did you investigate using x86emul with suitably
> set up callbacks instead? That would e.g. at once make you properly
> ignore benign instruction prefixes (you look for 0x66 only below).
> 

Will change the name.  I did not investigate x86emul, I will look into
what it would take.  So one of the risks that come to mind is that if
x86emul incorrectly executes an instruction (a non I/O one) and
does not generate a #GP, this is bad.  I should be able to prevent
the #GP for an I/O in my quick look.

Since this code path is only for avoiding #GP fault when using
the VMware port, the only code that I know of that did not use
"0xed" and any prefix bytes is the testing code I have.

Preventing VMware port from working under unexpected instruction
cases has not been high on the list of tasks to do.


>> +        unsigned char bytes[MAX_INST_LEN];
>> +        unsigned int fetch_len;
>> +        int frc;
>> +
>> +        /* in or out are limited to 32bits */
>> +        if ( byte_cnt > 4 )
>> +            byte_cnt = 4;
>> +
>> +        /*
>> +         * Fetch up to the next page break; we'll fetch from the
>> +         * next page later if we have to.
>> +         */
>> +        fetch_len = min_t(unsigned int, *inst_len,
>> +                          PAGE_SIZE - (inst_addr  & ~PAGE_MASK));
>> +        frc = hvm_fetch_from_guest_virt_nofault(bytes, inst_addr, fetch_len,
>> +                                                PFEC_page_present);
>> +        if ( frc != HVMCOPY_okay )
>> +        {
>> +            gdprintk(XENLOG_WARNING,
>> +                     "Bad instruction fetch at %#lx (frc=%d il=%lu fl=%u)\n",
>> +                     (unsigned long) inst_addr, frc, *inst_len, fetch_len);
> 
> Pointless cast. But the value of log messages like this one is
> questionable anyway.
> 

Will drop cast.  I am not sure it is possible to get here. The best I
have come up with is to change the GDT entry for CS to fault, then do
this instruction.  Not sure it would fault, and clearly is an attempt
to break in.

I do know that if Xen is running under VMware (Why anyone would do
this?) this is possible.

With all this, should I just drop this message (or make it debug=y
only)?

>> +            return X86EMUL_VMPORT_FETCH_ERROR_BYTE1;
>> +        }
>> +
>> +        /* Check for operand size prefix */
>> +        while ( (i < MAX_INST_LEN) && (bytes[i] == 0x66) )
>> +        {
>> +            i++;
>> +            if ( i >= fetch_len )
>> +            {
>> +                frc = hvm_fetch_from_guest_virt_nofault(
>> +                    &bytes[fetch_len], inst_addr + fetch_len,
>> +                    MAX_INST_LEN - fetch_len, PFEC_page_present);
>> +                if ( frc != HVMCOPY_okay )
>> +                {
>> +                    gdprintk(XENLOG_WARNING,
>> +                             "Bad instruction fetch at %#lx + %#x (frc=%d)\n",
>> +                             inst_addr, fetch_len, frc);
>> +                    return X86EMUL_VMPORT_FETCH_ERROR_BYTE2;
>> +                }
>> +                fetch_len = MAX_INST_LEN;
>> +            }
>> +        }
>> +        *inst_len = i + 1;
> 
> i may be MAX_INST_LEN already when you get here.
> 

Yes, I need to terminate the loop sooner.
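
A minimal sketch of a prefix scan that cannot run off the end of the
buffer.  It leaves out the second fetch across the page boundary and
assumes all len bytes are already in bytes[]; skip_opsize_prefixes() is
a hypothetical name.

```c
#include <stddef.h>

#define MAX_INST_LEN 15

/*
 * Skip leading 0x66 operand size prefixes.  Returns the index of the
 * opcode byte, or -1 if the prefixes use up every available byte, so
 * that no opcode byte is left to look at.
 */
static int skip_opsize_prefixes(const unsigned char *bytes, size_t len)
{
    size_t i = 0;

    if ( len > MAX_INST_LEN )
        len = MAX_INST_LEN;

    while ( i < len && bytes[i] == 0x66 )
        i++;

    return i < len ? (int)i : -1;
}
```

With this shape the instruction length (index + 1) can never exceed
MAX_INST_LEN.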

>> +
>> +        /* Only adjust byte_cnt 1 time */
>> +        if ( bytes[0] == 0x66 )     /* operand size prefix */
>> +        {
>> +            if ( byte_cnt == 4 )
>> +                byte_cnt = 2;
>> +            else
>> +                byte_cnt = 4;
>> +        }
> 
> Iirc REX.W set following 0x66 cancels the effect of the latter. Another
> thing x86emul would be taking care of for you if you used it.
> 

I did not know this.  Most of my testing was done without any check
for prefix(s).  I.E. (Open) VMware Tools only uses the inl.  I do
not know of anybody using 16bit segments and VMware tools.

> Also this byte_cnt handling isn't correct for the real and VM86 mode
> cases (where hvm_guest_x86_mode() returns 0/1 respectively).
> 

Ok, will make better use of hvm_guest_x86_mode().
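
A minimal sketch of what that could look like, assuming
hvm_guest_x86_mode() returns 0 for real mode and 1 for VM86 mode (as
noted above) and 2, 4 or 8 for 16-, 32- and 64-bit code.
vmport_op_bytes() is a hypothetical helper, and REX.W handling is
deliberately left out.

```c
#include <stdbool.h>

/*
 * Operand size in bytes for in/out, from the guest mode and whether a
 * 0x66 prefix was seen.  Real mode, VM86 mode and 16-bit code default
 * to 2 bytes; 32-bit and 64-bit code default to 4 bytes, since in/out
 * are limited to 32 bits.  The prefix flips between the two sizes.
 */
static unsigned int vmport_op_bytes(int mode, bool opsize_prefix)
{
    unsigned int op_bytes = (mode == 4 || mode == 8) ? 4 : 2;

    if ( opsize_prefix )
        op_bytes = (op_bytes == 4) ? 2 : 4;

    return op_bytes;
}
```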


>> +        if ( bytes[i] == 0xed )     /* in (%dx),%eax or in (%dx),%ax */
>> +            return vmport_ioport(IOREQ_READ, BDOOR_PORT, byte_cnt, &val);
>> +        else if ( bytes[i] == 0xec )     /* in (%dx),%al */
>> +            return vmport_ioport(IOREQ_READ, BDOOR_PORT, 1, &val);
>> +        else if ( bytes[i] == 0xef )     /* out %eax,(%dx) or out %ax,(%dx) */
>> +            return vmport_ioport(IOREQ_WRITE, BDOOR_PORT, byte_cnt, &val);
>> +        else if ( bytes[i] == 0xee )     /* out %al,(%dx) */
>> +            return vmport_ioport(IOREQ_WRITE, BDOOR_PORT, 1, &val);
>> +        else
>> +        {
>> +            *inst_len = 0; /* This is unknown. */
>> +            return X86EMUL_VMPORT_BAD_OPCODE;
>> +        }
> 
> switch() please.

Sure.
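
A minimal sketch of the switch() form.  The enum, struct and
decode_io_opcode() are hypothetical and only classify the opcode; the
actual code would call vmport_ioport() from each case.

```c
/* Direction of the decoded access; VMPORT_NONE means unknown opcode. */
enum vmport_dir { VMPORT_NONE, VMPORT_READ, VMPORT_WRITE };

struct vmport_access {
    enum vmport_dir dir;
    unsigned int bytes;
};

/*
 * Classify the opcode byte that follows any prefixes.  op_bytes is
 * the operand size (2 or 4) already derived from mode and prefix.
 */
static struct vmport_access decode_io_opcode(unsigned char opcode,
                                             unsigned int op_bytes)
{
    struct vmport_access a = { VMPORT_NONE, 0 };

    switch ( opcode )
    {
    case 0xec: /* in (%dx),%al */
        a.dir = VMPORT_READ;
        a.bytes = 1;
        break;
    case 0xed: /* in (%dx),%eax or in (%dx),%ax */
        a.dir = VMPORT_READ;
        a.bytes = op_bytes;
        break;
    case 0xee: /* out %al,(%dx) */
        a.dir = VMPORT_WRITE;
        a.bytes = 1;
        break;
    case 0xef: /* out %eax,(%dx) or out %ax,(%dx) */
        a.dir = VMPORT_WRITE;
        a.bytes = op_bytes;
        break;
    default:   /* not an in/out via %dx */
        break;
    }

    return a;
}
```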

> 
>> +static void vmx_vmexit_gp_intercept(struct cpu_user_regs *regs,
>> +                                    struct vcpu *v)
>> +{
>> +    unsigned long exit_qualification;
>> +    unsigned long inst_len;
>> +    unsigned long inst_addr = vmx_rip2pointer(regs, v);
>> +    unsigned long ecode;
>> +    int rc;
>> +#ifndef NDEBUG
>> +    unsigned long orig_inst_len;
>> +    unsigned long vector;
>> +
>> +    __vmread(VM_EXIT_INTR_INFO, &vector);
>> +    BUG_ON(!(vector & INTR_INFO_VALID_MASK));
>> +    BUG_ON(!(vector & INTR_INFO_DELIVER_CODE_MASK));
>> +#endif
> 
> If you use ASSERT() instead of BUG_ON(), I think you can avoid most
> of this preprocessor conditional.
> 

I do not see how.  vector only exists in "debug=y".  So yes if using
ASSERT() I could move the #endif up 2 lines, but that does not
look better to me.

>> +    __vmread(EXIT_QUALIFICATION, &exit_qualification);
>> +    __vmread(VM_EXIT_INSTRUCTION_LEN, &inst_len);
> 
> get_instruction_length(). But is it architecturally defined that
> #GP intercept vmexits actually set this field?
> 

I could not find a clear statement.  My reading of (directly out of
"Intel® 64 and IA-32 Architectures Software Developer’s Manual,
Volume 3 (3A, 3B & 3C): System Programming Guide",
Order Number: 325384-052US, September 2014):


• VM-exit instruction length. This field is used in the following cases:
  - For fault-like VM exits due to attempts to execute one of the
    following instructions that cause VM exits unconditionally (see
    Section 25.1.2) or based on the settings of VM-execution controls
    (see Section 25.1.3): CLTS, CPUID, GETSEC, HLT, IN, INS, INVD,
    INVEPT, INVLPG, INVPCID, INVVPID, LGDT, LIDT, LLDT, LMSW, LTR,
    MONITOR, MOV CR, MOV DR, MWAIT, OUT, OUTS, PAUSE, RDMSR, RDPMC,
    RDRAND, RDSEED, RDTSC, RDTSCP, RSM, SGDT, SIDT, SLDT, STR, VMCALL,
    VMCLEAR, VMLAUNCH, VMPTRLD, VMPTRST, VMREAD, VMRESUME, VMWRITE,
    VMXOFF, VMXON, WBINVD, WRMSR, XRSTORS, XSETBV, and XSAVES.


And

25.1.2 Instructions That Cause VM Exits Unconditionally

The following instructions cause VM exits when they are executed in VMX
non-root operation: CPUID, GETSEC, INVD, and XSETBV. This is also true
of instructions introduced with VMX, which include: INVEPT, INVVPID,
VMCALL, VMCLEAR, VMLAUNCH, VMPTRLD, VMPTRST, VMRESUME, VMXOFF, and VMXON.

25.1.3 Instructions That Cause VM Exits Conditionally

Certain instructions cause VM exits in VMX non-root operation depending
on the setting of the VM-execution controls. The following instructions
can cause “fault-like” VM exits based on the conditions described:

• CLTS. The CLTS instruction causes a VM exit if the bits in position 3
  (corresponding to CR0.TS) are set in both the CR0 guest/host mask and
  the CR0 read shadow.
• HLT. The HLT instruction causes a VM exit if the “HLT exiting”
  VM-execution control is 1.
• IN, INS/INSB/INSW/INSD, OUT, OUTS/OUTSB/OUTSW/OUTSD. The behavior of
  each of these instructions is determined by the settings of the
  “unconditional I/O exiting” and “use I/O bitmaps” VM-execution
  controls:
  - If both controls are 0, the instruction executes normally.
  - If the “unconditional I/O exiting” VM-execution control is 1 and
    the “use I/O bitmaps” VM-execution control is 0, the instruction
    causes a VM exit.
  - If the “use I/O bitmaps” VM-execution control is 1, the instruction
    causes a VM exit if it attempts to access an I/O port corresponding
    to a bit set to 1 in the appropriate I/O bitmap (see Section
    24.6.4). If an I/O operation “wraps around” the 16-bit I/O-port
    space (accesses ports FFFFH and 0000H), the I/O instruction causes
    a VM exit (the “unconditional I/O exiting” VM-execution control is
    ignored if the “use I/O bitmaps” VM-execution control is 1).
  See Section 25.1.1 for information regarding the priority of VM exits
  relative to faults that may be caused by the INS and OUTS
  instructions.



to me says that yes, this field is set on a #GP exit on an IN.  But the
#GP case is not called out by name.

My read is that a #GP fault is a "VM Exits Unconditionally" based on the
setting of the exception bit mask.


In all my testing this field had the correct value on the small set of
Intel chips I have tested on.

Since AMD does not (and has the hard coded 15), I could switch to
15 here also.


So not using get_instruction_length() does avoid a possible BUG_ON()
if I am wrong.

So there are 3 options here:
1) Add an ASSERT() like the BUG_ON() in get_instruction_length()
2) Switch to using get_instruction_length()
3) Switch to using MAX_INST_LEN.

Let me know which way to go.


>> @@ -2182,6 +2183,8 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
>>               if ( v->fpu_dirtied )
>>                   nvcpu->nv_vmexit_pending = 1;
>>           }
>> +        else if ( vector == TRAP_gp_fault )
>> +            nvcpu->nv_vmexit_pending = 1;
> 
> Doesn't that mean an unconditional vmexit even if the L1 hypervisor
> didn't ask for such?
> 

It might.  I have not done any testing here for the nested VMX case.
I could just drop this for now and decide what to do for this code later.





>> --- a/xen/include/asm-x86/hvm/io.h
>> +++ b/xen/include/asm-x86/hvm/io.h
>> @@ -25,7 +25,7 @@
>>   #include <public/hvm/ioreq.h>
>>   #include <public/event_channel.h>
>>
>> -#define MAX_IO_HANDLER             16
>> +#define MAX_IO_HANDLER             17
> 
> If you're really getting beyond 16 (which I don't see, I'm counting
> 14 current users) this should be bumped by more than just 1.
> 
>> +/*
>> + * Additional return values from vmport_gp_check.
>> + *
>> + * Note: return values include:
>> + *   X86EMUL_OKAY
>> + *   X86EMUL_UNHANDLEABLE
>> + *   X86EMUL_EXCEPTION
>> + *   X86EMUL_RETRY
>> + *   X86EMUL_CMPXCHG_FAILED
>> + *
>> + * The additional do not overlap any of the above.
>> + */
>> +#define X86EMUL_VMPORT_NOT_ENABLED              10
>> +#define X86EMUL_VMPORT_FETCH_ERROR_BYTE1        11
>> +#define X86EMUL_VMPORT_FETCH_ERROR_BYTE2        12
>> +#define X86EMUL_VMPORT_BAD_OPCODE               13
>> +#define X86EMUL_VMPORT_BAD_STATE                14
> 
> Going through the patch, you only ever return these, but never
> check for any of them. Why do you add these in the first place,
> risking future collisions even if there are none now?
> 

There used to be a way to log or trace them, but I think
that code has been dropped.  Will remove them.

   -Don Slutz

> Jan
> 


* Re: [PATCH for-4.5 v8 4/7] xen: Add vmware_port support
  2015-01-21 17:52         ` Don Slutz
@ 2015-01-22  8:32           ` Jan Beulich
  2015-01-26 15:58             ` Don Slutz
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2015-01-22  8:32 UTC (permalink / raw)
  To: Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Boris Ostrovsky

>>> On 21.01.15 at 18:52, <dslutz@verizon.com> wrote:
> On 01/16/15 05:09, Jan Beulich wrote:
>>>>> On 03.10.14 at 00:40, <dslutz@verizon.com> wrote:
>>> This is a new domain_create() flag, DOMCRF_vmware_port.  It is
>>> passed to domctl as XEN_DOMCTL_CDF_vmware_port.
>> 
>> Can you explain why a HVM param isn't suitable here?
>> 
> 
> The issue is that you need this flag during construct_vmcb() and
> construct_vmcs().  While Intel has vmx_update_exception_bitmap()
> AMD does not.  So when HVM param's are setup and/or changed there
> currently is no way to adjust AMD's exception bitmap.
> 
> So this is the simpler way.

But the less desirable one from a design/consistency perspective.
Unless other maintainers disagree, I'd like to see this changed.

>>> This is both a more complete support than is currently provided by
>>> QEMU and/or KVM and less.  The missing part requires QEMU changes
>>> and has been left out until the QEMU patches are accepted upstream.
>> 
>> I vaguely recall the question having been asked before, but I can't
>> find the answer to it: If qemu has support for this, why can't
>> you build on that rather than adding everything in the hypervisor?
>> 
> 
> The v10 version of this patch set (which is waiting for an adjusted
> QEMU; the released 2.2.0 is one) does use QEMU for more VMware port
> support.  The issues are:

Was there a newer version of these posted than the v8 I looked at?
If so, I must have overlooked the posting (as otherwise I would of
course have commented on the newer version).

> 1) QEMU needs access to parts of CPU registers to handle VMware port.
> 2) You need to allow ring 3 access to this 1 I/O port.
> 3) There is more state in xen that would need to also be sent to
>    QEMU if all support is moved to QEMU.

Understood.

>>> @@ -2111,6 +2112,31 @@ svm_vmexit_do_vmsave(struct vmcb_struct *vmcb,
>>>       return;
>>>   }
>>>
>>> +static void svm_vmexit_gp_intercept(struct cpu_user_regs *regs,
>>> +                                    struct vcpu *v)
>>> +{
>>> +    struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
>>> +    /*
>>> +     * Just use 15 for the instruction length; vmport_gp_check will
>>> +     * adjust it.  This is because
>>> +     * __get_instruction_length_from_list() has issues, and may
>>> +     * require a double read of the instruction bytes.  At some
>>> +     * point a new routine could be added that is based on the code
>>> +     * in vmport_gp_check with extensions to make it more general.
>>> +     * Since that routine is the only user of this code this can be
>>> +     * done later.
>>> +     */
>>> +    unsigned long inst_len = 15;
>> 
>> Surely this can be unsigned int?
> 
> The code is smaller this way.  In vmx_vmexit_gp_intercept():
> 
>     unsigned long inst_len;
> ...
>     __vmread(VM_EXIT_INSTRUCTION_LEN, &inst_len);
> ...
>     rc = vmport_gp_check(regs, v, &inst_len, inst_addr,
> ...
> 
> So changing the argument to vmport_gp_check() to "unsigned int" would
> add code there.

So be it then. Generic code shouldn't use odd types just because of
vendor specific code needs it, unless this makes things _a lot_ more
complicated.

>>> +    int rc = X86EMUL_OKAY;
>>> +
>>> +    if ( regs->_eax == BDOOR_MAGIC )
>> 
>> With this, is handling other than 32-bit in/out really meaningful/
>> correct?
>> 
> 
> Yes. Harder to use, but since VMware allows it, I allow it also.

But then a comment explaining the non-architectural (from an
instruction set perspective) behavior is the minimum that's
needed for future readers (and reviewers) to understand this.

>>> +        case BDOOR_CMD_GETHWVERSION:
>>> +            /* vmware_hw */
>>> +            regs->_eax = 0;
>>> +            if ( is_hvm_vcpu(curr) )
>> 
>> Since you can't get here for PV, I can't see what you need this
>> conditional for.
>> 
> 
> Since I was not 100% sure, I was being safe.  Would converting
> this to be a "debug=y" check be ok?

ASSERT() would indeed be the right vehicle.

>>> +            {
>>> +                struct hvm_domain *hd = &d->arch.hvm_domain;
>>> +
>>> +                regs->_eax = hd->params[HVM_PARAM_VMWARE_HW];
>>> +            }
>>> +            if ( !regs->_eax )
>>> +                regs->_eax = 4;  /* Act like version 4 */
>> 
>> Why version 4?
> 
> That is the 1st version that VMware was more consistent in the handling
> of the "VMware hardware version".  Any value between 1 and 6 would be
> ok.  This should only happen in strange configs.

Please make the comment say so then.

>>> +            /* hostUsecs */
>>> +            regs->_ebx = value / 1000000ULL;
>>> +            /* maxTimeLag */
>>> +            regs->_ecx = 1000000;
>>> +            break;
>> 
>> Perhaps this should share code with BDOOR_CMD_GETTIME; I have
>> to admit though that I can't make any sense of why the latter one
>> has a FULL suffix when it returns _less_ information.
>> 
> 
> Sharing of code is not simple.
> Since I did not pick the names, VMware did.
> 
> Bug found.  The FULL variant returns data in si & dx.
> Will fix.  This also makes sharing more complex than not.

Of course if the current code is incomplete, sharing makes less sense
once completed.

>>> +        unsigned char bytes[MAX_INST_LEN];
>>> +        unsigned int fetch_len;
>>> +        int frc;
>>> +
>>> +        /* in or out are limited to 32bits */
>>> +        if ( byte_cnt > 4 )
>>> +            byte_cnt = 4;
>>> +
>>> +        /*
>>> +         * Fetch up to the next page break; we'll fetch from the
>>> +         * next page later if we have to.
>>> +         */
>>> +        fetch_len = min_t(unsigned int, *inst_len,
>>> +                          PAGE_SIZE - (inst_addr  & ~PAGE_MASK));
>>> +        frc = hvm_fetch_from_guest_virt_nofault(bytes, inst_addr, fetch_len,
>>> +                                                PFEC_page_present);
>>> +        if ( frc != HVMCOPY_okay )
>>> +        {
>>> +            gdprintk(XENLOG_WARNING,
>>> +                     "Bad instruction fetch at %#lx (frc=%d il=%lu fl=%u)\n",
>>> +                     (unsigned long) inst_addr, frc, *inst_len, fetch_len);
>> 
>> Pointless cast. But the value of log messages like this one is
>> questionable anyway.
>> 
> 
> Will drop cast.  I am not sure it is possible to get here. The best I
> have come up with is to change the GDT entry for CS to fault, then do
> this instruction.  Not sure it would fault, and clearly is an attempt
> to break in.
> 
> I do know that if Xen is running under VMware (Why anyone would do
> this?) this is possible.
> 
> With all this, should I just drop this message (or make it debug=y
> only)?

Yes - dropping would be preferred by me, but I'd accept a debug=y
only one too.
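
The fetch-length computation in the hunk quoted above can be looked at in
isolation.  The following is only a minimal sketch, not the patch's code: the
SKETCH_ names and the 4 KiB page size are assumptions made here, and the
guest-memory fetch itself is left out.

```c
#include <stddef.h>

/* Assumed page geometry for the sketch (4 KiB pages). */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * Number of bytes that can be read starting at addr without crossing
 * into the next page, capped at the number of bytes wanted.
 */
unsigned int sketch_fetch_len(unsigned long addr, unsigned int want)
{
    /* Offset within the page is (addr & ~PAGE_MASK). */
    unsigned long to_page_end = SKETCH_PAGE_SIZE - (addr & ~SKETCH_PAGE_MASK);

    return want < to_page_end ? want : (unsigned int)to_page_end;
}
```

A caller that gets back fewer bytes than it asked for would then have to
fetch the remainder from the following page, which is what the quoted comment
refers to.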

>>> +
>>> +        /* Only adjust byte_cnt 1 time */
>>> +        if ( bytes[0] == 0x66 )     /* operand size prefix */
>>> +        {
>>> +            if ( byte_cnt == 4 )
>>> +                byte_cnt = 2;
>>> +            else
>>> +                byte_cnt = 4;
>>> +        }
>> 
>> Iirc REX.W set following 0x66 cancels the effect of the latter. Another
>> thing x86emul would be taking care of for you if you used it.
> 
> I did not know this.  Most of my testing was done without any check
> for prefix(s).  I.E. (Open) VMware Tools only uses the inl.  I do
> not know of anybody using 16bit segments and VMware tools.

But this isn't the perspective to take when adding code to the
hypervisor - you should always consider what a (perhaps
malicious) guest could do.
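
To illustrate the prefix interaction mentioned above, here is a minimal
sketch of how the operand size of a dword-form port I/O opcode could be
derived from its prefixes.  It is not taken from x86emul or from the patch;
the function name and parameters are hypothetical, only the 0x66 and REX
prefixes are considered, and it assumes (as the quoted code does) that port
I/O never moves more than 32 bits.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Operand size in bytes for the 0xed/0xef forms of in/out.
 * def_bytes is the default operand size of the code segment (2 or 4).
 * Scanning stops at the first byte that is neither 0x66 nor (in long
 * mode) a REX prefix.
 */
unsigned int sketch_io_op_bytes(const unsigned char *insn, size_t len,
                                bool long_mode, unsigned int def_bytes)
{
    bool opsize_prefix = false, rex_w = false;
    size_t i;

    for ( i = 0; i < len; i++ )
    {
        if ( insn[i] == 0x66 )
        {
            opsize_prefix = true;
            /* A REX prefix only counts if it directly precedes the opcode. */
            rex_w = false;
        }
        else if ( long_mode && (insn[i] & 0xf0) == 0x40 )
            rex_w = (insn[i] & 0x08) != 0;
        else
            break;
    }

    /* REX.W takes precedence over 0x66; port I/O is capped at 32 bits. */
    if ( rex_w )
        return 4;

    if ( opsize_prefix )
        return def_bytes == 4 ? 2 : 4;

    return def_bytes;
}
```

With this, the byte sequence 0x66 0x48 0xed in long mode yields 4 rather than
2, which is the case the quoted check on bytes[0] alone would get wrong.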

>>> +static void vmx_vmexit_gp_intercept(struct cpu_user_regs *regs,
>>> +                                    struct vcpu *v)
>>> +{
>>> +    unsigned long exit_qualification;
>>> +    unsigned long inst_len;
>>> +    unsigned long inst_addr = vmx_rip2pointer(regs, v);
>>> +    unsigned long ecode;
>>> +    int rc;
>>> +#ifndef NDEBUG
>>> +    unsigned long orig_inst_len;
>>> +    unsigned long vector;
>>> +
>>> +    __vmread(VM_EXIT_INTR_INFO, &vector);
>>> +    BUG_ON(!(vector & INTR_INFO_VALID_MASK));
>>> +    BUG_ON(!(vector & INTR_INFO_DELIVER_CODE_MASK));
>>> +#endif
>> 
>> If you use ASSERT() instead of BUG_ON(), I think you can avoid most
>> of this preprocessor conditional.
>> 
> 
> I do not see how.  vector only exists in "debug=y".  So yes if using
> ASSERT() I could move the #endif up 2 lines, but that does not
> look better to me.

I don't follow - ASSERT() is intentionally coded in a way such that
variables used only by it don't cause compiler warnings. And the
optimizer ought to be able to eliminate the then unnecessary
__vmread().
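
One common way to get that property is sketched below.  This is an
illustration only, not a copy of Xen's macro: SKETCH_ASSERT and
sketch_read_info() are hypothetical names, and the constant stands in for a
value that real code would obtain with __vmread().

```c
#include <stdio.h>
#include <stdlib.h>

#ifndef NDEBUG
#define SKETCH_ASSERT(p)                                          \
    do {                                                          \
        if ( !(p) )                                               \
        {                                                         \
            fprintf(stderr, "Assertion '%s' failed\n", #p);       \
            abort();                                              \
        }                                                         \
    } while ( 0 )
#else
/*
 * The predicate is still parsed and type-checked, so variables used
 * only here count as used, but it is never evaluated at run time.
 */
#define SKETCH_ASSERT(p) do { if ( 0 && (p) ) {} } while ( 0 )
#endif

/* Stand-in for reading the exit interruption information field. */
static unsigned long sketch_read_info(void)
{
    /* valid bit (31), error code delivered (11), type 3, vector 13 */
    return 0x80000b0dUL;
}

int sketch_check(void)
{
    unsigned long vector = sketch_read_info();

    SKETCH_ASSERT(vector & 0x80000000UL); /* valid */
    SKETCH_ASSERT(vector & 0x00000800UL); /* error code delivered */

    return 0;
}
```

No #ifndef NDEBUG is needed around the declaration of vector or around the
two checks; whether the read is then optimized away in a non-debug build is
up to the compiler.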

>>> +    __vmread(EXIT_QUALIFICATION, &exit_qualification);
>>> +    __vmread(VM_EXIT_INSTRUCTION_LEN, &inst_len);
>> 
>> get_instruction_length(). But is it architecturally defined that
>> #GP intercept vmexits actually set this field?
>> 
> 
> I could not find a clear statement.

That's the point of my comment.

>  My reading of (directly out of
> "Intel® 64 and IA-32 Architectures
> Software Developer’s Manual
> Volume 3 (3A, 3B & 3C):
>[...]
> to me says that yes, this field is set on a #GP exit on an IN.  But the
> #GP case is not called out by name.

And it is not any of the cases mentioned.

> My read is that a #GP fault is a "VM Exits Unconditionally" based on the
> setting of the exception bit mask.

Right, but it's not exactly an instruction based exit. Unless Intel
confirms that your extension of what the manual says is correct, I'd
rather recommend against relying on unspecified behavior. If
any CPU model ends up behaving differently, this might be
rather hard to diagnose I'm afraid.

> So not using get_instruction_length() does avoid a possible BUG_ON()
> if I am wrong.
> 
> So there are 3 options here:
> 1) Add an ASSERT() like the BUG_ON() in get_instruction_length()
> 2) Switch to using get_instruction_length()
> 3) Switch to using MAX_INST_LEN.
> 
> Let me know which way to go.

As said above - use get_instruction_length() if Intel confirms the
necessary hardware behavior as being architectural. If they
don't, 3) looks like the only viable option.

>>> @@ -2182,6 +2183,8 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
>>>               if ( v->fpu_dirtied )
>>>                   nvcpu->nv_vmexit_pending = 1;
>>>           }
>>> +        else if ( vector == TRAP_gp_fault )
>>> +            nvcpu->nv_vmexit_pending = 1;
>> 
>> Doesn't that mean an unconditional vmexit even if the L1 hypervisor
>> didn't ask for such?
> 
> It might.  I have not done any testing here for the nested VMX case.
> I could just drop this for now and decide what to do for this code later.

Fine, provided dropping the code is safe without also forbidding the
combination of nested and VMware emulation.
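
For reference, the concern about the quoted hunk is that a #GP would be
forwarded to L1 regardless of what L1 asked for.  A minimal sketch of the
kind of check that would avoid this follows.  It assumes L1's exception
bitmap is available as a plain 32-bit value; how the nested VMX code obtains
it is not shown in this thread, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TRAP_GP_FAULT 13

/*
 * True if the L1 hypervisor asked to intercept this exception vector.
 * Extra filtering that applies to some vectors (such as the error-code
 * mask/match for page faults) is not modelled here.
 */
bool sketch_l1_wants_exception(uint32_t l1_exception_bitmap,
                               unsigned int vector)
{
    return vector < 32 &&
           (l1_exception_bitmap & (UINT32_C(1) << vector)) != 0;
}
```

Only when this returns true for SKETCH_TRAP_GP_FAULT would a vmexit to L1 be
marked pending; otherwise the fault would be handled by L0.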

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH for-4.5 v8 4/7] xen: Add vmware_port support
  2015-01-22  8:32           ` Jan Beulich
@ 2015-01-26 15:58             ` Don Slutz
  2015-01-26 16:46               ` Jan Beulich
  2015-02-10 19:30               ` [PATCH " Don Slutz
  0 siblings, 2 replies; 37+ messages in thread
From: Don Slutz @ 2015-01-26 15:58 UTC (permalink / raw)
  To: Jan Beulich, Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Suravee Suthikulpanit, Boris Ostrovsky

On 01/22/15 03:32, Jan Beulich wrote:
>>>> On 21.01.15 at 18:52, <dslutz@verizon.com> wrote:
>> On 01/16/15 05:09, Jan Beulich wrote:
>>>>>> On 03.10.14 at 00:40, <dslutz@verizon.com> wrote:
>>>> This is a new domain_create() flag, DOMCRF_vmware_port.  It is
>>>> passed to domctl as XEN_DOMCTL_CDF_vmware_port.
>>> Can you explain why a HVM param isn't suitable here?
>>>
>> The issue is that you need this flag during construct_vmcb() and
>> construct_vmcs().  While Intel has vmx_update_exception_bitmap()
>> AMD does not.  So when HVM param's are setup and/or changed there
>> currently is no way to adjust AMD's exception bitmap.
>>
>> So this is the simpler way.
> But the less desirable one from a design/consistency perspective.
> Unless other maintainers disagree, I'd like to see this changed.

Ok, but I will wait some time to see whether other maintainers disagree.

>>>> This is both a more complete support than is currently provided by
>>>> QEMU and/or KVM and less.  The missing part requires QEMU changes
>>>> and has been left out until the QEMU patches are accepted upstream.
>>> I vaguely recall the question having been asked before, but I can't
>>> find the answer to it: If qemu has support for this, why can't
>>> you build on that rather than adding everything in the hypervisor?
>>>
>> The v10 version of this patch set (which is waiting for an adjusted
>> QEMU; the released 2.2.0 is one) does use QEMU for more VMware port
>> support.  The issues are:
> Was there a newer version of these posted than the v8 I looked at?
> If so, I must have overlooked the posting (as otherwise I would of
> course have commented on the newer version).
>

The newer version was being worked on, but had not been posted (and had no
changes to this patch).  Since it was never posted, I will just continue
getting v9 (instead of v10) into shape to post.


>> 1) QEMU needs access to parts of CPU registers to handle VMware port.
>> 2) You need to allow ring 3 access to this 1 I/O port.
>> 3) There is more state in xen that would need to also be sent to
>>    QEMU if all support is moved to QEMU.
> Understood.
>
>>>> @@ -2111,6 +2112,31 @@ svm_vmexit_do_vmsave(struct vmcb_struct *vmcb,
>>>>       return;
>>>>   }
>>>>
>>>> +static void svm_vmexit_gp_intercept(struct cpu_user_regs *regs,
>>>> +                                    struct vcpu *v)
>>>> +{
>>>> +    struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
>>>> +    /*
>>>> +     * Just use 15 for the instruction length; vmport_gp_check will
>>>> +     * adjust it.  This is because
>>>> +     * __get_instruction_length_from_list() has issues, and may
>>>> +     * require a double read of the instruction bytes.  At some
>>>> +     * point a new routine could be added that is based on the code
>>>> +     * in vmport_gp_check with extensions to make it more general.
>>>> +     * Since that routine is the only user of this code this can be
>>>> +     * done later.
>>>> +     */
>>>> +    unsigned long inst_len = 15;
>>> Surely this can be unsigned int?
>> The code is smaller this way.  In vmx_vmexit_gp_intercept():
>>
>>     unsigned long inst_len;
>> ...
>>     __vmread(VM_EXIT_INSTRUCTION_LEN, &inst_len);
>> ...
>>     rc = vmport_gp_check(regs, v, &inst_len, inst_addr,
>> ...
>>
>> So changing the argument to vmport_gp_check() to "unsigned int" would
>> add code there.
> So be it then. Generic code shouldn't use odd types just because of
> vendor specific code needs it, unless this makes things _a lot_ more
> complicated.
>

Ok.  Since it looks like I will not be using get_instruction_length(), I will
change this to "unsigned int" (or should I use "unsigned short" or
"unsigned byte"?).

>>>> +    int rc = X86EMUL_OKAY;
>>>> +
>>>> +    if ( regs->_eax == BDOOR_MAGIC )
>>> With this, is handling other than 32-bit in/out really meaningful/
>>> correct?
>>>
>> Yes. Harder to use, but since VMware allows it, I allow it also.
> But then a comment explaining the non-architectural (from an
> instruction set perspective) behavior is the minimum that's
> needed for future readers (and reviewers) to understand this.

Ok will add.


>>>> +        case BDOOR_CMD_GETHWVERSION:
>>>> +            /* vmware_hw */
>>>> +            regs->_eax = 0;
>>>> +            if ( is_hvm_vcpu(curr) )
>>> Since you can't get here for PV, I can't see what you need this
>>> conditional for.
>>>
>> Since I was not 100% sure, I was being safe.  Would converting
>> this to be a "debug=y" check be ok?
> ASSERT() would indeed be the right vehicle.
>

Will do.

>>>> +            {
>>>> +                struct hvm_domain *hd = &d->arch.hvm_domain;
>>>> +
>>>> +                regs->_eax = hd->params[HVM_PARAM_VMWARE_HW];
>>>> +            }
>>>> +            if ( !regs->_eax )
>>>> +                regs->_eax = 4;  /* Act like version 4 */
>>> Why version 4?
>> That is the 1st version that VMware was more consistent in the handling
>> of the "VMware hardware version".  Any value between 1 and 6 would be
>> ok.  This should only happen in strange configs.
> Please make the comment say so then.
>

Will do.

>>>> +            /* hostUsecs */
>>>> +            regs->_ebx = value / 1000000ULL;
>>>> +            /* maxTimeLag */
>>>> +            regs->_ecx = 1000000;
>>>> +            break;
>>> Perhaps this should share code with BDOOR_CMD_GETTIME; I have
>>> to admit though that I can't make any sense of why the latter one
>>> has a FULL suffix when it returns _less_ information.
>>>
>> Sharing of code is not simple.
>> Since I did not pick the names, VMware did.
>>
>> Bug found.  The FULL variant returns data in si & dx.
>> Will fix.  This also makes sharing more complex than not.
> Of course if the current code is incomplete, sharing makes less sense
> once completed.
>
>>>> +        unsigned char bytes[MAX_INST_LEN];
>>>> +        unsigned int fetch_len;
>>>> +        int frc;
>>>> +
>>>> +        /* in or out are limited to 32bits */
>>>> +        if ( byte_cnt > 4 )
>>>> +            byte_cnt = 4;
>>>> +
>>>> +        /*
>>>> +         * Fetch up to the next page break; we'll fetch from the
>>>> +         * next page later if we have to.
>>>> +         */
>>>> +        fetch_len = min_t(unsigned int, *inst_len,
>>>> +                          PAGE_SIZE - (inst_addr  & ~PAGE_MASK));
>>>> +        frc = hvm_fetch_from_guest_virt_nofault(bytes, inst_addr, fetch_len,
>>>> +                                                PFEC_page_present);
>>>> +        if ( frc != HVMCOPY_okay )
>>>> +        {
>>>> +            gdprintk(XENLOG_WARNING,
>>>> +                     "Bad instruction fetch at %#lx (frc=%d il=%lu fl=%u)\n",
>>>> +                     (unsigned long) inst_addr, frc, *inst_len, fetch_len);
>>> Pointless cast. But the value of log messages like this one is
>>> questionable anyway.
>>>
>> Will drop cast.  I am not sure it is possible to get here. The best I
>> have come up with is to change the GDT entry for CS to fault, then do
>> this instruction.  Not sure it would fault, and clearly is an attempt
>> to break in.
>>
>> I do know that if Xen is running under VMware (Why anyone would do
>> this?) this is possible.
>>
>> With all this, should I just drop this message (or make it debug=y
>> only)?
> Yes - dropping would be preferred by me, but I'd accept a debug=y
> only one too.

Ok, will drop.

>>>> +
>>>> +        /* Only adjust byte_cnt 1 time */
>>>> +        if ( bytes[0] == 0x66 )     /* operand size prefix */
>>>> +        {
>>>> +            if ( byte_cnt == 4 )
>>>> +                byte_cnt = 2;
>>>> +            else
>>>> +                byte_cnt = 4;
>>>> +        }
>>> Iirc REX.W set following 0x66 cancels the effect of the latter. Another
>>> thing x86emul would be taking care of for you if you used it.
>> I did not know this.  Most of my testing was done without any check
>> for prefix(s).  I.E. (Open) VMware Tools only uses the inl.  I do
>> not know of anybody using 16bit segments and VMware tools.
> But this isn't the perspective to take when adding code to the
> hypervisor - you should always consider what a (perhaps
> malicious) guest could do.

Ok, but my read of this statement does not help decide which way
to go.  I see several options:

1) Only allow #GP to work for 0xed (inl (%dx),%eax).
    Pros: No attack surface for malicious guest.
             No need for get_instruction_length().
             No need for Intel to confirm the necessary hardware
                behaviour as being architectural.
    Cons: There may exist user apps. that work on VMware and
               not on xen (16bit segments, realmode, vm86, etc).

2) Only allow #GP to work for all 4 I/O instructions without prefix.
    Pros: No attack surface for malicious guest.
             No need for get_instruction_length().
             No need for Intel to confirm the necessary hardware
                behaviour as being architectural.
    Cons: There may exist user apps. that work on VMware and
                 not on xen (16bit segments, realmode, vm86, etc).

3) Only allow zero or one 0x66 prefix and 0xed (inl (%dx),%eax).
    Pros: No attack surface for malicious guest.
             No need for get_instruction_length().
             No need for Intel to confirm the necessary hardware
                behaviour as being architectural.
    Cons: There may exist user apps. that work on VMware and
                 not on xen (using too many prefixes, using other
                 opcodes).

4) Only allow zero or one 0x66 prefix and all 4 I/O instructions.
    Pros: No attack surface for malicious guest.
             No need for get_instruction_length().
             No need for Intel to confirm the necessary hardware
                behaviour as being architectural.
    Cons: There may exist user apps. that work on VMware and
                 not on xen (using too many prefixes).

5) Only allow zero to 14 0x66 prefix and 0xed (inl (%dx),%eax).
    Pros: No attack surface for malicious guest.
    Cons: There may exist user apps. that work on VMware and
                not on xen (using unneeded prefixes, using other
                opcodes).
    5a:    Would be cleaner with get_instruction_length() on Intel,
                but would need for Intel to confirm the necessary hardware
                behaviour as being architectural.
    5b:     Always pass in MAX_INST_LEN.
   

6) Only allow zero to 14 0x66 prefix and all 4 I/O instructions.
    Pros: No attack surface for malicious guest.
    Cons: There may exist user apps. that work on VMware and
               not on xen (using unneeded  prefixes).
    6a:    Would be cleaner with get_instruction_length() on Intel,
                but would need for Intel to confirm the necessary hardware
                behaviour as being architectural.
    6b:     Always pass in MAX_INST_LEN.

7) Add complete prefix handling, and all 4 I/O instructions
    Pros: Limited attack surface for malicious guest (the handling
              of all prefixes greatly increases the complexity of the
              code).
    Cons: Lots of added code.
    7a:    Would be cleaner with get_instruction_length() on Intel,
                but would need for Intel to confirm the necessary hardware
                behaviour as being architectural.
    7b:     Always pass in MAX_INST_LEN.

8) Use hvm_emulate_one().
    Pros: shares code, reduces new code.
    Cons: Adds a lot of attack surface for malicious guest.


I had picked #6, you asked for #8, but I read your answer as do not
do #8.

I would be happy to go with any of the 8 ways (or a way I did not list
above); I just need to know which one to focus on.
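
As a rough illustration of what option 6b amounts to, here is a minimal
sketch.  It is not the patch's vmport_gp_check(); the names are hypothetical,
and it assumes "all 4 I/O instructions" means the %dx forms 0xec to 0xef and
that the caller passes in the code segment's default operand size.

```c
#include <stddef.h>

#define SKETCH_MAX_INST_LEN 15

/*
 * Accept zero to 14 operand-size prefixes (0x66) followed by one of
 * the four %dx port I/O opcodes.  Returns the instruction length, or 0
 * if the bytes are not accepted.  On success *byte_cnt is the width of
 * the port access.  def_bytes is 2 or 4.
 */
unsigned int sketch_decode_port_io(const unsigned char *insn, size_t avail,
                                   unsigned int def_bytes,
                                   unsigned int *byte_cnt)
{
    size_t i = 0;
    unsigned int op_bytes = def_bytes;

    if ( avail > SKETCH_MAX_INST_LEN )
        avail = SKETCH_MAX_INST_LEN;

    /* Repeating 0x66 has the same effect as a single one. */
    while ( i < avail && insn[i] == 0x66 )
    {
        op_bytes = (def_bytes == 4) ? 2 : 4;
        i++;
    }

    if ( i >= avail )
        return 0;

    switch ( insn[i] )
    {
    case 0xec: /* in  (%dx),%al */
    case 0xee: /* out %al,(%dx) */
        *byte_cnt = 1;
        break;

    case 0xed: /* in  (%dx),%eax (or %ax) */
    case 0xef: /* out %eax,(%dx) (or %ax) */
        *byte_cnt = op_bytes;
        break;

    default:
        return 0;
    }

    return (unsigned int)(i + 1);
}
```

Because the scan is bounded by the maximum instruction length rather than by
a hardware-reported length, nothing here depends on
VM_EXIT_INSTRUCTION_LEN being valid for a #GP exit.  Options 1 to 5 are
restrictions of the same loop; REX and other prefixes are rejected here.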


>>>> +static void vmx_vmexit_gp_intercept(struct cpu_user_regs *regs,
>>>> +                                    struct vcpu *v)
>>>> +{
>>>> +    unsigned long exit_qualification;
>>>> +    unsigned long inst_len;
>>>> +    unsigned long inst_addr = vmx_rip2pointer(regs, v);
>>>> +    unsigned long ecode;
>>>> +    int rc;
>>>> +#ifndef NDEBUG
>>>> +    unsigned long orig_inst_len;
>>>> +    unsigned long vector;
>>>> +
>>>> +    __vmread(VM_EXIT_INTR_INFO, &vector);
>>>> +    BUG_ON(!(vector & INTR_INFO_VALID_MASK));
>>>> +    BUG_ON(!(vector & INTR_INFO_DELIVER_CODE_MASK));
>>>> +#endif
>>> If you use ASSERT() instead of BUG_ON(), I think you can avoid most
>>> of this preprocessor conditional.
>>>
>> I do not see how.  vector only exists in "debug=y".  So yes if using
>> ASSERT() I could move the #endif up 2 lines, but that does not
>> look better to me.
> I don't follow - ASSERT() is intentionally coded in a way such that
> variables used only by it don't cause compiler warnings. And the
> optimizer ought to be able to eliminate the then unnecessary
> __vmread().
>

I am more used to explicit conditional code and to not depending on the
compiler to correctly optimize the code.  Will change.

The most likely case is that I will stop using get_instruction_length()
(__vmread(VM_EXIT_INSTRUCTION_LEN,...)), which also drops the need for
orig_inst_len.

>>>> +    __vmread(EXIT_QUALIFICATION, &exit_qualification);
>>>> +    __vmread(VM_EXIT_INSTRUCTION_LEN, &inst_len);
>>> get_instruction_length(). But is it architecturally defined that
>>> #GP intercept vmexits actually set this field?
>>>
>> I could not find a clear statement.
> That's the point of my comment.
>
>>  My reading of (directly out of
>> "Intel® 64 and IA-32 Architectures
>> Software Developer’s Manual
>> Volume 3 (3A, 3B & 3C):
>> [...]
>> to me says that yes, this field is set on a #GP exit on an IN.  But the
>> #GP case is not called out by name.
> And it is not any of the cases mentioned.
>
>> My read is that a #GP fault is a "VM Exits Unconditionally" based on the
>> setting of the exception bit mask.
> Right, but it's not exactly an instruction based exit. Unless Intel
> confirms that your extension of what the manual says is correct, I'd
> rather recommend against relying on unspecified behavior. If
> any CPU model ends up behaving differently, this might be
> rather hard to diagnose I'm afraid.
>
>> So not using get_instruction_length() does avoid a possible BUG_ON()
>> if I am wrong.
>>
>> So there are 3 options here:
>> 1) Add an ASSERT() like the BUG_ON() in get_instruction_length()
>> 2) Switch to using get_instruction_length()
>> 3) Switch to using MAX_INST_LEN.
>>
>> Let me know which way to go.
> As said above - use get_instruction_length() if Intel confirms the
> necessary hardware behavior as being architectural. If they
> don't, 3) looks like the only viable option.


So what is the procedure for getting "Intel confirms the necessary hardware
behaviour as being architectural"?



>>>> @@ -2182,6 +2183,8 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
>>>>               if ( v->fpu_dirtied )
>>>>                   nvcpu->nv_vmexit_pending = 1;
>>>>           }
>>>> +        else if ( vector == TRAP_gp_fault )
>>>> +            nvcpu->nv_vmexit_pending = 1;
>>> Doesn't that mean an unconditional vmexit even if the L1 hypervisor
>>> didn't ask for such?
>> It might.  I have not done any testing here for the nested VMX case.
>> I could just drop this for now and decide what to do for this code later.
> If dropping the code is safe without also forbidding the combination
> of nested and VMware emulation.

Will have to do a lot more testing to know.  At the time I started the
coding it was still considered experimental.  Looks like for 4.6 I will need
it to be fully unit tested.

    -Don Slutz


> Jan



* Re: [PATCH for-4.5 v8 4/7] xen: Add vmware_port support
  2015-01-26 15:58             ` Don Slutz
@ 2015-01-26 16:46               ` Jan Beulich
  2015-01-26 20:19                 ` Don Slutz
  2015-02-10 19:30               ` [PATCH " Don Slutz
  1 sibling, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2015-01-26 16:46 UTC (permalink / raw)
  To: Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Suravee Suthikulpanit, Boris Ostrovsky

>>> On 26.01.15 at 16:58, <dslutz@verizon.com> wrote:
> On 01/22/15 03:32, Jan Beulich wrote:
>>>>> On 21.01.15 at 18:52, <dslutz@verizon.com> wrote:
>>> On 01/16/15 05:09, Jan Beulich wrote:
>>>>>>> On 03.10.14 at 00:40, <dslutz@verizon.com> wrote:
>>>>> +
>>>>> +        /* Only adjust byte_cnt 1 time */
>>>>> +        if ( bytes[0] == 0x66 )     /* operand size prefix */
>>>>> +        {
>>>>> +            if ( byte_cnt == 4 )
>>>>> +                byte_cnt = 2;
>>>>> +            else
>>>>> +                byte_cnt = 4;
>>>>> +        }
>>>> Iirc REX.W set following 0x66 cancels the effect of the latter. Another
>>>> thing x86emul would be taking care of for you if you used it.
>>> I did not know this.  Most of my testing was done without any check
>>> for prefix(s).  I.E. (Open) VMware Tools only uses the inl.  I do
>>> not know of anybody using 16bit segments and VMware tools.
>> But this isn't the perspective to take when adding code to the
>> hypervisor - you should always consider what a (perhaps
>> malicious) guest could do.
> 
> Ok, but my read of this statement does not help decide which way
> to go.  I see several options:
> 
> 1) Only allow #GP to work for 0xed (inl (%dx),%eax).
>     Pros: No attack surface for malicious guest.
>              No need for get_instruction_length().
>              No need for Intel to confirm the necessary hardware
>                 behaviour as being architectural.
>     Cons: There may exist user apps. that work on VMware and
>                not on xen (16bit segments, realmode, vm86, etc).
> 
> 2) Only allow #GP to work for all 4 I/O instructions without prefix.
>     Pros: No attack surface for malicious guest.
>              No need for get_instruction_length().
>              No need for Intel to confirm the necessary hardware
>                 behaviour as being architectural.
>     Cons: There may exist user apps. that work on VMware and
>                  not on xen (16bit segments, realmode, vm86, etc).
> 
> 3) Only allow zero or one 0x66 prefix and 0xed (inl (%dx),%eax).
>     Pros: No attack surface for malicious guest.
>              No need for get_instruction_length().
>              No need for Intel to confirm the necessary hardware
>                 behaviour as being architectural.
>     Cons: There may exist user apps. that work on VMware and
>                  not on xen (using too many prefixes, using other
>                  opcodes).
> 
> 4) Only allow zero or one 0x66 prefix and all 4 I/O instructions.
>     Pros: No attack surface for malicious guest.
>              No need for get_instruction_length().
>              No need for Intel to confirm the necessary hardware
>                 behaviour as being architectural.
>     Cons: There may exist user apps. that work on VMware and
>                  not on xen (using too many prefixes).
> 
> 5) Only allow zero to 14 0x66 prefix and 0xed (inl (%dx),%eax).
>     Pros: No attack surface for malicious guest.
>     Cons: There may exist user apps. that work on VMware and
>                 not on xen (using unneeded prefixes, using other
>                 opcodes).
>     5a:    Would be cleaner with get_instruction_length() on Intel,
>                 but would need for Intel to confirm the necessary hardware
>                 behaviour as being architectural.
>     5b:     Always pass in MAX_INST_LEN.
>    
> 
> 6) Only allow zero to 14 0x66 prefix and all 4 I/O instructions.
>     Pros: No attack surface for malicious guest.
>     Cons: There may exist user apps. that work on VMware and
>                not on xen (using unneeded  prefixes).
>     6a:    Would be cleaner with get_instruction_length() on Intel,
>                 but would need for Intel to confirm the necessary hardware
>                 behaviour as being architectural.
>     6b:     Always pass in MAX_INST_LEN.
> 
> 7) Add complete prefix handling, and all 4 I/O instructions
>     Pros: Limited attack surface for malicious guest (the handling
>               of all prefixes greatly increases the complexity of the
>               code).
>     Cons: Lots of added code.
>     7a:    Would be cleaner with get_instruction_length() on Intel,
>                 but would need for Intel to confirm the necessary hardware
>                 behaviour as being architectural.
>     7b:     Always pass in MAX_INST_LEN.
> 
> 8) Use hvm_emulate_one().
>     Pros: shares code, reduces new code.
>     Cons: Adds a lot of attack surface for malicious guest.
> 
> 
> I had picked #6, you asked for #8, but I read your answer as do not
> do #8.

I don't think it can or should be read that way; in particular - without
having seen it to be the case in whatever code it takes - I don't buy
the argument of this adding a lot of attack surface: The emulator
code is there anyway.

> I would be happy to go with any of the 8 ways (or a way I did not list
> above),
> just need to know which one to focus on.

As stated before - if feasible, 8 would seem the best option. The
second best one would be to support all four I/O insns (assuming
VMware supports all of them too) with any legal (even if pointless
or redundant) prefix combination, and with the prefixes actually
doing something correctly emulated.
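
For illustration only, here is a minimal sketch of the narrow decoding discussed in option 6 above, extended with the REX.W interplay mentioned earlier in this thread. It assumes a code segment with a 32-bit default operand size, accepts only 0x66 and (in 64-bit mode) one REX prefix, and so does not cover the "any legal prefix combination" case (segment overrides, 0x67, REP, LOCK). The names decode_dx_io, struct io_insn and SKETCH_MAX_INSN_LEN are hypothetical and are not Xen code; the REX.W behaviour follows the recollection quoted above and has not been checked against hardware here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a Xen structure. */
struct io_insn {
    int is_out;          /* 1 for OUT, 0 for IN */
    unsigned int bytes;  /* I/O width in bytes: 1, 2 or 4 */
    size_t len;          /* instruction length, prefixes included */
};

#define SKETCH_MAX_INSN_LEN 15   /* x86 instruction length limit */

/*
 * Decode [0x66 ...] [REX] {0xec, 0xed, 0xee, 0xef}.
 * Returns 0 on success, -1 if the bytes are not one of the accepted forms.
 */
static int decode_dx_io(const uint8_t *p, size_t avail, int long_mode,
                        struct io_insn *out)
{
    size_t i = 0;
    int opsize16 = 0, rex_w = 0;

    /* Redundant 0x66 prefixes are legal; each one selects 16-bit again. */
    while ( i < avail && p[i] == 0x66 )
    {
        opsize16 = 1;
        i++;
    }

    /* In 64-bit mode 0x40-0x4f is a REX prefix; bit 3 is REX.W. */
    if ( long_mode && i < avail && (p[i] & 0xf0) == 0x40 )
    {
        rex_w = (p[i] & 0x08) != 0;
        i++;
    }

    /* The opcode byte must be present and fit within the length limit. */
    if ( i >= avail || i >= SKETCH_MAX_INSN_LEN )
        return -1;

    /* 0xec: in %dx,%al   0xed: in %dx,%eax   0xee/0xef: the OUT forms. */
    if ( p[i] < 0xec || p[i] > 0xef )
        return -1;

    out->is_out = (p[i] & 2) != 0;
    if ( !(p[i] & 1) )
        out->bytes = 1;                            /* byte forms ignore 0x66 */
    else
        out->bytes = (opsize16 && !rex_w) ? 2 : 4; /* REX.W cancels 0x66 */
    out->len = i + 1;

    return 0;
}
```

With these rules a lone 0xed decodes as a 4-byte IN, 0x66 0xed as a 2-byte IN, and 0x66 0x48 0xef in 64-bit mode as a 4-byte OUT.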

>>> So there are 3 options here:
>>> 1) Add an ASSERT() like the BUG_ON() in get_instruction_length()
>>> 2) Switch to using get_instruction_length()
>>> 3) Switch to using MAX_INST_LEN.
>>>
>>> Let me know which way to go.
>> As said above - use get_instruction_length() if Intel confirms the
>> necessary hardware behavior as being architectural. If they
>> don't, 3) looks like the only viable option.
> 
> 
> So what is the procedure to getting "Intel confirms the necessary hardware
> behaviour as being architectural"?

There's no procedure. Ask them explicitly (i.e. perhaps outside of
this thread, where the question may end up being well hidden from
their eyes), and then ping them until they give you a statement one
way or another.

Jan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH for-4.5 v8 4/7] xen: Add vmware_port support
  2015-01-26 16:46               ` Jan Beulich
@ 2015-01-26 20:19                 ` Don Slutz
  2015-01-27  7:58                   ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Don Slutz @ 2015-01-26 20:19 UTC (permalink / raw)
  To: Jan Beulich, Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Suravee Suthikulpanit, Boris Ostrovsky

On 01/26/15 11:46, Jan Beulich wrote:
>>>> On 26.01.15 at 16:58, <dslutz@verizon.com> wrote:
>> On 01/22/15 03:32, Jan Beulich wrote:
>>>>>> On 21.01.15 at 18:52, <dslutz@verizon.com> wrote:
>>>> On 01/16/15 05:09, Jan Beulich wrote:
>>>>>>>> On 03.10.14 at 00:40, <dslutz@verizon.com> wrote:

...

> 
> As stated before - if feasible, 8 would seem the best option. The
> second best one would be to support all four I/O insns (assuming
> VMware supports all of them too) with any legal (even if pointless
> or redundant) prefix combination, and with the prefixes actually
> doing something correctly emulated.
> 

Ok, I will focus on hvm_emulate_one.


>>>> So there are 3 options here:
>>>> 1) Add an ASSERT() like the BUG_ON() in get_instruction_length()
>>>> 2) Switch to using get_instruction_length()
>>>> 3) Switch to using MAX_INST_LEN.
>>>>
>>>> Let me know which way to go.
>>> As said above - use get_instruction_length() if Intel confirms the
>>> necessary hardware behavior as being architectural. If they
>>> don't, 3) looks like the only viable option.
>>
>>
>> So what is the procedure to getting "Intel confirms the necessary hardware
>> behaviour as being architectural"?
> 
> There's no procedure. Ask them explicitly (i.e. perhaps outside of
> this thread, where the question may end up being well hidden from
> their eyes), and then ping them until they give you a statement one
> way or another.
> 

I am assuming that:

INTEL(R) VT FOR X86 (VT-X)
M:      Jun Nakajima <jun.nakajima@intel.com>
M:      Eddie Dong <eddie.dong@intel.com>
M:      Kevin Tian <kevin.tian@intel.com>

Is the correct list of people to ask.

   -Don Slutz


> Jan
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH for-4.5 v8 4/7] xen: Add vmware_port support
  2015-01-26 20:19                 ` Don Slutz
@ 2015-01-27  7:58                   ` Jan Beulich
  2015-01-28  8:19                     ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2015-01-27  7:58 UTC (permalink / raw)
  To: Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Suravee Suthikulpanit, Boris Ostrovsky

>>> On 26.01.15 at 21:19, <dslutz@verizon.com> wrote:
> On 01/26/15 11:46, Jan Beulich wrote:
>> As stated before - if feasible, 8 would seem the best option. The
>> second best one would be to support all four I/O insns (assuming
>> VMware supports all of them too) with any legal (even if pointless
>> or redundant) prefix combination, and with the prefixes actually
>> doing something correctly emulated.
> 
> Ok, I will focus on hvm_emulate_one.

Just to avoid any misunderstanding - it's another clone of it that
you'll want to create, with presumably only .insn_fetch, .read_io, and
.write_io implemented (and maybe a few others stubbed out, where
the core emulator assumes they're non-NULL).
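
To make the shape of such a clone concrete, here is a rough, self-contained sketch. It does not use Xen's real x86_emulate_ops or x86_emulate_ctxt; struct emu_ops, struct emu_ctxt, the EMU_* codes and all function names are hypothetical stand-ins, and SKETCH_VMPORT (0x5658, the commonly documented VMware backdoor port) is an assumption for illustration. Only instruction fetch and port I/O do real work; the memory callback is a stub that refuses, standing in for "a few others stubbed out".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { EMU_OKAY, EMU_UNHANDLEABLE };  /* stand-ins for emulator return codes */

#define SKETCH_VMPORT 0x5658          /* assumed backdoor port number */

struct emu_ctxt {                     /* hypothetical context */
    const uint8_t *insn;              /* bytes at the faulting instruction */
    size_t insn_len;
    unsigned long last_out;           /* last value written to the port */
};

struct emu_ops {                      /* hypothetical, far smaller than Xen's */
    int (*insn_fetch)(size_t off, void *buf, size_t bytes,
                      struct emu_ctxt *c);
    int (*read_io)(unsigned int port, unsigned int bytes,
                   unsigned long *val, struct emu_ctxt *c);
    int (*write_io)(unsigned int port, unsigned int bytes,
                    unsigned long val, struct emu_ctxt *c);
    int (*read_mem)(unsigned long addr, void *buf, size_t bytes,
                    struct emu_ctxt *c);
};

static int sk_insn_fetch(size_t off, void *buf, size_t bytes,
                         struct emu_ctxt *c)
{
    /* Never read past the bytes captured for the instruction. */
    if ( off > c->insn_len || bytes > c->insn_len - off )
        return EMU_UNHANDLEABLE;
    memcpy(buf, c->insn + off, bytes);
    return EMU_OKAY;
}

static int sk_read_io(unsigned int port, unsigned int bytes,
                      unsigned long *val, struct emu_ctxt *c)
{
    (void)bytes; (void)c;
    if ( port != SKETCH_VMPORT )
        return EMU_UNHANDLEABLE;      /* anything else keeps its #GP */
    *val = 0;                         /* placeholder for the backdoor reply */
    return EMU_OKAY;
}

static int sk_write_io(unsigned int port, unsigned int bytes,
                       unsigned long val, struct emu_ctxt *c)
{
    (void)bytes;
    if ( port != SKETCH_VMPORT )
        return EMU_UNHANDLEABLE;
    c->last_out = val;
    return EMU_OKAY;
}

/* Stub: memory access is deliberately not supported on this path. */
static int sk_read_mem(unsigned long addr, void *buf, size_t bytes,
                       struct emu_ctxt *c)
{
    (void)addr; (void)buf; (void)bytes; (void)c;
    return EMU_UNHANDLEABLE;
}

static const struct emu_ops sk_vmport_ops = {
    .insn_fetch = sk_insn_fetch,
    .read_io    = sk_read_io,
    .write_io   = sk_write_io,
    .read_mem   = sk_read_mem,
};
```

Because every callback other than fetch and port I/O refuses, an instruction that needs anything else simply fails emulation, and the guest can then be given the #GP it would have received anyway.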

>>> So what is the procedure to getting "Intel confirms the necessary hardware
>>> behaviour as being architectural"?
>> 
>> There's no procedure. Ask them explicitly (i.e. perhaps outside of
>> this thread, where the question may end up being well hidden from
>> their eyes), and then ping them until they give you a statement one
>> way or another.
>> 
> 
> I am assuming that:
> 
> INTEL(R) VT FOR X86 (VT-X)
> M:      Jun Nakajima <jun.nakajima@intel.com>
> M:      Eddie Dong <eddie.dong@intel.com>
> M:      Kevin Tian <kevin.tian@intel.com>
> 
> Is the correct list of people to ask.

Yes, perhaps with Don Dugger CC-ed.

Jan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH for-4.5 v8 4/7] xen: Add vmware_port support
  2015-01-27  7:58                   ` Jan Beulich
@ 2015-01-28  8:19                     ` Jan Beulich
  2015-01-28 22:47                       ` Don Slutz
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2015-01-28  8:19 UTC (permalink / raw)
  To: Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Razvan Cojocaru, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Eddie Dong,
	Suravee Suthikulpanit, Tamas K Lengyel, Boris Ostrovsky

>>> On 27.01.15 at 08:58, <JBeulich@suse.com> wrote:
>>>> On 26.01.15 at 21:19, <dslutz@verizon.com> wrote:
>> On 01/26/15 11:46, Jan Beulich wrote:
>>> As stated before - if feasible, 8 would seem the best option. The
>>> second best one would be to support all four I/O insns (assuming
>>> VMware supports all of them too) with any legal (even if pointless
>>> or redundant) prefix combination, and with the prefixes actually
>>> doing something correctly emulated.
>> 
>> Ok, I will focus on hvm_emulate_one.
> 
> Just to avoid any misunderstanding - it's another clone of it that
> you'll want to create, with presumably only .insn_fetch, .read_io, and
> .write_io implemented (and maybe a few others stubbed out, where
> the core emulator assumes they're non-NULL).

And I just recalled that I think in mem-event context it was already
suggested to perhaps separate instruction decoding from emulation,
which might be a mode also suitable for your needs. Maybe get in
touch with Razvan or Tamas or whoever it was, who might already
be working on that.

Jan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH for-4.5 v8 4/7] xen: Add vmware_port support
  2015-01-28  8:19                     ` Jan Beulich
@ 2015-01-28 22:47                       ` Don Slutz
  2015-01-29  0:32                         ` Don Slutz
  0 siblings, 1 reply; 37+ messages in thread
From: Don Slutz @ 2015-01-28 22:47 UTC (permalink / raw)
  To: Jan Beulich, Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Razvan Cojocaru, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Eddie Dong,
	Suravee Suthikulpanit, Tamas K Lengyel, Boris Ostrovsky

On 01/28/15 03:19, Jan Beulich wrote:
>>>> On 27.01.15 at 08:58, <JBeulich@suse.com> wrote:
>>>>> On 26.01.15 at 21:19, <dslutz@verizon.com> wrote:
>>> On 01/26/15 11:46, Jan Beulich wrote:
>>>> As stated before - if feasible, 8 would seem the best option. The
>>>> second best one would be to support all four I/O insns (assuming
>>>> VMware supports all of them too) with any legal (even if pointless
>>>> or redundant) prefix combination, and with the prefixes actually
>>>> doing something correctly emulated.
>>>
>>> Ok, I will focus on hvm_emulate_one.
>>
>> Just to avoid any misunderstanding - it's another clone of it that
>> you'll want to create, with presumably only .insn_fetch, .read_io, and
>> .write_io implemented (and maybe a few others stubbed out, where
>> the core emulator assumes they're non-NULL).
> 
> And I just recalled that I think in mem-event context it was already
> suggested to perhaps separate instruction decoding from emulation,
> which might be a mode also suitable for your needs. Maybe get in
> touch with Razvan or Tamas or whoever it was, who might already
> be working on that.
>

You mean like hvm_emulate_one_no_write() (which is already
there)?

It looks to me to be a starting place of how to do this.
So I currently do not plan on seeking help here.


The delay is not in coding this up, but that QEMU master (and now
xenbits's qemu staging) does not work with my changes, and so far I am
unable to link why this is the case.  I am adding a new hvm param
as part of getting vmport requests to QEMU (HVM_PARAM_VMPORT_REGS_PFN),
which changes QEMU to call xc_map_foreign_range() 1 more time.  However,
what is failing is hvmloader's pci setup and scan: ~70 ioreqs work and
the next one hangs because it is sent to the default QEMU, which does not
"exist" because of the patch in QEMU:


commit 7665d6ba98e20fb05c420de947c1750fd47e5c07
Author: Paul Durrant <paul.durrant@citrix.com>
Date:   Tue Jan 20 11:06:19 2015 +0000

    Xen: Use the ioreq-server API when available

    The ioreq-server API added to Xen 4.5 offers better security than
    the existing Xen/QEMU interface because the shared pages that are
    used to pass emulation request/results back and forth are removed
    from the guest's memory space before any requests are serviced.
    This prevents the guest from mapping these pages (they are in a
    well known location) and attempting to attack QEMU by synthesizing
    its own request structures. Hence, this patch modifies configure
    to detect whether the API is available, and adds the necessary
    code to use the API if it is.

    upstream-commit-id: 3996e85c1822e05c50250f8d2d1e57b6bea1229d

    Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
    Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
    Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>



Since I had the code for handing off vmport requests to QEMU I was
planning on adding those patches to this patch set.  Now that I have hit
a road block, the best I can do is put out a next RFC version of
these patches with the issue listed.

So I will switch to spending most of my time on the rework (such as a new
hvm_emulate_one_vmport() or hvm_emulate_one_gp()).


   -Don Slutz

> Jan
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH for-4.5 v8 4/7] xen: Add vmware_port support
  2015-01-28 22:47                       ` Don Slutz
@ 2015-01-29  0:32                         ` Don Slutz
  0 siblings, 0 replies; 37+ messages in thread
From: Don Slutz @ 2015-01-29  0:32 UTC (permalink / raw)
  To: Don Slutz, Jan Beulich
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Razvan Cojocaru, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Eddie Dong,
	Suravee Suthikulpanit, Tamas K Lengyel, Boris Ostrovsky

On 01/28/15 17:47, Don Slutz wrote:
> On 01/28/15 03:19, Jan Beulich wrote:
>>>>> On 27.01.15 at 08:58, <JBeulich@suse.com> wrote:
>>>>>> On 26.01.15 at 21:19, <dslutz@verizon.com> wrote:
>>>> On 01/26/15 11:46, Jan Beulich wrote:



> The delay is not in coding this up, but that QEMU master (and now
> xenbits's qemu staging) does not work with my changes, and so far I am
> unable to link why this is the case.  I am adding a new hvm param
> as part of getting vmport requests to QEMU (HVM_PARAM_VMPORT_REGS_PFN),
> which changes QEMU to call xc_map_foreign_range() 1 more time.  However,
> what is failing is hvmloader's pci setup and scan: ~70 ioreqs work and
> the next one hangs because it is sent to the default QEMU, which does not
> "exist" because of the patch in QEMU:
>


I have found the link.  The following will reproduce my issue:

1) xl create -p <config>
2) read one of HVM_PARAM_IOREQ_PFN, HVM_PARAM_BUFIOREQ_PFN, or
   HVM_PARAM_BUFIOREQ_EVTCHN
3) xl unpause new guest

The guest will hang in hvmloader.

More in thread:

Subject: [Qemu-devel] [PATCH v5 2/2] Xen: Use the ioreq-server API when
	available
Message-ID: <1417776605-36309-3-git-send-email-paul.durrant@citrix.com>
X-Mailer: git-send-email 1.7.10.4
In-Reply-To: <1417776605-36309-1-git-send-email-paul.durrant@citrix.com>


    -Don Slutz

P.S. Can post info to xen-devel also if needed.


> 
> commit 7665d6ba98e20fb05c420de947c1750fd47e5c07
> Author: Paul Durrant <paul.durrant@citrix.com>
> Date:   Tue Jan 20 11:06:19 2015 +0000
> 
>     Xen: Use the ioreq-server API when available
> 
>     The ioreq-server API added to Xen 4.5 offers better security than
>     the existing Xen/QEMU interface because the shared pages that are
>     used to pass emulation request/results back and forth are removed
>     from the guest's memory space before any requests are serviced.
>     This prevents the guest from mapping these pages (they are in a
>     well known location) and attempting to attack QEMU by synthesizing
>     its own request structures. Hence, this patch modifies configure
>     to detect whether the API is available, and adds the necessary
>     code to use the API if it is.
> 
>     upstream-commit-id: 3996e85c1822e05c50250f8d2d1e57b6bea1229d
> 
>     Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>     Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>     Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> 
> 
> 
> Since I had the code for handing off vmport requests to QEMU I was
> planning on adding those patches to this patch set.  Now that I have hit
> a road block, the best I can do is put out a next RFC version of
> these patches with the issue listed.
> 
> So I will switch to spending most of my time on the rework (such as a new
> hvm_emulate_one_vmport() or hvm_emulate_one_gp()).
> 
> 
>    -Don Slutz
> 
>> Jan
>>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v8 4/7] xen: Add vmware_port support
  2015-01-26 15:58             ` Don Slutz
  2015-01-26 16:46               ` Jan Beulich
@ 2015-02-10 19:30               ` Don Slutz
  2015-02-11  7:56                 ` Jan Beulich
  1 sibling, 1 reply; 37+ messages in thread
From: Don Slutz @ 2015-02-10 19:30 UTC (permalink / raw)
  To: Don Slutz, Jan Beulich
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Suravee Suthikulpanit, Boris Ostrovsky

On 01/26/15 10:58, Don Slutz wrote:
> On 01/22/15 03:32, Jan Beulich wrote:
>>>>> On 21.01.15 at 18:52, <dslutz@verizon.com> wrote:
>>> On 01/16/15 05:09, Jan Beulich wrote:
>>>>>>> On 03.10.14 at 00:40, <dslutz@verizon.com> wrote:
>>>>> This is a new domain_create() flag, DOMCRF_vmware_port.  It is
>>>>> passed to domctl as XEN_DOMCTL_CDF_vmware_port.
>>>> Can you explain why a HVM param isn't suitable here?
>>>>
>>> The issue is that you need this flag during construct_vmcb() and
>>> construct_vmcs().  While Intel has vmx_update_exception_bitmap()
>>> AMD does not.  So when HVM param's are setup and/or changed there
>>> currently is no way to adjust AMD's exception bitmap.
>>>
>>> So this is the simpler way.
>> But the less desirable one from a design/consistency perspective.
>> Unless other maintainers disagree, I'd like to see this changed.
> 
> Ok, but will wait some time to see if "Unless other maintainers disagree"
> 

While coding this up I have hit issues that I need input on:

As a HVM_PARAM_ item, I would assume I should be following
what HVM_PARAM_VIRIDIAN does.  It has this comment:

            case HVM_PARAM_VIRIDIAN:
                /* This should only ever be set once by the tools and read by the guest. */

Which is almost true.  However the code allows you to change from 0 to
non-zero any time in the life of the DomU.  I am assuming that this is
why xc_domain_save() and xc_domain_restore() save and restore this
HVM_PARAM_ item.

If vmware_port is enabled the same way, I feel it would be a bug
to allow enabling it after "create" without also adjusting QEMU.  Currently
there is no way for the hypervisor to tell QEMU to enable vmware_port
handling.  So to avoid adding this code to Xen and QEMU, it looks to
me like code would be needed to make this truly write-once, so that
the hypercall cannot be used to change it later.
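
A minimal sketch of the two behaviours being contrasted, for illustration only. struct sketch_domain and both functions are hypothetical and are not Xen's HVMOP_set_param code; the first function models the "0 to non-zero at any time" behaviour described above for HVM_PARAM_VIRIDIAN, the second a strict write-once latch. The error codes are assumptions as well.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-domain state; not a Xen structure. */
struct sketch_domain {
    uint64_t viridian;        /* models a param that may go 0 -> non-zero */
    uint64_t vmware_port;     /* models a param that is latched once */
    int vmware_port_written;  /* set after the first write, even of 0 */
};

/* Behaviour described above: a change is refused only once non-zero. */
static int set_param_zero_to_nonzero(struct sketch_domain *d, uint64_t val)
{
    if ( d->viridian != 0 && d->viridian != val )
        return -EEXIST;
    d->viridian = val;
    return 0;
}

/* Strict write-once: the first write wins, every later write is refused. */
static int set_param_write_once(struct sketch_domain *d, uint64_t val)
{
    if ( d->vmware_port_written )
        return -EEXIST;
    d->vmware_port = val;
    d->vmware_port_written = 1;
    return 0;
}
```

Under the second rule a toolstack that writes 0 at creation time can no longer enable the feature later, which is the property that keeps the hypervisor and QEMU in agreement; save and restore would still need to carry the value and write it exactly once into the new domain.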

So, should I extend this change to cover other HVM_PARAM_?

Is all this additional code (xc_domain_save(), xc_domain_restore(),
write only 1 time) still better than a domain_create() flag?

   -Don Slutz

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v8 4/7] xen: Add vmware_port support
  2015-02-10 19:30               ` [PATCH " Don Slutz
@ 2015-02-11  7:56                 ` Jan Beulich
  2015-02-11 17:04                   ` Andrew Cooper
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2015-02-11  7:56 UTC (permalink / raw)
  To: Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	xen-devel, Eddie Dong, Suravee Suthikulpanit, Boris Ostrovsky

>>> On 10.02.15 at 20:30, <dslutz@verizon.com> wrote:
> While coding this up I have hit issues that I need input on:
> 
> As a HVM_PARAM_ item, I would assume I should be following
> what HVM_PARAM_VIRIDIAN does.  It has this comment:
> 
>             case HVM_PARAM_VIRIDIAN:
>                 /* This should only ever be set once by the tools and read by the guest. */
> 
> Which is almost true.  However the code allows you to change from 0 to
> non-zero any time in the life of the DomU.  I am assuming that this is
> why xc_domain_save() and xc_domain_restore() save and restore this
> HVM_PARAM_ item.
> 
> With the enable of vmware_port the same way, I feel it would be a bug
> to allow the enable after "create" to not also adjust QEMU.  Currently
> there is no way for the hypervisor to tell QEMU to enable vmware_port
> handling.  So to avoid adding this code to xen and QEMU, it looks to
> me that adding code to make this a true write only 1 time would be
> needed so that you cannot use the hyper call to change later.
> 
> So, should I extend this change to cover other HVM_PARAM_?
> 
> Is all this additional code (xc_domain_save(), xc_domain_restore(),
> write only 1 time) still better than a domain_create() flag?

I suppose for your case it's indeed the right approach. Which other
params this may be true for as well I can't immediately say, but I'd
certainly like to ask for adjustments to others to be in separate
patches (and perhaps even a separate series), with proper
rationale for each of them. I guess Andrew will have further input
for you on this matter...

Jan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v8 4/7] xen: Add vmware_port support
  2015-02-11  7:56                 ` Jan Beulich
@ 2015-02-11 17:04                   ` Andrew Cooper
  2015-02-17  7:45                     ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Andrew Cooper @ 2015-02-11 17:04 UTC (permalink / raw)
  To: Jan Beulich, Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Ian Jackson, Eddie Dong,
	xen-devel, Suravee Suthikulpanit, Boris Ostrovsky

On 11/02/15 07:56, Jan Beulich wrote:
>>>> On 10.02.15 at 20:30, <dslutz@verizon.com> wrote:
>> While coding this up I have hit issues that I need input on:
>>
>> As a HVM_PARAM_ item, I would assume I should be following
>> what HVM_PARAM_VIRIDIAN does.  It has this comment:
>>
>>             case HVM_PARAM_VIRIDIAN:
>>                 /* This should only ever be set once by the tools and read by the guest. */
>>
>> Which is almost true.  However the code allows you to change from 0 to
>> non-zero any time in the life of the DomU.  I am assuming that this is
>> why xc_domain_save() and xc_domain_restore() save and restore this
>> HVM_PARAM_ item.
>>
>> With the enable of vmware_port the same way, I feel it would be a bug
>> to allow the enable after "create" to not also adjust QEMU.  Currently
>> there is no way for the hypervisor to tell QEMU to enable vmware_port
>> handling.  So to avoid adding this code to xen and QEMU, it looks to
>> me that adding code to make this a true write only 1 time would be
>> needed so that you cannot use the hyper call to change later.
>>
>> So, should I extend this change to cover other HVM_PARAM_?
>>
>> Is all this additional code (xc_domain_save(), xc_domain_restore(),
>> write only 1 time) still better than a domain_create() flag?
> I suppose for your case it's indeed the right approach. Which other
> params this may be true for as well I can't immediately say, but I'd
> certainly like to ask for adjustments to others to be in separate
> patches (and perhaps even a separate series), with proper
> rationale for each of them. I guess Andrew will have further input
> for you on this matter...

My recommendation is still to use a creation flag.  The described
problem is exactly the reason why I dislike the use of hvmparams for
booleans like this which really do need to be consistent for the
lifetime of the guest.

I had hoped to see whether I could fix some of this up as part of the
fixes to guest cpuid handling, but that work is still a while off and
not of practical consideration for the short term.

~Andrew

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v8 4/7] xen: Add vmware_port support
  2015-02-11 17:04                   ` Andrew Cooper
@ 2015-02-17  7:45                     ` Jan Beulich
  0 siblings, 0 replies; 37+ messages in thread
From: Jan Beulich @ 2015-02-17  7:45 UTC (permalink / raw)
  To: Andrew Cooper, Don Slutz
  Cc: Jun Nakajima, Tim Deegan, Kevin Tian, Keir Fraser, Ian Campbell,
	Stefano Stabellini, George Dunlap, Ian Jackson, Eddie Dong,
	xen-devel, Suravee Suthikulpanit, Boris Ostrovsky

>>> On 11.02.15 at 18:04, <andrew.cooper3@citrix.com> wrote:
> On 11/02/15 07:56, Jan Beulich wrote:
>>>>> On 10.02.15 at 20:30, <dslutz@verizon.com> wrote:
>>> While coding this up I have hit issues that I need input on:
>>>
>>> As a HVM_PARAM_ item, I would assume I should be following
>>> what HVM_PARAM_VIRIDIAN does.  It has this comment:
>>>
>>>             case HVM_PARAM_VIRIDIAN:
>>>                 /* This should only ever be set once by the tools and read by the guest. */
>>>
>>> Which is almost true.  However the code allows you to change from 0 to
>>> non-zero any time in the life of the DomU.  I am assuming that this is
>>> why xc_domain_save() and xc_domain_restore() save and restore this
>>> HVM_PARAM_ item.
>>>
>>> With the enable of vmware_port the same way, I feel it would be a bug
>>> to allow the enable after "create" to not also adjust QEMU.  Currently
>>> there is no way for the hypervisor to tell QEMU to enable vmware_port
>>> handling.  So to avoid adding this code to xen and QEMU, it looks to
>>> me that adding code to make this a true write only 1 time would be
>>> needed so that you cannot use the hyper call to change later.
>>>
>>> So, should I extend this change to cover other HVM_PARAM_?
>>>
>>> Is all this additional code (xc_domain_save(), xc_domain_restore(),
>>> write only 1 time) still better than a domain_create() flag?
>> I suppose for your case it's indeed the right approach. Which other
>> params this may be true for as well I can't immediately say, but I'd
>> certainly like to ask for adjustments to others to be in separate
>> patches (and perhaps even a separate series), with proper
>> rationale for each of them. I guess Andrew will have further input
>> for you on this matter...
> 
> My recommendation is still to use a creation flag.  The described
> problem is exactly the reason why I dislike the use of hvmparams for
> booleans like this which really do need to be consistent for the
> lifetime of the guest.

While I can see your point, HVM params are much better scalable
(and more obviously architecture specific) than creation flags...

Jan

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2015-02-17  7:45 UTC | newest]

Thread overview: 37+ messages
2014-10-02 21:30 [PATCH for-4.5 v7 0/7] Xen VMware tools support Don Slutz
2014-10-02 21:30 ` [PATCH for-4.5 v7 1/7] xen: Add support for VMware cpuid leaves Don Slutz
2015-01-15 16:42   ` Jan Beulich
2015-01-15 21:00     ` Don Slutz
2015-01-16  7:57       ` Jan Beulich
2015-01-16 19:21         ` Don Slutz
2014-10-02 21:30 ` [PATCH for-4.5 v7 2/7] tools: Add vmware_hw support Don Slutz
2014-10-02 22:21   ` Andrew Cooper
2014-10-02 22:56     ` [PATCH for-4.5 v8 " Don Slutz
2014-10-02 21:30 ` [PATCH for-4.5 v7 3/7] vmware: Add VMware provided include files Don Slutz
2015-01-15 16:46   ` Jan Beulich
2015-01-15 21:36     ` Don Slutz
2014-10-02 21:30 ` [PATCH for-4.5 v7 4/7] xen: Add vmware_port support Don Slutz
2014-10-02 21:58   ` Don Slutz
2014-10-02 22:40     ` [PATCH for-4.5 v8 " Don Slutz
2015-01-16 10:09       ` Jan Beulich
2015-01-21 17:52         ` Don Slutz
2015-01-22  8:32           ` Jan Beulich
2015-01-26 15:58             ` Don Slutz
2015-01-26 16:46               ` Jan Beulich
2015-01-26 20:19                 ` Don Slutz
2015-01-27  7:58                   ` Jan Beulich
2015-01-28  8:19                     ` Jan Beulich
2015-01-28 22:47                       ` Don Slutz
2015-01-29  0:32                         ` Don Slutz
2015-02-10 19:30               ` [PATCH " Don Slutz
2015-02-11  7:56                 ` Jan Beulich
2015-02-11 17:04                   ` Andrew Cooper
2015-02-17  7:45                     ` Jan Beulich
2014-10-02 21:30 ` [PATCH for-4.5 v7 5/7] tools: " Don Slutz
2014-10-02 21:30 ` [PATCH for-4.5 v7 6/7] Add xentrace to vmware_port Don Slutz
2014-10-02 21:30 ` [OPTIONAL][PATCH for-4.5 v7 7/7] Add xen-hvm-param Don Slutz
2014-10-16  8:12 ` [PATCH for-4.5 v7 0/7] Xen VMware tools support Jan Beulich
2014-10-16 12:10   ` Don Slutz
2014-10-16 12:17     ` Ian Jackson
2014-10-16 12:22     ` Jan Beulich
2014-10-16 12:58       ` Don Slutz
