From mboxrd@z Thu Jan  1 00:00:00 1970
From: Don Slutz <don.slutz@gmail.com>
Subject: Re: [PATCH v13 7/8] Add IOREQ_TYPE_VMWARE_PORT
Date: Sun, 10 Jan 2016 14:42:39 -0500
Message-ID: <5692B42F.1030201@Gmail.com>
References: <1448747105-19966-1-git-send-email-Don.Slutz@Gmail.com>
	<1448747105-19966-8-git-send-email-Don.Slutz@Gmail.com>
	<5678164C02000078000C1E86@prv-mh.provo.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <5678164C02000078000C1E86@prv-mh.provo.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <JBeulich@suse.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>, Tim Deegan <tim@xen.org>, Kevin Tian <kevin.tian@intel.com>, Wei Liu <wei.liu2@citrix.com>, Ian Campbell <ian.campbell@citrix.com>, Stefano Stabellini <stefano.stabellini@eu.citrix.com>, George Dunlap <george.dunlap@eu.citrix.com>, Andrew Cooper <andrew.cooper3@citrix.com>, Ian Jackson <ian.jackson@eu.citrix.com>, Don Slutz <dslutz@verizon.com>, xen-devel@lists.xen.org, Eddie Dong <eddie.dong@intel.com>, Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>, Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>, Keir Fraser <keir@xen.org>, Boris Ostrovsky <boris.ostrovsky@oracle.com>
List-Id: xen-devel@lists.xenproject.org

On 12/21/15 09:10, Jan Beulich wrote:
>>>> On 28.11.15 at 22:45, <don.slutz@gmail.com> wrote:
>> @@ -133,7 +134,7 @@ static int hvmemul_do_io(
>>          p = vio->io_req;
>>  
>>          /* Verify the emulation request has been correctly re-issued */
>> -        if ( (p.type != is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO) ||
>> +        if ( (p.type != (is_mmio ? IOREQ_TYPE_COPY : is_vmware ? IOREQ_TYPE_VMWARE_PORT : IOREQ_TYPE_PIO)) ||
> 
> Long line needs breaking up.
> 

Already adjusted to:

-        if ( (p.type != is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO) ||

+        if ( (p.type != calc_type) ||


>> @@ -167,26 +168,65 @@ static int hvmemul_do_io(
>>          vio->io_req.state = STATE_IOREQ_NONE;
>>          break;
>>      case X86EMUL_UNHANDLEABLE:
>> -    {
>> -        struct hvm_ioreq_server *s =
>> -            hvm_select_ioreq_server(curr->domain, &p);
...
>> +                if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
>> +                    vio->io_req.state = STATE_IOREQ_NONE;
>> +            }
>>          }
>>          break;
> 
> Especially if there is no common code to be broken out at the
> beginning or end of these new if/else branches, then please avoid
> re-indenting the entire existing code block (perhaps by making your
> new code a if ( unlikely(is_vmware) ) ... else if ( ... ) prefix), to aid
> reviewing.
> 

The suggested if above gives:

--------------------------------------------------------------------------
@@ -167,26 +175,64 @@ static int hvmemul_do_io(
         vio->io_req.state = STATE_IOREQ_NONE;
         break;
     case X86EMUL_UNHANDLEABLE:
-    {
-        struct hvm_ioreq_server *s =
-            hvm_select_ioreq_server(curr->domain, &p);
-
-        /* If there is no suitable backing DM, just ignore accesses */
-        if ( !s )
+        if ( unlikely(is_vmware) )
         {
-            rc = hvm_process_io_intercept(&null_handler, &p);
-            vio->io_req.state = STATE_IOREQ_NONE;
+            struct hvm_ioreq_server *s;
+            vmware_regs_t *vr;
+
+            BUILD_BUG_ON(sizeof(ioreq_t) < sizeof(vmware_regs_t));
+
+            p.type = vio->io_req.type = IOREQ_TYPE_VMWARE_PORT;
+            s = hvm_select_ioreq_server(curr->domain, &p);
+            vr = get_vmport_regs_any(s, curr);
+
+            /*
+             * If there is no suitable backing DM, just ignore
accesses.  If
+             * we do not have access to registers to pass to QEMU, just
+             * ignore access.
+             */
+            if ( !s || !vr )
+            {
+                rc = hvm_process_io_intercept(&null_handler, &p);
+                vio->io_req.state = STATE_IOREQ_NONE;
+            }
+            else
+            {
+                struct cpu_user_regs *regs = guest_cpu_user_regs();
+
+                p.data = regs->rax;
+                vr->ebx = regs->_ebx;
+                vr->ecx = regs->_ecx;
+                vr->edx = regs->_edx;
+                vr->esi = regs->_esi;
+                vr->edi = regs->_edi;
+
+                rc = hvm_send_ioreq(s, &p, 0);
+                if ( rc != X86EMUL_RETRY ||
curr->domain->is_shutting_down )
+                    vio->io_req.state = STATE_IOREQ_NONE;
+            }
         }
         else
         {
-            rc = hvm_send_ioreq(s, &p, 0);
-            if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
+            struct hvm_ioreq_server *s =
+                hvm_select_ioreq_server(curr->domain, &p);
+
+            /* If there is no suitable backing DM, just ignore accesses */
+            if ( !s )
+            {
+                rc = hvm_process_io_intercept(&null_handler, &p);
                 vio->io_req.state = STATE_IOREQ_NONE;
-            else if ( data_is_addr )
-                rc = X86EMUL_OKAY;
+            }
+            else
+            {
+                rc = hvm_send_ioreq(s, &p, 0);
+                if ( rc != X86EMUL_RETRY ||
curr->domain->is_shutting_down )
+                    vio->io_req.state = STATE_IOREQ_NONE;
+                else if ( data_is_addr )
+                    rc = X86EMUL_OKAY;
+            }
         }
         break;
-    }
     default:
         BUG();
     }
-------------------------------------------------------------------------

Which I feel is just as hard to review from the diff.  The only way I
know of to reduce the diff is:

--------------------------------------------------------------------------
@@ -167,6 +175,44 @@ static int hvmemul_do_io(
         vio->io_req.state = STATE_IOREQ_NONE;
         break;
     case X86EMUL_UNHANDLEABLE:
+        if ( unlikely(is_vmware) )
+        {
+            struct hvm_ioreq_server *s;
+            vmware_regs_t *vr;
+
+            BUILD_BUG_ON(sizeof(ioreq_t) < sizeof(vmware_regs_t));
+
+            p.type = vio->io_req.type = IOREQ_TYPE_VMWARE_PORT;
+            s = hvm_select_ioreq_server(curr->domain, &p);
+            vr = get_vmport_regs_any(s, curr);
+
+            /*
+             * If there is no suitable backing DM, just ignore
accesses.  If
+             * we do not have access to registers to pass to QEMU, just
+             * ignore access.
+             */
+            if ( !s || !vr )
+            {
+                rc = hvm_process_io_intercept(&null_handler, &p);
+                vio->io_req.state = STATE_IOREQ_NONE;
+            }
+            else
+            {
+                struct cpu_user_regs *regs = guest_cpu_user_regs();
+
+                p.data = regs->rax;
+                vr->ebx = regs->_ebx;
+                vr->ecx = regs->_ecx;
+                vr->edx = regs->_edx;
+                vr->esi = regs->_esi;
+                vr->edi = regs->_edi;
+
+                rc = hvm_send_ioreq(s, &p, 0);
+                if ( rc != X86EMUL_RETRY ||
curr->domain->is_shutting_down )
+                    vio->io_req.state = STATE_IOREQ_NONE;
+            }
+            break;
+        }
     {
         struct hvm_ioreq_server *s =
             hvm_select_ioreq_server(curr->domain, &p);
-------------------------------------------------------------------------

Which is clear in the diff but looks to me as a strange construct.

However I am happy to go any of the 3 ways.


>> @@ -536,6 +575,21 @@ void hvm_do_resume(struct vcpu *v)
>>          handle_mmio();
>>          break;
>>      case HVMIO_pio_completion:
>> +        if ( vio->io_req.type == IOREQ_TYPE_VMWARE_PORT ) {
>> +            vmware_regs_t *vr = get_vmport_regs_any(NULL, v);
>> +
>> +            if ( vr )
>> +            {
>> +                struct cpu_user_regs *regs = guest_cpu_user_regs();
>> +
>> +                /* Only change the 32bit part of the register. */
>> +                regs->_ebx = vr->ebx;
>> +                regs->_ecx = vr->ecx;
>> +                regs->_edx = vr->edx;
>> +                regs->_esi = vr->esi;
>> +                regs->_edi = vr->edi;
>> +            }
> 
> Just like in one of the other patches the comment need extending to
> say _why_ you only change the 32-bit parts (for future readers to
> not consider this a bug).
> 

Will change to this:


+                /* The code in QEMU that uses these registers,
+                 * vmport.c and vmmouse.c, only uses only the 32bit
+                 * part of the register.  This is how VMware defined
+                 * the use of these registers.
+                 */
+                regs->_ebx = vr->ebx;


>> @@ -618,22 +672,56 @@ static void hvm_free_ioreq_gmfn(struct domain *d, unsigned long gmfn)
>>          set_bit(i, &d->arch.hvm_domain.ioreq_gmfn.mask);
>>  }
>>  
>> -static void hvm_unmap_ioreq_page(struct hvm_ioreq_server *s, bool_t buf)
>> +typedef enum {
>> +    IOREQ_PAGE_TYPE_IOREQ,
>> +    IOREQ_PAGE_TYPE_BUFIOREQ,
>> +    IOREQ_PAGE_TYPE_VMPORT,
> 
> Lower case please (and this being a local enum the prefix probably
> could be shortened).
> 

Ok, how about ioreq_pt_?

>> +} ioreq_page_type_t;
>> +
>> +static void hvm_unmap_ioreq_page(struct hvm_ioreq_server *s, ioreq_page_type_t buf)
> 
> The parameter should no longer be named "buf".

pt or page_type?

> 
>> @@ -849,19 +937,32 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
>>  
>>  static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
>>                                        unsigned long ioreq_pfn,
>> -                                      unsigned long bufioreq_pfn)
>> +                                      unsigned long bufioreq_pfn,
>> +                                      unsigned long vmport_ioreq_pfn)
>>  {
>>      int rc;
>>  
>> -    rc = hvm_map_ioreq_page(s, 0, ioreq_pfn);
>> +    rc = hvm_map_ioreq_page(s, IOREQ_PAGE_TYPE_IOREQ, ioreq_pfn);
>>      if ( rc )
>>          return rc;
>>  
>>      if ( bufioreq_pfn != INVALID_GFN )
>> -        rc = hvm_map_ioreq_page(s, 1, bufioreq_pfn);
>> +        rc = hvm_map_ioreq_page(s, IOREQ_PAGE_TYPE_BUFIOREQ, bufioreq_pfn);
>>  
>>      if ( rc )
>> -        hvm_unmap_ioreq_page(s, 0);
>> +    {
>> +        hvm_unmap_ioreq_page(s, IOREQ_PAGE_TYPE_IOREQ);
>> +        return rc;
>> +    }
>> +
>> +    rc = hvm_map_ioreq_page(s, IOREQ_PAGE_TYPE_VMPORT, vmport_ioreq_pfn);
> 
> I think Paul has already pointed out that this shouldn't be
> unconditional. And that not just nor for every server, but also
> because the caller doesn't seem to guarantee validity of the passed
> in PFN (and its implicit initializer value of zero is certainly not a valid
> one to use here).
> 

Paul did, and the code is currently:

+    if ( s->domain->arch.hvm_domain.is_vmware_port_enabled &&
+         vmport_ioreq_pfn != INVALID_GFN )
+    {
+        rc = hvm_map_ioreq_page(s, IOREQ_PAGE_TYPE_VMPORT,
vmport_ioreq_pfn);

The only way that vmport_ioreq_pfn could be zero is a custom version of
alloc_magic_pages_hvm() or some other way of not using libxc.

Since none of the rest of the uses of the HVM_PARAM's special pages
checked for zero, I did not add a check here.

However if you want a check for zero, I am happy to add one.


>> @@ -948,12 +1056,38 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
>>      for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
>>      {
>>          char *name;
>> +        char *type_name = NULL;
>> +        unsigned int limit;
>>  
>> -        rc = asprintf(&name, "ioreq_server %d %s", s->id,
>> -                      (i == HVMOP_IO_RANGE_PORT) ? "port" :
>> -                      (i == HVMOP_IO_RANGE_MEMORY) ? "memory" :
>> -                      (i == HVMOP_IO_RANGE_PCI) ? "pci" :
>> -                      "");
>> +        switch ( i )
>> +        {
>> +        case HVMOP_IO_RANGE_PORT:
>> +            type_name = "port";
>> +            limit = MAX_NR_IO_RANGES;
>> +            break;
>> +        case HVMOP_IO_RANGE_MEMORY:
>> +            type_name = "memory";
>> +            limit = MAX_NR_IO_RANGES;
>> +            break;
>> +        case HVMOP_IO_RANGE_PCI:
>> +            type_name = "pci";
>> +            limit = MAX_NR_IO_RANGES;
>> +            break;
>> +        case HVMOP_IO_RANGE_VMWARE_PORT:
>> +            type_name = "VMware port";
>> +            limit = 1;
>> +            break;
> 
> Do you really need to set up a (dummy) range set for this?

Yes, this is to allow an ioreq server to at any time enable or disable
sending of an ioreq that is of the type IOREQ_TYPE_VMWARE_PORT to it.


> 
>> +        case HVMOP_IO_RANGE_TIMEOFFSET:
>> +            type_name = "timeoffset";
>> +            limit = 1;
>> +            break;
> 
> This case didn't exist before, and is unrelated to your change.
> 

That is true.  However Paul asked me to add it and since it was a very
small addition (these lines) I saw no reason to add a seperate patch.
It would either have to follow this patch or include the expansion of
s->range and to include missing range handling.

>> @@ -2558,9 +2701,6 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>>      if ( list_empty(&d->arch.hvm_domain.ioreq_server.list) )
>>          return NULL;
>>  
>> -    if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
>> -        return d->arch.hvm_domain.default_ioreq_server;
> 
> I'd prefer if this shortcut could stay.

I do not understand why.  Currently this is the same as

if ( p->type == IOREQ_TYPE_TIMEOFFSET )
  return d->arch.hvm_domain.default_ioreq_server;

As so to keep it for VMware I would add the case of
IOREQ_TYPE_VMWARE_PORT and to add the support for IOREQ_TYPE_TIMEOFFSET
that Paul asked for it would also be added.  Turning this into a long line.

As far as I can see this is the same as a default case later, which is
not needed because you end up with the default_ioreq_server.

The only benifit I can see is to save instructions in the least common
case.  It adds code in most common case, which is backwards from what I
expect and does make the code harder to follow.


> 
>> --- a/xen/arch/x86/hvm/vmware/vmport.c
>> +++ b/xen/arch/x86/hvm/vmware/vmport.c
>> @@ -137,6 +137,20 @@ void vmport_register(struct domain *d)
>>      register_portio_handler(d, BDOOR_PORT, 4, vmport_ioport);
>>  }
>>  
>> +int vmport_check_port(unsigned int port, unsigned int bytes)
> 
> bool_t
> 

Sure.

>> +{
>> +    struct vcpu *curr = current;
> 
> You don't really need this local variable.

ok, and use d instead of currd?

> 
>> +    struct domain *currd = curr->domain;
>> +
>> +    if ( port + bytes > BDOOR_PORT && port < BDOOR_PORT + 4 &&
> 
> This will match for e.g. port == BDOOR_PORT + 1 and bytes == 4,
> which I don't think is correct.
> 

My testing on VMware says that this is what happens there. I.E. the
VMware "device" respondes to 4 ports.  Not that you can cleanly use the
other 3 since EAX is not fully available.


>> +         is_hvm_domain(currd) &&
>> +         currd->arch.hvm_domain.is_vmware_port_enabled )
>> +    {
>> +        return 1;
>> +    }
>> +    return 0;
>> +}
> 
> All of this could be a single return statement.
> 

Will convert to a single return.

>> @@ -66,11 +69,25 @@ struct ioreq {
>>  };
>>  typedef struct ioreq ioreq_t;
>>  
>> +struct vmware_regs {
>> +    uint32_t esi;
>> +    uint32_t edi;
>> +    uint32_t ebx;
>> +    uint32_t ecx;
>> +    uint32_t edx;
>> +};
>> +typedef struct vmware_regs vmware_regs_t;
> 
> I doubt you need the typedef (considering the use below), and I
> don't think having something prefixed vmware_ in the Xen public
> headers is a good idea.
> 

QEMU has used this typedef since QEMU 2.2.0 released on Dec 9, 2014.

Went in by pull request:

http://lists.xenproject.org/archives/html/xen-devel/2014-10/msg03705.html


The thread started at

http://lists.xenproject.org/archives/html/xen-devel/2014-09/msg04346.html

During this time was when the name could be easily changed.  I know of
no simple way to do so now.  Any change like this needs to start in QEMU
and be accepted up-stream and in Xen's version of up-stream QEMU. After
that Xen is changed.


The thread

http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg00476.html

looks at this being more general and just VMware.


> Also throughout the series I didn't find any code addition to
> guarantee (perhaps at build time) that BDOOR_PORT doesn't
> collide with any other use ports (statically assigned ones as well
> as the range used for dynamic assignment to PCI devices).

Since this is optional code, I am having an issue understanding this
statement. Xen will not do anything with BDOOR_PORT when vmware_port=0.

I do not know how at build time to check for run time optional items.

What ports are in use include what QEMU has.

My understanding was that configuration issues like overlapping or
multiple uses of I/O port is the users issue not Xen.

I do not see any code in xen that checks for this for other ports. So
it is not clear what the set of port in xen needs to be checked.  The
default range used for dynamic assignment to PCI devices is 0xc000 -
0xffff, which does not mean that the guest is prevented from adjusting
pci bridges so that BDOOR_PORT is an overlap.  But that is true of a lot
of the rest of the ports in Xen.

register_portio_handler() could be changed to check at run time for Xen.

At this time I have no plans to add code related to this.

    -Don Slutz

> 
> Jan
>