From: George Dunlap <george.dunlap@citrix.com>
To: Yu Zhang <yu.c.zhang@linux.intel.com>
Cc: Paul Durrant <Paul.Durrant@citrix.com>,
	"Lv, Zhiyuan" <zhiyuan.lv@intel.com>,
	Jan Beulich <JBeulich@suse.com>,
	"Xen-devel@lists.xen.org" <Xen-devel@lists.xen.org>
Subject: Re: [PATCH v6 1/4] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
Date: Thu, 22 Sep 2016 12:32:10 +0100	[thread overview]
Message-ID: <CAFLBxZa48fX5Bt39Xnfyerchdb18E4Fis=qDv6K8-sXt+k+YRQ@mail.gmail.com> (raw)
In-Reply-To: <57E3A06E.1020000@linux.intel.com>

[-- Attachment #1: Type: text/plain, Size: 4772 bytes --]

On Thu, Sep 22, 2016 at 10:12 AM, Yu Zhang <yu.c.zhang@linux.intel.com> wrote:
>
>
> On 9/21/2016 9:04 PM, George Dunlap wrote:
>>
>> On Fri, Sep 9, 2016 at 6:51 AM, Yu Zhang <yu.c.zhang@linux.intel.com>
>> wrote:
>>>>
>>>> On 9/2/2016 6:47 PM, Yu Zhang wrote:
>>>>>
>>>>> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
>>>>> let one ioreq server claim/disclaim its responsibility for the
>>>>> handling of guest pages with p2m type p2m_ioreq_server. Users
>>>>> of this HVMOP can specify which kind of operation is supposed
>>>>> to be emulated in a parameter named flags. Currently, this HVMOP
>>>>> only supports the emulation of write operations. It can be
>>>>> further extended to support the emulation of reads, should an
>>>>> ioreq server require it in the future.
>>>>>
>>>>> For now, we only support one ioreq server for this p2m type, so
>>>>> once an ioreq server has claimed its ownership, subsequent calls
>>>>> of the HVMOP_map_mem_type_to_ioreq_server will fail. Users can also
>>>>> disclaim the ownership of guest ram pages with p2m_ioreq_server, by
>>>>> triggering this new HVMOP, with ioreq server id set to the current
>>>>> owner's and flags parameter set to 0.
>>>>>
>>>>> Note both HVMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server
>>>>> are only supported for HVMs with HAP enabled.
>>>>>
>>>>> Also note that only after one ioreq server claims its ownership
>>>>> of p2m_ioreq_server, will the p2m type change to p2m_ioreq_server
>>>>> be allowed.
>>>>>
>>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>> Acked-by: Tim Deegan <tim@xen.org>
>>>>> ---
>>>>> Cc: Paul Durrant <paul.durrant@citrix.com>
>>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>>>>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>>>>> Cc: Kevin Tian <kevin.tian@intel.com>
>>>>> Cc: Tim Deegan <tim@xen.org>
>>>>>
>>>>> changes in v6:
>>>>>     - Clarify logic in hvmemul_do_io().
>>>>>     - Use recursive lock for ioreq server lock.
>>>>>     - Remove debug print when mapping ioreq server.
>>>>>     - Clarify code in ept_p2m_type_to_flags() for consistency.
>>>>>     - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
>>>>>     - Add comments for HVMMEM_ioreq_server to note only changes
>>>>>       to/from HVMMEM_ram_rw are permitted.
>>>>>     - Add domain_pause/unpause() in hvm_map_mem_type_to_ioreq_server()
>>>>>       to avoid the race condition when a vm exit happens on a write-
>>>>>       protected page, just to find the ioreq server has been unmapped
>>>>>       already.
>>>>>     - Introduce a separate patch to delay the release of the p2m
>>>>>       lock, to avoid the race condition.
>>>>>     - Introduce a separate patch to handle read-modify-write
>>>>>       operations on a write-protected page.
>>>>>
>>>> Why do we need to do this?  Won't the default case just DTRT if it finds
>>>> that the ioreq server has been unmapped?
>>>
>>>
>>> Well, patch 4 will either mark the remaining p2m_ioreq_server
>>> entries as "recalc" or reset them to p2m_ram_rw directly. So my
>>> understanding is that we do not wish to see an EPT violation due to
>>> a p2m_ioreq_server access after the ioreq server is unmapped.
>>> Yet without this domain_pause/unpause() pair, VM accesses may
>>> trigger an EPT violation during the hvmop hypercall
>>> (hvm_map_mem_type_to_ioreq_server), only to find that the ioreq
>>> server is NULL. Then we would have to provide handlers which just do
>>> the copy to/from actions for the VM. This seems awkward to me.
>>
>> So the race you're worried about is this:
>>
>> 1. Guest fault happens
>> 2. ioreq server calls map_mem_type_to_ioreq_server, unhooking
>> 3. guest finds no ioreq server present
>>
>> I think in that case the easiest thing to do would be to simply assume
>> there was a race and re-execute the instruction.  Is that not possible
>> for some reason?
>>
>>   -George
>
>
> Thanks for your reply, George. :)
> Two reasons why I'd like to use domain_pause/unpause() to avoid the
> race condition:
>
> 1> As I explained previously, in the read-modify-write scenario the
> ioreq server will be NULL for the read emulation. But in that case the
> hypervisor will not discard this trap; instead it is supposed to do
> the copy work for the read access. So it would be difficult for the
> hypervisor to decide whether the ioreq server was detached due to a
> race condition, or whether the ioreq server is NULL because we are
> first emulating the read part of a read-modify-write instruction.

Wouldn't a patch like the attached work (applied on top of the whole series)?

 -George

[-- Attachment #2: 0001-x86-hvm-Handle-both-ioreq-detach-races-and-read-modi.patch --]
[-- Type: text/x-diff, Size: 4162 bytes --]

From 4e4b14641cd94c0c9fe64606b329cdbbf9c6a92b Mon Sep 17 00:00:00 2001
From: George Dunlap <george.dunlap@citrix.com>
Date: Thu, 22 Sep 2016 12:30:26 +0100
Subject: [PATCH] x86/hvm: Handle both ioreq detach races and read-modify-write
 instructions

Compile-tested only.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
 xen/arch/x86/hvm/emulate.c | 68 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 56 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 564c117..120ef86 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -214,40 +214,84 @@ static int hvmemul_do_io(
         break;
     case X86EMUL_UNHANDLEABLE:
     {
+        /*
+         * Xen isn't emulating the instruction internally, so see if
+         * there's an ioreq server that can handle it.  Rules:
+         *
+         * - PIO and "normal" MMIO run through
+         * hvm_select_ioreq_server() to choose the ioreq server by
+         * range.  If no server is found, the access is ignored.
+         *
+         * - p2m_ioreq_server accesses are handled by the current
+         * ioreq_server for the domain, but there are some corner
+         * cases:
+         *
+         *   - If the domain ioreq_server is NULL, assume this is a
+         *   race between unhooking the ioreq server and the p2m
+         *   type change, and re-try the instruction.
+         *
+         *   - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it
+         *   like a normal PIO or MMIO that doesn't have an ioreq
+         *   server (i.e., by ignoring it).
+         *
+         *   - If the access is a read, this must be part of a
+         *   read-modify-write instruction; emulate the read so that
+         *   we have the value cached.
+         */
+
         struct hvm_ioreq_server *s = NULL;
         p2m_type_t p2mt = p2m_invalid;
-
+
         if ( is_mmio )
         {
             unsigned long gmfn = paddr_to_pfn(addr);
 
             (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
 
-            if ( p2mt == p2m_ioreq_server && dir == IOREQ_WRITE )
+            if ( p2mt == p2m_ioreq_server )
             {
                 unsigned int flags;
 
                 s = p2m_get_ioreq_server(currd, &flags);
+
+                /*
+                 * If p2mt is p2m_ioreq_server but the ioreq server is
+                 * NULL, we probably lost a race with the p2m type
+                 * change; just retry the access.
+                 */
+                if ( s == NULL )
+                {
+                    rc = X86EMUL_RETRY;
+                    vio->io_req.state = STATE_IOREQ_NONE;
+                    break;
+                }
+
+                /*
+                 * This is part of a read-modify-write instruction.
+                 * Emulate the read part so we have the value cached.
+                 */
+                if ( dir == IOREQ_READ )
+                {
+                    rc = hvm_process_io_intercept(&mem_handler, &p);
+                    vio->io_req.state = STATE_IOREQ_NONE;
+                    break;
+                }
+
+
                 if ( !(flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) )
+                {
                     s = NULL;
+                }
             }
         }
 
         if ( !s && p2mt != p2m_ioreq_server )
             s = hvm_select_ioreq_server(currd, &p);
 
-        /* If there is no suitable backing DM, just ignore accesses */
         if ( !s )
         {
-            /*
-             * For p2m_ioreq_server pages accessed with read-modify-write
-             * instructions, we provide a read handler to copy the data to
-             * the buffer.
-             */
-            if ( p2mt == p2m_ioreq_server )
-                rc = hvm_process_io_intercept(&mem_handler, &p);
-            else
-                rc = hvm_process_io_intercept(&null_handler, &p);
+            /* If there is no suitable backing DM, just ignore accesses */
+            rc = hvm_process_io_intercept(&null_handler, &p);
             vio->io_req.state = STATE_IOREQ_NONE;
         }
         else
-- 
2.1.4


[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Thread overview: 40+ messages
2016-09-02 10:47 [PATCH v6 0/4] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
2016-09-02 10:47 ` [PATCH v6 1/4] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
2016-09-05 13:31   ` Jan Beulich
2016-09-05 17:20     ` George Dunlap
2016-09-06  7:58       ` Jan Beulich
2016-09-06  8:03         ` Paul Durrant
2016-09-06  8:13           ` Jan Beulich
2016-09-06 10:00             ` Yu Zhang
2016-09-09  5:55     ` Yu Zhang
2016-09-09  8:09       ` Jan Beulich
2016-09-09  8:59         ` Yu Zhang
2016-09-05 17:23   ` George Dunlap
     [not found]   ` <57D24730.2050904@linux.intel.com>
2016-09-09  5:51     ` Yu Zhang
2016-09-21 13:04       ` George Dunlap
2016-09-22  9:12         ` Yu Zhang
2016-09-22 11:32           ` George Dunlap [this message]
2016-09-22 16:02             ` Yu Zhang
2016-09-23 10:35               ` George Dunlap
2016-09-26  6:57                 ` Yu Zhang
2016-09-26  6:58           ` Yu Zhang
2016-09-02 10:47 ` [PATCH v6 2/4] x86/ioreq server: Release the p2m lock after mmio is handled Yu Zhang
2016-09-05 13:49   ` Jan Beulich
     [not found]   ` <57D24782.6010701@linux.intel.com>
2016-09-09  5:56     ` Yu Zhang
2016-09-02 10:47 ` [PATCH v6 3/4] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages Yu Zhang
2016-09-05 14:10   ` Jan Beulich
     [not found]   ` <57D247F6.9010503@linux.intel.com>
2016-09-09  6:21     ` Yu Zhang
2016-09-09  8:12       ` Jan Beulich
2016-09-02 10:47 ` [PATCH v6 4/4] x86/ioreq server: Reset outstanding p2m_ioreq_server entries when an ioreq server unmaps Yu Zhang
2016-09-05 14:47   ` Jan Beulich
     [not found]   ` <57D24813.2090903@linux.intel.com>
2016-09-09  7:24     ` Yu Zhang
2016-09-09  8:20       ` Jan Beulich
2016-09-09  9:24         ` Yu Zhang
2016-09-09  9:44           ` Jan Beulich
2016-09-09  9:56             ` Yu Zhang
2016-09-09 10:09               ` Jan Beulich
2016-09-09 10:01                 ` Yu Zhang
2016-09-20  2:57                 ` Yu Zhang
2016-09-22 18:06                   ` George Dunlap
2016-09-23  1:31                     ` Yu Zhang
2016-09-06 10:57 ` [PATCH v6 0/4] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
