From: "Jan Beulich"
Subject: [PATCH 1/3] x86/HVM: fix forwarding of internally cached requests
Date: Thu, 24 Mar 2016 05:28:56 -0600
Message-ID: <56F3DD8802000078000E007F@prv-mh.provo.novell.com>
In-Reply-To: <56F3DBC502000078000E0060@prv-mh.provo.novell.com>
References: <56F3DBC502000078000E0060@prv-mh.provo.novell.com>
To: xen-devel
Cc: Andrew Cooper, Keir Fraser, Paul Durrant, Chang Jianzhong
List-Id: xen-devel@lists.xenproject.org

Forwarding entire batches to the device model when an individual
iteration of them got rejected by internal device emulation handlers
with X86EMUL_UNHANDLEABLE is wrong: The device model would then handle
all iterations, without the internal handler getting to see any past
the one it returned failure for. This causes misbehavior in at least
the MSI-X and VGA code, which want to see all such requests for
internal tracking/caching purposes. Note that this does not apply to
buffered I/O requests.

This in turn means that the condition in hvm_process_io_intercept() of
when to crash the domain was wrong: Since X86EMUL_UNHANDLEABLE can
validly be returned by the individual device handlers, we mustn't
blindly crash the domain if such a failure occurs on anything other
than the initial iteration. Instead we need to distinguish
hvm_copy_*_guest_phys() failures from device-specific ones; the former
always need to be fatal to the domain (i.e. also on the first
iteration), since otherwise we would again end up forwarding to qemu a
request which the internal handler didn't get to see.

Also, commit 4faffc41d ("x86/hvm: limit reps to avoid the need to
handle retry") went too far in removing code from
hvm_process_io_intercept(): When there were successfully handled
iterations, the function should continue to return success with a
clipped repeat count.

Signed-off-by: Jan Beulich
Cc: Chang Jianzhong
---
I assume this also addresses the issue which
http://lists.xenproject.org/archives/html/xen-devel/2016-01/msg03189.html
attempted to deal with in a not really acceptable way.
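[Editorial note, not part of the patch: the repeat-count handling is
easiest to see in isolation. Below is a minimal, self-contained sketch
in plain C, not the actual Xen code; do_io() and its per-call limit are
invented for illustration. It shows the in/out contract the patch gives
hvmemul_do_io(): the count is passed by pointer, the handler writes back
how many iterations it actually completed, and the caller retries only
the remainder rather than re-forwarding the whole batch.]

#include <stdio.h>

#define X86EMUL_OKAY 0  /* illustrative value */

/*
 * Hypothetical handler: completes at most 'limit' of the requested
 * iterations and reports the clipped count back through *reps.
 */
static int do_io(unsigned long *reps, unsigned long limit)
{
    if ( *reps > limit )
        *reps = limit;      /* partial success: clip the repeat count */
    return X86EMUL_OKAY;    /* still success for what was completed */
}

int main(void)
{
    unsigned long want = 8, reps = want;

    do_io(&reps, 3);
    if ( reps < want )      /* mirrors the mmio_retry check in the patch */
        printf("handled %lu of %lu; caller retries the remaining %lu\n",
               reps, want, want - reps);
    return 0;
}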
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -95,7 +95,7 @@ static const struct hvm_io_handler null_
 };
 
 static int hvmemul_do_io(
-    bool_t is_mmio, paddr_t addr, unsigned long reps, unsigned int size,
+    bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
     uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
 {
     struct vcpu *curr = current;
@@ -104,7 +104,7 @@ static int hvmemul_do_io(
         .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
         .addr = addr,
         .size = size,
-        .count = reps,
+        .count = *reps,
         .dir = dir,
         .df = df,
         .data = data,
@@ -136,7 +136,7 @@ static int hvmemul_do_io(
         if ( (p.type != (is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO)) ||
              (p.addr != addr) ||
              (p.size != size) ||
-             (p.count != reps) ||
+             (p.count != *reps) ||
              (p.dir != dir) ||
              (p.df != df) ||
              (p.data_is_ptr != data_is_addr) )
@@ -214,7 +214,7 @@ static int hvmemul_do_io_buffer(
 
     BUG_ON(buffer == NULL);
 
-    rc = hvmemul_do_io(is_mmio, addr, *reps, size, dir, df, 0,
+    rc = hvmemul_do_io(is_mmio, addr, reps, size, dir, df, 0,
                        (uintptr_t)buffer);
     if ( rc == X86EMUL_UNHANDLEABLE && dir == IOREQ_READ )
         memset(buffer, 0xff, size);
@@ -305,13 +305,13 @@ static int hvmemul_do_io_addr(
         count = 1;
     }
 
-    rc = hvmemul_do_io(is_mmio, addr, count, size, dir, df, 1,
+    rc = hvmemul_do_io(is_mmio, addr, &count, size, dir, df, 1,
                        ram_gpa);
+
     if ( rc == X86EMUL_OKAY )
-    {
         v->arch.hvm_vcpu.hvm_io.mmio_retry = (count < *reps);
-        *reps = count;
-    }
+
+    *reps = count;
 
  out:
     while ( nr_pages )
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -148,7 +148,7 @@ int hvm_process_io_intercept(const struc
                 ASSERT_UNREACHABLE();
                 /* fall through */
             default:
-                rc = X86EMUL_UNHANDLEABLE;
+                rc = -1; /* != any X86EMUL_* value. */
                 break;
             }
             if ( rc != X86EMUL_OKAY )
@@ -178,7 +178,7 @@ int hvm_process_io_intercept(const struc
                 ASSERT_UNREACHABLE();
                 /* fall through */
             default:
-                rc = X86EMUL_UNHANDLEABLE;
+                rc = -1; /* != any X86EMUL_* value. */
                 break;
             }
             if ( rc != X86EMUL_OKAY )
@@ -196,8 +196,22 @@ int hvm_process_io_intercept(const struc
         }
     }
 
-    if ( i != 0 && rc == X86EMUL_UNHANDLEABLE )
+    if ( unlikely(rc < 0) )
         domain_crash(current->domain);
+    else if ( i )
+    {
+        p->count = i;
+        rc = X86EMUL_OKAY;
+    }
+    else if ( rc == X86EMUL_UNHANDLEABLE )
+    {
+        /*
+         * Don't forward entire batches to the device model: This would
+         * prevent the internal handlers from seeing subsequent iterations
+         * of the request.
+         */
+        p->count = 1;
+    }
 
     return rc;
 }
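[Editorial note, not part of the patch: as a companion to the
intercept.c hunk above, the following standalone sketch, again
illustrative C rather than the Xen sources (the enum values and the
finish() helper are invented), shows the three-way distinction the
patch introduces: a negative sentinel for hvm_copy_*_guest_phys()
failures is always fatal, partial success returns X86EMUL_OKAY with a
clipped count, and a request unhandled from the first iteration is
forwarded to the device model one iteration at a time.]

#include <stdio.h>

enum { X86EMUL_OKAY, X86EMUL_UNHANDLEABLE };  /* illustrative values */

/* 'done' is how many iterations the internal handler completed. */
static void finish(int rc, unsigned int done, unsigned int *count)
{
    if ( rc < 0 )                      /* copy failure: always fatal */
        printf("domain_crash()\n");
    else if ( done )                   /* partial success: clip, succeed */
    {
        *count = done;
        printf("X86EMUL_OKAY, count clipped to %u\n", *count);
    }
    else if ( rc == X86EMUL_UNHANDLEABLE )
    {
        *count = 1;                    /* forward a single iteration only */
        printf("forward one iteration to the device model\n");
    }
}

int main(void)
{
    unsigned int count = 8;

    finish(X86EMUL_UNHANDLEABLE, 0, &count);  /* -> forward, count = 1 */
    count = 8;
    finish(-1, 2, &count);                    /* -> fatal despite progress */
    return 0;
}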