* Re: [PATCH v3 2/3] x86/emulate: add support of emulating SSE2 instruction {, v}movd mm, r32/m32 and {, v}movq mm, r64
2016-08-01 12:53 ` Mihai Donțu
@ 2016-08-01 12:56 ` Mihai Donțu
2016-08-01 12:57 ` Andrew Cooper
2016-08-01 12:59 ` Jan Beulich
2 siblings, 0 replies; 26+ messages in thread
From: Mihai Donțu @ 2016-08-01 12:56 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Zhi Wang, Jan Beulich, xen-devel
On Monday 01 August 2016 15:53:27 Mihai Donțu wrote:
> On Monday 01 August 2016 10:52:12 Andrew Cooper wrote:
> > On 01/08/16 03:52, Mihai Donțu wrote:
> > > Found that a Windows driver was using the SSE2 instruction MOVD.
> > >
> > > Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
> > > Signed-off-by: Mihai Donțu <mdontu@bitdefender.com>
> > > ---
> > > Picked from the XenServer 7 patch queue, as suggested by Andrew Cooper
> > >
> > > Changed since v2:
> > > * handle the case where the destination is a GPR
> > > ---
> > > xen/arch/x86/x86_emulate/x86_emulate.c | 38 +++++++++++++++++++++++++++++++---
> > > 1 file changed, 35 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
> > > index 44de3b6..9f89ada 100644
> > > --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> > > +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> > > @@ -204,7 +204,7 @@ static uint8_t twobyte_table[256] = {
> > > /* 0x60 - 0x6F */
> > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM,
> > > /* 0x70 - 0x7F */
> > > - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM,
> > > + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM, ImplicitOps|ModRM,
> > > /* 0x80 - 0x87 */
> > > ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
> > > ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
> > > @@ -4409,6 +4409,10 @@ x86_emulate(
> > > case 0x6f: /* movq mm/m64,mm */
> > > /* {,v}movdq{a,u} xmm/m128,xmm */
> > > /* vmovdq{a,u} ymm/m256,ymm */
> > > + case 0x7e: /* movd mm,r/m32 */
> > > + /* movq mm,r/m64 */
> > > + /* {,v}movd xmm,r/m32 */
> > > + /* {,v}movq xmm,r/m64 */
> > > case 0x7f: /* movq mm,mm/m64 */
> > > /* {,v}movdq{a,u} xmm,xmm/m128 */
> > > /* vmovdq{a,u} ymm,ymm/m256 */
> > > @@ -4432,7 +4436,17 @@ x86_emulate(
> > > host_and_vcpu_must_have(sse2);
> > > buf[0] = 0x66; /* SSE */
> > > get_fpu(X86EMUL_FPU_xmm, &fic);
> > > - ea.bytes = (b == 0xd6 ? 8 : 16);
> > > + switch ( b )
> > > + {
> > > + case 0x7e:
> > > + ea.bytes = 4;
> > > + break;
> > > + case 0xd6:
> > > + ea.bytes = 8;
> > > + break;
> > > + default:
> > > + ea.bytes = 16;
> > > + }
> > > break;
> > > case vex_none:
> > > if ( b != 0xe7 )
> > > @@ -4452,7 +4466,17 @@ x86_emulate(
> > > ((vex.pfx != vex_66) && (vex.pfx != vex_f3)));
> > > host_and_vcpu_must_have(avx);
> > > get_fpu(X86EMUL_FPU_ymm, &fic);
> > > - ea.bytes = (b == 0xd6 ? 8 : (16 << vex.l));
> > > + switch ( b )
> > > + {
> > > + case 0x7e:
> > > + ea.bytes = 4;
> > > + break;
> > > + case 0xd6:
> > > + ea.bytes = 8;
> > > + break;
> > > + default:
> > > + ea.bytes = 16 << vex.l;
> > > + }
> > > }
> > > if ( ea.type == OP_MEM )
> > > {
> > > @@ -4468,6 +4492,14 @@ x86_emulate(
> > > vex.b = 1;
> > > buf[4] &= 0x38;
> > > }
> > > + else if ( b == 0x7e )
> > > + {
> > > + /* convert the GPR destination to (%rAX) */
> > > + *((unsigned long *)&mmvalp) = (unsigned long)ea.reg;
> > > + rex_prefix &= ~REX_B;
> > > + vex.b = 1;
> > > + buf[4] &= 0x38;
> > > + }
> >
> > Thank you for doing this. However, looking at it, it has some code in
> > common with the "ea.type == OP_MEM" clause.
> >
> > Would this work?
> >
> > diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c
> > b/xen/arch/x86/x86_emulate/x86_emulate.c
> > index fe594ba..90db067 100644
> > --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> > +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> > @@ -4453,16 +4453,25 @@ x86_emulate(
> > get_fpu(X86EMUL_FPU_ymm, &fic);
> > ea.bytes = 16 << vex.l;
> > }
> > - if ( ea.type == OP_MEM )
> > + if ( ea.type == OP_MEM || ea.type == OP_REG )
> > {
> > - /* XXX enable once there is ops->ea() or equivalent
> > - generate_exception_if((vex.pfx == vex_66) &&
> > - (ops->ea(ea.mem.seg, ea.mem.off)
> > - & (ea.bytes - 1)), EXC_GP, 0); */
> > - if ( b == 0x6f )
> > - rc = ops->read(ea.mem.seg, ea.mem.off+0, mmvalp,
> > - ea.bytes, ctxt);
> > /* convert memory operand to (%rAX) */
> > +
> > + if ( ea.type == OP_MEM)
> > + {
> > + /* XXX enable once there is ops->ea() or equivalent
> > + generate_exception_if((vex.pfx == vex_66) &&
> > + (ops->ea(ea.mem.seg, ea.mem.off)
> > + & (ea.bytes - 1)), EXC_GP, 0); */
> > + if ( b == 0x6f )
> > + rc = ops->read(ea.mem.seg, ea.mem.off+0, mmvalp,
> > + ea.bytes, ctxt);
> > + }
> > + else if ( ea.type == OP_REG )
> > + {
> > + *((unsigned long *)&mmvalp) = (unsigned long)ea.reg;
> > + }
> > +
> > rex_prefix &= ~REX_B;
> > vex.b = 1;
> > buf[4] &= 0x38;
> >
> >
> > This is untested, but avoids duplicating this bit of state manipulation.
>
> Your suggestion makes sense, but I'm starting to doubt my initial
> patch. :-) I'm testing "movq xmm1, xmm1" and noticing that it takes the
> GPR-handling route and I can't seem to be able to easily prevent it
> with !(rex_prefix & REX_B), as rex_prefix == 0 and vex.b == 1. I need
> to take a harder look at how that class of instructions is coded.
Err, I meant "vmovq xmm1, xmm1".
--
Mihai DONȚU
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v3 2/3] x86/emulate: add support of emulating SSE2 instruction {, v}movd mm, r32/m32 and {, v}movq mm, r64
2016-08-01 12:53 ` Mihai Donțu
2016-08-01 12:56 ` Mihai Donțu
@ 2016-08-01 12:57 ` Andrew Cooper
2016-08-01 12:59 ` Jan Beulich
2 siblings, 0 replies; 26+ messages in thread
From: Andrew Cooper @ 2016-08-01 12:57 UTC (permalink / raw)
To: Mihai Donțu; +Cc: Zhi Wang, Jan Beulich, xen-devel
On 01/08/16 13:53, Mihai Donțu wrote:
> On Monday 01 August 2016 10:52:12 Andrew Cooper wrote:
>> On 01/08/16 03:52, Mihai Donțu wrote:
>>> Found that a Windows driver was using the SSE2 instruction MOVD.
>>>
>>> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
>>> Signed-off-by: Mihai Donțu <mdontu@bitdefender.com>
>>> ---
>>> Picked from the XenServer 7 patch queue, as suggested by Andrew Cooper
>>>
>>> Changed since v2:
>>> * handle the case where the destination is a GPR
>>> ---
>>> xen/arch/x86/x86_emulate/x86_emulate.c | 38 +++++++++++++++++++++++++++++++---
>>> 1 file changed, 35 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
>>> index 44de3b6..9f89ada 100644
>>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>>> @@ -204,7 +204,7 @@ static uint8_t twobyte_table[256] = {
>>> /* 0x60 - 0x6F */
>>> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM,
>>> /* 0x70 - 0x7F */
>>> - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM,
>>> + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM, ImplicitOps|ModRM,
>>> /* 0x80 - 0x87 */
>>> ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
>>> ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
>>> @@ -4409,6 +4409,10 @@ x86_emulate(
>>> case 0x6f: /* movq mm/m64,mm */
>>> /* {,v}movdq{a,u} xmm/m128,xmm */
>>> /* vmovdq{a,u} ymm/m256,ymm */
>>> + case 0x7e: /* movd mm,r/m32 */
>>> + /* movq mm,r/m64 */
>>> + /* {,v}movd xmm,r/m32 */
>>> + /* {,v}movq xmm,r/m64 */
>>> case 0x7f: /* movq mm,mm/m64 */
>>> /* {,v}movdq{a,u} xmm,xmm/m128 */
>>> /* vmovdq{a,u} ymm,ymm/m256 */
>>> @@ -4432,7 +4436,17 @@ x86_emulate(
>>> host_and_vcpu_must_have(sse2);
>>> buf[0] = 0x66; /* SSE */
>>> get_fpu(X86EMUL_FPU_xmm, &fic);
>>> - ea.bytes = (b == 0xd6 ? 8 : 16);
>>> + switch ( b )
>>> + {
>>> + case 0x7e:
>>> + ea.bytes = 4;
>>> + break;
>>> + case 0xd6:
>>> + ea.bytes = 8;
>>> + break;
>>> + default:
>>> + ea.bytes = 16;
>>> + }
>>> break;
>>> case vex_none:
>>> if ( b != 0xe7 )
>>> @@ -4452,7 +4466,17 @@ x86_emulate(
>>> ((vex.pfx != vex_66) && (vex.pfx != vex_f3)));
>>> host_and_vcpu_must_have(avx);
>>> get_fpu(X86EMUL_FPU_ymm, &fic);
>>> - ea.bytes = (b == 0xd6 ? 8 : (16 << vex.l));
>>> + switch ( b )
>>> + {
>>> + case 0x7e:
>>> + ea.bytes = 4;
>>> + break;
>>> + case 0xd6:
>>> + ea.bytes = 8;
>>> + break;
>>> + default:
>>> + ea.bytes = 16 << vex.l;
>>> + }
>>> }
>>> if ( ea.type == OP_MEM )
>>> {
>>> @@ -4468,6 +4492,14 @@ x86_emulate(
>>> vex.b = 1;
>>> buf[4] &= 0x38;
>>> }
>>> + else if ( b == 0x7e )
>>> + {
>>> + /* convert the GPR destination to (%rAX) */
>>> + *((unsigned long *)&mmvalp) = (unsigned long)ea.reg;
>>> + rex_prefix &= ~REX_B;
>>> + vex.b = 1;
>>> + buf[4] &= 0x38;
>>> + }
>> Thank you for doing this. However, looking at it, it has some code in
>> common with the "ea.type == OP_MEM" clause.
>>
>> Would this work?
>>
>> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c
>> b/xen/arch/x86/x86_emulate/x86_emulate.c
>> index fe594ba..90db067 100644
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -4453,16 +4453,25 @@ x86_emulate(
>> get_fpu(X86EMUL_FPU_ymm, &fic);
>> ea.bytes = 16 << vex.l;
>> }
>> - if ( ea.type == OP_MEM )
>> + if ( ea.type == OP_MEM || ea.type == OP_REG )
>> {
>> - /* XXX enable once there is ops->ea() or equivalent
>> - generate_exception_if((vex.pfx == vex_66) &&
>> - (ops->ea(ea.mem.seg, ea.mem.off)
>> - & (ea.bytes - 1)), EXC_GP, 0); */
>> - if ( b == 0x6f )
>> - rc = ops->read(ea.mem.seg, ea.mem.off+0, mmvalp,
>> - ea.bytes, ctxt);
>> /* convert memory operand to (%rAX) */
>> +
>> + if ( ea.type == OP_MEM)
>> + {
>> + /* XXX enable once there is ops->ea() or equivalent
>> + generate_exception_if((vex.pfx == vex_66) &&
>> + (ops->ea(ea.mem.seg, ea.mem.off)
>> + & (ea.bytes - 1)), EXC_GP, 0); */
>> + if ( b == 0x6f )
>> + rc = ops->read(ea.mem.seg, ea.mem.off+0, mmvalp,
>> + ea.bytes, ctxt);
>> + }
>> + else if ( ea.type == OP_REG )
>> + {
>> + *((unsigned long *)&mmvalp) = (unsigned long)ea.reg;
>> + }
>> +
>> rex_prefix &= ~REX_B;
>> vex.b = 1;
>> buf[4] &= 0x38;
>>
>>
>> This is untested, but avoids duplicating this bit of state manipulation.
> Your suggestion makes sense, but I'm starting to doubt my initial
> patch. :-) I'm testing "movq xmm1, xmm1" and noticing that it takes the
> GPR-handling route and I can't seem to be able to easily prevent it
> with !(rex_prefix & REX_B), as rex_prefix == 0 and vex.b == 1. I need
> to take a harder look at how that class of instructions is coded.
>
That is a very good point. My suggestion doesn't differentiate GPR
register destinations from non-GPR register destinations, and it is only
the GPR register destinations that we want to turn into memory accesses.
What about this?
else if ( ea.type == OP_REG && b == 0x7e )
{
/* GPR register destination - point into the register block. */
*((unsigned long *)&mmvalp) = (unsigned long)ea.reg;
}
* Re: [PATCH v3 2/3] x86/emulate: add support of emulating SSE2 instruction {, v}movd mm, r32/m32 and {, v}movq mm, r64
2016-08-01 12:53 ` Mihai Donțu
2016-08-01 12:56 ` Mihai Donțu
2016-08-01 12:57 ` Andrew Cooper
@ 2016-08-01 12:59 ` Jan Beulich
2016-08-01 13:28 ` Mihai Donțu
2 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2016-08-01 12:59 UTC (permalink / raw)
To: Mihai Donțu; +Cc: Andrew Cooper, Zhi Wang, xen-devel
>>> On 01.08.16 at 14:53, <mdontu@bitdefender.com> wrote:
> On Monday 01 August 2016 10:52:12 Andrew Cooper wrote:
>> On 01/08/16 03:52, Mihai Donțu wrote:
>> > Found that a Windows driver was using the SSE2 instruction MOVD.
>> >
>> > Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
>> > Signed-off-by: Mihai Donțu <mdontu@bitdefender.com>
>> > ---
>> > Picked from the XenServer 7 patch queue, as suggested by Andrew Cooper
>> >
>> > Changed since v2:
>> > * handle the case where the destination is a GPR
>> > ---
>> > xen/arch/x86/x86_emulate/x86_emulate.c | 38
> +++++++++++++++++++++++++++++++---
>> > 1 file changed, 35 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c
> b/xen/arch/x86/x86_emulate/x86_emulate.c
>> > index 44de3b6..9f89ada 100644
>> > --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> > +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> > @@ -204,7 +204,7 @@ static uint8_t twobyte_table[256] = {
>> > /* 0x60 - 0x6F */
>> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM,
>> > /* 0x70 - 0x7F */
>> > - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM,
>> > + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM,
> ImplicitOps|ModRM,
>> > /* 0x80 - 0x87 */
>> > ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
>> > ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
>> > @@ -4409,6 +4409,10 @@ x86_emulate(
>> > case 0x6f: /* movq mm/m64,mm */
>> > /* {,v}movdq{a,u} xmm/m128,xmm */
>> > /* vmovdq{a,u} ymm/m256,ymm */
>> > + case 0x7e: /* movd mm,r/m32 */
>> > + /* movq mm,r/m64 */
>> > + /* {,v}movd xmm,r/m32 */
>> > + /* {,v}movq xmm,r/m64 */
>> > case 0x7f: /* movq mm,mm/m64 */
>> > /* {,v}movdq{a,u} xmm,xmm/m128 */
>> > /* vmovdq{a,u} ymm,ymm/m256 */
>> > @@ -4432,7 +4436,17 @@ x86_emulate(
>> > host_and_vcpu_must_have(sse2);
>> > buf[0] = 0x66; /* SSE */
>> > get_fpu(X86EMUL_FPU_xmm, &fic);
>> > - ea.bytes = (b == 0xd6 ? 8 : 16);
>> > + switch ( b )
>> > + {
>> > + case 0x7e:
>> > + ea.bytes = 4;
>> > + break;
>> > + case 0xd6:
>> > + ea.bytes = 8;
>> > + break;
>> > + default:
>> > + ea.bytes = 16;
>> > + }
>> > break;
>> > case vex_none:
>> > if ( b != 0xe7 )
>> > @@ -4452,7 +4466,17 @@ x86_emulate(
>> > ((vex.pfx != vex_66) && (vex.pfx != vex_f3)));
>> > host_and_vcpu_must_have(avx);
>> > get_fpu(X86EMUL_FPU_ymm, &fic);
>> > - ea.bytes = (b == 0xd6 ? 8 : (16 << vex.l));
>> > + switch ( b )
>> > + {
>> > + case 0x7e:
>> > + ea.bytes = 4;
>> > + break;
>> > + case 0xd6:
>> > + ea.bytes = 8;
>> > + break;
>> > + default:
>> > + ea.bytes = 16 << vex.l;
>> > + }
>> > }
>> > if ( ea.type == OP_MEM )
>> > {
>> > @@ -4468,6 +4492,14 @@ x86_emulate(
>> > vex.b = 1;
>> > buf[4] &= 0x38;
>> > }
>> > + else if ( b == 0x7e )
>> > + {
>> > + /* convert the GPR destination to (%rAX) */
>> > + *((unsigned long *)&mmvalp) = (unsigned long)ea.reg;
>> > + rex_prefix &= ~REX_B;
>> > + vex.b = 1;
>> > + buf[4] &= 0x38;
>> > + }
>>
>> Thank you for doing this. However, looking at it, it has some code in
>> common with the "ea.type == OP_MEM" clause.
>>
>> Would this work?
>>
>> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c
>> b/xen/arch/x86/x86_emulate/x86_emulate.c
>> index fe594ba..90db067 100644
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -4453,16 +4453,25 @@ x86_emulate(
>> get_fpu(X86EMUL_FPU_ymm, &fic);
>> ea.bytes = 16 << vex.l;
>> }
>> - if ( ea.type == OP_MEM )
>> + if ( ea.type == OP_MEM || ea.type == OP_REG )
>> {
>> - /* XXX enable once there is ops->ea() or equivalent
>> - generate_exception_if((vex.pfx == vex_66) &&
>> - (ops->ea(ea.mem.seg, ea.mem.off)
>> - & (ea.bytes - 1)), EXC_GP, 0); */
>> - if ( b == 0x6f )
>> - rc = ops->read(ea.mem.seg, ea.mem.off+0, mmvalp,
>> - ea.bytes, ctxt);
>> /* convert memory operand to (%rAX) */
>> +
>> + if ( ea.type == OP_MEM)
>> + {
>> + /* XXX enable once there is ops->ea() or equivalent
>> + generate_exception_if((vex.pfx == vex_66) &&
>> + (ops->ea(ea.mem.seg, ea.mem.off)
>> + & (ea.bytes - 1)), EXC_GP, 0); */
>> + if ( b == 0x6f )
>> + rc = ops->read(ea.mem.seg, ea.mem.off+0, mmvalp,
>> + ea.bytes, ctxt);
>> + }
>> + else if ( ea.type == OP_REG )
>> + {
>> + *((unsigned long *)&mmvalp) = (unsigned long)ea.reg;
>> + }
>> +
>> rex_prefix &= ~REX_B;
>> vex.b = 1;
>> buf[4] &= 0x38;
>>
>>
>> This is untested, but avoids duplicating this bit of state manipulation.
>
> Your suggestion makes sense, but I'm starting to doubt my initial
> patch. :-) I'm testing "movq xmm1, xmm1" and noticing that it takes the
> GPR-handling route and I can't seem to be able to easily prevent it
> with !(rex_prefix & REX_B), as rex_prefix == 0 and vex.b == 1. I need
> to take a harder look at how that class of instructions is coded.
You obviously need to distinguish the two kinds of register sources/
destinations: GPRs need suitable re-writing of the instruction (without
having looked at the most recent version of the patch yet I btw doubt
converting register to memory operands is the most efficient model),
while MMs, XMMs, and YMMs can retain their register encoding.
Jan
* Re: [PATCH v3 2/3] x86/emulate: add support of emulating SSE2 instruction {, v}movd mm, r32/m32 and {, v}movq mm, r64
2016-08-01 12:59 ` Jan Beulich
@ 2016-08-01 13:28 ` Mihai Donțu
2016-08-01 13:43 ` Jan Beulich
0 siblings, 1 reply; 26+ messages in thread
From: Mihai Donțu @ 2016-08-01 13:28 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Zhi Wang, xen-devel
On Monday 01 August 2016 06:59:08 Jan Beulich wrote:
> >>> On 01.08.16 at 14:53, <mdontu@bitdefender.com> wrote:
> > On Monday 01 August 2016 10:52:12 Andrew Cooper wrote:
> >> On 01/08/16 03:52, Mihai Donțu wrote:
> >> > Found that a Windows driver was using the SSE2 instruction MOVD.
> >> >
> >> > Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
> >> > Signed-off-by: Mihai Donțu <mdontu@bitdefender.com>
> >> > ---
> >> > Picked from the XenServer 7 patch queue, as suggested by Andrew Cooper
> >> >
> >> > Changed since v2:
> >> > * handle the case where the destination is a GPR
> >> > ---
> >> > xen/arch/x86/x86_emulate/x86_emulate.c | 38
> > +++++++++++++++++++++++++++++++---
> >> > 1 file changed, 35 insertions(+), 3 deletions(-)
> >> >
> >> > diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c
> > b/xen/arch/x86/x86_emulate/x86_emulate.c
> >> > index 44de3b6..9f89ada 100644
> >> > --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> >> > +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> >> > @@ -204,7 +204,7 @@ static uint8_t twobyte_table[256] = {
> >> > /* 0x60 - 0x6F */
> >> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM,
> >> > /* 0x70 - 0x7F */
> >> > - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM,
> >> > + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM,
> > ImplicitOps|ModRM,
> >> > /* 0x80 - 0x87 */
> >> > ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
> >> > ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
> >> > @@ -4409,6 +4409,10 @@ x86_emulate(
> >> > case 0x6f: /* movq mm/m64,mm */
> >> > /* {,v}movdq{a,u} xmm/m128,xmm */
> >> > /* vmovdq{a,u} ymm/m256,ymm */
> >> > + case 0x7e: /* movd mm,r/m32 */
> >> > + /* movq mm,r/m64 */
> >> > + /* {,v}movd xmm,r/m32 */
> >> > + /* {,v}movq xmm,r/m64 */
> >> > case 0x7f: /* movq mm,mm/m64 */
> >> > /* {,v}movdq{a,u} xmm,xmm/m128 */
> >> > /* vmovdq{a,u} ymm,ymm/m256 */
> >> > @@ -4432,7 +4436,17 @@ x86_emulate(
> >> > host_and_vcpu_must_have(sse2);
> >> > buf[0] = 0x66; /* SSE */
> >> > get_fpu(X86EMUL_FPU_xmm, &fic);
> >> > - ea.bytes = (b == 0xd6 ? 8 : 16);
> >> > + switch ( b )
> >> > + {
> >> > + case 0x7e:
> >> > + ea.bytes = 4;
> >> > + break;
> >> > + case 0xd6:
> >> > + ea.bytes = 8;
> >> > + break;
> >> > + default:
> >> > + ea.bytes = 16;
> >> > + }
> >> > break;
> >> > case vex_none:
> >> > if ( b != 0xe7 )
> >> > @@ -4452,7 +4466,17 @@ x86_emulate(
> >> > ((vex.pfx != vex_66) && (vex.pfx != vex_f3)));
> >> > host_and_vcpu_must_have(avx);
> >> > get_fpu(X86EMUL_FPU_ymm, &fic);
> >> > - ea.bytes = (b == 0xd6 ? 8 : (16 << vex.l));
> >> > + switch ( b )
> >> > + {
> >> > + case 0x7e:
> >> > + ea.bytes = 4;
> >> > + break;
> >> > + case 0xd6:
> >> > + ea.bytes = 8;
> >> > + break;
> >> > + default:
> >> > + ea.bytes = 16 << vex.l;
> >> > + }
> >> > }
> >> > if ( ea.type == OP_MEM )
> >> > {
> >> > @@ -4468,6 +4492,14 @@ x86_emulate(
> >> > vex.b = 1;
> >> > buf[4] &= 0x38;
> >> > }
> >> > + else if ( b == 0x7e )
> >> > + {
> >> > + /* convert the GPR destination to (%rAX) */
> >> > + *((unsigned long *)&mmvalp) = (unsigned long)ea.reg;
> >> > + rex_prefix &= ~REX_B;
> >> > + vex.b = 1;
> >> > + buf[4] &= 0x38;
> >> > + }
> >>
> >> Thank you for doing this. However, looking at it, it has some code in
> >> common with the "ea.type == OP_MEM" clause.
> >>
> >> Would this work?
> >>
> >> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c
> >> b/xen/arch/x86/x86_emulate/x86_emulate.c
> >> index fe594ba..90db067 100644
> >> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> >> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> >> @@ -4453,16 +4453,25 @@ x86_emulate(
> >> get_fpu(X86EMUL_FPU_ymm, &fic);
> >> ea.bytes = 16 << vex.l;
> >> }
> >> - if ( ea.type == OP_MEM )
> >> + if ( ea.type == OP_MEM || ea.type == OP_REG )
> >> {
> >> - /* XXX enable once there is ops->ea() or equivalent
> >> - generate_exception_if((vex.pfx == vex_66) &&
> >> - (ops->ea(ea.mem.seg, ea.mem.off)
> >> - & (ea.bytes - 1)), EXC_GP, 0); */
> >> - if ( b == 0x6f )
> >> - rc = ops->read(ea.mem.seg, ea.mem.off+0, mmvalp,
> >> - ea.bytes, ctxt);
> >> /* convert memory operand to (%rAX) */
> >> +
> >> + if ( ea.type == OP_MEM)
> >> + {
> >> + /* XXX enable once there is ops->ea() or equivalent
> >> + generate_exception_if((vex.pfx == vex_66) &&
> >> + (ops->ea(ea.mem.seg, ea.mem.off)
> >> + & (ea.bytes - 1)), EXC_GP, 0); */
> >> + if ( b == 0x6f )
> >> + rc = ops->read(ea.mem.seg, ea.mem.off+0, mmvalp,
> >> + ea.bytes, ctxt);
> >> + }
> >> + else if ( ea.type == OP_REG )
> >> + {
> >> + *((unsigned long *)&mmvalp) = (unsigned long)ea.reg;
> >> + }
> >> +
> >> rex_prefix &= ~REX_B;
> >> vex.b = 1;
> >> buf[4] &= 0x38;
> >>
> >>
> >> This is untested, but avoids duplicating this bit of state manipulation.
> >
> > Your suggestion makes sense, but I'm starting to doubt my initial
> > patch. :-) I'm testing "movq xmm1, xmm1" and noticing that it takes the
> > GPR-handling route and I can't seem to be able to easily prevent it
> > with !(rex_prefix & REX_B), as rex_prefix == 0 and vex.b == 1. I need
> > to take a harder look at how that class of instructions is coded.
>
> You obviously need to distinguish the two kinds of register sources/
> destinations: GPRs need suitable re-writing of the instruction (without
> having looked at the most recent version of the patch yet I btw doubt
> converting register to memory operands is the most efficient model),
> while MMs, XMMs, and YMMs can retain their register encoding.
Regarding efficiency, I'm not married with the approach I've proposed.
If you can give me a few more hints, I can give it a try.
--
Mihai DONȚU
* Re: [PATCH v3 2/3] x86/emulate: add support of emulating SSE2 instruction {, v}movd mm, r32/m32 and {, v}movq mm, r64
2016-08-01 13:28 ` Mihai Donțu
@ 2016-08-01 13:43 ` Jan Beulich
2016-08-01 14:48 ` Mihai Donțu
0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2016-08-01 13:43 UTC (permalink / raw)
To: Mihai Donțu; +Cc: Andrew Cooper, Zhi Wang, xen-devel
>>> On 01.08.16 at 15:28, <mdontu@bitdefender.com> wrote:
> On Monday 01 August 2016 06:59:08 Jan Beulich wrote:
>> >>> On 01.08.16 at 14:53, <mdontu@bitdefender.com> wrote:
>> > On Monday 01 August 2016 10:52:12 Andrew Cooper wrote:
>> >> On 01/08/16 03:52, Mihai Donțu wrote:
>> >> > Found that a Windows driver was using the SSE2 instruction MOVD.
>> >> >
>> >> > Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
>> >> > Signed-off-by: Mihai Donțu <mdontu@bitdefender.com>
>> >> > ---
>> >> > Picked from the XenServer 7 patch queue, as suggested by Andrew Cooper
>> >> >
>> >> > Changed since v2:
>> >> > * handle the case where the destination is a GPR
>> >> > ---
>> >> > xen/arch/x86/x86_emulate/x86_emulate.c | 38
>> > +++++++++++++++++++++++++++++++---
>> >> > 1 file changed, 35 insertions(+), 3 deletions(-)
>> >> >
>> >> > diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c
>> > b/xen/arch/x86/x86_emulate/x86_emulate.c
>> >> > index 44de3b6..9f89ada 100644
>> >> > --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> >> > +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> >> > @@ -204,7 +204,7 @@ static uint8_t twobyte_table[256] = {
>> >> > /* 0x60 - 0x6F */
>> >> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM,
>> >> > /* 0x70 - 0x7F */
>> >> > - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM,
>> >> > + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps|ModRM,
>> > ImplicitOps|ModRM,
>> >> > /* 0x80 - 0x87 */
>> >> > ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
>> >> > ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
>> >> > @@ -4409,6 +4409,10 @@ x86_emulate(
>> >> > case 0x6f: /* movq mm/m64,mm */
>> >> > /* {,v}movdq{a,u} xmm/m128,xmm */
>> >> > /* vmovdq{a,u} ymm/m256,ymm */
>> >> > + case 0x7e: /* movd mm,r/m32 */
>> >> > + /* movq mm,r/m64 */
>> >> > + /* {,v}movd xmm,r/m32 */
>> >> > + /* {,v}movq xmm,r/m64 */
>> >> > case 0x7f: /* movq mm,mm/m64 */
>> >> > /* {,v}movdq{a,u} xmm,xmm/m128 */
>> >> > /* vmovdq{a,u} ymm,ymm/m256 */
>> >> > @@ -4432,7 +4436,17 @@ x86_emulate(
>> >> > host_and_vcpu_must_have(sse2);
>> >> > buf[0] = 0x66; /* SSE */
>> >> > get_fpu(X86EMUL_FPU_xmm, &fic);
>> >> > - ea.bytes = (b == 0xd6 ? 8 : 16);
>> >> > + switch ( b )
>> >> > + {
>> >> > + case 0x7e:
>> >> > + ea.bytes = 4;
>> >> > + break;
>> >> > + case 0xd6:
>> >> > + ea.bytes = 8;
>> >> > + break;
>> >> > + default:
>> >> > + ea.bytes = 16;
>> >> > + }
>> >> > break;
>> >> > case vex_none:
>> >> > if ( b != 0xe7 )
>> >> > @@ -4452,7 +4466,17 @@ x86_emulate(
>> >> > ((vex.pfx != vex_66) && (vex.pfx != vex_f3)));
>> >> > host_and_vcpu_must_have(avx);
>> >> > get_fpu(X86EMUL_FPU_ymm, &fic);
>> >> > - ea.bytes = (b == 0xd6 ? 8 : (16 << vex.l));
>> >> > + switch ( b )
>> >> > + {
>> >> > + case 0x7e:
>> >> > + ea.bytes = 4;
>> >> > + break;
>> >> > + case 0xd6:
>> >> > + ea.bytes = 8;
>> >> > + break;
>> >> > + default:
>> >> > + ea.bytes = 16 << vex.l;
>> >> > + }
>> >> > }
>> >> > if ( ea.type == OP_MEM )
>> >> > {
>> >> > @@ -4468,6 +4492,14 @@ x86_emulate(
>> >> > vex.b = 1;
>> >> > buf[4] &= 0x38;
>> >> > }
>> >> > + else if ( b == 0x7e )
>> >> > + {
>> >> > + /* convert the GPR destination to (%rAX) */
>> >> > + *((unsigned long *)&mmvalp) = (unsigned long)ea.reg;
>> >> > + rex_prefix &= ~REX_B;
>> >> > + vex.b = 1;
>> >> > + buf[4] &= 0x38;
>> >> > + }
>> >>
>> >> Thank you for doing this. However, looking at it, it has some code in
>> >> common with the "ea.type == OP_MEM" clause.
>> >>
>> >> Would this work?
>> >>
>> >> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c
>> >> b/xen/arch/x86/x86_emulate/x86_emulate.c
>> >> index fe594ba..90db067 100644
>> >> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> >> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> >> @@ -4453,16 +4453,25 @@ x86_emulate(
>> >> get_fpu(X86EMUL_FPU_ymm, &fic);
>> >> ea.bytes = 16 << vex.l;
>> >> }
>> >> - if ( ea.type == OP_MEM )
>> >> + if ( ea.type == OP_MEM || ea.type == OP_REG )
>> >> {
>> >> - /* XXX enable once there is ops->ea() or equivalent
>> >> - generate_exception_if((vex.pfx == vex_66) &&
>> >> - (ops->ea(ea.mem.seg, ea.mem.off)
>> >> - & (ea.bytes - 1)), EXC_GP, 0); */
>> >> - if ( b == 0x6f )
>> >> - rc = ops->read(ea.mem.seg, ea.mem.off+0, mmvalp,
>> >> - ea.bytes, ctxt);
>> >> /* convert memory operand to (%rAX) */
>> >> +
>> >> + if ( ea.type == OP_MEM)
>> >> + {
>> >> + /* XXX enable once there is ops->ea() or equivalent
>> >> + generate_exception_if((vex.pfx == vex_66) &&
>> >> + (ops->ea(ea.mem.seg, ea.mem.off)
>> >> + & (ea.bytes - 1)), EXC_GP, 0); */
>> >> + if ( b == 0x6f )
>> >> + rc = ops->read(ea.mem.seg, ea.mem.off+0, mmvalp,
>> >> + ea.bytes, ctxt);
>> >> + }
>> >> + else if ( ea.type == OP_REG )
>> >> + {
>> >> + *((unsigned long *)&mmvalp) = (unsigned long)ea.reg;
>> >> + }
>> >> +
>> >> rex_prefix &= ~REX_B;
>> >> vex.b = 1;
>> >> buf[4] &= 0x38;
>> >>
>> >>
>> >> This is untested, but avoids duplicating this bit of state manipulation.
>> >
>> > Your suggestion makes sense, but I'm starting to doubt my initial
>> > patch. :-) I'm testing "movq xmm1, xmm1" and noticing that it takes the
>> > GPR-handling route and I can't seem to be able to easily prevent it
>> > with !(rex_prefix & REX_B), as rex_prefix == 0 and vex.b == 1. I need
>> > to take a harder look at how that class of instructions is coded.
>>
>> You obviously need to distinguish the two kinds of register sources/
>> destinations: GPRs need suitable re-writing of the instruction (without
>> having looked at the most recent version of the patch yet I btw doubt
>> converting register to memory operands is the most efficient model),
>> while MMs, XMMs, and YMMs can retain their register encoding.
>
> Regarding efficiency, I'm not married with the approach I've proposed.
> If you can give me a few more hints, I can give it a try.
I'd rather pick a fixed register and update the regs->... field from that
after the stub was executed. E.g. using rAX and treating it just like a
return value of the "call". But maybe I'm imagining this easier than it
really is; as an alternative I'd then suggest really following what Andrew
said - use a pointer into regs->, not mmvalp. But (as said in the review
mail) you'd then have the problem of the missing zero-extension for
writes to 32-bit GPRs.
Jan
* Re: [PATCH v3 2/3] x86/emulate: add support of emulating SSE2 instruction {, v}movd mm, r32/m32 and {, v}movq mm, r64
2016-08-01 13:43 ` Jan Beulich
@ 2016-08-01 14:48 ` Mihai Donțu
2016-08-01 14:53 ` Andrew Cooper
` (2 more replies)
0 siblings, 3 replies; 26+ messages in thread
From: Mihai Donțu @ 2016-08-01 14:48 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Zhi Wang, xen-devel
On Monday 01 August 2016 07:43:20 Jan Beulich wrote:
> > > > Your suggestion makes sense, but I'm starting to doubt my initial
> > > > patch. :-) I'm testing "movq xmm1, xmm1" and noticing that it takes the
> > > > GPR-handling route and I can't seem to be able to easily prevent it
> > > > with !(rex_prefix & REX_B), as rex_prefix == 0 and vex.b == 1. I need
> > > > to take a harder look at how that class of instructions is coded.
> > >
> > > You obviously need to distinguish the two kinds of register sources/
> > > destinations: GPRs need suitable re-writing of the instruction (without
> > > having looked at the most recent version of the patch yet I btw doubt
> > > converting register to memory operands is the most efficient model),
> > > while MMs, XMMs, and YMMs can retain their register encoding.
> >
> > Regarding efficiency, I'm not married with the approach I've proposed.
> > If you can give me a few more hints, I can give it a try.
>
> I'd rather pick a fixed register and update the regs->... field from that
> after the stub was executed. E.g. using rAX and treating it just like a
> return value of the "call". But maybe I'm imagining this easier than it
> really is; as an alternative I'd then suggest really following what Andrew
> said - use a pointer into regs->, not mmvalp. But (as said in the review
> mail) you'd then have the problem of the missing zero-extension for
> writes to 32-bit GPRs
I thought that by re-using (hijacking, really) mmvalp, the patch would
look less intrusive and thus not add too much to already complex code.
Assuming I just pass "a"(ea.reg) to the stub, would it be a good
idea to zero out the 64-bit register beforehand? There do not appear
to be any instructions that write just the low dword. Or am I
misunderstanding the zero-extension concept?
--
Mihai DONȚU
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v3 2/3] x86/emulate: add support of emulating SSE2 instruction {, v}movd mm, r32/m32 and {, v}movq mm, r64
2016-08-01 14:48 ` Mihai Donțu
@ 2016-08-01 14:53 ` Andrew Cooper
2016-08-01 15:10 ` Mihai Donțu
2016-08-01 14:55 ` Mihai Donțu
2016-08-01 14:56 ` Jan Beulich
2 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2016-08-01 14:53 UTC (permalink / raw)
To: Mihai Donțu, Jan Beulich; +Cc: Zhi Wang, xen-devel
On 01/08/16 15:48, Mihai Donțu wrote:
> On Monday 01 August 2016 07:43:20 Jan Beulich wrote:
>>>>> Your suggestion makes sense, but I'm starting to doubt my initial
>>>>> patch. :-) I'm testing "movq xmm1, xmm1" and noticing that it takes the
>>>>> GPR-handling route and I can't seem to be able to easily prevent it
>>>>> with !(rex_prefix & REX_B), as rex_prefix == 0 and vex.b == 1. I need
>>>>> to take a harder look at how that class of instructions is coded.
>>>> You obviously need to distinguish the two kinds of register sources/
>>>> destinations: GPRs need suitable re-writing of the instruction (without
>>>> having looked at the most recent version of the patch yet I btw doubt
>>>> converting register to memory operands is the most efficient model),
>>>> while MMs, XMMs, and YMMs can retain their register encoding.
>>> Regarding efficiency, I'm not married with the approach I've proposed.
>>> If you can give me a few more hints, I can give it a try.
>> I'd rather pick a fixed register and update the regs->... field from that
>> after the stub was executed. E.g. using rAX and treating it just like a
>> return value of the "call". But maybe I'm imagining this easier than it
>> really is; as an alternative I'd then suggest really following what Andrew
>> said - use a pointer into regs->, not mmvalp. But (as said in the review
>> mail) you'd then have the problem of the missing zero-extension for
>> writes to 32-bit GPRs
> I thought that by re-using (hijacking, really) mmvalp, the patch will
> look less intrusive and thus not add too much to an already complex
> code.
>
> Assuming I'll just pass to the stub "a"(ea.reg), would it be a good
> idea to just zero-out the 64bit register before that? It does not
> appear to be any instructions that write just the low dword. Or am I
> misunderstanding the zero-extension concept?
Any write to a 32-bit GPR zero-extends it to 64 bits. This is specified
by AMD64 and only applies to 32-bit writes. 8- and 16-bit writes leave
the upper bits alone.
Therefore movd %mm, %eax will move 32 bits from %mm to %eax, then zero
the upper 32 bits of %rax.
~Andrew
* Re: [PATCH v3 2/3] x86/emulate: add support of emulating SSE2 instruction {, v}movd mm, r32/m32 and {, v}movq mm, r64
2016-08-01 14:53 ` Andrew Cooper
@ 2016-08-01 15:10 ` Mihai Donțu
0 siblings, 0 replies; 26+ messages in thread
From: Mihai Donțu @ 2016-08-01 15:10 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Zhi Wang, Jan Beulich, xen-devel
On Monday 01 August 2016 15:53:56 Andrew Cooper wrote:
> On 01/08/16 15:48, Mihai Donțu wrote:
> > > I'd rather pick a fixed register and update the regs->... field from that
> > > after the stub was executed. E.g. using rAX and treating it just like a
> > > return value of the "call". But maybe I'm imagining this easier than it
> > > really is; as an alternative I'd then suggest really following what Andrew
> > > said - use a pointer into regs->, not mmvalp. But (as said in the review
> > > mail) you'd then have the problem of the missing zero-extension for
> > > writes to 32-bit GPRs
> >
> > I thought that by re-using (hijacking, really) mmvalp, the patch will
> > look less intrusive and thus not add too much to an already complex
> > code.
> >
> > Assuming I'll just pass to the stub "a"(ea.reg), would it be a good
> > idea to just zero-out the 64bit register before that? It does not
> > appear to be any instructions that write just the low dword. Or am I
> > misunderstanding the zero-extension concept?
>
> Any write to a 32bit GPR zero extends it to 64 bits. This is specified
> by AMD64 and only applies to 32bit writes. 8 and 16 bit writes leave
> the upper bits alone.
>
> Therefore movd %mm, %eax will move 32bits from %mm to eax, then zero
> extend the upper 32 bits of %rax.
... and given that the stub has (%rAX) as its destination, the
zero-extension is not implicit and I have to do it myself. That also
means two of my tests in a previous patch are wrong (the
0xbdbdbdbdffffffff checks). Darn! :-)
--
Mihai DONȚU
* Re: [PATCH v3 2/3] x86/emulate: add support of emulating SSE2 instruction {, v}movd mm, r32/m32 and {, v}movq mm, r64
2016-08-01 14:48 ` Mihai Donțu
2016-08-01 14:53 ` Andrew Cooper
@ 2016-08-01 14:55 ` Mihai Donțu
2016-08-01 14:59 ` Jan Beulich
2016-08-01 14:56 ` Jan Beulich
2 siblings, 1 reply; 26+ messages in thread
From: Mihai Donțu @ 2016-08-01 14:55 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Zhi Wang, xen-devel
On Monday 01 August 2016 17:48:33 Mihai Donțu wrote:
> On Monday 01 August 2016 07:43:20 Jan Beulich wrote:
> > > > > Your suggestion makes sense, but I'm starting to doubt my initial
> > > > > patch. :-) I'm testing "movq xmm1, xmm1" and noticing that it takes the
> > > > > GPR-handling route and I can't seem to be able to easily prevent it
> > > > > with !(rex_prefix & REX_B), as rex_prefix == 0 and vex.b == 1. I need
> > > > > to take a harder look at how that class of instructions is coded.
> > > >
> > > > You obviously need to distinguish the two kinds of register sources/
> > > > destinations: GPRs need suitable re-writing of the instruction (without
> > > > having looked at the most recent version of the patch yet I btw doubt
> > > > converting register to memory operands is the most efficient model),
> > > > while MMs, XMMs, and YMMs can retain their register encoding.
> > >
> > > Regarding efficiency, I'm not married with the approach I've proposed.
> > > If you can give me a few more hints, I can give it a try.
> >
> > I'd rather pick a fixed register and update the regs->... field from that
> > after the stub was executed. E.g. using rAX and treating it just like a
> > return value of the "call". But maybe I'm imagining this easier than it
> > really is; as an alternative I'd then suggest really following what Andrew
> > said - use a pointer into regs->, not mmvalp. But (as said in the review
> > mail) you'd then have the problem of the missing zero-extension for
> > writes to 32-bit GPRs
>
> I thought that by re-using (hijacking, really) mmvalp, the patch will
> look less intrusive and thus not add too much to an already complex
> code.
>
> Assuming I'll just pass to the stub "a"(ea.reg), would it be a good
> idea to just zero-out the 64bit register before that? It does not
> appear to be any instructions that write just the low dword. Or am I
> misunderstanding the zero-extension concept?
>
Just to be sure I'm making myself understood: ea.reg contains the
output of decode_register() which, in turn, returns a pointer into regs.
--
Mihai DONȚU
* Re: [PATCH v3 2/3] x86/emulate: add support of emulating SSE2 instruction {, v}movd mm, r32/m32 and {, v}movq mm, r64
2016-08-01 14:55 ` Mihai Donțu
@ 2016-08-01 14:59 ` Jan Beulich
2016-08-01 15:01 ` Andrew Cooper
0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2016-08-01 14:59 UTC (permalink / raw)
To: Mihai Donțu; +Cc: Andrew Cooper, Zhi Wang, xen-devel
>>> On 01.08.16 at 16:55, <mdontu@bitdefender.com> wrote:
> On Monday 01 August 2016 17:48:33 Mihai Donțu wrote:
>> On Monday 01 August 2016 07:43:20 Jan Beulich wrote:
>> > > > > Your suggestion makes sense, but I'm starting to doubt my initial
>> > > > > patch. :-) I'm testing "movq xmm1, xmm1" and noticing that it takes the
>> > > > > GPR-handling route and I can't seem to be able to easily prevent it
>> > > > > with !(rex_prefix & REX_B), as rex_prefix == 0 and vex.b == 1. I need
>> > > > > to take a harder look at how that class of instructions is coded.
>> > > >
>> > > > You obviously need to distinguish the two kinds of register sources/
>> > > > destinations: GPRs need suitable re-writing of the instruction (without
>> > > > having looked at the most recent version of the patch yet I btw doubt
>> > > > converting register to memory operands is the most efficient model),
>> > > > while MMs, XMMs, and YMMs can retain their register encoding.
>> > >
>> > > Regarding efficiency, I'm not married with the approach I've proposed.
>> > > If you can give me a few more hints, I can give it a try.
>> >
>> > I'd rather pick a fixed register and update the regs->... field from that
>> > after the stub was executed. E.g. using rAX and treating it just like a
>> > return value of the "call". But maybe I'm imagining this easier than it
>> > really is; as an alternative I'd then suggest really following what Andrew
>> > said - use a pointer into regs->, not mmvalp. But (as said in the review
>> > mail) you'd then have the problem of the missing zero-extension for
>> > writes to 32-bit GPRs
>>
>> I thought that by re-using (hijacking, really) mmvalp, the patch will
>> look less intrusive and thus not add too much to an already complex
>> code.
>>
>> Assuming I'll just pass to the stub "a"(ea.reg), would it be a good
>> idea to just zero-out the 64bit register before that? It does not
>> appear to be any instructions that write just the low dword. Or am I
>> misunderstanding the zero-extension concept?
>
> Just to be sure I'm making myself understood, ea.reg contains the
> output of decode_register() which, in turn, returns a pointer in regs.
Hmm, now that you say that, maybe I got confused by the expression
you used: if, rather than doing a cast on the address of mmvalp, you
could get away with a simple assignment (and maybe a cast on the
rvalue), then I suppose your code might not be as wrong as it seemed
to me.
Jan
* Re: [PATCH v3 2/3] x86/emulate: add support of emulating SSE2 instruction {, v}movd mm, r32/m32 and {, v}movq mm, r64
2016-08-01 14:59 ` Jan Beulich
@ 2016-08-01 15:01 ` Andrew Cooper
0 siblings, 0 replies; 26+ messages in thread
From: Andrew Cooper @ 2016-08-01 15:01 UTC (permalink / raw)
To: Jan Beulich, Mihai Donțu; +Cc: Zhi Wang, xen-devel
On 01/08/16 15:59, Jan Beulich wrote:
>>>> On 01.08.16 at 16:55, <mdontu@bitdefender.com> wrote:
>> On Monday 01 August 2016 17:48:33 Mihai Donțu wrote:
>>> On Monday 01 August 2016 07:43:20 Jan Beulich wrote:
>>>>>>> Your suggestion makes sense, but I'm starting to doubt my initial
>>>>>>> patch. :-) I'm testing "movq xmm1, xmm1" and noticing that it takes the
>>>>>>> GPR-handling route and I can't seem to be able to easily prevent it
>>>>>>> with !(rex_prefix & REX_B), as rex_prefix == 0 and vex.b == 1. I need
>>>>>>> to take a harder look at how that class of instructions is coded.
>>>>>> You obviously need to distinguish the two kinds of register sources/
>>>>>> destinations: GPRs need suitable re-writing of the instruction (without
>>>>>> having looked at the most recent version of the patch yet I btw doubt
>>>>>> converting register to memory operands is the most efficient model),
>>>>>> while MMs, XMMs, and YMMs can retain their register encoding.
>>>>> Regarding efficiency, I'm not married with the approach I've proposed.
>>>>> If you can give me a few more hints, I can give it a try.
>>>> I'd rather pick a fixed register and update the regs->... field from that
>>>> after the stub was executed. E.g. using rAX and treating it just like a
>>>> return value of the "call". But maybe I'm imagining this easier than it
>>>> really is; as an alternative I'd then suggest really following what Andrew
>>>> said - use a pointer into regs->, not mmvalp. But (as said in the review
>>>> mail) you'd then have the problem of the missing zero-extension for
>>>> writes to 32-bit GPRs
>>> I thought that by re-using (hijacking, really) mmvalp, the patch will
>>> look less intrusive and thus not add too much to an already complex
>>> code.
>>>
>>> Assuming I'll just pass to the stub "a"(ea.reg), would it be a good
>>> idea to just zero-out the 64bit register before that? It does not
>>> appear to be any instructions that write just the low dword. Or am I
>>> misunderstanding the zero-extension concept?
>> Just to be sure I'm making myself understood, ea.reg contains the
>> output of decode_register() which, in turn, returns a pointer in regs.
> Hmm, now that you say that maybe I got confused be the expression
> you used: If, rather than doing a cast on the address of mmvalp, you
> could get away with a simply assignment (and maybe a cast on the
> rvalue), then I suppose your code might not be as wrong as it seemed
> to me.
I had to spend quite a long time convincing myself that it was right.
The code would be easier if mmvalp were a plain void * rather than a
const mmval_t *.
~Andrew
>
> Jan
* Re: [PATCH v3 2/3] x86/emulate: add support of emulating SSE2 instruction {, v}movd mm, r32/m32 and {, v}movq mm, r64
2016-08-01 14:48 ` Mihai Donțu
2016-08-01 14:53 ` Andrew Cooper
2016-08-01 14:55 ` Mihai Donțu
@ 2016-08-01 14:56 ` Jan Beulich
2 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2016-08-01 14:56 UTC (permalink / raw)
To: Mihai Donțu; +Cc: Andrew Cooper, Zhi Wang, xen-devel
>>> On 01.08.16 at 16:48, <mdontu@bitdefender.com> wrote:
> On Monday 01 August 2016 07:43:20 Jan Beulich wrote:
>> > > > Your suggestion makes sense, but I'm starting to doubt my initial
>> > > > patch. :-) I'm testing "movq xmm1, xmm1" and noticing that it takes the
>> > > > GPR-handling route and I can't seem to be able to easily prevent it
>> > > > with !(rex_prefix & REX_B), as rex_prefix == 0 and vex.b == 1. I need
>> > > > to take a harder look at how that class of instructions is coded.
>> > >
>> > > You obviously need to distinguish the two kinds of register sources/
>> > > destinations: GPRs need suitable re-writing of the instruction (without
>> > > having looked at the most recent version of the patch yet I btw doubt
>> > > converting register to memory operands is the most efficient model),
>> > > while MMs, XMMs, and YMMs can retain their register encoding.
>> >
>> > Regarding efficiency, I'm not married with the approach I've proposed.
>> > If you can give me a few more hints, I can give it a try.
>>
>> I'd rather pick a fixed register and update the regs->... field from that
>> after the stub was executed. E.g. using rAX and treating it just like a
>> return value of the "call". But maybe I'm imagining this easier than it
>> really is; as an alternative I'd then suggest really following what Andrew
>> said - use a pointer into regs->, not mmvalp. But (as said in the review
>> mail) you'd then have the problem of the missing zero-extension for
>> writes to 32-bit GPRs
>
> I thought that by re-using (hijacking, really) mmvalp, the patch will
> look less intrusive and thus not add too much to an already complex
> code.
Well, if that were really all it took, I would agree. But as said in the
other sub-thread, I don't think it's that simple.
> Assuming I'll just pass to the stub "a"(ea.reg), would it be a good
> idea to just zero-out the 64bit register before that? It does not
> appear to be any instructions that write just the low dword. Or am I
> misunderstanding the zero-extension concept?
That's an option, as long as there's no fault delivery possible past
the point where you do that zeroing. Yet I'm afraid there might be
such a path; you'll need to check carefully.
Jan