All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] lguest: simplify lguest_iret
@ 2015-03-21 21:42 Denys Vlasenko
  2015-03-23  3:30 ` Rusty Russell
  0 siblings, 1 reply; 4+ messages in thread
From: Denys Vlasenko @ 2015-03-21 21:42 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Denys Vlasenko, lguest, x86, linux-kernel

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: lguest@lists.ozlabs.org
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/lguest/head_32.S | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/lguest/head_32.S b/arch/x86/lguest/head_32.S
index 6ddfe4f..583732c 100644
--- a/arch/x86/lguest/head_32.S
+++ b/arch/x86/lguest/head_32.S
@@ -168,29 +168,28 @@ ENTRY(lg_restore_fl)
  * So we have to copy eflags from the stack to lguest_data.irq_enabled before
  * we do the "iret".
  *
- * There are two problems with this: firstly, we need to use a register to do
- * the copy and secondly, the whole thing needs to be atomic.  The first
- * problem is easy to solve: push %eax on the stack so we can use it, and then
- * restore it at the end just before the real "iret".
+ * There are two problems with this: firstly, we can't clobber any registers
+ * and secondly, the whole thing needs to be atomic.  The first problem
+ * is solved by using "push memory"/"pop memory" instruction pair for copying.
  *
  * The second is harder: copying eflags to lguest_data.irq_enabled will turn
  * interrupts on before we're finished, so we could be interrupted before we
  * return to userspace or wherever.  Our solution to this is to surround the
  * code with lguest_noirq_start: and lguest_noirq_end: labels.  We tell the
  * Host that it is *never* to interrupt us there, even if interrupts seem to be
- * enabled.
+ * enabled. (It's not necessary to protect pop instruction, since
+ * data gets updated only after it completes, so we end up surrounding
+ * just one instruction, iret).
  */
 ENTRY(lguest_iret)
-	pushl	%eax
-	movl	12(%esp), %eax
-lguest_noirq_start:
+	pushl	2*4(%esp)
 	/*
 	 * Note the %ss: segment prefix here.  Normal data accesses use the
 	 * "ds" segment, but that will have already been restored for whatever
 	 * we're returning to (such as userspace): we can't trust it.  The %ss:
 	 * prefix makes sure we use the stack segment, which is still valid.
 	 */
-	movl	%eax,%ss:lguest_data+LGUEST_DATA_irq_enabled
-	popl	%eax
+	popl	%ss:lguest_data+LGUEST_DATA_irq_enabled
+lguest_noirq_start:
 	iret
 lguest_noirq_end:
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] lguest: simplify lguest_iret
  2015-03-21 21:42 [PATCH] lguest: simplify lguest_iret Denys Vlasenko
@ 2015-03-23  3:30 ` Rusty Russell
  2015-03-24 14:26   ` Denys Vlasenko
  0 siblings, 1 reply; 4+ messages in thread
From: Rusty Russell @ 2015-03-23  3:30 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Denys Vlasenko, lguest, x86, linux-kernel

Denys Vlasenko <dvlasenk@redhat.com> writes:
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> CC: lguest@lists.ozlabs.org
> CC: x86@kernel.org
> CC: linux-kernel@vger.kernel.org

Oh, thanks, applied!

And now it's down to one instruction, we could change
try_deliver_interrupt() to handle this case (rather than ignoring the
interrupt altogether): just jump to the handler and leave the
stack set up.

Here's a pair of inline patches which attempt to do that (untested!).

Thanks,
Rusty.

lguest: suppress interrupts for single insn, not range.

The last patch reduced our interrupt-suppression region to one address,
so simplify the code somewhat.

Also, remove the obsolete undefined instruction ranges and the comment
which refers to lguest_guest.S instead of head_32.S.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/arch/x86/include/asm/lguest.h b/arch/x86/include/asm/lguest.h
index e2d4a4afa8c3..3bbc07a57a31 100644
--- a/arch/x86/include/asm/lguest.h
+++ b/arch/x86/include/asm/lguest.h
@@ -20,13 +20,10 @@ extern unsigned long switcher_addr;
 /* Found in switcher.S */
 extern unsigned long default_idt_entries[];
 
-/* Declarations for definitions in lguest_guest.S */
-extern char lguest_noirq_start[], lguest_noirq_end[];
+/* Declarations for definitions in arch/x86/lguest/head_32.S */
+extern char lguest_noirq_iret[];
 extern const char lgstart_cli[], lgend_cli[];
-extern const char lgstart_sti[], lgend_sti[];
-extern const char lgstart_popf[], lgend_popf[];
 extern const char lgstart_pushf[], lgend_pushf[];
-extern const char lgstart_iret[], lgend_iret[];
 
 extern void lguest_iret(void);
 extern void lguest_init(void);
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index ac4453d8520e..4c8cf656f21f 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -87,8 +87,7 @@
 
 struct lguest_data lguest_data = {
 	.hcall_status = { [0 ... LHCALL_RING_SIZE-1] = 0xFF },
-	.noirq_start = (u32)lguest_noirq_start,
-	.noirq_end = (u32)lguest_noirq_end,
+	.noirq_iret = (u32)lguest_noirq_iret,
 	.kernel_address = PAGE_OFFSET,
 	.blocked_interrupts = { 1 }, /* Block timer interrupts */
 	.syscall_vec = SYSCALL_VECTOR,
diff --git a/arch/x86/lguest/head_32.S b/arch/x86/lguest/head_32.S
index 583732cc5d18..128fe93161b4 100644
--- a/arch/x86/lguest/head_32.S
+++ b/arch/x86/lguest/head_32.S
@@ -133,9 +133,8 @@ ENTRY(lg_restore_fl)
 	ret
 /*:*/
 
-/* These demark the EIP range where host should never deliver interrupts. */
-.global lguest_noirq_start
-.global lguest_noirq_end
+/* These demark the EIP where host should never deliver interrupts. */
+.global lguest_noirq_iret
 
 /*M:004
  * When the Host reflects a trap or injects an interrupt into the Guest, it
@@ -174,12 +173,11 @@ ENTRY(lg_restore_fl)
  *
  * The second is harder: copying eflags to lguest_data.irq_enabled will turn
  * interrupts on before we're finished, so we could be interrupted before we
- * return to userspace or wherever.  Our solution to this is to surround the
- * code with lguest_noirq_start: and lguest_noirq_end: labels.  We tell the
+ * return to userspace or wherever.  Our solution to this is to tell the
  * Host that it is *never* to interrupt us there, even if interrupts seem to be
  * enabled. (It's not necessary to protect pop instruction, since
- * data gets updated only after it completes, so we end up surrounding
- * just one instruction, iret).
+ * data gets updated only after it completes, so we only need to protect
+ * one instruction, iret).
  */
 ENTRY(lguest_iret)
 	pushl	2*4(%esp)
@@ -190,6 +188,5 @@ ENTRY(lguest_iret)
 	 * prefix makes sure we use the stack segment, which is still valid.
 	 */
 	popl	%ss:lguest_data+LGUEST_DATA_irq_enabled
-lguest_noirq_start:
+lguest_noirq_iret:
 	iret
-lguest_noirq_end:
diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 1219af493c0f..19a32280731d 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -211,10 +211,9 @@ static void initialize(struct lg_cpu *cpu)
 
 	/*
 	 * The Guest tells us where we're not to deliver interrupts by putting
-	 * the range of addresses into "struct lguest_data".
+	 * the instruction address into "struct lguest_data".
 	 */
-	if (get_user(cpu->lg->noirq_start, &cpu->lg->lguest_data->noirq_start)
-	    || get_user(cpu->lg->noirq_end, &cpu->lg->lguest_data->noirq_end))
+	if (get_user(cpu->lg->noirq_iret, &cpu->lg->lguest_data->noirq_iret))
 		kill_guest(cpu, "bad guest page %p", cpu->lg->lguest_data);
 
 	/*
diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 70dfcdc29f1f..6d4c072b61e1 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -204,8 +204,7 @@ void try_deliver_interrupt(struct lg_cpu *cpu, unsigned int irq, bool more)
 	 * They may be in the middle of an iret, where they asked us never to
 	 * deliver interrupts.
 	 */
-	if (cpu->regs->eip >= cpu->lg->noirq_start &&
-	   (cpu->regs->eip < cpu->lg->noirq_end))
+	if (cpu->regs->eip == cpu->lg->noirq_iret)
 		return;
 
 	/* If they're halted, interrupts restart them. */
@@ -395,8 +394,9 @@ static bool direct_trap(unsigned int num)
  * The Guest has the ability to turn its interrupt gates into trap gates,
  * if it is careful.  The Host will let trap gates can go directly to the
  * Guest, but the Guest needs the interrupts atomically disabled for an
- * interrupt gate.  It can do this by pointing the trap gate at instructions
- * within noirq_start and noirq_end, where it can safely disable interrupts.
+ * interrupt gate.  The Host could provide a mechanism to register more
+ * "no-interrupt" regions, and the Guest could point the trap gate at
+ * instructions within that region, where it can safely disable interrupts.
  */
 
 /*M:006
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index 307e8b39e7d1..ac8ad0461e80 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -102,7 +102,7 @@ struct lguest {
 
 	struct pgdir pgdirs[4];
 
-	unsigned long noirq_start, noirq_end;
+	unsigned long noirq_iret;
 
 	unsigned int stack_pages;
 	u32 tsc_khz;
diff --git a/include/linux/lguest.h b/include/linux/lguest.h
index 9962c6bb1311..6db19f35f7c5 100644
--- a/include/linux/lguest.h
+++ b/include/linux/lguest.h
@@ -61,8 +61,8 @@ struct lguest_data {
 	u32 tsc_khz;
 
 /* Fields initialized by the Guest at boot: */
-	/* Instruction range to suppress interrupts even if enabled */
-	unsigned long noirq_start, noirq_end;
+	/* Instruction to suppress interrupts even if enabled */
+	unsigned long noirq_iret;
 	/* Address above which page tables are all identical. */
 	unsigned long kernel_address;
 	/* The vector to try to use for system calls (0x40 or 0x80). */

lguest: handle traps on the "interrupt suppressed" iret instruction.

Lguest's "iret" is non-atomic, as it needs to restore the interrupt
state before the real iret (the guest can't actually suppress
interrupts).  For this reason, the host discards an interrupt if it
occurs in this (1-instruction) window.

We can do better, by emulating the iret execution, then immediately
setting up the interrupt handler.  In fact, we don't need to do much,
as emulating the iret and setting up th stack for the interrupt handler
basically cancel each other out.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
index 70dfcdc29f1f..906aba0d762f 100644
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -56,21 +56,16 @@ static void push_guest_stack(struct lg_cpu *cpu, unsigned long *gstack, u32 val)
 }
 
 /*H:210
- * The set_guest_interrupt() routine actually delivers the interrupt or
- * trap.  The mechanics of delivering traps and interrupts to the Guest are the
- * same, except some traps have an "error code" which gets pushed onto the
- * stack as well: the caller tells us if this is one.
- *
- * "lo" and "hi" are the two parts of the Interrupt Descriptor Table for this
- * interrupt or trap.  It's split into two parts for traditional reasons: gcc
- * on i386 used to be frightened by 64 bit numbers.
+ * The push_guest_interrupt_stack() routine saves Guest state on the stack for
+ * an interrupt or trap.  The mechanics of delivering traps and interrupts to
+ * the Guest are the same, except some traps have an "error code" which gets
+ * pushed onto the stack as well: the caller tells us if this is one.
  *
  * We set up the stack just like the CPU does for a real interrupt, so it's
  * identical for the Guest (and the standard "iret" instruction will undo
  * it).
  */
-static void set_guest_interrupt(struct lg_cpu *cpu, u32 lo, u32 hi,
-				bool has_err)
+static void push_guest_interrupt_stack(struct lg_cpu *cpu, bool has_err)
 {
 	unsigned long gstack, origstack;
 	u32 eflags, ss, irq_enable;
@@ -130,12 +125,28 @@ static void set_guest_interrupt(struct lg_cpu *cpu, u32 lo, u32 hi,
 	if (has_err)
 		push_guest_stack(cpu, &gstack, cpu->regs->errcode);
 
-	/*
-	 * Now we've pushed all the old state, we change the stack, the code
-	 * segment and the address to execute.
-	 */
+	/* Adjust the stack pointer and stack segment. */
 	cpu->regs->ss = ss;
 	cpu->regs->esp = virtstack + (gstack - origstack);
+}
+
+/*
+ * This actually makes the Guest start executing the given interrupt/trap
+ * handler.
+ *
+ * "lo" and "hi" are the two parts of the Interrupt Descriptor Table for this
+ * interrupt or trap.  It's split into two parts for traditional reasons: gcc
+ * on i386 used to be frightened by 64 bit numbers.
+ */
+static void guest_run_interrupt(struct lg_cpu *cpu, u32 lo, u32 hi)
+{
+	/* If we're already in the kernel, we don't change stacks. */
+	if ((cpu->regs->ss&0x3) != GUEST_PL)
+		cpu->regs->ss = cpu->esp1;
+
+	/*
+	 * Set the code segment and the address to execute.
+	 */
 	cpu->regs->cs = (__KERNEL_CS|GUEST_PL);
 	cpu->regs->eip = idt_address(lo, hi);
 
@@ -158,6 +169,24 @@ static void set_guest_interrupt(struct lg_cpu *cpu, u32 lo, u32 hi,
 			kill_guest(cpu, "Disabling interrupts");
 }
 
+/* This restores the eflags word which was pushed on the stack by a trap */
+static void restore_eflags(struct lg_cpu *cpu)
+{
+	/* This is the physical address of the stack. */
+	unsigned long stack_pa = guest_pa(cpu, cpu->regs->esp);
+
+	/*
+	 * Stack looks like this:
+	 * Address	Contents
+	 * esp		EIP
+	 * esp + 4	CS
+	 * esp + 8	EFLAGS
+	 */
+	cpu->regs->eflags = lgread(cpu, stack_pa + 8, u32);
+	cpu->regs->eflags &=
+		~(X86_EFLAGS_TF|X86_EFLAGS_VM|X86_EFLAGS_RF|X86_EFLAGS_NT);
+}
+
 /*H:205
  * Virtual Interrupts.
  *
@@ -200,14 +229,6 @@ void try_deliver_interrupt(struct lg_cpu *cpu, unsigned int irq, bool more)
 
 	BUG_ON(irq >= LGUEST_IRQS);
 
-	/*
-	 * They may be in the middle of an iret, where they asked us never to
-	 * deliver interrupts.
-	 */
-	if (cpu->regs->eip >= cpu->lg->noirq_start &&
-	   (cpu->regs->eip < cpu->lg->noirq_end))
-		return;
-
 	/* If they're halted, interrupts restart them. */
 	if (cpu->halted) {
 		/* Re-enable interrupts. */
@@ -237,12 +258,28 @@ void try_deliver_interrupt(struct lg_cpu *cpu, unsigned int irq, bool more)
 	if (idt_present(idt->a, idt->b)) {
 		/* OK, mark it no longer pending and deliver it. */
 		clear_bit(irq, cpu->irqs_pending);
+
 		/*
-		 * set_guest_interrupt() takes the interrupt descriptor and a
-		 * flag to say whether this interrupt pushes an error code onto
-		 * the stack as well: virtual interrupts never do.
+		 * They may be about to iret, where they asked us never to
+		 * deliver interrupts.  In this case, we can emulate that iret
+		 * then immediately deliver the interrupt.  This is bascially
+		 * a noop: the iret would pop the interrupt frame and restore
+		 * eflags, and then we'd set it up again.  So just restore the
+		 * eflags word and jump straight to the handler in this case.
 		 */
-		set_guest_interrupt(cpu, idt->a, idt->b, false);
+		if (cpu->regs->eip >= cpu->lg->noirq_start &&
+		    (cpu->regs->eip < cpu->lg->noirq_end)) {
+			restore_eflags(cpu);
+		} else {
+			/*
+			 * set_guest_interrupt() takes a flag to say whether
+			 * this interrupt pushes an error code onto the stack
+			 * as well: virtual interrupts never do.
+			 */
+			push_guest_interrupt_stack(cpu, false);
+		}
+		/* Actually make Guest cpu jump to handler. */
+		guest_run_interrupt(cpu, idt->a, idt->b);
 	}
 
 	/*
@@ -353,8 +390,9 @@ bool deliver_trap(struct lg_cpu *cpu, unsigned int num)
 	 */
 	if (!idt_present(cpu->arch.idt[num].a, cpu->arch.idt[num].b))
 		return false;
-	set_guest_interrupt(cpu, cpu->arch.idt[num].a,
-			    cpu->arch.idt[num].b, has_err(num));
+	push_guest_interrupt_stack(cpu, has_err(num));
+	guest_run_interrupt(cpu, cpu->arch.idt[num].a,
+			    cpu->arch.idt[num].b);
 	return true;
 }
 

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] lguest: simplify lguest_iret
  2015-03-23  3:30 ` Rusty Russell
@ 2015-03-24 14:26   ` Denys Vlasenko
  2015-03-25  3:38     ` Rusty Russell
  0 siblings, 1 reply; 4+ messages in thread
From: Denys Vlasenko @ 2015-03-24 14:26 UTC (permalink / raw)
  To: Rusty Russell; +Cc: lguest, x86, linux-kernel

On 03/23/2015 04:30 AM, Rusty Russell wrote:
> Denys Vlasenko <dvlasenk@redhat.com> writes:
>> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
>> CC: lguest@lists.ozlabs.org
>> CC: x86@kernel.org
>> CC: linux-kernel@vger.kernel.org
> 
> Oh, thanks, applied!
> 
> And now it's down to one instruction, we could change
> try_deliver_interrupt() to handle this case (rather than ignoring the
> interrupt altogether): just jump to the handler and leave the
> stack set up.
> 
> Here's a pair of inline patches which attempt to do that (untested!).
> 
> Thanks,
> Rusty.
> 
> lguest: suppress interrupts for single insn, not range.
> 
> The last patch reduced our interrupt-suppression region to one address,
> so simplify the code somewhat.
> 
> Also, remove the obsolete undefined instruction ranges and the comment
> which refers to lguest_guest.S instead of head_32.S.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> diff --git a/arch/x86/include/asm/lguest.h b/arch/x86/include/asm/lguest.h
> index e2d4a4afa8c3..3bbc07a57a31 100644
> --- a/arch/x86/include/asm/lguest.h
> +++ b/arch/x86/include/asm/lguest.h
> @@ -20,13 +20,10 @@ extern unsigned long switcher_addr;
>  /* Found in switcher.S */
>  extern unsigned long default_idt_entries[];
>  
> -/* Declarations for definitions in lguest_guest.S */
> -extern char lguest_noirq_start[], lguest_noirq_end[];
> +/* Declarations for definitions in arch/x86/lguest/head_32.S */
> +extern char lguest_noirq_iret[];
>  extern const char lgstart_cli[], lgend_cli[];
> -extern const char lgstart_sti[], lgend_sti[];
> -extern const char lgstart_popf[], lgend_popf[];
>  extern const char lgstart_pushf[], lgend_pushf[];
> -extern const char lgstart_iret[], lgend_iret[];
>  
>  extern void lguest_iret(void);
>  extern void lguest_init(void);
> diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
> index ac4453d8520e..4c8cf656f21f 100644
> --- a/arch/x86/lguest/boot.c
> +++ b/arch/x86/lguest/boot.c
> @@ -87,8 +87,7 @@
>  
>  struct lguest_data lguest_data = {
>  	.hcall_status = { [0 ... LHCALL_RING_SIZE-1] = 0xFF },
> -	.noirq_start = (u32)lguest_noirq_start,
> -	.noirq_end = (u32)lguest_noirq_end,
> +	.noirq_iret = (u32)lguest_noirq_iret,
>  	.kernel_address = PAGE_OFFSET,
>  	.blocked_interrupts = { 1 }, /* Block timer interrupts */
>  	.syscall_vec = SYSCALL_VECTOR,
> diff --git a/arch/x86/lguest/head_32.S b/arch/x86/lguest/head_32.S
> index 583732cc5d18..128fe93161b4 100644
> --- a/arch/x86/lguest/head_32.S
> +++ b/arch/x86/lguest/head_32.S
> @@ -133,9 +133,8 @@ ENTRY(lg_restore_fl)
>  	ret
>  /*:*/
>  
> -/* These demark the EIP range where host should never deliver interrupts. */
> -.global lguest_noirq_start
> -.global lguest_noirq_end
> +/* These demark the EIP where host should never deliver interrupts. */
> +.global lguest_noirq_iret
>  
>  /*M:004
>   * When the Host reflects a trap or injects an interrupt into the Guest, it
> @@ -174,12 +173,11 @@ ENTRY(lg_restore_fl)
>   *
>   * The second is harder: copying eflags to lguest_data.irq_enabled will turn
>   * interrupts on before we're finished, so we could be interrupted before we
> - * return to userspace or wherever.  Our solution to this is to surround the
> - * code with lguest_noirq_start: and lguest_noirq_end: labels.  We tell the
> + * return to userspace or wherever.  Our solution to this is to tell the
>   * Host that it is *never* to interrupt us there, even if interrupts seem to be
>   * enabled. (It's not necessary to protect pop instruction, since
> - * data gets updated only after it completes, so we end up surrounding
> - * just one instruction, iret).
> + * data gets updated only after it completes, so we only need to protect
> + * one instruction, iret).
>   */
>  ENTRY(lguest_iret)
>  	pushl	2*4(%esp)
> @@ -190,6 +188,5 @@ ENTRY(lguest_iret)
>  	 * prefix makes sure we use the stack segment, which is still valid.
>  	 */
>  	popl	%ss:lguest_data+LGUEST_DATA_irq_enabled
> -lguest_noirq_start:
> +lguest_noirq_iret:
>  	iret
> -lguest_noirq_end:
> diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
> index 1219af493c0f..19a32280731d 100644
> --- a/drivers/lguest/hypercalls.c
> +++ b/drivers/lguest/hypercalls.c
> @@ -211,10 +211,9 @@ static void initialize(struct lg_cpu *cpu)
>  
>  	/*
>  	 * The Guest tells us where we're not to deliver interrupts by putting
> -	 * the range of addresses into "struct lguest_data".
> +	 * the instruction address into "struct lguest_data".
>  	 */
> -	if (get_user(cpu->lg->noirq_start, &cpu->lg->lguest_data->noirq_start)
> -	    || get_user(cpu->lg->noirq_end, &cpu->lg->lguest_data->noirq_end))
> +	if (get_user(cpu->lg->noirq_iret, &cpu->lg->lguest_data->noirq_iret))
>  		kill_guest(cpu, "bad guest page %p", cpu->lg->lguest_data);
>  
>  	/*
> diff --git a/drivers/lguest/interrupts_and_traps.c b/drivers/lguest/interrupts_and_traps.c
> index 70dfcdc29f1f..6d4c072b61e1 100644
> --- a/drivers/lguest/interrupts_and_traps.c
> +++ b/drivers/lguest/interrupts_and_traps.c
> @@ -204,8 +204,7 @@ void try_deliver_interrupt(struct lg_cpu *cpu, unsigned int irq, bool more)
>  	 * They may be in the middle of an iret, where they asked us never to
>  	 * deliver interrupts.
>  	 */
> -	if (cpu->regs->eip >= cpu->lg->noirq_start &&
> -	   (cpu->regs->eip < cpu->lg->noirq_end))
> +	if (cpu->regs->eip == cpu->lg->noirq_iret)
>  		return;
>  
>  	/* If they're halted, interrupts restart them. */
> @@ -395,8 +394,9 @@ static bool direct_trap(unsigned int num)
>   * The Guest has the ability to turn its interrupt gates into trap gates,
>   * if it is careful.  The Host will let trap gates can go directly to the
>   * Guest, but the Guest needs the interrupts atomically disabled for an
> - * interrupt gate.  It can do this by pointing the trap gate at instructions
> - * within noirq_start and noirq_end, where it can safely disable interrupts.
> + * interrupt gate.  The Host could provide a mechanism to register more
> + * "no-interrupt" regions, and the Guest could point the trap gate at
> + * instructions within that region, where it can safely disable interrupts.
>   */
>  
>  /*M:006
> diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
> index 307e8b39e7d1..ac8ad0461e80 100644
> --- a/drivers/lguest/lg.h
> +++ b/drivers/lguest/lg.h
> @@ -102,7 +102,7 @@ struct lguest {
>  
>  	struct pgdir pgdirs[4];
>  
> -	unsigned long noirq_start, noirq_end;
> +	unsigned long noirq_iret;
>  
>  	unsigned int stack_pages;
>  	u32 tsc_khz;
> diff --git a/include/linux/lguest.h b/include/linux/lguest.h
> index 9962c6bb1311..6db19f35f7c5 100644
> --- a/include/linux/lguest.h
> +++ b/include/linux/lguest.h
> @@ -61,8 +61,8 @@ struct lguest_data {
>  	u32 tsc_khz;
>  
>  /* Fields initialized by the Guest at boot: */
> -	/* Instruction range to suppress interrupts even if enabled */
> -	unsigned long noirq_start, noirq_end;
> +	/* Instruction to suppress interrupts even if enabled */
> +	unsigned long noirq_iret;
>  	/* Address above which page tables are all identical. */
>  	unsigned long kernel_address;
>  	/* The vector to try to use for system calls (0x40 or 0x80). */
> 
> lguest: handle traps on the "interrupt suppressed" iret instruction.
> 
> Lguest's "iret" is non-atomic, as it needs to restore the interrupt
> state before the real iret (the guest can't actually suppress
> interrupts).  For this reason, the host discards an interrupt if it
> occurs in this (1-instruction) window.
> 
> We can do better, by emulating the iret execution, then immediately
> setting up the interrupt handler.  In fact, we don't need to do much,
> as emulating the iret and setting up th stack for the interrupt handler
> basically cancel each other out.


>  		/*
> +		 * They may be about to iret, where they asked us never to
> +		 * deliver interrupts.  In this case, we can emulate that iret
> +		 * then immediately deliver the interrupt.  This is bascially
> +		 * a noop: the iret would pop the interrupt frame and restore
> +		 * eflags, and then we'd set it up again.  So just restore the
> +		 * eflags word and jump straight to the handler in this case.
>  		 */
> +		if (cpu->regs->eip >= cpu->lg->noirq_start &&
> +		    (cpu->regs->eip < cpu->lg->noirq_end)) {
> +			restore_eflags(cpu);

In truth, this is not _exactly_ true for irets to CPL3.

If a new interrupt comes right after iret, then
a new transition to CPL0 will happen.

This means ss:esp will be loaded from tss.ss0:tss.sp0.

Meaning, that the new iret frame may be in a different place
than the one which was used by iret.

There is no good reason for CPL0 code to move iret frame around,
but who knows. As an example, look what 32-bit Linux kernel does
with NMI iret frames... it's mind bending.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] lguest: simplify lguest_iret
  2015-03-24 14:26   ` Denys Vlasenko
@ 2015-03-25  3:38     ` Rusty Russell
  0 siblings, 0 replies; 4+ messages in thread
From: Rusty Russell @ 2015-03-25  3:38 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: lguest, x86, linux-kernel

Denys Vlasenko <dvlasenk@redhat.com> writes:
> On 03/23/2015 04:30 AM, Rusty Russell wrote:
>> +		 * They may be about to iret, where they asked us never to
>> +		 * deliver interrupts.  In this case, we can emulate that iret
>> +		 * then immediately deliver the interrupt.  This is bascially
>> +		 * a noop: the iret would pop the interrupt frame and restore
>> +		 * eflags, and then we'd set it up again.  So just restore the
>> +		 * eflags word and jump straight to the handler in this case.
>>  		 */
>> +		if (cpu->regs->eip >= cpu->lg->noirq_start &&
>> +		    (cpu->regs->eip < cpu->lg->noirq_end)) {
>> +			restore_eflags(cpu);
>
> In truth, this is not _exactly_ true for irets to CPL3.
>
> If a new interrupt comes right after iret, then
> a new transition to CPL0 will happen.
>
> This means ss:esp will be loaded from tss.ss0:tss.sp0.
>
> Meaning, that the new iret frame may be in a different place
> than the one which was used by iret.

True.  We could check the to-be-restored-CPL and reset the sp.  Instead,
I've added this comment:

		/*
		 * They may be about to iret, where they asked us never to
		 * deliver interrupts.  In this case, we can emulate that iret
		 * then immediately deliver the interrupt.  This is basically
		 * a noop: the iret would pop the interrupt frame and restore
		 * eflags, and then we'd set it up again.  So just restore the
		 * eflags word and jump straight to the handler in this case.
		 *
		 * Denys Vlasenko points out that this isn't quite right: if
		 * the iret was returning to userspace, then that interrupt
		 * would reset the stack pointer (which the Guest told us
		 * about via LHCALL_SET_STACK).  But unless the Guest is being
		 * *really* weird, that will be the same as the current stack
		 * anyway.
		 */

> There is no good reason for CPL0 code to move iret frame around,
> but who knows. As an example, look what 32-bit Linux kernel does
> with NMI iret frames... it's mind bending.

Fortunately, lguest is allergic to NMIs :)

Thanks!
Rusty.



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-03-25  4:10 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-21 21:42 [PATCH] lguest: simplify lguest_iret Denys Vlasenko
2015-03-23  3:30 ` Rusty Russell
2015-03-24 14:26   ` Denys Vlasenko
2015-03-25  3:38     ` Rusty Russell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.