All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC] x86/traps: Improve hypervisor stack overflow detection
@ 2015-11-19 17:34 Andrew Cooper
  2015-11-19 17:36 ` Andrew Cooper
  2015-11-20 10:54 ` Jan Beulich
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Cooper @ 2015-11-19 17:34 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Atom2, Jan Beulich

A sample Gentoo compliation of Xen contains

    lea    -0x1058(%rsp),%rsp
    orq    $0x0,(%rsp)
    lea    0x1020(%rsp),%rsp

Whatever the reason for silly code like this, it fools the current stack
overflow detection logic in the #DF handler (which triggers reliably on the
'orq' instruction).

Update the overflow condition to declare an overflow if %esp is anywhere
within the guard page, rather than just within the upper 8th of the page.

Additionally, check %esp against the expected stack base in all builds.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Atom2 <ariel.atom2@web2web.at>

Currently untested, therefore RFC

Atom2: If you have a free moment, would you mind giving this patch a spin on a
debug hypervisor?  I would expect it to top erroniously informing you that no
overflow was detected
---
 xen/arch/x86/traps.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index e21fb78..d429149 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -383,10 +383,17 @@ void show_stack(const struct cpu_user_regs *regs)
 
 void show_stack_overflow(unsigned int cpu, const struct cpu_user_regs *regs)
 {
-#ifdef MEMORY_GUARD
     unsigned long esp = regs->rsp;
+    unsigned long curr_stack_base = esp & ~(STACK_SIZE - 1);
+#ifdef MEMORY_GUARD
     unsigned long esp_top, esp_bottom;
+#endif
 
+    if ( _p(curr_stack_base) != stack_base[cpu] )
+        printk("Current stack base %p differs from expected %p\n",
+               _p(curr_stack_base), stack_base[cpu]);
+
+#ifdef MEMORY_GUARD
     esp_bottom = (esp | (STACK_SIZE - 1)) + 1;
     esp_top    = esp_bottom - PRIMARY_STACK_SIZE;
 
@@ -394,9 +401,8 @@ void show_stack_overflow(unsigned int cpu, const struct cpu_user_regs *regs)
            (void *)esp_top, (void *)esp_bottom, (void *)esp,
            (void *)per_cpu(init_tss, cpu).esp0);
 
-    /* Trigger overflow trace if %esp is within 512 bytes of the guard page. */
-    if ( ((unsigned long)(esp - esp_top) > 512) &&
-         ((unsigned long)(esp_top - esp) > 512) )
+    /* Trigger overflow trace if %esp is anywhere within the guard page. */
+    if ( (esp & PAGE_MASK) != (esp_top - PAGE_SIZE) )
     {
         printk("No stack overflow detected. Skipping stack trace.\n");
         return;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC] x86/traps: Improve hypervisor stack overflow detection
  2015-11-19 17:34 [PATCH RFC] x86/traps: Improve hypervisor stack overflow detection Andrew Cooper
@ 2015-11-19 17:36 ` Andrew Cooper
  2015-11-20 10:56   ` Jan Beulich
  2015-11-20 10:54 ` Jan Beulich
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Cooper @ 2015-11-19 17:36 UTC (permalink / raw)
  To: Xen-devel; +Cc: Atom2, Jan Beulich

On 19/11/15 17:34, Andrew Cooper wrote:
> A sample Gentoo compliation of Xen contains
>
>     lea    -0x1058(%rsp),%rsp
>     orq    $0x0,(%rsp)
>     lea    0x1020(%rsp),%rsp
>
> Whatever the reason for silly code like this, it fools the current stack
> overflow detection logic in the #DF handler (which triggers reliably on the
> 'orq' instruction).
>
> Update the overflow condition to declare an overflow if %esp is anywhere
> within the guard page, rather than just within the upper 8th of the page.
>
> Additionally, check %esp against the expected stack base in all builds.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Atom2 <ariel.atom2@web2web.at>
>
> Currently untested, therefore RFC
>
> Atom2: If you have a free moment, would you mind giving this patch a spin on a
> debug hypervisor?  I would expect it to top erroniously informing you that no
> overflow was detected
> ---

Another question is whether, given that the sample above moves the stack
by more than 4k, it would be wise to also guard the 4th currently-spare
page between the primary stack and IST stacks.

~Andrew

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC] x86/traps: Improve hypervisor stack overflow detection
  2015-11-19 17:34 [PATCH RFC] x86/traps: Improve hypervisor stack overflow detection Andrew Cooper
  2015-11-19 17:36 ` Andrew Cooper
@ 2015-11-20 10:54 ` Jan Beulich
  2015-11-20 11:03   ` Andrew Cooper
  1 sibling, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2015-11-20 10:54 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Atom2, Xen-devel

>>> On 19.11.15 at 18:34, <andrew.cooper3@citrix.com> wrote:
> @@ -394,9 +401,8 @@ void show_stack_overflow(unsigned int cpu, const struct cpu_user_regs *regs)
>             (void *)esp_top, (void *)esp_bottom, (void *)esp,
>             (void *)per_cpu(init_tss, cpu).esp0);
>  
> -    /* Trigger overflow trace if %esp is within 512 bytes of the guard page. */
> -    if ( ((unsigned long)(esp - esp_top) > 512) &&
> -         ((unsigned long)(esp_top - esp) > 512) )
> +    /* Trigger overflow trace if %esp is anywhere within the guard page. */
> +    if ( (esp & PAGE_MASK) != (esp_top - PAGE_SIZE) )

Is this correct? I'd suspect this to be wrong when esp is in the
lower of the two primary stack pages.

Jan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC] x86/traps: Improve hypervisor stack overflow detection
  2015-11-19 17:36 ` Andrew Cooper
@ 2015-11-20 10:56   ` Jan Beulich
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Beulich @ 2015-11-20 10:56 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Atom2, Xen-devel

>>> On 19.11.15 at 18:36, <andrew.cooper3@citrix.com> wrote:
> Another question is whether, given that the sample above moves the stack
> by more than 4k, it would be wise to also guard the 4th currently-spare
> page between the primary stack and IST stacks.

I'm not sure that would be worth it - we just can't preclude every
eventuality. What if another compiler decided to move the stack
pointer by more than 8k?

Jan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC] x86/traps: Improve hypervisor stack overflow detection
  2015-11-20 10:54 ` Jan Beulich
@ 2015-11-20 11:03   ` Andrew Cooper
  2015-11-20 12:23     ` Jan Beulich
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Cooper @ 2015-11-20 11:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Atom2, Xen-devel

On 20/11/15 10:54, Jan Beulich wrote:
>>>> On 19.11.15 at 18:34, <andrew.cooper3@citrix.com> wrote:
>> @@ -394,9 +401,8 @@ void show_stack_overflow(unsigned int cpu, const struct cpu_user_regs *regs)
>>             (void *)esp_top, (void *)esp_bottom, (void *)esp,
>>             (void *)per_cpu(init_tss, cpu).esp0);
>>  
>> -    /* Trigger overflow trace if %esp is within 512 bytes of the guard page. */
>> -    if ( ((unsigned long)(esp - esp_top) > 512) &&
>> -         ((unsigned long)(esp_top - esp) > 512) )
>> +    /* Trigger overflow trace if %esp is anywhere within the guard page. */
>> +    if ( (esp & PAGE_MASK) != (esp_top - PAGE_SIZE) )
> Is this correct? I'd suspect this to be wrong when esp is in the
> lower of the two primary stack pages.

If we have hit a double fault from the stack guard pages, one way or
another %esp is somewhere in the guard page.

Although now you point this out, it still might be just in the primary
stack and very close to the boundary, or misaligned across the
boundary.  Being an abort means that %esp in the exception frame might
not be the exact %esp which caused the issue. 

I will reintroduce some slop into the check.

~Andrew

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC] x86/traps: Improve hypervisor stack overflow detection
  2015-11-20 11:03   ` Andrew Cooper
@ 2015-11-20 12:23     ` Jan Beulich
  2015-11-20 12:52       ` Andrew Cooper
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2015-11-20 12:23 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Atom2, Xen-devel

>>> On 20.11.15 at 12:03, <andrew.cooper3@citrix.com> wrote:
> On 20/11/15 10:54, Jan Beulich wrote:
>>>>> On 19.11.15 at 18:34, <andrew.cooper3@citrix.com> wrote:
>>> @@ -394,9 +401,8 @@ void show_stack_overflow(unsigned int cpu, const struct 
> cpu_user_regs *regs)
>>>             (void *)esp_top, (void *)esp_bottom, (void *)esp,
>>>             (void *)per_cpu(init_tss, cpu).esp0);
>>>  
>>> -    /* Trigger overflow trace if %esp is within 512 bytes of the guard page. 
> */
>>> -    if ( ((unsigned long)(esp - esp_top) > 512) &&
>>> -         ((unsigned long)(esp_top - esp) > 512) )
>>> +    /* Trigger overflow trace if %esp is anywhere within the guard page. */
>>> +    if ( (esp & PAGE_MASK) != (esp_top - PAGE_SIZE) )
>> Is this correct? I'd suspect this to be wrong when esp is in the
>> lower of the two primary stack pages.
> 
> If we have hit a double fault from the stack guard pages, one way or
> another %esp is somewhere in the guard page.

But the #DF may be for a reason other than having run into a
stack guard page.

Jan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC] x86/traps: Improve hypervisor stack overflow detection
  2015-11-20 12:23     ` Jan Beulich
@ 2015-11-20 12:52       ` Andrew Cooper
  2015-11-20 13:11         ` Jan Beulich
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Cooper @ 2015-11-20 12:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Atom2, Xen-devel

On 20/11/15 12:23, Jan Beulich wrote:
>>>> On 20.11.15 at 12:03, <andrew.cooper3@citrix.com> wrote:
>> On 20/11/15 10:54, Jan Beulich wrote:
>>>>>> On 19.11.15 at 18:34, <andrew.cooper3@citrix.com> wrote:
>>>> @@ -394,9 +401,8 @@ void show_stack_overflow(unsigned int cpu, const struct 
>> cpu_user_regs *regs)
>>>>             (void *)esp_top, (void *)esp_bottom, (void *)esp,
>>>>             (void *)per_cpu(init_tss, cpu).esp0);
>>>>  
>>>> -    /* Trigger overflow trace if %esp is within 512 bytes of the guard page. 
>> */
>>>> -    if ( ((unsigned long)(esp - esp_top) > 512) &&
>>>> -         ((unsigned long)(esp_top - esp) > 512) )
>>>> +    /* Trigger overflow trace if %esp is anywhere within the guard page. */
>>>> +    if ( (esp & PAGE_MASK) != (esp_top - PAGE_SIZE) )
>>> Is this correct? I'd suspect this to be wrong when esp is in the
>>> lower of the two primary stack pages.
>> If we have hit a double fault from the stack guard pages, one way or
>> another %esp is somewhere in the guard page.
> But the #DF may be for a reason other than having run into a
> stack guard page.

Indeed, but under such circumstances, we don't want to continue with the
stack overflow analysis.

I had some other ideas about dumping some other state in the #DF
handler, but that is an independent change.

~Andrew

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC] x86/traps: Improve hypervisor stack overflow detection
  2015-11-20 12:52       ` Andrew Cooper
@ 2015-11-20 13:11         ` Jan Beulich
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Beulich @ 2015-11-20 13:11 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Atom2, Xen-devel

>>> On 20.11.15 at 13:52, <andrew.cooper3@citrix.com> wrote:
> On 20/11/15 12:23, Jan Beulich wrote:
>>>>> On 20.11.15 at 12:03, <andrew.cooper3@citrix.com> wrote:
>>> On 20/11/15 10:54, Jan Beulich wrote:
>>>>>>> On 19.11.15 at 18:34, <andrew.cooper3@citrix.com> wrote:
>>>>> @@ -394,9 +401,8 @@ void show_stack_overflow(unsigned int cpu, const struct 
>>> cpu_user_regs *regs)
>>>>>             (void *)esp_top, (void *)esp_bottom, (void *)esp,
>>>>>             (void *)per_cpu(init_tss, cpu).esp0);
>>>>>  
>>>>> -    /* Trigger overflow trace if %esp is within 512 bytes of the guard page. 
>>> */
>>>>> -    if ( ((unsigned long)(esp - esp_top) > 512) &&
>>>>> -         ((unsigned long)(esp_top - esp) > 512) )
>>>>> +    /* Trigger overflow trace if %esp is anywhere within the guard page. */
>>>>> +    if ( (esp & PAGE_MASK) != (esp_top - PAGE_SIZE) )
>>>> Is this correct? I'd suspect this to be wrong when esp is in the
>>>> lower of the two primary stack pages.
>>> If we have hit a double fault from the stack guard pages, one way or
>>> another %esp is somewhere in the guard page.
>> But the #DF may be for a reason other than having run into a
>> stack guard page.
> 
> Indeed, but under such circumstances, we don't want to continue with the
> stack overflow analysis.

Oh, right, the right side is the guard page (the "top" vs "bottom"
use is somewhat confusing when you consider memory addresses
instead of the direction the stack grows).

Jan

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-11-20 13:11 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-19 17:34 [PATCH RFC] x86/traps: Improve hypervisor stack overflow detection Andrew Cooper
2015-11-19 17:36 ` Andrew Cooper
2015-11-20 10:56   ` Jan Beulich
2015-11-20 10:54 ` Jan Beulich
2015-11-20 11:03   ` Andrew Cooper
2015-11-20 12:23     ` Jan Beulich
2015-11-20 12:52       ` Andrew Cooper
2015-11-20 13:11         ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.