All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] kexec: Add spinlock for the whole hypercall
@ 2017-04-10 19:44 Eric DeVolder
  2017-04-11  7:46 ` Jan Beulich
  0 siblings, 1 reply; 5+ messages in thread
From: Eric DeVolder @ 2017-04-10 19:44 UTC (permalink / raw)
  To: xen-devel, ian.jackson, wei.liu2; +Cc: daniel.kiper, eric.devolder

When we concurrently try to unload and load crash
images we eventually get:

 Xen call trace:
    [<ffff82d08018b04f>] machine_kexec_add_page+0x3a0/0x3fa
    [<ffff82d08018b184>] machine_kexec_load+0xdb/0x107
    [<ffff82d080116e8d>] kexec.c#kexec_load_slot+0x11/0x42
    [<ffff82d08011724f>] kexec.c#kexec_load+0x119/0x150
    [<ffff82d080117c1e>] kexec.c#do_kexec_op_internal+0xab/0xcf
    [<ffff82d080117c60>] do_kexec_op+0xe/0x1e
    [<ffff82d08025c620>] pv_hypercall+0x20a/0x44a
    [<ffff82d080260116>] cpufreq.c#test_all_events+0/0x30

 Pagetable walk from ffff820040088320:
  L4[0x104] = 00000002979d1063 ffffffffffffffff
  L3[0x001] = 00000002979d0063 ffffffffffffffff
  L2[0x000] = 00000002979c7063 ffffffffffffffff
  L1[0x088] = 80037a91ede97063 ffffffffffffffff

The interesting thing is that the page bits (063) look legit.

The operation on which we blow up is us trying to write
in the L1 and finding that the L2 entry points to some
bizzare MFN. It stinks of a race, and it looks like
the issue is due to no concurrency locks when dealing
with the crash kernel space.

Specifically we concurrently call kimage_alloc_crash_control_page
which iterates over the kexec_crash_area.start -> kexec_crash_area.size
and once found:

  if ( page )
  {
      image->next_crash_page = hole_end;
      clear_domain_page(_mfn(page_to_mfn(page)));
  }

clears. Since the parameters of what MFN to use are provided
by the callers (and the area to search is bounded) the the 'page'
is probably the same. So #1 we concurrently clear the
'control_code_page'.

The next step is us passing this 'control_code_page' to
machine_kexec_add_page. This function requires the MFNs:
page_to_maddr(image->control_code_page).

And this would always return the same virtual address, as
the MFN of the control_code_page is inside of the
kexec_crash_area.start -> kexec_crash_area.size area.

Then machine_kexec_add_page updates the L1 .. which can be done
concurrently and on subsequent calls we mangle it up.

This is all a theory at this time, but testing reveals
that adding the spinlock at the kexec hypercall fixes
the crash.

An alternative solution would be to only add the spinlock
on paths that touch the CRASH region and not everything.

NOTE: The spinlock in kexec_swap_images() was removed as
this function is only reachable on the kexec call, which is
now entirely protected by a spinlock, thus the local spinlock
is no longer necessary.

NOTE: This patch follows 5c5216 (kexec: clear kexec_image slot
when unloading kexec image) to prevent crashes during
simultaneous load/unloads.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 xen/common/kexec.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index 072cc8e..5bcc356 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -819,7 +819,6 @@ static int kexec_exec(XEN_GUEST_HANDLE_PARAM(void) uarg)
 static int kexec_swap_images(int type, struct kexec_image *new,
                              struct kexec_image **old)
 {
-    static DEFINE_SPINLOCK(kexec_lock);
     int base, bit, pos;
     int new_slot, old_slot;
 
@@ -831,8 +830,6 @@ static int kexec_swap_images(int type, struct kexec_image *new,
     if ( kexec_load_get_bits(type, &base, &bit) )
         return -EINVAL;
 
-    spin_lock(&kexec_lock);
-
     pos = (test_bit(bit, &kexec_flags) != 0);
     old_slot = base + pos;
     new_slot = base + !pos;
@@ -845,8 +842,6 @@ static int kexec_swap_images(int type, struct kexec_image *new,
     clear_bit(old_slot, &kexec_flags);
     *old = kexec_image[old_slot];
 
-    spin_unlock(&kexec_lock);
-
     return 0;
 }
 
@@ -1187,12 +1182,22 @@ static int do_kexec_op_internal(unsigned long op,
                                 XEN_GUEST_HANDLE_PARAM(void) uarg,
                                 bool_t compat)
 {
+    static DEFINE_SPINLOCK(kexec_lock);
     int ret = -EINVAL;
 
     ret = xsm_kexec(XSM_PRIV);
     if ( ret )
         return ret;
 
+    /*
+     * Because we write directly to the reserved memory
+     * region when loading crash kernels we need a spinlock here to
+     * prevent multiple crash kernels from attempting to load
+     * simultaneously, and to prevent a crash kernel from loading
+     * over the top of a in use crash kernel.
+     */
+    spin_lock(&kexec_lock);
+
     switch ( op )
     {
     case KEXEC_CMD_kexec_get_range:
@@ -1227,6 +1232,8 @@ static int do_kexec_op_internal(unsigned long op,
         break;
     }
 
+    spin_unlock(&kexec_lock);
+
     return ret;
 }
 
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] kexec: Add spinlock for the whole hypercall
  2017-04-10 19:44 [PATCH] kexec: Add spinlock for the whole hypercall Eric DeVolder
@ 2017-04-11  7:46 ` Jan Beulich
  2017-04-11 11:24   ` Daniel Kiper
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2017-04-11  7:46 UTC (permalink / raw)
  To: eric.devolder
  Cc: Andrew Cooper, daniel.kiper, wei.liu2, ian.jackson, xen-devel

>>> On 10.04.17 at 21:44, <eric.devolder@oracle.com> wrote:

Please don't forget Cc-ing the maintainer(s) of the code being modified.

> @@ -1187,12 +1182,22 @@ static int do_kexec_op_internal(unsigned long op,
>                                  XEN_GUEST_HANDLE_PARAM(void) uarg,
>                                  bool_t compat)
>  {
> +    static DEFINE_SPINLOCK(kexec_lock);
>      int ret = -EINVAL;
>  
>      ret = xsm_kexec(XSM_PRIV);
>      if ( ret )
>          return ret;
>  
> +    /*
> +     * Because we write directly to the reserved memory
> +     * region when loading crash kernels we need a spinlock here to
> +     * prevent multiple crash kernels from attempting to load
> +     * simultaneously, and to prevent a crash kernel from loading
> +     * over the top of a in use crash kernel.
> +     */
> +    spin_lock(&kexec_lock);
> +
>      switch ( op )
>      {
>      case KEXEC_CMD_kexec_get_range:
> @@ -1227,6 +1232,8 @@ static int do_kexec_op_internal(unsigned long op,
>          break;
>      }
>  
> +    spin_unlock(&kexec_lock);
> +
>      return ret;
>  }

How long can a kexec-op take? Are you putting systems at risk of
the watchdog firing? Even if you don't, you put all sorts of
operations inside a lock now which preferably should run outside
of atomic context (memory allocation, guest memory accesses).
If such a global locking approach is really necessary (i.e. if we
can't expect the - privileged - caller to synchronize calls properly),
wouldn't it be better to handle this with a static state variable,
which gets checked/set atomically, and which if already set simply
leads to a continuation being arranged for?

Furthermore - are there problems also with e.g. loading a
default and a crash kernel in parallel? I.e. aren't you doing more
synchronization than really necessary?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] kexec: Add spinlock for the whole hypercall
  2017-04-11  7:46 ` Jan Beulich
@ 2017-04-11 11:24   ` Daniel Kiper
  2017-04-11 12:31     ` Jan Beulich
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Kiper @ 2017-04-11 11:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, eric.devolder, wei.liu2, ian.jackson, xen-devel

On Tue, Apr 11, 2017 at 01:46:37AM -0600, Jan Beulich wrote:
> >>> On 10.04.17 at 21:44, <eric.devolder@oracle.com> wrote:
>
> Please don't forget Cc-ing the maintainer(s) of the code being modified.
>
> > @@ -1187,12 +1182,22 @@ static int do_kexec_op_internal(unsigned long op,
> >                                  XEN_GUEST_HANDLE_PARAM(void) uarg,
> >                                  bool_t compat)
> >  {
> > +    static DEFINE_SPINLOCK(kexec_lock);
> >      int ret = -EINVAL;
> >
> >      ret = xsm_kexec(XSM_PRIV);
> >      if ( ret )
> >          return ret;
> >
> > +    /*
> > +     * Because we write directly to the reserved memory
> > +     * region when loading crash kernels we need a spinlock here to
> > +     * prevent multiple crash kernels from attempting to load
> > +     * simultaneously, and to prevent a crash kernel from loading
> > +     * over the top of a in use crash kernel.
> > +     */
> > +    spin_lock(&kexec_lock);
> > +
> >      switch ( op )
> >      {
> >      case KEXEC_CMD_kexec_get_range:
> > @@ -1227,6 +1232,8 @@ static int do_kexec_op_internal(unsigned long op,
> >          break;
> >      }
> >
> > +    spin_unlock(&kexec_lock);
> > +
> >      return ret;
> >  }
>
> How long can a kexec-op take? Are you putting systems at risk of
> the watchdog firing? Even if you don't, you put all sorts of
> operations inside a lock now which preferably should run outside
> of atomic context (memory allocation, guest memory accesses).

Facepalm! I forgot about that.

> If such a global locking approach is really necessary (i.e. if we

Potentially no but otherwise we will further complicate sufficiently
complicated code now. So, I would like to have one sync place.

> can't expect the - privileged - caller to synchronize calls properly),
> wouldn't it be better to handle this with a static state variable,
> which gets checked/set atomically, and which if already set simply
> leads to a continuation being arranged for?

Do you think about something like that:

if ( test_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) && hypercall_preempt_check() )
    return hypercall_create_continuation(__HYPERVISOR_kexec_op, "h", uarg);

> Furthermore - are there problems also with e.g. loading a
> default and a crash kernel in parallel? I.e. aren't you doing more

No it should not be any issue there.

> synchronization than really necessary?

I do not think that it is very big issue here but if you wish we can fix it that too.

Daniel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] kexec: Add spinlock for the whole hypercall
  2017-04-11 11:24   ` Daniel Kiper
@ 2017-04-11 12:31     ` Jan Beulich
  2017-04-11 12:39       ` Daniel Kiper
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2017-04-11 12:31 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: Andrew Cooper, eric.devolder, wei.liu2, ian.jackson, xen-devel

>>> On 11.04.17 at 13:24, <daniel.kiper@oracle.com> wrote:
> On Tue, Apr 11, 2017 at 01:46:37AM -0600, Jan Beulich wrote:
>> >>> On 10.04.17 at 21:44, <eric.devolder@oracle.com> wrote:
>> wouldn't it be better to handle this with a static state variable,
>> which gets checked/set atomically, and which if already set simply
>> leads to a continuation being arranged for?
> 
> Do you think about something like that:
> 
> if ( test_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) && hypercall_preempt_check() )
>     return hypercall_create_continuation(__HYPERVISOR_kexec_op, "h", uarg);

Well, minus the hypercall_preempt_check() call of course. And I'd
expect this to be a test_and_set_bit().

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] kexec: Add spinlock for the whole hypercall
  2017-04-11 12:31     ` Jan Beulich
@ 2017-04-11 12:39       ` Daniel Kiper
  0 siblings, 0 replies; 5+ messages in thread
From: Daniel Kiper @ 2017-04-11 12:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, eric.devolder, wei.liu2, ian.jackson, xen-devel

On Tue, Apr 11, 2017 at 06:31:36AM -0600, Jan Beulich wrote:
> >>> On 11.04.17 at 13:24, <daniel.kiper@oracle.com> wrote:
> > On Tue, Apr 11, 2017 at 01:46:37AM -0600, Jan Beulich wrote:
> >> >>> On 10.04.17 at 21:44, <eric.devolder@oracle.com> wrote:
> >> wouldn't it be better to handle this with a static state variable,
> >> which gets checked/set atomically, and which if already set simply
> >> leads to a continuation being arranged for?
> >
> > Do you think about something like that:
> >
> > if ( test_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) && hypercall_preempt_check() )
> >     return hypercall_create_continuation(__HYPERVISOR_kexec_op, "h", uarg);
>
> Well, minus the hypercall_preempt_check() call of course. And I'd

OK.

> expect this to be a test_and_set_bit().

Ahhh... Right.

Eric, please try to fix it as Jan suggested.

Daniel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-04-11 12:39 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-10 19:44 [PATCH] kexec: Add spinlock for the whole hypercall Eric DeVolder
2017-04-11  7:46 ` Jan Beulich
2017-04-11 11:24   ` Daniel Kiper
2017-04-11 12:31     ` Jan Beulich
2017-04-11 12:39       ` Daniel Kiper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.