From: Jan Beulich <jbeulich@suse.com>
To: Elliott Mitchell <ehem+xen@m5p.com>
Cc: xen-devel@lists.xenproject.org
Subject: Re: HVM/PVH Balloon crash
Date: Thu, 30 Sep 2021 09:43:07 +0200
Message-ID: <e5a0bbc6-30ae-58f4-0326-3c6fafa9be25@suse.com>
In-Reply-To: <YVD59QVbmdVwzYQI@mattapan.m5p.com>

On 27.09.2021 00:53, Elliott Mitchell wrote:
> On Wed, Sep 15, 2021 at 08:05:05AM +0200, Jan Beulich wrote:
>> On 15.09.2021 04:40, Elliott Mitchell wrote:
>>> On Tue, Sep 07, 2021 at 05:57:10PM +0200, Jan Beulich wrote:
>>>> On 07.09.2021 17:03, Elliott Mitchell wrote:
>>>>>  Could be this system is in an
>>>>> intergenerational hole, and some spot in the PVH/HVM code assumes
>>>>> that the presence of NPT guarantees the presence of an operational
>>>>> IOMMU.  Otherwise, if there was some copy and paste while writing the
>>>>> IOMMU code, some portion of the IOMMU code might be checking for the
>>>>> presence of NPT instead of the presence of an IOMMU.
>>>>
>>>> This is all very speculative; I consider what you suspect not very likely,
>>>> but also not entirely impossible.  Not least because for a long time
>>>> we've been running without shared page tables on AMD.
>>>>
>>>> I'm afraid without technical data and without knowing how to repro, I
>>>> don't see a way forward here.
>>>
>>> Downtimes are very expensive even for lower-end servers.  Plus there is
>>> the issue that the system wasn't meant for development and thus never had
>>> an appropriate setup done.
>>>
>>> Experimentation with a system of similar age suggested another candidate.
>>> The system has a conventional BIOS.  Might some dependencies on the
>>> presence of UEFI have snuck into the NPT code?
>>
>> I can't think of any such, but as all of this is very nebulous I can't
>> really rule out anything.
> 
> Getting everything right to recreate the crash is rather inexact.  Having
> an equivalent of `sysctl` to turn on the serial console while running
> might be handy...
> 
> Luckily I got things together, and...
> 
> (XEN) mm locking order violation: 48 > 16
> (XEN) Xen BUG at mm-locks.h:82
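
That BUG() comes from the lock order check in mm-locks.h: every mm lock has a
fixed order, and trying to take a lock of lower order than one already held on
the same CPU trips the check. Roughly (a simplified sketch, not the exact
code; check_lock_level and mm_lock_level approximate the real names):

    /* Simplified sketch of the mm-locks.h ordering check. */
    static inline void check_lock_level(int level)
    {
        /* mm_lock_level tracks the highest lock order currently held. */
        if ( unlikely(level < this_cpu(mm_lock_level)) )
        {
            printk("mm locking order violation: %i > %i\n",
                   this_cpu(mm_lock_level), level);
            BUG();      /* the "Xen BUG at mm-locks.h:82" above */
        }
    }

Assuming the usual MM_LOCK_ORDER_* values, 48 is the PoD lock's order and 16
the p2m lock's, i.e. a nested-p2m flush is attempting to take a p2m lock while
the PoD lock is still held.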

Would you give the patch below a try? While it is against current staging,
it looks to apply fine to 4.14.3.

Jan

x86/PoD: defer nested P2M flushes

With NPT or shadow in use, the p2m_set_entry() -> p2m_pt_set_entry() ->
write_p2m_entry() -> p2m_flush_nestedp2m() call sequence triggers a lock
order violation when the PoD lock is held around it. Hence such flushing
needs to be deferred. Steal the approach from p2m_change_type_range().
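
In outline, the resulting call pattern on the PoD paths (a minimal sketch of
what the patch below introduces, using its field and function names;
surrounding code elided):

    pod_lock(p2m);
    p2m->defer_nested_flush = true;

    /*
     * p2m_set_entry() and friends may run here; with the flag set, the
     * p2m_flush_nestedp2m() call reached via write_p2m_entry() is skipped,
     * so no lower-order lock is taken while the PoD lock is held.
     */

    /* pod_unlock_and_flush(): drop the lock, then do the deferred flush. */
    pod_unlock(p2m);
    p2m->defer_nested_flush = false;
    if ( nestedhvm_enabled(p2m->domain) )
        p2m_flush_nestedp2m(p2m->domain);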

Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -24,6 +24,7 @@
 #include <xen/mm.h>
 #include <xen/sched.h>
 #include <xen/trace.h>
+#include <asm/hvm/nestedhvm.h>
 #include <asm/page.h>
 #include <asm/paging.h>
 #include <asm/p2m.h>
@@ -494,6 +495,13 @@ p2m_pod_offline_or_broken_replace(struct
 static int
 p2m_pod_zero_check_superpage(struct p2m_domain *p2m, gfn_t gfn);
 
+static void pod_unlock_and_flush(struct p2m_domain *p2m)
+{
+    pod_unlock(p2m);
+    p2m->defer_nested_flush = false;
+    if ( nestedhvm_enabled(p2m->domain) )
+        p2m_flush_nestedp2m(p2m->domain);
+}
 
 /*
  * This function is needed for two reasons:
@@ -514,6 +522,7 @@ p2m_pod_decrease_reservation(struct doma
 
     gfn_lock(p2m, gfn, order);
     pod_lock(p2m);
+    p2m->defer_nested_flush = true;
 
     /*
      * If we don't have any outstanding PoD entries, let things take their
@@ -665,7 +674,7 @@ out_entry_check:
     }
 
 out_unlock:
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
     gfn_unlock(p2m, gfn, order);
     return ret;
 }
@@ -1144,8 +1153,10 @@ p2m_pod_demand_populate(struct p2m_domai
      * won't start until we're done.
      */
     if ( unlikely(d->is_dying) )
-        goto out_fail;
-
+    {
+        pod_unlock(p2m);
+        return false;
+    }
 
     /*
      * Because PoD does not have cache list for 1GB pages, it has to remap
@@ -1167,6 +1178,8 @@ p2m_pod_demand_populate(struct p2m_domai
                               p2m_populate_on_demand, p2m->default_access);
     }
 
+    p2m->defer_nested_flush = true;
+
     /* Only reclaim if we're in actual need of more cache. */
     if ( p2m->pod.entry_count > p2m->pod.count )
         pod_eager_reclaim(p2m);
@@ -1229,8 +1242,9 @@ p2m_pod_demand_populate(struct p2m_domai
         __trace_var(TRC_MEM_POD_POPULATE, 0, sizeof(t), &t);
     }
 
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
     return true;
+
 out_of_memory:
     pod_unlock(p2m);
 
@@ -1239,12 +1253,14 @@ out_of_memory:
            p2m->pod.entry_count, current->domain->domain_id);
     domain_crash(d);
     return false;
+
 out_fail:
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
     return false;
+
 remap_and_retry:
     BUG_ON(order != PAGE_ORDER_2M);
-    pod_unlock(p2m);
+    pod_unlock_and_flush(p2m);
 
     /*
      * Remap this 2-meg region in singleton chunks. See the comment on the


