All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] x86/mm: Add debug code to detect illegal page_lock and put_page_type ordering
@ 2018-01-24 15:02 George Dunlap
  2018-01-24 15:43 ` Jan Beulich
  0 siblings, 1 reply; 5+ messages in thread
From: George Dunlap @ 2018-01-24 15:02 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, George Dunlap, Jan Beulich

The fix for XSA-242 depends on the same cpu never calling
_put_page_type() while holding a page_lock() for that page; doing so
may cause a deadlock under the right conditions.

Furthermore, even before that, there was never any discipline for the
order in which page locks are grabbed; if there are any paths that
grab the locks for two different pages at once, we risk creating the
conditions for a deadlock to occur.

These are believed to be safe, because it is believed that:
1. No hypervisor paths ever lock two pages at once, and
2. We never call _put_page_type() on a page while holding its page lock.

Add a check to debug builds to catch any violations of these
assumpitons.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
v2:
- Make wrapper macros to get rid of ugly #ifdefs
- Use "current_locked_page*" prefix
- Reword commit message

NB this doesn't address Andrew's comment from v1 about adding "more
than just a debug check".  I think we should check in the ASSERT()
while we discuss future potential work, and not let the best become
the enemy of the good.

CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/mm.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 5a1b472432..eb4b9eeb78 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1813,10 +1813,43 @@ static int free_l4_table(struct page_info *page)
     return rc;
 }
 
+#ifndef NDEBUG
+/*
+ * Check to make sure that we never nest page_lock() calls on a single
+ * cpu (which may deadlock if two cpus attempt to lock the same pages
+ * in a different order), and that we never call _put_page_type() on a
+ * page while we hold its page_lock() (which would deadlock after
+ * XSA-242).
+ */
+static DEFINE_PER_CPU(struct page_info *, current_locked_page);
+
+static inline void current_locked_page_set(struct page_info *page) {
+    this_cpu(current_locked_page) = page;
+}
+
+static inline bool current_locked_page_check(struct page_info *page) {
+    return this_cpu(current_locked_page) == page;
+}
+
+/*
+ * We need a separate "not-equal" check so the empty stubs can always
+ * return true.
+ */
+static inline bool current_locked_page_ne_check(struct page_info *page) {
+    return this_cpu(current_locked_page) != page;
+}
+#else
+#define current_locked_page_set(x)
+#define current_locked_page_check(x) true
+#define current_locked_page_ne_check(x) true
+#endif
+
 int page_lock(struct page_info *page)
 {
     unsigned long x, nx;
 
+    ASSERT(current_locked_page_check(NULL));
+
     do {
         while ( (x = page->u.inuse.type_info) & PGT_locked )
             cpu_relax();
@@ -1827,6 +1860,8 @@ int page_lock(struct page_info *page)
             return 0;
     } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );
 
+    current_locked_page_set(page);
+
     return 1;
 }
 
@@ -1834,6 +1869,8 @@ void page_unlock(struct page_info *page)
 {
     unsigned long x, nx, y = page->u.inuse.type_info;
 
+    ASSERT(current_locked_page_check(page));
+
     do {
         x = y;
         ASSERT((x & PGT_count_mask) && (x & PGT_locked));
@@ -1842,6 +1879,8 @@ void page_unlock(struct page_info *page)
         /* We must not drop the last reference here. */
         ASSERT(nx & PGT_count_mask);
     } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x );
+
+    current_locked_page_set(NULL);
 }
 
 /*
@@ -2420,6 +2459,8 @@ static int _put_page_type(struct page_info *page, bool preemptible,
 {
     unsigned long nx, x, y = page->u.inuse.type_info;
 
+    ASSERT(current_locked_page_ne_check(page));
+
     for ( ; ; )
     {
         x  = y;
-- 
2.15.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] x86/mm: Add debug code to detect illegal page_lock and put_page_type ordering
  2018-01-24 15:02 [PATCH v2] x86/mm: Add debug code to detect illegal page_lock and put_page_type ordering George Dunlap
@ 2018-01-24 15:43 ` Jan Beulich
  2018-01-24 15:48   ` Andrew Cooper
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2018-01-24 15:43 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, xen-devel

>>> On 24.01.18 at 16:02, <george.dunlap@citrix.com> wrote:
> The fix for XSA-242 depends on the same cpu never calling
> _put_page_type() while holding a page_lock() for that page; doing so
> may cause a deadlock under the right conditions.
> 
> Furthermore, even before that, there was never any discipline for the
> order in which page locks are grabbed; if there are any paths that
> grab the locks for two different pages at once, we risk creating the
> conditions for a deadlock to occur.
> 
> These are believed to be safe, because it is believed that:
> 1. No hypervisor paths ever lock two pages at once, and
> 2. We never call _put_page_type() on a page while holding its page lock.
> 
> Add a check to debug builds to catch any violations of these
> assumpitons.
> 
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> ---
> v2:
> - Make wrapper macros to get rid of ugly #ifdefs
> - Use "current_locked_page*" prefix
> - Reword commit message
> 
> NB this doesn't address Andrew's comment from v1 about adding "more
> than just a debug check".  I think we should check in the ASSERT()
> while we discuss future potential work, and not let the best become
> the enemy of the good.

I agree, and hence
Acked-by: Jan Beulich <jbeulich@suse.com>
unless Andrew really means to veto this going in.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] x86/mm: Add debug code to detect illegal page_lock and put_page_type ordering
  2018-01-24 15:43 ` Jan Beulich
@ 2018-01-24 15:48   ` Andrew Cooper
  2018-01-24 15:54     ` George Dunlap
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Cooper @ 2018-01-24 15:48 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap; +Cc: xen-devel

On 24/01/18 15:43, Jan Beulich wrote:
>>>> On 24.01.18 at 16:02, <george.dunlap@citrix.com> wrote:
>> The fix for XSA-242 depends on the same cpu never calling
>> _put_page_type() while holding a page_lock() for that page; doing so
>> may cause a deadlock under the right conditions.
>>
>> Furthermore, even before that, there was never any discipline for the
>> order in which page locks are grabbed; if there are any paths that
>> grab the locks for two different pages at once, we risk creating the
>> conditions for a deadlock to occur.
>>
>> These are believed to be safe, because it is believed that:
>> 1. No hypervisor paths ever lock two pages at once, and
>> 2. We never call _put_page_type() on a page while holding its page lock.
>>
>> Add a check to debug builds to catch any violations of these
>> assumpitons.
>>
>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>> ---
>> v2:
>> - Make wrapper macros to get rid of ugly #ifdefs
>> - Use "current_locked_page*" prefix
>> - Reword commit message
>>
>> NB this doesn't address Andrew's comment from v1 about adding "more
>> than just a debug check".  I think we should check in the ASSERT()
>> while we discuss future potential work, and not let the best become
>> the enemy of the good.
> I agree, and hence
> Acked-by: Jan Beulich <jbeulich@suse.com>
> unless Andrew really means to veto this going in.

Not especially, but how about:

* Note: If we subsequently find a valid use-case for locking more than
one page at once, this safely logic will need adjustment.

in the comment describing the rational, to explicitly state we have no
particular problem with nested locking?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] x86/mm: Add debug code to detect illegal page_lock and put_page_type ordering
  2018-01-24 15:48   ` Andrew Cooper
@ 2018-01-24 15:54     ` George Dunlap
  2018-01-29 16:39       ` Andrew Cooper
  0 siblings, 1 reply; 5+ messages in thread
From: George Dunlap @ 2018-01-24 15:54 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich; +Cc: xen-devel

On 01/24/2018 03:48 PM, Andrew Cooper wrote:
> On 24/01/18 15:43, Jan Beulich wrote:
>>>>> On 24.01.18 at 16:02, <george.dunlap@citrix.com> wrote:
>>> The fix for XSA-242 depends on the same cpu never calling
>>> _put_page_type() while holding a page_lock() for that page; doing so
>>> may cause a deadlock under the right conditions.
>>>
>>> Furthermore, even before that, there was never any discipline for the
>>> order in which page locks are grabbed; if there are any paths that
>>> grab the locks for two different pages at once, we risk creating the
>>> conditions for a deadlock to occur.
>>>
>>> These are believed to be safe, because it is believed that:
>>> 1. No hypervisor paths ever lock two pages at once, and
>>> 2. We never call _put_page_type() on a page while holding its page lock.
>>>
>>> Add a check to debug builds to catch any violations of these
>>> assumpitons.
>>>
>>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>>> ---
>>> v2:
>>> - Make wrapper macros to get rid of ugly #ifdefs
>>> - Use "current_locked_page*" prefix
>>> - Reword commit message
>>>
>>> NB this doesn't address Andrew's comment from v1 about adding "more
>>> than just a debug check".  I think we should check in the ASSERT()
>>> while we discuss future potential work, and not let the best become
>>> the enemy of the good.
>> I agree, and hence
>> Acked-by: Jan Beulich <jbeulich@suse.com>
>> unless Andrew really means to veto this going in.
> 
> Not especially, but how about:
> 
> * Note: If we subsequently find a valid use-case for locking more than
> one page at once, this safely logic will need adjustment.
> 
> in the comment describing the rational, to explicitly state we have no
> particular problem with nested locking?

What about copying the commit message into the comment and modifying it
slightly, thus:

---
We must never call _put_page_type() while holding a page_lock() for that
page; doing so may cause a deadlock under the right conditions.

Furthermore, there is no discipline for the order in which page locks
are grabbed; if there are any paths that grab the locks for two
different pages at once, we risk creating the conditions for a deadlock
to occur.

These are believed to be safe, because it is believed that:
1. No hypervisor paths ever lock two pages at once, and
2. We never call _put_page_type() on a page while holding its page lock.

Add a check to debug builds to catch any violations of these
assumpitons.

NB that if we find valid, safe reasons to hold two page locks at once,
these checks will need to be adjusted.
---

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] x86/mm: Add debug code to detect illegal page_lock and put_page_type ordering
  2018-01-24 15:54     ` George Dunlap
@ 2018-01-29 16:39       ` Andrew Cooper
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew Cooper @ 2018-01-29 16:39 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich; +Cc: xen-devel

On 24/01/18 15:54, George Dunlap wrote:
> On 01/24/2018 03:48 PM, Andrew Cooper wrote:
>> On 24/01/18 15:43, Jan Beulich wrote:
>>>>>> On 24.01.18 at 16:02, <george.dunlap@citrix.com> wrote:
>>>> The fix for XSA-242 depends on the same cpu never calling
>>>> _put_page_type() while holding a page_lock() for that page; doing so
>>>> may cause a deadlock under the right conditions.
>>>>
>>>> Furthermore, even before that, there was never any discipline for the
>>>> order in which page locks are grabbed; if there are any paths that
>>>> grab the locks for two different pages at once, we risk creating the
>>>> conditions for a deadlock to occur.
>>>>
>>>> These are believed to be safe, because it is believed that:
>>>> 1. No hypervisor paths ever lock two pages at once, and
>>>> 2. We never call _put_page_type() on a page while holding its page lock.
>>>>
>>>> Add a check to debug builds to catch any violations of these
>>>> assumpitons.
>>>>
>>>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>>>> ---
>>>> v2:
>>>> - Make wrapper macros to get rid of ugly #ifdefs
>>>> - Use "current_locked_page*" prefix
>>>> - Reword commit message
>>>>
>>>> NB this doesn't address Andrew's comment from v1 about adding "more
>>>> than just a debug check".  I think we should check in the ASSERT()
>>>> while we discuss future potential work, and not let the best become
>>>> the enemy of the good.
>>> I agree, and hence
>>> Acked-by: Jan Beulich <jbeulich@suse.com>
>>> unless Andrew really means to veto this going in.
>> Not especially, but how about:
>>
>> * Note: If we subsequently find a valid use-case for locking more than
>> one page at once, this safely logic will need adjustment.
>>
>> in the comment describing the rational, to explicitly state we have no
>> particular problem with nested locking?
> What about copying the commit message into the comment and modifying it
> slightly, thus:
>
> ---
> We must never call _put_page_type() while holding a page_lock() for that
> page; doing so may cause a deadlock under the right conditions.
>
> Furthermore, there is no discipline for the order in which page locks
> are grabbed; if there are any paths that grab the locks for two
> different pages at once, we risk creating the conditions for a deadlock
> to occur.
>
> These are believed to be safe, because it is believed that:
> 1. No hypervisor paths ever lock two pages at once, and
> 2. We never call _put_page_type() on a page while holding its page lock.
>
> Add a check to debug builds to catch any violations of these
> assumpitons.
>
> NB that if we find valid, safe reasons to hold two page locks at once,
> these checks will need to be adjusted.

LGTM.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-01-29 16:47 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-24 15:02 [PATCH v2] x86/mm: Add debug code to detect illegal page_lock and put_page_type ordering George Dunlap
2018-01-24 15:43 ` Jan Beulich
2018-01-24 15:48   ` Andrew Cooper
2018-01-24 15:54     ` George Dunlap
2018-01-29 16:39       ` Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.