All of lore.kernel.org
 help / color / mirror / Atom feed
* Is this a racing bug in page_make_sharable()?
@ 2012-01-16 14:51 Nai Xia
  0 siblings, 0 replies; 8+ messages in thread
From: Nai Xia @ 2012-01-16 14:51 UTC (permalink / raw)
  To: xen-devel

Hi,

As I understand, the purpose of the code in page_make_sharable()
checking the ref count is to ensure that nobody unexpected is working
on the page, and so we can migrate it to dom_cow, right?

====
   /* Check if the ref count is 2. The first from PGT_allocated, and
    * the second from get_page_and_type at the top of this function */
   if(page->count_info != (PGC_allocated | (2 + expected_refcnt)))
   {
       /* Return type count back to zero */
       put_page_and_type(page);
       spin_unlock(&d->page_alloc_lock);
       return -E2BIG;
   }
====

However, it seems to me that this ref check and the following page
migration is not atomic( although the operations for type_info ref
check seems good) i.e. it's possible that it passed this ref
check but just before it goes to dom_cow, someone else gets this page?
As far as I know, in Linux kernel's similar scenarios, they do ref-freezing
tricks.

And if we don't need to worry about this racing case, then what is
this check trying to ensure?


Thanks,
Nai

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Is this a racing bug in page_make_sharable()?
  2013-01-10 17:25           ` Tim Deegan
@ 2013-01-11  2:46             ` Nai Xia
  0 siblings, 0 replies; 8+ messages in thread
From: Nai Xia @ 2013-01-11  2:46 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Lixiuchang, Andres Lagar-Cavilla, Xiaowei Yang, Luohao (brian),
	xen-devel



On 2013年01月11日 01:25, Tim Deegan wrote:
> At 11:36 -0500 on 10 Jan (1357817806), Andres Lagar-Cavilla wrote:
>> Hi there, thanks for the report. Sorry I didn't respond earlier, fell through the cracks.
>>
>> Having said that...
>> On Jan 10, 2013, at 8:00 AM, Tim Deegan <tim@xen.org> wrote:
>>
>>> Hi,
>>>
>>> At 23:35 +0800 on 27 Dec (1356651327), Nai Xia wrote:
>>>> I think I can construct a bug by interleaving the two code paths:
>>>>
>>>> in guest_remove_page()              |              in page_make_sharable()
>>>> ------------------------------------------------------------------------------------------------------------------------------
>>>> if ( p2m_is_shared(p2mt) )                       .....
>>>> ...                                              .....
>>>> page = mfn_to_page(mfn);                         .....
>>>>                                                  .....
>>>>
>>>>                                                  if (
>>>>                                                  !get_page_and_type(page,
>>>>                                                  d, PGT_shared_page) )
>>>>                                                  // success
>>>>
>>>>                                                  .........
>>>>                                                  if ( page->count_info !=
>>>>                                                  (PGC_allocated | (2 +
>>>>                                                  expected_refcnt)) ) //
>>>>                                                  also pass
>>>>
>>>>
>>>> if ( unlikely(!get_page(page, d)) )
>>>>
>>>> /* go on to remove page */                       /* go on to add page to
>>>> cow domain */
>>>> -------------------------------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> is there anything that can already prevent such racing or is this really
>>>> can happen?
>>>
>>> I think this race can happen.
>>
>> Through a p2m entry in a domain. Is this the same domain as the one
>> for which quest_remove_page is executing?  Then all is serialized
>> through the p2m lock, no race.
>
> Right, that's what I was missing.  Because guest_remove_page has called
> get_gfn_query on the gfn, and not yet called put_gfn(),
> page_make_sharable() can't be running.  All is well. :)

I see. Good to see everything working well. :)

Thanks,

Nai

>
> Thanks,
>
> Tim.
>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Is this a racing bug in page_make_sharable()?
  2013-01-10 16:36         ` Andres Lagar-Cavilla
@ 2013-01-10 17:25           ` Tim Deegan
  2013-01-11  2:46             ` Nai Xia
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Deegan @ 2013-01-10 17:25 UTC (permalink / raw)
  To: Andres Lagar-Cavilla
  Cc: Lixiuchang, Xiaowei Yang, Nai Xia, Luohao (brian), xen-devel

At 11:36 -0500 on 10 Jan (1357817806), Andres Lagar-Cavilla wrote:
> Hi there, thanks for the report. Sorry I didn't respond earlier, fell through the cracks.
> 
> Having said that...
> On Jan 10, 2013, at 8:00 AM, Tim Deegan <tim@xen.org> wrote:
> 
> > Hi,
> > 
> > At 23:35 +0800 on 27 Dec (1356651327), Nai Xia wrote:
> >> I think I can construct a bug by interleaving the two code paths:
> >> 
> >> in guest_remove_page()              |              in page_make_sharable()
> >> ------------------------------------------------------------------------------------------------------------------------------
> >> if ( p2m_is_shared(p2mt) )                       .....
> >> ...                                              .....
> >> page = mfn_to_page(mfn);                         .....
> >>                                                 .....
> >> 
> >>                                                 if ( 
> >>                                                 !get_page_and_type(page, 
> >>                                                 d, PGT_shared_page) )    
> >>                                                 // success
> >> 
> >>                                                 .........
> >>                                                 if ( page->count_info != 
> >>                                                 (PGC_allocated | (2 + 
> >>                                                 expected_refcnt)) ) // 
> >>                                                 also pass
> >> 
> >> 
> >> if ( unlikely(!get_page(page, d)) )
> >> 
> >> /* go on to remove page */                       /* go on to add page to 
> >> cow domain */
> >> -------------------------------------------------------------------------------------------------------------------------------------
> >> 
> >> 
> >> is there anything that can already prevent such racing or is this really 
> >> can happen?
> > 
> > I think this race can happen.  
> 
> Through a p2m entry in a domain. Is this the same domain as the one
> for which quest_remove_page is executing?  Then all is serialized
> through the p2m lock, no race.

Right, that's what I was missing.  Because guest_remove_page has called
get_gfn_query on the gfn, and not yet called put_gfn(),
page_make_sharable() can't be running.  All is well. :)

Thanks,

Tim.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Is this a racing bug in page_make_sharable()?
  2013-01-10 13:00       ` Tim Deegan
@ 2013-01-10 16:36         ` Andres Lagar-Cavilla
  2013-01-10 17:25           ` Tim Deegan
  0 siblings, 1 reply; 8+ messages in thread
From: Andres Lagar-Cavilla @ 2013-01-10 16:36 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Luohao (brian),
	Lixiuchang, Andres Lagar-Cavilla, Xiaowei Yang, Nai Xia,
	xen-devel

Hi there, thanks for the report. Sorry I didn't respond earlier, fell through the cracks.

Having said that...
On Jan 10, 2013, at 8:00 AM, Tim Deegan <tim@xen.org> wrote:

> Hi,
> 
> At 23:35 +0800 on 27 Dec (1356651327), Nai Xia wrote:
>> I think I can construct a bug by interleaving the two code paths:
>> 
>> in guest_remove_page()              |              in page_make_sharable()
>> ------------------------------------------------------------------------------------------------------------------------------
>> if ( p2m_is_shared(p2mt) )                       .....
>> ...                                              .....
>> page = mfn_to_page(mfn);                         .....
>>                                                 .....
>> 
>>                                                 if ( 
>>                                                 !get_page_and_type(page, 
>>                                                 d, PGT_shared_page) )    
>>                                                 // success
>> 
>>                                                 .........
>>                                                 if ( page->count_info != 
>>                                                 (PGC_allocated | (2 + 
>>                                                 expected_refcnt)) ) // 
>>                                                 also pass
>> 
>> 
>> if ( unlikely(!get_page(page, d)) )
>> 
>> /* go on to remove page */                       /* go on to add page to 
>> cow domain */
>> -------------------------------------------------------------------------------------------------------------------------------------
>> 
>> 
>> is there anything that can already prevent such racing or is this really 
>> can happen?
> 
> I think this race can happen.  

I beg to disagree.

Both page_make_sharable and page_make_private are protected by the per-mfn page lock (see __grab_shared_page). This lock will serialize strictly page_make_sharable against the page_make_private in quest_remove_page -> mem_sharing_unshare_page -> page_make_private.

So page_make_sharable will execute atomically, add to cow, and then unshare will come in, make the page private again, remove from cow and hand over to the domain for whom the page is being removed.

Or, the page has other shared references in which case page_make_sharable is a no-op, and page_make_private will generate a new different page (mfn).

Or, and this is the tricky case you refer to, page_make_private will complete before page_make_sharable kicks in. The question then is: how did page_make_sharable even got to be called? The mfn is being nominated for sharing. How? Through a p2m entry in a domain. Is this the same domain as the one for which quest_remove_page is executing? Then all is serialized through the p2m lock, no race. Is this a different domain? Then the page is already shared, no race as per above. 

Finally, is this a different domain and the page is not shared? This Should Not Be! As in, never happens with the current APIs, and IMHO is borderline uncheckable for.

Hope that helps,
Andres

> I'm not sure exactly what the effect
> is, though.  I guess the page ends up belonging to dom_cow, but
> without the PGC_allocated bit set.  So when it becomes unshared again,
> it's immediately freed. :(
> 
> Andres, what do you think?
> 
> Tim.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Is this a racing bug in page_make_sharable()?
  2012-12-27 15:35     ` Nai Xia
@ 2013-01-10 13:00       ` Tim Deegan
  2013-01-10 16:36         ` Andres Lagar-Cavilla
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Deegan @ 2013-01-10 13:00 UTC (permalink / raw)
  To: Nai Xia
  Cc: Xiaowei Yang, Andres Lagar-Cavilla, Luohao (brian),
	Lixiuchang, xen-devel

Hi,

At 23:35 +0800 on 27 Dec (1356651327), Nai Xia wrote:
> I think I can construct a bug by interleaving the two code paths:
> 
> in guest_remove_page()              |              in page_make_sharable()
> ------------------------------------------------------------------------------------------------------------------------------
> if ( p2m_is_shared(p2mt) )                       .....
> ...                                              .....
> page = mfn_to_page(mfn);                         .....
>                                                  .....
> 
>                                                  if ( 
>                                                  !get_page_and_type(page, 
>                                                  d, PGT_shared_page) )    
>                                                  // success
> 
>                                                  .........
>                                                  if ( page->count_info != 
>                                                  (PGC_allocated | (2 + 
>                                                  expected_refcnt)) ) // 
>                                                  also pass
> 
> 
> if ( unlikely(!get_page(page, d)) )
> 
> /* go on to remove page */                       /* go on to add page to 
> cow domain */
> -------------------------------------------------------------------------------------------------------------------------------------
> 
> 
> is there anything that can already prevent such racing or is this really 
> can happen?

I think this race can happen.  I'm not sure exactly what the effect
is, though.  I guess the page ends up belonging to dom_cow, but
without the PGC_allocated bit set.  So when it becomes unshared again,
it's immediately freed. :(

Andres, what do you think?

Tim.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Is this a racing bug in page_make_sharable()?
  2012-01-17 10:53   ` Tim Deegan
@ 2012-12-27 15:35     ` Nai Xia
  2013-01-10 13:00       ` Tim Deegan
  0 siblings, 1 reply; 8+ messages in thread
From: Nai Xia @ 2012-12-27 15:35 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Lixiuchang, Xiaowei Yang, Luohao (brian), xen-devel, Grzegorz Milos

Hi all,

Last time I reported this bug. And I now see some changes in your Xen
git master branch.
However, I think the problem still remains for ref-checking in page
migration to dom_cow.

I think I can construct a bug by interleaving the two code paths:

in guest_remove_page()              |              in page_make_sharable()
------------------------------------------------------------------------------------------------------------------------------
if ( p2m_is_shared(p2mt) )                       .....
...                                              .....
page = mfn_to_page(mfn);                         .....
                                                  .....

                                                  if ( !get_page_and_type(page, d, PGT_shared_page) )    // success

                                                  .........
                                                  if ( page->count_info != (PGC_allocated | (2 + expected_refcnt)) ) // also pass


if ( unlikely(!get_page(page, d)) )

/* go on to remove page */                       /* go on to add page to cow domain */
-------------------------------------------------------------------------------------------------------------------------------------


is there anything that can already prevent such racing or is this really can happen?


Thanks,

Nai Xia


On 2012年01月17日 18:53, Tim Deegan wrote:
> At 22:43 +0800 on 16 Jan (1326753834), Nai Xia wrote:
>> Hi Grzegorz,
>>
>> As I understand, the purpose of the code in page_make_sharable()
>> checking the ref count is to ensure that nobody unexpected is working
>> on the page, and so we can migrate it to dom_cow, right?
>>
>> ====
>>      /* Check if the ref count is 2. The first from PGT_allocated, and
>>       * the second from get_page_and_type at the top of this function */
>>      if(page->count_info != (PGC_allocated | (2 + expected_refcnt)))
>>      {
>>          /* Return type count back to zero */
>>          put_page_and_type(page);
>>          spin_unlock(&d->page_alloc_lock);
>>          return -E2BIG;
>>      }
>> ====
>>
>> However, it seems to me that this ref check and the following page
>> migration is not atomic( although the operations for type_info ref
>> check seems good) i.e. it's possible that it passed this ref
>> check but just before it goes to dom_cow, someone else gets this page?
>
> Yes, there are a number of races around the p2m code; Anders
> Lagar-Cavilla has been working on fixing the locking around p2m lookups
> to take care of this.
>
> Tim.
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Is this a racing bug in page_make_sharable()?
  2012-01-16 14:43 ` Nai Xia
@ 2012-01-17 10:53   ` Tim Deegan
  2012-12-27 15:35     ` Nai Xia
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Deegan @ 2012-01-17 10:53 UTC (permalink / raw)
  To: Nai Xia; +Cc: xen-devel, Grzegorz Milos

At 22:43 +0800 on 16 Jan (1326753834), Nai Xia wrote:
> Hi Grzegorz,
> 
> As I understand, the purpose of the code in page_make_sharable()
> checking the ref count is to ensure that nobody unexpected is working
> on the page, and so we can migrate it to dom_cow, right?
> 
> ====
>     /* Check if the ref count is 2. The first from PGT_allocated, and
>      * the second from get_page_and_type at the top of this function */
>     if(page->count_info != (PGC_allocated | (2 + expected_refcnt)))
>     {
>         /* Return type count back to zero */
>         put_page_and_type(page);
>         spin_unlock(&d->page_alloc_lock);
>         return -E2BIG;
>     }
> ====
> 
> However, it seems to me that this ref check and the following page
> migration is not atomic( although the operations for type_info ref
> check seems good) i.e. it's possible that it passed this ref
> check but just before it goes to dom_cow, someone else gets this page?

Yes, there are a number of races around the p2m code; Anders
Lagar-Cavilla has been working on fixing the locking around p2m lookups
to take care of this.

Tim.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Is this a racing bug in page_make_sharable()?
       [not found] <4F14345B.4040807@gmail.com>
@ 2012-01-16 14:43 ` Nai Xia
  2012-01-17 10:53   ` Tim Deegan
  0 siblings, 1 reply; 8+ messages in thread
From: Nai Xia @ 2012-01-16 14:43 UTC (permalink / raw)
  To: Grzegorz Milos; +Cc: xen-devel

Hi Grzegorz,

As I understand, the purpose of the code in page_make_sharable()
checking the ref count is to ensure that nobody unexpected is working
on the page, and so we can migrate it to dom_cow, right?

====
    /* Check if the ref count is 2. The first from PGT_allocated, and
     * the second from get_page_and_type at the top of this function */
    if(page->count_info != (PGC_allocated | (2 + expected_refcnt)))
    {
        /* Return type count back to zero */
        put_page_and_type(page);
        spin_unlock(&d->page_alloc_lock);
        return -E2BIG;
    }
====

However, it seems to me that this ref check and the following page
migration is not atomic( although the operations for type_info ref
check seems good) i.e. it's possible that it passed this ref
check but just before it goes to dom_cow, someone else gets this page?
As far as I know, in Linux kernel's similar scenarios, they do ref-freezing
tricks.

And if we don't need to worry about this racing case, then what is
this check trying to ensure?


Thanks,
Nai

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2013-01-11  2:46 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-16 14:51 Is this a racing bug in page_make_sharable()? Nai Xia
     [not found] <4F14345B.4040807@gmail.com>
2012-01-16 14:43 ` Nai Xia
2012-01-17 10:53   ` Tim Deegan
2012-12-27 15:35     ` Nai Xia
2013-01-10 13:00       ` Tim Deegan
2013-01-10 16:36         ` Andres Lagar-Cavilla
2013-01-10 17:25           ` Tim Deegan
2013-01-11  2:46             ` Nai Xia

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.