* Domain reference counting breakage
@ 2021-02-22 15:26 Andrew Cooper
  2021-02-22 16:49 ` Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Cooper @ 2021-02-22 15:26 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Ian Jackson

At the moment, attempting to create an HVM guest with max_gnttab_frames of 0
causes Xen to explode on the:

  BUG_ON(atomic_read(&d->refcnt) != DOMAIN_DESTROYED);

in _domain_destroy().  Instrumenting Xen a little more to highlight where the
modifications to d->refcnt occur:

  (d6) --- Xen Test Framework ---
  (d6) Environment: PV 64bit (Long mode 4 levels)
  (d6) Testing domain create:
  (d6) Testing x86 PVH Shadow
  (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d0402046b5 [domain_create+0x1c3/0x7f1], stk e010:ffff83003fea7d58, dr6 ffff0ff1
  (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d040321b11 [share_xen_page_with_guest+0x175/0x190], stk e010:ffff83003fea7ce8, dr6 ffff0ff1
  (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d04022595b [assign_pages+0x223/0x2b7], stk e010:ffff83003fea7c68, dr6 ffff0ff1
  (d6) (XEN) grant_table.c:1934: Bad grant table sizes: grant 0, maptrack 0
  (d6) (XEN) *** d1 ref 3
  (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d0402048bc [domain_create+0x3ca/0x7f1], stk e010:ffff83003fea7d58, dr6 ffff0ff1
  (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d040225e11 [free_domheap_pages+0x422/0x44a], stk e010:ffff83003fea7c38, dr6 ffff0ff1
  (d6) (XEN) Xen BUG at domain.c:450
  (d6) (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y  Not tainted ]----
  (d6) (XEN) CPU:    0
  (d6) (XEN) RIP:    e008:[<ffff82d040204366>] common/domain.c#_domain_destroy+0x69/0x6b

the problem becomes apparent.

First of all, there is a reference count leak - share_xen_page_with_guest()'s
reference isn't freed anywhere.

However, the main problem is the 4th #DB above, which is this atomic_set():

  d->is_dying = DOMDYING_dead;
  if ( hardware_domain == d )
      hardware_domain = old_hwdom;
  printk("*** %pd ref %d\n", d, atomic_read(&d->refcnt));
  atomic_set(&d->refcnt, DOMAIN_DESTROYED);

in the domain_create() error path, which happens before free_domheap_pages()
drops the ref acquired by assign_pages(), and destroys still-relevant information
pertaining to the guest.

The best option is probably to use atomic_sub() to subtract (DOMAIN_DESTROYED
+ 1) from the current refcount, which preserves the extra refs taken by
share_xen_page_with_guest() and assign_pages() until they can be freed
appropriately.

~Andrew



* Re: Domain reference counting breakage
  2021-02-22 15:26 Domain reference counting breakage Andrew Cooper
@ 2021-02-22 16:49 ` Jan Beulich
  2021-02-22 17:11   ` Andrew Cooper
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2021-02-22 16:49 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Ian Jackson, Xen-devel

On 22.02.2021 16:26, Andrew Cooper wrote:
> At the moment, attempting to create an HVM guest with max_gnttab_frames of 0
> causes Xen to explode on the:
> 
>   BUG_ON(atomic_read(&d->refcnt) != DOMAIN_DESTROYED);
> 
> in _domain_destroy().  Instrumenting Xen a little more to highlight where the
> modifications to d->refcnt occur:
> 
>   (d6) --- Xen Test Framework ---
>   (d6) Environment: PV 64bit (Long mode 4 levels)
>   (d6) Testing domain create:
>   (d6) Testing x86 PVH Shadow
>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d0402046b5 [domain_create+0x1c3/0x7f1], stk e010:ffff83003fea7d58, dr6 ffff0ff1
>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d040321b11 [share_xen_page_with_guest+0x175/0x190], stk e010:ffff83003fea7ce8, dr6 ffff0ff1
>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d04022595b [assign_pages+0x223/0x2b7], stk e010:ffff83003fea7c68, dr6 ffff0ff1
>   (d6) (XEN) grant_table.c:1934: Bad grant table sizes: grant 0, maptrack 0
>   (d6) (XEN) *** d1 ref 3
>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d0402048bc [domain_create+0x3ca/0x7f1], stk e010:ffff83003fea7d58, dr6 ffff0ff1
>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d040225e11 [free_domheap_pages+0x422/0x44a], stk e010:ffff83003fea7c38, dr6 ffff0ff1
>   (d6) (XEN) Xen BUG at domain.c:450
>   (d6) (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y  Not tainted ]----
>   (d6) (XEN) CPU:    0
>   (d6) (XEN) RIP:    e008:[<ffff82d040204366>] common/domain.c#_domain_destroy+0x69/0x6b
> 
> the problem becomes apparent.
> 
> First of all, there is a reference count leak - share_xen_page_with_guest()'s
> reference isn't freed anywhere.
> 
> However, the main problem is the 4th #DB above, which is this atomic_set():
> 
>   d->is_dying = DOMDYING_dead;
>   if ( hardware_domain == d )
>       hardware_domain = old_hwdom;
>   printk("*** %pd ref %d\n", d, atomic_read(&d->refcnt));
>   atomic_set(&d->refcnt, DOMAIN_DESTROYED);
> 
> in the domain_create() error path, which happens before free_domheap_pages()
> drops the ref acquired by assign_pages(), and destroys still-relevant information
> pertaining to the guest.

I think the original idea was that by the time of the atomic_set()
all operations potentially altering the refcount are done. This
then allowed calling free_xenheap_pages() on e.g. the shared info
page without regard to whether the page's reference (installed by
share_xen_page_with_guest()) was actually dropped (i.e.
regardless of whether it's the domain create error path or proper
domain cleanup). With this assumption, no actual leak of anything
would occur, but of course this isn't the "natural" way of how
things should get cleaned up.

According to this original model, free_domheap_pages() may not be
called anymore past that point (for domain owned pages, which
really means put_page() then; anonymous pages are of course fine
to be freed late).

> The best option is probably to use atomic_sub() to subtract (DOMAIN_DESTROYED
> + 1) from the current refcount, which preserves the extra refs taken by
> share_xen_page_with_guest() and assign_pages() until they can be freed
> appropriately.

First of all - why DOMAIN_DESTROYED+1? There's no extra reference
you ought to be dropping here. Or else what's the counterpart of
acquiring the respective reference?

And then of course this means Xen heap pages cannot be cleaned up
anymore by merely calling free_xenheap_pages() - to get rid of
the associated reference, all of them would need to undergo
put_page_alloc_ref(), which in turn requires obtaining an extra
reference, which in turn introduces another of these ugly
theoretical error cases (because get_page() can in principle fail).

Therefore I wouldn't outright discard the option of sticking to
the original model. It would then better be properly described
somewhere, and we would likely want to put some check in place to
make sure such put_page() can't go unnoticed anymore when sitting
too late on the cleanup path (which may be difficult to arrange
for).

Jan



* Re: Domain reference counting breakage
  2021-02-22 16:49 ` Jan Beulich
@ 2021-02-22 17:11   ` Andrew Cooper
  2021-02-23  8:20     ` Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Cooper @ 2021-02-22 17:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Ian Jackson, Xen-devel

On 22/02/2021 16:49, Jan Beulich wrote:
> On 22.02.2021 16:26, Andrew Cooper wrote:
>> At the moment, attempting to create an HVM guest with max_gnttab_frames of 0
>> causes Xen to explode on the:
>>
>>   BUG_ON(atomic_read(&d->refcnt) != DOMAIN_DESTROYED);
>>
>> in _domain_destroy().  Instrumenting Xen a little more to highlight where the
>> modifications to d->refcnt occur:
>>
>>   (d6) --- Xen Test Framework ---
>>   (d6) Environment: PV 64bit (Long mode 4 levels)
>>   (d6) Testing domain create:
>>   (d6) Testing x86 PVH Shadow
>>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d0402046b5 [domain_create+0x1c3/0x7f1], stk e010:ffff83003fea7d58, dr6 ffff0ff1
>>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d040321b11 [share_xen_page_with_guest+0x175/0x190], stk e010:ffff83003fea7ce8, dr6 ffff0ff1
>>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d04022595b [assign_pages+0x223/0x2b7], stk e010:ffff83003fea7c68, dr6 ffff0ff1
>>   (d6) (XEN) grant_table.c:1934: Bad grant table sizes: grant 0, maptrack 0
>>   (d6) (XEN) *** d1 ref 3
>>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d0402048bc [domain_create+0x3ca/0x7f1], stk e010:ffff83003fea7d58, dr6 ffff0ff1
>>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d040225e11 [free_domheap_pages+0x422/0x44a], stk e010:ffff83003fea7c38, dr6 ffff0ff1
>>   (d6) (XEN) Xen BUG at domain.c:450
>>   (d6) (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y  Not tainted ]----
>>   (d6) (XEN) CPU:    0
>>   (d6) (XEN) RIP:    e008:[<ffff82d040204366>] common/domain.c#_domain_destroy+0x69/0x6b
>>
>> the problem becomes apparent.
>>
>> First of all, there is a reference count leak - share_xen_page_with_guest()'s
>> reference isn't freed anywhere.
>>
>> However, the main problem is the 4th #DB above, which is this atomic_set():
>>
>>   d->is_dying = DOMDYING_dead;
>>   if ( hardware_domain == d )
>>       hardware_domain = old_hwdom;
>>   printk("*** %pd ref %d\n", d, atomic_read(&d->refcnt));
>>   atomic_set(&d->refcnt, DOMAIN_DESTROYED);
>>
>> in the domain_create() error path, which happens before free_domheap_pages()
>> drops the ref acquired by assign_pages(), and destroys still-relevant information
>> pertaining to the guest.
> I think the original idea was that by the time of the atomic_set()
> all operations potentially altering the refcount are done. This
> then allowed calling free_xenheap_pages() on e.g. the shared info
> page without regard to whether the page's reference (installed by
> share_xen_page_with_guest()) was actually dropped (i.e.
> regardless of whether it's the domain create error path or proper
> domain cleanup). With this assumption, no actual leak of anything
> would occur, but of course this isn't the "natural" way of how
> things should get cleaned up.
>
> According to this original model, free_domheap_pages() may not be
> called anymore past that point (for domain owned pages, which
> really means put_page() then; anonymous pages are of course fine
> to be freed late).

And I presume this is written down in the usual place?  (I.e. nowhere)

>> The best option is probably to use atomic_sub() to subtract (DOMAIN_DESTROYED
>> + 1) from the current refcount, which preserves the extra refs taken by
>> share_xen_page_with_guest() and assign_pages() until they can be freed
>> appropriately.
> First of all - why DOMAIN_DESTROYED+1? There's no extra reference
> you ought to be dropping here. Or else what's the counterpart of
> acquiring the respective reference?

The original atomic_set(1) needs dropping (somewhere around) here.

> And then of course this means Xen heap pages cannot be cleaned up
> anymore by merely calling free_xenheap_pages() - to get rid of
> the associated reference, all of them would need to undergo
> put_page_alloc_ref(), which in turn requires obtaining an extra
> reference, which in turn introduces another of these ugly
> theoretical error cases (because get_page() can in principle fail).
>
> Therefore I wouldn't outright discard the option of sticking to
> the original model. It would then better be properly described
> somewhere, and we would likely want to put some check in place to
> make sure such put_page() can't go unnoticed anymore when sitting
> too late on the cleanup path (which may be difficult to arrange
> for).

I agree that some rules are in desperate need of writing down.

However, given the catastrophic mess that is our reference counting and
cleanup paths, and how easy it is to get things wrong, I'm very
disinclined to permit a rule which forces divergent cleanup logic.

Making the cleanup paths non-divergent is the key to removing swathes of
dubious/buggy logic, and to stemming a steady stream of memory leaks, etc.

In particular, I don't think it's acceptable to special-case the cleanup
rules in the domain_create() error path simply because we blindly reset
the reference count when it still contains real information.

~Andrew



* Re: Domain reference counting breakage
  2021-02-22 17:11   ` Andrew Cooper
@ 2021-02-23  8:20     ` Jan Beulich
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2021-02-23  8:20 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Ian Jackson, Xen-devel

On 22.02.2021 18:11, Andrew Cooper wrote:
> On 22/02/2021 16:49, Jan Beulich wrote:
>> On 22.02.2021 16:26, Andrew Cooper wrote:
>>> At the moment, attempting to create an HVM guest with max_gnttab_frames of 0
>>> causes Xen to explode on the:
>>>
>>>   BUG_ON(atomic_read(&d->refcnt) != DOMAIN_DESTROYED);
>>>
>>> in _domain_destroy().  Instrumenting Xen a little more to highlight where the
>>> modifications to d->refcnt occur:
>>>
>>>   (d6) --- Xen Test Framework ---
>>>   (d6) Environment: PV 64bit (Long mode 4 levels)
>>>   (d6) Testing domain create:
>>>   (d6) Testing x86 PVH Shadow
>>>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d0402046b5 [domain_create+0x1c3/0x7f1], stk e010:ffff83003fea7d58, dr6 ffff0ff1
>>>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d040321b11 [share_xen_page_with_guest+0x175/0x190], stk e010:ffff83003fea7ce8, dr6 ffff0ff1
>>>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d04022595b [assign_pages+0x223/0x2b7], stk e010:ffff83003fea7c68, dr6 ffff0ff1
>>>   (d6) (XEN) grant_table.c:1934: Bad grant table sizes: grant 0, maptrack 0
>>>   (d6) (XEN) *** d1 ref 3
>>>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d0402048bc [domain_create+0x3ca/0x7f1], stk e010:ffff83003fea7d58, dr6 ffff0ff1
>>>   (d6) (XEN) d0v0 Hit #DB in Xen context: e008:ffff82d040225e11 [free_domheap_pages+0x422/0x44a], stk e010:ffff83003fea7c38, dr6 ffff0ff1
>>>   (d6) (XEN) Xen BUG at domain.c:450
>>>   (d6) (XEN) ----[ Xen-4.15-unstable  x86_64  debug=y  Not tainted ]----
>>>   (d6) (XEN) CPU:    0
>>>   (d6) (XEN) RIP:    e008:[<ffff82d040204366>] common/domain.c#_domain_destroy+0x69/0x6b
>>>
>>> the problem becomes apparent.
>>>
>>> First of all, there is a reference count leak - share_xen_page_with_guest()'s
>>> reference isn't freed anywhere.
>>>
>>> However, the main problem is the 4th #DB above, which is this atomic_set():
>>>
>>>   d->is_dying = DOMDYING_dead;
>>>   if ( hardware_domain == d )
>>>       hardware_domain = old_hwdom;
>>>   printk("*** %pd ref %d\n", d, atomic_read(&d->refcnt));
>>>   atomic_set(&d->refcnt, DOMAIN_DESTROYED);
>>>
>>> in the domain_create() error path, which happens before free_domheap_pages()
>>> drops the ref acquired by assign_pages(), and destroys still-relevant information
>>> pertaining to the guest.
>> I think the original idea was that by the time of the atomic_set()
>> all operations potentially altering the refcount are done. This
>> then allowed calling free_xenheap_pages() on e.g. the shared info
>> page without regard to whether the page's reference (installed by
>> share_xen_page_with_guest()) was actually dropped (i.e.
>> regardless of whether it's the domain create error path or proper
>> domain cleanup). With this assumption, no actual leak of anything
>> would occur, but of course this isn't the "natural" way of how
>> things should get cleaned up.
>>
>> According to this original model, free_domheap_pages() may not be
>> called anymore past that point (for domain owned pages, which
>> really means put_page() then; anonymous pages are of course fine
>> to be freed late).
> 
> And I presume this is written down in the usual place?  (I.e. nowhere)

I'm afraid so, hence me starting the explanation with "I think ...".

>>> The best option is probably to use atomic_sub() to subtract (DOMAIN_DESTROYED
>>> + 1) from the current refcount, which preserves the extra refs taken by
>>> share_xen_page_with_guest() and assign_pages() until they can be freed
>>> appropriately.
>> First of all - why DOMAIN_DESTROYED+1? There's no extra reference
>> you ought to be dropping here. Or else what's the counterpart of
>> acquiring the respective reference?
> 
> The original atomic_set(1) needs dropping (somewhere around) here.

Ah, right.

>> And then of course this means Xen heap pages cannot be cleaned up
>> anymore by merely calling free_xenheap_pages() - to get rid of
>> the associated reference, all of them would need to undergo
>> put_page_alloc_ref(), which in turn requires obtaining an extra
>> reference, which in turn introduces another of these ugly
>> theoretical error cases (because get_page() can in principle fail).
>>
>> Therefore I wouldn't outright discard the option of sticking to
>> the original model. It would then better be properly described
>> somewhere, and we would likely want to put some check in place to
>> make sure such put_page() can't go unnoticed anymore when sitting
>> too late on the cleanup path (which may be difficult to arrange
>> for).
> 
> I agree that some rules are in desperate need of writing down.
> 
> However, given the catastrophic mess that is our reference counting and
> cleanup paths, and how easy it is to get things wrong, I'm very
> disinclined to permit a rule which forces divergent cleanup logic.
> 
> Making the cleanup paths non-divergent is the key to removing swathes of
> dubious/buggy logic, and to stemming a steady stream of memory leaks, etc.
> 
> In particular, I don't think it's acceptable to special-case the cleanup
> rules in the domain_create() error path simply because we blindly reset
> the reference count when it still contains real information.

Of course I agree in principle. The question is whether this is a
good time for such a more extensive rework. IOW I wonder if there's
an immediate bug to be fixed now and then some rework to be done
for 4.16.

Jan

