mem_sharing: summarized problems when domain is dying

* mem_sharing: summarized problems when domain is dying
@ 2011-01-21 16:19 Jui-Hao Chiang
  2011-01-21 16:29 ` George Dunlap
  2011-01-21 19:45 ` Jui-Hao Chiang
  0 siblings, 2 replies; 12+ messages in thread
From: Jui-Hao Chiang @ 2011-01-21 16:19 UTC (permalink / raw)
  To: Tim Deegan; +Cc: MaoXiaoyun, xen devel

Hi, Tim:

>From tinnycloud's result, here I summarize the current problem and
findings of mem_sharing due to domain dying.
(1) When domain is dying, alloc_domheap_page() and
set_shared_p2m_entry() would just fail. So the shr_lock is not enough
to ensure that the domain won't die in the middle of mem_sharing code.
As tinnycloud's code shows, is that better to use
rcu_lock_domain_by_id before calling the above two functions?

(2) What's the proper behavior of nominate/share/unshare when domain is dying?
The following is just my current guess. Please give comments as well.

(2.1) nominate: return fail; but needs to check blktap2's code to make
sure it understand and act properly (should be minor issue now)

(2.2) share: return success but skip the gfns of dying domain, i.e.,
we don't remove them from the hash list, and don't update their p2m
entry (set_shared_p2m_entry). We believe that the p2m_teardown will
clean up them later.

(2.3) unshare: it's the most problematic part. Because we are not able
to alloc_domheap_page at this moment, the only thing we can do is
simply skip the page and return. But what's the side effect?
(a) If p2m_teardown comes in, there is no problem. Just destroy it and done.
(b) hap_nested_page_fault: if we return fail, will this cause problem
to guest? or we can simply return success to cheat the guest. But
later the guest will trigger another page fault if it write the page
again.
(c) gnttab_map_grant_ref: this function specify must_succeed to
gfn_to_mfn_unshare(), which would BUG if unshare() fails.

Do we really need (b) and (c) in the last steps of domain dying? If
that's the case, we need to have a special alloc_domheap_page for
dying domain.

On Thu, Jan 20, 2011 at 4:19 AM, Tim Deegan <Tim.Deegan@citrix.com> wrote:
> At 07:19 +0000 on 20 Jan (1295507976), MaoXiaoyun wrote:
>> Hi:
>>
>>             The latest BUG in mem_sharing_alloc_page from mem_sharing_unshare_page.
>>             I printed heap info, which shows plenty memory left.
>>             Could domain be NULL during in unshare, or should it be locked by rcu_lock_domain_by_id ?
>>
>
> 'd' probably isn't NULL; more likely is that the domain is not allowed
> to have any more memory.  You should look at the values of d->max_pages
> and d->tot_pages when the failure happens.
>
> Cheers.
>
> Tim.
>

Bests,
Jui-Hao

^ permalink raw reply	[flat|nested] 12+ messages in thread