* how to handle paged hypercall args?
@ 2010-11-10  8:58 Olaf Hering
  2010-11-11 14:33 ` Olaf Hering
  0 siblings, 1 reply; 39+ messages in thread
From: Olaf Hering @ 2010-11-10  8:58 UTC (permalink / raw)
  To: xen-devel


Hello,

I'm running into the BUG_ON() after an incomplete XENMEM_decrease_reservation
HYPERVISOR_memory_op call in balloon.c:decrease_reservation().

The reason for that is the huge number of nr_extents, where many of them
are paged-out pages. Because they are paged out, they can just be dropped
from the xenpaging point of view; there is no need to page them in before
calling p2m_remove_page() for the paged-out gfn. Whatever strategy is
chosen, the hypercall will be preempted.

Because the hypercall is preempted, the arg is copied several times from
the guest to the stack with copy_from_guest(). Now there is apparently
nothing that stops the xenpaging binary in dom0 from making progress and
eventually nominating the gfn which holds the guest's kernel stack page.
This makes __hvm_copy() return HVMCOPY_gfn_paged_out, which means
copy_from_user_hvm() "fails", and that in turn makes the whole hypercall fail.

Now in my particular case, it's the first copy_from_user_hvm(), and I can
probably come up with a simple patch which lets copy_from_user_hvm()
return some sort of -EAGAIN. This could be used in do_memory_op() to
just restart the hypercall until the gfn which holds the args is
available again. Then my decrease_reservation() bug would have a
workaround and I could move on.

However, I think there is nothing that would prevent the xenpaging
binary from nominating the guest gfn while the actual work is done
during the hypercall, and then copy_to_user_hvm() would fail.
How should other hypercalls deal with the situation where a guest gfn
gets into the paged-out state? Can they just sleep and do some sort of
polling until the page is accessible again? Was this case considered
while implementing xenpaging?

I'm currently reading through the callers of __hvm_copy(). Some of them
detect HVMCOPY_gfn_paged_out and do some sort of retry. Others just
ignore the return codes, or turn them into generic errors. In the case
of copy_from/to_guest, each caller needs an audit to see whether a retry
is possible.


Olaf


* Re: how to handle paged hypercall args?
  2010-11-10  8:58 how to handle paged hypercall args? Olaf Hering
@ 2010-11-11 14:33 ` Olaf Hering
  2010-11-11 20:08   ` Keir Fraser
  0 siblings, 1 reply; 39+ messages in thread
From: Olaf Hering @ 2010-11-11 14:33 UTC (permalink / raw)
  To: xen-devel

On Wed, Nov 10, Olaf Hering wrote:

> I'm currently reading through the callers of __hvm_copy(). Some of them
> detect HVMCOPY_gfn_paged_out and do some sort of retry. Others just
> ignore the return codes, or turn them into generic errors. In the case
> of copy_from/to_guest, each caller needs an audit to see whether a retry
> is possible.

A first patch which avoids the BUG_ON() is below.
It turned out that the guest's pagetables were just nominated and
paged out during the preempted do_memory_op hypercall. So
copy_from_guest failed in decrease_reservation() and do_memory_op().

I have also added some error handling in case the copy_to_user fails.
However, only the decrease_reservation() code path is runtime-tested, and
in fact this whole patch is not yet compile-tested. It's just a heads-up.


So is that an acceptable way to deal with the HVMCOPY_gfn_paged_out
return codes from __hvm_copy?
Or should I explore some different way, like spinning there and possibly
letting other threads of execution make progress while waiting for the
gfns to come back?


Olaf

---
 xen/arch/x86/hvm/hvm.c |    4 ++++
 xen/common/memory.c    |   43 ++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 42 insertions(+), 5 deletions(-)

--- xen-4.0.1-testing.orig/xen/arch/x86/hvm/hvm.c
+++ xen-4.0.1-testing/xen/arch/x86/hvm/hvm.c
@@ -1853,6 +1853,8 @@ unsigned long copy_to_user_hvm(void *to,
 
     rc = hvm_copy_to_guest_virt_nofault((unsigned long)to, (void *)from,
                                         len, 0);
+    if ( rc == HVMCOPY_gfn_paged_out )
+        return -EAGAIN;
     return rc ? len : 0; /* fake a copy_to_user() return code */
 }
 
@@ -1869,6 +1871,8 @@ unsigned long copy_from_user_hvm(void *t
 #endif
 
     rc = hvm_copy_from_guest_virt_nofault(to, (unsigned long)from, len, 0);
+    if ( rc == HVMCOPY_gfn_paged_out )
+        return -EAGAIN;
     return rc ? len : 0; /* fake a copy_from_user() return code */
 }
 
--- xen-4.0.1-testing.orig/xen/common/memory.c
+++ xen-4.0.1-testing/xen/common/memory.c
@@ -47,6 +47,7 @@ static void increase_reservation(struct
 {
     struct page_info *page;
     unsigned long i;
+    unsigned long ctg_ret;
     xen_pfn_t mfn;
     struct domain *d = a->domain;
 
@@ -80,8 +81,14 @@ static void increase_reservation(struct
         if ( !guest_handle_is_null(a->extent_list) )
         {
             mfn = page_to_mfn(page);
-            if ( unlikely(__copy_to_guest_offset(a->extent_list, i, &mfn, 1)) )
+            ctg_ret = __copy_to_guest_offset(a->extent_list, i, &mfn, 1);
+            if ( unlikely(ctg_ret) )
+            {
+                free_domheap_pages(page, a->extent_order);
+                if ( (long)ctg_ret == -EAGAIN )
+                    a->preempted = 1;
                 goto out;
+            }
         }
     }
 
@@ -93,6 +100,7 @@ static void populate_physmap(struct memo
 {
     struct page_info *page;
     unsigned long i, j;
+    unsigned long ctg_ret;
     xen_pfn_t gpfn, mfn;
     struct domain *d = a->domain;
 
@@ -111,8 +119,13 @@ static void populate_physmap(struct memo
             goto out;
         }
 
-        if ( unlikely(__copy_from_guest_offset(&gpfn, a->extent_list, i, 1)) )
+        j = __copy_from_guest_offset(&gpfn, a->extent_list, i, 1);
+        if ( unlikely(j) )
+        {
+            if ( (long)j == -EAGAIN )
+                a->preempted = 1;
             goto out;
+        }
 
         if ( a->memflags & MEMF_populate_on_demand )
         {
@@ -142,8 +155,17 @@ static void populate_physmap(struct memo
                     set_gpfn_from_mfn(mfn + j, gpfn + j);
 
                 /* Inform the domain of the new page's machine address. */ 
-                if ( unlikely(__copy_to_guest_offset(a->extent_list, i, &mfn, 1)) )
+                ctg_ret = __copy_to_guest_offset(a->extent_list, i, &mfn, 1);
+                if ( unlikely(ctg_ret) )
+                {
+                    for ( j = 0; j < (1 << a->extent_order); j++ )
+                        set_gpfn_from_mfn(mfn + j, INVALID_P2M_ENTRY);
+                    guest_physmap_remove_page(d, gpfn, mfn, a->extent_order);
+                    free_domheap_pages(page, a->extent_order);
+                    if ( (long)ctg_ret == -EAGAIN )
+                        a->preempted = 1;
                     goto out;
+                }
             }
         }
     }
@@ -226,8 +248,13 @@ static void decrease_reservation(struct
             goto out;
         }
 
-        if ( unlikely(__copy_from_guest_offset(&gmfn, a->extent_list, i, 1)) )
+        j = __copy_from_guest_offset(&gmfn, a->extent_list, i, 1);
+        if ( unlikely(j) )
+        {
+            if ( (long)j == -EAGAIN )
+                a->preempted = 1;
             goto out;
+        }
 
         if ( tb_init_done )
         {
@@ -511,6 +538,7 @@ long do_memory_op(unsigned long cmd, XEN
     int rc, op;
     unsigned int address_bits;
     unsigned long start_extent;
+    unsigned long cfg_ret;
     struct xen_memory_reservation reservation;
     struct memop_args args;
     domid_t domid;
@@ -524,8 +552,13 @@ long do_memory_op(unsigned long cmd, XEN
     case XENMEM_populate_physmap:
         start_extent = cmd >> MEMOP_EXTENT_SHIFT;
 
-        if ( copy_from_guest(&reservation, arg, 1) )
+        cfg_ret = copy_from_guest(&reservation, arg, 1);
+        if ( unlikely(cfg_ret) )
+        {
+            if ( (long)cfg_ret == -EAGAIN )
+                return hypercall_create_continuation(__HYPERVISOR_memory_op, "lh", cmd, arg);
             return start_extent;
+        }
 
         /* Is size too large for us to encode a continuation? */
         if ( reservation.nr_extents > (ULONG_MAX >> MEMOP_EXTENT_SHIFT) )


* Re: Re: how to handle paged hypercall args?
  2010-11-11 14:33 ` Olaf Hering
@ 2010-11-11 20:08   ` Keir Fraser
  2010-11-11 20:34     ` Olaf Hering
  2010-11-12  9:45     ` Jan Beulich
  0 siblings, 2 replies; 39+ messages in thread
From: Keir Fraser @ 2010-11-11 20:08 UTC (permalink / raw)
  To: Olaf Hering, xen-devel

On 11/11/2010 14:33, "Olaf Hering" <olaf@aepfle.de> wrote:

> So is that an acceptable way to deal with the HVMCOPY_gfn_paged_out
> return codes from __hvm_copy?
> Or should I explore some different way, like spinning there and possibly
> letting other threads of execution make progress while waiting for the
> gfns to come back?

You can't just spin because Xen is not preemptible. If it were a single CPU
system for example, no other thread would ever run again. You have to 'spin'
via a preemptible loop that returns to guest context and then back into the
hypercall. Which appears to be what you're doing.
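
In sketch form, and assuming the -EAGAIN plumbing Olaf proposes (it is not
an existing interface), the pattern for a paged-out argument page looks
roughly like this:

    long do_memory_op_sketch(unsigned long cmd, XEN_GUEST_HANDLE(void) arg)
    {
        struct xen_memory_reservation reservation;
        unsigned long rc;

        rc = copy_from_guest(&reservation, arg, 1);
        if ( unlikely(rc) )
        {
            /* Proposed: copy_from_user_hvm() reports a paged-out gfn as -EAGAIN. */
            if ( (long)rc == -EAGAIN )
                /* Return to guest context and re-enter this hypercall later;
                 * by then the pager in dom0 has had a chance to run. */
                return hypercall_create_continuation(__HYPERVISOR_memory_op,
                                                     "lh", cmd, arg);
            return -EFAULT; /* genuine copy failure */
        }

        /* ... the actual work, itself preemptible via continuations ... */
        return 0;
    }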

 -- Keir


* Re: Re: how to handle paged hypercall args?
  2010-11-11 20:08   ` Keir Fraser
@ 2010-11-11 20:34     ` Olaf Hering
  2010-11-11 21:00       ` Keir Fraser
  2010-11-12  9:45     ` Jan Beulich
  1 sibling, 1 reply; 39+ messages in thread
From: Olaf Hering @ 2010-11-11 20:34 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

On Thu, Nov 11, Keir Fraser wrote:

> On 11/11/2010 14:33, "Olaf Hering" <olaf@aepfle.de> wrote:
> 
> > So is that an acceptable way to deal with the HVMCOPY_gfn_paged_out
> > return codes from __hvm_copy?
> > Or should I explore some different way, like spinning there and possibly
> > letting other threads of execution make progress while waiting for the
> > gfns to come back?
> 
> You can't just spin because Xen is not preemptible. If it were a single CPU
> system for example, no other thread would ever run again. You have to 'spin'
> via a preemptible loop that returns to guest context and then back into the
> hypercall. Which appears to be what you're doing.

Thanks for the answer.

It occurred to me that this is an issue for hypercalls made by the
guest. There are probably not that many in use. So it shouldn't be that
hard to audit which hypercalls the few drivers use and add some error
handling. Up to now, only do_memory_op had an issue.


Olaf


* Re: Re: how to handle paged hypercall args?
  2010-11-11 20:34     ` Olaf Hering
@ 2010-11-11 21:00       ` Keir Fraser
  0 siblings, 0 replies; 39+ messages in thread
From: Keir Fraser @ 2010-11-11 21:00 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

On 11/11/2010 20:34, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Thu, Nov 11, Keir Fraser wrote:
> 
>> On 11/11/2010 14:33, "Olaf Hering" <olaf@aepfle.de> wrote:
>> 
>>> So is that an acceptable way to deal with the HVMCOPY_gfn_paged_out
>>> return codes from __hvm_copy?
>>> Or should I explore some different way, like spinning there and possibly
>>> letting other threads of execution make progress while waiting for the
>>> gfns to come back?
>> 
>> You can't just spin because Xen is not preemptible. If it were a single CPU
>> system for example, no other thread would ever run again. You have to 'spin'
>> via a preemptible loop that returns to guest context and then back into the
>> hypercall. Which appears to be what you're doing.
> 
> Thanks for the answer.
> 
> It occurred to me that this is an issue for hypercalls made by the
> guest. There are probably not that many in use. So it shouldn't be that
> hard to audit which hypercalls the few drivers use and add some error
> handling. Up to now, only do_memory_op had an issue.

The only other thing I'd say is that, depending on how often this happens,
because paging in may well require a slow I/O operation, it may even be nice
to sleep the waiting vcpu rather than spin. That would require some mechanism
to record which vcpus are waiting for which mfns, and to check that list when
paging stuff in. I guess it's rather a 'phase 2' thing after things actually
work reliably!
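
Roughly, such bookkeeping might look like the sketch below. The list and
lock fields on struct domain, the function names, and the sleep/wake
primitives are all hypothetical here; it only illustrates the
record-then-check idea, per paged-out gfn:

    /* Hypothetical: one record per sleeping vcpu, hung off the domain. */
    struct paging_waiter {
        struct list_head list;
        struct vcpu     *v;
        unsigned long    gfn;
    };

    static void record_paging_waiter(struct domain *d, struct vcpu *v,
                                     unsigned long gfn)
    {
        struct paging_waiter *w = xmalloc(struct paging_waiter);

        if ( !w )
            return;                         /* fall back to retrying */
        w->v = v;
        w->gfn = gfn;
        spin_lock(&d->paging_waiter_lock);  /* hypothetical fields */
        list_add_tail(&w->list, &d->paging_waiters);
        spin_unlock(&d->paging_waiter_lock);
        /* ... put v to sleep here, with whatever primitive emerges ... */
    }

    /* Called from the page-in path once gfn is resident again. */
    static void wake_paging_waiters(struct domain *d, unsigned long gfn)
    {
        struct paging_waiter *w, *tmp;

        spin_lock(&d->paging_waiter_lock);
        list_for_each_entry_safe ( w, tmp, &d->paging_waiters, list )
        {
            if ( w->gfn != gfn )
                continue;
            list_del(&w->list);
            /* ... wake w->v ... */
            xfree(w);
        }
        spin_unlock(&d->paging_waiter_lock);
    }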

 -- Keir

> 
> Olaf
> 


* Re: Re: how to handle paged hypercall args?
  2010-11-11 20:08   ` Keir Fraser
  2010-11-11 20:34     ` Olaf Hering
@ 2010-11-12  9:45     ` Jan Beulich
  2010-11-12 10:22       ` Keir Fraser
  2010-11-15  9:37       ` Tim Deegan
  1 sibling, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2010-11-12  9:45 UTC (permalink / raw)
  To: Olaf Hering, Keir Fraser; +Cc: xen-devel

>>> On 11.11.10 at 21:08, Keir Fraser <keir@xen.org> wrote:
> On 11/11/2010 14:33, "Olaf Hering" <olaf@aepfle.de> wrote:
> 
>> So is that an acceptable way to deal with the HVMCOPY_gfn_paged_out
>> return codes from __hvm_copy?
>> Or should I explore some different way, like spinning there and possibly
>> letting other threads of execution make progress while waiting for the
>> gfns to come back?
> 
> You can't just spin because Xen is not preemptible. If it were a single CPU
> system for example, no other thread would ever run again. You have to 'spin'
> via a preemptible loop that returns to guest context and then back into the
> hypercall. Which appears to be what you're doing.

This works in the context of do_memory_op(), which already has
a way to encode a continuation. For other hypercalls (accessible
to HVM guests) this may not be as simple, and all of them are
potentially running into this same problem.

Furthermore, even for the do_memory_op() one, encoding a
continuation for a failure of copying in the arguments is clearly
acceptable (if no other solution can be found), but unwinding
the whole operation when copying out the results fails is at
least undesirable (and can lead to a live lock). So I think a
general (hopefully transparent to the individual hypercall
handlers) solution needs to be found, and a word on the
general issue from the original paging code authors (and their
thoughts of it when designing the whole thing) would be very
much appreciated.

Jan


* Re: Re: how to handle paged hypercall args?
  2010-11-12  9:45     ` Jan Beulich
@ 2010-11-12 10:22       ` Keir Fraser
  2010-11-12 10:47         ` Jan Beulich
  2010-11-15  9:37       ` Tim Deegan
  1 sibling, 1 reply; 39+ messages in thread
From: Keir Fraser @ 2010-11-12 10:22 UTC (permalink / raw)
  To: Jan Beulich, Olaf Hering; +Cc: xen-devel

On 12/11/2010 09:45, "Jan Beulich" <JBeulich@novell.com> wrote:

> Furthermore, even for the do_memory_op() one, encoding a
> continuation for a failure of copying in the arguments is clearly
> acceptable (if no other solution can be found), but unwinding
> the whole operation when copying out the results fails is at
> least undesirable (and can lead to a live lock). So I think a
> general (hopefully transparent to the individual hypercall
> handlers) solution needs to be found, and a word on the
> general issue from the original paging code authors (and their
> thoughts of it when designing the whole thing) would be very
> much appreciated.

We will at least have to enforce that no spinlocks are held during
copy_to/from_guest operations. That's easily enforced at least in debug
builds of course.
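
A debug-build check could be as simple as counting held spinlocks per CPU;
the wrapper macros and the counter below are purely hypothetical, sketched
only to show the idea:

    #ifndef NDEBUG
    DEFINE_PER_CPU(unsigned int, spinlock_depth);

    #define checked_spin_lock(l) \
        do { spin_lock(l); this_cpu(spinlock_depth)++; } while ( 0 )
    #define checked_spin_unlock(l) \
        do { this_cpu(spinlock_depth)--; spin_unlock(l); } while ( 0 )

    /* To be placed at the top of copy_to/from_guest for HVM guests. */
    #define ASSERT_NO_LOCKS_HELD() ASSERT(this_cpu(spinlock_depth) == 0)
    #else
    #define ASSERT_NO_LOCKS_HELD() ((void)0)
    #endif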

Beyond that, introducing some transparent mechanisms for sleeping in the
hypervisor -- mutexes, wait queues, and the like -- is actually fine with
me. Perhaps this will also help clean up the preemptible page-type-checking
logic that you had to do some heavy lifting on?

I'm happy to help work on the basic mechanism of this, if it's going to be
useful and widely used. I reckon I could get mutexes and wait queues going
in a couple of days. This would be the kind of framework that the paging
mechanisms should then properly be built on.

What do you think?

 -- Keir


* Re: Re: how to handle paged hypercall args?
  2010-11-12 10:22       ` Keir Fraser
@ 2010-11-12 10:47         ` Jan Beulich
  2010-11-12 14:32           ` Keir Fraser
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2010-11-12 10:47 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Olaf Hering, xen-devel

>>> On 12.11.10 at 11:22, Keir Fraser <keir@xen.org> wrote:
> On 12/11/2010 09:45, "Jan Beulich" <JBeulich@novell.com> wrote:
> 
>> Furthermore, even for the do_memory_op() one, encoding a
>> continuation for a failure of copying in the arguments is clearly
>> acceptable (if no other solution can be found), but unwinding
>> the whole operation when copying out the results fails is at
>> least undesirable (and can lead to a live lock). So I think a
>> general (hopefully transparent to the individual hypercall
>> handlers) solution needs to be found, and a word on the
>> general issue from the original paging code authors (and their
>> thoughts of it when designing the whole thing) would be very
>> much appreciated.
> 
> We will at least have to enforce that no spinlocks are held during
> copy_to/from_guest operations. That's easily enforced at least in debug
> builds of course.
> 
> Beyond that, introducing some transparent mechanisms for sleeping in the
> hypervisor -- mutexes, wait queues, and the like -- is actually fine with
> me. Perhaps this will also help clean up the preemptible page-type-checking
> logic that you had to do some heavy lifting on?

I'm not sure it would help there - this requires voluntary
preemption rather than synchronization. But perhaps it can be
built on top of this (or results as a side effect).

> I'm happy to help work on the basic mechanism of this, if it's going to be
> useful and widely used. I reckon I could get mutexes and wait queues going
> in a couple of days. This would be the kind of framework that the paging
> mechanisms should then properly be built on.
> 
> What do you think?

Sounds good, and you helping with this will be much appreciated
(Olaf - unless you had plans doing this yourself). Whether it's going
to be widely used I can't tell immediately - for the moment,
overcoming the paging problems seems like the only application.

Jan


* Re: Re: how to handle paged hypercall args?
  2010-11-12 10:47         ` Jan Beulich
@ 2010-11-12 14:32           ` Keir Fraser
  2010-11-15 13:12             ` Olaf Hering
  0 siblings, 1 reply; 39+ messages in thread
From: Keir Fraser @ 2010-11-12 14:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Olaf Hering, xen-devel

On 12/11/2010 10:47, "Jan Beulich" <JBeulich@novell.com> wrote:

>>>> On 12.11.10 at 11:22, Keir Fraser <keir@xen.org> wrote:
>> On 12/11/2010 09:45, "Jan Beulich" <JBeulich@novell.com> wrote:
>> 
>> Beyond that, introducing some transparent mechanisms for sleeping in the
>> hypervisor -- mutexes, wait queues, and the like -- is actually fine with
>> me. Perhaps this will also help clean up the preemptible page-type-checking
>> logic that you had to do some heavy lifting on?
> 
> I'm not sure it would help there - this requires voluntary
> preemption rather than synchronization. But perhaps it can be
> built on top of this (or results as a side effect).

Yes, voluntary preempt can be built from the same bits and pieces very
easily. I will provide that too, and I think some simplification to the
page-type functions and callers will result. No bad thing!

>> I'm happy to help work on the basic mechanism of this, if it's going to be
>> useful and widely used. I reckon I could get mutexes and wait queues going
>> in a couple of days. This would be the kind of framework that the paging
>> mechanisms should then properly be built on.
>> 
>> What do you think?
> 
> Sounds good, and you helping with this will be much appreciated
> (Olaf - unless you had plans doing this yourself). Whether it's going
> to be widely used I can't tell immediately - for the moment,
> overcoming the paging problems seems like the only application.

Yeah, I'll get on this.

 -- Keir

> Jan
> 


* Re: Re: how to handle paged hypercall args?
  2010-11-12  9:45     ` Jan Beulich
  2010-11-12 10:22       ` Keir Fraser
@ 2010-11-15  9:37       ` Tim Deegan
  2010-11-15  9:53         ` Jan Beulich
  1 sibling, 1 reply; 39+ messages in thread
From: Tim Deegan @ 2010-11-15  9:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Olaf Hering, xen-devel, Keir Fraser, Patrick Colp

At 09:45 +0000 on 12 Nov (1289555111), Jan Beulich wrote:
> Furthermore, even for the do_memory_op() one, encoding a
> continuation for a failure of copying in the arguments is clearly
> acceptable (if no other solution can be found), but unwinding
> the whole operation when copying out the results fails is at
> least undesirable (and can lead to a live lock). So I think a
> general (hopefully transparent to the individual hypercall
> handlers) solution needs to be found, and a word on the
> general issue from the original paging code authors (and their
> thoughts of it when designing the whole thing) would be very
> much appreciated.

Maybe Patrick can comment too, but my recollection of discussing this is
that we would have to propagate failures caused by paging at least as
far as the dom0 kernel, because otherwise a single-vcpu dom0 kernel
could deadlock with its one vcpu stuck in a hypercall (or continually
having it preempted and retried) and the paging binary that would
unstick it never getting scheduled.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)


* Re: Re: how to handle paged hypercall args?
  2010-11-15  9:37       ` Tim Deegan
@ 2010-11-15  9:53         ` Jan Beulich
  2010-11-15 10:09           ` Keir Fraser
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2010-11-15  9:53 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Olaf Hering, xen-devel, Keir Fraser, Patrick Colp

>>> On 15.11.10 at 10:37, Tim Deegan <Tim.Deegan@citrix.com> wrote:
> At 09:45 +0000 on 12 Nov (1289555111), Jan Beulich wrote:
>> Furthermore, even for the do_memory_op() one, encoding a
>> continuation for a failure of copying in the arguments is clearly
>> acceptable (if no other solution can be found), but unwinding
>> the whole operation when copying out the results fails is at
>> least undesirable (and can lead to a live lock). So I think a
>> general (hopefully transparent to the individual hypercall
>> handlers) solution needs to be found, and a word on the
>> general issue from the original paging code authors (and their
>> thoughts of it when designing the whole thing) would be very
>> much appreciated.
> 
> Maybe Patrick can comment too, but my recollection of discussing this is
> that we would have to propagate failures caused by paging at least as
> far as the dom0 kernel, because otherwise a single-vcpu dom0 kernel
> could deadlock with its one vcpu stuck in a hypercall (or continually
> having it preempted and retried) and the paging binary that would
> unstick it never getting scheduled.

How's Dom0 involved here? The hypercall arguments live in
guest memory.

Confused, Jan


* Re: Re: how to handle paged hypercall args?
  2010-11-15  9:53         ` Jan Beulich
@ 2010-11-15 10:09           ` Keir Fraser
  2010-11-15 10:20             ` Tim Deegan
  0 siblings, 1 reply; 39+ messages in thread
From: Keir Fraser @ 2010-11-15 10:09 UTC (permalink / raw)
  To: Jan Beulich, Tim Deegan; +Cc: Olaf Hering, xen-devel, Patrick Colp

On 15/11/2010 09:53, "Jan Beulich" <JBeulich@novell.com> wrote:

>> Maybe Patrick can comment too, but my recollection of discussing this is
>> that we would have to propagate failures caused by paging at least as
>> far as the dom0 kernel, because otherwise a single-vcpu dom0 kernel
>> could deadlock with its one vcpu stuck in a hypercall (or continually
>> having it preempted and retried) and the paging binary that would
>> unstick it never getting scheduled.
> 
> How's Dom0 involved here? The hypercall arguments live in
> guest memory.

Yes, and you'd never turn on paging for dom0 itself. That would never work!

Changing every user of the guest accessor macros to retry via guest space is
really not tenable. We'd never get all the bugs out.

 -- Keir


* Re: Re: how to handle paged hypercall args?
  2010-11-15 10:09           ` Keir Fraser
@ 2010-11-15 10:20             ` Tim Deegan
  2010-11-15 10:33               ` Keir Fraser
  0 siblings, 1 reply; 39+ messages in thread
From: Tim Deegan @ 2010-11-15 10:20 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Olaf Hering, xen-devel, Patrick Colp, Jan Beulich

At 10:09 +0000 on 15 Nov (1289815777), Keir Fraser wrote:
> On 15/11/2010 09:53, "Jan Beulich" <JBeulich@novell.com> wrote:
> 
> >> Maybe Patrick can comment too, but my recollection of discussing this is
> >> that we would have to propagate failures caused by paging at least as
> >> far as the dom0 kernel, because otherwise a single-vcpu dom0 kernel
> >> could deadlock with its one vcpu stuck in a hypercall (or continually
> >> having it preempted and retried) and the paging binary that would
> >> unstick it never getting scheduled.
> > 
> > How's Dom0 involved here? The hypercall arguments live in
> > guest memory.
> 
> Yes, and you'd never turn on paging for dom0 itself. That would never work!

:) No, the issue is if dom0 (or whichever dom the pager lives in) is
trying an operation on domU's memory that hits a paged-out page
(e.g. qemu or similar is mapping it) with its only vcpu - you can't
just block or spin.  You need to let dom0 schedule the pager process. 

> Changing every user of the guest accessor macros to retry via guest space is
> really not tenable. We'd never get all the bugs out.

Right now, I can't see another way of doing it.  Grants can be handled
by shadowing the guest grant table and pinning granted frames so the
block happens in domU (performance-- but you're already paging, right?)
but what about qemu, xenctx, save/restore...?

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)


* Re: Re: how to handle paged hypercall args?
  2010-11-15 10:20             ` Tim Deegan
@ 2010-11-15 10:33               ` Keir Fraser
  2010-11-15 10:49                 ` Tim Deegan
  0 siblings, 1 reply; 39+ messages in thread
From: Keir Fraser @ 2010-11-15 10:33 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Olaf Hering, xen-devel, Patrick Colp, Jan Beulich

On 15/11/2010 10:20, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:

>> Yes, and you'd never turn on paging for dom0 itself. That would never work!
> 
> :) No, the issue is if dom0 (or whichever dom the pager lives in) is
> trying an operation on domU's memory that hits a paged-out page
> (e.g. qemu or similar is mapping it) with its only vcpu - you can't
> just block or spin.  You need to let dom0 schedule the pager process.
> 
>> Changing every user of the guest accessor macros to retry via guest space is
>> really not tenable. We'd never get all the bugs out.
> 
> Right now, I can't see another way of doing it.  Grants can be handled
> by shadowing the guest grant table and pinning granted frames so the
> block happens in domU (performance-- but you're already paging, right?)
> but what about qemu, xenctx, save/restore...?

We're talking about copy_to/from_guest, and friends, here. They always
implicitly act on the local domain, so the issue you raise is not a problem
there. Dom0 mappings of domU memory are a separate issue, presumably already
considered and dealt with to some extent, no doubt.

 -- Keir


* Re: Re: how to handle paged hypercall args?
  2010-11-15 10:33               ` Keir Fraser
@ 2010-11-15 10:49                 ` Tim Deegan
  2010-11-15 11:55                   ` Keir Fraser
  0 siblings, 1 reply; 39+ messages in thread
From: Tim Deegan @ 2010-11-15 10:49 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Olaf Hering, xen-devel, Patrick Colp, Jan Beulich

At 10:33 +0000 on 15 Nov (1289817224), Keir Fraser wrote:
> On 15/11/2010 10:20, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:
> 
> >> Yes, and you'd never turn on paging for dom0 itself. That would never work!
> > 
> > :) No, the issue is if dom0 (or whichever dom the pager lives in) is
> > trying an operation on domU's memory that hits a paged-out page
> > (e.g. qemu or similar is mapping it) with its only vcpu - you can't
> > just block or spin.  You need to let dom0 schedule the pager process.
> > 
> >> Changing every user of the guest accessor macros to retry via guest space is
> >> really not tenable. We'd never get all the bugs out.
> > 
> > Right now, I can't see another way of doing it.  Grants can be handled
> > by shadowing the guest grant table and pinning granted frames so the
> > block happens in domU (performance-- but you're already paging, right?)
> > but what about qemu, xenctx, save/restore...?
> 
> We're talking about copy_to/from_guest, and friends, here.

Oh sorry, I had lost the context there. 

Yes, for those the plan was just to pause and retry, just like all other
cases where Xen needs to access guest memory.  We hadn't particularly
considered the case of large hypercall arguments that aren't all read
up-front.  How many cases of that are there?  A bit of reordering on the
memory-operation hypercalls could presumably let them be preempted and
restart further in mid-operation next time.  (IIRC the compat code
already does something like this).
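
For the memory ops, the existing preemption path already encodes progress
(the start_extent) in the command word, so a restart after a paged-out
argument page could reuse the same trick. A sketch, with the helper name
and the exact trigger condition being assumptions:

    static long process_extents_sketch(unsigned long op,
                                       unsigned long start_extent,
                                       struct xen_memory_reservation *r,
                                       XEN_GUEST_HANDLE(void) arg)
    {
        unsigned long i;

        for ( i = start_extent; i < r->nr_extents; i++ )
        {
            /* Preempted, or extent i's argument page found to be paged out:
             * encode how far we got and re-enter later at that extent. */
            if ( hypercall_preempt_check() )
                return hypercall_create_continuation(
                    __HYPERVISOR_memory_op, "lh",
                    op | (i << MEMOP_EXTENT_SHIFT), arg);

            /* ... process extent i ... */
        }

        return 0;
    }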

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)


* Re: Re: how to handle paged hypercall args?
  2010-11-15 10:49                 ` Tim Deegan
@ 2010-11-15 11:55                   ` Keir Fraser
  2010-11-15 12:04                     ` Tim Deegan
  0 siblings, 1 reply; 39+ messages in thread
From: Keir Fraser @ 2010-11-15 11:55 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Olaf Hering, xen-devel, Patrick Colp, Jan Beulich

On 15/11/2010 10:49, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:

>> We're talking about copy_to/from_guest, and friends, here.
> 
> Oh sorry, I had lost the context there.
> 
> Yes, for those the plan was just to pause and retry, just like all other
> cases where Xen needs to access guest memory.

Could you expand on what you mean by pause and retry? As that's what I think
should be implemented, and involves sleeping in hypervisor context afaics,
which has led us to the current point in the discussion.

> We hadn't particularly
> considered the case of large hypercall arguments that aren't all read
> up-front.  How many cases of that are there?  A bit of reordering on the
> memory-operation hypercalls could presuambly let them be preempted and
> restart further in mid-operation next time.  (IIRC the compat code
> already does something like this).

The issue is that there are hundreds of uses of the guest-accessor macros.
Every single one would need updating to handle the paged-out-so-retry case,
unless we can hide that *inside* the accessor macros themselves. It's a huge
job, not to mention the bug tail on rarely-executed error paths.

Consider also the copy_to_* writeback case at the end of a hypercall. You've
done the potentially non-idempotent work, you have some state cached in
hypervisor regs/stack/heap and want to push it out to guest memory. The
guest target memory is paged out. How do you encode the continuation for the
dozens of cases like this without tearing your hair out?

I suppose *maybe* you could check-and-pin all memory that might be accessed
before the meat of a hypercall begins. That seems a fragile pain in the neck
too however.

 -- Keir


* Re: Re: how to handle paged hypercall args?
  2010-11-15 11:55                   ` Keir Fraser
@ 2010-11-15 12:04                     ` Tim Deegan
  2010-11-15 12:17                       ` Keir Fraser
  0 siblings, 1 reply; 39+ messages in thread
From: Tim Deegan @ 2010-11-15 12:04 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Olaf Hering, xen-devel, Patrick Colp, Jan Beulich

At 11:55 +0000 on 15 Nov (1289822118), Keir Fraser wrote:
> The issue is that there are hundreds of uses of the guest-accessor macros.
> Every single one would need updating to handle the paged-out-so-retry case,
> unless we can hide that *inside* the accessor macros themselves. It's a huge
> job, not to mention the bug tail on rarely-executed error paths.

Right, I see.  You're suggesting that we code up a sort of setjmp() that
can be called in the __copy function, which will deschedule the vcpu and
allow it to be rescheduled back where it was.  Sounds ideal.  Will it
need per-vcpu stacks? (and will they, in turn, use order>0 allocations? :))

We'll have to audit the __copy functions to make sure they're not called
with locks held.  Sounds more fun than the alternative, I guess.

I think the ioreq code would be another candidate for tidying up if we
had such a mechanism.  Presumably some of the current users of
hypercall_create_continuation() would benefit too.

> Consider also the copy_to_* writeback case at the end of a hypercall. You've
> done the potentially non-idempotent work, you have some state cached in
> hypervisor regs/stack/heap and want to push it out to guest memory. The
> guest target memory is paged out. How do you encode the continuation for the
> dozens of cases like this without tearing your hair out?
> 
> I suppose *maybe* you could check-and-pin all memory that might be accessed
> before the meat of a hypercall begins. That seems a fragile pain in the neck
> too however.

Good point.

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)


* Re: Re: how to handle paged hypercall args?
  2010-11-15 12:04                     ` Tim Deegan
@ 2010-11-15 12:17                       ` Keir Fraser
  2010-12-03  9:03                         ` Olaf Hering
  0 siblings, 1 reply; 39+ messages in thread
From: Keir Fraser @ 2010-11-15 12:17 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Olaf Hering, xen-devel, Patrick Colp, Jan Beulich

On 15/11/2010 12:04, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:

> At 11:55 +0000 on 15 Nov (1289822118), Keir Fraser wrote:
>> The issue is that there are hundreds of uses of the guest-accessor macros.
>> Every single one would need updating to handle the paged-out-so-retry case,
>> unless we can hide that *inside* the accessor macros themselves. It's a huge
>> job, not to mention the bug tail on rarely-executed error paths.
> 
> Right, I see.  You're suggesting that we code up a sort of setjmp() that
> can be called in the __copy function, which will deschedule the vcpu and
> allow it to be rescheduled back where it was.  Sounds ideal.

Exactly so.

> Will it
> need per-vcpu stacks? (and will they, in turn, use order>0 allocations? :))

Of a sort. I propose to keep the per-pcpu stacks and then copy context
to/from a per-vcpu memory area for the setjmp-like behaviour. Guest call
stacks won't be very deep -- I reckon a 1kB or 2kB per-vcpu area will
suffice.

In some ways this is a backwards version of the Linux stack-handling logic,
which has a proper per-task kernel stack which is of moderate size (4kB?).
Then it has per-cpu irq stacks which are larger to deal with deep irq
nesting. We will have proper per-cpu hypervisor stacks of sufficient size to
deal with guest and irq state -- our per-vcpu 'shadow stack' will then be
the special case and only of small/moderate size to deal with shallow guest
call stacks.
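
As a rough userspace analogy of that shadow-stack scheme (plain C using
ucontext, standing in for the arch-specific context save/restore Xen would
actually use; none of this is Xen code): the "hypercall" runs on its own
small stack, swaps back to the "scheduler" when it would block, and is
later resumed exactly where it left off.

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t scheduler_ctx, hcall_ctx;
    static char hcall_stack[64 * 1024];          /* the "per-vcpu" area */

    static void hypercall(void)
    {
        printf("hypercall: argument page is paged out, sleeping\n");
        /* Save this context (stack included) and go back to the scheduler. */
        swapcontext(&hcall_ctx, &scheduler_ctx);
        /* Resumes exactly here once the scheduler switches back. */
        printf("hypercall: page is resident again, finishing the copy\n");
    }

    int main(void)
    {
        getcontext(&hcall_ctx);
        hcall_ctx.uc_stack.ss_sp = hcall_stack;
        hcall_ctx.uc_stack.ss_size = sizeof(hcall_stack);
        hcall_ctx.uc_link = &scheduler_ctx;      /* where to go when done */
        makecontext(&hcall_ctx, hypercall, 0);

        swapcontext(&scheduler_ctx, &hcall_ctx); /* run the "hypercall" */
        printf("scheduler: pager brings the page back in\n");
        swapcontext(&scheduler_ctx, &hcall_ctx); /* resume the "hypercall" */
        printf("scheduler: hypercall completed\n");
        return 0;
    }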

> We'll have to audit the __copy functions to make sure they're not called
> with locks held.  Sounds more fun than the alternative, I guess.

Exactly so. Best of a bad set of options. At least we can run-time assert
this and it's not error-path only.

> I think the ioreq code would be another candidate for tidying up if we
> had such a mechanism.  Presumably some of the current users of
> hypercall_create_continuation() would benefit too.

Yeah, it needs a dash of thought but I think we will be able to move in this
direction.

 -- Keir

>> Consider also the copy_to_* writeback case at the end of a hypercall. You've
>> done the potentially non-idempotent work, you have some state cached in
>> hypervisor regs/stack/heap and want to push it out to guest memory. The
>> guest target memory is paged out. How do you encode the continuation for the
>> dozens of cases like this without tearing your hair out?
>> 
>> I suppose *maybe* you could check-and-pin all memory that might be accessed
>> before the meat of a hypercall begins. That seems a fragile pain in the neck
>> too however.
> 
> Good point.
> 
> Tim.


* Re: Re: how to handle paged hypercall args?
  2010-11-12 14:32           ` Keir Fraser
@ 2010-11-15 13:12             ` Olaf Hering
  2010-11-17 16:52               ` Keir Fraser
  0 siblings, 1 reply; 39+ messages in thread
From: Olaf Hering @ 2010-11-15 13:12 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Jan Beulich

On Fri, Nov 12, Keir Fraser wrote:

> On 12/11/2010 10:47, "Jan Beulich" <JBeulich@novell.com> wrote:
> > Sounds good, and you helping with this will be much appreciated
> > (Olaf - unless you had plans doing this yourself). Whether it's going
> > to be widely used I can't tell immediately - for the moment,
> > overcoming the paging problems seems like the only application.
> 
> Yeah, I'll get on this.

Sorry for being late here.

I'm glad you volunteer for this task.


Olaf


* Re: Re: how to handle paged hypercall args?
  2010-11-15 13:12             ` Olaf Hering
@ 2010-11-17 16:52               ` Keir Fraser
  2010-11-18 12:33                 ` Keir Fraser
  0 siblings, 1 reply; 39+ messages in thread
From: Keir Fraser @ 2010-11-17 16:52 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, Jan Beulich

On 15/11/2010 13:12, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Fri, Nov 12, Keir Fraser wrote:
> 
>> On 12/11/2010 10:47, "Jan Beulich" <JBeulich@novell.com> wrote:
>>> Sounds good, and you helping with this will be much appreciated
>>> (Olaf - unless you had plans doing this yourself). Whether it's going
>>> to be widely used I can't tell immediately - for the moment,
>>> overcoming the paging problems seems like the only application.
>> 
>> Yeah, I'll get on this.
> 
> Sorry for being late here.
> 
> I'm glad you volunteer for this task.

The basis of what you need is checked in as xen-unstable:22396. You can
include <xen/wait.h> and you get an interface like a very simplified version
of Linux waitqueues. There are still some details to be worked out but it
basically works as-is and you can start using it now.

The one big cleanup/audit we will need is that all callers of __hvm_copy()
(which ends up being all HVM guest callers of the copy_to/from_guest*
macros) must not hold any locks. This is because you are going to modify
__hvm_copy() such that it may sleep. Probably you should
ASSERT(!in_atomic()) at the top of __hvm_copy(), and go from there. :-)
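
In other words, the wrappers around __hvm_copy() end up looking roughly
like the sketch below. Where the queue lives, and which call the page-in
path uses to wake it, are still to be decided; a single global queue is
shown purely for illustration:

    #include <xen/wait.h>

    static struct waitqueue_head paging_wq; /* placement is an open question */

    void paging_wq_init(void)
    {
        init_waitqueue_head(&paging_wq);
    }

    enum hvm_copy_result hvm_copy_from_guest_phys_sketch(
        void *buf, paddr_t paddr, int size)
    {
        enum hvm_copy_result res;

        ASSERT(!in_atomic());  /* callers must not hold locks: we may sleep */
        wait_event(paging_wq,
                   (res = __hvm_copy(buf, paddr, size,
                                     HVMCOPY_from_guest | HVMCOPY_fault |
                                     HVMCOPY_phys, 0))
                   != HVMCOPY_gfn_paged_out);
        return res;
    }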

 -- Keir

> 
> Olaf
> 


* Re: Re: how to handle paged hypercall args?
  2010-11-17 16:52               ` Keir Fraser
@ 2010-11-18 12:33                 ` Keir Fraser
  2010-11-18 13:51                   ` Olaf Hering
                                     ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Keir Fraser @ 2010-11-18 12:33 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, Jan Beulich

On 17/11/2010 16:52, "Keir Fraser" <keir@xen.org> wrote:

> On 15/11/2010 13:12, "Olaf Hering" <olaf@aepfle.de> wrote:
> 
>> Sorry for being late here.
>> 
>> I'm glad you volunteer for this task.
> 
> The basis of what you need is checked in as xen-unstable:22396. You can
> include <xen/wait.h> and you get an interface like a very simplified version
> of Linux waitqueues. There are still some details to be worked out but it
> basically works as-is and you can start using it now.
> 
> The one big cleanup/audit we will need is that all callers of __hvm_copy()
> (which ends up being all HVM guest callers of the copy_to/from_guest*
> macros) must not hold any locks. This is because you are going to modify
> __hvm_copy() such that it may sleep. Probably you should
> ASSERT(!in_atomic()) at the top of __hvm_copy(), and go from there. :-)

I've done something along these lines now as xen-unstable:22402. It actually
seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
now.

 -- Keir

>  -- Keir
> 
>> 
>> Olaf
>> 
> 
> 


* Re: Re: how to handle paged hypercall args?
  2010-11-18 12:33                 ` Keir Fraser
@ 2010-11-18 13:51                   ` Olaf Hering
  2010-12-02 10:11                   ` Olaf Hering
  2010-12-02 10:18                   ` Olaf Hering
  2 siblings, 0 replies; 39+ messages in thread
From: Olaf Hering @ 2010-11-18 13:51 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Jan Beulich

On Thu, Nov 18, Keir Fraser wrote:

> I've done something along these lines now as xen-unstable:22402. It actually
> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
> now.

Thanks a lot for your work, Keir!
I will get to it next week.

Olaf


* Re: Re: how to handle paged hypercall args?
  2010-11-18 12:33                 ` Keir Fraser
  2010-11-18 13:51                   ` Olaf Hering
@ 2010-12-02 10:11                   ` Olaf Hering
  2010-12-02 10:22                     ` Keir Fraser
  2010-12-02 10:25                     ` Jan Beulich
  2010-12-02 10:18                   ` Olaf Hering
  2 siblings, 2 replies; 39+ messages in thread
From: Olaf Hering @ 2010-12-02 10:11 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Jan Beulich

On Thu, Nov 18, Keir Fraser wrote:

> I've done something along these lines now as xen-unstable:22402. It actually
> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
> now.

This is my first attempt to do it.
It crashed Xen on the very first try in a spectacular way. But it
happened only once for some reason.
See my other mail.


Olaf

--- xen-unstable.hg-4.1.22447.orig/xen/arch/x86/hvm/hvm.c
+++ xen-unstable.hg-4.1.22447/xen/arch/x86/hvm/hvm.c
@@ -1986,69 +1986,117 @@ static enum hvm_copy_result __hvm_copy(
 enum hvm_copy_result hvm_copy_to_guest_phys(
     paddr_t paddr, void *buf, int size)
 {
-    return __hvm_copy(buf, paddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, paddr, size,
                       HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_phys,
-                      0);
+                      0)) != HVMCOPY_gfn_paged_out);
+    return res;
 }
 
 enum hvm_copy_result hvm_copy_from_guest_phys(
     void *buf, paddr_t paddr, int size)
 {
-    return __hvm_copy(buf, paddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, paddr, size,
                       HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_phys,
-                      0);
+                      0)) != HVMCOPY_gfn_paged_out);
+    return res;
 }
 
 enum hvm_copy_result hvm_copy_to_guest_virt(
     unsigned long vaddr, void *buf, int size, uint32_t pfec)
 {
-    return __hvm_copy(buf, vaddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_virt,
-                      PFEC_page_present | PFEC_write_access | pfec);
+                      PFEC_page_present | PFEC_write_access | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }
 
 enum hvm_copy_result hvm_copy_from_guest_virt(
     void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
-    return __hvm_copy(buf, vaddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
-                      PFEC_page_present | pfec);
+                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }
 
 enum hvm_copy_result hvm_fetch_from_guest_virt(
     void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
     if ( hvm_nx_enabled(current) )
         pfec |= PFEC_insn_fetch;
-    return __hvm_copy(buf, vaddr, size,
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
-                      PFEC_page_present | pfec);
+                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }
 
 enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
     unsigned long vaddr, void *buf, int size, uint32_t pfec)
 {
-    return __hvm_copy(buf, vaddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_to_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-                      PFEC_page_present | PFEC_write_access | pfec);
+                      PFEC_page_present | PFEC_write_access | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }
 
 enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
     void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
-    return __hvm_copy(buf, vaddr, size,
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-                      PFEC_page_present | pfec);
+                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }
 
 enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
     void *buf, unsigned long vaddr, int size, uint32_t pfec)
 {
+    enum hvm_copy_result res;
+    struct waitqueue_head wq;
     if ( hvm_nx_enabled(current) )
         pfec |= PFEC_insn_fetch;
-    return __hvm_copy(buf, vaddr, size,
+    init_waitqueue_head(&wq);
+
+    wait_event(wq, (
+    res = __hvm_copy(buf, vaddr, size,
                       HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-                      PFEC_page_present | pfec);
+                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
+    return res;
 }
 
 unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)


* Re: Re: how to handle paged hypercall args?
  2010-11-18 12:33                 ` Keir Fraser
  2010-11-18 13:51                   ` Olaf Hering
  2010-12-02 10:11                   ` Olaf Hering
@ 2010-12-02 10:18                   ` Olaf Hering
  2010-12-02 10:25                     ` Keir Fraser
  2 siblings, 1 reply; 39+ messages in thread
From: Olaf Hering @ 2010-12-02 10:18 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Jan Beulich

On Thu, Nov 18, Keir Fraser wrote:

> I've done something along these lines now as xen-unstable:22402. It actually
> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
> now.

My first attempt with the patch I sent crashed like this.
Two threads run into a non-empty list:

prepare_to_wait
check_wakeup_from_wait

I could not reproduce this. Right now I'm running with a modified
xenpaging policy which pages just the pagetable gfns around gfn 0x1800.
But that almost stalls the guest due to the continuous paging.


Any ideas how this crash can happen?


Olaf

....................

Welcome to SUSE Linux Enterprise Server 11 SP1  (x86_64) - Kernel 2.6.32.24-20101117.152845-xen (console).


stein-schneider login: (XEN) memory.c:145:d0 Could not allocate order=9 extent: id=1 memflags=0 (2 of 4)
(XEN) memory.c:145:d0 Could not allocate order=9 extent: id=1 memflags=0 (0 of 3)
[  102.139380] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=108) nodename:backend/vbd/1/768
[  102.171632] (cdrom_is_type() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=95) type:0
[  102.209310] device vif1.0 entered promiscuous mode
[  102.221776] br0: port 2(vif1.0) entering forwarding state
[  102.490897] OLH gntdev_open(449) xend[5202]->qemu-dm[5324] i ffff8800f2420720 f ffff8800f1c2f980
[  102.733559] ip_tables: (C) 2000-2006 Netfilter Core Team
[  102.888335] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[  103.241995] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=108) nodename:backend/vbd/1/5632
[  103.274444] (cdrom_is_type() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=95) type:1
[  103.301481] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=110) is a cdrom
[  103.331978] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=112) xenstore wrote OK
[  103.362764] (cdrom_is_type() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=95) type:1
[  104.538376] (cdrom_add_media_watch() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=108) nodename:backend/vbd/1/832
[  104.570669] (cdrom_is_type() file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blkback/cdrom.c, line=95) type:0
[  112.401097] vif1.0: no IPv6 routers present
(XEN) HVM1: HVM Loader
(XEN) HVM1: Detected Xen v4.1.22433-20101126
(XEN) HVM1: CPU speed is 2667 MHz
(XEN) HVM1: Xenbus rings @0xfeffc000, event channel 5
(XEN) irq.c:258: Dom1 PCI link 0 changed 0 -> 5
(XEN) HVM1: PCI-ISA link 0 routed to IRQ5
(XEN) irq.c:258: Dom1 PCI link 1 changed 0 -> 10
(XEN) HVM1: PCI-ISA link 1 routed to IRQ10
(XEN) irq.c:258: Dom1 PCI link 2 changed 0 -> 11
(XEN) HVM1: PCI-ISA link 2 routed to IRQ11
(XEN) irq.c:258: Dom1 PCI link 3 changed 0 -> 5
(XEN) HVM1: PCI-ISA link 3 routed to IRQ5
(XEN) HVM1: pci dev 01:3 INTA->IRQ10
(XEN) HVM1: pci dev 03:0 INTA->IRQ5
(XEN) HVM1: pci dev 02:0 bar 10 size 02000000: f0000008
(XEN) HVM1: pci dev 03:0 bar 14 size 01000000: f2000008
(XEN) HVM1: pci dev 02:0 bar 14 size 00001000: f3000000
(XEN) HVM1: pci dev 03:0 bar 10 size 00000100: 0000c001
(XEN) HVM1: pci dev 01:1 bar 20 size 00000010: 0000c101
(XEN) HVM1: Multiprocessor initialisation:
(XEN) HVM1:  - CPU0 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM1:  - CPU1 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM1:  - CPU2 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM1:  - CPU3 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ... done.
(XEN) HVM1: Testing HVM environment:
(XEN) HVM1:  - REP INSB across page boundaries ... passed
(XEN) HVM1:  - GS base MSRs and SWAPGS ... passed
(XEN) HVM1: Passed 2 of 2 tests
(XEN) HVM1: Writing SMBIOS tables ...
(XEN) HVM1: Loading ROMBIOS ...
(XEN) HVM1: 9660 bytes of ROMBIOS high-memory extensions:
(XEN) HVM1:   Relocating to 0xfc000000-0xfc0025bc ... done
(XEN) HVM1: Creating MP tables ...
(XEN) HVM1: Loading Cirrus VGABIOS ...
(XEN) HVM1: Loading ACPI ...
(XEN) HVM1:  - Lo data: 000ea020-000ea04f
(XEN) HVM1:  - Hi data: fc002800-fc01291f
(XEN) HVM1: vm86 TSS at fc012c00
(XEN) HVM1: BIOS map:
(XEN) HVM1:  c0000-c8fff: VGA BIOS
(XEN) HVM1:  eb000-eb1d9: SMBIOS tables
(XEN) HVM1:  f0000-fffff: Main BIOS
(XEN) HVM1: E820 table:
(XEN) HVM1:  [00]: 00000000:00000000 - 00000000:0009e000: RAM
(XEN) HVM1:  [01]: 00000000:0009e000 - 00000000:0009fc00: RESERVED
(XEN) HVM1:  [02]: 00000000:0009fc00 - 00000000:000a0000: RESERVED
(XEN) HVM1:  HOLE: 00000000:000a0000 - 00000000:000e0000
(XEN) HVM1:  [03]: 00000000:000e0000 - 00000000:00100000: RESERVED
(XEN) HVM1:  [04]: 00000000:00100000 - 00000000:40000000: RAM
(XEN) HVM1:  HOLE: 00000000:40000000 - 00000000:fc000000
(XEN) HVM1:  [05]: 00000000:fc000000 - 00000001:00000000: RESERVED
(XEN) HVM1: Invoking ROMBIOS ...
(XEN) HVM1: $Revision: 1.221 $ $Date: 2008/12/07 17:32:29 $
(XEN) stdvga.c:147:d1 entering stdvga and caching modes
(XEN) HVM1: VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
(XEN) HVM1: Bochs BIOS - build: 06/23/99
(XEN) HVM1: $Revision: 1.221 $ $Date: 2008/12/07 17:32:29 $
(XEN) HVM1: Options: apmbios pcibios eltorito PMM 
(XEN) HVM1: 
(XEN) HVM1: ata0-0: PCHS=8322/16/63 translation=lba LCHS=522/255/63
(XEN) HVM1: ata0 master: QEMU HARDDISK ATA-7 Hard-Disk (4096 MBytes)
(XEN) HVM1: ata0-1: PCHS=16383/16/63 translation=lba LCHS=1024/255/63
(XEN) HVM1: ata0  slave: QEMU HARDDISK ATA-7 Hard-Disk (43008 MBytes)
(XEN) HVM1: ata1 master: QEMU DVD-ROM ATAPI-4 CD-Rom/DVD-Rom
(XEN) HVM1: IDE time out
(XEN) HVM1: 
(XEN) HVM1: 
(XEN) HVM1: 
(XEN) HVM1: Press F12 for boot menu.
(XEN) HVM1: 
(XEN) HVM1: Booting from Hard Disk...
(XEN) HVM1: Booting from 0000:7c00
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=82
(XEN) HVM1: int13_harddisk: function 08, unmapped device for ELDL=82
(XEN) HVM1: *** int 15h function AX=00c0, BX=0000 not yet supported!
(XEN) HVM1: *** int 15h function AX=ec00, BX=0002 not yet supported!
(XEN) HVM1: KBD: unsupported int 16h function 03
(XEN) HVM1: *** int 15h function AX=e980, BX=0000 not yet supported!
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=82
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=82
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=83
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=83
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=84
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=84
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=85
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=85
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=86
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=86
(XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=87
(XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=87
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 88
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 88
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 89
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 89
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8a
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8a
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8b
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8b
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8c
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8c
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8d
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8d
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8e
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8e
(XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8f
(XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8f
(XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x30
(XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x20
(XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x20
(XEN) irq.c:258: Dom1 PCI link 0 changed 5 -> 0
(XEN) irq.c:258: Dom1 PCI link 1 changed 10 -> 0
(XEN) irq.c:258: Dom1 PCI link 2 changed 11 -> 0
(XEN) irq.c:258: Dom1 PCI link 3 changed 5 -> 0
(XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t.
(XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t.
(XEN) irq.c:324: Dom1 callback via changed to PCI INTx Dev 0x03 IntA
[  165.316278] blkback: ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
[  165.330911]   alloc irq_desc for 886 on node 0
[  165.337115]   alloc kstat_irqs on node 0
[  165.351993] blkback: ring-ref 9, event-channel 10, protocol 1 (x86_64-abi)
[  165.366824]   alloc irq_desc for 887 on node 0
[  165.372089]   alloc kstat_irqs on node 0
[  165.387424] blkback: ring-ref 10, event-channel 11, protocol 1 (x86_64-abi)
[  165.402453]   alloc irq_desc for 888 on node 0
[  165.409108]   alloc kstat_irqs on node 0
(XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t.
[  168.016706]   alloc irq_desc for 889 on node 0
[  168.020103]   alloc kstat_irqs on node 0
(XEN) Xen BUG at wait.c:118
(XEN) Assertion 'list_empty(&wqv->list)' failed at wait.c:130
(XEN) Debugging connection not set up.
(XEN) Debugging connection not set up.
(XEN) ----[ Xen-4.1.22433-20101126.164804  x86_64  debug=y  Tainted:    C ]----
(XEN) ----[ Xen-4.1.22433-20101126.164804  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    1
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff82c4801285c1>]RIP:    e008:[<ffff82c4801283ab>] prepare_to_wait+0xf0/0x10f
(XEN) RFLAGS: 0000000000010212    check_wakeup_from_wait+0x27/0x61CONTEXT: hypervisor
(XEN) 
(XEN) RFLAGS: 0000000000010293   rax: 0000000000000000   rbx: ffff8301337cd010   rcx: 0000000000000968
(XEN) CONTEXT: hypervisor
(XEN) rdx: ffff83013e737f18   rsi: ffff83013e7375b0   rdi: ffff8301337cd030
(XEN) rax: ffff8301337cd620   rbx: ffff830012b72000   rcx: 0000000000000000
(XEN) rbp: ffff83013e737648   rsp: ffff83013e737628   r8:  ffff830138439f60
(XEN) rdx: ffff83013e707f18   rsi: 0000000000000003   rdi: ffff830012b73860
(XEN) r9:  000000000011622f   r10: ffff83013e737950   r11: ffffffff8101f230
(XEN) rbp: ffff83013e707cb0   rsp: ffff83013e707cb0   r8:  0000000000000013
(XEN) r12: ffff83013e737668   r13: ffff8301337cd010   r14: ffff830012b74000
(XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
(XEN) r15: ffffffffff5fb300   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) r12: 0000000000000003   r13: ffff8300bf2fa000   r14: 0000002e80873e97
(XEN) cr3: 000000013444c000   cr2: ffffe8ffffc00000
(XEN) r15: ffff830012b72000   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) cr3: 00000001347e5000   cr2: 00007f1c79acf000
(XEN) Xen stack trace from rsp=ffff83013e737628:
(XEN)   ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN)  0000000000000004Xen stack trace from rsp=ffff83013e707cb0:
(XEN)    0000000000000003 ffff83013e707cf0 0000000000000004 ffff82c4801ab935 ffff83013e7379b0 ffff83013e707d10
(XEN)    ffff830012b72000 ffff83013e7376b8
(XEN)    ffff82c4801ac622 0000000000000003 ffff83013e7376f0 ffff8300bf2fa000 ffff83013e737668 0000002e80873e97
(XEN)    ffff83013e70c040 000000000fff0001
(XEN)    ffff8301337cd010 ffff83013e707d10 ffff8301337cd010 ffff82c4801c29eb 0000000000000004
(XEN)    0000002e80873e97 000000033e737bc8 ffff830012b72000 0000000000000000
(XEN)    0000000000000003 ffff83013e707e10 0000000000000004 ffff82c480157913
(XEN)    0000000000000008 ffff83013e737bc8 ffff830012b74000 0000000000000000 ffff83013e737728
(XEN)    ffff82c4801a6460 0000000000000000
(XEN)    0000000000000000 ffff83013e7376f8 0000000000000002 0000000000000001 0000002e80873cb4 ffff83013e7379b0
(XEN)    000000000000008b ffff82c4802e58c0
(XEN)    ffff82c4802e58c0 00000000fee00310 ffff83013e707dc0 0000000000000001 0000000000000282 ffffffffff5fb300
(XEN)    0000000000000089 ffff82c48014d4a1
(XEN)    0000002e80873e97 0000000000000000 ffff83013e707e40 0000000000000000 0000000000000000 00000003c18f8247 0000000000000000
(XEN)   
(XEN)    ffff83013e77b5c0 ffff83013e737b28 0000000000000000 ffff82c48019179c ffff83013e707e00 ffff83013e737748 ffff82c48017a38b ffff82c480175d52
(XEN)   
(XEN)    0000000000000392 ffff83013e737768 0000003a19e31b9c 0000000280175df4 0000000000000000 000000003e6295a0 0000000000000000 ffff8301388e6f50
(XEN)   
(XEN)    0000000000000000 00007cfec18c8867 0000000000000000 ffff82c4802587c0 ffff83013e707e10 ffff83013e737bc8 ffff830012b72000 0000000012b74000
(XEN)   
(XEN)    ffff8300bf2fa000 000000d600d60001 0000000000000001 ffff830138092000 0000002e80873e97 0000000000000000 ffff83013e70c040 25d68301388e6f50
(XEN)   
(XEN)    ffff83013e707e90 0000000000000005 ffff82c480120fbe 00000000000ca8e4 ffff83013e707e40 000000000011622f 0000002e80873e97 0000000000000011
(XEN)   
(XEN)    ffff83013e70c100 0000000139401004 ffff83013e767dd8 0000000000000000 ffff830012b72000 ffff83013e7377e8 0000000001c9c380 00ff82c480175d52
(XEN)   
(XEN)    ffff83013e707e00 ffff83013e737800 ffff83013e70c100 ffff82c480175df4 ffff83013e767f20 ffff82c480122204 0000000000000003 ffff8301388e6f50
(XEN)   
(XEN)    0000000000000003 00000004388e6f50 ffff82c4802b3f00 0000000800000008 ffff83013e707f18 0000000000000000 ffffffffffffffff 00000004ffffffff
(XEN)   
(XEN)    ffff83013e707ed0 0000000400000001 ffff82c4801220d7 ffff82c4802022e9 ffff82c4802b3f00 ffff83013e7378c8 ffff83013e707f18 ffff83013e737888
(XEN)   
(XEN)    ffff82c48025dbe0 0000000000000010 ffff83013e707f18 0000000300000000 0000002e805c506e ffff83013e73793c ffff83013e70c040 000000000000180a
(XEN)   
(XEN)    0000000100000000 ffff83013e707ee0 0000000000000003 ffff82c480122152 00000000388e6f50 ffff83013e707f10 000000000000000a ffff82c480155619
(XEN)   
(XEN)    ffff83013942d000 0000000000000000 0000000000119285 ffff8300bf2fa000 ffff83013e737998 0000000000000003 0000000000000000
(XEN)  ffff8300bf2f6000Xen call trace:
(XEN)    
(XEN)   [<ffff82c4801285c1>] ffff83013e707d38 prepare_to_wait+0xf0/0x10f
(XEN)     0000000000000000[<ffff82c4801ac622>] 0000000000000000 hvm_copy_to_guest_virt+0x65/0xb0
(XEN)     0000000000000000[<ffff82c4801a6460>]
(XEN) Xen call trace:
(XEN)     hvmemul_write+0x113/0x1a2
(XEN)    [<ffff82c4801283ab>][<ffff82c48019179c>] check_wakeup_from_wait+0x27/0x61
(XEN)    [<ffff82c4801ab935>] x86_emulate+0xe296/0x110ae
(XEN)    [<ffff82c4801a563f>] hvm_do_resume+0x29/0x1aa
(XEN)    [<ffff82c4801c29eb>] hvm_emulate_one+0x103/0x192
(XEN)    [<ffff82c4801b08ee>] vmx_do_resume+0x1bc/0x1db
(XEN)    [<ffff82c480157913>] handle_mmio+0x4e/0x17d
(XEN)    [<ffff82c4801c844f>] context_switch+0xdbf/0xddb
(XEN)    [<ffff82c480120fbe>] vmx_vmexit_handler+0x173f/0x1d2c
(XEN)    
(XEN)  schedule+0x5f3/0x619
(XEN)    
(XEN) ****************************************
(XEN) [<ffff82c4801220d7>]Panic on CPU 1:
(XEN)  __do_softirq+0x88/0x99
(XEN)    Xen BUG at wait.c:118
(XEN) [<ffff82c480122152>]****************************************
(XEN) 
(XEN)  do_softirq+0x6a/0x7a
(XEN)    Reboot in five seconds...
(XEN) [<ffff82c480155619>]Debugging connection not set up.
(XEN)  idle_loop+0x64/0x66
(XEN)    Resetting with ACPI MEMORY or I/O RESET_REG.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: how to handle paged hypercall args?
  2010-12-02 10:11                   ` Olaf Hering
@ 2010-12-02 10:22                     ` Keir Fraser
  2010-12-02 10:30                       ` Keir Fraser
  2010-12-02 10:25                     ` Jan Beulich
  1 sibling, 1 reply; 39+ messages in thread
From: Keir Fraser @ 2010-12-02 10:22 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, Jan Beulich

On 02/12/2010 10:11, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Thu, Nov 18, Keir Fraser wrote:
> 
>> I've done something along these lines now as xen-unstable:22402. It actually
>> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
>> now.
> 
> This is my first attempt to do it.
> It crashed Xen on the very first try in a spectacular way. But it
> happened only once for some reason.
> See my other mail.

Firstly, the usage of waitqueues is broken. The waitqueue_head should be
shared with code that pages in, so that vcpus can be *woken* at some point
after they start waiting. As it is, if a vcpu does sleep on its local
waitqueue_head, it will never wake. You might start with a single global
waitqueue_head and wake everyone on it every time a page (or maybe page
batch) is paged in. More sophisticated might be to hash page numbers into an
array of waitqueue_heads, to reduce false wakeups. This is all similar to
Linux waitqueues by the way -- your current code would be just as broken in
Linux as it is in Xen.
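
To make the hashed variant concrete, something like this is what I have in
mind (just a sketch: the bucket count and every name below are made up, not
existing Xen code):

#define PAGED_GFN_BUCKETS 64
static struct waitqueue_head paged_gfn_wq[PAGED_GFN_BUCKETS];

/* Each head is set up once with init_waitqueue_head() at boot. */
static struct waitqueue_head *paged_gfn_waitqueue(unsigned long gfn)
{
    return &paged_gfn_wq[gfn % PAGED_GFN_BUCKETS];
}

/* The page-in path then calls wake_up(paged_gfn_waitqueue(gfn)) for the
 * gfn it just populated, which cuts down false wakeups compared with a
 * single global head. */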

Secondly, you should be able to hide the waiting inside __hvm_copy(). I
doubt you really need to touch the callers.
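
Roughly, hiding it could look like this (again just a sketch: wq_paged is an
assumed per-domain waitqueue head, and __hvm_copy_body() stands for the
existing body of __hvm_copy(), renamed):

static enum hvm_copy_result __hvm_copy(
    void *buf, paddr_t addr, int size, unsigned int flags, uint32_t pfec)
{
    struct domain *d = current->domain;
    enum hvm_copy_result res;

    /* Re-run the copy each time the page-in path wakes the shared head;
     * wait_event() re-evaluates its condition after every wake_up(). */
    wait_event(d->wq_paged,
               (res = __hvm_copy_body(buf, addr, size, flags, pfec))
               != HVMCOPY_gfn_paged_out);
    return res;
}

The _nofault variants would presumably still want to return straight away
rather than sleep, but that is a detail.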

 -- Keir

> 
> Olaf
> 
> --- xen-unstable.hg-4.1.22447.orig/xen/arch/x86/hvm/hvm.c
> +++ xen-unstable.hg-4.1.22447/xen/arch/x86/hvm/hvm.c
> @@ -1986,69 +1986,117 @@ static enum hvm_copy_result __hvm_copy(
>  enum hvm_copy_result hvm_copy_to_guest_phys(
>      paddr_t paddr, void *buf, int size)
>  {
> -    return __hvm_copy(buf, paddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +    res = __hvm_copy(buf, paddr, size,
>                        HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_phys,
> -                      0);
> +                      0)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>  
>  enum hvm_copy_result hvm_copy_from_guest_phys(
>      void *buf, paddr_t paddr, int size)
>  {
> -    return __hvm_copy(buf, paddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +    res = __hvm_copy(buf, paddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_phys,
> -                      0);
> +                      0)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>  
>  enum hvm_copy_result hvm_copy_to_guest_virt(
>      unsigned long vaddr, void *buf, int size, uint32_t pfec)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +    res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_virt,
> -                      PFEC_page_present | PFEC_write_access | pfec);
> +                      PFEC_page_present | PFEC_write_access | pfec)) !=
> HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>  
>  enum hvm_copy_result hvm_copy_from_guest_virt(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +    res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec);
> +                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>  
>  enum hvm_copy_result hvm_fetch_from_guest_virt(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec)
>  {
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
>      if ( hvm_nx_enabled(current) )
>          pfec |= PFEC_insn_fetch;
> -    return __hvm_copy(buf, vaddr, size,
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +    res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec);
> +                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>  
>  enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
>      unsigned long vaddr, void *buf, int size, uint32_t pfec)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +    res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_to_guest | HVMCOPY_no_fault | HVMCOPY_virt,
> -                      PFEC_page_present | PFEC_write_access | pfec);
> +                      PFEC_page_present | PFEC_write_access | pfec)) !=
> HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>  
>  enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +    res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec);
> +                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>  
>  enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec)
>  {
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
>      if ( hvm_nx_enabled(current) )
>          pfec |= PFEC_insn_fetch;
> -    return __hvm_copy(buf, vaddr, size,
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +    res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec);
> +                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>  
>  unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: how to handle paged hypercall args?
  2010-12-02 10:11                   ` Olaf Hering
  2010-12-02 10:22                     ` Keir Fraser
@ 2010-12-02 10:25                     ` Jan Beulich
  2010-12-03  9:06                       ` Olaf Hering
  1 sibling, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2010-12-02 10:25 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, Keir Fraser

>>> On 02.12.10 at 11:11, Olaf Hering <olaf@aepfle.de> wrote:
> On Thu, Nov 18, Keir Fraser wrote:
> 
>> I've done something along these lines now as xen-unstable:22402. It actually
>> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
>> now.
> 
> This is my first attempt to do it.

I didn't look in detail whether that's being done in a non-intuitive
way elsewhere, but I can't see how the event you're waiting on
would ever get signaled - wouldn't you need to pass it into
__hvm_copy() and further down from there?

Jan

> It crashed Xen on the very first try in a spectacular way. But it
> happened only once for some reason.
> See my other mail.
> 
> 
> Olaf
> 
> --- xen-unstable.hg-4.1.22447.orig/xen/arch/x86/hvm/hvm.c
> +++ xen-unstable.hg-4.1.22447/xen/arch/x86/hvm/hvm.c
> @@ -1986,69 +1986,117 @@ static enum hvm_copy_result __hvm_copy(
>  enum hvm_copy_result hvm_copy_to_guest_phys(
>      paddr_t paddr, void *buf, int size)
>  {
> -    return __hvm_copy(buf, paddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +    res = __hvm_copy(buf, paddr, size,
>                        HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_phys,
> -                      0);
> +                      0)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>  
>  enum hvm_copy_result hvm_copy_from_guest_phys(
>      void *buf, paddr_t paddr, int size)
>  {
> -    return __hvm_copy(buf, paddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +    res = __hvm_copy(buf, paddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_phys,
> -                      0);
> +                      0)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>  
>  enum hvm_copy_result hvm_copy_to_guest_virt(
>      unsigned long vaddr, void *buf, int size, uint32_t pfec)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +    res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_virt,
> -                      PFEC_page_present | PFEC_write_access | pfec);
> +                      PFEC_page_present | PFEC_write_access | pfec)) != 
> HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>  
>  enum hvm_copy_result hvm_copy_from_guest_virt(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +    res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec);
> +                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>  
>  enum hvm_copy_result hvm_fetch_from_guest_virt(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec)
>  {
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
>      if ( hvm_nx_enabled(current) )
>          pfec |= PFEC_insn_fetch;
> -    return __hvm_copy(buf, vaddr, size,
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +    res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec);
> +                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>  
>  enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
>      unsigned long vaddr, void *buf, int size, uint32_t pfec)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +    res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_to_guest | HVMCOPY_no_fault | HVMCOPY_virt,
> -                      PFEC_page_present | PFEC_write_access | pfec);
> +                      PFEC_page_present | PFEC_write_access | pfec)) != 
> HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>  
>  enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +    res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec);
> +                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>  
>  enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec)
>  {
> +    enum hvm_copy_result res;
> +    struct waitqueue_head wq;
>      if ( hvm_nx_enabled(current) )
>          pfec |= PFEC_insn_fetch;
> -    return __hvm_copy(buf, vaddr, size,
> +    init_waitqueue_head(&wq);
> +
> +    wait_event(wq, (
> +    res = __hvm_copy(buf, vaddr, size,
>                        HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec);
> +                      PFEC_page_present | pfec)) != HVMCOPY_gfn_paged_out);
> +    return res;
>  }
>  
>  unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int 
> len)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: how to handle paged hypercall args?
  2010-12-02 10:18                   ` Olaf Hering
@ 2010-12-02 10:25                     ` Keir Fraser
  2010-12-07  9:25                       ` Olaf Hering
  0 siblings, 1 reply; 39+ messages in thread
From: Keir Fraser @ 2010-12-02 10:25 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, Jan Beulich

On 02/12/2010 10:18, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Thu, Nov 18, Keir Fraser wrote:
> 
>> I've done something along these lines now as xen-unstable:22402. It actually
>> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
>> now.
> 
> My first attempt with the patch I sent crashed like this.
> Two threads run into a non-empty list:
> 
> prepare_to_wait
> check_wakeup_from_wait
> 
> I could not reproduce this. Right now I'm running with a modified
> xenpaging policy which pages just the pagetable gfns around gfn 0x1800.
> But that almost stalls the guest due to the continuous paging.
> 
> Any ideas how this crash can happen?

Since your current patch is conceptually quite broken anyway, there is
little point in chasing down the crash. It might have something to do with
allocating the waitqueue_head on the local stack -- which you would never
want to do in a correct usage of waitqueues. So, back to square one and try
again I'm afraid.

 -- Keir

> Olaf
> 
> ....................
> 
> Welcome to SUSE Linux Enterprise Server 11 SP1  (x86_64) - Kernel
> 2.6.32.24-20101117.152845-xen (console).
> 
> 
> stein-schneider login: (XEN) memory.c:145:d0 Could not allocate order=9
> extent: id=1 memflags=0 (2 of 4)
> (XEN) memory.c:145:d0 Could not allocate order=9 extent: id=1 memflags=0 (0 of
> 3)
> [  102.139380] (cdrom_add_media_watch()
> file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk
> back/cdrom.c, line=108) nodename:backend/vbd/1/768
> [  102.171632] (cdrom_is_type()
> file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk
> back/cdrom.c, line=95) type:0
> [  102.209310] device vif1.0 entered promiscuous mode
> [  102.221776] br0: port 2(vif1.0) entering forwarding state
> [  102.490897] OLH gntdev_open(449) xend[5202]->qemu-dm[5324] i
> ffff8800f2420720 f ffff8800f1c2f980
> [  102.733559] ip_tables: (C) 2000-2006 Netfilter Core Team
> [  102.888335] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
> [  103.241995] (cdrom_add_media_watch()
> file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk
> back/cdrom.c, line=108) nodename:backend/vbd/1/5632
> [  103.274444] (cdrom_is_type()
> file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk
> back/cdrom.c, line=95) type:1
> [  103.301481] (cdrom_add_media_watch()
> file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk
> back/cdrom.c, line=110) is a cdrom
> [  103.331978] (cdrom_add_media_watch()
> file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk
> back/cdrom.c, line=112) xenstore wrote OK
> [  103.362764] (cdrom_is_type()
> file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk
> back/cdrom.c, line=95) type:1
> [  104.538376] (cdrom_add_media_watch()
> file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk
> back/cdrom.c, line=108) nodename:backend/vbd/1/832
> [  104.570669] (cdrom_is_type()
> file=/usr/src/packages/BUILD/kernel-xen-2.6.32.24/linux-2.6.32/drivers/xen/blk
> back/cdrom.c, line=95) type:0
> [  112.401097] vif1.0: no IPv6 routers present
> (XEN) HVM1: HVM Loader
> (XEN) HVM1: Detected Xen v4.1.22433-20101126
> (XEN) HVM1: CPU speed is 2667 MHz
> (XEN) HVM1: Xenbus rings @0xfeffc000, event channel 5
> (XEN) irq.c:258: Dom1 PCI link 0 changed 0 -> 5
> (XEN) HVM1: PCI-ISA link 0 routed to IRQ5
> (XEN) irq.c:258: Dom1 PCI link 1 changed 0 -> 10
> (XEN) HVM1: PCI-ISA link 1 routed to IRQ10
> (XEN) irq.c:258: Dom1 PCI link 2 changed 0 -> 11
> (XEN) HVM1: PCI-ISA link 2 routed to IRQ11
> (XEN) irq.c:258: Dom1 PCI link 3 changed 0 -> 5
> (XEN) HVM1: PCI-ISA link 3 routed to IRQ5
> (XEN) HVM1: pci dev 01:3 INTA->IRQ10
> (XEN) HVM1: pci dev 03:0 INTA->IRQ5
> (XEN) HVM1: pci dev 02:0 bar 10 size 02000000: f0000008
> (XEN) HVM1: pci dev 03:0 bar 14 size 01000000: f2000008
> (XEN) HVM1: pci dev 02:0 bar 14 size 00001000: f3000000
> (XEN) HVM1: pci dev 03:0 bar 10 size 00000100: 0000c001
> (XEN) HVM1: pci dev 01:1 bar 20 size 00000010: 0000c101
> (XEN) HVM1: Multiprocessor initialisation:
> (XEN) HVM1:  - CPU0 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ...
> done.
> (XEN) HVM1:  - CPU1 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ...
> done.
> (XEN) HVM1:  - CPU2 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ...
> done.
> (XEN) HVM1:  - CPU3 ... 40-bit phys ... fixed MTRRs ... var MTRRs [2/8] ...
> done.
> (XEN) HVM1: Testing HVM environment:
> (XEN) HVM1:  - REP INSB across page boundaries ... passed
> (XEN) HVM1:  - GS base MSRs and SWAPGS ... passed
> (XEN) HVM1: Passed 2 of 2 tests
> (XEN) HVM1: Writing SMBIOS tables ...
> (XEN) HVM1: Loading ROMBIOS ...
> (XEN) HVM1: 9660 bytes of ROMBIOS high-memory extensions:
> (XEN) HVM1:   Relocating to 0xfc000000-0xfc0025bc ... done
> (XEN) HVM1: Creating MP tables ...
> (XEN) HVM1: Loading Cirrus VGABIOS ...
> (XEN) HVM1: Loading ACPI ...
> (XEN) HVM1:  - Lo data: 000ea020-000ea04f
> (XEN) HVM1:  - Hi data: fc002800-fc01291f
> (XEN) HVM1: vm86 TSS at fc012c00
> (XEN) HVM1: BIOS map:
> (XEN) HVM1:  c0000-c8fff: VGA BIOS
> (XEN) HVM1:  eb000-eb1d9: SMBIOS tables
> (XEN) HVM1:  f0000-fffff: Main BIOS
> (XEN) HVM1: E820 table:
> (XEN) HVM1:  [00]: 00000000:00000000 - 00000000:0009e000: RAM
> (XEN) HVM1:  [01]: 00000000:0009e000 - 00000000:0009fc00: RESERVED
> (XEN) HVM1:  [02]: 00000000:0009fc00 - 00000000:000a0000: RESERVED
> (XEN) HVM1:  HOLE: 00000000:000a0000 - 00000000:000e0000
> (XEN) HVM1:  [03]: 00000000:000e0000 - 00000000:00100000: RESERVED
> (XEN) HVM1:  [04]: 00000000:00100000 - 00000000:40000000: RAM
> (XEN) HVM1:  HOLE: 00000000:40000000 - 00000000:fc000000
> (XEN) HVM1:  [05]: 00000000:fc000000 - 00000001:00000000: RESERVED
> (XEN) HVM1: Invoking ROMBIOS ...
> (XEN) HVM1: $Revision: 1.221 $ $Date: 2008/12/07 17:32:29 $
> (XEN) stdvga.c:147:d1 entering stdvga and caching modes
> (XEN) HVM1: VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
> (XEN) HVM1: Bochs BIOS - build: 06/23/99
> (XEN) HVM1: $Revision: 1.221 $ $Date: 2008/12/07 17:32:29 $
> (XEN) HVM1: Options: apmbios pcibios eltorito PMM
> (XEN) HVM1: 
> (XEN) HVM1: ata0-0: PCHS=8322/16/63 translation=lba LCHS=522/255/63
> (XEN) HVM1: ata0 master: QEMU HARDDISK ATA-7 Hard-Disk (4096 MBytes)
> (XEN) HVM1: ata0-1: PCHS=16383/16/63 translation=lba LCHS=1024/255/63
> (XEN) HVM1: ata0  slave: QEMU HARDDISK ATA-7 Hard-Disk (43008 MBytes)
> (XEN) HVM1: ata1 master: QEMU DVD-ROM ATAPI-4 CD-Rom/DVD-Rom
> (XEN) HVM1: IDE time out
> (XEN) HVM1: 
> (XEN) HVM1: 
> (XEN) HVM1: 
> (XEN) HVM1: Press F12 for boot menu.
> (XEN) HVM1: 
> (XEN) HVM1: Booting from Hard Disk...
> (XEN) HVM1: Booting from 0000:7c00
> (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=82
> (XEN) HVM1: int13_harddisk: function 08, unmapped device for ELDL=82
> (XEN) HVM1: *** int 15h function AX=00c0, BX=0000 not yet supported!
> (XEN) HVM1: *** int 15h function AX=ec00, BX=0002 not yet supported!
> (XEN) HVM1: KBD: unsupported int 16h function 03
> (XEN) HVM1: *** int 15h function AX=e980, BX=0000 not yet supported!
> (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=82
> (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=82
> (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=83
> (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=83
> (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=84
> (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=84
> (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=85
> (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=85
> (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=86
> (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=86
> (XEN) HVM1: int13_harddisk: function 41, unmapped device for ELDL=87
> (XEN) HVM1: int13_harddisk: function 02, unmapped device for ELDL=87
> (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 88
> (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 88
> (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 89
> (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 89
> (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8a
> (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8a
> (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8b
> (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8b
> (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8c
> (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8c
> (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8d
> (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8d
> (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8e
> (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8e
> (XEN) HVM1: int13_harddisk: function 41, ELDL out of range 8f
> (XEN) HVM1: int13_harddisk: function 02, ELDL out of range 8f
> (XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x30
> (XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x20
> (XEN) vlapic.c:699:d1 Local APIC Write to read-only register 0x20
> (XEN) irq.c:258: Dom1 PCI link 0 changed 5 -> 0
> (XEN) irq.c:258: Dom1 PCI link 1 changed 10 -> 0
> (XEN) irq.c:258: Dom1 PCI link 2 changed 11 -> 0
> (XEN) irq.c:258: Dom1 PCI link 3 changed 5 -> 0
> (XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t.
> (XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t.
> (XEN) irq.c:324: Dom1 callback via changed to PCI INTx Dev 0x03 IntA
> [  165.316278] blkback: ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
> [  165.330911]   alloc irq_desc for 886 on node 0
> [  165.337115]   alloc kstat_irqs on node 0
> [  165.351993] blkback: ring-ref 9, event-channel 10, protocol 1 (x86_64-abi)
> [  165.366824]   alloc irq_desc for 887 on node 0
> [  165.372089]   alloc kstat_irqs on node 0
> [  165.387424] blkback: ring-ref 10, event-channel 11, protocol 1 (x86_64-abi)
> [  165.402453]   alloc irq_desc for 888 on node 0
> [  165.409108]   alloc kstat_irqs on node 0
> (XEN) grant_table.c:1414:d1 Fault while reading gnttab_query_size_t.
> [  168.016706]   alloc irq_desc for 889 on node 0
> [  168.020103]   alloc kstat_irqs on node 0
> (XEN) Xen BUG at wait.c:118
> (XEN) Assertion 'list_empty(&wqv->list)' failed at wait.c:130
> (XEN) Debugging connection not set up.
> (XEN) Debugging connection not set up.
> (XEN) ----[ Xen-4.1.22433-20101126.164804  x86_64  debug=y  Tainted:    C
> ]----
> (XEN) ----[ Xen-4.1.22433-20101126.164804  x86_64  debug=y  Tainted:    C
> ]----
> (XEN) CPU:    1
> (XEN) CPU:    3
> (XEN) RIP:    e008:[<ffff82c4801285c1>]RIP:    e008:[<ffff82c4801283ab>]
> prepare_to_wait+0xf0/0x10f
> (XEN) RFLAGS: 0000000000010212    check_wakeup_from_wait+0x27/0x61CONTEXT:
> hypervisor
> (XEN) 
> (XEN) RFLAGS: 0000000000010293   rax: 0000000000000000   rbx: ffff8301337cd010
> rcx: 0000000000000968
> (XEN) CONTEXT: hypervisor
> (XEN) rdx: ffff83013e737f18   rsi: ffff83013e7375b0   rdi: ffff8301337cd030
> (XEN) rax: ffff8301337cd620   rbx: ffff830012b72000   rcx: 0000000000000000
> (XEN) rbp: ffff83013e737648   rsp: ffff83013e737628   r8:  ffff830138439f60
> (XEN) rdx: ffff83013e707f18   rsi: 0000000000000003   rdi: ffff830012b73860
> (XEN) r9:  000000000011622f   r10: ffff83013e737950   r11: ffffffff8101f230
> (XEN) rbp: ffff83013e707cb0   rsp: ffff83013e707cb0   r8:  0000000000000013
> (XEN) r12: ffff83013e737668   r13: ffff8301337cd010   r14: ffff830012b74000
> (XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
> (XEN) r15: ffffffffff5fb300   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) r12: 0000000000000003   r13: ffff8300bf2fa000   r14: 0000002e80873e97
> (XEN) cr3: 000000013444c000   cr2: ffffe8ffffc00000
> (XEN) r15: ffff830012b72000   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) cr3: 00000001347e5000   cr2: 00007f1c79acf000
> (XEN) Xen stack trace from rsp=ffff83013e737628:
> (XEN)   ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN)  0000000000000004Xen stack trace from rsp=ffff83013e707cb0:
> (XEN)    0000000000000003 ffff83013e707cf0 0000000000000004 ffff82c4801ab935
> ffff83013e7379b0 ffff83013e707d10
> (XEN)    ffff830012b72000 ffff83013e7376b8
> (XEN)    ffff82c4801ac622 0000000000000003 ffff83013e7376f0 ffff8300bf2fa000
> ffff83013e737668 0000002e80873e97
> (XEN)    ffff83013e70c040 000000000fff0001
> (XEN)    ffff8301337cd010 ffff83013e707d10 ffff8301337cd010 ffff82c4801c29eb
> 0000000000000004
> (XEN)    0000002e80873e97 000000033e737bc8 ffff830012b72000 0000000000000000
> (XEN)    0000000000000003 ffff83013e707e10 0000000000000004 ffff82c480157913
> (XEN)    0000000000000008 ffff83013e737bc8 ffff830012b74000 0000000000000000
> ffff83013e737728
> (XEN)    ffff82c4801a6460 0000000000000000
> (XEN)    0000000000000000 ffff83013e7376f8 0000000000000002 0000000000000001
> 0000002e80873cb4 ffff83013e7379b0
> (XEN)    000000000000008b ffff82c4802e58c0
> (XEN)    ffff82c4802e58c0 00000000fee00310 ffff83013e707dc0 0000000000000001
> 0000000000000282 ffffffffff5fb300
> (XEN)    0000000000000089 ffff82c48014d4a1
> (XEN)    0000002e80873e97 0000000000000000 ffff83013e707e40 0000000000000000
> 0000000000000000 00000003c18f8247 0000000000000000
> (XEN)   
> (XEN)    ffff83013e77b5c0 ffff83013e737b28 0000000000000000 ffff82c48019179c
> ffff83013e707e00 ffff83013e737748 ffff82c48017a38b ffff82c480175d52
> (XEN)   
> (XEN)    0000000000000392 ffff83013e737768 0000003a19e31b9c 0000000280175df4
> 0000000000000000 000000003e6295a0 0000000000000000 ffff8301388e6f50
> (XEN)   
> (XEN)    0000000000000000 00007cfec18c8867 0000000000000000 ffff82c4802587c0
> ffff83013e707e10 ffff83013e737bc8 ffff830012b72000 0000000012b74000
> (XEN)   
> (XEN)    ffff8300bf2fa000 000000d600d60001 0000000000000001 ffff830138092000
> 0000002e80873e97 0000000000000000 ffff83013e70c040 25d68301388e6f50
> (XEN)   
> (XEN)    ffff83013e707e90 0000000000000005 ffff82c480120fbe 00000000000ca8e4
> ffff83013e707e40 000000000011622f 0000002e80873e97 0000000000000011
> (XEN)   
> (XEN)    ffff83013e70c100 0000000139401004 ffff83013e767dd8 0000000000000000
> ffff830012b72000 ffff83013e7377e8 0000000001c9c380 00ff82c480175d52
> (XEN)   
> (XEN)    ffff83013e707e00 ffff83013e737800 ffff83013e70c100 ffff82c480175df4
> ffff83013e767f20 ffff82c480122204 0000000000000003 ffff8301388e6f50
> (XEN)   
> (XEN)    0000000000000003 00000004388e6f50 ffff82c4802b3f00 0000000800000008
> ffff83013e707f18 0000000000000000 ffffffffffffffff 00000004ffffffff
> (XEN)   
> (XEN)    ffff83013e707ed0 0000000400000001 ffff82c4801220d7 ffff82c4802022e9
> ffff82c4802b3f00 ffff83013e7378c8 ffff83013e707f18 ffff83013e737888
> (XEN)   
> (XEN)    ffff82c48025dbe0 0000000000000010 ffff83013e707f18 0000000300000000
> 0000002e805c506e ffff83013e73793c ffff83013e70c040 000000000000180a
> (XEN)   
> (XEN)    0000000100000000 ffff83013e707ee0 0000000000000003 ffff82c480122152
> 00000000388e6f50 ffff83013e707f10 000000000000000a ffff82c480155619
> (XEN)   
> (XEN)    ffff83013942d000 0000000000000000 0000000000119285 ffff8300bf2fa000
> ffff83013e737998 0000000000000003 0000000000000000
> (XEN)  ffff8300bf2f6000Xen call trace:
> (XEN)    
> (XEN)   [<ffff82c4801285c1>] ffff83013e707d38 prepare_to_wait+0xf0/0x10f
> (XEN)     0000000000000000[<ffff82c4801ac622>] 0000000000000000
> hvm_copy_to_guest_virt+0x65/0xb0
> (XEN)     0000000000000000[<ffff82c4801a6460>]
> (XEN) Xen call trace:
> (XEN)     hvmemul_write+0x113/0x1a2
> (XEN)    [<ffff82c4801283ab>][<ffff82c48019179c>]
> check_wakeup_from_wait+0x27/0x61
> (XEN)    [<ffff82c4801ab935>] x86_emulate+0xe296/0x110ae
> (XEN)    [<ffff82c4801a563f>] hvm_do_resume+0x29/0x1aa
> (XEN)    [<ffff82c4801c29eb>] hvm_emulate_one+0x103/0x192
> (XEN)    [<ffff82c4801b08ee>] vmx_do_resume+0x1bc/0x1db
> (XEN)    [<ffff82c480157913>] handle_mmio+0x4e/0x17d
> (XEN)    [<ffff82c4801c844f>] context_switch+0xdbf/0xddb
> (XEN)    [<ffff82c480120fbe>] vmx_vmexit_handler+0x173f/0x1d2c
> (XEN)    
> (XEN)  schedule+0x5f3/0x619
> (XEN)    
> (XEN) ****************************************
> (XEN) [<ffff82c4801220d7>]Panic on CPU 1:
> (XEN)  __do_softirq+0x88/0x99
> (XEN)    Xen BUG at wait.c:118
> (XEN) [<ffff82c480122152>]****************************************
> (XEN) 
> (XEN)  do_softirq+0x6a/0x7a
> (XEN)    Reboot in five seconds...
> (XEN) [<ffff82c480155619>]Debugging connection not set up.
> (XEN)  idle_loop+0x64/0x66
> (XEN)    Resetting with ACPI MEMORY or I/O RESET_REG.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: how to handle paged hypercall args?
  2010-12-02 10:22                     ` Keir Fraser
@ 2010-12-02 10:30                       ` Keir Fraser
  2010-12-03  9:14                         ` Olaf Hering
  0 siblings, 1 reply; 39+ messages in thread
From: Keir Fraser @ 2010-12-02 10:30 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, Jan Beulich

On 02/12/2010 10:22, "Keir Fraser" <keir@xen.org> wrote:

> Firstly, the usage of waitqueues is broken. The waitqueue_head should be
> shared with code that pages in, so that vcpus can be *woken* at some point
> after they start waiting. As it is, if a vcpu does sleep on its local
> waitqueue_head, it will never wake. You might start with a single global
> waitqueue_head and wake everyone on it every time a page (or maybe page
> batch) is paged in. More sophisticated might be to hash page numbers into an
> array of waitqueue_heads, to reduce false wakeups.

...Or you might have a per-domain waitqueue_head, and do the wake_up() from
code that adds paged-in entries to the guest physmap. That would seem a
pretty sensible way to proceed, to me.

 -- Keir

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: how to handle paged hypercall args?
  2010-11-15 12:17                       ` Keir Fraser
@ 2010-12-03  9:03                         ` Olaf Hering
  2010-12-03 14:13                           ` Keir Fraser
  0 siblings, 1 reply; 39+ messages in thread
From: Olaf Hering @ 2010-12-03  9:03 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Patrick Colp, xen-devel, Tim Deegan, Jan Beulich

On Mon, Nov 15, Keir Fraser wrote:

> On 15/11/2010 12:04, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:

> > Will it
> > need per-vcpu stacks? (and will they, in turn, use order>0 allocations? :))
> 
> Of a sort. I propose to keep the per-pcpu stacks and then copy context
> to/from a per-vcpu memory area for the setjmp-like behaviour. Guest call
> stacks won't be very deep -- I reckon a 1kB or 2kB per-vcpu area will
> suffice.

Keir,

in my testing the BUG_ON in __prepare_to_wait() triggers; 1500 bytes is too
small. I changed it to 4096 - (4*sizeof(void*)) to fix it for me.
3K would be enough as well.
How large can the stack get? Is there an upper limit?
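
For reference, the kind of change I mean is roughly this (only a sketch; the
real layout in wait.c may differ and the names below are illustrative):

/* Per-vcpu save area used by the setjmp-style context save in
 * __prepare_to_wait(); the BUG_ON() fires when the live stack does not fit
 * into it.  Growing it from 1500 bytes towards a page avoids that here. */
#define WQ_SAVE_SIZE (PAGE_SIZE - 4 * sizeof(void *))   /* was 1500 */

struct waitqueue_vcpu {
    struct list_head list;
    struct vcpu *vcpu;
    void *esp;                      /* stack pointer at save time */
    char saved_stack[WQ_SAVE_SIZE]; /* copy of the hypervisor stack */
};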

Olaf

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: how to handle paged hypercall args?
  2010-12-02 10:25                     ` Jan Beulich
@ 2010-12-03  9:06                       ` Olaf Hering
  2010-12-03 14:18                         ` Keir Fraser
  0 siblings, 1 reply; 39+ messages in thread
From: Olaf Hering @ 2010-12-03  9:06 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser

On Thu, Dec 02, Jan Beulich wrote:

> >>> On 02.12.10 at 11:11, Olaf Hering <olaf@aepfle.de> wrote:
> > On Thu, Nov 18, Keir Fraser wrote:
> > 
> >> I've done something along these lines now as xen-unstable:22402. It actually
> >> seems to work okay! So you can go ahead and use waitqueues in __hvm_copy()
> >> now.
> > 
> > This is my first attempt to do it.
> 
> I didn't look in detail whether that's being done in a non-intuitive
> way elsewhere, but I can't see how the event you're waiting on
> would ever get signaled - wouldn't you need to pass it into
> __hvm_copy() and further down from there?

I was relying on the kind-of wakeup in p2m_mem_paging_resume().

There will be a new patch shortly.

Olaf

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: how to handle paged hypercall args?
  2010-12-02 10:30                       ` Keir Fraser
@ 2010-12-03  9:14                         ` Olaf Hering
  2010-12-03 14:37                           ` Keir Fraser
  0 siblings, 1 reply; 39+ messages in thread
From: Olaf Hering @ 2010-12-03  9:14 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Jan Beulich

On Thu, Dec 02, Keir Fraser wrote:

> On 02/12/2010 10:22, "Keir Fraser" <keir@xen.org> wrote:
> 
> > Firstly, the usage of waitqueues is broken. The waitqueue_head should be
> > shared with code that pages in, so that vcpus can be *woken* at some point
> > after they start waiting. As it is, if a vcpu does sleep on its local
> > waitqueue_head, it will never wake. You might start with a single global
> > waitqueue_head and wake everyone on it every time a page (or maybe page
> > batch) is paged in. More sophisticated might be to hash page numbers into an
> > array of waitqueue_heads, to reduce false wakeups.
> 
> ...Or you might have a per-domain waitqueue_head, and do the wake_up() from
> code that adds paged-in entries to the guest physmap. That would seem a
> pretty sensible way to proceed, to me.

That's what I'm doing right now.

It seems that the existing MEM_EVENT_FLAG_VCPU_PAUSED code can be reused
for this. I was messing with wait_event() until I realized that the vcpu
is stopped by p2m_mem_paging_populate() already and the wake_up() ran
before the vcpu got a chance to call schedule().

If a vcpu happens to be scheduled and the domain is destroyed, the
BUG_ON in destroy_waitqueue_vcpu() will trigger. What can happen if
there is still an entry in the list? The cleanup should handle this
situation so that Xen itself does not crash.

Olaf

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: how to handle paged hypercall args?
  2010-12-03  9:03                         ` Olaf Hering
@ 2010-12-03 14:13                           ` Keir Fraser
  0 siblings, 0 replies; 39+ messages in thread
From: Keir Fraser @ 2010-12-03 14:13 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Patrick Colp, xen-devel, Tim Deegan, Jan Beulich

On 03/12/2010 01:03, "Olaf Hering" <olaf@aepfle.de> wrote:

> in my testing the BUG_ON in __prepare_to_wait() triggers, 1500 is too
> small. I changed it to 4096 - (4*sizeof(void*)) to fix it for me.
> 3K would be enough as well.
> How large can the stack get, is there an upper limit?

It can get pretty deep with nested interrupts. I wouldn't expect a guest's
hypercall stack to get very deep at all. Send me a BUG_ON() backtrace. That
said, making the saved-stack area bigger isn't really a problem.

 -- Keir

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: how to handle paged hypercall args?
  2010-12-03  9:06                       ` Olaf Hering
@ 2010-12-03 14:18                         ` Keir Fraser
  0 siblings, 0 replies; 39+ messages in thread
From: Keir Fraser @ 2010-12-03 14:18 UTC (permalink / raw)
  To: Olaf Hering, Jan Beulich; +Cc: xen-devel

On 03/12/2010 01:06, "Olaf Hering" <olaf@aepfle.de> wrote:

>> I didn't look in detail whether that's being done in a non-intuitive
>> way elsewhere, but I can't see how the event you're waiting on
>> would ever get signaled - wouldn't you need to pass it into
>> __hvm_copy() and further down from there?
> 
> I was relying on the kind-of wakeup in p2m_mem_paging_resume().
> 
> There will be a new patch shortly.

vcpu_pause() is nestable and counted. So the vcpu_unpause() on
MEM_EVENT_FLAG_VCPU_PAUSED will not be enough to wake up a vcpu that is also
paused on a waitqueue. Once the vcpu is asleep on a waitqueue, it definitely
needs wake_up() to wake it.

Of course, p2m_mem_paging_resume() is quite likely the right place to put
the wake_up() call. But you do need it in addition to the unpause on the
MEM_EVENT flag.
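
Something along these lines, in other words (a sketch, not committed code:
wq_paged is an assumed per-domain waitqueue head, and the rest is only meant
as pseudo-code for the existing resume path):

/* Called from p2m_mem_paging_resume() once the p2m entry for the paged-in
 * gfn has been restored. */
static void paging_resume_wake(struct domain *d,
                               const mem_event_response_t *rsp)
{
    /* Drop the pause reference taken when the mem_event was sent. */
    if ( rsp->flags & MEM_EVENT_FLAG_VCPU_PAUSED )
        vcpu_unpause(d->vcpu[rsp->vcpu_id]);

    /* The pause count and the waitqueue are independent mechanisms, so
     * also kick any vcpu sleeping in __hvm_copy(): its wait_event()
     * condition gets re-evaluated now that the gfn is present again. */
    wake_up(&d->wq_paged);
}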

 -- Keir

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: how to handle paged hypercall args?
  2010-12-03  9:14                         ` Olaf Hering
@ 2010-12-03 14:37                           ` Keir Fraser
  0 siblings, 0 replies; 39+ messages in thread
From: Keir Fraser @ 2010-12-03 14:37 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, Jan Beulich

On 03/12/2010 01:14, "Olaf Hering" <olaf@aepfle.de> wrote:

>> ...Or you might have a per-domain waitqueue_head, and do the wake_up() from
>> code that adds paged-in entries to the guest physmap. That would seem a
>> pretty sensible way to proceed, to me.
> 
> Thats what I'm doing right now.
> 
> It seems that the existing MEM_EVENT_FLAG_VCPU_PAUSED code can be reused
> for this. I was messing with wait_event() until I realized that the vcpu
> is stopped by p2m_mem_paging_populate() already and the wake_up() ran
> before the vcpu got a chance to call schedule().

Hm, not sure what you mean. The vcpu does not get synchronously stopped by
_paging_populate(). Maybe you are confused.

> If a vcpu happens to be scheduled and the domain is destroyed, the
> BUG_ON in destroy_waitqueue_vcpu() will trigger. What can happen if
> there is still an entry in the list? The cleanup should handle this
> situation to not crash Xen itself.

You'll get a crash if a vcpu is on a waitqueue when you kill the domain.
Yes, the destroydomain path needs code to handle that. It'll get added once
I see an actual user of this waitqueue stuff. There are a few other places
that need fixing up, like destroydomain, too.

I don't know what you mean by 'vcpu is scheduled and the domain is
destroyed' causing the BUG_ON(). If a vcpu is scheduled and running then
presumably it is not on a waitqueue.

 -- Keir

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: how to handle paged hypercall args?
  2010-12-02 10:25                     ` Keir Fraser
@ 2010-12-07  9:25                       ` Olaf Hering
  2010-12-07 16:45                         ` Keir Fraser
  0 siblings, 1 reply; 39+ messages in thread
From: Olaf Hering @ 2010-12-07  9:25 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Jan Beulich

On Thu, Dec 02, Keir Fraser wrote:

> Since your current patch is conceptually quite broken anyway, there is
> little point in chasing down the crash. It might have something to do with
> allocating the waitqueue_head on the local stack -- which you would never
> want to do in a correct usage of waitqueues. So, back to square one and try
> again I'm afraid.

Keir,

yesterday I sent out my patch queue for xen-unstable. I think the
approach of making the active vcpu wait in p2m_mem_paging_populate() and
waking it up in p2m_mem_paging_resume() could work.
However, something causes what looks like stack corruption.
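
In outline, the populate side of that approach looks roughly like this (a
sketch with assumed names rather than the code from the posted patches;
wq_paged is the per-domain waitqueue head and gfn_is_paged_out() stands in
for whatever p2m check is appropriate):

void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
{
    mem_event_request_t req = { .gfn = gfn, .vcpu_id = current->vcpu_id };

    /* ... mark the gfn as paging-in and notify the pager ... */
    mem_event_put_request(d, &req);

    /* Sleep until p2m_mem_paging_resume() has put the page back and woken
     * the per-domain head, then let the faulting vcpu continue. */
    wait_event(d->wq_paged, !gfn_is_paged_out(d, gfn));
}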

Any idea what's going on?

Olaf

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: how to handle paged hypercall args?
  2010-12-07  9:25                       ` Olaf Hering
@ 2010-12-07 16:45                         ` Keir Fraser
  2010-12-07 17:16                           ` Olaf Hering
  0 siblings, 1 reply; 39+ messages in thread
From: Keir Fraser @ 2010-12-07 16:45 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, Jan Beulich

On 07/12/2010 09:25, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Thu, Dec 02, Keir Fraser wrote:
> 
>> Since your current patch is conceptually quite broken anyway, there is
>> little point in chasing down the crash. It might have something to do with
>> allocating the waitqueue_head on the local stack -- which you would never
>> want to do in a correct usage of waitqueues. So, back to square one and try
>> again I'm afraid.
> 
> Keir,
> 
> yesterday I sent out my patch queue for xen-unstable. I think the
> approach of making the active vcpu wait in p2m_mem_paging_populate() and
> waking it up in p2m_mem_paging_resume() could work.
> However, something causes what looks like stack corruption.
> 
> Any idea what's going on?

No, I did some unit testing of the waitqueue stuff and it worked for me.
Perhaps you can suggest some reproduction steps.

 K.

> Olaf
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: how to handle paged hypercall args?
  2010-12-07 16:45                         ` Keir Fraser
@ 2010-12-07 17:16                           ` Olaf Hering
  2010-12-07 18:08                             ` Keir Fraser
  0 siblings, 1 reply; 39+ messages in thread
From: Olaf Hering @ 2010-12-07 17:16 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Jan Beulich

On Tue, Dec 07, Keir Fraser wrote:

> No, I did some unit testing of the waitqueue stuff and it worked for me.
> Perhaps you can suggest some reproduction steps.

The patches 1 - 13 I sent out need to be applied.

My config for a SLES11-SP1-x86_64 guest looks like this; 1 vcpu appears
to make it crash faster:

  # /etc/xen/vm/sles11_0
name="sles11_0"
description="None"
uuid="756210f5-cc53-2bc6-7db2-a0cefca17c0b"
memory=1024
maxmem=1024
vcpus=4
on_poweroff="destroy"
on_reboot="restart"
on_crash="destroy"
localtime=0
keymap="de"
builder="hvm"
device_model="/usr/lib/xen/bin/qemu-dm"
kernel="/usr/lib/xen/boot/hvmloader"
boot="c"
disk=[ 'file:/abuild/vdisk-sles11_0-disk0,hda,w', 'file:/abuild/vdisk-sles11_0-disk1,hdb,w', 'file:/abuild/bootiso-xenpaging-sles11_0.iso,hdc:cdrom,r', ]
vif=[ 'mac=00:e0:f1:08:15:00,bridge=br0,model=rtl8139,type=netfront', ]
stdvga=0
vnc=1
vncunused=1
extid=0
acpi=1
pae=1
serial="pty"

The guest does not get very far, so the IO part probably does not
matter.  I stop the guest in grub, then run 'xenpaging 1 -1'. With patch
#13 only the guest's pagetables get paged once the kernel is started from
grub. For me it crashes in less than 255 populate/resume cycles.

Does that help to reproduce the crash?

Olaf

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: how to handle paged hypercall args?
  2010-12-07 17:16                           ` Olaf Hering
@ 2010-12-07 18:08                             ` Keir Fraser
  2010-12-07 18:50                               ` Olaf Hering
  0 siblings, 1 reply; 39+ messages in thread
From: Keir Fraser @ 2010-12-07 18:08 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, Jan Beulich

On 07/12/2010 17:16, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Dec 07, Keir Fraser wrote:
> 
>> No, I did some unit testing of the waitqueue stuff and it worked for me.
>> Perhaps you can suggest some reproduction steps.
> 
> The patches 1 - 13 I sent out need to be applied.

I'll wait for tools patches 1-7 to be reviewed and accepted, then I might
find time to have a go. I assume I should start the guest paused, attach
xenpaging, then when I unpause the guest it should crash the host straight
away pretty much?

 -- Keir

> My config for a SLES11-SP1-x86_64 guest looks like this; 1 vcpu appears
> to make it crash faster:
> 
>   # /etc/xen/vm/sles11_0
> name="sles11_0"
> description="None"
> uuid="756210f5-cc53-2bc6-7db2-a0cefca17c0b"
> memory=1024
> maxmem=1024
> vcpus=4
> on_poweroff="destroy"
> on_reboot="restart"
> on_crash="destroy"
> localtime=0
> keymap="de"
> builder="hvm"
> device_model="/usr/lib/xen/bin/qemu-dm"
> kernel="/usr/lib/xen/boot/hvmloader"
> boot="c"
> disk=[ 'file:/abuild/vdisk-sles11_0-disk0,hda,w',
> 'file:/abuild/vdisk-sles11_0-disk1,hdb,w',
> 'file:/abuild/bootiso-xenpaging-sles11_0.iso,hdc:cdrom,r', ]
> vif=[ 'mac=00:e0:f1:08:15:00,bridge=br0,model=rtl8139,type=netfront', ]
> stdvga=0
> vnc=1
> vncunused=1
> extid=0
> acpi=1
> pae=1
> serial="pty"
> 
> The guest does not get very far, so the IO part probably does not
> matter.  I stop the guest in grub, then run 'xenpaging 1 -1'. With patch
> #13 only the guest's pagetables get paged once the kernel is started from
> grub. For me it crashes in less than 255 populate/resume cycles.
> 
> Does that help to reproduce the crash?
> 
> Olaf
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: how to handle paged hypercall args?
  2010-12-07 18:08                             ` Keir Fraser
@ 2010-12-07 18:50                               ` Olaf Hering
  0 siblings, 0 replies; 39+ messages in thread
From: Olaf Hering @ 2010-12-07 18:50 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Jan Beulich

On Tue, Dec 07, Keir Fraser wrote:

> On 07/12/2010 17:16, "Olaf Hering" <olaf@aepfle.de> wrote:
> 
> > On Tue, Dec 07, Keir Fraser wrote:
> > 
> >> No, I did some unit testing of the waitqueue stuff and it worked for me.
> >> Perhaps you can suggest some reproduction steps.
> > 
> > The patches 1 - 13 I sent out need to be applied.
> 
> I'll wait for tools patches 1-7 to be reviewed and accepted, then I might
> find time to have a go. I assume I should start the guest paused, attach
> xenpaging, then when I unpause the guest it should crash the host straight
> away pretty much?

My testhost had hardware issues today, so I could not proceed with
testing.

What I did was:

sync
xm create /etc/xen/vm/sles11_1 && xm vnc sles11_1 &
sleep 1
xenpaging 1 -1 &

Olaf

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2010-12-07 18:50 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-10  8:58 how to handle paged hypercall args? Olaf Hering
2010-11-11 14:33 ` Olaf Hering
2010-11-11 20:08   ` Keir Fraser
2010-11-11 20:34     ` Olaf Hering
2010-11-11 21:00       ` Keir Fraser
2010-11-12  9:45     ` Jan Beulich
2010-11-12 10:22       ` Keir Fraser
2010-11-12 10:47         ` Jan Beulich
2010-11-12 14:32           ` Keir Fraser
2010-11-15 13:12             ` Olaf Hering
2010-11-17 16:52               ` Keir Fraser
2010-11-18 12:33                 ` Keir Fraser
2010-11-18 13:51                   ` Olaf Hering
2010-12-02 10:11                   ` Olaf Hering
2010-12-02 10:22                     ` Keir Fraser
2010-12-02 10:30                       ` Keir Fraser
2010-12-03  9:14                         ` Olaf Hering
2010-12-03 14:37                           ` Keir Fraser
2010-12-02 10:25                     ` Jan Beulich
2010-12-03  9:06                       ` Olaf Hering
2010-12-03 14:18                         ` Keir Fraser
2010-12-02 10:18                   ` Olaf Hering
2010-12-02 10:25                     ` Keir Fraser
2010-12-07  9:25                       ` Olaf Hering
2010-12-07 16:45                         ` Keir Fraser
2010-12-07 17:16                           ` Olaf Hering
2010-12-07 18:08                             ` Keir Fraser
2010-12-07 18:50                               ` Olaf Hering
2010-11-15  9:37       ` Tim Deegan
2010-11-15  9:53         ` Jan Beulich
2010-11-15 10:09           ` Keir Fraser
2010-11-15 10:20             ` Tim Deegan
2010-11-15 10:33               ` Keir Fraser
2010-11-15 10:49                 ` Tim Deegan
2010-11-15 11:55                   ` Keir Fraser
2010-11-15 12:04                     ` Tim Deegan
2010-11-15 12:17                       ` Keir Fraser
2010-12-03  9:03                         ` Olaf Hering
2010-12-03 14:13                           ` Keir Fraser
