From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tamas K Lengyel
Subject: Re: vm_event regression in 4.7
Date: Fri, 5 Feb 2016 13:34:21 -0700
Message-ID:
References: <56B1511A.6010505@citrix.com> <56B1D7DF.6050201@citrix.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============3542645751380842182=="
In-Reply-To: <56B1D7DF.6050201@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper
Cc: Ian Campbell, Xen-devel
List-Id: xen-devel@lists.xenproject.org

--===============3542645751380842182==
Content-Type: multipart/alternative; boundary=001a114fbe90af9f87052b0bc6a3

--001a114fbe90af9f87052b0bc6a3
Content-Type: text/plain; charset=UTF-8

On Wed, Feb 3, 2016 at 3:35 AM, Andrew Cooper wrote:

> On 03/02/16 01:32, Tamas K Lengyel wrote:
>
>> On Tue, Feb 2, 2016 at 6:00 PM, Andrew Cooper wrote:
>>
>>> On 03/02/2016 00:51, Tamas K Lengyel wrote:
>>>
>>>> Hello all,
>>>> with the latest master branch of Xen there is a regression when
>>>> enabling vm_event on a domain. If an event listener was previously
>>>> active on the domain, it is now not possible to re-enable events, as
>>>> the domctl returns -EINVAL. The problem seems to stem from activating
>>>> the magic page for vm_event using prepare_ring_for_helper, as it
>>>> returns NULL. Looking further into where things go wrong within that
>>>> function, it seems the page type returned by __get_gfn_type_access is
>>>> p2m_ram_logdirty with an invalid mfn (0xffffffffffffffff), and then it
>>>> hits "Error path: not a suitable GFN at all".
>>>>
>>>> Can anyone point me to which change or what may be causing this?
>>>
>>> Did the previous event listener replace the page it stole from the
>>> guest physmap for ring purposes when it exited?
>>
>> Ah, here is what seems to be the problem. Previously it was not
>> required to do this during teardown. What we had was that libxc would
>> check whether it could map the ring page with xc_map_foreign_pages, and
>> would repopulate the page if that failed, before running
>> xc_vm_event_enable. However, now it seems xc_map_foreign_pages returns
>> non-NULL the second time around as well, even though the page is not in
>> the physmap.
>
> This is the bug then. If there isn't a page in the physmap,
> xc_map_foreign_pages() should indicate an error.
>
>> If I force libxc to run populate_physmap, then I can get vm_event to
>> initialize properly again. So the change seems to relate somehow to the
>> behavior of xc_map_foreign_pages.
>
> This seems likely due to the splitting out of libxenforeignmem from
> libxc, which included the merging of 4? almost identical
> map_foreign_$FOO() functions into one. It is likely that there is a
> subtle change in behaviour on an error path.

I've added a bunch of debug messages, and it gets all the way down to
IOCTL_PRIVCMD_MMAPBATCH_V2 without an error in
tools/libs/foreignmemory/linux.c. That ioctl returns 0 too, so I'm not
sure where the error comes from. Compared to the flow in Xen 4.6, I
don't really see what changed.

Tamas

--001a114fbe90af9f87052b0bc6a3
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable



--001a114fbe90af9f87052b0bc6a3--

--===============3542645751380842182==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--===============3542645751380842182==--
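The thread turns on one contract. The old libxc flow (try to map the ring page with xc_map_foreign_pages, repopulate the page if that fails, then call xc_vm_event_enable) only works if the mapping call reports failure for a gfn with no page behind it. As I understand the Linux privcmd interface, IOCTL_PRIVCMD_MMAPBATCH_V2 reports per-page failures through a separate error array (the `err` field of `struct privcmd_mmapbatch_v2`) rather than through the ioctl's return value. A zero return from the ioctl therefore does not by itself show that every page was mapped. The C sketch below is a self-contained simulation of that situation. Every function in it (`mock_mmapbatch`, `map_page_unchecked`, `map_page_checked`, `setup_ring`, `run_scenario`) is hypothetical and is not part of libxc or libxenforeignmemory.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical simulation only: none of these functions are libxc or
 * libxenforeignmemory APIs. NPAGES gfns form a toy guest physmap.
 */
#define NPAGES 4

static int present[NPAGES];   /* 1 = a page currently backs this gfn */
static char vma[4096];        /* stands in for the range mmap()ed before the ioctl */

/* Assumed semantics of a batch-map ioctl: the call itself returns 0 and
 * per-page failures are reported only through err[]. */
static int mock_mmapbatch(const unsigned long *gfns, int *err, int num)
{
    for (int i = 0; i < num; i++)
        err[i] = (gfns[i] < NPAGES && present[gfns[i]]) ? 0 : -EINVAL;
    return 0;
}

/* Trusts only the ioctl return value, so it returns the already
 * reserved address even when the page itself failed to map. */
static void *map_page_unchecked(unsigned long gfn)
{
    int err;

    if (mock_mmapbatch(&gfn, &err, 1) != 0)
        return NULL;
    return vma;
}

/* Also inspects the per-page error and reports failure as NULL + errno. */
static void *map_page_checked(unsigned long gfn)
{
    int err;

    if (mock_mmapbatch(&gfn, &err, 1) != 0)
        return NULL;
    if (err != 0) {
        errno = -err;
        return NULL;
    }
    return vma;
}

/* Setup flow as described in the thread: try to map the ring gfn,
 * repopulate it only if mapping fails, then enable the ring. */
static int setup_ring(unsigned long ring_gfn, void *(*map)(unsigned long))
{
    assert(ring_gfn < NPAGES);
    if (map(ring_gfn) == NULL) {
        present[ring_gfn] = 1;      /* stand-in for populate_physmap */
        if (map(ring_gfn) == NULL)
            return -errno;
    }
    /* Stand-in for the hypervisor-side check that rejects an unbacked gfn. */
    return present[ring_gfn] ? 0 : -EINVAL;
}

/* A previous listener removed the ring page and did not put it back. */
static int run_scenario(int checked)
{
    const unsigned long ring_gfn = 2;

    for (int i = 0; i < NPAGES; i++)
        present[i] = 1;
    present[ring_gfn] = 0;
    return setup_ring(ring_gfn, checked ? map_page_checked : map_page_unchecked);
}
```

With `run_scenario(0)`, the unchecked wrapper hides the missing page. The repopulate step is skipped, and setup fails with `-EINVAL`. With `run_scenario(1)`, the checked wrapper reports the failure, the page is repopulated, and setup returns 0.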