From: Andrew Cooper
Subject: Re: vm_event regression in 4.7
Date: Wed, 3 Feb 2016 10:35:11 +0000
Message-ID: <56B1D7DF.6050201@citrix.com>
References: <56B1511A.6010505@citrix.com>
To: Tamas K Lengyel
Cc: Ian Campbell, Xen-devel
List-Id: xen-devel@lists.xenproject.org

On 03/02/16 01:32, Tamas K Lengyel wrote:


> On Tue, Feb 2, 2016 at 6:00 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 03/02/2016 00:51, Tamas K Lengyel wrote:
>>> Hello all,
>>> with the latest master branch of Xen there is a regression enabling vm_event on a domain. If an event listener was previously active on the domain, it is now not possible to re-enable events, as the domctl returns -EINVAL. The problem seems to stem from activating the magic page for vm_event using prepare_ring_for_helper, as it returns NULL. Looking further into where things go wrong within that function, it seems the page type returned by __get_gfn_type_access is p2m_ram_logdirty with an invalid mfn (0xffffffffffffffff), and then it hits "Error path: not a suitable GFN at all".
>>>
>>> Can anyone point me to which change, or what else, may be causing this?
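To illustrate why a GFN whose type reads as p2m_ram_logdirty can still be rejected: the lookup yields both a type and an MFN, and in this model an entry with no valid MFN behind it is refused whatever its type says. This is a simplified sketch, not Xen's prepare_ring_for_helper(); the types and names (toy_p2m_type_t, toy_p2m_entry_t, toy_prepare_ring, TOY_INVALID_MFN) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified p2m types; Xen's real set is much larger. */
typedef enum {
    TOY_P2M_INVALID,
    TOY_P2M_RAM_RW,
    TOY_P2M_RAM_LOGDIRTY,
} toy_p2m_type_t;

/* Sentinel for "no machine frame", i.e. 0xffffffffffffffff. */
#define TOY_INVALID_MFN UINT64_MAX

/* Hypothetical p2m entry: what a GFN lookup hands back. */
typedef struct {
    toy_p2m_type_t type;
    uint64_t mfn;
} toy_p2m_entry_t;

static int toy_p2m_is_ram(toy_p2m_type_t t)
{
    return t == TOY_P2M_RAM_RW || t == TOY_P2M_RAM_LOGDIRTY;
}

/*
 * Hypothetical ring-page check: returns 0 if gfn can back a ring page,
 * -EINVAL otherwise. A RAM-like type alone is not enough; the entry
 * must also have a valid MFN.
 */
static int toy_prepare_ring(const toy_p2m_entry_t *p2m, size_t nr, uint64_t gfn)
{
    assert(p2m != NULL);
    if (gfn >= nr)
        return -EINVAL;
    const toy_p2m_entry_t *e = &p2m[gfn];
    if (!toy_p2m_is_ram(e->type) || e->mfn == TOY_INVALID_MFN)
        return -EINVAL;   /* "not a suitable GFN at all" */
    return 0;
}
```

In the model, a log-dirty entry with the invalid MFN sentinel fails exactly like an unpopulated hole, which matches the symptom reported above.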

>> Did the previous event listener replace the page it stole from the guest physmap for ring purposes when it exited?
>
> Ah, here is what seems to be the problem. Previously it was not required to do this during teardown. What we had was that libxc would check whether it could map the ring page with xc_map_foreign_pages, and would repopulate the page if that failed, before running xc_vm_event_enable. However, xc_map_foreign_pages now seems to return non-NULL the second time around as well, even though the page is not in the physmap.

This is the bug then. If there isn't a page in the physmap, xc_map_foreign_pages() should indicate an error.
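The libxc flow described above (probe the ring page, repopulate it if the probe fails, then enable) only works if the probe fails for a GFN with no page behind it. A toy model of that dependency follows; toy_map_strict, toy_map_lenient, toy_enable_ring and toy_vm_event_enable are hypothetical stand-ins for xc_map_foreign_pages(), populate_physmap and xc_vm_event_enable, not their real signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_GFNS 8

/* Hypothetical toy physmap: whether each GFN currently has a page behind it. */
static bool toy_populated[TOY_NR_GFNS];
static char toy_pages[TOY_NR_GFNS][64];   /* stand-in for guest memory */

/* Mapper honouring the expected contract: NULL when the GFN has no page. */
static void *toy_map_strict(size_t gfn)
{
    if (gfn >= TOY_NR_GFNS || !toy_populated[gfn])
        return NULL;
    return toy_pages[gfn];
}

/* Mapper with the suspected regression: "succeeds" even for a hole. */
static void *toy_map_lenient(size_t gfn)
{
    if (gfn >= TOY_NR_GFNS)
        return NULL;
    return toy_pages[gfn];
}

/* Stand-in for the hypervisor side: rejects GFNs with no page behind them. */
static int toy_vm_event_enable(size_t gfn)
{
    return (gfn < TOY_NR_GFNS && toy_populated[gfn]) ? 0 : -EINVAL;
}

/* Probe the ring GFN; if mapping fails, repopulate it and retry; then enable. */
static int toy_enable_ring(size_t gfn, void *(*map)(size_t), void **ring_out)
{
    assert(map != NULL && ring_out != NULL);
    if (gfn >= TOY_NR_GFNS)
        return -EINVAL;

    void *ring = map(gfn);
    if (ring == NULL) {
        toy_populated[gfn] = true;   /* stand-in for populate_physmap */
        ring = map(gfn);
        if (ring == NULL)
            return -ENOMEM;
    }
    *ring_out = ring;
    return toy_vm_event_enable(gfn);
}
```

With toy_map_lenient the repopulate step is skipped and enabling fails with -EINVAL, mirroring the reported symptom; with toy_map_strict the GFN is repopulated first and enabling succeeds.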

> If I force libxc to run populate_physmap, I can get vm_event to initialize properly again. So the change seems to relate somehow to the behavior of xc_map_foreign_pages.

This seems likely to be due to the splitting out of libxenforeignmem from libxc, which included the merging of 4(?) almost identical map_foreign_$FOO() functions into one. It is likely that there is a subtle change in behaviour on an error path.
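One way such an error-path change can arise when several mapping functions are merged: a batched mapper reports failures per page in an err[] array while still returning a non-NULL mapping, so a single-result wrapper has to fold those per-page errors back into a NULL return. The sketch below assumes that shape; toy_map_batch and toy_map_pages are hypothetical and are not the real libxenforeignmem or libxc interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NR_GFNS 8

/* Hypothetical toy physmap and backing memory. */
static bool toy_populated[TOY_NR_GFNS];
static char toy_memory[TOY_NR_GFNS][64];   /* stand-in for a mapped region */

/*
 * Hypothetical batched mapper: fills err[i] with 0 or -errno for each page
 * and returns a non-NULL mapping even if some pages failed.
 */
static void *toy_map_batch(const size_t *gfns, size_t n, int *err)
{
    for (size_t i = 0; i < n; i++)
        err[i] = (gfns[i] < TOY_NR_GFNS && toy_populated[gfns[i]]) ? 0 : -ENOENT;
    return n ? (void *)toy_memory : NULL;
}

/*
 * Hypothetical single-result wrapper: folds any per-page failure into a NULL
 * return plus errno. Skipping the err[] loop makes it return non-NULL for holes.
 */
static void *toy_map_pages(const size_t *gfns, size_t n)
{
    assert(gfns != NULL || n == 0);
    int *err = calloc(n ? n : 1, sizeof(*err));
    if (err == NULL)
        return NULL;

    void *map = toy_map_batch(gfns, n, err);
    int first_err = 0;
    for (size_t i = 0; map != NULL && i < n; i++) {
        if (err[i] != 0) {
            first_err = -err[i];
            map = NULL;   /* a real implementation would also unmap here */
        }
    }
    free(err);
    if (first_err != 0)
        errno = first_err;
    return map;
}
```

If the err[] loop in toy_map_pages is dropped, the wrapper returns non-NULL for a hole in the physmap, which is exactly the behaviour xc_map_foreign_pages() should not have.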

~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel