* vm_event regression in 4.7
From: Tamas K Lengyel @ 2016-02-03  0:51 UTC
To: Xen-devel

Hello all,
with the latest master branch of Xen there is a regression enabling
vm_event on a domain. If an event listener was previously active on the
domain it is now not possible to reenable events as the domctl returns
-EINVAL. The problem seems to stem from activating the magic page for
vm_event using prepare_ring_for_helper as it returns NULL. Further looking
into where things go wrong within that function it seems the page type
returned by __get_gfn_type_access is p2m_ram_logdirty with an invalid mfn
(0xffffffffffffffff) and then it hits "Error path: not a suitable GFN at
all".

Can anyone point me to which change or what may be causing this?

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: vm_event regression in 4.7
From: Andrew Cooper @ 2016-02-03  1:00 UTC
To: Tamas K Lengyel, Xen-devel

On 03/02/2016 00:51, Tamas K Lengyel wrote:
> Hello all,
> with the latest master branch of Xen there is a regression enabling
> vm_event on a domain. If an event listener was previously active on
> the domain it is now not possible to reenable events as the domctl
> returns -EINVAL. The problem seems to stem from activating the magic
> page for vm_event using prepare_ring_for_helper as it returns NULL.
> Further looking into where things go wrong within that function it
> seems the page type returned by __get_gfn_type_access is
> p2m_ram_logdirty with an invalid mfn (0xffffffffffffffff) and then it
> hits "Error path: not a suitable GFN at all".
>
> Can anyone point me to which change or what may be causing this?

Did the previous event listener replace the page it stole from guest
physmap for ring purposes when it exited?

That error specifically means that the gfn chosen for the ring was not
present when prepare_ring_for_helper() was called.

A first gut feeling would point to the changes in HVM domain
construction stemming from the DMLite work, but if event listening works
for the first time and then fails, the magic page was suitably present
the first time around.

~Andrew

* Re: vm_event regression in 4.7
From: Tamas K Lengyel @ 2016-02-03  1:32 UTC
To: Andrew Cooper; +Cc: Xen-devel

On Tue, Feb 2, 2016 at 6:00 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:

> On 03/02/2016 00:51, Tamas K Lengyel wrote:
>
> Hello all,
> with the latest master branch of Xen there is a regression enabling
> vm_event on a domain. If an event listener was previously active on the
> domain it is now not possible to reenable events as the domctl returns
> -EINVAL. The problem seems to stem from activating the magic page for
> vm_event using prepare_ring_for_helper as it returns NULL. Further looking
> into where things go wrong within that function it seems the page type
> returned by __get_gfn_type_access is p2m_ram_logdirty with an invalid mfn
> (0xffffffffffffffff) and then it hits "Error path: not a suitable GFN at
> all".
>
> Can anyone point me to which change or what may be causing this?
>
> Did the previous event listener replace the page it stole from guest
> physmap for ring purposes when it exited?
>

Ah, here is what seems to be the problem. Previously it was not required to
do this during teardown. What we had was that libxc would check whether it
can map the ring page with xc_map_foreign_pages, and it would repopulate the
page if that failed before running xc_vm_event_enable. However, now it seems
xc_map_foreign_pages returns non-NULL the second time around as well, even
though the page is not in the physmap.

If I force libxc to run populate_physmap then I can get vm_event to
initialize properly again. So the change seems to relate somehow to the
behavior of xc_map_foreign_pages.

Tamas
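
The enable flow described in the message above (try to map the ring page,
repopulate it only if the mapping fails, then enable) can be modelled in a
few lines of C. This is a toy sketch under stated assumptions, not the
libxc implementation: every toy_* name is hypothetical, and the two
booleans stand in for the real physmap state and for whether the mapping
call notices a missing page.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the ring frame; NOT a real libxc or Xen data structure. */
struct toy_domain {
    bool ring_gfn_populated;   /* is the ring gfn backed by a page? */
    bool map_reports_failure;  /* does mapping notice a missing page? */
};

static char toy_page[4096];

/* Hypothetical stand-in for the mapping call: NULL means failure. */
static void *toy_map_ring(const struct toy_domain *d)
{
    assert(d != NULL);
    if (!d->ring_gfn_populated && d->map_reports_failure)
        return NULL;
    return toy_page;  /* "succeeds", possibly without a backing page */
}

/* Hypothetical stand-in for populate_physmap on the ring gfn. */
static void toy_populate(struct toy_domain *d)
{
    d->ring_gfn_populated = true;
}

/* Hypothetical stand-in for the enable domctl: -EINVAL if gfn is absent. */
static int toy_enable(const struct toy_domain *d)
{
    return d->ring_gfn_populated ? 0 : -EINVAL;
}

/* The flow from the message: map, repopulate on map failure, enable. */
static int toy_vm_event_enable(struct toy_domain *d)
{
    void *ring = toy_map_ring(d);

    if (ring == NULL) {
        /* Map failed, so the page is missing: put it back and retry. */
        toy_populate(d);
        ring = toy_map_ring(d);
        if (ring == NULL)
            return -ENOMEM;
    }
    return toy_enable(d);
}
```

In this model, when the mapping call reports the missing page the frame is
repopulated and the enable step returns 0. When the mapping call falsely
succeeds, the populate step is skipped and the enable step returns -EINVAL,
which matches the symptom reported in this thread.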

* Re: vm_event regression in 4.7
From: Andrew Cooper @ 2016-02-03 10:35 UTC
To: Tamas K Lengyel; +Cc: Ian Campbell, Xen-devel

On 03/02/16 01:32, Tamas K Lengyel wrote:
> On Tue, Feb 2, 2016 at 6:00 PM, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>
>     On 03/02/2016 00:51, Tamas K Lengyel wrote:
>>     Hello all,
>>     with the latest master branch of Xen there is a regression
>>     enabling vm_event on a domain. If an event listener was
>>     previously active on the domain it is now not possible to
>>     reenable events as the domctl returns -EINVAL. The problem seems
>>     to stem from activating the magic page for vm_event using
>>     prepare_ring_for_helper as it returns NULL. Further looking into
>>     where things go wrong within that function it seems the page type
>>     returned by __get_gfn_type_access is p2m_ram_logdirty with an
>>     invalid mfn (0xffffffffffffffff) and then it hits "Error path:
>>     not a suitable GFN at all".
>>
>>     Can anyone point me to which change or what may be causing this?
>
>     Did the previous event listener replace the page it stole from
>     guest physmap for ring purposes when it exited?
>
> Ah, here is what seems to be the problem. Previously it was not
> required to do this during teardown. What we had was that libxc would
> check whether it can map the ring page with xc_map_foreign_pages, and
> it would repopulate the page if that failed before running
> xc_vm_event_enable. However, now it seems xc_map_foreign_pages returns
> non-NULL the second time around as well, even though the page is not
> in the physmap.

This is the bug then.  If there isn't a page in the physmap,
xc_map_foreign_pages() should indicate an error.

> If I force libxc to run populate_physmap then I can get vm_event to
> initialize properly again. So the change seems to relate somehow to
> the behavior of xc_map_foreign_pages.

This seems likely due to the splitting out of libxenforeignmem from
libxc, which included the merging of 4(?) almost identical
map_foreign_$FOO() functions into one.  It is likely that there is a
subtle change in behaviour on an error path.

~Andrew

* Re: vm_event regression in 4.7
From: Tamas K Lengyel @ 2016-02-05 20:34 UTC
To: Andrew Cooper; +Cc: Ian Campbell, Xen-devel

On Wed, Feb 3, 2016 at 3:35 AM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:

> On 03/02/16 01:32, Tamas K Lengyel wrote:
>
> On Tue, Feb 2, 2016 at 6:00 PM, Andrew Cooper <andrew.cooper3@citrix.com>
> wrote:
>
>> On 03/02/2016 00:51, Tamas K Lengyel wrote:
>>
>> Hello all,
>> with the latest master branch of Xen there is a regression enabling
>> vm_event on a domain. If an event listener was previously active on the
>> domain it is now not possible to reenable events as the domctl returns
>> -EINVAL. The problem seems to stem from activating the magic page for
>> vm_event using prepare_ring_for_helper as it returns NULL. Further looking
>> into where things go wrong within that function it seems the page type
>> returned by __get_gfn_type_access is p2m_ram_logdirty with an invalid mfn
>> (0xffffffffffffffff) and then it hits "Error path: not a suitable GFN at
>> all".
>>
>> Can anyone point me to which change or what may be causing this?
>>
>> Did the previous event listener replace the page it stole from guest
>> physmap for ring purposes when it exited?
>>
>
> Ah, here is what seems to be the problem. Previously it was not required
> to do this during teardown. What we had was that libxc would check whether
> it can map the ring page with xc_map_foreign_pages, and it would repopulate
> the page if that failed before running xc_vm_event_enable. However, now it
> seems xc_map_foreign_pages returns non-NULL the second time around as well,
> even though the page is not in the physmap.
>
> This is the bug then.  If there isn't a page in the physmap,
> xc_map_foreign_pages() should indicate an error.
>
> If I force libxc to run populate_physmap then I can get vm_event to
> initialize properly again. So the change seems to relate somehow to the
> behavior of xc_map_foreign_pages.
>
> This seems likely due to the splitting out of libxenforeignmem from libxc,
> which included the merging of 4(?) almost identical map_foreign_$FOO()
> functions into one.  It is likely that there is a subtle change in
> behaviour on an error path.
>

I've added a bunch of debug messages and it gets all the way down to
IOCTL_PRIVCMD_MMAPBATCH_V2 in tools/libs/foreignmemory/linux.c without an
error. That ioctl returns 0 too, so I'm not sure where the error comes
from. Compared to the flow in Xen 4.6 I don't really see what changed.

Tamas
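
One way a batch mapping call can return 0 while a page was not actually
mapped is when per-page status is reported separately from the overall
return value, as with the err array that the foreign-memory mapping
interface takes. The sketch below illustrates that general pattern only;
the thread does not establish that this is what happened here, and the
toy_* functions are hypothetical stand-ins rather than real libxc or
privcmd calls.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_PAGES 16

/*
 * Hypothetical batch call: the overall return value is 0 as long as the
 * request itself was well formed; the outcome for each page is written
 * to err[i] (0 on success, negative errno on failure).
 */
static int toy_map_batch(const int *page_present, int *err, size_t num)
{
    for (size_t i = 0; i < num; i++)
        err[i] = page_present[i] ? 0 : -EINVAL;
    return 0;  /* the call "worked" even if individual pages failed */
}

/*
 * Wrapper that treats any per-page failure as a failure of the whole
 * mapping, returning the first error found.
 */
static int toy_map_checked(const int *page_present, size_t num)
{
    int err[TOY_MAX_PAGES];
    int rc;

    assert(num <= TOY_MAX_PAGES);

    rc = toy_map_batch(page_present, err, num);
    if (rc)
        return rc;

    for (size_t i = 0; i < num; i++)
        if (err[i])
            return err[i];

    return 0;
}
```

A caller that only looks at the overall return value of toy_map_batch()
sees success for a missing page; toy_map_checked() surfaces the per-page
error instead.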

* Re: vm_event regression in 4.7
From: Tamas K Lengyel @ 2016-02-05 21:08 UTC
To: Andrew Cooper; +Cc: Ian Campbell, Xen-devel

On Fri, Feb 5, 2016 at 1:34 PM, Tamas K Lengyel <tamas.k.lengyel@gmail.com> wrote:

> On Wed, Feb 3, 2016 at 3:35 AM, Andrew Cooper <andrew.cooper3@citrix.com>
> wrote:
>
>> On 03/02/16 01:32, Tamas K Lengyel wrote:
>>
>> On Tue, Feb 2, 2016 at 6:00 PM, Andrew Cooper <andrew.cooper3@citrix.com>
>> wrote:
>>
>>> On 03/02/2016 00:51, Tamas K Lengyel wrote:
>>>
>>> Hello all,
>>> with the latest master branch of Xen there is a regression enabling
>>> vm_event on a domain. If an event listener was previously active on the
>>> domain it is now not possible to reenable events as the domctl returns
>>> -EINVAL. The problem seems to stem from activating the magic page for
>>> vm_event using prepare_ring_for_helper as it returns NULL. Further looking
>>> into where things go wrong within that function it seems the page type
>>> returned by __get_gfn_type_access is p2m_ram_logdirty with an invalid mfn
>>> (0xffffffffffffffff) and then it hits "Error path: not a suitable GFN at
>>> all".
>>>
>>> Can anyone point me to which change or what may be causing this?
>>>
>>> Did the previous event listener replace the page it stole from guest
>>> physmap for ring purposes when it exited?
>>>
>>
>> Ah, here is what seems to be the problem. Previously it was not required
>> to do this during teardown. What we had was that libxc would check whether
>> it can map the ring page with xc_map_foreign_pages, and it would repopulate
>> the page if that failed before running xc_vm_event_enable. However, now it
>> seems xc_map_foreign_pages returns non-NULL the second time around as well,
>> even though the page is not in the physmap.
>>
>> This is the bug then.  If there isn't a page in the physmap,
>> xc_map_foreign_pages() should indicate an error.
>>
>> If I force libxc to run populate_physmap then I can get vm_event to
>> initialize properly again. So the change seems to relate somehow to the
>> behavior of xc_map_foreign_pages.
>>
>> This seems likely due to the splitting out of libxenforeignmem from
>> libxc, which included the merging of 4(?) almost identical
>> map_foreign_$FOO() functions into one.  It is likely that there is a subtle
>> change in behaviour on an error path.
>>
>
> I've added a bunch of debug messages and it gets all the way down to
> IOCTL_PRIVCMD_MMAPBATCH_V2 in tools/libs/foreignmemory/linux.c without an
> error. That ioctl returns 0 too, so I'm not sure where the error comes
> from. Compared to the flow in Xen 4.6 I don't really see what changed.
>

Never mind, found it. The commit "b701ccc8 tools: Remove
xc_map_foreign_batch" caused the regression.

Tamas