From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Razvan Cojocaru <rcojocaru@bitdefender.com>
Cc: "kevin.tian@intel.com" <kevin.tian@intel.com>,
	"tamas@tklengyel.com" <tamas@tklengyel.com>,
	"wei.liu2@citrix.com" <wei.liu2@citrix.com>,
	"jbeulich@suse.com" <jbeulich@suse.com>,
	"george.dunlap@eu.citrix.com" <george.dunlap@eu.citrix.com>,
	"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>,
	"Mihai Donțu" <mdontu@bitdefender.com>,
	"Andrei Vlad LUTAS" <vlutas@bitdefender.com>,
	"jun.nakajima@intel.com" <jun.nakajima@intel.com>,
	"Alexandru Stefan ISAILA" <aisaila@bitdefender.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"Anshul Makkar" <anshul.makkar@citrix.com>
Subject: Re: [PATCH v1] x86/hvm: Generic instruction re-execution mechanism for execute faults
Date: Thu, 22 Nov 2018 11:58:21 +0100
Message-ID: <20181122105821.6ihjcq5dy2lqjj6j@mac>
In-Reply-To: <7efdfb5e-044b-f2a3-6562-d3468997096a@bitdefender.com>

On Thu, Nov 22, 2018 at 12:14:59PM +0200, Razvan Cojocaru wrote:
> On 11/22/18 12:05 PM, Roger Pau Monné wrote:
> > On Wed, Nov 21, 2018 at 08:55:48PM +0200, Razvan Cojocaru wrote:
> >> On 11/16/18 7:04 PM, Roger Pau Monné wrote:
> >>>> +            if ( a == v )
> >>>> +                continue;
> >>>> +
> >>>> +            /* Pause, synced. */
> >>>> +            while ( !a->arch.in_host )
> >>> Why not use a->is_running as a way to know whether the vCPU is
> >>> running?
> >>>
> >>> I think the logic of using vcpu_pause and expecting the running vcpu
> >>> to take a vmexit and thus set in_host is wrong because a vcpu that
> >>> wasn't running when vcpu_pause_nosync is called won't get scheduled
> >>> anymore, thus not taking a vmexit and this function will lockup.
> >>>
> >>> I don't think you need the in_host boolean at all.
> >>>
> >>>> +                cpu_relax();
> >>> Is this really better than using vcpu_pause?
> >>>
> >>> I assume this is done to avoid waiting on each vcpu, and instead doing
> >>> it here likely means less wait time?
> >>
> >> The problem with plain vcpu_pause() is that we weren't able to use it,
> >> for the same reason (which remains unclear as of yet) that we couldn't
> >> use a->is_running: we get CPU stuck hypervisor crashes that way. Here's
> >> one that uses the same logic, but loops on a->is_running instead of
> >> !a->arch.in_host:

[...]

> >> Some scheduler magic appears to happen here where it is unclear why
> >> is_running doesn't seem to end up being 0 as expected in our case. We'll
> >> keep digging.
> > 
> > There seems to be some kind of deadlock between
> > vmx_start_reexecute_instruction and hap_track_dirty_vram/handle_mmio.
> > Are you holding a lock while trying to put the other vcpus to sleep?
> 
> d->arch.rexec_lock, but I don't see how that would matter in this case.

The trace from pCPU#0:

(XEN) [ 3668.016989] RFLAGS: 0000000000000202   CONTEXT: hypervisor (d0v0)
[...]
(XEN) [ 3668.275417] Xen call trace:
(XEN) [ 3668.278714]    [<ffff82d0801327d2>] vcpu_sleep_sync+0x40/0x71
(XEN) [ 3668.284952]    [<ffff82d08010735b>] domain.c#do_domain_pause+0x33/0x4f
(XEN) [ 3668.291973]    [<ffff82d08010879a>] domain_pause+0x25/0x27
(XEN) [ 3668.297952]    [<ffff82d080245e69>] hap_track_dirty_vram+0x2c1/0x4a7
(XEN) [ 3668.304797]    [<ffff82d0801dd8f5>] do_hvm_op+0x18be/0x2b58
(XEN) [ 3668.310864]    [<ffff82d080172aca>] pv_hypercall+0x1e5/0x402
(XEN) [ 3668.317017]    [<ffff82d080250899>] entry.o#test_all_events+0/0x3d

This shows that a hypercall issued from Dom0 is trying to pause the
domain, and thus all of its vCPUs.

Then pCPU#3:

(XEN) [ 3669.062841] RFLAGS: 0000000000000202   CONTEXT: hypervisor (d1v0)
[...]
(XEN) [ 3669.322832] Xen call trace:
(XEN) [ 3669.326128]    [<ffff82d08021006a>] vmx_start_reexecute_instruction+0x107/0x68a
(XEN) [ 3669.333925]    [<ffff82d080210b3e>] p2m_mem_access_check+0x551/0x64d
(XEN) [ 3669.340774]    [<ffff82d0801dee9e>] hvm_hap_nested_page_fault+0x2f2/0x631
(XEN) [ 3669.348051]    [<ffff82d080202c00>] vmx_vmexit_handler+0x156c/0x1e45
(XEN) [ 3669.354899]    [<ffff82d08020820c>] vmx_asm_vmexit_handler+0xec/0x250

d1v0 seems to be blocked in vmx_start_reexecute_instruction, so it
never gets paused, and that in turn triggers the watchdog on pCPU#0?

You should check which vCPU the trace from pCPU#0 is waiting on. If
it's the vCPU running on pCPU#3 (d1v0), you will have to check what's
taking such a long time in vmx_start_reexecute_instruction.
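
To make the suspected interaction concrete, here is a toy model of a
synchronous pause that waits on a vCPU which never leaves hypervisor
context. It is only a sketch and not Xen code: struct toy_vcpu, its
stuck_in_hv field, toy_pause_sync() and the spin budget (standing in for
the watchdog) are all made-up names for illustration. It returns the id
of the vCPU the waiter is blocked on, which is the piece of information
worth extracting from the real trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a vCPU; NOT Xen's struct vcpu. */
struct toy_vcpu {
    int  id;
    bool is_running;   /* still executing on some pCPU */
    bool stuck_in_hv;  /* hypothetical: spinning in hypervisor context */
};

/*
 * Model of a synchronous pause: wait for each vCPU in turn to stop
 * running. A vCPU that is not stuck gets descheduled on the first
 * iteration; a stuck one never clears is_running, so the waiter spins
 * until the budget (the toy "watchdog") runs out.
 *
 * Returns -1 if every vCPU was paused, otherwise the id of the vCPU
 * the waiter was blocked on when the budget was exhausted.
 */
int toy_pause_sync(struct toy_vcpu *v, size_t n, unsigned int budget)
{
    assert(v != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        unsigned int spins = 0;

        while (v[i].is_running) {
            if (!v[i].stuck_in_hv)
                v[i].is_running = false;  /* normal case: descheduled */
            else if (++spins >= budget)
                return v[i].id;           /* toy watchdog fires */
        }
    }

    return -1;
}
```

With two vCPUs where the first one is marked stuck, the call reports
that first vCPU; once the stuck flag is cleared the same call pauses
both. Whether d1v0 really plays that role here is exactly what needs
checking.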

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
