* vcpu_show_execution_state() difference between Arm and x86
@ 2021-09-01 13:39 Jan Beulich
2021-09-01 18:11 ` Julien Grall
0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2021-09-01 13:39 UTC (permalink / raw)
To: Julien Grall, Stefano Stabellini, Andrew Cooper; +Cc: xen-devel
All,
back in 2016 Andrew added code to x86'es variant to avoid interleaving
of output. The same issue ought to exist on Arm. The lock acquired,
or more importantly the turning off of IRQs while doing so, is now
getting in the way of having PVH Dom0's state dumped the 2nd time. For
register state I did find a sufficiently simple (yet not pretty)
workaround. For the stack, where I can't reasonably avoid using p2m
functions, this is going to be more difficult.
Since I expect Arm to want to also have interleave protection at some
point, and since Arm also acquires the p2m lock while accessing Dom0's
stacks, I wonder whether anyone has any clever idea on how to avoid
the (valid) triggering of check_lock()'s assertion without intrusive
changes. (As to intrusive changes - acquiring the p2m lock up front in
recursive mode, plus silencing check_lock() for nested acquires of a
lock that's already being held by a CPU was my initial idea.)
Thanks, Jan
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: vcpu_show_execution_state() difference between Arm and x86
2021-09-01 13:39 vcpu_show_execution_state() difference between Arm and x86 Jan Beulich
@ 2021-09-01 18:11 ` Julien Grall
2021-09-02 6:45 ` Jan Beulich
0 siblings, 1 reply; 5+ messages in thread
From: Julien Grall @ 2021-09-01 18:11 UTC (permalink / raw)
To: Jan Beulich, Stefano Stabellini, Andrew Cooper; +Cc: xen-devel
On 01/09/2021 14:39, Jan Beulich wrote:
> All,
>
> back in 2016 Andrew added code to x86'es variant to avoid interleaving
> of output. The same issue ought to exist on Arm.
Agree. I guess we got away so far because it is pretty rare to have two
CPUs running at the same.
> The lock acquired,
> or more importantly the turning off of IRQs while doing so, is now
> getting in the way of having PVH Dom0's state dumped the 2nd time.
I am not quite too sure to understand the problem with PVH dom0. Do you
have a pointer to the issue?
> For
> register state I did find a sufficiently simple (yet not pretty)
> workaround. For the stack, where I can't reasonably avoid using p2m
> functions, this is going to be more difficult. >
> Since I expect Arm to want to also have interleave protection at some
> point, and since Arm also acquires the p2m lock while accessing Dom0's
> stacks, I wonder whether anyone has any clever idea on how to avoid
> the (valid) triggering of check_lock()'s assertion without intrusive
> changes. (As to intrusive changes - acquiring the p2m lock up front in
> recursive mode, plus silencing check_lock() for nested acquires of a
> lock that's already being held by a CPU was my initial idea.)
At least one Arm, the P2M lock is a rwlock which is not yet recursive.
But then it feels to me that this solution is only going to cause us
more trouble in the future.
I looked at the original commit to find out the reason to use the
console lock. AFAICT, this was to allow console_force_unlock() to work
properly. But it is not entirely clear why we couldn't get a new lock
(with IRQ enabled) that could be forced unlocked in that function.
Can either you or Andrew clarify it?
The other solution I can think off is buffering the output for
show_registers and only print it once at the end. The downside is we may
not get any output if there is an issue in the middle of the dump.
Cheers,
--
Julien Grall
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: vcpu_show_execution_state() difference between Arm and x86
2021-09-01 18:11 ` Julien Grall
@ 2021-09-02 6:45 ` Jan Beulich
2021-09-02 11:48 ` Julien Grall
0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2021-09-02 6:45 UTC (permalink / raw)
To: Julien Grall; +Cc: xen-devel, Stefano Stabellini, Andrew Cooper
On 01.09.2021 20:11, Julien Grall wrote:
> On 01/09/2021 14:39, Jan Beulich wrote:
>> back in 2016 Andrew added code to x86'es variant to avoid interleaving
>> of output. The same issue ought to exist on Arm.
>
> Agree. I guess we got away so far because it is pretty rare to have two
> CPUs running at the same.
I guess you've meant "crashing", not "running"?
>> The lock acquired,
>> or more importantly the turning off of IRQs while doing so, is now
>> getting in the way of having PVH Dom0's state dumped the 2nd time.
>
> I am not quite too sure to understand the problem with PVH dom0. Do you
> have a pointer to the issue?
Locking in both cases: For registers it was VMX'es vmx_vmcs_enter()
acquiring a (non-IRQ-safe) lock, and for the stack it is the P2M lock.
Neither can / should sensibly be made IRQ-safe.
>> For
>> register state I did find a sufficiently simple (yet not pretty)
>> workaround. For the stack, where I can't reasonably avoid using p2m
>> functions, this is going to be more difficult. >
>> Since I expect Arm to want to also have interleave protection at some
>> point, and since Arm also acquires the p2m lock while accessing Dom0's
>> stacks, I wonder whether anyone has any clever idea on how to avoid
>> the (valid) triggering of check_lock()'s assertion without intrusive
>> changes. (As to intrusive changes - acquiring the p2m lock up front in
>> recursive mode, plus silencing check_lock() for nested acquires of a
>> lock that's already being held by a CPU was my initial idea.)
>
> At least one Arm, the P2M lock is a rwlock which is not yet recursive.
Right; the same is tru on x86. Hence the expected intrusiveness.
> But then it feels to me that this solution is only going to cause us
> more trouble in the future.
Same here - we'd need to be very careful not only when making such
a change, but also going forward. Hence my desire to come up with a
better approach.
> I looked at the original commit to find out the reason to use the
> console lock. AFAICT, this was to allow console_force_unlock() to work
> properly. But it is not entirely clear why we couldn't get a new lock
> (with IRQ enabled) that could be forced unlocked in that function.
>
> Can either you or Andrew clarify it?
AIUI any new lock would need to be IRQ-safe as well, as the lock
would be on paths taken to output stuff when the system crashed.
Hence it would be pointless to introduce yet another lock when the
one we have is already good enough. But I may be missing aspects,
in which case I'd have to defer to Andrew.
> The other solution I can think off is buffering the output for
> show_registers and only print it once at the end. The downside is we may
> not get any output if there is an issue in the middle of the dump.
Indeed; I'd prefer to avoid that, the more that it may be hard to
predict how much output there's going to be.
Jan
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: vcpu_show_execution_state() difference between Arm and x86
2021-09-02 6:45 ` Jan Beulich
@ 2021-09-02 11:48 ` Julien Grall
2021-09-02 11:55 ` Jan Beulich
0 siblings, 1 reply; 5+ messages in thread
From: Julien Grall @ 2021-09-02 11:48 UTC (permalink / raw)
To: Jan Beulich
Cc: xen-devel, Stefano Stabellini, Andrew Cooper, Bertrand Marquis
Hi Jan,
On 02/09/2021 07:45, Jan Beulich wrote:
> On 01.09.2021 20:11, Julien Grall wrote:
>> On 01/09/2021 14:39, Jan Beulich wrote:
>>> back in 2016 Andrew added code to x86'es variant to avoid interleaving
>>> of output. The same issue ought to exist on Arm.
>>
>> Agree. I guess we got away so far because it is pretty rare to have two
>> CPUs running at the same.
>
> I guess you've meant "crashing", not "running"?
Yes.
>
>>> The lock acquired,
>>> or more importantly the turning off of IRQs while doing so, is now
>>> getting in the way of having PVH Dom0's state dumped the 2nd time.
>>
>> I am not quite too sure to understand the problem with PVH dom0. Do you
>> have a pointer to the issue?
>
> Locking in both cases: For registers it was VMX'es vmx_vmcs_enter()
> acquiring a (non-IRQ-safe) lock, and for the stack it is the P2M lock.
> Neither can / should sensibly be made IRQ-safe.
>
>>> For
>>> register state I did find a sufficiently simple (yet not pretty)
>>> workaround. For the stack, where I can't reasonably avoid using p2m
>>> functions, this is going to be more difficult. >
>>> Since I expect Arm to want to also have interleave protection at some
>>> point, and since Arm also acquires the p2m lock while accessing Dom0's
>>> stacks, I wonder whether anyone has any clever idea on how to avoid
>>> the (valid) triggering of check_lock()'s assertion without intrusive
>>> changes. (As to intrusive changes - acquiring the p2m lock up front in
>>> recursive mode, plus silencing check_lock() for nested acquires of a
>>> lock that's already being held by a CPU was my initial idea.)
>>
>> At least one Arm, the P2M lock is a rwlock which is not yet recursive.
>
> Right; the same is tru on x86. Hence the expected intrusiveness.
>
>> But then it feels to me that this solution is only going to cause us
>> more trouble in the future.
>
> Same here - we'd need to be very careful not only when making such
> a change, but also going forward. Hence my desire to come up with a
> better approach.
>
>> I looked at the original commit to find out the reason to use the
>> console lock. AFAICT, this was to allow console_force_unlock() to work
>> properly. But it is not entirely clear why we couldn't get a new lock
>> (with IRQ enabled) that could be forced unlocked in that function.
>>
>> Can either you or Andrew clarify it?
>
> AIUI any new lock would need to be IRQ-safe as well, as the lock
> would be on paths taken to output stuff when the system crashed.
Hmmm... Just to confirm, what you are saying is some of the callers of
vcpu_show_execution_state() & co may do it with IRQ-disabled. Is that
correct?
I have tried to play with it on Arm but then I realized that
check_lock() is not properly working on Arm because we don't call
spin_debug_enable() at boot. :/ So it would make sense why we never saw
any issue there...
> Hence it would be pointless to introduce yet another lock when the
> one we have is already good enough. But I may be missing aspects,
> in which case I'd have to defer to Andrew.
>
>> The other solution I can think off is buffering the output for
>> show_registers and only print it once at the end. The downside is we may
>> not get any output if there is an issue in the middle of the dump.
>
> Indeed; I'd prefer to avoid that, the more that it may be hard to
> predict how much output there's going to be.
And it is not going to work as we couldn't grab the P2M lock with IRQ
disabled.
On Arm, the only problem is going to be the P2M lock for dump the guest
trace. A possible controversial approach for Arm is to just not dump the
guest stack or move it outside of the new lock and dump when IRQ is
enabled (I am not aware of any places where the guest stack will be
dumped and we have IRQ disabled).
Bertrand, Stefano? Do you use the guest stack in the dump? If so, what
are the cases?
Cheers,
--
Julien Grall
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: vcpu_show_execution_state() difference between Arm and x86
2021-09-02 11:48 ` Julien Grall
@ 2021-09-02 11:55 ` Jan Beulich
0 siblings, 0 replies; 5+ messages in thread
From: Jan Beulich @ 2021-09-02 11:55 UTC (permalink / raw)
To: Julien Grall
Cc: xen-devel, Stefano Stabellini, Andrew Cooper, Bertrand Marquis
On 02.09.2021 13:48, Julien Grall wrote:
> On 02/09/2021 07:45, Jan Beulich wrote:
>> On 01.09.2021 20:11, Julien Grall wrote:
>>> I looked at the original commit to find out the reason to use the
>>> console lock. AFAICT, this was to allow console_force_unlock() to work
>>> properly. But it is not entirely clear why we couldn't get a new lock
>>> (with IRQ enabled) that could be forced unlocked in that function.
>>>
>>> Can either you or Andrew clarify it?
>>
>> AIUI any new lock would need to be IRQ-safe as well, as the lock
>> would be on paths taken to output stuff when the system crashed.
>
> Hmmm... Just to confirm, what you are saying is some of the callers of
> vcpu_show_execution_state() & co may do it with IRQ-disabled. Is that
> correct?
No, that's not what I was saying. What I was saying is that crash-
safety requires an IRQ-safe lock, because the approach taken here
should imo match that in show_execution_state(). And _that_
function can be called with IRQs in either state.
> I have tried to play with it on Arm but then I realized that
> check_lock() is not properly working on Arm because we don't call
> spin_debug_enable() at boot. :/ So it would make sense why we never saw
> any issue there...
Oops.
>> Hence it would be pointless to introduce yet another lock when the
>> one we have is already good enough. But I may be missing aspects,
>> in which case I'd have to defer to Andrew.
>>
>>> The other solution I can think off is buffering the output for
>>> show_registers and only print it once at the end. The downside is we may
>>> not get any output if there is an issue in the middle of the dump.
>>
>> Indeed; I'd prefer to avoid that, the more that it may be hard to
>> predict how much output there's going to be.
>
> And it is not going to work as we couldn't grab the P2M lock with IRQ
> disabled.
>
> On Arm, the only problem is going to be the P2M lock for dump the guest
> trace. A possible controversial approach for Arm is to just not dump the
> guest stack or move it outside of the new lock and dump when IRQ is
> enabled (I am not aware of any places where the guest stack will be
> dumped and we have IRQ disabled).
Well, you could certainly omit the stack dump on Arm. I for one find
it quite useful every now and then. On x86, that is.
Jan
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-09-02 11:56 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-01 13:39 vcpu_show_execution_state() difference between Arm and x86 Jan Beulich
2021-09-01 18:11 ` Julien Grall
2021-09-02 6:45 ` Jan Beulich
2021-09-02 11:48 ` Julien Grall
2021-09-02 11:55 ` Jan Beulich
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.