All of lore.kernel.org
 help / color / mirror / Atom feed
* vcpu_show_execution_state() difference between Arm and x86
@ 2021-09-01 13:39 Jan Beulich
  2021-09-01 18:11 ` Julien Grall
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2021-09-01 13:39 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini, Andrew Cooper; +Cc: xen-devel

All,

back in 2016 Andrew added code to x86'es variant to avoid interleaving
of output. The same issue ought to exist on Arm. The lock acquired,
or more importantly the turning off of IRQs while doing so, is now
getting in the way of having PVH Dom0's state dumped the 2nd time. For
register state I did find a sufficiently simple (yet not pretty)
workaround. For the stack, where I can't reasonably avoid using p2m
functions, this is going to be more difficult.

Since I expect Arm to want to also have interleave protection at some
point, and since Arm also acquires the p2m lock while accessing Dom0's
stacks, I wonder whether anyone has any clever idea on how to avoid
the (valid) triggering of check_lock()'s assertion without intrusive
changes. (As to intrusive changes - acquiring the p2m lock up front in
recursive mode, plus silencing check_lock() for nested acquires of a
lock that's already being held by a CPU was my initial idea.)

Thanks, Jan



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: vcpu_show_execution_state() difference between Arm and x86
  2021-09-01 13:39 vcpu_show_execution_state() difference between Arm and x86 Jan Beulich
@ 2021-09-01 18:11 ` Julien Grall
  2021-09-02  6:45   ` Jan Beulich
  0 siblings, 1 reply; 5+ messages in thread
From: Julien Grall @ 2021-09-01 18:11 UTC (permalink / raw)
  To: Jan Beulich, Stefano Stabellini, Andrew Cooper; +Cc: xen-devel



On 01/09/2021 14:39, Jan Beulich wrote:
> All,
> 
> back in 2016 Andrew added code to x86'es variant to avoid interleaving
> of output. The same issue ought to exist on Arm.

Agree. I guess we got away so far because it is pretty rare to have two 
CPUs running at the same.

> The lock acquired,
> or more importantly the turning off of IRQs while doing so, is now
> getting in the way of having PVH Dom0's state dumped the 2nd time. 

I am not quite too sure to understand the problem with PVH dom0. Do you 
have a pointer to the issue?

> For
> register state I did find a sufficiently simple (yet not pretty)
> workaround. For the stack, where I can't reasonably avoid using p2m
> functions, this is going to be more difficult. >
> Since I expect Arm to want to also have interleave protection at some
> point, and since Arm also acquires the p2m lock while accessing Dom0's
> stacks, I wonder whether anyone has any clever idea on how to avoid
> the (valid) triggering of check_lock()'s assertion without intrusive
> changes. (As to intrusive changes - acquiring the p2m lock up front in
> recursive mode, plus silencing check_lock() for nested acquires of a
> lock that's already being held by a CPU was my initial idea.)

At least one Arm, the P2M lock is a rwlock which is not yet recursive. 
But then it feels to me that this solution is only going to cause us 
more trouble in the future.

I looked at the original commit to find out the reason to use the 
console lock. AFAICT, this was to allow console_force_unlock() to work 
properly. But it is not entirely clear why we couldn't get a new lock 
(with IRQ enabled) that could be forced unlocked in that function.

Can either you or Andrew clarify it?

The other solution I can think off is buffering the output for 
show_registers and only print it once at the end. The downside is we may 
not get any output if there is an issue in the middle of the dump.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: vcpu_show_execution_state() difference between Arm and x86
  2021-09-01 18:11 ` Julien Grall
@ 2021-09-02  6:45   ` Jan Beulich
  2021-09-02 11:48     ` Julien Grall
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2021-09-02  6:45 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, Stefano Stabellini, Andrew Cooper

On 01.09.2021 20:11, Julien Grall wrote:
> On 01/09/2021 14:39, Jan Beulich wrote:
>> back in 2016 Andrew added code to x86'es variant to avoid interleaving
>> of output. The same issue ought to exist on Arm.
> 
> Agree. I guess we got away so far because it is pretty rare to have two 
> CPUs running at the same.

I guess you've meant "crashing", not "running"?

>> The lock acquired,
>> or more importantly the turning off of IRQs while doing so, is now
>> getting in the way of having PVH Dom0's state dumped the 2nd time. 
> 
> I am not quite too sure to understand the problem with PVH dom0. Do you 
> have a pointer to the issue?

Locking in both cases: For registers it was VMX'es vmx_vmcs_enter()
acquiring a (non-IRQ-safe) lock, and for the stack it is the P2M lock.
Neither can / should sensibly be made IRQ-safe.

>> For
>> register state I did find a sufficiently simple (yet not pretty)
>> workaround. For the stack, where I can't reasonably avoid using p2m
>> functions, this is going to be more difficult. >
>> Since I expect Arm to want to also have interleave protection at some
>> point, and since Arm also acquires the p2m lock while accessing Dom0's
>> stacks, I wonder whether anyone has any clever idea on how to avoid
>> the (valid) triggering of check_lock()'s assertion without intrusive
>> changes. (As to intrusive changes - acquiring the p2m lock up front in
>> recursive mode, plus silencing check_lock() for nested acquires of a
>> lock that's already being held by a CPU was my initial idea.)
> 
> At least one Arm, the P2M lock is a rwlock which is not yet recursive. 

Right; the same is tru on x86. Hence the expected intrusiveness.

> But then it feels to me that this solution is only going to cause us 
> more trouble in the future.

Same here - we'd need to be very careful not only when making such
a change, but also going forward. Hence my desire to come up with a
better approach.

> I looked at the original commit to find out the reason to use the 
> console lock. AFAICT, this was to allow console_force_unlock() to work 
> properly. But it is not entirely clear why we couldn't get a new lock 
> (with IRQ enabled) that could be forced unlocked in that function.
> 
> Can either you or Andrew clarify it?

AIUI any new lock would need to be IRQ-safe as well, as the lock
would be on paths taken to output stuff when the system crashed.
Hence it would be pointless to introduce yet another lock when the
one we have is already good enough. But I may be missing aspects,
in which case I'd have to defer to Andrew.

> The other solution I can think off is buffering the output for 
> show_registers and only print it once at the end. The downside is we may 
> not get any output if there is an issue in the middle of the dump.

Indeed; I'd prefer to avoid that, the more that it may be hard to
predict how much output there's going to be.

Jan



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: vcpu_show_execution_state() difference between Arm and x86
  2021-09-02  6:45   ` Jan Beulich
@ 2021-09-02 11:48     ` Julien Grall
  2021-09-02 11:55       ` Jan Beulich
  0 siblings, 1 reply; 5+ messages in thread
From: Julien Grall @ 2021-09-02 11:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Stefano Stabellini, Andrew Cooper, Bertrand Marquis

Hi Jan,

On 02/09/2021 07:45, Jan Beulich wrote:
> On 01.09.2021 20:11, Julien Grall wrote:
>> On 01/09/2021 14:39, Jan Beulich wrote:
>>> back in 2016 Andrew added code to x86'es variant to avoid interleaving
>>> of output. The same issue ought to exist on Arm.
>>
>> Agree. I guess we got away so far because it is pretty rare to have two
>> CPUs running at the same.
> 
> I guess you've meant "crashing", not "running"?

Yes.

> 
>>> The lock acquired,
>>> or more importantly the turning off of IRQs while doing so, is now
>>> getting in the way of having PVH Dom0's state dumped the 2nd time.
>>
>> I am not quite too sure to understand the problem with PVH dom0. Do you
>> have a pointer to the issue?
> 
> Locking in both cases: For registers it was VMX'es vmx_vmcs_enter()
> acquiring a (non-IRQ-safe) lock, and for the stack it is the P2M lock.
> Neither can / should sensibly be made IRQ-safe.
> 
>>> For
>>> register state I did find a sufficiently simple (yet not pretty)
>>> workaround. For the stack, where I can't reasonably avoid using p2m
>>> functions, this is going to be more difficult. >
>>> Since I expect Arm to want to also have interleave protection at some
>>> point, and since Arm also acquires the p2m lock while accessing Dom0's
>>> stacks, I wonder whether anyone has any clever idea on how to avoid
>>> the (valid) triggering of check_lock()'s assertion without intrusive
>>> changes. (As to intrusive changes - acquiring the p2m lock up front in
>>> recursive mode, plus silencing check_lock() for nested acquires of a
>>> lock that's already being held by a CPU was my initial idea.)
>>
>> At least one Arm, the P2M lock is a rwlock which is not yet recursive.
> 
> Right; the same is tru on x86. Hence the expected intrusiveness.
> 
>> But then it feels to me that this solution is only going to cause us
>> more trouble in the future.
> 
> Same here - we'd need to be very careful not only when making such
> a change, but also going forward. Hence my desire to come up with a
> better approach.
> 
>> I looked at the original commit to find out the reason to use the
>> console lock. AFAICT, this was to allow console_force_unlock() to work
>> properly. But it is not entirely clear why we couldn't get a new lock
>> (with IRQ enabled) that could be forced unlocked in that function.
>>
>> Can either you or Andrew clarify it?
> 
> AIUI any new lock would need to be IRQ-safe as well, as the lock
> would be on paths taken to output stuff when the system crashed.

Hmmm... Just to confirm, what you are saying is some of the callers of 
vcpu_show_execution_state() & co may do it with IRQ-disabled. Is that 
correct?

I have tried to play with it on Arm but then I realized that 
check_lock() is not properly working on Arm because we don't call 
spin_debug_enable() at boot. :/ So it would make sense why we never saw 
any issue there...

> Hence it would be pointless to introduce yet another lock when the
> one we have is already good enough. But I may be missing aspects,
> in which case I'd have to defer to Andrew.
> 
>> The other solution I can think off is buffering the output for
>> show_registers and only print it once at the end. The downside is we may
>> not get any output if there is an issue in the middle of the dump.
> 
> Indeed; I'd prefer to avoid that, the more that it may be hard to
> predict how much output there's going to be.

And it is not going to work as we couldn't grab the P2M lock with IRQ 
disabled.

On Arm, the only problem is going to be the P2M lock for dump the guest 
trace. A possible controversial approach for Arm is to just not dump the 
guest stack or move it outside of the new lock and dump when IRQ is 
enabled (I am not aware of any places where the guest stack will be 
dumped and we have IRQ disabled).

Bertrand, Stefano? Do you use the guest stack in the dump? If so, what 
are the cases?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: vcpu_show_execution_state() difference between Arm and x86
  2021-09-02 11:48     ` Julien Grall
@ 2021-09-02 11:55       ` Jan Beulich
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Beulich @ 2021-09-02 11:55 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Stefano Stabellini, Andrew Cooper, Bertrand Marquis

On 02.09.2021 13:48, Julien Grall wrote:
> On 02/09/2021 07:45, Jan Beulich wrote:
>> On 01.09.2021 20:11, Julien Grall wrote:
>>> I looked at the original commit to find out the reason to use the
>>> console lock. AFAICT, this was to allow console_force_unlock() to work
>>> properly. But it is not entirely clear why we couldn't get a new lock
>>> (with IRQ enabled) that could be forced unlocked in that function.
>>>
>>> Can either you or Andrew clarify it?
>>
>> AIUI any new lock would need to be IRQ-safe as well, as the lock
>> would be on paths taken to output stuff when the system crashed.
> 
> Hmmm... Just to confirm, what you are saying is some of the callers of 
> vcpu_show_execution_state() & co may do it with IRQ-disabled. Is that 
> correct?

No, that's not what I was saying. What I was saying is that crash-
safety requires an IRQ-safe lock, because the approach taken here
should imo match that in show_execution_state(). And _that_
function can be called with IRQs in either state.

> I have tried to play with it on Arm but then I realized that 
> check_lock() is not properly working on Arm because we don't call 
> spin_debug_enable() at boot. :/ So it would make sense why we never saw 
> any issue there...

Oops.

>> Hence it would be pointless to introduce yet another lock when the
>> one we have is already good enough. But I may be missing aspects,
>> in which case I'd have to defer to Andrew.
>>
>>> The other solution I can think off is buffering the output for
>>> show_registers and only print it once at the end. The downside is we may
>>> not get any output if there is an issue in the middle of the dump.
>>
>> Indeed; I'd prefer to avoid that, the more that it may be hard to
>> predict how much output there's going to be.
> 
> And it is not going to work as we couldn't grab the P2M lock with IRQ 
> disabled.
> 
> On Arm, the only problem is going to be the P2M lock for dump the guest 
> trace. A possible controversial approach for Arm is to just not dump the 
> guest stack or move it outside of the new lock and dump when IRQ is 
> enabled (I am not aware of any places where the guest stack will be 
> dumped and we have IRQ disabled).

Well, you could certainly omit the stack dump on Arm. I for one find
it quite useful every now and then. On x86, that is.

Jan



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-09-02 11:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-01 13:39 vcpu_show_execution_state() difference between Arm and x86 Jan Beulich
2021-09-01 18:11 ` Julien Grall
2021-09-02  6:45   ` Jan Beulich
2021-09-02 11:48     ` Julien Grall
2021-09-02 11:55       ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.