From: "Jürgen Groß" <jgross@suse.com>
To: Julien Grall <julien@xen.org>,
	xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, netdev@vger.kernel.org,
	linux-scsi@vger.kernel.org
Cc: "Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	stable@vger.kernel.org,
	"Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>,
	"Jens Axboe" <axboe@kernel.dk>, "Wei Liu" <wei.liu@kernel.org>,
	"Paul Durrant" <paul@xen.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>
Subject: Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
Date: Mon, 8 Feb 2021 15:50:59 +0100
Message-ID: <bf0a3f8f-9604-1ff3-6b82-3cb117ce3839@suse.com>
In-Reply-To: <279b741b-09dc-c6af-bf9d-df57922fa465@xen.org>



On 08.02.21 15:20, Julien Grall wrote:
> Hi Juergen,
> 
> On 08/02/2021 13:58, Jürgen Groß wrote:
>> On 08.02.21 14:09, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 08/02/2021 12:31, Jürgen Groß wrote:
>>>> On 08.02.21 13:16, Julien Grall wrote:
>>>>>
>>>>>
>>>>> On 08/02/2021 12:14, Jürgen Groß wrote:
>>>>>> On 08.02.21 11:40, Julien Grall wrote:
>>>>>>> Hi Juergen,
>>>>>>>
>>>>>>> On 08/02/2021 10:22, Jürgen Groß wrote:
>>>>>>>> On 08.02.21 10:54, Julien Grall wrote:
>>>>>>>>> ... I don't really see how the difference matters here. The idea 
>>>>>>>>> is to re-use what already exists rather than trying to 
>>>>>>>>> re-invent the wheel with an extra lock (or whatever else we can 
>>>>>>>>> come up with).
>>>>>>>>
>>>>>>>> The difference is that the race occurs _before_ any IRQ is
>>>>>>>> involved. So I don't see how modifying the IRQ handling would 
>>>>>>>> help.
>>>>>>>
>>>>>>> Roughly, our current IRQ handling flow (handle_edge_irq()) looks like:
>>>>>>>
>>>>>>> if ( irq in progress )
>>>>>>> {
>>>>>>>    set IRQS_PENDING
>>>>>>>    return;
>>>>>>> }
>>>>>>>
>>>>>>> do
>>>>>>> {
>>>>>>>    clear IRQS_PENDING
>>>>>>>    handle_irq()
>>>>>>> } while (IRQS_PENDING is set)
>>>>>>>
>>>>>>> An IRQ handling flow like handle_fasteoi_irq() looks like:
>>>>>>>
>>>>>>> if ( irq in progress )
>>>>>>>    return;
>>>>>>>
>>>>>>> handle_irq()
>>>>>>>
>>>>>>> The latter flow would catch "spurious" interrupts and ignore 
>>>>>>> them, so it would nicely handle the race when changing the event 
>>>>>>> affinity.
>>>>>>
>>>>>> Sure? Isn't "irq in progress" being reset way before our "lateeoi" is
>>>>>> issued, thus having the same problem again? 
>>>>>
>>>>> Sorry, I can't parse this.
>>>>
>>>> handle_fasteoi_irq() will do nothing "if ( irq in progress )". When is
>>>> this condition reset again so that another IRQ can be processed?
>>> It is reset after the handler has been called. See handle_irq_event().
>>
>> Right. And for us this is too early, as we want the next IRQ to be
>> handled only after we have called xen_irq_lateeoi().
> 
> It is not really the next IRQ here. It is more a spurious IRQ because we 
> don't clear & mask the event right away. Instead, it is done later in 
> the handling.
> 
>>
>>>
>>>> I believe this will be the case before our "lateeoi" handling is
>>>> becoming active (more precisely: when our IRQ handler returns to
>>>> handle_fasteoi_irq()), resulting in the possibility of the same race we
>>>> are experiencing now.
>>>
>>> I am a bit confused about what you mean by "lateeoi" handling 
>>> becoming active. Can you clarify?
>>
>> See above: the next call of the handler should be allowed only after
>> xen_irq_lateeoi() for the IRQ has been called.
>>
>> If the handler is called earlier, we have the race that results
>> in the WARN() splats.
> 
> I feel it is difficult to understand the race with just words. Can you 
> provide a scenario (similar to the one I originally provided) with two 
> vCPUs and show how this can happen?

vCPU0                 | vCPU1
                      |
                      | Call xen_rebind_evtchn_to_cpu()
receive event X       |
                      | mask event X
                      | bind to vCPU1
<vCPU descheduled>    | unmask event X
                      |
                      | receive event X
                      |
                      | handle_fasteoi_irq(X)
                      |  -> handle_irq_event()
                      |   -> set IRQD_IN_PROGRESS
                      |   -> evtchn_interrupt()
                      |      -> evtchn->enabled = false
                      |   -> clear IRQD_IN_PROGRESS
handle_fasteoi_irq(X) |
 -> evtchn_interrupt()|
    -> WARN()         |
                      | xen_irq_lateeoi(X)
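
To make the window concrete, here is a minimal sketch of the handler
side (condensed from memory after drivers/xen/evtchn.c, not verbatim;
ring handling and error paths are left out):

static irqreturn_t evtchn_interrupt(int irq, void *data)
{
        struct user_evtchn *evtchn = data;

        /*
         * This is the WARN() from the diagram: vCPU1 already ran the
         * handler and set enabled = false, and the delayed delivery
         * on vCPU0 re-enters before xen_irq_lateeoi() has re-enabled
         * the channel.
         */
        WARN(!evtchn->enabled,
             "Interrupt for port %d, but apparently not enabled\n",
             evtchn->port);

        evtchn->enabled = false;   /* re-enabled only via lateeoi */

        /* ... queue the port to the user space ring buffer ... */

        return IRQ_HANDLED;
}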

> 
>>
>>> Note that there are other existing IRQ flows. We should have a look 
>>> at them before trying to fix things ourselves.
>>
>> Fine with me, but it either needs to fit all use cases (interdomain,
>> IPI, real interrupts) or we need to have a per-type IRQ flow.
> 
> AFAICT, we already use different flows based on the use case. Before 
> 2011 we used the fasteoi one, but this was changed by the following 
> commit:

Yes, I know that.

>>
>> I think we should fix the issue locally first; then we can start
>> planning a thorough rework. It's not as if the changes needed with the
>> current flow were huge, and I'd really like to have a solution sooner
>> rather than later. Changing the IRQ flow might have other side effects
>> which would need to be ruled out by thorough testing.
> I agree that we need a solution ASAP. But I am a bit worried about:
>    1) Adding another lock in that event handling path.

Regarding complexity: it is very simple (the lock is taken just around
masking/unmasking of the event channel). Contention is very unlikely.

>    2) Adding more complexity to the event handling (it is already 
> fairly difficult to reason about the locking/races).
> 
> Let's see what the local fix looks like.

Yes.
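
As a rough illustration only (the helper name and the per-channel lock
here are made up for the example, not taken from an actual patch;
mask_evtchn(), unmask_evtchn() and bind_evtchn_to_cpu() are the
existing helpers in drivers/xen/events/events_base.c), the direction
would be:

/* Hypothetical sketch, not an actual patch. */
static void xen_rebind_evtchn_locked(struct irq_info *info,
                                     unsigned int cpu)
{
        raw_spin_lock(&info->lock);        /* assumed new lock */
        mask_evtchn(info->evtchn);
        bind_evtchn_to_cpu(info->evtchn, cpu);
        unmask_evtchn(info->evtchn);
        raw_spin_unlock(&info->lock);
}

Contention would be limited to affinity changes, which are rare
compared to event delivery.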

> 
>>> Although, the other issue I can see so far is that 
>>> handle_irq_for_port() will update info->{eoi_cpu, irq_epoch, 
>>> eoi_time} without any locking. But it is not clear this is what you 
>>> mean by "becoming active".
>>
>> As long as a single event can't be handled on multiple cpus at the same
>> time, there is no locking needed.
> 
> Well, it can happen in the current code (see my original scenario). If 
> your idea fixes it, then fine.

I hope so.
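
For context, the unlocked updates mentioned above look roughly like
this (condensed from memory after handle_irq_for_port() in
drivers/xen/events/events_base.c, not verbatim):

/* Plain stores, no lock: only safe if the same port can never be
 * handled on two cpus at the same time. */
info->eoi_cpu = smp_processor_id();
info->irq_epoch = __this_cpu_read(irq_epoch);
info->eoi_time = 0;     /* set when the eoi actually gets delayed */

If the local fix guarantees that, these stores stay race-free without
further locking.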


Juergen

