From: Julien Grall <julien@xen.org>
To: "Jürgen Groß" <jgross@suse.com>,
	xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, netdev@vger.kernel.org,
	linux-scsi@vger.kernel.org
Cc: "Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	stable@vger.kernel.org,
	"Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>,
	"Jens Axboe" <axboe@kernel.dk>, "Wei Liu" <wei.liu@kernel.org>,
	"Paul Durrant" <paul@xen.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>
Subject: Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids
Date: Mon, 8 Feb 2021 09:11:18 +0000
Message-ID: <fcf3181b-3efc-55f5-687c-324937b543e6@xen.org>
In-Reply-To: <eeb62129-d9fc-2155-0e0f-aff1fbb33fbc@suse.com>

Hi Juergen,

On 07/02/2021 12:58, Jürgen Groß wrote:
> On 06.02.21 19:46, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 06/02/2021 10:49, Juergen Gross wrote:
>>> The first three patches are fixes for XSA-332. They avoid WARN splats
>>> and a performance issue with interdomain events.
>>
>> Thanks for helping to figure out the problem. Unfortunately, I still
>> reliably see the WARN splat with the latest Linux master
>> (1e0d27fce010) + your first 3 patches.
>>
>> I am using Xen 4.11 (1c7d984645f9) and dom0 is forced to use the 2L 
>> events ABI.
>>
>> After some debugging, I think I have an idea of what went wrong. The
>> problem happens when the event is initially bound from vCPU0 to a
>> different vCPU.
>>
>> From the comment in xen_rebind_evtchn_to_cpu(), we mask the
>> event to prevent it from being delivered on an unexpected vCPU. However, I
>> believe the following can happen:
>>
>> vCPU0                      | vCPU1
>>                            |
>>                            | Call xen_rebind_evtchn_to_cpu()
>> receive event X            |
>>                            | mask event X
>>                            | bind to vCPU1
>> <vCPU descheduled>         | unmask event X
>>                            |
>>                            | receive event X
>>                            |
>>                            | handle_edge_irq(X)
>> handle_edge_irq(X)         |  -> handle_irq_event()
>>                            |   -> set IRQD_IN_PROGRESS
>>   -> set IRQS_PENDING      |
>>                            |   -> evtchn_interrupt()
>>                            |   -> clear IRQD_IN_PROGRESS
>>                            |  -> IRQS_PENDING is set
>>                            |  -> handle_irq_event()
>>                            |   -> evtchn_interrupt()
>>                            |     -> WARN()
>>                            |
>>
>> All the lateeoi handlers expect ONESHOT semantics, and
>> evtchn_interrupt() doesn't tolerate any deviation.
>>
>> I think the problem was introduced by 7f874a0447a9 ("xen/events: fix 
>> lateeoi irq acknowledgment") because the interrupt was disabled 
>> previously. Therefore we wouldn't do another iteration in 
>> handle_edge_irq().
> 
> I think you picked the wrong commit to blame, as this is just
> the last patch of the three patches you were testing.

I actually found the right commit to blame, but I copied the
information from the wrong shell :/. The bug was introduced by:

c44b849cee8c ("xen/events: switch user event channels to lateeoi model")

> 
>> Aside from the handlers, I think it may impact the deferred EOI
>> mitigation: in theory, a third vCPU could join the party (let's say
>> vCPU A migrates the event from vCPU B to vCPU C), so
>> info->{eoi_cpu, irq_epoch, eoi_time} could possibly get mangled?
>>
>> For a fix, we may want to consider holding evtchn_rwlock with
>> write permission, although I am not 100% sure this is going to
>> prevent everything.
> 
> It will make things worse, as it would violate the locking hierarchy
> (xen_rebind_evtchn_to_cpu() is called with the IRQ-desc lock held).

Ah, right.

> 
> At first glance, I think we'll need a third masking state ("temporarily
> masked") in the second patch in order to avoid a race with lateeoi.
> 
> In order to avoid the race you outlined above, we need an "event is being
> handled" indicator, checked via test_and_set() semantics in
> handle_irq_for_port() and reset only when calling clear_evtchn().

It feels like we are trying to work around the IRQ flow we are using
(i.e. handle_edge_irq()).

This reminds me of the thread we had before discovering XSA-332 (see [1]).
Back then, it was suggested to switch back to handle_fasteoi_irq().

Cheers,

[1] 
https://lore.kernel.org/xen-devel/alpine.DEB.2.21.2004271552430.29217@sstabellini-ThinkPad-T480s/

-- 
Julien Grall
