Re: [PATCH] vpci/msix: exit early if MSI-X is disabled

From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: Manuel Bouyer <bouyer@antioche.eu.org>, <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH] vpci/msix: exit early if MSI-X is disabled
Date: Sun, 6 Dec 2020 12:15:48 +0100
Message-ID: <20201206111548.nzefo2fx6bvspuj5@Air-de-Roger>
In-Reply-To: <cdb2a1ae-9ee7-6661-b69f-d2faacef2c12@suse.com>

Sorry, slightly sleep deprived, hope the reply below makes sense.

On Thu, Dec 03, 2020 at 02:40:28PM +0100, Jan Beulich wrote:
> On 02.12.2020 09:38, Jan Beulich wrote:
> > On 01.12.2020 18:40, Roger Pau Monne wrote:
> >> --- a/xen/drivers/vpci/msix.c
> >> +++ b/xen/drivers/vpci/msix.c
> >> @@ -357,7 +357,11 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
> >>           * so that it picks the new state.
> >>           */
> >>          entry->masked = new_masked;
> >> -        if ( !new_masked && msix->enabled && !msix->masked && entry->updated )
> >> +
> >> +        if ( !msix->enabled )
> >> +            break;
> >> +
> >> +        if ( !new_masked && !msix->masked && entry->updated )
> >>          {
> >>              /*
> >>               * If MSI-X is enabled, the function mask is not active, the entry
> > 
> > What about a "disabled" -> "enabled-but-masked" transition? This,
> > afaict, similarly won't trigger setting up of entries from
> > control_write(), and hence I'd expect the ASSERT() to similarly
> > trigger when subsequently an entry's mask bit gets altered.

This would only happen if the user hasn't written to the entry address
or data fields since initialization. Otherwise the ->updated field would
be set, and clearing the entry mask bit in
PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET would then properly set up the entry.
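
To make the two cases concrete, here is a rough sketch of the logic as
a standalone model. It is not the msix.c code: the model_* names and
the "bound" flag are made up for illustration, and the plain
mask/unmask path of the real handler is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the vPCI state, not the real Xen structures. */
struct model_msix {
    bool enabled;   /* MSI-X enable bit in the control register */
    bool masked;    /* function mask bit in the control register */
};

struct model_entry {
    uint64_t addr;
    uint32_t data;
    bool masked;    /* per-entry mask bit (vector control) */
    bool updated;   /* address/data written since the last setup */
    bool bound;     /* hypothetical flag: the entry has been set up */
};

/* Guest write to the address field: record it and flag the entry. */
static void model_write_addr(struct model_entry *e, uint64_t addr)
{
    e->addr = addr;
    e->updated = true;
}

/* Guest write to the vector control field, following the patched check. */
static void model_write_vector_ctrl(const struct model_msix *m,
                                    struct model_entry *e, bool new_masked)
{
    e->masked = new_masked;

    if ( !m->enabled )
        return;

    if ( !new_masked && !m->masked && e->updated )
    {
        /* Stands in for the real setup path. */
        e->bound = true;
        e->updated = false;
    }
    /* The plain mask/unmask path of the real code is left out here. */
}

/* Returns true if unmasking an entry ends up setting it up. */
static bool model_unmask_sets_up(bool write_addr_first)
{
    struct model_msix m = { .enabled = true, .masked = false };
    struct model_entry e = { .masked = true };

    if ( write_addr_first )
        model_write_addr(&e, 0xfee00000);

    model_write_vector_ctrl(&m, &e, false);
    assert(!e.masked);

    return e.bound;
}
```

In this model, model_unmask_sets_up(true) sets the entry up, while
model_unmask_sets_up(false) leaves it untouched, which is the "never
written since initialization" case.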

> > I'd also be fine making this further adjustment, if you agree,
> > but the one thing I haven't been able to fully convince myself of
> > is that there's then still no need to set ->updated to true.
> 
> I've taken another look. I think setting ->updated (or something
> equivalent) is needed in that case, in order to not lose the
> setting of the entry mask bit. However, this would only defer the
> problem to control_write(): This would now need to call
> vpci_msix_arch_mask_entry() under suitable conditions, but avoid
> calling it when the entry is disabled or was never set up.

If the entry is masked, control_write won't call update_entry, leaving
the entry's ->updated bit as-is and thus deferring the call to
update_entry to a later write to PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET. I
think this is all fine.
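
A minimal sketch of that deferral, again as a standalone model rather
than the real control_write (the model_* names and the "bound" flag are
assumptions made for the example):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified per-entry state, not the real Xen structure. */
struct model_entry {
    bool masked;    /* per-entry mask bit */
    bool updated;   /* address/data written since the last setup */
    bool bound;     /* hypothetical flag: the entry has been set up */
};

/*
 * Model of the walk over the entries done when MSI-X becomes enabled
 * and unmasked. Masked entries are skipped with ->updated left as-is.
 * Returns the number of entries that got set up.
 */
static size_t model_enable_walk(struct model_entry *entries, size_t nr)
{
    size_t i, done = 0;

    assert(entries != NULL || nr == 0);

    for ( i = 0; i < nr; i++ )
    {
        if ( entries[i].masked || !entries[i].updated )
            continue;

        /* Stands in for the real setup path. */
        entries[i].bound = true;
        entries[i].updated = false;
        done++;
    }

    return done;
}
```

A masked entry with ->updated set comes out of the walk unchanged, so
the setup is left to the later vector control write that unmasks it.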

> No
> matter whether making the setting of ->updated conditional, or
> adding a conditional call in update_entry(), we'd need to
> evaluate whether the entry is currently disabled. Imo, instead of
> introducing a new arch hook for this, it's easier to make
> vpci_msix_arch_mask_entry() tolerate getting called on a disabled
> entry. Below my proposed alternative change.

I think just setting the updated bit for all entries at initialization
would solve this, as this would then force a call to update_entry when
an entry is unmasked (by writes to PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET).
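
Roughly, and only as a standalone model (the model_* helpers and the
"bound" flag are hypothetical, and MSI-X is assumed to be enabled with
the function mask clear at unmask time):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified per-entry state, not the real Xen structure. */
struct model_entry {
    bool masked;    /* per-entry mask bit */
    bool updated;   /* forces a setup on the next unmask */
    bool bound;     /* hypothetical flag: the entry has been set up */
};

/* Initialization: entries start masked and already flagged as updated. */
static void model_init_entries(struct model_entry *entries, size_t nr)
{
    size_t i;

    assert(entries != NULL || nr == 0);

    for ( i = 0; i < nr; i++ )
    {
        entries[i].masked = true;
        entries[i].updated = true;
        entries[i].bound = false;
    }
}

/* Unmask via vector control, with MSI-X enabled and not function masked. */
static void model_unmask(struct model_entry *e)
{
    e->masked = false;

    if ( e->updated )
    {
        /* Stands in for the real setup path. */
        e->bound = true;
        e->updated = false;
    }
}
```

With this, the first unmask of an entry always goes through the setup
path, even if the guest never wrote its address or data fields.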

In any case the assert in vpci_msix_arch_mask_entry is a logic check;
IIRC calling it with an invalid pirq will just result in the function
being a no-op, as domain_spin_lock_irq_desc will return NULL.

> 
> While writing the description I started wondering why we require
> address or data fields to have got written before the first
> unmask. I don't think the hardware imposes such a requirement;
> zeros would be used instead, whatever this means. Let's not
> forget that it's only the primary purpose of MSI/MSI-X to
> trigger interrupts. Forcing the writes to go elsewhere in
> memory is not forbidden from all I know, and could be used by a
> driver. IOW I think ->updated should start out as set to true.
> But of course vpci_msi_update() then would need to check the
> upper address bits and avoid setting up an interrupt if they're
> not 0xfee. And further arrangements would be needed to have the
> guest requested write actually get carried out correctly.

Seems correct, albeit adding such logic would complicate the code and
expand the attack surface. IMO we shouldn't implement this unless we
know there's a real use case for it.
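
For reference, the address check alone would be small. The sketch below
is a standalone illustration with a made-up helper name, and treating a
non-zero upper 32 bits as "not an interrupt" is an assumption on my
side. Carrying out the guest-requested write for the non-matching case
is the part that would add the complexity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * On x86 the MSI message address has 0xfee in bits 31:20. Shifting the
 * whole 64-bit value also requires the upper 32 bits to be zero.
 */
static bool model_addr_is_msi(uint64_t addr)
{
    return (addr >> 20) == 0xfee;
}

/* Small usage example for the helper above. */
static void model_addr_selfcheck(void)
{
    assert(model_addr_is_msi(0xfee00000));      /* typical MSI address */
    assert(!model_addr_is_msi(0));              /* never-written entry */
    assert(!model_addr_is_msi(0x1fee00000ULL)); /* upper bits set */
}
```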

Thanks, Roger.