All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: Juergen Gross <jgross@suse.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>, Wei Liu <wl@xen.org>,
	xen-devel@lists.xenproject.org
Subject: Re: [Xen-devel] [PATCH v2 1/2] x2APIC: translate IO-APIC entries when enabling the IOMMU
Date: Fri, 11 Oct 2019 11:28:40 +0200	[thread overview]
Message-ID: <20191011092840.GJ1389@Air-de-Roger.citrite.net> (raw)
In-Reply-To: <295ee1cd-33d6-41c8-93b8-2f7daf34de98@suse.com>

On Fri, Oct 11, 2019 at 09:52:35AM +0200, Jan Beulich wrote:
> On 10.10.2019 17:19, Roger Pau Monné  wrote:
> > On Thu, Oct 10, 2019 at 03:46:45PM +0200, Jan Beulich wrote:
> >> On 10.10.2019 15:12, Roger Pau Monné  wrote:
> >>> On Thu, Oct 10, 2019 at 02:55:02PM +0200, Jan Beulich wrote:
> >>>> On 10.10.2019 14:13, Roger Pau Monné  wrote:
> >>>>> On Thu, Oct 10, 2019 at 01:54:06PM +0200, Jan Beulich wrote:
> >>>>>> On 10.10.2019 13:33, Roger Pau Monne wrote:
> >>>>>>> When interrupt remapping is enabled as part of enabling x2APIC the
> >>>>>>
> >>>>>> Perhaps "unmasked" instead of "the"?
> >>>>>>
> >>>>>>> IO-APIC entries also need to be translated to the new format and added
> >>>>>>> to the interrupt remapping table.
> >>>>>>>
> >>>>>>> This prevents IOMMU interrupt remapping faults when booting on
> >>>>>>> hardware that has unmasked IO-APIC pins.
> >>>>>>
> >>>>>> But in the end it only papers over whatever the spurious interrupts
> >>>>>> result form, doesn't it? Which isn't to say this isn't an
> >>>>>> improvement. Calling out the ExtInt case here may be worthwhile as
> >>>>>> well, as would be pointing out that this case still won't work on
> >>>>>> AMD IOMMUs.
> >>>>>
> >>>>> But the fix for the ExtINT AMD issue should be done in
> >>>>> amd_iommu_ioapic_update_ire then, so that it can properly handle
> >>>>> ExtINT delivery mode, not to this part of the code. I will look
> >>>>> into it, but I think it's kind of tangential to the issue here.
> >>>>
> >>>> I'm not talking of you working on fixing this right away. I'm merely
> >>>> asking that you mention here (a) the ExtInt special case and (b)
> >>>> that this special case will (continue to) not work in the AMD case.
> >>>>
> >>>>>>> --- a/xen/arch/x86/apic.c
> >>>>>>> +++ b/xen/arch/x86/apic.c
> >>>>>>> @@ -515,7 +515,7 @@ static void resume_x2apic(void)
> >>>>>>>      iommu_enable_x2apic();
> >>>>>>>      __enable_x2apic();
> >>>>>>>  
> >>>>>>> -    restore_IO_APIC_setup(ioapic_entries);
> >>>>>>> +    restore_IO_APIC_setup(ioapic_entries, true);
> >>>>>>>      unmask_8259A();
> >>>>>>>  
> >>>>>>>  out:
> >>>>>>> @@ -961,7 +961,12 @@ void __init x2apic_bsp_setup(void)
> >>>>>>>          printk("Switched to APIC driver %s\n", genapic.name);
> >>>>>>>  
> >>>>>>>  restore_out:
> >>>>>>> -    restore_IO_APIC_setup(ioapic_entries);
> >>>>>>> +    /*
> >>>>>>> +     * NB: do not use raw mode when restoring entries if the iommu has been
> >>>>>>> +     * enabled during the process, because the entries need to be translated
> >>>>>>> +     * and added to the remapping table in that case.
> >>>>>>> +     */
> >>>>>>> +    restore_IO_APIC_setup(ioapic_entries, !x2apic_enabled);
> >>>>>>
> >>>>>> How is this different in the resume_x2apic() case? The IOMMU gets
> >>>>>> enabled in the course of that as well. I.e. I'd expect you want
> >>>>>> to pass "false" there, not "true".
> >>>>>
> >>>>> In the resume_x2apic case interrupt remapping should already be
> >>>>> enabled or not, but that function is not going to enable interrupt
> >>>>> remapping if it wasn't enabled before, hence the IO-APIC entries
> >>>>> should already be using the interrupt remapping table and no
> >>>>> translation is needed.
> >>>>
> >>>> Who / what would have enabled the IOMMU in the resume case?
> >>>
> >>> I don't think the question is who enables interrupt remapping in the
> >>> resume case (which is resume_x2apic when calling iommu_enable_x2apic
> >>> AFAICT), the point here is that on resume the entries in the IO-APIC
> >>> will already match the state of interrupt remapping, so they shouldn't
> >>> be translated. If interrupt remapping was off before suspend it will
> >>> still be off after resume, and there won't be any translation needed.
> >>> The same is true if interrupt remapping is on before suspend.
> >>
> >> I disagree: save_IO_APIC_setup() gets called from resume_x2apic(),
> >> not prior to suspend.
> > 
> > Oh, so maybe that's a misunderstanding on my side. I don't seem to be
> > able to find a statement about the contents of the IO-APIC registers
> > (and more specifically the entries) when getting back from
> > suspension. Are all entries cleared and masked?
> > 
> > Are the values previous to suspension stored?
> 
> See ioapic_suspend() / ioapic_resume(): Looks like there's some
> redundancy here - I don't think it makes sense for the LAPIC
> code to fiddle with all the RTEs if subsequently (I assume;
> didn't check) they'll all be overwritten anyway. I would seem
> more logical to me if they'd just all get masked for IOMMU
> enabling, deferring to ioapic_resume() for everything else.

Yes, it seems like resume_x2apic shouldn't need to play with the
entries at all TBH, just enabling interrupt remapping should be enough
given that the entries are already save and restored by
ioapic_{suspend/resume}.

Would you agree to leave that as-is in this patch, ie: always pass
true to restore_IO_APIC_setup in resume_x2apic?

> >>>>>> Also how would "x2apic_enabled" reflect the transition? It may
> >>>>>> have been "true" already before entering the function here.
> >>>>>
> >>>>> If x2apic_enabled == true at the point where restore_IO_APIC_setup
> >>>>> is called interrupt remapping must be enabled, because AFAICT at this
> >>>>> point it's not possible to have x2apic_enabled == true and interrupt
> >>>>> remapping disabled.
> >>>>>
> >>>>> The issue I can see here is what happens if interrupt remapping was
> >>>>> already enabled by the hardware, and entries in the IO-APIC are
> >>>>> already using the remapping table. I would have to look into how to
> >>>>> detect that case properly, but I think the proposed change is an
> >>>>> improvement overall.
> >>>>
> >>>> Firmware handing off with the IOMMU (and hence interrupt remapping)
> >>>> already enabled is a case which - afaict - we can't currently cope
> >>>> with. Firmware handing off in x2APIC enabled state is typically
> >>>> with the IOMMU (and hence interrupt remapping) still disabled. This
> >>>> is not a forbidden mode, it's just that in such a configuration
> >>>> interrupts can't be delivered to certain CPUs.
> >>>>
> >>>> In any event you need to properly distinguish x2APIC enabled state
> >>>> (or the transition thereof) from IOMMU / interrupt remapping
> >>>> enabled state (or the transition thereof). I.e. you want to avoid
> >>>> raw mode restore if interrupt remapping state transitioned from
> >>>> off to on in the process.
> >>>
> >>> Right, and that's why the call to restore_IO_APIC_setup in
> >>> x2apic_bsp_setup uses !x2apic_enabled as it's second parameter. If
> >>> interrupt remapping has been enabled by the call to
> >>> iommu_enable_x2apic x2apic_enabled must be true, and hence the entries
> >>> in the IO-APIC need to be translated to use the interrupt remapping
> >>> table. There's no path that can lead to restore_IO_APIC_setup with
> >>> interrupt remapping enabled and x2APIC mode disabled (or with x2APIC
> >>> enabled and interrupt remapping disabled).
> >>>
> >>> Hence if interrupt remapping is off before calling x2apic_bsp_setup
> >>> (which is what Xen expects to function properly) and x2apic_enabled ==
> >>> true when calling restore_IO_APIC_setup it means interrupt remapping
> >>> got enabled, and the IO-APIC entries need translation.
> >>
> >> But the code in question sits on a shared success/error path, and
> >> in the error case it matters whether x2apic_enabled was true already
> >> on entry.
> > 
> > But it's simply not possible to reach the call to
> > restore_IO_APIC_setup with x2apic_enabled == true and interrupt
> > remapping disabled, regardless of the initial value of
> > x2apic_enabled.
> > 
> > All the paths that could lead to this scenario are short-circuited
> > above with a panic.
> 
> Hmm, true. Nevertheless it would feel better if the conditionals
> were using what actually matters, rather than something derived.

Would you be OK to using something like v1:

https://lists.xenproject.org/archives/html/xen-devel/2019-10/msg00805.html

See the usage of iommu_enabled in x2apic_bsp_setup. I think using
x2apic_enabled is fine given the logic in the function, and the fact
that x2APIC mandates interrupt remapping, but I could live with that
extra local variable.

> >> I realize that io_apic_write() would suitably avoid going
> >> the remapping path, but I think it would be more clear if the
> >> distinction was already made properly at the call site.
> > 
> > I'm afraid I'm slightly loss, do you mean to replace the
> > ioapic_write_entry with an io_apic_write in restore_IO_APIC_setup?
> 
> No, because of ...
> 
> > That would be the same as always passing raw == false AFAICT.
> 
> ... this. I'm asking to pass in the argument for "raw" based
> on what you want, without relying on io_apic_write()'s
> behavior. That's more for code clarity than actual correctness,
> since - as said - io_apic_write() would invoke __io_apic_write()
> anyway.

Sorry, but I'm afraid I still don't fully understand what you mean
with this. restore_IO_APIC_setup uses ioapic_write_entry which in turn
uses __io_apic_write or io_apic_write. I could get rid of the usage of
ioapic_write_entry and just call __io_apic_write or io_apic_write (or
even directly call iommu_update_ire_from_apic), but I'm not sure
what's the benefit of any of this, it's just open-coding logic that's
done in the helpers.

Would you rather mean to rename the parameter of restore_IO_APIC_setup
from 'raw' to 'translate' or some such in order to clearly denote
there's a transformation done before writing the entry?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  reply	other threads:[~2019-10-11  9:29 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-10 11:33 [Xen-devel] [PATCH v2 0/2] iommu: fixes for interrupt remapping enabling Roger Pau Monne
2019-10-10 11:33 ` [Xen-devel] [PATCH v2 1/2] x2APIC: translate IO-APIC entries when enabling the IOMMU Roger Pau Monne
2019-10-10 11:54   ` Jan Beulich
2019-10-10 12:13     ` Roger Pau Monné
2019-10-10 12:55       ` Jan Beulich
2019-10-10 13:12         ` Roger Pau Monné
2019-10-10 13:46           ` Jan Beulich
2019-10-10 15:19             ` Roger Pau Monné
2019-10-11  7:52               ` Jan Beulich
2019-10-11  9:28                 ` Roger Pau Monné [this message]
2019-10-11 10:01                   ` Jan Beulich
2019-10-11 10:36                     ` Roger Pau Monné
2019-10-10 13:29         ` Roger Pau Monné
2019-10-10 13:53           ` Jan Beulich
2019-10-10 11:33 ` [Xen-devel] [PATCH v2 2/2] iommu: translate IO-APIC pins when enabling interrupt remapping Roger Pau Monne
2019-10-10 11:59   ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191011092840.GJ1389@Air-de-Roger.citrite.net \
    --to=roger.pau@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=jgross@suse.com \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.