IOMMU Archive on lore.kernel.org
 help / color / Atom feed
From: Stephane Eranian via iommu <iommu@lists.linux-foundation.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Jan Kiszka <jan.kiszka@siemens.com>,
	Ricardo Neri <ricardo.neri@intel.com>,
	Ingo Molnar <mingo@kernel.org>,
	Wincy Van <fanwenyi0529@gmail.com>,
	Ashok Raj <ashok.raj@intel.com>, x86 <x86@kernel.org>,
	Andi Kleen <andi.kleen@intel.com>, Borislav Petkov <bp@suse.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	Ricardo Neri <ricardo.neri-calderon@linux.intel.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Juergen Gross <jgross@suse.com>, Tony Luck <tony.luck@intel.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	iommu@lists.linux-foundation.org,
	Jacob Pan <jacob.jun.pan@intel.com>,
	Philippe Ombredanne <pombredanne@nexb.com>
Subject: Re: [RFC PATCH v4 20/21] iommu/vt-d: hpet: Reserve an interrupt remampping table entry for watchdog
Date: Mon, 17 Jun 2019 14:38:14 -0700
Message-ID: <CABPqkBTai76Bgb4E61tF-mJUkFNxVa4B8M2bxTEYVgBsuAANNQ@mail.gmail.com> (raw)
In-Reply-To: <alpine.DEB.2.21.1906171007360.1760@nanos.tec.linutronix.de>

Hi,

On Mon, Jun 17, 2019 at 1:25 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Sun, 16 Jun 2019, Thomas Gleixner wrote:
> > On Thu, 23 May 2019, Ricardo Neri wrote:
> > > When the hardlockup detector is enabled, the function
> > > hld_hpet_intremapactivate_irq() activates the recently created entry
> > > in the interrupt remapping table via the modify_irte() functions. While
> > > doing this, it specifies which CPU the interrupt must target via its APIC
> > > ID. This function can be called every time the destination iD of the
> > > interrupt needs to be updated; there is no need to allocate or remove
> > > entries in the interrupt remapping table.
> >
> > Brilliant.
> >
> > > +int hld_hpet_intremap_activate_irq(struct hpet_hld_data *hdata)
> > > +{
> > > +   u32 destid = apic->calc_dest_apicid(hdata->handling_cpu);
> > > +   struct intel_ir_data *data;
> > > +
> > > +   data = (struct intel_ir_data *)hdata->intremap_data;
> > > +   data->irte_entry.dest_id = IRTE_DEST(destid);
> > > +   return modify_irte(&data->irq_2_iommu, &data->irte_entry);
> >
> > This calls modify_irte() which does at the very beginning:
> >
> >    raw_spin_lock_irqsave(&irq_2_ir_lock, flags);
> >
> > How is that supposed to work from NMI context? Not to talk about the
> > other spinlocks which are taken in the subsequent call chain.
> >
> > You cannot call in any of that code from NMI context.
> >
> > The only reason why this never deadlocked in your testing is that nothing
> > else touched that particular iommu where the HPET hangs off concurrently.
> >
> > But that's just pure luck and not design.
>
> And just for the record. I warned you about that problem during the review
> of an earlier version and told you to talk to IOMMU folks whether there is
> a way to update the entry w/o running into that lock problem.
>
> Can you tell my why am I actually reviewing patches and spending time on
> this when the result is ignored anyway?
>
> I also tried to figure out why you went away from the IPI broadcast
> design. The only information I found is:
>
> Changes vs. v1:
>
>  * Brought back the round-robin mechanism proposed in v1 (this time not
>    using the interrupt subsystem). This also requires to compute
>    expiration times as in v1 (Andi Kleen, Stephane Eranian).
>
> Great that there is no trace of any mail from Andi or Stephane about this
> on LKML. There is no problem with talking offlist about this stuff, but
> then you should at least provide a rationale for those who were not part of
> the private conversation.
>
Let me add some context to this whole patch series. The pressure on
the core PMU counters
is increasing as more people want to use them to measure always more
events. When the PMU
is overcommitted, i.e., more events than counters for them, there is
multiplexing. It comes
with an overhead that is too high for certain applications. One way to
avoid this is to lower the
multiplexing frequency, which is by default 1ms, but that comes with
loss of accuracy. Another approach is
to measure only a small number of events at a time and use multiple
runs, but then you lose consistent event
view. Another approach is to push for increasing the number of
counters. But getting new hardware
counters takes time. Short term, we can investigate what it would take
to free one cycle-capable
counter which is commandeered by the hard lockup detector on all X86
processors today. The functionality
of the watchdog, being able to get a crash dump on kernel deadlocks,
is important and we cannot simply
disable it. At scale, many bugs are exposed and thus machines
deadlock. Therefore, we want to investigate
what it would take to move the detector to another NMI-capable source,
such as the HPET because the
detector does not need high low granularity timer and interrupts only every 2s.

Furthermore, recent Intel erratum, e.g., the TSX issue forcing the TFA
code in perf_events, have increased the pressure
even more with only 3 generic counters left. Thus, it is time to look
at alternative ways of  getting a hard lockup detector
(NMI watchdog) from another NMI source than the PMU. To that extent, I
have been discussing about alternatives.
Intel suggested using the HPET and Ricardo has been working on
producing this patch series. It is clear from your review
that the patches have issues, but I am hoping that they can be
resolved with constructive feedback knowing what the end goal is.

As for the round-robin changes, yes, we discussed this as an
alternative to avoid overloading CPU0 with handling
all of the work to broadcasting IPI to 100+ other CPUs.

Thanks.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

  reply index

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-24  1:16 [RFC PATCH v4 00/21] Implement an HPET-based hardlockup detector Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 01/21] x86/msi: Add definition for NMI delivery mode Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 02/21] x86/hpet: Expose hpet_writel() in header Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 03/21] x86/hpet: Calculate ticks-per-second in a separate function Ricardo Neri
2019-06-14 15:54   ` Thomas Gleixner
2019-06-14 15:59     ` Thomas Gleixner
2019-06-18 22:48     ` Ricardo Neri
2019-06-18 23:13       ` Thomas Gleixner
2019-05-24  1:16 ` [RFC PATCH v4 04/21] x86/hpet: Add hpet_set_comparator() for periodic and one-shot modes Ricardo Neri
2019-06-14 18:17   ` Thomas Gleixner
2019-06-18 22:48     ` Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 05/21] x86/hpet: Reserve timer for the HPET hardlockup detector Ricardo Neri
2019-06-11 19:54   ` Thomas Gleixner
2019-06-14  1:14     ` Ricardo Neri
2019-06-14 16:10       ` Thomas Gleixner
2019-06-18 22:48         ` Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 06/21] x86/hpet: Configure the timer used by the " Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 07/21] watchdog/hardlockup: Define a generic function to detect hardlockups Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 08/21] watchdog/hardlockup: Decouple the hardlockup detector from perf Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 09/21] x86/nmi: Add a NMI_WATCHDOG NMI handler category Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 10/21] watchdog/hardlockup: Add function to enable NMI watchdog on all allowed CPUs at once Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 11/21] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 12/21] watchdog/hardlockup/hpet: Adjust timer expiration on the number of monitored CPUs Ricardo Neri
2019-06-11 20:11   ` Thomas Gleixner
2019-06-18 22:46     ` Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 13/21] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 14/21] watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog" Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 15/21] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 16/21] x86/watchdog: Add a shim hardlockup detector Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 17/21] x86/tsc: Switch to perf-based hardlockup detector if TSC become unstable Ricardo Neri
2019-06-07  0:35   ` Stephane Eranian via iommu
2019-06-07 14:14     ` Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 18/21] x86/apic: Add a parameter for the APIC delivery mode Ricardo Neri
2019-06-16  9:55   ` Thomas Gleixner
2019-06-18 22:47     ` Ricardo Neri
2019-06-18 23:15       ` Thomas Gleixner
2019-05-24  1:16 ` [RFC PATCH v4 19/21] iommu/vt-d: Rework prepare_irte() to support per-irq " Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 20/21] iommu/vt-d: hpet: Reserve an interrupt remampping table entry for watchdog Ricardo Neri
2019-06-16 18:42   ` Thomas Gleixner
2019-06-16 19:21   ` Thomas Gleixner
2019-06-17  8:25     ` Thomas Gleixner
2019-06-17 21:38       ` Stephane Eranian via iommu [this message]
2019-06-17 23:08         ` Thomas Gleixner
2019-06-19 15:43           ` Jacob Pan
2019-06-21 15:33             ` Thomas Gleixner
2019-06-21 17:31               ` Jacob Pan
2019-06-21 18:39                 ` Jacob Pan
2019-06-21 20:05                   ` Thomas Gleixner
2019-06-21 23:55                     ` Ricardo Neri
2019-06-22  7:21                       ` Thomas Gleixner
2019-06-18 22:45       ` Ricardo Neri
2019-05-24  1:16 ` [RFC PATCH v4 21/21] x86/watchdog/hardlockup/hpet: Support interrupt remapping Ricardo Neri
2019-06-16  8:44   ` Thomas Gleixner
2019-06-16  8:53     ` Thomas Gleixner

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPqkBTai76Bgb4E61tF-mJUkFNxVa4B8M2bxTEYVgBsuAANNQ@mail.gmail.com \
    --to=iommu@lists.linux-foundation.org \
    --cc=andi.kleen@intel.com \
    --cc=ashok.raj@intel.com \
    --cc=bhelgaas@google.com \
    --cc=bp@suse.de \
    --cc=ebiederm@xmission.com \
    --cc=eranian@google.com \
    --cc=fanwenyi0529@gmail.com \
    --cc=jacob.jun.pan@intel.com \
    --cc=jan.kiszka@siemens.com \
    --cc=jgross@suse.com \
    --cc=kstewart@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pombredanne@nexb.com \
    --cc=ravi.v.shankar@intel.com \
    --cc=rdunlap@infradead.org \
    --cc=ricardo.neri-calderon@linux.intel.com \
    --cc=ricardo.neri@intel.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

IOMMU Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-iommu/0 linux-iommu/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-iommu linux-iommu/ https://lore.kernel.org/linux-iommu \
		iommu@lists.linux-foundation.org iommu@archiver.kernel.org
	public-inbox-index linux-iommu


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.linux-foundation.lists.iommu


AGPL code for this site: git clone https://public-inbox.org/ public-inbox