dri-devel Archive on lore.kernel.org
From: Thomas Gleixner <tglx@linutronix.de>
To: "Andrew Cooper" <andrew.cooper3@citrix.com>,
	boris.ostrovsky@oracle.com, "Jürgen Groß" <jgross@suse.com>,
	LKML <linux-kernel@vger.kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Karthikeyan Mitran <m.karthikeyan@mobiveil.co.in>,
	Peter Zijlstra <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	dri-devel@lists.freedesktop.org,
	Chris Wilson <chris@chris-wilson.co.uk>,
	"James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	netdev@vger.kernel.org, Jakub Kicinski <kuba@kernel.org>,
	Will Deacon <will@kernel.org>,
	Michal Simek <michal.simek@xilinx.com>,
	linux-s390@vger.kernel.org,
	afzal mohammed <afzal.mohd.ma@gmail.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Dave Jiang <dave.jiang@intel.com>,
	Leon Romanovsky <leon@kernel.org>,
	linux-rdma@vger.kernel.org, Marc Zyngier <maz@kernel.org>,
	Helge Deller <deller@gmx.de>,
	Russell King <linux@armlinux.org.uk>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	linux-pci@vger.kernel.org, xen-devel@lists.xenproject.org,
	Heiko Carstens <hca@linux.ibm.com>,
	Wambui Karuga <wambui.karugax@gmail.com>,
	Allen Hubbe <allenbh@gmail.com>, David Airlie <airlied@linux.ie>,
	linux-gpio@vger.kernel.org,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Lee Jones <lee.jones@linaro.org>,
	linux-arm-kernel@lists.infradead.org,
	Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	linux-parisc@vger.kernel.org,
	Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>,
	Hou Zhiqiang <Zhiqiang.Hou@nxp.com>,
	Tariq Toukan <tariqt@nvidia.com>, Jon Mason <jdmason@kudzu.us>,
	linux-ntb@googlegroups.com, intel-gfx@lists.freedesktop.org,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [patch 27/30] xen/events: Only force affinity mask for percpu interrupts
Date: Fri, 11 Dec 2020 23:56:40 +0100
Message-ID: <87y2i4eytz.fsf@nanos.tec.linutronix.de>
In-Reply-To: <edbedd7a-4463-d934-73c9-fa046c19cf6d@citrix.com>

Andrew,

On Fri, Dec 11 2020 at 22:21, Andrew Cooper wrote:
> On 11/12/2020 21:27, Thomas Gleixner wrote:
>> It's not any different from the hardware example at least not as far as
>> I understood the code.
>
> Xen's event channels do have a couple of quirks.

Why am I not surprised?

> Binding an event channel always results in one spurious event being
> delivered.  This is to cover notifications which can get lost during the
> bidirectional setup, or re-setups in certain configurations.
>
> Binding an interdomain or pirq event channel always defaults to vCPU0. 
> There is no way to atomically set the affinity while binding.  I believe
> the API predates SMP guest support in Xen, and no one has fixed it up
> since.

That's fine. I'm not changing that.

What I'm changing is the unwanted and unnecessary overwriting of the
actual affinity mask.

We have a similar issue on real hardware where we can only target _one_
CPU and not all CPUs in the affinity mask. So we can still preserve the
(user) requested mask and just affine the interrupt to one CPU, which is
reflected in the effective affinity mask. This is the right thing to do
for two reasons:

   1) It allows proper interrupt distribution

   2) It does not break the (user) requested affinity when the effective
      target CPU goes offline and the affinity mask still contains
      online CPUs. If you overwrite it, you lose track of the requested
      broader mask.
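
To make the second point concrete, here is a minimal userspace sketch of
keeping the requested mask separate from the effective target. It is not
the kernel or Xen code: struct irq_affinity, set_affinity() and
cpu_offline() are hypothetical names, masks are plain 32-bit bitmaps
(bit n == CPU n), and "pick the first online CPU" is only a placeholder
selection policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: the requested mask and the single effective CPU. */
struct irq_affinity {
	uint32_t requested;	/* what the user asked for, never narrowed */
	int effective_cpu;	/* the one CPU actually targeted, -1 if none */
};

/* Lowest-numbered CPU set in @mask, or -1 if the mask is empty. */
static int first_cpu(uint32_t mask)
{
	for (int cpu = 0; cpu < 32; cpu++) {
		if (mask & (UINT32_C(1) << cpu))
			return cpu;
	}
	return -1;
}

/* Store the full requested mask, then target one online CPU from it. */
static void set_affinity(struct irq_affinity *a, uint32_t requested,
			 uint32_t online)
{
	assert(a != NULL);
	a->requested = requested;
	a->effective_cpu = first_cpu(requested & online);
}

/*
 * Called after the online mask shrinks. Because the requested mask was
 * preserved, a new target can be picked from what the user asked for.
 */
static void cpu_offline(struct irq_affinity *a, uint32_t online)
{
	assert(a != NULL);
	if (a->effective_cpu >= 0 &&
	    (online & (UINT32_C(1) << a->effective_cpu)))
		return;		/* current target is still online */
	a->effective_cpu = first_cpu(a->requested & online);
}
```

If set_affinity() stored only the single chosen CPU in "requested", then
cpu_offline() would have nothing left to choose from once that CPU
disappears, even though other CPUs the user asked for are still online.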

> As a consequence, the guest will observe the event raised on vCPU0 as
> part of setting up the event, even if it attempts to set a different
> affinity immediately afterwards.  A little bit of care needs to be taken
> when binding an event channel on vCPUs other than 0, to ensure that the
> callback is safe with respect to any remaining state needing
> initialisation.

That's preserved for all non-percpu interrupts. The percpu variants of
VIRQs and IPIs were already bound to vCPU != 0 before this change.

> Beyond this, there is nothing magic I'm aware of.
>
> We have seen soft lockups before in certain scenarios, simply due to the
> quantity of events hitting vCPU0 before irqbalance gets around to
> spreading the load.  This is why there is an attempt to round-robin the
> userspace event channel affinities by default, but I still don't see why
> this would need custom affinity logic itself.

It's just that the previous attempt makes no sense, for the reasons I
outlined in the changelog. So now with the new spreading mechanism you
get the distribution for all cases:

  1) Post setup, using and respecting the default affinity mask, which
     can be set with a kernel command line parameter.

  2) Runtime (user) requested affinity change with a mask which contains
     more than one vCPU. The previous logic always chose the first one
     in the mask.

     So assume userspace affines 4 irqs to CPUs 0-3 and 4 irqs to CPUs
     4-7. Then 4 irqs end up on CPU0 and 4 on CPU4.

     The new algorithm, which is similar to what we have on x86 (minus
     the vector space limitation), picks the CPU which has the least
     number of channels affine to it at that moment. If e.g. all 8 CPUs
     have the same number of channels before that change, then in the
     example above the first 4 are spread to CPU0-3 and the second 4 to
     CPU4-7.
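
The selection rule described in 2) can be sketched roughly as follows.
This is a standalone illustration under stated assumptions, not the code
from the patch series: channels_on_cpu[], pick_least_loaded_cpu() and
bind_channel() are hypothetical names, the mask is a plain 32-bit bitmap
(bit n == CPU n), and ties are broken in favour of the lowest-numbered
CPU.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8

static_assert(NR_CPUS <= 32, "the mask is modelled as a 32-bit bitmap");

/* Hypothetical per-CPU count of channels currently bound to each CPU. */
static unsigned int channels_on_cpu[NR_CPUS];

/*
 * Pick the CPU in @mask with the fewest channels bound to it.
 * Ties go to the lowest-numbered CPU. Returns -1 for an empty mask.
 */
static int pick_least_loaded_cpu(uint32_t mask)
{
	unsigned int best_count = 0;
	int best = -1;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(mask & (UINT32_C(1) << cpu)))
			continue;
		if (best < 0 || channels_on_cpu[cpu] < best_count) {
			best = cpu;
			best_count = channels_on_cpu[cpu];
		}
	}
	return best;
}

/* Bind one channel: choose a target within @mask and account for it. */
static int bind_channel(uint32_t mask)
{
	int cpu = pick_least_loaded_cpu(mask);

	if (cpu >= 0)
		channels_on_cpu[cpu]++;
	return cpu;
}
```

Starting from equal counts on all 8 CPUs, four calls with mask 0x0f land
on CPU0, CPU1, CPU2 and CPU3 in turn, and four calls with mask 0xf0 land
on CPU4 to CPU7, whereas "always take the first CPU in the mask" would
put them all on CPU0 and CPU4.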

Thanks,

        tglx
   

