linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Dave Jiang <dave.jiang@intel.com>,
	vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org,
	bhelgaas@google.com, alex.williamson@redhat.com,
	jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com,
	baolu.lu@intel.com, kevin.tian@intel.com,
	sanjay.k.kumar@intel.com, tony.luck@intel.com,
	jing.lin@intel.com, dan.j.williams@intel.com,
	kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com,
	rafael@kernel.org, netanelg@mellanox.com, shahafs@mellanox.com,
	yan.y.zhao@linux.intel.com, pbonzini@redhat.com,
	samuel.ortiz@intel.com, mona.hossain@intel.com,
	dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v3 11/18] dmaengine: idxd: ims setup for the vdcm
Date: Fri, 09 Oct 2020 16:44:27 +0200	[thread overview]
Message-ID: <87v9fjtq5w.fsf@nanos.tec.linutronix.de> (raw)
In-Reply-To: <20201008233210.GH4734@nvidia.com>

On Thu, Oct 08 2020 at 20:32, Jason Gunthorpe wrote:
> On Fri, Oct 09, 2020 at 01:17:38AM +0200, Thomas Gleixner wrote:
>> Thinking more about it, that very same thing will be needed for any
>> other IMS device and of course this is not going to end well because
>> some driver will fiddle with the PASID at the wrong time.
>
> Why? This looks like some quirk of the IDXD HW where it just randomly
> put PASID along with the IRQ mask register. Probably because PASID is
> not the full 32 bits.
>
> AFAIK the PASID is not tagged on the MemWr TLP triggering the
> interrupt, so it really is unrelated to the irq.

Right you are. With brain awake this does not make sense.

> I think the ioread to get the PASID is rather ugly, it should pluck
> the PASID out of some driver specific data structure with proper
> locking, and thus use the sleepable version of the irqchip?
>
> This is really not that different from what I was describing for queue
> contexts - the queue context needs to be assigned to the irq # before
> it can be used in the irq chip other wise there is no idea where to
> write the msg to. Just like pasid here.

Not really. In the IDXD case the storage is known when the host device
and the irq domain is initialized which is not the case for your variant
and it neither needs to send a magic command to the device to update the
data.

Due to the already explained racy behaviour of MSI affinity changes when
interrupt remapping is disabled, that async update would just not work
and I really don't want to see the resulting tinkering to paper over the
problem. TBH, even if that problem would not exist, my faith in driver
writers getting any of this correct is close to zero and I rather spend
a few brain cycles now than staring at the creative mess later.

So putting the brainfart of yesterday aside, we can simply treat this as
auxiliary data.

The patch below implements an interface for that and adds support to the
IMS driver. That interface is properly serialized and does not put any
restrictions on when this is called. (I know another interrupt chip
which could be simplified that way).

All the IDXD driver has to do is:

   auxval = ims_ctrl_pasid_aux(pasid, enabled);
   irq_set_auxdata(irqnr, IMS_AUXDATA_CONTROL_WORD, auxval);

I agree that irq_set_auxdata() is not the most elegant thing, but the
alternative solutions I looked at are just worse.

For now I just kept the data in the IMS storage which still requires an
ioread(), but these interrupts are not frequently masked and unmasked
during normal operation so performance is not a concern and caching that
value is an orthogonal optimization if someone cares.

Thanks,

        tglx

8<-------------

 drivers/irqchip/irq-ims-msi.c       |   35 +++++++++++++++++++++++++++++------
 include/linux/interrupt.h           |    2 ++
 include/linux/irq.h                 |    4 ++++
 include/linux/irqchip/irq-ims-msi.h |   25 ++++++++++++++++++++++++-
 kernel/irq/manage.c                 |   32 ++++++++++++++++++++++++++++++++
 5 files changed, 91 insertions(+), 7 deletions(-)

--- a/drivers/irqchip/irq-ims-msi.c
+++ b/drivers/irqchip/irq-ims-msi.c
@@ -18,14 +18,19 @@ struct ims_array_data {
 	unsigned long		map[0];
 };
 
+static inline void iowrite32_and_flush(u32 value, void __iomem *addr)
+{
+	iowrite32(value, addr);
+	ioread32(addr);
+}
+
 static void ims_array_mask_irq(struct irq_data *data)
 {
 	struct msi_desc *desc = irq_data_get_msi_desc(data);
 	struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
 	u32 __iomem *ctrl = &slot->ctrl;
 
-	iowrite32(ioread32(ctrl) | IMS_VECTOR_CTRL_MASK, ctrl);
-	ioread32(ctrl); /* Flush write to device */
+	iowrite32_and_flush(ioread32(ctrl) | IMS_CTRL_VECTOR_MASKBIT, ctrl);
 }
 
 static void ims_array_unmask_irq(struct irq_data *data)
@@ -34,7 +39,7 @@ static void ims_array_unmask_irq(struct
 	struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
 	u32 __iomem *ctrl = &slot->ctrl;
 
-	iowrite32(ioread32(ctrl) & ~IMS_VECTOR_CTRL_MASK, ctrl);
+	iowrite32_and_flush(ioread32(ctrl) & ~IMS_CTRL_VECTOR_MASKBIT, ctrl);
 }
 
 static void ims_array_write_msi_msg(struct irq_data *data, struct msi_msg *msg)
@@ -44,8 +49,24 @@ static void ims_array_write_msi_msg(stru
 
 	iowrite32(msg->address_lo, &slot->address_lo);
 	iowrite32(msg->address_hi, &slot->address_hi);
-	iowrite32(msg->data, &slot->data);
-	ioread32(slot);
+	iowrite32_and_flush(msg->data, &slot->data);
+}
+
+static int ims_array_set_auxdata(struct irq_data *data, unsigned int which,
+				 u64 auxval)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
+	u32 val, __iomem *ctrl = &slot->ctrl;
+
+	if (which != IMS_AUXDATA_CONTROL_WORD)
+		return -EINVAL;
+	if (auxval & ~(u64)IMS_CONTROL_WORD_AUXMASK)
+		return -EINVAL;
+
+	val = ioread32(ctrl) & IMS_CONTROL_WORD_IRQMASK;
+	iowrite32_and_flush(val | (u32) auxval, ctrl);
+	return 0;
 }
 
 static const struct irq_chip ims_array_msi_controller = {
@@ -53,6 +74,7 @@ static const struct irq_chip ims_array_m
 	.irq_mask		= ims_array_mask_irq,
 	.irq_unmask		= ims_array_unmask_irq,
 	.irq_write_msi_msg	= ims_array_write_msi_msg,
+	.irq_set_auxdata	= ims_array_set_auxdata,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
@@ -62,7 +84,7 @@ static void ims_array_reset_slot(struct
 	iowrite32(0, &slot->address_lo);
 	iowrite32(0, &slot->address_hi);
 	iowrite32(0, &slot->data);
-	iowrite32(0, &slot->ctrl);
+	iowrite32_and_flush(IMS_CTRL_VECTOR_MASKBIT, &slot->ctrl);
 }
 
 static void ims_array_free_msi_store(struct irq_domain *domain,
@@ -97,6 +119,7 @@ static int ims_array_alloc_msi_store(str
 			goto fail;
 		set_bit(idx, ims->map);
 		entry->device_msi.priv_iomem = &ims->info.slots[idx];
+		ims_array_reset_slot(entry->device_msi.priv_iomem);
 		entry->device_msi.hwirq = idx;
 	}
 	return 0;
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -487,6 +487,8 @@ extern int irq_get_irqchip_state(unsigne
 extern int irq_set_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
 				 bool state);
 
+int irq_set_auxdata(unsigned int irq, unsigned int which, u64 val);
+
 #ifdef CONFIG_IRQ_FORCED_THREADING
 # ifdef CONFIG_PREEMPT_RT
 #  define force_irqthreads	(true)
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -481,6 +481,8 @@ static inline irq_hw_number_t irqd_to_hw
  *				irq_request_resources
  * @irq_compose_msi_msg:	optional to compose message content for MSI
  * @irq_write_msi_msg:	optional to write message content for MSI
+ * @irq_set_auxdata:	Optional function to update auxiliary data e.g. in
+ *			shared registers
  * @irq_get_irqchip_state:	return the internal state of an interrupt
  * @irq_set_irqchip_state:	set the internal state of a interrupt
  * @irq_set_vcpu_affinity:	optional to target a vCPU in a virtual machine
@@ -528,6 +530,8 @@ struct irq_chip {
 	void		(*irq_compose_msi_msg)(struct irq_data *data, struct msi_msg *msg);
 	void		(*irq_write_msi_msg)(struct irq_data *data, struct msi_msg *msg);
 
+	int		(*irq_set_auxdata)(struct irq_data *data, unsigned int which, u64 auxval);
+
 	int		(*irq_get_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool *state);
 	int		(*irq_set_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool state);
 
--- a/include/linux/irqchip/irq-ims-msi.h
+++ b/include/linux/irqchip/irq-ims-msi.h
@@ -5,6 +5,7 @@
 #define _LINUX_IRQCHIP_IRQ_IMS_MSI_H
 
 #include <linux/types.h>
+#include <linux/bits.h>
 
 /**
  * ims_hw_slot - The hardware layout of an IMS based MSI message
@@ -23,8 +24,30 @@ struct ims_slot {
 	u32	ctrl;
 } __packed;
 
+/*
+ * The IMS control word utilizes bit 0-2 for interrupt control. The remaining
+ * bits can contain auxiliary data.
+ */
+#define IMS_CONTROL_WORD_IRQMASK	GENMASK(2, 0)
+#define IMS_CONTROL_WORD_AUXMASK	GENMASK(31, 3)
+
 /* Bit to mask the interrupt in ims_hw_slot::ctrl */
-#define IMS_VECTOR_CTRL_MASK	0x01
+#define IMS_CTRL_VECTOR_MASKBIT		BIT(0)
+
+/* Auxiliary control word data related defines */
+enum {
+	IMS_AUXDATA_CONTROL_WORD,
+};
+
+#define IMS_CTRL_PASID_ENABLE		BIT(3)
+#define IMS_CTRL_PASID_SHIFT		12
+
+static inline u32 ims_ctrl_pasid_aux(unsigned int pasid, bool enable)
+{
+	u32 auxval = pasid << IMS_CTRL_PASID_SHIFT;
+
+	return enable ? auxval | IMS_CTRL_PASID_ENABLE : auxval;
+}
 
 /**
  * struct ims_array_info - Information to create an IMS array domain
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -2752,3 +2752,35 @@ int irq_set_irqchip_state(unsigned int i
 	return err;
 }
 EXPORT_SYMBOL_GPL(irq_set_irqchip_state);
+
+/**
+ * irq_set_auxdata - Set auxiliary data
+ * @irq:	Interrupt to update
+ * @which:	Selector which data to update
+ * @auxval:	Auxiliary data value
+ *
+ * Function to update auxiliary data for an interrupt, e.g. to update data
+ * which is stored in a shared register or data storage (e.g. IMS).
+ */
+int irq_set_auxdata(unsigned int irq, unsigned int which, u64 val)
+{
+	struct irq_desc *desc;
+	struct irq_data *data;
+	unsigned long flags;
+	int res = -ENODEV;
+
+	desc = irq_get_desc_buslock(irq, &flags, 0);
+	if (!desc)
+		return -EINVAL;
+
+	for (data = &desc->irq_data; data; data = irqd_get_parent_data(data)) {
+		if (data->chip->irq_set_auxdata) {
+			res = data->chip->irq_set_auxdata(data, which, val);
+			break;
+		}
+	}
+
+	irq_put_desc_busunlock(desc, flags);
+	return res;
+}
+EXPORT_SYMBOL_GPL(irq_set_auxdata);

  parent reply	other threads:[~2020-10-09 14:44 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <160021207013.67751.8220471499908137671.stgit@djiang5-desk3.ch.intel.com>
2020-09-15 23:27 ` [PATCH v3 01/18] irqchip: Add IMS (Interrupt Message Storage) driver Dave Jiang
2020-09-30 18:23   ` Thomas Gleixner
2020-10-01 22:59     ` Dey, Megha
2020-09-15 23:27 ` [PATCH v3 02/18] iommu/vt-d: Add DEV-MSI support Dave Jiang
2020-09-30 18:32   ` Thomas Gleixner
2020-10-01 23:26     ` Dey, Megha
2020-10-02 11:13       ` Thomas Gleixner
2020-10-08  7:54     ` David Woodhouse
2020-10-20 21:42       ` Dey, Megha
2020-09-15 23:28 ` [PATCH v3 05/18] dmaengine: idxd: add IMS support in base driver Dave Jiang
2020-09-30 18:47   ` Thomas Gleixner
2020-09-30 18:51     ` Jason Gunthorpe
2020-09-30 21:48       ` Thomas Gleixner
2020-09-30 21:49       ` Raj, Ashok
2020-09-30 21:57         ` Thomas Gleixner
2020-10-01  1:07           ` Raj, Ashok
2020-10-01  8:44             ` Thomas Gleixner
2020-09-30 22:38         ` Jason Gunthorpe
2020-10-01 20:48     ` Dave Jiang
2020-09-15 23:28 ` [PATCH v3 07/18] dmaengine: idxd: add basic mdev registration and helper functions Dave Jiang
2020-09-15 23:28 ` [PATCH v3 10/18] dmaengine: idxd: virtual device commands emulation Dave Jiang
2020-09-15 23:28 ` [PATCH v3 12/18] dmaengine: idxd: add mdev type as a new wq type Dave Jiang
2020-09-15 23:29 ` [PATCH v3 13/18] dmaengine: idxd: add dedicated wq mdev type Dave Jiang
2020-09-15 23:29 ` [PATCH v3 14/18] dmaengine: idxd: add new wq state for mdev Dave Jiang
2020-09-15 23:29 ` [PATCH v3 15/18] dmaengine: idxd: add error notification from host driver to mediated device Dave Jiang
2020-09-15 23:29 ` [PATCH v3 16/18] dmaengine: idxd: add ABI documentation for mediated device support Dave Jiang
2020-09-17 15:06 ` [PATCH v3 00/18] Add VFIO mediated device support and DEV-MSI support for the idxd driver Jason Gunthorpe
2020-09-17 17:15   ` Dave Jiang
2020-09-17 17:27     ` Jason Gunthorpe
2020-09-17 17:30     ` Alex Williamson
2020-09-17 17:37       ` Jason Gunthorpe
     [not found] ` <160021248280.67751.12525558281536923518.stgit@djiang5-desk3.ch.intel.com>
2020-09-30 18:36   ` [PATCH v3 04/18] dmaengine: idxd: add interrupt handle request support Thomas Gleixner
2020-10-01 20:16     ` Dave Jiang
     [not found] ` <160021253189.67751.12686144284999931703.stgit@djiang5-desk3.ch.intel.com>
2020-09-30 19:57   ` [PATCH v3 11/18] dmaengine: idxd: ims setup for the vdcm Thomas Gleixner
2020-10-07 21:54     ` Dave Jiang
2020-10-08  7:39       ` Thomas Gleixner
2020-10-08 16:51         ` Dave Jiang
2020-10-08 23:17           ` Thomas Gleixner
2020-10-08 23:32             ` Jason Gunthorpe
2020-10-09  0:27               ` Dave Jiang
2020-10-09  1:22               ` Raj, Ashok
2020-10-09 11:57                 ` Jason Gunthorpe
2020-10-09 12:43                   ` Raj, Ashok
2020-10-09 12:49                     ` Jason Gunthorpe
2020-10-09 13:02                       ` Raj, Ashok
2020-10-09 13:12                         ` Jason Gunthorpe
2020-10-09 13:40                           ` Raj, Ashok
2020-10-09 14:44               ` Thomas Gleixner [this message]
2020-10-09 14:52                 ` Jason Gunthorpe
2020-10-09 16:02                   ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v9fjtq5w.fsf@nanos.tec.linutronix.de \
    --to=tglx@linutronix.de \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=baolu.lu@intel.com \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dmaengine@vger.kernel.org \
    --cc=eric.auger@redhat.com \
    --cc=jacob.jun.pan@intel.com \
    --cc=jgg@nvidia.com \
    --cc=jing.lin@intel.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=megha.dey@intel.com \
    --cc=mona.hossain@intel.com \
    --cc=netanelg@mellanox.com \
    --cc=parav@mellanox.com \
    --cc=pbonzini@redhat.com \
    --cc=rafael@kernel.org \
    --cc=samuel.ortiz@intel.com \
    --cc=sanjay.k.kumar@intel.com \
    --cc=shahafs@mellanox.com \
    --cc=tony.luck@intel.com \
    --cc=vkoul@kernel.org \
    --cc=x86@kernel.org \
    --cc=yan.y.zhao@linux.intel.com \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).