From: Thomas Gleixner
To: "Dey, Megha", Jason Gunthorpe, gregkh@linuxfoundation.org
Cc: Marc Zyngier, "Jiang, Dave", vkoul@kernel.org, bhelgaas@google.com,
 rafael@kernel.org, hpa@zytor.com, alex.williamson@redhat.com,
 "Pan, Jacob jun", "Raj, Ashok", "Liu, Yi L", "Lu, Baolu",
 "Tian, Kevin", "Kumar, Sanjay K", "Luck, Tony", "Lin, Jing",
 "Williams, Dan J", kwankhede@nvidia.com, eric.auger@redhat.com,
 parav@mellanox.com, "Hansen, Dave", netanelg@mellanox.com,
 shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com,
 "Ortiz, Samuel", "Hossain, Mona", dmaengine@vger.kernel.org,
 linux-kernel@vger.kernel.org, x86@kernel.org,
 linux-pci@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH RFC v2 02/18] irq/dev-msi: Add support for a new DEV_MSI irq domain
In-Reply-To: <996854e7-ab11-b76b-64dc-b760b3ad2365@intel.com>
References: <87h7tcgbs2.fsf@nanos.tec.linutronix.de> <996854e7-ab11-b76b-64dc-b760b3ad2365@intel.com>
Date: Wed, 12 Aug 2020 00:39:10 +0200
Message-ID: <878sek7qpd.fsf@nanos.tec.linutronix.de>
Megha.

"Dey, Megha" writes:

> On 8/8/2020 12:47 PM, Thomas Gleixner wrote:
>>> 3. Come up with a ground up approach which adheres to the layering
>>> constraints of the IRQ subsystem
>>
>> Yes. It's something which can be used by all devices which have:
>>
>> 1) A device specific irq chip implementation including a msi write
>>    function
>>
>> 2) Device specific resource management (slots in the IMS case)
>>
>> The infrastructure you need is basically a wrapper around the core
>> MSI domain (similar to PCI, platform-MSI etc.) which provides the
>> specific functionality to handle the above.
>
> ok, i will create a per device irq chip which will directly have the
> device specific callbacks instead of another layer of redirection.
>
> This way i will get rid of the 'platform_msi_ops' data structure.
>
> I am not sure what you mean by device specific resource management,
> are you referring to dev_msi_alloc/free_irqs?

I think I gave you a hint:

>> 2) Device specific resource management (slots in the IMS case)

The IMS storage is an array with slots in the IDXD case, and these
slots are assigned at interrupt allocation time, right? In other cases,
where the IMS storage is in some other place, e.g. queue memory, the
storage still needs to be associated with the interrupt at allocation
time.

But of course, because you create some disconnected irqdomain, you have
to do that assignment separately on the side and then stick this
information into msi_desc after the fact. And as this is device
specific, every device driver which utilizes IMS has to do that, which
is bonkers at best.
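To illustrate why this matters, here is a rough sketch of what the irq
chip side of a device with an IMS slot array boils down to. This is not
existing code: the ims_array structure, the slot register layout, and
the use of the platform msi_index as the slot number are all made up
for illustration.

#include <linux/io.h>
#include <linux/irq.h>
#include <linux/msi.h>

/* Hypothetical layout of one message slot in the IMS storage array */
#define IMS_SLOT_SIZE		16
#define IMS_SLOT_ADDR_LO	0x00
#define IMS_SLOT_ADDR_HI	0x04
#define IMS_SLOT_DATA		0x08

struct ims_array {
	void __iomem	*slots;		/* Base of the IMS slot array */
	unsigned long	*in_use;	/* Slot allocation bitmap */
	unsigned int	max_slots;
};

static void ims_array_write_msi_msg(struct irq_data *data,
				    struct msi_msg *msg)
{
	struct ims_array *ims = irq_data_get_irq_chip_data(data);
	struct msi_desc *desc = irq_data_get_msi_desc(data);
	/* The slot must already be associated with this interrupt */
	void __iomem *slot = ims->slots +
			     desc->platform.msi_index * IMS_SLOT_SIZE;

	iowrite32(msg->address_lo, slot + IMS_SLOT_ADDR_LO);
	iowrite32(msg->address_hi, slot + IMS_SLOT_ADDR_HI);
	iowrite32(msg->data, slot + IMS_SLOT_DATA);
}

static struct irq_chip ims_array_irq_chip = {
	.name			= "IMS-ARRAY",
	.irq_write_msi_msg	= ims_array_write_msi_msg,
};

The message write cannot work unless the interrupt already knows its
slot, which is exactly why the slot handling belongs into the
allocation path of the irq domain and not into a side channel in every
driver.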
>> 2) A slot allocation or association function and their 'free'
>> counterpart (irq_domain_ops)
>
> This is one part I didn't understand.
>
> Currently my dev_msi_alloc_irqs is simply a wrapper over
> platform_msi_domain_alloc_irqs which again mostly calls
> msi_domain_alloc_irqs.
>
> When you say add a .alloc, .free, does this mean we should add a
> device specific alloc/free and not use the default
> msi_domain_alloc/msi_domain_free?
>
> I don't see anything device specific to be done for IDXD atleast, can
> you please let me know?

Each and every time I mentioned this, I explicitly mentioned "slot
allocation (array) or slot association (IMS store is not in an array)".
But you keep asking what's device specific over and over and where the
resource management is. The storage slot array is a resource which
needs to be managed, and it is device specific by specification. And it
is part of the interrupt allocation, obviously, because without a place
to store the MSI message the whole thing would not work. This slot does
not come out of thin air, right?

https://github.com/intel/idxd-driver/commit/fb9a2f4e36525a1f18d9e654d472aa87a9adcb30

int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd)
{
	struct idxd_device *idxd = vidxd->idxd;
	struct ims_irq_entry *irq_entry;
	struct mdev_device *mdev = vidxd->vdev.mdev;
	struct device *dev = mdev_dev(mdev);
	struct msi_desc *desc;
	int err, i = 0;
	int index;

	/*
	 * MSIX vec 0 is emulated by the vdcm and does not take up an
	 * IMS. The total MSIX vecs used by the mdev will be total IMS
	 * + 1. vec 0 is used for misc interrupts such as command
	 * completion, error notification, PMU, etc. The other vectors
	 * are used for descriptor completion. Thus only the number of
	 * IMS vectors need to be allocated, which is
	 * VIDXD_MAX_MSIX_VECS - 1.
	 */
	err = dev_msi_domain_alloc_irqs(dev, VIDXD_MAX_MSIX_VECS - 1,
					&idxd_ims_ops);
	if (err < 0) {
		dev_dbg(dev, "Enabling IMS entry! %d\n", err);
		return err;
	}

	i = 0;
	for_each_msi_entry(desc, dev) {
		index = idxd_alloc_ims_index(idxd);
		if (index < 0) {
			err = index;
			break;
		}
		vidxd->ims_index[i] = index;

		irq_entry = &vidxd->irq_entries[i];
		irq_entry->vidxd = vidxd;
		irq_entry->int_src = i;
		irq_entry->irq = desc->irq;
		i++;
	}

	if (err)
		vidxd_free_ims_entries(vidxd);

	return 0;
}

idxd_alloc_ims_index() is an allocation, right?

And the above, aside of having three redundant levels of storage for
exactly the same information, is just a violation of all layering
concepts at once. I just wish I'd never seen that code.

>> Again. Look at the layering. What you created now is a pseudo shared
>> domain which needs
>>
>> 1) An indirection layer for providing device specific functions
>>
>> 2) An extra allocation layer in the device specific driver to assign
>>    IMS slots completely outside of the domain allocation mechanism.
>
> hmmm, again I am not sure of which extra allocation layer you are
> referring to..

See above.

>> The infrastructure itself is not more than a thin wrapper around the
>> existing msi domain infrastructure and might even share code with
>> platform-msi.
>
> From your explanation:
>
> In the device driver:
>
> static const struct irq_domain_ops idxd_irq_domain_ops = {
>	.alloc = idxd_domain_alloc, // not sure what this should do

You might know by now. Also, it's not necessarily the .alloc callback
which needs to be implemented; as I said, we can add ops if necessary
and if it makes sense. This needs some thought to provide proper
layering and to share as much code as possible.

> except the alloc/free irq_domain_ops, does this look fine to you?

It's at least heading in the right direction. But before we talk about
the details at this level, the device::msi_domain pointer issue wants
to be resolved. It's part of the solution to share code at various
levels and to make utilization of this technology as simple as possible
for driver writers.

We need to think about infrastructure which can be used by various
types of IMS devices, e.g. those with storage arrays and those with
storage in random places, like the network card Jason was talking
about. And to get there, we need to do the house cleaning first.

Also, if you build proper infrastructure, then you need exactly ONE
implementation of an irq domain and an irq chip for devices which have
an IMS slot storage array. Every driver for a device which has this
kind of storage can reuse that, even with different array sizes.

If done right, then your IDXD driver needs:

	idxd->domain = create_ims_array_domain(...., basepointer, max_slots);

in the init function of the control block. create_ims_array_domain() is
not part of IDXD; it's a common irq domain/irq chip implementation
which deals with IMS slot storage arrays of arbitrary size.

And then, when creating a subdevice, you do:

	subdevice->msi_domain = idxd->domain;

and to allocate the interrupts you just do:

	device_msi_alloc_irqs(subdevice, nrirqs);

device_msi_alloc_irqs() is shared infrastructure which has nothing to
do with IDXD or the ims array domain.

The same can be done for devices which have their storage embedded into
whatever other data structure on the device, e.g. queue memory, and
share the same message storage layout.

And we need to put thought into the shared infrastructure upfront,
because all of this can also be used on bare metal.
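Put together, the driver side then shrinks to something like the sketch
below. create_ims_array_domain() and device_msi_alloc_irqs() are the
proposed shared pieces from above, not existing APIs, and the idxd
fields (pdev, ims_base, ims_max_slots) are invented names; the exact
argument lists are obviously up for discussion.

static int idxd_init_ims_domain(struct idxd_device *idxd)
{
	/*
	 * One irq domain per device, covering the device's IMS slot
	 * array. Provided by common infrastructure, not by IDXD.
	 */
	idxd->domain = create_ims_array_domain(&idxd->pdev->dev,
					       idxd->ims_base,
					       idxd->ims_max_slots);
	return PTR_ERR_OR_ZERO(idxd->domain);
}

static int idxd_setup_subdev_irqs(struct idxd_device *idxd,
				  struct device *subdev,
				  unsigned int nrirqs)
{
	/* The subdevice inherits the per device IMS domain */
	subdev->msi_domain = idxd->domain;

	/*
	 * Shared infrastructure allocates the interrupts and, via the
	 * domain's allocation path, the IMS slots in one go. No side
	 * channel bookkeeping left in the driver.
	 */
	return device_msi_alloc_irqs(subdev, nrirqs);
}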
The next thing you completely missed is to think about the ability to
support managed interrupts, which we have in PCI/MSI-X today. It's just
a matter of time until an IMS device comes along which wants its
subdevice interrupts managed properly when running on bare metal.

Can we please just go back to proper engineering and figure out how to
create something which is not just yet another half-baked "works for
IDXD" solution?

This means we need a proper description of possible IMS usage scenarios
and the foreseeable storage scenarios (arrays, queue data, ....), along
with requirements like managed interrupts etc.

I'm sure quite some of this information is scattered over a wide range
of mail threads, but it's not my job to hunt it down. Without
consistent information, at least to the point which is available today,
this is going to end up as a major tinkering trainwreck. I have zero
interest in dealing with that, especially when the major pain can be
avoided by doing proper analysis and design upfront.

Thanks,

        tglx