netdev.vger.kernel.org archive mirror
* [RFC PATCH net-next 00/19] pds core and vdpa drivers
@ 2022-11-18 22:56 Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 01/19] pds_core: initial framework for pds_core driver Shannon Nelson
                   ` (18 more replies)
  0 siblings, 19 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

Summary:
--------
This is a first draft patchset of a pair of new drivers for use with
the AMD/Pensando Distributed Services Card (DSC), intended to work
alongside the existing ionic Ethernet driver to add device support for
better virtualization.  The drivers work together using the
auxiliary_bus: the client driver (pds_vdpa) uses the core
configuration services provided by pds_core.

This large patchset combines both drivers in order to give a full RFC
view; it can be split into separate pds_core and pds_vdpa patchsets in
the future.


Detail:
-------
AMD/Pensando is making available a new set of devices for supporting vDPA,
VFIO, and potentially other features in the Distributed Services Card
(DSC).  These features are implemented through a PF that serves as a Core
device for controlling and configuring its VF devices.  These VF devices
have separate drivers that use the auxiliary_bus to work through the Core
device as the control path.

Currently, the DSC supports standard Ethernet operations using the
ionic driver.  The Core-based devices do not replace this - the new
devices are in addition to the existing Ethernet device.  Typical DSC
configurations will include both PDS devices and ionic Eth devices.

The Core device is a new PCI PF device managed by a new driver 'pds_core'.
It sets up a small representor netdev for managing the associated VFs,
and sets up auxiliary_bus devices for each VF for communicating with
the drivers for the VF devices.  The VFs may be for VFIO/NVMe or vDPA,
and other services in the future; these VF types are selected as part
of the DSC internal FW configuration, which is out of the scope of
this patchset.  The Core device sets up devlink parameters for enabling
the available feature sets.
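As a hypothetical example (the parameter name here is purely
illustrative; the actual parameters are defined in a later patch in the
series), enabling a feature set might look like this with the standard
devlink tool:

```
# devlink dev param set pci/0000:09:00.0 \
        name enable_vdpa value true cmode runtime
```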

Once a feature set is enabled, an auxiliary_bus device is created for
each VF that supports the feature.  These auxiliary_bus devices are
named with their feature plus the VF's PCI BDF so the auxiliary device
driver can find its related VF PCI driver instance.  The VF's driver
then connects to and uses this auxiliary_device to do control path
configuration of the feature through the PF device.

A cheap ASCII diagram of a vDPA instance looks something like this.
The resulting device can then be used with the vdpa kernel module to
provide devices to the virtio_vdpa kernel module for host interfaces,
or to the vhost_vdpa kernel module for interfaces exported into your
favorite VM.


                                  ,----------.
                                  |   vdpa   |
                                  '----------'
                                       |
                                     vdpa_dev
                                    ctl   data
                                     |     ||
           pds_core.vDPA.2305 <---+  |     ||
                   |              |  |     ||
       netdev      |              |  |     ||
          |        |              |  |     ||
         .------------.         .------------.
         |  pds_core  |         |  pds_vdpa  |
         '------------'         '------------'
               ||                     ||
              09:00.0                09:00.1
== PCI =========================================================
               ||                     ||
          .----------.           .----------.
    ,-----|    PF    |-----------|    VF    |-------,
    |     '----------'           '----------'       |
    |                     DSC                       |
    |                                               |
    -------------------------------------------------
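As a usage sketch (the device address and name are hypothetical, and
the exact management-device plumbing is defined in the later vdpa
patches), a vDPA instance might be created with the iproute2 vdpa tool
roughly like:

```
# vdpa mgmtdev show
# vdpa dev add name vdpa0 mgmtdev pci/0000:09:00.1
# vdpa dev show vdpa0
```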


The pds_core driver is targeted to reside in
drivers/net/ethernet/pensando/pds_core and the pds_vdpa driver lands
in drivers/vdpa/pds.  There are some shared include files placed in
include/linux/pds, which seemed reasonable at the time, but I've recently
seen suggestions of putting files like this under include/net instead,
so that may be up for some discussion.

I appreciate any and all time folks can spend reviewing and commenting.

Thanks,
sln

Shannon Nelson (19):
  pds_core: initial framework for pds_core driver
  pds_core: add devcmd device interfaces
  pds_core: health timer and workqueue
  pds_core: set up device and adminq
  pds_core: Add adminq processing and commands
  pds_core: add FW update feature to devlink
  pds_core: set up the VIF definitions and defaults
  pds_core: initial VF configuration
  pds_core: add auxiliary_bus devices
  pds_core: devlink params for enabling VIF support
  pds_core: add the aux client API
  pds_core: publish events to the clients
  pds_core: Kconfig and pds_core.rst
  pds_vdpa: Add new PCI VF device for PDS vDPA services
  pds_vdpa: virtio bar setup for vdpa
  pds_vdpa: add auxiliary driver
  pds_vdpa: add vdpa config client commands
  pds_vdpa: add support for vdpa and vdpamgmt interfaces
  pds_vdpa: add Kconfig entry and pds_vdpa.rst

 .../ethernet/pensando/pds_core.rst            | 162 ++++
 .../ethernet/pensando/pds_vdpa.rst            |  85 ++
 MAINTAINERS                                   |   4 +-
 drivers/net/ethernet/pensando/Kconfig         |  12 +
 .../net/ethernet/pensando/pds_core/Makefile   |  15 +
 .../net/ethernet/pensando/pds_core/adminq.c   | 299 +++++++
 .../net/ethernet/pensando/pds_core/auxbus.c   | 306 +++++++
 drivers/net/ethernet/pensando/pds_core/core.c | 616 ++++++++++++++
 drivers/net/ethernet/pensando/pds_core/core.h | 342 ++++++++
 .../net/ethernet/pensando/pds_core/debugfs.c  | 262 ++++++
 drivers/net/ethernet/pensando/pds_core/dev.c  | 403 +++++++++
 .../net/ethernet/pensando/pds_core/devlink.c  | 310 +++++++
 drivers/net/ethernet/pensando/pds_core/fw.c   | 192 +++++
 drivers/net/ethernet/pensando/pds_core/main.c | 440 ++++++++++
 .../net/ethernet/pensando/pds_core/netdev.c   | 504 +++++++++++
 drivers/vdpa/Kconfig                          |   7 +
 drivers/vdpa/pds/Makefile                     |  11 +
 drivers/vdpa/pds/aux_drv.c                    | 156 ++++
 drivers/vdpa/pds/aux_drv.h                    |  28 +
 drivers/vdpa/pds/cmds.c                       | 266 ++++++
 drivers/vdpa/pds/cmds.h                       |  17 +
 drivers/vdpa/pds/debugfs.c                    | 234 +++++
 drivers/vdpa/pds/debugfs.h                    |  28 +
 drivers/vdpa/pds/pci_drv.c                    | 172 ++++
 drivers/vdpa/pds/pci_drv.h                    |  49 ++
 drivers/vdpa/pds/vdpa_dev.c                   | 796 ++++++++++++++++++
 drivers/vdpa/pds/vdpa_dev.h                   |  60 ++
 drivers/vdpa/pds/virtio_pci.c                 | 283 +++++++
 include/linux/pds/pds_adminq.h                | 643 ++++++++++++++
 include/linux/pds/pds_auxbus.h                |  88 ++
 include/linux/pds/pds_common.h                |  99 +++
 include/linux/pds/pds_core_if.h               | 582 +++++++++++++
 include/linux/pds/pds_intr.h                  | 160 ++++
 include/linux/pds/pds_vdpa.h                  | 219 +++++
 34 files changed, 7849 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/networking/device_drivers/ethernet/pensando/pds_core.rst
 create mode 100644 Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst
 create mode 100644 drivers/net/ethernet/pensando/pds_core/Makefile
 create mode 100644 drivers/net/ethernet/pensando/pds_core/adminq.c
 create mode 100644 drivers/net/ethernet/pensando/pds_core/auxbus.c
 create mode 100644 drivers/net/ethernet/pensando/pds_core/core.c
 create mode 100644 drivers/net/ethernet/pensando/pds_core/core.h
 create mode 100644 drivers/net/ethernet/pensando/pds_core/debugfs.c
 create mode 100644 drivers/net/ethernet/pensando/pds_core/dev.c
 create mode 100644 drivers/net/ethernet/pensando/pds_core/devlink.c
 create mode 100644 drivers/net/ethernet/pensando/pds_core/fw.c
 create mode 100644 drivers/net/ethernet/pensando/pds_core/main.c
 create mode 100644 drivers/net/ethernet/pensando/pds_core/netdev.c
 create mode 100644 drivers/vdpa/pds/Makefile
 create mode 100644 drivers/vdpa/pds/aux_drv.c
 create mode 100644 drivers/vdpa/pds/aux_drv.h
 create mode 100644 drivers/vdpa/pds/cmds.c
 create mode 100644 drivers/vdpa/pds/cmds.h
 create mode 100644 drivers/vdpa/pds/debugfs.c
 create mode 100644 drivers/vdpa/pds/debugfs.h
 create mode 100644 drivers/vdpa/pds/pci_drv.c
 create mode 100644 drivers/vdpa/pds/pci_drv.h
 create mode 100644 drivers/vdpa/pds/vdpa_dev.c
 create mode 100644 drivers/vdpa/pds/vdpa_dev.h
 create mode 100644 drivers/vdpa/pds/virtio_pci.c
 create mode 100644 include/linux/pds/pds_adminq.h
 create mode 100644 include/linux/pds/pds_auxbus.h
 create mode 100644 include/linux/pds/pds_common.h
 create mode 100644 include/linux/pds/pds_core_if.h
 create mode 100644 include/linux/pds/pds_intr.h
 create mode 100644 include/linux/pds/pds_vdpa.h

-- 
2.17.1



* [RFC PATCH net-next 01/19] pds_core: initial framework for pds_core driver
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 02/19] pds_core: add devcmd device interfaces Shannon Nelson
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

This is the initial PCI driver framework for the new pds_core device
driver and its family of client drivers.  This does the very basics of
registering for the new PCI device 1dd8:100c, setting up debugfs entries,
and registering with devlink.

The new PCI device id has not made it to the official PCI ID Repository
yet, but will soon be registered there.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 .../net/ethernet/pensando/pds_core/Makefile   |   9 +
 drivers/net/ethernet/pensando/pds_core/core.h |  68 ++
 .../net/ethernet/pensando/pds_core/debugfs.c  |  47 ++
 .../net/ethernet/pensando/pds_core/devlink.c  |  55 ++
 drivers/net/ethernet/pensando/pds_core/main.c | 263 ++++++++
 include/linux/pds/pds_common.h                |  13 +
 include/linux/pds/pds_core_if.h               | 581 ++++++++++++++++++
 7 files changed, 1036 insertions(+)
 create mode 100644 drivers/net/ethernet/pensando/pds_core/Makefile
 create mode 100644 drivers/net/ethernet/pensando/pds_core/core.h
 create mode 100644 drivers/net/ethernet/pensando/pds_core/debugfs.c
 create mode 100644 drivers/net/ethernet/pensando/pds_core/devlink.c
 create mode 100644 drivers/net/ethernet/pensando/pds_core/main.c
 create mode 100644 include/linux/pds/pds_common.h
 create mode 100644 include/linux/pds/pds_core_if.h

diff --git a/drivers/net/ethernet/pensando/pds_core/Makefile b/drivers/net/ethernet/pensando/pds_core/Makefile
new file mode 100644
index 000000000000..72bbc5fa68ad
--- /dev/null
+++ b/drivers/net/ethernet/pensando/pds_core/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2022 Pensando Systems, Inc
+
+obj-$(CONFIG_PDS_CORE) := pds_core.o
+
+pds_core-y := main.o \
+	      devlink.o
+
+pds_core-$(CONFIG_DEBUG_FS) += debugfs.o
diff --git a/drivers/net/ethernet/pensando/pds_core/core.h b/drivers/net/ethernet/pensando/pds_core/core.h
new file mode 100644
index 000000000000..022adc4aea01
--- /dev/null
+++ b/drivers/net/ethernet/pensando/pds_core/core.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#ifndef _PDSC_H_
+#define _PDSC_H_
+
+#include <linux/debugfs.h>
+#include <net/devlink.h>
+
+#include <linux/pds/pds_common.h>
+#include <linux/pds/pds_core_if.h>
+
+#define PDSC_DRV_DESCRIPTION	"Pensando Core PF Driver"
+
+struct pdsc_dev_bar {
+	void __iomem *vaddr;
+	phys_addr_t bus_addr;
+	unsigned long len;
+	int res_index;
+};
+
+/* No state flags set means we are in a steady running state */
+enum pdsc_state_flags {
+	PDSC_S_FW_DEAD,		    /* fw stopped, waiting for startup or recovery */
+	PDSC_S_INITING_DRIVER,	    /* initial startup from probe */
+	PDSC_S_STOPPING_DRIVER,	    /* driver remove */
+
+	/* leave this as last */
+	PDSC_S_STATE_SIZE
+};
+
+struct pdsc {
+	struct pci_dev *pdev;
+	struct dentry *dentry;
+	struct device *dev;
+	struct pdsc_dev_bar bars[PDS_CORE_BARS_MAX];
+	int hw_index;
+	int id;
+
+	unsigned long state;
+
+	struct pds_core_dev_info_regs __iomem *info_regs;
+	struct pds_core_dev_cmd_regs __iomem *cmd_regs;
+	struct pds_core_intr __iomem *intr_ctrl;
+	u64 __iomem *intr_status;
+	u64 __iomem *db_pages;
+	dma_addr_t phy_db_pages;
+	u64 __iomem *kern_dbpage;
+};
+
+struct pdsc *pdsc_dl_alloc(struct device *dev);
+void pdsc_dl_free(struct pdsc *pdsc);
+int pdsc_dl_register(struct pdsc *pdsc);
+void pdsc_dl_unregister(struct pdsc *pdsc);
+
+#ifdef CONFIG_DEBUG_FS
+void pdsc_debugfs_create(void);
+void pdsc_debugfs_destroy(void);
+void pdsc_debugfs_add_dev(struct pdsc *pdsc);
+void pdsc_debugfs_del_dev(struct pdsc *pdsc);
+#else
+static inline void pdsc_debugfs_create(void) { }
+static inline void pdsc_debugfs_destroy(void) { }
+static inline void pdsc_debugfs_add_dev(struct pdsc *pdsc) { }
+static inline void pdsc_debugfs_del_dev(struct pdsc *pdsc) { }
+#endif
+
+#endif /* _PDSC_H_ */
diff --git a/drivers/net/ethernet/pensando/pds_core/debugfs.c b/drivers/net/ethernet/pensando/pds_core/debugfs.c
new file mode 100644
index 000000000000..3f876dcc5431
--- /dev/null
+++ b/drivers/net/ethernet/pensando/pds_core/debugfs.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#ifdef CONFIG_DEBUG_FS
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/pci.h>
+
+#include "core.h"
+
+static struct dentry *pdsc_dir;
+
+void pdsc_debugfs_create(void)
+{
+	pdsc_dir = debugfs_create_dir(PDS_CORE_DRV_NAME, NULL);
+}
+
+void pdsc_debugfs_destroy(void)
+{
+	debugfs_remove_recursive(pdsc_dir);
+}
+
+static int core_state_show(struct seq_file *seq, void *v)
+{
+	struct pdsc *pdsc = seq->private;
+
+	seq_printf(seq, "%#lx\n", pdsc->state);
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(core_state);
+
+void pdsc_debugfs_add_dev(struct pdsc *pdsc)
+{
+	pdsc->dentry = debugfs_create_dir(pci_name(pdsc->pdev), pdsc_dir);
+
+	debugfs_create_file("state", 0400, pdsc->dentry,
+			    pdsc, &core_state_fops);
+}
+
+void pdsc_debugfs_del_dev(struct pdsc *pdsc)
+{
+	debugfs_remove_recursive(pdsc->dentry);
+	pdsc->dentry = NULL;
+}
+#endif /* CONFIG_DEBUG_FS */
diff --git a/drivers/net/ethernet/pensando/pds_core/devlink.c b/drivers/net/ethernet/pensando/pds_core/devlink.c
new file mode 100644
index 000000000000..3538aa9cf9e3
--- /dev/null
+++ b/drivers/net/ethernet/pensando/pds_core/devlink.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/pci.h>
+
+#include "core.h"
+
+static int pdsc_dl_info_get(struct devlink *dl, struct devlink_info_req *req,
+			    struct netlink_ext_ack *extack)
+{
+	struct pdsc *pdsc = devlink_priv(dl);
+
+	return devlink_info_driver_name_put(req, pdsc->pdev->driver->name);
+}
+
+static const struct devlink_ops pdsc_dl_ops = {
+	.info_get	= pdsc_dl_info_get,
+};
+
+struct pdsc *pdsc_dl_alloc(struct device *dev)
+{
+	struct devlink *dl;
+
+	dl = devlink_alloc(&pdsc_dl_ops, sizeof(struct pdsc), dev);
+	if (!dl)
+		return NULL;
+
+	return devlink_priv(dl);
+}
+
+void pdsc_dl_free(struct pdsc *pdsc)
+{
+	struct devlink *dl = priv_to_devlink(pdsc);
+
+	devlink_free(dl);
+}
+
+int pdsc_dl_register(struct pdsc *pdsc)
+{
+	struct devlink *dl = priv_to_devlink(pdsc);
+
+	devlink_register(dl);
+
+	return 0;
+}
+
+void pdsc_dl_unregister(struct pdsc *pdsc)
+{
+	struct devlink *dl = priv_to_devlink(pdsc);
+
+	devlink_unregister(dl);
+}
diff --git a/drivers/net/ethernet/pensando/pds_core/main.c b/drivers/net/ethernet/pensando/pds_core/main.c
new file mode 100644
index 000000000000..4bdbcc1c17a7
--- /dev/null
+++ b/drivers/net/ethernet/pensando/pds_core/main.c
@@ -0,0 +1,263 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+/* main PCI driver and mgmt logic */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/pci.h>
+#include <linux/aer.h>
+
+#include "core.h"
+
+MODULE_DESCRIPTION(PDSC_DRV_DESCRIPTION);
+MODULE_AUTHOR("Pensando Systems, Inc");
+MODULE_LICENSE("GPL");
+
+/* Supported devices */
+static const struct pci_device_id pdsc_id_table[] = {
+	{ PCI_VDEVICE(PENSANDO, PCI_DEVICE_ID_PENSANDO_CORE_PF) },
+	{ 0, }	/* end of table */
+};
+MODULE_DEVICE_TABLE(pci, pdsc_id_table);
+
+static void pdsc_unmap_bars(struct pdsc *pdsc)
+{
+	struct pdsc_dev_bar *bars = pdsc->bars;
+	unsigned int i;
+
+	for (i = 0; i < PDS_CORE_BARS_MAX; i++) {
+		if (bars[i].vaddr) {
+			pcim_iounmap(pdsc->pdev, bars[i].vaddr);
+			bars[i].vaddr = NULL;
+		}
+
+		bars[i].len = 0;
+		bars[i].bus_addr = 0;
+		bars[i].res_index = 0;
+	}
+}
+
+static int pdsc_map_bars(struct pdsc *pdsc)
+{
+	struct pdsc_dev_bar *bar = pdsc->bars;
+	struct pci_dev *pdev = pdsc->pdev;
+	struct device *dev = pdsc->dev;
+	struct pdsc_dev_bar *bars;
+	unsigned int i, j;
+	int num_bars = 0;
+	int err;
+	u32 sig;
+
+	bars = pdsc->bars;
+	num_bars = 0;
+
+	/* Since the PCI interface in the hardware is configurable,
+	 * we need to poke into all the bars to find the set we're
+	 * expecting.  They will be in the right order.
+	 */
+	for (i = 0, j = 0; i < PDS_CORE_BARS_MAX; i++) {
+		if (!(pci_resource_flags(pdev, i) & IORESOURCE_MEM))
+			continue;
+
+		bars[j].len = pci_resource_len(pdev, i);
+		bars[j].bus_addr = pci_resource_start(pdev, i);
+		bars[j].res_index = i;
+
+		/* only map the whole bar 0 */
+		if (j > 0) {
+			bars[j].vaddr = NULL;
+		} else {
+			bars[j].vaddr = pcim_iomap(pdev, i, bars[j].len);
+			if (!bars[j].vaddr) {
+				dev_err(dev,
+					"Cannot memory-map BAR %d, aborting\n",
+					i);
+				return -ENODEV;
+			}
+		}
+
+		j++;
+	}
+	num_bars = j;
+
+	/* BAR0: dev_cmd and interrupts */
+	if (num_bars < 1) {
+		dev_err(dev, "No bars found\n");
+		err = -EFAULT;
+		goto err_out;
+	}
+
+	if (bar->len < PDS_CORE_BAR0_SIZE) {
+		dev_err(dev, "Resource bar size %lu too small\n",
+			bar->len);
+		err = -EFAULT;
+		goto err_out;
+	}
+
+	pdsc->info_regs = bar->vaddr + PDS_CORE_BAR0_DEV_INFO_REGS_OFFSET;
+	pdsc->cmd_regs = bar->vaddr + PDS_CORE_BAR0_DEV_CMD_REGS_OFFSET;
+	pdsc->intr_status = bar->vaddr + PDS_CORE_BAR0_INTR_STATUS_OFFSET;
+	pdsc->intr_ctrl = bar->vaddr + PDS_CORE_BAR0_INTR_CTRL_OFFSET;
+
+	sig = ioread32(&pdsc->info_regs->signature);
+	if (sig != PDS_CORE_DEV_INFO_SIGNATURE) {
+		dev_err(dev, "Incompatible firmware signature %x\n", sig);
+		err = -EFAULT;
+		goto err_out;
+	}
+
+	/* BAR1: doorbells */
+	bar++;
+	if (num_bars < 2) {
+		dev_err(dev, "Doorbell bar missing\n");
+		err = -EFAULT;
+		goto err_out;
+	}
+
+	pdsc->db_pages = bar->vaddr;
+	pdsc->phy_db_pages = bar->bus_addr;
+
+	return 0;
+
+err_out:
+	pdsc_unmap_bars(pdsc);
+	pdsc->info_regs = NULL;
+	pdsc->cmd_regs = NULL;
+	pdsc->intr_status = NULL;
+	pdsc->intr_ctrl = NULL;
+	return err;
+}
+
+static DEFINE_IDA(pdsc_pf_ida);
+
+static int pdsc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
+{
+	struct device *dev = &pdev->dev;
+	struct pdsc *pdsc;
+	int err;
+
+	pdsc = pdsc_dl_alloc(dev);
+	if (!pdsc)
+		return -ENOMEM;
+
+	pdsc->pdev = pdev;
+	pdsc->dev = &pdev->dev;
+	set_bit(PDSC_S_FW_DEAD, &pdsc->state);
+	set_bit(PDSC_S_INITING_DRIVER, &pdsc->state);
+	pci_set_drvdata(pdev, pdsc);
+	pdsc_debugfs_add_dev(pdsc);
+
+	err = ida_alloc(&pdsc_pf_ida, GFP_KERNEL);
+	if (err < 0) {
+		dev_err(pdsc->dev, "%s: id alloc failed: %pe\n", __func__, ERR_PTR(err));
+		goto err_out_free_devlink;
+	}
+	pdsc->id = err;
+
+	/* Query system for DMA addressing limitation for the device. */
+	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(PDS_CORE_ADDR_LEN));
+	if (err) {
+		dev_err(dev, "Unable to obtain 64-bit DMA for consistent allocations, aborting: %pe\n",
+			ERR_PTR(err));
+		goto err_out_free_ida;
+	}
+
+	pci_enable_pcie_error_reporting(pdev);
+
+	/* Use devres management */
+	err = pcim_enable_device(pdev);
+	if (err) {
+		dev_err(dev, "Cannot enable PCI device: %pe\n", ERR_PTR(err));
+		goto err_out_free_ida;
+	}
+
+	err = pci_request_regions(pdev, PDS_CORE_DRV_NAME);
+	if (err) {
+		dev_err(dev, "Cannot request PCI regions: %pe\n", ERR_PTR(err));
+		goto err_out_pci_disable_device;
+	}
+
+	pcie_print_link_status(pdev);
+	pci_set_master(pdev);
+
+	err = pdsc_map_bars(pdsc);
+	if (err)
+		goto err_out;
+
+	/* publish devlink device */
+	err = pdsc_dl_register(pdsc);
+	if (err) {
+		dev_err(dev, "Cannot register devlink: %pe\n", ERR_PTR(err));
+		goto err_out;
+	}
+
+	clear_bit(PDSC_S_INITING_DRIVER, &pdsc->state);
+	return 0;
+
+err_out:
+	pci_clear_master(pdev);
+	pdsc_unmap_bars(pdsc);
+	pci_release_regions(pdev);
+err_out_pci_disable_device:
+	pci_disable_pcie_error_reporting(pdev);
+	pci_disable_device(pdev);
+err_out_free_ida:
+	ida_free(&pdsc_pf_ida, pdsc->id);
+err_out_free_devlink:
+	pdsc_debugfs_del_dev(pdsc);
+	pdsc_dl_free(pdsc);
+
+	return err;
+}
+
+static void pdsc_remove(struct pci_dev *pdev)
+{
+	struct pdsc *pdsc = pci_get_drvdata(pdev);
+
+	/* Undo the devlink registration now to be sure there
+	 * are no requests while we're stopping.
+	 */
+	pdsc_dl_unregister(pdsc);
+
+	/* Device teardown */
+	ida_free(&pdsc_pf_ida, pdsc->id);
+
+	/* PCI teardown */
+	pci_clear_master(pdev);
+	pdsc_unmap_bars(pdsc);
+	pci_release_regions(pdev);
+	pci_disable_pcie_error_reporting(pdev);
+	pci_disable_device(pdev);
+
+	/* Devlink and pdsc struct teardown */
+	pdsc_dl_free(pdsc);
+}
+
+static struct pci_driver pdsc_driver = {
+	.name = PDS_CORE_DRV_NAME,
+	.id_table = pdsc_id_table,
+	.probe = pdsc_probe,
+	.remove = pdsc_remove,
+};
+
+static int __init pdsc_init_module(void)
+{
+	pdsc_debugfs_create();
+	return pci_register_driver(&pdsc_driver);
+}
+
+static void __exit pdsc_cleanup_module(void)
+{
+	pci_unregister_driver(&pdsc_driver);
+	pdsc_debugfs_destroy();
+
+	pr_info("removed\n");
+}
+
+module_init(pdsc_init_module);
+module_exit(pdsc_cleanup_module);
diff --git a/include/linux/pds/pds_common.h b/include/linux/pds/pds_common.h
new file mode 100644
index 000000000000..7de3c1b8526b
--- /dev/null
+++ b/include/linux/pds/pds_common.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR Linux-OpenIB) OR BSD-2-Clause */
+/* Copyright (c) 2022 Pensando Systems, Inc.  All rights reserved. */
+
+#ifndef _PDS_COMMON_H_
+#define _PDS_COMMON_H_
+
+#define PDS_CORE_DRV_NAME			"pds_core"
+
+/* the device's internal addressing uses up to 52 bits */
+#define PDS_CORE_ADDR_LEN	52
+#define PDS_CORE_ADDR_MASK	(BIT_ULL(PDS_CORE_ADDR_LEN) - 1)
+
+#endif /* _PDS_COMMON_H_ */
diff --git a/include/linux/pds/pds_core_if.h b/include/linux/pds/pds_core_if.h
new file mode 100644
index 000000000000..6333ec351e14
--- /dev/null
+++ b/include/linux/pds/pds_core_if.h
@@ -0,0 +1,581 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR Linux-OpenIB) OR BSD-2-Clause */
+/* Copyright (c) 2022 Pensando Systems, Inc.  All rights reserved. */
+
+#ifndef _PDS_CORE_IF_H_
+#define _PDS_CORE_IF_H_
+
+#include "pds_common.h"
+
+#define PCI_VENDOR_ID_PENSANDO			0x1dd8
+#define PCI_DEVICE_ID_PENSANDO_CORE_PF		0x100c
+
+#define PDS_CORE_BARS_MAX			4
+#define PDS_CORE_PCI_BAR_DBELL			1
+
+/* Bar0 */
+#define PDS_CORE_DEV_INFO_SIGNATURE		0x44455649 /* 'DEVI' */
+#define PDS_CORE_BAR0_SIZE			0x8000
+#define PDS_CORE_BAR0_DEV_INFO_REGS_OFFSET	0x0000
+#define PDS_CORE_BAR0_DEV_CMD_REGS_OFFSET	0x0800
+#define PDS_CORE_BAR0_DEV_CMD_DATA_REGS_OFFSET	0x0c00
+#define PDS_CORE_BAR0_INTR_STATUS_OFFSET	0x1000
+#define PDS_CORE_BAR0_INTR_CTRL_OFFSET		0x2000
+#define PDS_CORE_DEV_CMD_DONE			0x00000001
+
+#define PDS_CORE_DEVCMD_TIMEOUT			5
+
+#define PDS_CORE_CLIENT_ID			0
+#define PDS_CORE_ASIC_TYPE_CAPRI		0
+
+/*
+ * enum pds_core_cmd_opcode - Device commands
+ */
+enum pds_core_cmd_opcode {
+
+	/* Core init */
+	PDS_CORE_CMD_NOP		= 0,
+	PDS_CORE_CMD_IDENTIFY		= 1,
+	PDS_CORE_CMD_RESET		= 2,
+	PDS_CORE_CMD_INIT		= 3,
+
+	PDS_CORE_CMD_FW_DOWNLOAD	= 4,
+	PDS_CORE_CMD_FW_CONTROL		= 5,
+
+	/* SR/IOV commands */
+	PDS_CORE_CMD_VF_GETATTR		= 60,
+	PDS_CORE_CMD_VF_SETATTR		= 61,
+	PDS_CORE_CMD_VF_CTRL		= 62,
+
+	/* Add commands before this line */
+	PDS_CORE_CMD_MAX,
+	PDS_CORE_CMD_COUNT
+};
+
+/**
+ * struct pds_core_drv_identity - Driver identity information
+ * @drv_type:         Driver type (enum pds_core_driver_type)
+ * @os_dist:          OS distribution, numeric format
+ * @os_dist_str:      OS distribution, string format
+ * @kernel_ver:       Kernel version, numeric format
+ * @kernel_ver_str:   Kernel version, string format
+ * @driver_ver_str:   Driver version, string format
+ */
+struct pds_core_drv_identity {
+	__le32 drv_type;
+	__le32 os_dist;
+	char   os_dist_str[128];
+	__le32 kernel_ver;
+	char   kernel_ver_str[32];
+	char   driver_ver_str[32];
+};
+
+#define PDS_DEV_TYPE_MAX	16
+/**
+ * struct pds_core_dev_identity - Device identity information
+ * @version:	      Version of device identify
+ * @type:	      Identify type (0 for now)
+ * @state:	      Device state
+ * @rsvd:	      Word boundary padding
+ * @nlifs:	      Number of LIFs provisioned
+ * @nintrs:	      Number of interrupts provisioned
+ * @ndbpgs_per_lif:   Number of doorbell pages per LIF
+ * @intr_coal_mult:   Interrupt coalescing multiplication factor
+ *		      Scale user-supplied interrupt coalescing
+ *		      value in usecs to device units using:
+ *		      device units = usecs * mult / div
+ * @intr_coal_div:    Interrupt coalescing division factor
+ *		      Scale user-supplied interrupt coalescing
+ *		      value in usecs to device units using:
+ *		      device units = usecs * mult / div
+ * @vif_types:        How many of each VIF device type is supported
+ */
+struct pds_core_dev_identity {
+	u8     version;
+	u8     type;
+	u8     state;
+	u8     rsvd;
+	__le32 nlifs;
+	__le32 nintrs;
+	__le32 ndbpgs_per_lif;
+	__le32 intr_coal_mult;
+	__le32 intr_coal_div;
+	__le16 vif_types[PDS_DEV_TYPE_MAX];
+};
+
+#define PDS_CORE_IDENTITY_VERSION_1	1
+
+/**
+ * struct pds_core_dev_identify_cmd - Driver/device identify command
+ * @opcode:	Opcode PDS_CORE_CMD_IDENTIFY
+ * @ver:	Highest version of identify supported by driver
+ *
+ * Expects to find driver identification info (struct pds_core_drv_identity)
+ * in cmd_regs->data.  Driver should keep the devcmd interface locked
+ * while preparing the driver info.
+ */
+struct pds_core_dev_identify_cmd {
+	u8 opcode;
+	u8 ver;
+};
+
+/**
+ * struct pds_core_dev_identify_comp - Device identify command completion
+ * @status:	Status of the command (enum pds_core_status_code)
+ * @ver:	Version of identify returned by device
+ *
+ * Device identification info (struct pds_core_dev_identity) can be found
+ * in cmd_regs->data.  Driver should keep the devcmd interface locked
+ * while reading the results.
+ */
+struct pds_core_dev_identify_comp {
+	u8 status;
+	u8 ver;
+};
+
+/**
+ * struct pds_core_dev_reset_cmd - Device reset command
+ * @opcode:	Opcode PDS_CORE_CMD_RESET
+ *
+ * Resets and clears all LIFs, VDevs, and VIFs on the device.
+ */
+struct pds_core_dev_reset_cmd {
+	u8 opcode;
+};
+
+/**
+ * struct pds_core_dev_reset_comp - Reset command completion
+ * @status:	Status of the command (enum pds_core_status_code)
+ */
+struct pds_core_dev_reset_comp {
+	u8 status;
+};
+
+/*
+ * struct pds_core_dev_init_data - Pointers and info needed for the Core
+ * initialization PDS_CORE_CMD_INIT command.  The in and out structs are
+ * overlays on the pds_core_dev_cmd_regs.data space for passing data down
+ * to the firmware on init, and then returning initialization results.
+ */
+struct pds_core_dev_init_data_in {
+	__le64 adminq_q_base;
+	__le64 adminq_cq_base;
+	__le64 notifyq_cq_base;
+	__le32 flags;
+	__le16 intr_index;
+	u8     adminq_ring_size;
+	u8     notifyq_ring_size;
+};
+
+struct pds_core_dev_init_data_out {
+	__le32 core_hw_index;
+	__le32 adminq_hw_index;
+	__le32 notifyq_hw_index;
+	u8     adminq_hw_type;
+	u8     notifyq_hw_type;
+};
+
+/**
+ * struct pds_core_dev_init_cmd - Core device initialize
+ * @opcode:          opcode PDS_CORE_CMD_INIT
+ *
+ * Initializes the core device and sets up the AdminQ and NotifyQ.
+ * Expects to find initialization data (struct pds_core_dev_init_data_in)
+ * in cmd_regs->data.  Driver should keep the devcmd interface locked
+ * while preparing the driver info.
+ */
+struct pds_core_dev_init_cmd {
+	u8     opcode;
+};
+
+/**
+ * struct pds_core_dev_init_comp - Core init completion
+ * @status:     Status of the command (enum pds_core_status_code)
+ *
+ * Initialization result data (struct pds_core_dev_init_data_out)
+ * is found in cmd_regs->data.
+ */
+struct pds_core_dev_init_comp {
+	u8     status;
+};
+
+/**
+ * struct pds_core_fw_download_cmd - Firmware download command
+ * @opcode:     Opcode PDS_CORE_CMD_FW_DOWNLOAD
+ * @rsvd:	Word boundary padding
+ * @offset:     offset of the firmware buffer within the full image
+ * @addr:       DMA address of the firmware buffer
+ * @length:     number of valid bytes in the firmware buffer
+ */
+struct pds_core_fw_download_cmd {
+	u8     opcode;
+	u8     rsvd[3];
+	__le32 offset;
+	__le64 addr;
+	__le32 length;
+};
+
+/**
+ * struct pds_core_fw_download_comp - Firmware download completion
+ * @status:     Status of the command (enum pds_core_status_code)
+ */
+struct pds_core_fw_download_comp {
+	u8     status;
+};
+
+/**
+ * enum pds_core_fw_control_oper - FW control operations
+ * @PDS_CORE_FW_INSTALL_ASYNC:     Install firmware asynchronously
+ * @PDS_CORE_FW_INSTALL_STATUS:    Firmware installation status
+ * @PDS_CORE_FW_ACTIVATE_ASYNC:    Activate firmware asynchronously
+ * @PDS_CORE_FW_ACTIVATE_STATUS:   Firmware activate status
+ * @PDS_CORE_FW_UPDATE_CLEANUP:    Cleanup any firmware update leftovers
+ * @PDS_CORE_FW_GET_BOOT:          Return current active firmware slot
+ * @PDS_CORE_FW_SET_BOOT:          Set active firmware slot for next boot
+ * @PDS_CORE_FW_GET_LIST:          Return list of installed firmware images
+ */
+enum pds_core_fw_control_oper {
+	PDS_CORE_FW_INSTALL_ASYNC          = 0,
+	PDS_CORE_FW_INSTALL_STATUS         = 1,
+	PDS_CORE_FW_ACTIVATE_ASYNC         = 2,
+	PDS_CORE_FW_ACTIVATE_STATUS        = 3,
+	PDS_CORE_FW_UPDATE_CLEANUP         = 4,
+	PDS_CORE_FW_GET_BOOT               = 5,
+	PDS_CORE_FW_SET_BOOT               = 6,
+	PDS_CORE_FW_GET_LIST               = 7,
+};
+
+enum pds_core_fw_slot {
+	PDS_CORE_FW_SLOT_INVALID    = 0,
+	PDS_CORE_FW_SLOT_A          = 1,
+	PDS_CORE_FW_SLOT_B          = 2,
+	PDS_CORE_FW_SLOT_GOLD       = 3,
+};
+
+/**
+ * struct pds_core_fw_control_cmd - Firmware control command
+ * @opcode:    opcode
+ * @rsvd:      Word boundary padding
+ * @oper:      firmware control operation (enum pds_core_fw_control_oper)
+ * @slot:      slot to operate on (enum pds_core_fw_slot)
+ */
+struct pds_core_fw_control_cmd {
+	u8  opcode;
+	u8  rsvd[3];
+	u8  oper;
+	u8  slot;
+};
+
+/**
+ * struct pds_core_fw_control_comp - Firmware control completion
+ * @status:	Status of the command (enum pds_core_status_code)
+ * @rsvd:	Word alignment space
+ * @slot:	Slot number (enum pds_core_fw_slot)
+ * @rsvd1:	Struct padding
+ * @color:	Color bit
+ */
+struct pds_core_fw_control_comp {
+	u8     status;
+	u8     rsvd[3];
+	u8     slot;
+	u8     rsvd1[10];
+	u8     color;
+};
+
+struct pds_core_fw_name_info {
+#define PDS_CORE_FWSLOT_BUFLEN		8
+#define PDS_CORE_FWVERS_BUFLEN		32
+	char   slotname[PDS_CORE_FWSLOT_BUFLEN];
+	char   fw_version[PDS_CORE_FWVERS_BUFLEN];
+};
+
+struct pds_core_fw_list_info {
+#define PDS_CORE_FWVERS_LIST_LEN	16
+	u8 num_fw_slots;
+	struct pds_core_fw_name_info fw_names[PDS_CORE_FWVERS_LIST_LEN];
+} __packed;
+
+enum pds_core_vf_attr {
+	PDS_CORE_VF_ATTR_SPOOFCHK	= 1,
+	PDS_CORE_VF_ATTR_TRUST		= 2,
+	PDS_CORE_VF_ATTR_MAC		= 3,
+	PDS_CORE_VF_ATTR_LINKSTATE	= 4,
+	PDS_CORE_VF_ATTR_VLAN		= 5,
+	PDS_CORE_VF_ATTR_RATE		= 6,
+	PDS_CORE_VF_ATTR_STATSADDR	= 7,
+};
+
+/**
+ * enum pds_core_vf_link_status - Virtual Function link status
+ * @PDS_CORE_VF_LINK_STATUS_AUTO:   Use link state of the uplink
+ * @PDS_CORE_VF_LINK_STATUS_UP:     Link always up
+ * @PDS_CORE_VF_LINK_STATUS_DOWN:   Link always down
+ */
+enum pds_core_vf_link_status {
+	PDS_CORE_VF_LINK_STATUS_AUTO = 0,
+	PDS_CORE_VF_LINK_STATUS_UP   = 1,
+	PDS_CORE_VF_LINK_STATUS_DOWN = 2,
+};
+
+/**
+ * struct pds_core_vf_setattr_cmd - Set VF attributes on the NIC
+ * @opcode:     Opcode
+ * @attr:       Attribute type (enum pds_core_vf_attr)
+ * @vf_index:   VF index
+ * @macaddr:	mac address
+ * @vlanid:	vlan ID
+ * @maxrate:	max Tx rate in Mbps
+ * @spoofchk:	enable address spoof checking
+ * @trust:	enable VF trust
+ * @linkstate:	set link up or down
+ * @stats:	stats addr struct
+ * @stats.pa:	set DMA address for VF stats
+ * @stats.len:	length of VF stats space
+ * @pad:	force union to specific size
+ */
+struct pds_core_vf_setattr_cmd {
+	u8     opcode;
+	u8     attr;
+	__le16 vf_index;
+	union {
+		u8     macaddr[6];
+		__le16 vlanid;
+		__le32 maxrate;
+		u8     spoofchk;
+		u8     trust;
+		u8     linkstate;
+		struct {
+			__le64 pa;
+			__le32 len;
+		} stats;
+		u8     pad[60];
+	} __packed;
+};
+
+struct pds_core_vf_setattr_comp {
+	u8     status;
+	u8     attr;
+	__le16 vf_index;
+	__le16 comp_index;
+	u8     rsvd[9];
+	u8     color;
+};
+
+/**
+ * struct pds_core_vf_getattr_cmd - Get VF attributes from the NIC
+ * @opcode:     Opcode
+ * @attr:       Attribute type (enum pds_core_vf_attr)
+ * @vf_index:   VF index
+ */
+struct pds_core_vf_getattr_cmd {
+	u8     opcode;
+	u8     attr;
+	__le16 vf_index;
+};
+
+struct pds_core_vf_getattr_comp {
+	u8     status;
+	u8     attr;
+	__le16 vf_index;
+	union {
+		u8     macaddr[6];
+		__le16 vlanid;
+		__le32 maxrate;
+		u8     spoofchk;
+		u8     trust;
+		u8     linkstate;
+		__le64 stats_pa;
+		u8     pad[11];
+	} __packed;
+	u8     color;
+};
+
+enum pds_core_vf_ctrl_opcode {
+	PDS_CORE_VF_CTRL_START_ALL	= 0,
+	PDS_CORE_VF_CTRL_START		= 1,
+};
+
+/**
+ * struct pds_core_vf_ctrl_cmd - VF control command
+ * @opcode:         Opcode for the command
+ * @ctrl_opcode:    VF control operation type
+ * @vf_index:       VF index; ignored when @ctrl_opcode is PDS_CORE_VF_CTRL_START_ALL
+ */
+struct pds_core_vf_ctrl_cmd {
+	u8	opcode;
+	u8	ctrl_opcode;
+	__le16	vf_index;
+};
+
+/**
+ * struct pds_core_vf_ctrl_comp - VF_CTRL command completion.
+ * @status:     Status of the command (enum pds_core_status_code)
+ */
+struct pds_core_vf_ctrl_comp {
+	u8	status;
+};
+
+/*
+ * union pds_core_dev_cmd - Overlay of core device command structures
+ */
+union pds_core_dev_cmd {
+	u8     opcode;
+	u32    words[16];
+
+	struct pds_core_dev_identify_cmd identify;
+	struct pds_core_dev_init_cmd     init;
+	struct pds_core_dev_reset_cmd    reset;
+	struct pds_core_fw_download_cmd  fw_download;
+	struct pds_core_fw_control_cmd   fw_control;
+
+	struct pds_core_vf_setattr_cmd   vf_setattr;
+	struct pds_core_vf_getattr_cmd   vf_getattr;
+	struct pds_core_vf_ctrl_cmd      vf_ctrl;
+};
+
+/*
+ * union pds_core_dev_comp - Overlay of core device completion structures
+ */
+union pds_core_dev_comp {
+	u8                                status;
+	u8                                bytes[16];
+
+	struct pds_core_dev_identify_comp identify;
+	struct pds_core_dev_reset_comp    reset;
+	struct pds_core_dev_init_comp     init;
+	struct pds_core_fw_download_comp  fw_download;
+	struct pds_core_fw_control_comp   fw_control;
+
+	struct pds_core_vf_setattr_comp   vf_setattr;
+	struct pds_core_vf_getattr_comp   vf_getattr;
+	struct pds_core_vf_ctrl_comp      vf_ctrl;
+};
+
+/**
+ * struct pds_core_dev_hwstamp_regs - Hardware current timestamp registers
+ * @tick_low:        Low 32 bits of hardware timestamp
+ * @tick_high:       High 32 bits of hardware timestamp
+ */
+struct pds_core_dev_hwstamp_regs {
+	u32    tick_low;
+	u32    tick_high;
+};
+
+/**
+ * struct pds_core_dev_info_regs - Device info register format (read-only)
+ * @signature:       Signature value of 0x44455649 ('DEVI')
+ * @version:         Current version of info
+ * @asic_type:       Asic type
+ * @asic_rev:        Asic revision
+ * @fw_status:       Firmware status
+ *			bit 0   - 1 = fw running
+ *			bit 4-7 - 4 bit generation number, changes on fw restart
+ * @fw_heartbeat:    Firmware heartbeat counter
+ * @serial_num:      Serial number
+ * @fw_version:      Firmware version
+ * @oprom_regs:      oprom_regs to store oprom debug enable/disable and bmp
+ * @rsvd_pad1024:    Struct padding
+ * @hwstamp:         Hardware current timestamp registers
+ * @rsvd_pad2048:    Struct padding
+ */
+struct pds_core_dev_info_regs {
+#define PDS_CORE_DEVINFO_FWVERS_BUFLEN 32
+#define PDS_CORE_DEVINFO_SERIAL_BUFLEN 32
+	u32    signature;
+	u8     version;
+	u8     asic_type;
+	u8     asic_rev;
+#define PDS_CORE_FW_STS_F_RUNNING	0x01
+#define PDS_CORE_FW_STS_F_GENERATION	0xF0
+	u8     fw_status;
+	__le32 fw_heartbeat;
+	char   fw_version[PDS_CORE_DEVINFO_FWVERS_BUFLEN];
+	char   serial_num[PDS_CORE_DEVINFO_SERIAL_BUFLEN];
+	u8     oprom_regs[32];     /* reserved */
+	u8     rsvd_pad1024[916];
+	struct pds_core_dev_hwstamp_regs hwstamp;   /* on 1k boundary */
+	u8     rsvd_pad2048[1016];
+} __packed;
+
+/**
+ * struct pds_core_dev_cmd_regs - Device command register format (read-write)
+ * @doorbell:	Device Cmd Doorbell, write-only
+ *              Write a 1 to signal device to process cmd
+ * @done:	Command completed indicator, poll for completion
+ *              bit 0 == 1 when command is complete
+ * @cmd:	Opcode-specific command bytes
+ * @comp:	Opcode-specific response bytes
+ * @rsvd:	Struct padding
+ * @data:	Opcode-specific side-data
+ */
+struct pds_core_dev_cmd_regs {
+	u32                     doorbell;
+	u32                     done;
+	union pds_core_dev_cmd  cmd;
+	union pds_core_dev_comp comp;
+	u8                      rsvd[48];
+	u32                     data[478];
+} __packed;
+
+/**
+ * struct pds_core_dev_regs - Device register format for bar 0 page 0
+ * @info:            Device info registers
+ * @devcmd:          Device command registers
+ */
+struct pds_core_dev_regs {
+	struct pds_core_dev_info_regs info;
+	struct pds_core_dev_cmd_regs  devcmd;
+} __packed;
+
+/*
+ * struct pds_core_vf_stats - VF statistics structure
+ */
+struct pds_core_vf_stats {
+	/* RX */
+	__le64 rx_ucast_bytes;
+	__le64 rx_ucast_packets;
+	__le64 rx_mcast_bytes;
+	__le64 rx_mcast_packets;
+	__le64 rx_bcast_bytes;
+	__le64 rx_bcast_packets;
+	__le64 rsvd0;
+	__le64 rsvd1;
+	/* RX drops */
+	__le64 rx_ucast_drop_bytes;
+	__le64 rx_ucast_drop_packets;
+	__le64 rx_mcast_drop_bytes;
+	__le64 rx_mcast_drop_packets;
+	__le64 rx_bcast_drop_bytes;
+	__le64 rx_bcast_drop_packets;
+	__le64 rx_dma_error;
+	__le64 rsvd2;
+	/* TX */
+	__le64 tx_ucast_bytes;
+	__le64 tx_ucast_packets;
+	__le64 tx_mcast_bytes;
+	__le64 tx_mcast_packets;
+	__le64 tx_bcast_bytes;
+	__le64 tx_bcast_packets;
+	__le64 rsvd3;
+	__le64 rsvd4;
+	/* TX drops */
+	__le64 tx_ucast_drop_bytes;
+	__le64 tx_ucast_drop_packets;
+	__le64 tx_mcast_drop_bytes;
+	__le64 tx_mcast_drop_packets;
+	__le64 tx_bcast_drop_bytes;
+	__le64 tx_bcast_drop_packets;
+	__le64 tx_dma_error;
+	__le64 rsvd5;
+};
+
+#ifndef __CHECKER__
+static_assert(sizeof(struct pds_core_drv_identity) <= 1912);
+static_assert(sizeof(struct pds_core_dev_identity) <= 1912);
+static_assert(sizeof(union pds_core_dev_cmd) == 64);
+static_assert(sizeof(union pds_core_dev_comp) == 16);
+static_assert(sizeof(struct pds_core_dev_info_regs) == 2048);
+static_assert(sizeof(struct pds_core_dev_cmd_regs) == 2048);
+static_assert(sizeof(struct pds_core_dev_regs) == 4096);
+#endif /* __CHECKER__ */
+
+#endif /* _PDS_CORE_REGS_H_ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [RFC PATCH net-next 02/19] pds_core: add devcmd device interfaces
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 01/19] pds_core: initial framework for pds_core driver Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 03/19] pds_core: health timer and workqueue Shannon Nelson
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

The devcmd interface is the basic connection to the device through the
PCI BAR for low-level identification and command services.  This patch
performs the early device initialization, retrieves the identity data,
and adds the devcmd routines used by the rest of the driver.
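The devcmd handshake (copy the command into the shared register block, ring
the doorbell, poll the done bit, read back the completion) can be sketched in
plain user-space C against a mock register block.  The mock_cmd_regs struct
and mock_fw_process() below are invented stand-ins for the PCI BAR mapping
and the device firmware; only the sequencing mirrors the pdsc_devcmd_locked()
and pdsc_devcmd_wait() routines in this patch.

```c
/* Minimal user-space sketch of the devcmd sequence: command in, doorbell,
 * poll done, completion out.  mock_cmd_regs and mock_fw_process() are
 * invented for illustration; the real driver uses memcpy_toio()/ioread32()
 * on the mapped PCI BAR instead of plain memory accesses.
 */
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct mock_cmd_regs {
	uint32_t doorbell;
	uint32_t done;			/* bit 0: command complete */
	uint8_t  cmd[64];		/* opcode-specific command bytes */
	uint8_t  comp[16];		/* comp[0] is the status code */
};

/* Stand-in firmware: completes any rung command with status 0 (success) */
static void mock_fw_process(struct mock_cmd_regs *regs)
{
	if (regs->doorbell) {
		regs->comp[0] = 0;
		regs->done |= 1;
	}
}

static int devcmd_run(struct mock_cmd_regs *regs,
		      const uint8_t cmd[64], uint8_t comp[16])
{
	int tries;

	memcpy(regs->cmd, cmd, 64);	/* memcpy_toio() in the driver */
	regs->done = 0;
	regs->doorbell = 1;

	for (tries = 0; tries < 1000; tries++) {
		mock_fw_process(regs);	/* a real device runs asynchronously */
		if (regs->done & 1)
			break;
	}
	regs->doorbell = 0;
	if (!(regs->done & 1))
		return -1;		/* timeout */

	memcpy(comp, regs->comp, 16);	/* memcpy_fromio() in the driver */
	return comp[0] ? -1 : 0;	/* status-to-errno mapping elided */
}
```

The driver additionally serializes callers with devcmd_lock and maps the
device status byte to a kernel errno; both are omitted from this sketch.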

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 .../net/ethernet/pensando/pds_core/Makefile   |   4 +-
 drivers/net/ethernet/pensando/pds_core/core.c |  43 ++
 drivers/net/ethernet/pensando/pds_core/core.h |  58 +++
 .../net/ethernet/pensando/pds_core/debugfs.c  |  67 +++
 drivers/net/ethernet/pensando/pds_core/dev.c  | 400 ++++++++++++++++++
 drivers/net/ethernet/pensando/pds_core/main.c |  30 ++
 include/linux/pds/pds_common.h                |  67 +++
 include/linux/pds/pds_intr.h                  | 160 +++++++
 8 files changed, 828 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/pensando/pds_core/core.c
 create mode 100644 drivers/net/ethernet/pensando/pds_core/dev.c
 create mode 100644 include/linux/pds/pds_intr.h

diff --git a/drivers/net/ethernet/pensando/pds_core/Makefile b/drivers/net/ethernet/pensando/pds_core/Makefile
index 72bbc5fa68ad..446054206b6a 100644
--- a/drivers/net/ethernet/pensando/pds_core/Makefile
+++ b/drivers/net/ethernet/pensando/pds_core/Makefile
@@ -4,6 +4,8 @@
 obj-$(CONFIG_PDS_CORE) := pds_core.o
 
 pds_core-y := main.o \
-	      devlink.o
+	      devlink.o \
+	      dev.o \
+	      core.o
 
 pds_core-$(CONFIG_DEBUG_FS) += debugfs.o
diff --git a/drivers/net/ethernet/pensando/pds_core/core.c b/drivers/net/ethernet/pensando/pds_core/core.c
new file mode 100644
index 000000000000..d846e8b93575
--- /dev/null
+++ b/drivers/net/ethernet/pensando/pds_core/core.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/pci.h>
+#include <linux/delay.h>
+#include <net/devlink.h>
+
+#include "core.h"
+
+int pdsc_setup(struct pdsc *pdsc, bool init)
+{
+	int err = 0;
+
+	if (init)
+		err = pdsc_dev_init(pdsc);
+	else
+		err = pdsc_dev_reinit(pdsc);
+	if (err)
+		return err;
+
+	clear_bit(PDSC_S_FW_DEAD, &pdsc->state);
+	return 0;
+}
+
+void pdsc_teardown(struct pdsc *pdsc, bool removing)
+{
+	pdsc_devcmd_reset(pdsc);
+
+	if (removing && pdsc->intr_info) {
+		devm_kfree(pdsc->dev, pdsc->intr_info);
+		pdsc->intr_info = NULL;
+	}
+
+	if (pdsc->kern_dbpage) {
+		iounmap(pdsc->kern_dbpage);
+		pdsc->kern_dbpage = NULL;
+	}
+
+	set_bit(PDSC_S_FW_DEAD, &pdsc->state);
+}
diff --git a/drivers/net/ethernet/pensando/pds_core/core.h b/drivers/net/ethernet/pensando/pds_core/core.h
index 022adc4aea01..bd86a9cd8e03 100644
--- a/drivers/net/ethernet/pensando/pds_core/core.h
+++ b/drivers/net/ethernet/pensando/pds_core/core.h
@@ -9,8 +9,13 @@
 
 #include <linux/pds/pds_common.h>
 #include <linux/pds/pds_core_if.h>
+#include <linux/pds/pds_intr.h>
 
 #define PDSC_DRV_DESCRIPTION	"Pensando Core PF Driver"
+#define PDSC_TEARDOWN_RECOVERY	false
+#define PDSC_TEARDOWN_REMOVING	true
+#define PDSC_SETUP_RECOVERY	false
+#define PDSC_SETUP_INIT		true
 
 struct pdsc_dev_bar {
 	void __iomem *vaddr;
@@ -19,6 +24,22 @@ struct pdsc_dev_bar {
 	int res_index;
 };
 
+struct pdsc_devinfo {
+	u8 asic_type;
+	u8 asic_rev;
+	char fw_version[PDS_CORE_DEVINFO_FWVERS_BUFLEN + 1];
+	char serial_num[PDS_CORE_DEVINFO_SERIAL_BUFLEN + 1];
+};
+
+#define PDSC_INTR_NAME_MAX_SZ		32
+
+struct pdsc_intr_info {
+	char name[PDSC_INTR_NAME_MAX_SZ];
+	unsigned int index;
+	unsigned int vector;
+	void *data;
+};
+
 /* No state flags set means we are in a steady running state */
 enum pdsc_state_flags {
 	PDSC_S_FW_DEAD,		    /* fw stopped, waiting for startup or recovery */
@@ -34,11 +55,24 @@ struct pdsc {
 	struct dentry *dentry;
 	struct device *dev;
 	struct pdsc_dev_bar bars[PDS_CORE_BARS_MAX];
+	int num_vfs;
 	int hw_index;
 	int id;
 
 	unsigned long state;
+	u8 fw_status;
+	u8 fw_generation;
+	unsigned long last_fw_time;
+	u32 last_hb;
 
+	struct pdsc_devinfo dev_info;
+	struct pds_core_dev_identity dev_ident;
+	unsigned int nintrs;
+	struct pdsc_intr_info *intr_info;	/* array of nintrs elements */
+
+	unsigned int devcmd_timeout;
+	struct mutex devcmd_lock;	/* lock for dev_cmd operations */
+	struct mutex config_lock;	/* lock for configuration operations */
 	struct pds_core_dev_info_regs __iomem *info_regs;
 	struct pds_core_dev_cmd_regs __iomem *cmd_regs;
 	struct pds_core_intr __iomem *intr_ctrl;
@@ -48,6 +82,8 @@ struct pdsc {
 	u64 __iomem *kern_dbpage;
 };
 
+void __iomem *pdsc_map_dbpage(struct pdsc *pdsc, int page_num);
+
 struct pdsc *pdsc_dl_alloc(struct device *dev);
 void pdsc_dl_free(struct pdsc *pdsc);
 int pdsc_dl_register(struct pdsc *pdsc);
@@ -58,11 +94,33 @@ void pdsc_debugfs_create(void);
 void pdsc_debugfs_destroy(void);
 void pdsc_debugfs_add_dev(struct pdsc *pdsc);
 void pdsc_debugfs_del_dev(struct pdsc *pdsc);
+void pdsc_debugfs_add_ident(struct pdsc *pdsc);
+void pdsc_debugfs_add_irqs(struct pdsc *pdsc);
 #else
 static inline void pdsc_debugfs_create(void) { }
 static inline void pdsc_debugfs_destroy(void) { }
 static inline void pdsc_debugfs_add_dev(struct pdsc *pdsc) { }
 static inline void pdsc_debugfs_del_dev(struct pdsc *pdsc) { }
+static inline void pdsc_debugfs_add_ident(struct pdsc *pdsc) { }
+static inline void pdsc_debugfs_add_irqs(struct pdsc *pdsc) { }
 #endif
 
+int pdsc_err_to_errno(enum pds_core_status_code code);
+bool pdsc_is_fw_running(struct pdsc *pdsc);
+bool pdsc_is_fw_good(struct pdsc *pdsc);
+int pdsc_devcmd(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
+		union pds_core_dev_comp *comp, int max_seconds);
+int pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
+		       union pds_core_dev_comp *comp, int max_seconds);
+int pdsc_dev_cmd_vf_getattr(struct pdsc *pdsc, int vf, u8 attr,
+			    struct pds_core_vf_getattr_comp *comp);
+int pdsc_devcmd_init(struct pdsc *pdsc);
+int pdsc_devcmd_reset(struct pdsc *pdsc);
+int pds_devcmd_vf_start(struct pdsc *pdsc);
+int pdsc_dev_reinit(struct pdsc *pdsc);
+int pdsc_dev_init(struct pdsc *pdsc);
+
+int pdsc_setup(struct pdsc *pdsc, bool init);
+void pdsc_teardown(struct pdsc *pdsc, bool removing);
+
 #endif /* _PDSC_H_ */
diff --git a/drivers/net/ethernet/pensando/pds_core/debugfs.c b/drivers/net/ethernet/pensando/pds_core/debugfs.c
index 3f876dcc5431..698fd6d09387 100644
--- a/drivers/net/ethernet/pensando/pds_core/debugfs.c
+++ b/drivers/net/ethernet/pensando/pds_core/debugfs.c
@@ -44,4 +44,71 @@ void pdsc_debugfs_del_dev(struct pdsc *pdsc)
 	debugfs_remove_recursive(pdsc->dentry);
 	pdsc->dentry = NULL;
 }
+
+static int identity_show(struct seq_file *seq, void *v)
+{
+	struct pdsc *pdsc = seq->private;
+	struct pds_core_dev_identity *ident;
+	int vt;
+
+	ident = &pdsc->dev_ident;
+
+	seq_printf(seq, "asic_type:        0x%x\n", pdsc->dev_info.asic_type);
+	seq_printf(seq, "asic_rev:         0x%x\n", pdsc->dev_info.asic_rev);
+	seq_printf(seq, "serial_num:       %s\n", pdsc->dev_info.serial_num);
+	seq_printf(seq, "fw_version:       %s\n", pdsc->dev_info.fw_version);
+	seq_printf(seq, "fw_status:        0x%x\n",
+		   ioread8(&pdsc->info_regs->fw_status));
+	seq_printf(seq, "fw_heartbeat:     0x%x\n",
+		   ioread32(&pdsc->info_regs->fw_heartbeat));
+
+	seq_printf(seq, "nlifs:            %d\n", le32_to_cpu(ident->nlifs));
+	seq_printf(seq, "nintrs:           %d\n", le32_to_cpu(ident->nintrs));
+	seq_printf(seq, "ndbpgs_per_lif:   %d\n", le32_to_cpu(ident->ndbpgs_per_lif));
+	seq_printf(seq, "intr_coal_mult:   %d\n", le32_to_cpu(ident->intr_coal_mult));
+	seq_printf(seq, "intr_coal_div:    %d\n", le32_to_cpu(ident->intr_coal_div));
+
+	seq_puts(seq, "vif_types:        ");
+	for (vt = 0; vt < PDS_DEV_TYPE_MAX; vt++)
+		seq_printf(seq, "%d ", le16_to_cpu(pdsc->dev_ident.vif_types[vt]));
+	seq_puts(seq, "\n");
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(identity);
+
+void pdsc_debugfs_add_ident(struct pdsc *pdsc)
+{
+	debugfs_create_file("identity", 0400, pdsc->dentry, pdsc, &identity_fops);
+}
+
+static int irqs_show(struct seq_file *seq, void *v)
+{
+	struct pdsc *pdsc = seq->private;
+	struct pdsc_intr_info *intr_info;
+	int i;
+
+	seq_printf(seq, "index  vector  name (nintrs %d)\n", pdsc->nintrs);
+
+	if (!pdsc->intr_info)
+		return 0;
+
+	for (i = 0; i < pdsc->nintrs; i++) {
+		intr_info = &pdsc->intr_info[i];
+		if (!intr_info->vector)
+			continue;
+
+		seq_printf(seq, "% 3d    % 3d     %s\n",
+			   i, intr_info->vector, intr_info->name);
+	}
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(irqs);
+
+void pdsc_debugfs_add_irqs(struct pdsc *pdsc)
+{
+	debugfs_create_file("irqs", 0400, pdsc->dentry, pdsc, &irqs_fops);
+}
+
 #endif /* CONFIG_DEBUG_FS */
diff --git a/drivers/net/ethernet/pensando/pds_core/dev.c b/drivers/net/ethernet/pensando/pds_core/dev.c
new file mode 100644
index 000000000000..addbd300e5c3
--- /dev/null
+++ b/drivers/net/ethernet/pensando/pds_core/dev.c
@@ -0,0 +1,400 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/version.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/pci.h>
+#include <linux/delay.h>
+#include <linux/utsname.h>
+#include <linux/ctype.h>
+
+#include "core.h"
+
+#define PDS_CASE_STRINGIFY(opcode) case (opcode): return #opcode
+
+int pdsc_err_to_errno(enum pds_core_status_code code)
+{
+	switch (code) {
+	case PDS_RC_SUCCESS:
+		return 0;
+	case PDS_RC_EVERSION:
+	case PDS_RC_EQTYPE:
+	case PDS_RC_EQID:
+	case PDS_RC_EINVAL:
+	case PDS_RC_ENOSUPP:
+		return -EINVAL;
+	case PDS_RC_EPERM:
+		return -EPERM;
+	case PDS_RC_ENOENT:
+		return -ENOENT;
+	case PDS_RC_EAGAIN:
+		return -EAGAIN;
+	case PDS_RC_ENOMEM:
+		return -ENOMEM;
+	case PDS_RC_EFAULT:
+		return -EFAULT;
+	case PDS_RC_EBUSY:
+		return -EBUSY;
+	case PDS_RC_EEXIST:
+		return -EEXIST;
+	case PDS_RC_EVFID:
+		return -ENODEV;
+	case PDS_RC_ECLIENT:
+		return -ECHILD;
+	case PDS_RC_ENOSPC:
+		return -ENOSPC;
+	case PDS_RC_ERANGE:
+		return -ERANGE;
+	case PDS_RC_BAD_ADDR:
+		return -EFAULT;
+	case PDS_RC_EOPCODE:
+	case PDS_RC_EINTR:
+	case PDS_RC_DEV_CMD:
+	case PDS_RC_ERROR:
+	case PDS_RC_ERDMA:
+	case PDS_RC_EIO:
+	default:
+		return -EIO;
+	}
+}
+
+bool pdsc_is_fw_running(struct pdsc *pdsc)
+{
+	pdsc->fw_status = ioread8(&pdsc->info_regs->fw_status);
+	pdsc->last_fw_time = jiffies;
+	pdsc->last_hb = ioread32(&pdsc->info_regs->fw_heartbeat);
+
+	/* Firmware is useful only if the running bit is set and
+	 * fw_status != 0xff (bad PCI read)
+	 */
+	return (pdsc->fw_status != 0xff) &&
+		(pdsc->fw_status & PDS_CORE_FW_STS_F_RUNNING);
+}
+
+bool pdsc_is_fw_good(struct pdsc *pdsc)
+{
+	return pdsc_is_fw_running(pdsc) &&
+		(pdsc->fw_status & PDS_CORE_FW_STS_F_GENERATION) == pdsc->fw_generation;
+}
+
+static u8 pdsc_devcmd_status(struct pdsc *pdsc)
+{
+	return ioread8(&pdsc->cmd_regs->comp.status);
+}
+
+static bool pdsc_devcmd_done(struct pdsc *pdsc)
+{
+	return ioread32(&pdsc->cmd_regs->done) & PDS_CORE_DEV_CMD_DONE;
+}
+
+static void pdsc_devcmd_dbell(struct pdsc *pdsc)
+{
+	iowrite32(0, &pdsc->cmd_regs->done);
+	iowrite32(1, &pdsc->cmd_regs->doorbell);
+}
+
+static void pdsc_devcmd_clean(struct pdsc *pdsc)
+{
+	iowrite32(0, &pdsc->cmd_regs->doorbell);
+	memset_io(&pdsc->cmd_regs->cmd, 0, sizeof(pdsc->cmd_regs->cmd));
+}
+
+static const char *pdsc_devcmd_str(int opcode)
+{
+	switch (opcode) {
+	PDS_CASE_STRINGIFY(PDS_CORE_CMD_NOP);
+	PDS_CASE_STRINGIFY(PDS_CORE_CMD_IDENTIFY);
+	PDS_CASE_STRINGIFY(PDS_CORE_CMD_RESET);
+	PDS_CASE_STRINGIFY(PDS_CORE_CMD_INIT);
+	PDS_CASE_STRINGIFY(PDS_CORE_CMD_FW_DOWNLOAD);
+	PDS_CASE_STRINGIFY(PDS_CORE_CMD_FW_CONTROL);
+	PDS_CASE_STRINGIFY(PDS_CORE_CMD_VF_GETATTR);
+	PDS_CASE_STRINGIFY(PDS_CORE_CMD_VF_SETATTR);
+	default:
+		return "PDS_CORE_CMD_UNKNOWN";
+	}
+}
+
+static int pdsc_devcmd_wait(struct pdsc *pdsc, int max_seconds)
+{
+	struct device *dev = pdsc->dev;
+	unsigned long start_time;
+	unsigned long max_wait;
+	unsigned long duration;
+	int timeout = 0;
+	int status = 0;
+	int done = 0;
+	int err = 0;
+	int opcode;
+
+	opcode = ioread8(&pdsc->cmd_regs->cmd.opcode);
+
+	start_time = jiffies;
+	max_wait = start_time + (max_seconds * HZ);
+
+	while (!done && !timeout) {
+		done = pdsc_devcmd_done(pdsc);
+		if (done)
+			break;
+
+		timeout = time_after(jiffies, max_wait);
+		if (timeout)
+			break;
+
+		usleep_range(100, 200);
+	}
+	duration = jiffies - start_time;
+
+	if (done && duration > HZ)
+		dev_dbg(dev, "DEVCMD %d %s completed after %ld secs\n",
+			opcode, pdsc_devcmd_str(opcode), duration / HZ);
+
+	if (!done || timeout) {
+		dev_err(dev, "DEVCMD %d %s timeout, done %d timeout %d max_seconds=%d\n",
+			opcode, pdsc_devcmd_str(opcode), done, timeout,
+			max_seconds);
+		pdsc_devcmd_clean(pdsc);
+		return -ETIMEDOUT;
+	}
+
+	status = pdsc_devcmd_status(pdsc);
+	err = pdsc_err_to_errno(status);
+	if (status != PDS_RC_SUCCESS && status != PDS_RC_EAGAIN)
+		dev_err(dev, "DEVCMD %d %s failed, status=%d err %d %pe\n",
+			opcode, pdsc_devcmd_str(opcode), status, err,
+			ERR_PTR(err));
+
+	return err;
+}
+
+int pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
+		       union pds_core_dev_comp *comp, int max_seconds)
+{
+	int err;
+
+	memcpy_toio(&pdsc->cmd_regs->cmd, cmd, sizeof(*cmd));
+	pdsc_devcmd_dbell(pdsc);
+	err = pdsc_devcmd_wait(pdsc, max_seconds);
+	memcpy_fromio(comp, &pdsc->cmd_regs->comp, sizeof(*comp));
+
+	return err;
+}
+
+int pdsc_devcmd(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
+		union pds_core_dev_comp *comp, int max_seconds)
+{
+	int err;
+
+	mutex_lock(&pdsc->devcmd_lock);
+	err = pdsc_devcmd_locked(pdsc, cmd, comp, max_seconds);
+	mutex_unlock(&pdsc->devcmd_lock);
+
+	return err;
+}
+
+int pdsc_devcmd_init(struct pdsc *pdsc)
+{
+	union pds_core_dev_comp comp = { 0 };
+	union pds_core_dev_cmd cmd = {
+		.opcode = PDS_CORE_CMD_INIT,
+	};
+
+	return pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+}
+
+int pdsc_devcmd_reset(struct pdsc *pdsc)
+{
+	union pds_core_dev_comp comp = { 0 };
+	union pds_core_dev_cmd cmd = {
+		.reset.opcode = PDS_CORE_CMD_RESET,
+	};
+
+	return pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+}
+
+static int pdsc_devcmd_identify_locked(struct pdsc *pdsc)
+{
+	union pds_core_dev_comp comp = { 0 };
+	union pds_core_dev_cmd cmd = {
+		.identify.opcode = PDS_CORE_CMD_IDENTIFY,
+		.identify.ver = PDS_CORE_IDENTITY_VERSION_1,
+	};
+
+	return pdsc_devcmd_locked(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+}
+
+int pdsc_dev_cmd_vf_getattr(struct pdsc *pdsc, int vf, u8 attr,
+			    struct pds_core_vf_getattr_comp *comp)
+{
+	union pds_core_dev_cmd cmd = {
+		.vf_getattr.opcode = PDS_CORE_CMD_VF_GETATTR,
+		.vf_getattr.attr = attr,
+		.vf_getattr.vf_index = cpu_to_le16(vf),
+	};
+	int err;
+
+	if (vf >= pdsc->num_vfs)
+		return -ENODEV;
+
+	switch (attr) {
+	case PDS_CORE_VF_ATTR_SPOOFCHK:
+	case PDS_CORE_VF_ATTR_TRUST:
+	case PDS_CORE_VF_ATTR_LINKSTATE:
+	case PDS_CORE_VF_ATTR_MAC:
+	case PDS_CORE_VF_ATTR_VLAN:
+	case PDS_CORE_VF_ATTR_RATE:
+		break;
+	case PDS_CORE_VF_ATTR_STATSADDR:
+	default:
+		return -EINVAL;
+	}
+
+	err = pdsc_devcmd(pdsc, &cmd,
+			  (union pds_core_dev_comp *)comp,
+			  pdsc->devcmd_timeout);
+
+	return err;
+}
+
+int pds_devcmd_vf_start(struct pdsc *pdsc)
+{
+	union pds_core_dev_comp comp = { 0 };
+	union pds_core_dev_cmd cmd = {
+		.vf_ctrl.opcode = PDS_CORE_CMD_VF_CTRL,
+		.vf_ctrl.ctrl_opcode = PDS_CORE_VF_CTRL_START_ALL,
+	};
+	int err;
+
+	err = pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+
+	return err;
+}
+
+static void pdsc_init_devinfo(struct pdsc *pdsc)
+{
+	pdsc->dev_info.asic_type = ioread8(&pdsc->info_regs->asic_type);
+	pdsc->dev_info.asic_rev = ioread8(&pdsc->info_regs->asic_rev);
+
+	memcpy_fromio(pdsc->dev_info.fw_version,
+		      pdsc->info_regs->fw_version,
+		      PDS_CORE_DEVINFO_FWVERS_BUFLEN);
+
+	memcpy_fromio(pdsc->dev_info.serial_num,
+		      pdsc->info_regs->serial_num,
+		      PDS_CORE_DEVINFO_SERIAL_BUFLEN);
+
+	pdsc->dev_info.fw_version[PDS_CORE_DEVINFO_FWVERS_BUFLEN] = 0;
+	pdsc->dev_info.serial_num[PDS_CORE_DEVINFO_SERIAL_BUFLEN] = 0;
+
+	dev_dbg(pdsc->dev, "fw_version %s\n", pdsc->dev_info.fw_version);
+}
+
+static int pdsc_identify(struct pdsc *pdsc)
+{
+	struct pds_core_drv_identity drv = { 0 };
+	size_t sz;
+	int err;
+
+	drv.drv_type = cpu_to_le32(PDS_DRIVER_LINUX);
+	drv.kernel_ver = cpu_to_le32(LINUX_VERSION_CODE);
+	snprintf(drv.kernel_ver_str, sizeof(drv.kernel_ver_str),
+		 "%s %s", utsname()->release, utsname()->version);
+	snprintf(drv.driver_ver_str, sizeof(drv.driver_ver_str),
+		 "%s %s", PDS_CORE_DRV_NAME, utsname()->release);
+
+	/* Next let's get some info about the device
+	 * We use the devcmd_lock at this level in order to
+	 * get safe access to the cmd_regs->data before anyone
+	 * else can mess it up
+	 */
+	mutex_lock(&pdsc->devcmd_lock);
+
+	sz = min_t(size_t, sizeof(drv), sizeof(pdsc->cmd_regs->data));
+	memcpy_toio(&pdsc->cmd_regs->data, &drv, sz);
+
+	err = pdsc_devcmd_identify_locked(pdsc);
+	if (!err) {
+		sz = min_t(size_t, sizeof(pdsc->dev_ident), sizeof(pdsc->cmd_regs->data));
+		memcpy_fromio(&pdsc->dev_ident, &pdsc->cmd_regs->data, sz);
+	}
+	mutex_unlock(&pdsc->devcmd_lock);
+
+	if (err) {
+		dev_err(pdsc->dev, "Cannot identify device: %pe\n", ERR_PTR(err));
+		return err;
+	}
+
+	if (isprint(pdsc->dev_info.fw_version[0]) &&
+	    isascii(pdsc->dev_info.fw_version[0]))
+		dev_info(pdsc->dev, "FW: %.*s\n",
+			 (int)(sizeof(pdsc->dev_info.fw_version) - 1),
+			 pdsc->dev_info.fw_version);
+	else
+		dev_info(pdsc->dev, "FW: (invalid string) 0x%02x 0x%02x 0x%02x 0x%02x ...\n",
+			 (u8)pdsc->dev_info.fw_version[0],
+			 (u8)pdsc->dev_info.fw_version[1],
+			 (u8)pdsc->dev_info.fw_version[2],
+			 (u8)pdsc->dev_info.fw_version[3]);
+
+	return 0;
+}
+
+int pdsc_dev_reinit(struct pdsc *pdsc)
+{
+	pdsc_init_devinfo(pdsc);
+
+	return pdsc_identify(pdsc);
+}
+
+int pdsc_dev_init(struct pdsc *pdsc)
+{
+	unsigned int nintrs;
+	int err;
+
+	/* Initial init and reset of device */
+	pdsc_init_devinfo(pdsc);
+	pdsc->devcmd_timeout = PDS_CORE_DEVCMD_TIMEOUT;
+
+	err = pdsc_devcmd_reset(pdsc);
+	if (err)
+		return err;
+
+	err = pdsc_identify(pdsc);
+	if (err)
+		return err;
+
+	pdsc_debugfs_add_ident(pdsc);
+
+	/* Now we can reserve interrupts */
+	nintrs = le32_to_cpu(pdsc->dev_ident.nintrs);
+	nintrs = min_t(unsigned int, num_online_cpus(), nintrs);
+
+	/* Get intr_info struct array for tracking */
+	pdsc->intr_info = devm_kcalloc(pdsc->dev, nintrs,
+				       sizeof(*pdsc->intr_info), GFP_KERNEL);
+	if (!pdsc->intr_info) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+	err = pci_alloc_irq_vectors(pdsc->pdev, nintrs, nintrs, PCI_IRQ_MSIX);
+	if (err != nintrs) {
+		dev_err(pdsc->dev, "Can't get %d intrs from OS: %pe\n",
+			nintrs, ERR_PTR(err));
+		err = -ENOSPC;
+		goto err_out;
+	}
+	pdsc->nintrs = nintrs;
+	pdsc_debugfs_add_irqs(pdsc);
+
+	return 0;
+
+err_out:
+	if (pdsc->intr_info) {
+		devm_kfree(pdsc->dev, pdsc->intr_info);
+		pdsc->intr_info = NULL;
+	}
+	return err;
+}
diff --git a/drivers/net/ethernet/pensando/pds_core/main.c b/drivers/net/ethernet/pensando/pds_core/main.c
index 4bdbcc1c17a7..770b3f895bbb 100644
--- a/drivers/net/ethernet/pensando/pds_core/main.c
+++ b/drivers/net/ethernet/pensando/pds_core/main.c
@@ -189,6 +189,15 @@ static int pdsc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (err)
 		goto err_out_pci_disable_device;
 
+	/* PDS device setup */
+	mutex_init(&pdsc->devcmd_lock);
+	mutex_init(&pdsc->config_lock);
+
+	mutex_lock(&pdsc->config_lock);
+	err = pdsc_setup(pdsc, PDSC_SETUP_INIT);
+	if (err)
+		goto err_out_unmap_bars;
+
 	/* publish devlink device */
 	err = pdsc_dl_register(pdsc);
 	if (err) {
@@ -196,10 +205,21 @@ static int pdsc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_out;
 	}
 
+	mutex_unlock(&pdsc->config_lock);
+
+	pdsc->fw_generation = PDS_CORE_FW_STS_F_GENERATION &
+			      ioread8(&pdsc->info_regs->fw_status);
+
 	clear_bit(PDSC_S_INITING_DRIVER, &pdsc->state);
 	return 0;
 
 err_out:
+	pdsc_teardown(pdsc, PDSC_TEARDOWN_REMOVING);
+err_out_unmap_bars:
+	mutex_unlock(&pdsc->config_lock);
+	mutex_destroy(&pdsc->config_lock);
+	mutex_destroy(&pdsc->devcmd_lock);
+	pci_free_irq_vectors(pdev);
 	pci_clear_master(pdev);
 	pdsc_unmap_bars(pdsc);
 	pci_release_regions(pdev);
@@ -224,10 +244,20 @@ static void pdsc_remove(struct pci_dev *pdev)
 	 */
 	pdsc_dl_unregister(pdsc);
 
+	/* Now we can lock it up and tear it down */
+	mutex_lock(&pdsc->config_lock);
+	set_bit(PDSC_S_STOPPING_DRIVER, &pdsc->state);
+
 	/* Device teardown */
+	pdsc_teardown(pdsc, PDSC_TEARDOWN_REMOVING);
+	pdsc_debugfs_del_dev(pdsc);
+	mutex_unlock(&pdsc->config_lock);
+	mutex_destroy(&pdsc->config_lock);
+	mutex_destroy(&pdsc->devcmd_lock);
 	ida_free(&pdsc_pf_ida, pdsc->id);
 
 	/* PCI teardown */
+	pci_free_irq_vectors(pdev);
 	pci_clear_master(pdev);
 	pdsc_unmap_bars(pdsc);
 	pci_release_regions(pdev);
diff --git a/include/linux/pds/pds_common.h b/include/linux/pds/pds_common.h
index 7de3c1b8526b..e7fe84379a2f 100644
--- a/include/linux/pds/pds_common.h
+++ b/include/linux/pds/pds_common.h
@@ -10,4 +10,71 @@
 #define PDS_CORE_ADDR_LEN	52
 #define PDS_CORE_ADDR_MASK	(BIT_ULL(PDS_ADDR_LEN) - 1)
 
+/*
+ * enum pds_core_status_code - Device command return codes
+ */
+enum pds_core_status_code {
+	PDS_RC_SUCCESS	= 0,	/* Success */
+	PDS_RC_EVERSION	= 1,	/* Incorrect version for request */
+	PDS_RC_EOPCODE	= 2,	/* Invalid cmd opcode */
+	PDS_RC_EIO	= 3,	/* I/O error */
+	PDS_RC_EPERM	= 4,	/* Permission denied */
+	PDS_RC_EQID	= 5,	/* Bad qid */
+	PDS_RC_EQTYPE	= 6,	/* Bad qtype */
+	PDS_RC_ENOENT	= 7,	/* No such element */
+	PDS_RC_EINTR	= 8,	/* operation interrupted */
+	PDS_RC_EAGAIN	= 9,	/* Try again */
+	PDS_RC_ENOMEM	= 10,	/* Out of memory */
+	PDS_RC_EFAULT	= 11,	/* Bad address */
+	PDS_RC_EBUSY	= 12,	/* Device or resource busy */
+	PDS_RC_EEXIST	= 13,	/* object already exists */
+	PDS_RC_EINVAL	= 14,	/* Invalid argument */
+	PDS_RC_ENOSPC	= 15,	/* No space left or alloc failure */
+	PDS_RC_ERANGE	= 16,	/* Parameter out of range */
+	PDS_RC_BAD_ADDR	= 17,	/* Descriptor contains a bad ptr */
+	PDS_RC_DEV_CMD	= 18,	/* Device cmd attempted on AdminQ */
+	PDS_RC_ENOSUPP	= 19,	/* Operation not supported */
+	PDS_RC_ERROR	= 29,	/* Generic error */
+	PDS_RC_ERDMA	= 30,	/* Generic RDMA error */
+	PDS_RC_EVFID	= 31,	/* VF ID does not exist */
+	PDS_RC_BAD_FW	= 32,	/* FW file is invalid or corrupted */
+	PDS_RC_ECLIENT	= 33,   /* No such client id */
+};
+
+enum pds_core_driver_type {
+	PDS_DRIVER_LINUX   = 1,
+	PDS_DRIVER_WIN     = 2,
+	PDS_DRIVER_DPDK    = 3,
+	PDS_DRIVER_FREEBSD = 4,
+	PDS_DRIVER_IPXE    = 5,
+	PDS_DRIVER_ESXI    = 6,
+};
+
+/* The PDSC device interface uses identity version 1; client devices use version 2 */
+#define PDSC_IDENTITY_VERSION_1		1
+#define PDSC_IDENTITY_VERSION_2		2
+
+#define PDS_CORE_IFNAMSIZ		16
+
+/**
+ * enum pds_core_logical_qtype - Logical Queue Types
+ * @PDS_CORE_QTYPE_ADMINQ:    Administrative Queue
+ * @PDS_CORE_QTYPE_NOTIFYQ:   Notify Queue
+ * @PDS_CORE_QTYPE_RXQ:       Receive Queue
+ * @PDS_CORE_QTYPE_TXQ:       Transmit Queue
+ * @PDS_CORE_QTYPE_EQ:        Event Queue
+ * @PDS_CORE_QTYPE_MAX:       Max queue type supported
+ */
+enum pds_core_logical_qtype {
+	PDS_CORE_QTYPE_ADMINQ  = 0,
+	PDS_CORE_QTYPE_NOTIFYQ = 1,
+	PDS_CORE_QTYPE_RXQ     = 2,
+	PDS_CORE_QTYPE_TXQ     = 3,
+	PDS_CORE_QTYPE_EQ      = 4,
+
+	PDS_CORE_QTYPE_MAX     = 16   /* don't change - used in struct size */
+};
+
+typedef void (*pds_core_cb)(void *cb_arg);
+
 #endif /* _PDS_COMMON_H_ */
diff --git a/include/linux/pds/pds_intr.h b/include/linux/pds/pds_intr.h
new file mode 100644
index 000000000000..bcdafd492e65
--- /dev/null
+++ b/include/linux/pds/pds_intr.h
@@ -0,0 +1,160 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR Linux-OpenIB) OR BSD-2-Clause */
+/* Copyright (c) 2022 Pensando Systems, Inc.  All rights reserved. */
+
+#ifndef _PDS_INTR_H_
+#define _PDS_INTR_H_
+
+/*
+ * Interrupt control register
+ * @coal_init:        Coalescing timer initial value, in
+ *                    device units.  Use @identity->intr_coal_mult
+ *                    and @identity->intr_coal_div to convert from
+ *                    usecs to device units:
+ *
+ *                      coal_init = coal_usecs * coal_mult / coal_div
+ *
+ *                    When an interrupt is sent the interrupt
+ *                    coalescing timer current value
+ *                    (@coalescing_curr) is initialized with this
+ *                    value and begins counting down.  No more
+ *                    interrupts are sent until the coalescing
+ *                    timer reaches 0.  When @coal_init=0
+ *                    interrupt coalescing is effectively disabled
+ *                    and every interrupt assert results in an
+ *                    interrupt.  Reset value: 0
+ * @mask:             Interrupt mask.  When @mask=1 the interrupt
+ *                    resource will not send an interrupt.  When
+ *                    @mask=0 the interrupt resource will send an
+ *                    interrupt if an interrupt event is pending
+ *                    or on the next interrupt assertion event.
+ *                    Reset value: 1
+ * @credits:          Interrupt credits.  This register indicates
+ *                    how many interrupt events the hardware has
+ *                    sent.  When written by software this
+ *                    register atomically decrements @credits
+ *                    by the value written.  When @credits
+ *                    becomes 0 then the "pending interrupt" bit
+ *                    in the Interrupt Status register is cleared
+ *                    by the hardware and any pending but unsent
+ *                    interrupts are cleared.
+ *                    !!!IMPORTANT!!! This is a signed register.
+ * @flags:            Interrupt control flags
+ *                       @unmask -- When this bit is written with a 1
+ *                       the interrupt resource will set mask=0.
+ *                       @coal_timer_reset -- When this
+ *                       bit is written with a 1 the
+ *                       @coalescing_curr will be reloaded with
+ *                       @coal_init to reset the coalescing
+ *                       timer.
+ * @mask_on_assert:   Automatically mask on assertion.  When
+ *                    @mask_on_assert=1 the interrupt resource
+ *                    will set @mask=1 whenever an interrupt is
+ *                    sent.  When using interrupts in Legacy
+ *                    Interrupt mode the driver must select
+ *                    @mask_on_assert=0 for proper interrupt
+ *                    operation.
+ * @coalescing_curr:  Coalescing timer current value, in
+ *                    microseconds.  When this value reaches 0
+ *                    the interrupt resource is again eligible to
+ *                    send an interrupt.  If an interrupt event
+ *                    is already pending when @coalescing_curr
+ *                    reaches 0 the pending interrupt will be
+ *                    sent, otherwise an interrupt will be sent
+ *                    on the next interrupt assertion event.
+ */
+struct pds_core_intr {
+	u32 coal_init;
+	u32 mask;
+	u16 credits;
+	u16 flags;
+#define PDS_CORE_INTR_F_UNMASK		0x0001
+#define PDS_CORE_INTR_F_TIMER_RESET	0x0002
+	u32 mask_on_assert;
+	u32 coalescing_curr;
+	u32 rsvd6[3];
+};
+#ifndef __CHECKER__
+static_assert(sizeof(struct pds_core_intr) == 32);
+#endif /* __CHECKER__ */
+
+#define PDS_CORE_INTR_CTRL_REGS_MAX		2048
+#define PDS_CORE_INTR_CTRL_COAL_MAX		0x3F
+#define PDS_CORE_INTR_INDEX_NOT_ASSIGNED	-1
+
+struct pds_core_intr_status {
+	u32 status[2];
+};
+
+/**
+ * enum pds_core_intr_mask_vals - valid values for mask and mask_assert.
+ * @PDS_CORE_INTR_MASK_CLEAR:	unmask interrupt.
+ * @PDS_CORE_INTR_MASK_SET:	mask interrupt.
+ */
+enum pds_core_intr_mask_vals {
+	PDS_CORE_INTR_MASK_CLEAR	= 0,
+	PDS_CORE_INTR_MASK_SET		= 1,
+};
+
+/**
+ * enum pds_core_intr_credits_bits - Bitwise composition of credits values.
+ * @PDS_CORE_INTR_CRED_COUNT:	bit mask of credit count, no shift needed.
+ * @PDS_CORE_INTR_CRED_COUNT_SIGNED: bit mask of credit count, including sign bit.
+ * @PDS_CORE_INTR_CRED_UNMASK:	unmask the interrupt.
+ * @PDS_CORE_INTR_CRED_RESET_COALESCE: reset the coalesce timer.
+ * @PDS_CORE_INTR_CRED_REARM:	unmask the interrupt and reset the coalescing timer.
+ */
+enum pds_core_intr_credits_bits {
+	PDS_CORE_INTR_CRED_COUNT		= 0x7fffu,
+	PDS_CORE_INTR_CRED_COUNT_SIGNED		= 0xffffu,
+	PDS_CORE_INTR_CRED_UNMASK		= 0x10000u,
+	PDS_CORE_INTR_CRED_RESET_COALESCE	= 0x20000u,
+	PDS_CORE_INTR_CRED_REARM		= (PDS_CORE_INTR_CRED_UNMASK |
+					   PDS_CORE_INTR_CRED_RESET_COALESCE),
+};
+
+static inline void pds_core_intr_coal_init(struct pds_core_intr __iomem *intr_ctrl,
+					   u32 coal)
+{
+	iowrite32(coal, &intr_ctrl->coal_init);
+}
+
+static inline void pds_core_intr_mask(struct pds_core_intr __iomem *intr_ctrl,
+				      u32 mask)
+{
+	iowrite32(mask, &intr_ctrl->mask);
+}
+
+static inline void pds_core_intr_credits(struct pds_core_intr __iomem *intr_ctrl,
+					 u32 cred, u32 flags)
+{
+	if (WARN_ON_ONCE(cred > PDS_CORE_INTR_CRED_COUNT)) {
+		cred = ioread32(&intr_ctrl->credits);
+		cred &= PDS_CORE_INTR_CRED_COUNT_SIGNED;
+	}
+
+	iowrite32(cred | flags, &intr_ctrl->credits);
+}
+
+static inline void pds_core_intr_clean_flags(struct pds_core_intr __iomem *intr_ctrl,
+					     u32 flags)
+{
+	u32 cred;
+
+	cred = ioread32(&intr_ctrl->credits);
+	cred &= PDS_CORE_INTR_CRED_COUNT_SIGNED;
+	cred |= flags;
+	iowrite32(cred, &intr_ctrl->credits);
+}
+
+static inline void pds_core_intr_clean(struct pds_core_intr __iomem *intr_ctrl)
+{
+	pds_core_intr_clean_flags(intr_ctrl, PDS_CORE_INTR_CRED_RESET_COALESCE);
+}
+
+static inline void pds_core_intr_mask_assert(struct pds_core_intr __iomem *intr_ctrl,
+					     u32 mask)
+{
+	iowrite32(mask, &intr_ctrl->mask_on_assert);
+}
+
+#endif /* _PDS_INTR_H_ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [RFC PATCH net-next 03/19] pds_core: health timer and workqueue
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 01/19] pds_core: initial framework for pds_core driver Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 02/19] pds_core: add devcmd device interfaces Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 04/19] pds_core: set up device and adminq Shannon Nelson
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

Add the periodic health check and its workqueue, along with the
handlers that run when a FW reset is detected.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 drivers/net/ethernet/pensando/pds_core/core.c | 60 +++++++++++++++++++
 drivers/net/ethernet/pensando/pds_core/core.h |  9 +++
 drivers/net/ethernet/pensando/pds_core/dev.c  |  3 +
 drivers/net/ethernet/pensando/pds_core/main.c | 50 ++++++++++++++++
 4 files changed, 122 insertions(+)

diff --git a/drivers/net/ethernet/pensando/pds_core/core.c b/drivers/net/ethernet/pensando/pds_core/core.c
index d846e8b93575..49cab9e58da6 100644
--- a/drivers/net/ethernet/pensando/pds_core/core.c
+++ b/drivers/net/ethernet/pensando/pds_core/core.c
@@ -41,3 +41,63 @@ void pdsc_teardown(struct pdsc *pdsc, bool removing)
 
 	set_bit(PDSC_S_FW_DEAD, &pdsc->state);
 }
+
+static void pdsc_fw_down(struct pdsc *pdsc)
+{
+	mutex_lock(&pdsc->config_lock);
+
+	if (test_and_set_bit(PDSC_S_FW_DEAD, &pdsc->state)) {
+		dev_err(pdsc->dev, "%s: already happening\n", __func__);
+		mutex_unlock(&pdsc->config_lock);
+		return;
+	}
+
+	pdsc_teardown(pdsc, PDSC_TEARDOWN_RECOVERY);
+
+	mutex_unlock(&pdsc->config_lock);
+}
+
+static void pdsc_fw_up(struct pdsc *pdsc)
+{
+	int err;
+
+	mutex_lock(&pdsc->config_lock);
+
+	if (!test_bit(PDSC_S_FW_DEAD, &pdsc->state)) {
+		dev_err(pdsc->dev, "%s: fw not dead\n", __func__);
+		mutex_unlock(&pdsc->config_lock);
+		return;
+	}
+
+	err = pdsc_setup(pdsc, PDSC_SETUP_RECOVERY);
+	if (err)
+		goto err_out;
+
+	mutex_unlock(&pdsc->config_lock);
+
+	return;
+
+err_out:
+	pdsc_teardown(pdsc, PDSC_TEARDOWN_RECOVERY);
+	mutex_unlock(&pdsc->config_lock);
+}
+
+void pdsc_health_thread(struct work_struct *work)
+{
+	struct pdsc *pdsc = container_of(work, struct pdsc, health_work);
+	bool healthy;
+
+	healthy = pdsc_is_fw_good(pdsc);
+	dev_dbg(pdsc->dev, "%s: health %d fw_status %#02x fw_heartbeat %d\n",
+		__func__, healthy, pdsc->fw_status, pdsc->last_hb);
+
+	if (test_bit(PDSC_S_FW_DEAD, &pdsc->state)) {
+		if (healthy)
+			pdsc_fw_up(pdsc);
+	} else {
+		if (!healthy)
+			pdsc_fw_down(pdsc);
+	}
+
+	pdsc->fw_generation = pdsc->fw_status & PDS_CORE_FW_STS_F_GENERATION;
+}
diff --git a/drivers/net/ethernet/pensando/pds_core/core.h b/drivers/net/ethernet/pensando/pds_core/core.h
index bd86a9cd8e03..462f7df99b3f 100644
--- a/drivers/net/ethernet/pensando/pds_core/core.h
+++ b/drivers/net/ethernet/pensando/pds_core/core.h
@@ -12,6 +12,8 @@
 #include <linux/pds/pds_intr.h>
 
 #define PDSC_DRV_DESCRIPTION	"Pensando Core PF Driver"
+
+#define PDSC_WATCHDOG_SECS	5
 #define PDSC_TEARDOWN_RECOVERY  false
 #define PDSC_TEARDOWN_REMOVING  true
 #define PDSC_SETUP_RECOVERY	false
@@ -64,12 +66,17 @@ struct pdsc {
 	u8 fw_generation;
 	unsigned long last_fw_time;
 	u32 last_hb;
+	struct timer_list wdtimer;
+	unsigned int wdtimer_period;
+	struct work_struct health_work;
 
 	struct pdsc_devinfo dev_info;
 	struct pds_core_dev_identity dev_ident;
 	unsigned int nintrs;
 	struct pdsc_intr_info *intr_info;	/* array of nintrs elements */
 
+	struct workqueue_struct *wq;
+
 	unsigned int devcmd_timeout;
 	struct mutex devcmd_lock;	/* lock for dev_cmd operations */
 	struct mutex config_lock;	/* lock for configuration operations */
@@ -82,6 +89,7 @@ struct pdsc {
 	u64 __iomem *kern_dbpage;
 };
 
+void pdsc_queue_health_check(struct pdsc *pdsc);
 void __iomem *pdsc_map_dbpage(struct pdsc *pdsc, int page_num);
 
 struct pdsc *pdsc_dl_alloc(struct device *dev);
@@ -122,5 +130,6 @@ int pdsc_dev_init(struct pdsc *pdsc);
 
 int pdsc_setup(struct pdsc *pdsc, bool init);
 void pdsc_teardown(struct pdsc *pdsc, bool removing);
+void pdsc_health_thread(struct work_struct *work);
 
 #endif /* _PDSC_H_ */
diff --git a/drivers/net/ethernet/pensando/pds_core/dev.c b/drivers/net/ethernet/pensando/pds_core/dev.c
index addbd300e5c3..d6ef8a1bf46b 100644
--- a/drivers/net/ethernet/pensando/pds_core/dev.c
+++ b/drivers/net/ethernet/pensando/pds_core/dev.c
@@ -179,6 +179,9 @@ int pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
 	err = pdsc_devcmd_wait(pdsc, max_seconds);
 	memcpy_fromio(comp, &pdsc->cmd_regs->comp, sizeof(*comp));
 
+	if (err == -ENXIO || err == -ETIMEDOUT)
+		pdsc_queue_health_check(pdsc);
+
 	return err;
 }
 
diff --git a/drivers/net/ethernet/pensando/pds_core/main.c b/drivers/net/ethernet/pensando/pds_core/main.c
index 770b3f895bbb..23f209d3375c 100644
--- a/drivers/net/ethernet/pensando/pds_core/main.c
+++ b/drivers/net/ethernet/pensando/pds_core/main.c
@@ -25,6 +25,31 @@ static const struct pci_device_id pdsc_id_table[] = {
 };
 MODULE_DEVICE_TABLE(pci, pdsc_id_table);
 
+void pdsc_queue_health_check(struct pdsc *pdsc)
+{
+	unsigned long mask;
+
+	/* Don't do a check when in a transition state */
+	mask = BIT_ULL(PDSC_S_INITING_DRIVER) |
+	       BIT_ULL(PDSC_S_STOPPING_DRIVER);
+	if (pdsc->state & mask)
+		return;
+
+	/* Queue a new health check if one isn't already queued */
+	queue_work(pdsc->wq, &pdsc->health_work);
+}
+
+static void pdsc_wdtimer_cb(struct timer_list *t)
+{
+	struct pdsc *pdsc = from_timer(pdsc, t, wdtimer);
+
+	dev_dbg(pdsc->dev, "%s: jiffies %ld\n", __func__, jiffies);
+	mod_timer(&pdsc->wdtimer,
+		  round_jiffies(jiffies + pdsc->wdtimer_period));
+
+	pdsc_queue_health_check(pdsc);
+}
+
 static void pdsc_unmap_bars(struct pdsc *pdsc)
 {
 	struct pdsc_dev_bar *bars = pdsc->bars;
@@ -135,9 +160,12 @@ static int pdsc_map_bars(struct pdsc *pdsc)
 
 static DEFINE_IDA(pdsc_pf_ida);
 
+#define PDSC_WQ_NAME_LEN 24
+
 static int pdsc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	struct device *dev = &pdev->dev;
+	char wq_name[PDSC_WQ_NAME_LEN];
 	struct pdsc *pdsc;
 	int err;
 
@@ -189,6 +217,13 @@ static int pdsc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (err)
 		goto err_out_pci_disable_device;
 
+	/* General workqueue and timer, but don't start timer yet */
+	snprintf(wq_name, sizeof(wq_name), "%s.%d", PDS_CORE_DRV_NAME, pdsc->id);
+	pdsc->wq = create_singlethread_workqueue(wq_name);
+	INIT_WORK(&pdsc->health_work, pdsc_health_thread);
+	timer_setup(&pdsc->wdtimer, pdsc_wdtimer_cb, 0);
+	pdsc->wdtimer_period = PDSC_WATCHDOG_SECS * HZ;
+
 	/* PDS device setup */
 	mutex_init(&pdsc->devcmd_lock);
 	mutex_init(&pdsc->config_lock);
@@ -209,6 +244,8 @@ static int pdsc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	pdsc->fw_generation = PDS_CORE_FW_STS_F_GENERATION &
 			      ioread8(&pdsc->info_regs->fw_status);
+	/* Lastly, start the health check timer */
+	mod_timer(&pdsc->wdtimer, round_jiffies(jiffies + pdsc->wdtimer_period));
 
 	clear_bit(PDSC_S_INITING_DRIVER, &pdsc->state);
 	return 0;
@@ -216,6 +253,12 @@ static int pdsc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 err_out:
 	pdsc_teardown(pdsc, PDSC_TEARDOWN_REMOVING);
 err_out_unmap_bars:
+	del_timer_sync(&pdsc->wdtimer);
+	if (pdsc->wq) {
+		flush_workqueue(pdsc->wq);
+		destroy_workqueue(pdsc->wq);
+		pdsc->wq = NULL;
+	}
 	mutex_unlock(&pdsc->config_lock);
 	mutex_destroy(&pdsc->config_lock);
 	mutex_destroy(&pdsc->devcmd_lock);
@@ -248,6 +291,13 @@ static void pdsc_remove(struct pci_dev *pdev)
 	mutex_lock(&pdsc->config_lock);
 	set_bit(PDSC_S_STOPPING_DRIVER, &pdsc->state);
 
+	del_timer_sync(&pdsc->wdtimer);
+	if (pdsc->wq) {
+		flush_workqueue(pdsc->wq);
+		destroy_workqueue(pdsc->wq);
+		pdsc->wq = NULL;
+	}
+
 	/* Device teardown */
 	pdsc_teardown(pdsc, PDSC_TEARDOWN_REMOVING);
 	pdsc_debugfs_del_dev(pdsc);
-- 
2.17.1



* [RFC PATCH net-next 04/19] pds_core: set up device and adminq
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
                   ` (2 preceding siblings ...)
  2022-11-18 22:56 ` [RFC PATCH net-next 03/19] pds_core: health timer and workqueue Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 05/19] pds_core: Add adminq processing and commands Shannon Nelson
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

Set up the basic adminq and notifyq services.  These are used
mostly by the client drivers for feature configuration.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 drivers/net/ethernet/pensando/pds_core/core.c | 444 +++++++++++-
 drivers/net/ethernet/pensando/pds_core/core.h | 149 ++++
 .../net/ethernet/pensando/pds_core/debugfs.c  | 125 ++++
 .../net/ethernet/pensando/pds_core/devlink.c  |  55 +-
 drivers/net/ethernet/pensando/pds_core/main.c |  14 +
 include/linux/pds/pds_adminq.h                | 641 ++++++++++++++++++
 6 files changed, 1424 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pds/pds_adminq.h

diff --git a/drivers/net/ethernet/pensando/pds_core/core.c b/drivers/net/ethernet/pensando/pds_core/core.c
index 49cab9e58da6..507f718bc8ab 100644
--- a/drivers/net/ethernet/pensando/pds_core/core.c
+++ b/drivers/net/ethernet/pensando/pds_core/core.c
@@ -10,8 +10,368 @@
 
 #include "core.h"
 
+#include <linux/pds/pds_adminq.h>
+
+void pdsc_work_thread(struct work_struct *work)
+{
+	/* stub */
+}
+
+irqreturn_t pdsc_adminq_isr(int irq, void *data)
+{
+	/* stub */
+	return IRQ_HANDLED;
+}
+
+void pdsc_intr_free(struct pdsc *pdsc, int index)
+{
+	struct pdsc_intr_info *intr_info;
+
+	if (index >= pdsc->nintrs || index < 0) {
+		WARN(true, "bad intr index %d\n", index);
+		return;
+	}
+
+	intr_info = &pdsc->intr_info[index];
+	if (!intr_info->vector)
+		return;
+	dev_dbg(pdsc->dev, "%s: idx %d vec %d name %s\n",
+		__func__, index, intr_info->vector, intr_info->name);
+
+	pds_core_intr_mask(&pdsc->intr_ctrl[index], PDS_CORE_INTR_MASK_SET);
+	pds_core_intr_clean(&pdsc->intr_ctrl[index]);
+
+	devm_free_irq(pdsc->dev, intr_info->vector, intr_info->data);
+
+	memset(intr_info, 0, sizeof(*intr_info));
+}
+
+int pdsc_intr_alloc(struct pdsc *pdsc, char *name,
+		    irq_handler_t handler, void *data)
+{
+	struct pdsc_intr_info *intr_info;
+	unsigned int index;
+	int err;
+
+	/* Find the first available interrupt */
+	for (index = 0; index < pdsc->nintrs; index++)
+		if (!pdsc->intr_info[index].vector)
+			break;
+	if (index >= pdsc->nintrs) {
+		dev_warn(pdsc->dev, "%s: no intr, index=%d nintrs=%d\n",
+			 __func__, index, pdsc->nintrs);
+		return -ENOSPC;
+	}
+
+	pds_core_intr_clean_flags(&pdsc->intr_ctrl[index],
+				  PDS_CORE_INTR_CRED_RESET_COALESCE);
+
+	intr_info = &pdsc->intr_info[index];
+
+	intr_info->index = index;
+	intr_info->data = data;
+	strscpy(intr_info->name, name, sizeof(intr_info->name));
+
+	/* Get the OS vector number for the interrupt */
+	err = pci_irq_vector(pdsc->pdev, index);
+	if (err < 0) {
+		dev_err(pdsc->dev, "failed to get intr vector index %d: %pe\n",
+			index, ERR_PTR(err));
+		goto err_out_free_intr;
+	}
+	intr_info->vector = err;
+
+	/* Init the device's intr mask */
+	pds_core_intr_clean(&pdsc->intr_ctrl[index]);
+	pds_core_intr_mask_assert(&pdsc->intr_ctrl[index], 1);
+	pds_core_intr_mask(&pdsc->intr_ctrl[index], PDS_CORE_INTR_MASK_SET);
+
+	/* Register the isr with a name */
+	err = devm_request_irq(pdsc->dev, intr_info->vector,
+			       handler, 0, intr_info->name, data);
+	if (err) {
+		dev_err(pdsc->dev, "failed to get intr irq vector %d: %pe\n",
+			intr_info->vector, ERR_PTR(err));
+		goto err_out_free_intr;
+	}
+
+	return index;
+
+err_out_free_intr:
+	pdsc_intr_free(pdsc, index);
+	return err;
+}
+
+static void pdsc_qcq_intr_free(struct pdsc *pdsc, struct pdsc_qcq *qcq)
+{
+	if (!(qcq->flags & PDS_CORE_QCQ_F_INTR) ||
+	    qcq->intx == PDS_CORE_INTR_INDEX_NOT_ASSIGNED)
+		return;
+
+	pdsc_intr_free(pdsc, qcq->intx);
+	qcq->intx = PDS_CORE_INTR_INDEX_NOT_ASSIGNED;
+}
+
+static int pdsc_qcq_intr_alloc(struct pdsc *pdsc, struct pdsc_qcq *qcq)
+{
+	char name[PDSC_INTR_NAME_MAX_SZ];
+	int index;
+
+	if (!(qcq->flags & PDS_CORE_QCQ_F_INTR)) {
+		qcq->intx = PDS_CORE_INTR_INDEX_NOT_ASSIGNED;
+		return 0;
+	}
+
+	snprintf(name, sizeof(name),
+		 "%s-%d-%s", PDS_CORE_DRV_NAME, pdsc->pdev->bus->number, qcq->q.name);
+	index = pdsc_intr_alloc(pdsc, name, pdsc_adminq_isr, qcq);
+	if (index < 0)
+		return index;
+	qcq->intx = index;
+
+	return 0;
+}
+
+void pdsc_qcq_free(struct pdsc *pdsc, struct pdsc_qcq *qcq)
+{
+	struct device *dev = pdsc->dev;
+
+	if (!(qcq && qcq->pdsc))
+		return;
+
+	pdsc_debugfs_del_qcq(qcq);
+
+	pdsc_qcq_intr_free(pdsc, qcq);
+
+	if (qcq->q_base) {
+		dmam_free_coherent(dev, qcq->q_size,
+				   qcq->q_base, qcq->q_base_pa);
+		qcq->q_base = NULL;
+		qcq->q_base_pa = 0;
+	}
+
+	if (qcq->cq_base) {
+		dmam_free_coherent(dev, qcq->cq_size, qcq->cq_base, qcq->cq_base_pa);
+		qcq->cq_base = NULL;
+		qcq->cq_base_pa = 0;
+	}
+
+	if (qcq->cq.info) {
+		vfree(qcq->cq.info);
+		qcq->cq.info = NULL;
+	}
+	if (qcq->q.info) {
+		vfree(qcq->q.info);
+		qcq->q.info = NULL;
+	}
+
+	qcq->pdsc = NULL;
+	memset(&qcq->q, 0, sizeof(qcq->q));
+	memset(&qcq->cq, 0, sizeof(qcq->cq));
+}
+
+static void pdsc_q_map(struct pdsc_queue *q, void *base, dma_addr_t base_pa)
+{
+	struct pdsc_q_info *cur;
+	unsigned int i;
+
+	q->base = base;
+	q->base_pa = base_pa;
+
+	for (i = 0, cur = q->info; i < q->num_descs; i++, cur++)
+		cur->desc = base + (i * q->desc_size);
+}
+
+static void pdsc_cq_map(struct pdsc_cq *cq, void *base, dma_addr_t base_pa)
+{
+	struct pdsc_cq_info *cur;
+	unsigned int i;
+
+	cq->base = base;
+	cq->base_pa = base_pa;
+
+	for (i = 0, cur = cq->info; i < cq->num_descs; i++, cur++)
+		cur->comp = base + (i * cq->desc_size);
+}
+
+int pdsc_qcq_alloc(struct pdsc *pdsc, unsigned int type, unsigned int index,
+		   const char *name, unsigned int flags, unsigned int num_descs,
+		   unsigned int desc_size, unsigned int cq_desc_size,
+		   unsigned int pid, struct pdsc_qcq *qcq)
+{
+	struct device *dev = pdsc->dev;
+	dma_addr_t cq_base_pa = 0;
+	dma_addr_t q_base_pa = 0;
+	void *q_base, *cq_base;
+	int err;
+
+	qcq->q.info = vzalloc(num_descs * sizeof(*qcq->q.info));
+	if (!qcq->q.info) {
+		dev_err(dev, "Cannot allocate %s queue info\n", name);
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+	qcq->pdsc = pdsc;
+	qcq->flags = flags;
+	INIT_WORK(&qcq->work, pdsc_work_thread);
+
+	qcq->q.type = type;
+	qcq->q.index = index;
+	qcq->q.num_descs = num_descs;
+	qcq->q.desc_size = desc_size;
+	qcq->q.tail_idx = 0;
+	qcq->q.head_idx = 0;
+	qcq->q.pid = pid;
+	snprintf(qcq->q.name, sizeof(qcq->q.name), "%s%u", name, index);
+
+	err = pdsc_qcq_intr_alloc(pdsc, qcq);
+	if (err)
+		goto err_out_free_q_info;
+
+	qcq->cq.info = vzalloc(num_descs * sizeof(*qcq->cq.info));
+	if (!qcq->cq.info) {
+		dev_err(dev, "Cannot allocate %s completion queue info\n", name);
+		err = -ENOMEM;
+		goto err_out_free_irq;
+	}
+
+	qcq->cq.bound_intr = &pdsc->intr_info[qcq->intx];
+	qcq->cq.num_descs = num_descs;
+	qcq->cq.desc_size = cq_desc_size;
+	qcq->cq.tail_idx = 0;
+	qcq->cq.done_color = 1;
+
+	if (flags & PDS_CORE_QCQ_F_NOTIFYQ) {
+		/* q & cq need to be contiguous in case of notifyq */
+		qcq->q_size = PAGE_SIZE + ALIGN(num_descs * desc_size, PAGE_SIZE) +
+						ALIGN(num_descs * cq_desc_size, PAGE_SIZE);
+		qcq->q_base = dmam_alloc_coherent(dev, qcq->q_size + qcq->cq_size,
+						  &qcq->q_base_pa,
+						  GFP_KERNEL);
+		if (!qcq->q_base) {
+			dev_err(dev, "Cannot allocate %s qcq DMA memory\n", name);
+			err = -ENOMEM;
+			goto err_out_free_cq_info;
+		}
+		q_base = PTR_ALIGN(qcq->q_base, PAGE_SIZE);
+		q_base_pa = ALIGN(qcq->q_base_pa, PAGE_SIZE);
+		pdsc_q_map(&qcq->q, q_base, q_base_pa);
+
+		cq_base = PTR_ALIGN(q_base +
+			ALIGN(num_descs * desc_size, PAGE_SIZE), PAGE_SIZE);
+		cq_base_pa = ALIGN(qcq->q_base_pa +
+			ALIGN(num_descs * desc_size, PAGE_SIZE), PAGE_SIZE);
+
+	} else {
+		/* q DMA descriptors */
+		qcq->q_size = PAGE_SIZE + (num_descs * desc_size);
+		qcq->q_base = dmam_alloc_coherent(dev, qcq->q_size,
+						  &qcq->q_base_pa,
+						  GFP_KERNEL);
+		if (!qcq->q_base) {
+			dev_err(dev, "Cannot allocate %s queue DMA memory\n", name);
+			err = -ENOMEM;
+			goto err_out_free_cq_info;
+		}
+		q_base = PTR_ALIGN(qcq->q_base, PAGE_SIZE);
+		q_base_pa = ALIGN(qcq->q_base_pa, PAGE_SIZE);
+		pdsc_q_map(&qcq->q, q_base, q_base_pa);
+
+		/* cq DMA descriptors */
+		qcq->cq_size = PAGE_SIZE + (num_descs * cq_desc_size);
+		qcq->cq_base = dmam_alloc_coherent(dev, qcq->cq_size,
+						   &qcq->cq_base_pa,
+						   GFP_KERNEL);
+		if (!qcq->cq_base) {
+			dev_err(dev, "Cannot allocate %s cq DMA memory\n", name);
+			err = -ENOMEM;
+			goto err_out_free_q;
+		}
+		cq_base = PTR_ALIGN(qcq->cq_base, PAGE_SIZE);
+		cq_base_pa = ALIGN(qcq->cq_base_pa, PAGE_SIZE);
+	}
+
+	pdsc_cq_map(&qcq->cq, cq_base, cq_base_pa);
+	qcq->cq.bound_q = &qcq->q;
+
+	pdsc_debugfs_add_qcq(pdsc, qcq);
+
+	return 0;
+
+err_out_free_q:
+	dmam_free_coherent(dev, qcq->q_size, qcq->q_base, qcq->q_base_pa);
+err_out_free_cq_info:
+	vfree(qcq->cq.info);
+err_out_free_irq:
+	pdsc_qcq_intr_free(pdsc, qcq);
+err_out_free_q_info:
+	vfree(qcq->q.info);
+	memset(qcq, 0, sizeof(*qcq));
+err_out:
+	dev_err(dev, "qcq alloc of %s%d failed %d\n", name, index, err);
+	return err;
+}
+
+static int pdsc_core_init(struct pdsc *pdsc)
+{
+	union pds_core_dev_comp comp = { 0 };
+	union pds_core_dev_cmd cmd = {
+		.init.opcode = PDS_CORE_CMD_INIT,
+	};
+	struct pds_core_dev_init_data_out cido;
+	struct pds_core_dev_init_data_in cidi;
+	u32 dbid_count;
+	u32 dbpage_num;
+	size_t sz;
+	int err;
+
+	cidi.adminq_q_base = cpu_to_le64(pdsc->adminqcq.q_base_pa);
+	cidi.adminq_cq_base = cpu_to_le64(pdsc->adminqcq.cq_base_pa);
+	cidi.notifyq_cq_base = cpu_to_le64(pdsc->notifyqcq.cq.base_pa);
+	cidi.flags = cpu_to_le32(PDS_CORE_QINIT_F_IRQ | PDS_CORE_QINIT_F_ENA);
+	cidi.intr_index = cpu_to_le16(pdsc->adminqcq.intx);
+	cidi.adminq_ring_size = ilog2(pdsc->adminqcq.q.num_descs);
+	cidi.notifyq_ring_size = ilog2(pdsc->notifyqcq.q.num_descs);
+
+	mutex_lock(&pdsc->devcmd_lock);
+
+	sz = min_t(size_t, sizeof(cidi), sizeof(pdsc->cmd_regs->data));
+	memcpy_toio(&pdsc->cmd_regs->data, &cidi, sz);
+
+	err = pdsc_devcmd_locked(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+
+	sz = min_t(size_t, sizeof(cido), sizeof(pdsc->cmd_regs->data));
+	memcpy_fromio(&cido, &pdsc->cmd_regs->data, sz);
+
+	mutex_unlock(&pdsc->devcmd_lock);
+
+	pdsc->hw_index = le32_to_cpu(cido.core_hw_index);
+
+	dbid_count = le32_to_cpu(pdsc->dev_ident.ndbpgs_per_lif);
+	dbpage_num = pdsc->hw_index * dbid_count;
+	pdsc->kern_dbpage = pdsc_map_dbpage(pdsc, dbpage_num);
+	if (!pdsc->kern_dbpage) {
+		dev_err(pdsc->dev, "Cannot map dbpage, aborting\n");
+		return -ENOMEM;
+	}
+
+	pdsc->adminqcq.q.hw_type = cido.adminq_hw_type;
+	pdsc->adminqcq.q.hw_index = le32_to_cpu(cido.adminq_hw_index);
+	pdsc->adminqcq.q.dbval = PDS_CORE_DBELL_QID(pdsc->adminqcq.q.hw_index);
+
+	pdsc->notifyqcq.q.hw_type = cido.notifyq_hw_type;
+	pdsc->notifyqcq.q.hw_index = le32_to_cpu(cido.notifyq_hw_index);
+	pdsc->notifyqcq.q.dbval = PDS_CORE_DBELL_QID(pdsc->notifyqcq.q.hw_index);
+
+	pdsc->last_eid = 0;
+
+	return err;
+}
+
 int pdsc_setup(struct pdsc *pdsc, bool init)
 {
+	int numdescs;
 	int err = 0;
 
 	if (init)
@@ -21,17 +381,60 @@ int pdsc_setup(struct pdsc *pdsc, bool init)
 	if (err)
 		return err;
 
+	/* Scale the descriptor ring length based on number of CPUs and VFs */
+	numdescs = max_t(int, PDSC_ADMINQ_MIN_LENGTH, num_online_cpus());
+	numdescs += 2 * pci_sriov_get_totalvfs(pdsc->pdev);
+	numdescs = roundup_pow_of_two(numdescs);
+	err = pdsc_qcq_alloc(pdsc, PDS_CORE_QTYPE_ADMINQ, 0, "adminq",
+			     PDS_CORE_QCQ_F_CORE | PDS_CORE_QCQ_F_INTR,
+			     numdescs,
+			     sizeof(union pds_core_adminq_cmd),
+			     sizeof(union pds_core_adminq_comp),
+			     0, &pdsc->adminqcq);
+	if (err)
+		goto err_out_teardown;
+
+	err = pdsc_qcq_alloc(pdsc, PDS_CORE_QTYPE_NOTIFYQ, 0, "notifyq",
+			     PDS_CORE_QCQ_F_NOTIFYQ,
+			     PDSC_NOTIFYQ_LENGTH,
+			     sizeof(struct pds_core_notifyq_cmd),
+			     sizeof(union pds_core_notifyq_comp),
+			     0, &pdsc->notifyqcq);
+	if (err)
+		goto err_out_teardown;
+
+	/* NotifyQ rides on the AdminQ interrupt */
+	pdsc->notifyqcq.intx = pdsc->adminqcq.intx;
+
+	/* Set up the Core with the AdminQ and NotifyQ info */
+	err = pdsc_core_init(pdsc);
+	if (err)
+		goto err_out_teardown;
+
 	clear_bit(PDSC_S_FW_DEAD, &pdsc->state);
 	return 0;
+
+err_out_teardown:
+	pdsc_teardown(pdsc, init);
+	return err;
 }
 
 void pdsc_teardown(struct pdsc *pdsc, bool removing)
 {
+	int i;
+
 	pdsc_devcmd_reset(pdsc);
+	pdsc_qcq_free(pdsc, &pdsc->notifyqcq);
+	pdsc_qcq_free(pdsc, &pdsc->adminqcq);
 
-	if (removing && pdsc->intr_info) {
-		devm_kfree(pdsc->dev, pdsc->intr_info);
-		pdsc->intr_info = NULL;
+	if (pdsc->intr_info) {
+		for (i = 0; i < pdsc->nintrs; i++)
+			pdsc_intr_free(pdsc, i);
+
+		if (removing) {
+			devm_kfree(pdsc->dev, pdsc->intr_info);
+			pdsc->intr_info = NULL;
+		}
 	}
 
 	if (pdsc->kern_dbpage) {
@@ -42,6 +445,36 @@ void pdsc_teardown(struct pdsc *pdsc, bool removing)
 	set_bit(PDSC_S_FW_DEAD, &pdsc->state);
 }
 
+int pdsc_start(struct pdsc *pdsc)
+{
+	pds_core_intr_mask(&pdsc->intr_ctrl[pdsc->adminqcq.intx],
+			   PDS_CORE_INTR_MASK_CLEAR);
+
+	return 0;
+}
+
+static void pdsc_mask_interrupts(struct pdsc *pdsc)
+{
+	int i;
+
+	if (!pdsc->intr_info)
+		return;
+
+	/* Mask interrupts that are in use */
+	for (i = 0; i < pdsc->nintrs; i++)
+		if (pdsc->intr_info[i].vector)
+			pds_core_intr_mask(&pdsc->intr_ctrl[i],
+					   PDS_CORE_INTR_MASK_SET);
+}
+
+void pdsc_stop(struct pdsc *pdsc)
+{
+	if (pdsc->wq)
+		flush_workqueue(pdsc->wq);
+
+	pdsc_mask_interrupts(pdsc);
+}
+
 static void pdsc_fw_down(struct pdsc *pdsc)
 {
 	mutex_lock(&pdsc->config_lock);
@@ -52,6 +485,7 @@ static void pdsc_fw_down(struct pdsc *pdsc)
 		return;
 	}
 
+	pdsc_mask_interrupts(pdsc);
 	pdsc_teardown(pdsc, PDSC_TEARDOWN_RECOVERY);
 
 	mutex_unlock(&pdsc->config_lock);
@@ -73,6 +507,10 @@ static void pdsc_fw_up(struct pdsc *pdsc)
 	if (err)
 		goto err_out;
 
+	err = pdsc_start(pdsc);
+	if (err)
+		goto err_out;
+
 	mutex_unlock(&pdsc->config_lock);
 
 	return;
diff --git a/drivers/net/ethernet/pensando/pds_core/core.h b/drivers/net/ethernet/pensando/pds_core/core.h
index 462f7df99b3f..6b816b7fb193 100644
--- a/drivers/net/ethernet/pensando/pds_core/core.h
+++ b/drivers/net/ethernet/pensando/pds_core/core.h
@@ -9,11 +9,15 @@
 
 #include <linux/pds/pds_common.h>
 #include <linux/pds/pds_core_if.h>
+#include <linux/pds/pds_adminq.h>
 #include <linux/pds/pds_intr.h>
 
 #define PDSC_DRV_DESCRIPTION	"Pensando Core PF Driver"
 
 #define PDSC_WATCHDOG_SECS	5
+#define PDSC_QUEUE_NAME_MAX_SZ  32
+#define PDSC_ADMINQ_MIN_LENGTH	16	/* must be a power of two */
+#define PDSC_NOTIFYQ_LENGTH	64	/* must be a power of two */
 #define PDSC_TEARDOWN_RECOVERY  false
 #define PDSC_TEARDOWN_REMOVING  true
 #define PDSC_SETUP_RECOVERY	false
@@ -33,6 +37,28 @@ struct pdsc_devinfo {
 	char serial_num[PDS_CORE_DEVINFO_SERIAL_BUFLEN + 1];
 };
 
+struct pdsc_queue {
+	struct pdsc_q_info *info;
+	u64 dbval;
+	u16 head_idx;
+	u16 tail_idx;
+	u8 hw_type;
+	unsigned int index;
+	unsigned int num_descs;
+	u64 dbell_count;
+	u64 features;
+	unsigned int type;
+	unsigned int hw_index;
+	union {
+		void *base;
+		struct pds_core_admin_cmd *adminq;
+	};
+	dma_addr_t base_pa;	/* must be page aligned */
+	unsigned int desc_size;
+	unsigned int pid;
+	char name[PDSC_QUEUE_NAME_MAX_SZ];
+};
+
 #define PDSC_INTR_NAME_MAX_SZ		32
 
 struct pdsc_intr_info {
@@ -42,6 +68,61 @@ struct pdsc_intr_info {
 	void *data;
 };
 
+struct pdsc_cq_info {
+	void *comp;
+};
+
+struct pdsc_buf_info {
+	struct page *page;
+	dma_addr_t dma_addr;
+	u32 page_offset;
+	u32 len;
+};
+
+struct pdsc_q_info {
+	union {
+		void *desc;
+		struct pdsc_admin_cmd *adminq_desc;
+	};
+	unsigned int bytes;
+	unsigned int nbufs;
+	struct pdsc_buf_info bufs[PDS_CORE_MAX_FRAGS];
+	struct pdsc_wait_context *wc;
+	void *dest;
+};
+
+struct pdsc_cq {
+	struct pdsc_cq_info *info;
+	struct pdsc_queue *bound_q;
+	struct pdsc_intr_info *bound_intr;
+	u16 tail_idx;
+	bool done_color;
+	unsigned int num_descs;
+	unsigned int desc_size;
+	void *base;
+	dma_addr_t base_pa;	/* must be page aligned */
+} ____cacheline_aligned_in_smp;
+
+struct pdsc_qcq {
+	struct pdsc *pdsc;
+	void *q_base;
+	dma_addr_t q_base_pa;	/* might not be page aligned */
+	void *cq_base;
+	dma_addr_t cq_base_pa;	/* might not be page aligned */
+	u32 q_size;
+	u32 cq_size;
+	bool armed;
+	unsigned int flags;
+
+	struct work_struct work;
+	struct pdsc_queue q;
+	struct pdsc_cq cq;
+	int intx;
+
+	u32 accum_work;
+	struct dentry *dentry;
+};
+
 /* No state flags set means we are in a steady running state */
 enum pdsc_state_flags {
 	PDSC_S_FW_DEAD,		    /* fw stopped, waiting for startup or recovery */
@@ -80,6 +161,7 @@ struct pdsc {
 	unsigned int devcmd_timeout;
 	struct mutex devcmd_lock;	/* lock for dev_cmd operations */
 	struct mutex config_lock;	/* lock for configuration operations */
+	spinlock_t adminq_lock;		/* lock for adminq operations */
 	struct pds_core_dev_info_regs __iomem *info_regs;
 	struct pds_core_dev_cmd_regs __iomem *cmd_regs;
 	struct pds_core_intr __iomem *intr_ctrl;
@@ -87,8 +169,57 @@ struct pdsc {
 	u64 __iomem *db_pages;
 	dma_addr_t phy_db_pages;
 	u64 __iomem *kern_dbpage;
+
+	struct pdsc_qcq adminqcq;
+	struct pdsc_qcq notifyqcq;
+	u64 last_eid;
+};
+
+/** enum pds_core_dbell_bits - bitwise composition of dbell values.
+ *
+ * @PDS_CORE_DBELL_QID_MASK:	unshifted mask of valid queue id bits.
+ * @PDS_CORE_DBELL_QID_SHIFT:	queue id shift amount in dbell value.
+ * @PDS_CORE_DBELL_QID:		macro to build QID component of dbell value.
+ *
+ * @PDS_CORE_DBELL_RING_MASK:	unshifted mask of valid ring bits.
+ * @PDS_CORE_DBELL_RING_SHIFT:	ring shift amount in dbell value.
+ * @PDS_CORE_DBELL_RING:		macro to build ring component of dbell value.
+ *
+ * @PDS_CORE_DBELL_RING_0:		ring zero dbell component value.
+ * @PDS_CORE_DBELL_RING_1:		ring one dbell component value.
+ * @PDS_CORE_DBELL_RING_2:		ring two dbell component value.
+ * @PDS_CORE_DBELL_RING_3:		ring three dbell component value.
+ *
+ * @PDS_CORE_DBELL_INDEX_MASK:	bit mask of valid index bits, no shift needed.
+ */
+enum pds_core_dbell_bits {
+	PDS_CORE_DBELL_QID_MASK		= 0xffffff,
+	PDS_CORE_DBELL_QID_SHIFT		= 24,
+
+#define PDS_CORE_DBELL_QID(n) \
+	(((u64)(n) & PDS_CORE_DBELL_QID_MASK) << PDS_CORE_DBELL_QID_SHIFT)
+
+	PDS_CORE_DBELL_RING_MASK		= 0x7,
+	PDS_CORE_DBELL_RING_SHIFT		= 16,
+
+#define PDS_CORE_DBELL_RING(n) \
+	(((u64)(n) & PDS_CORE_DBELL_RING_MASK) << PDS_CORE_DBELL_RING_SHIFT)
+
+	PDS_CORE_DBELL_RING_0		= 0,
+	PDS_CORE_DBELL_RING_1		= PDS_CORE_DBELL_RING(1),
+	PDS_CORE_DBELL_RING_2		= PDS_CORE_DBELL_RING(2),
+	PDS_CORE_DBELL_RING_3		= PDS_CORE_DBELL_RING(3),
+
+	PDS_CORE_DBELL_INDEX_MASK		= 0xffff,
 };
 
+static inline void pds_core_dbell_ring(u64 __iomem *db_page,
+				       enum pds_core_logical_qtype qtype,
+				       u64 val)
+{
+	writeq(val, &db_page[qtype]);
+}
+
 void pdsc_queue_health_check(struct pdsc *pdsc);
 void __iomem *pdsc_map_dbpage(struct pdsc *pdsc, int page_num);
 
@@ -104,6 +235,8 @@ void pdsc_debugfs_add_dev(struct pdsc *pdsc);
 void pdsc_debugfs_del_dev(struct pdsc *pdsc);
 void pdsc_debugfs_add_ident(struct pdsc *pdsc);
 void pdsc_debugfs_add_irqs(struct pdsc *pdsc);
+void pdsc_debugfs_add_qcq(struct pdsc *pdsc, struct pdsc_qcq *qcq);
+void pdsc_debugfs_del_qcq(struct pdsc_qcq *qcq);
 #else
 static inline void pdsc_debugfs_create(void) { }
 static inline void pdsc_debugfs_destroy(void) { }
@@ -111,6 +244,8 @@ static inline void pdsc_debugfs_add_dev(struct pdsc *pdsc) { }
 static inline void pdsc_debugfs_del_dev(struct pdsc *pdsc) { }
 static inline void pdsc_debugfs_add_ident(struct pdsc *pdsc) { }
 static inline void pdsc_debugfs_add_irqs(struct pdsc *pdsc) { }
+static inline void pdsc_debugfs_add_qcq(struct pdsc *pdsc, struct pdsc_qcq *qcq) { }
+static inline void pdsc_debugfs_del_qcq(struct pdsc_qcq *qcq) { }
 #endif
 
 int pdsc_err_to_errno(enum pds_core_status_code code);
@@ -128,8 +263,22 @@ int pds_devcmd_vf_start(struct pdsc *pdsc);
 int pdsc_dev_reinit(struct pdsc *pdsc);
 int pdsc_dev_init(struct pdsc *pdsc);
 
+int pdsc_intr_alloc(struct pdsc *pdsc, char *name,
+		    irq_handler_t handler, void *data);
+void pdsc_intr_free(struct pdsc *pdsc, int index);
+void pdsc_qcq_free(struct pdsc *pdsc, struct pdsc_qcq *qcq);
+int pdsc_qcq_alloc(struct pdsc *pdsc, unsigned int type, unsigned int index,
+		   const char *name, unsigned int flags, unsigned int num_descs,
+		   unsigned int desc_size, unsigned int cq_desc_size,
+		   unsigned int pid, struct pdsc_qcq *qcq);
 int pdsc_setup(struct pdsc *pdsc, bool init);
 void pdsc_teardown(struct pdsc *pdsc, bool removing);
+int pdsc_start(struct pdsc *pdsc);
+void pdsc_stop(struct pdsc *pdsc);
 void pdsc_health_thread(struct work_struct *work);
 
+void pdsc_process_adminq(struct pdsc_qcq *qcq);
+void pdsc_work_thread(struct work_struct *work);
+irqreturn_t pdsc_adminq_isr(int irq, void *data);
+
 #endif /* _PDSC_H_ */
diff --git a/drivers/net/ethernet/pensando/pds_core/debugfs.c b/drivers/net/ethernet/pensando/pds_core/debugfs.c
index 698fd6d09387..294bb97ca639 100644
--- a/drivers/net/ethernet/pensando/pds_core/debugfs.c
+++ b/drivers/net/ethernet/pensando/pds_core/debugfs.c
@@ -111,4 +111,129 @@ void pdsc_debugfs_add_irqs(struct pdsc *pdsc)
 	debugfs_create_file("irqs", 0400, pdsc->dentry, pdsc, &irqs_fops);
 }
 
+static int q_tail_show(struct seq_file *seq, void *v)
+{
+	struct pdsc_queue *q = seq->private;
+
+	seq_printf(seq, "%d\n", q->tail_idx);
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(q_tail);
+
+static int q_head_show(struct seq_file *seq, void *v)
+{
+	struct pdsc_queue *q = seq->private;
+
+	seq_printf(seq, "%d\n", q->head_idx);
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(q_head);
+
+static int cq_tail_show(struct seq_file *seq, void *v)
+{
+	struct pdsc_cq *cq = seq->private;
+
+	seq_printf(seq, "%d\n", cq->tail_idx);
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(cq_tail);
+
+static const struct debugfs_reg32 intr_ctrl_regs[] = {
+	{ .name = "coal_init", .offset = 0, },
+	{ .name = "mask", .offset = 4, },
+	{ .name = "credits", .offset = 8, },
+	{ .name = "mask_on_assert", .offset = 12, },
+	{ .name = "coal_timer", .offset = 16, },
+};
+
+void pdsc_debugfs_add_qcq(struct pdsc *pdsc, struct pdsc_qcq *qcq)
+{
+	struct dentry *qcq_dentry, *q_dentry, *cq_dentry;
+	struct dentry *intr_dentry;
+	struct debugfs_regset32 *intr_ctrl_regset;
+	struct pdsc_intr_info *intr = &pdsc->intr_info[qcq->intx];
+	struct debugfs_blob_wrapper *desc_blob;
+	struct device *dev = pdsc->dev;
+	struct pdsc_queue *q = &qcq->q;
+	struct pdsc_cq *cq = &qcq->cq;
+
+	qcq_dentry = debugfs_create_dir(q->name, pdsc->dentry);
+	if (IS_ERR_OR_NULL(qcq_dentry))
+		return;
+	qcq->dentry = qcq_dentry;
+
+	debugfs_create_x64("q_base_pa", 0400, qcq_dentry, &qcq->q_base_pa);
+	debugfs_create_x32("q_size", 0400, qcq_dentry, &qcq->q_size);
+	debugfs_create_x64("cq_base_pa", 0400, qcq_dentry, &qcq->cq_base_pa);
+	debugfs_create_x32("cq_size", 0400, qcq_dentry, &qcq->cq_size);
+	debugfs_create_x32("accum_work", 0400, qcq_dentry, &qcq->accum_work);
+
+	q_dentry = debugfs_create_dir("q", qcq->dentry);
+	if (IS_ERR_OR_NULL(q_dentry))
+		return;
+
+	debugfs_create_u32("index", 0400, q_dentry, &q->index);
+	debugfs_create_u32("num_descs", 0400, q_dentry, &q->num_descs);
+	debugfs_create_u32("desc_size", 0400, q_dentry, &q->desc_size);
+	debugfs_create_u32("pid", 0400, q_dentry, &q->pid);
+
+	debugfs_create_file("tail", 0400, q_dentry, q, &q_tail_fops);
+	debugfs_create_file("head", 0400, q_dentry, q, &q_head_fops);
+
+	desc_blob = devm_kzalloc(dev, sizeof(*desc_blob), GFP_KERNEL);
+	if (!desc_blob)
+		return;
+	desc_blob->data = q->base;
+	desc_blob->size = (unsigned long)q->num_descs * q->desc_size;
+	debugfs_create_blob("desc_blob", 0400, q_dentry, desc_blob);
+
+	cq_dentry = debugfs_create_dir("cq", qcq->dentry);
+	if (IS_ERR_OR_NULL(cq_dentry))
+		return;
+
+	debugfs_create_x64("base_pa", 0400, cq_dentry, &cq->base_pa);
+	debugfs_create_u32("num_descs", 0400, cq_dentry, &cq->num_descs);
+	debugfs_create_u32("desc_size", 0400, cq_dentry, &cq->desc_size);
+	debugfs_create_bool("done_color", 0400, cq_dentry, &cq->done_color);
+
+	debugfs_create_file("tail", 0400, cq_dentry, cq, &cq_tail_fops);
+
+	desc_blob = devm_kzalloc(dev, sizeof(*desc_blob), GFP_KERNEL);
+	if (!desc_blob)
+		return;
+	desc_blob->data = cq->base;
+	desc_blob->size = (unsigned long)cq->num_descs * cq->desc_size;
+	debugfs_create_blob("desc_blob", 0400, cq_dentry, desc_blob);
+
+	if (qcq->flags & PDS_CORE_QCQ_F_INTR) {
+		intr_dentry = debugfs_create_dir("intr", qcq->dentry);
+		if (IS_ERR_OR_NULL(intr_dentry))
+			return;
+
+		debugfs_create_u32("index", 0400, intr_dentry,
+				   &intr->index);
+		debugfs_create_u32("vector", 0400, intr_dentry,
+				   &intr->vector);
+
+		intr_ctrl_regset = devm_kzalloc(dev, sizeof(*intr_ctrl_regset),
+						GFP_KERNEL);
+		if (!intr_ctrl_regset)
+			return;
+		intr_ctrl_regset->regs = intr_ctrl_regs;
+		intr_ctrl_regset->nregs = ARRAY_SIZE(intr_ctrl_regs);
+		intr_ctrl_regset->base = &pdsc->intr_ctrl[intr->index];
+
+		debugfs_create_regset32("intr_ctrl", 0400, intr_dentry,
+					intr_ctrl_regset);
+	}
+}
+
+void pdsc_debugfs_del_qcq(struct pdsc_qcq *qcq)
+{
+	debugfs_remove_recursive(qcq->dentry);
+	qcq->dentry = NULL;
+}
 #endif /* CONFIG_DEBUG_FS */
diff --git a/drivers/net/ethernet/pensando/pds_core/devlink.c b/drivers/net/ethernet/pensando/pds_core/devlink.c
index 3538aa9cf9e3..42cf17229ace 100644
--- a/drivers/net/ethernet/pensando/pds_core/devlink.c
+++ b/drivers/net/ethernet/pensando/pds_core/devlink.c
@@ -11,9 +11,62 @@
 static int pdsc_dl_info_get(struct devlink *dl, struct devlink_info_req *req,
 			    struct netlink_ext_ack *extack)
 {
+	union pds_core_dev_cmd cmd = {
+		.fw_control.opcode = PDS_CORE_CMD_FW_CONTROL,
+		.fw_control.oper = PDS_CORE_FW_GET_LIST,
+	};
+	struct pds_core_fw_list_info *fw_list;
 	struct pdsc *pdsc = devlink_priv(dl);
+	union pds_core_dev_comp comp;
+	char *fwprefix = "fw.";
+	char buf[16];
+	int listlen;
+	size_t sz;
+	int err;
+	int i;
 
-	return devlink_info_driver_name_put(req, pdsc->pdev->driver->name);
+	err = devlink_info_driver_name_put(req, pdsc->pdev->driver->name);
+	if (err)
+		return err;
+
+	sz = min_t(size_t, sizeof(buf),
+		   sizeof(fw_list->fw_names[0].slotname) + strlen(fwprefix));
+	fw_list = (struct pds_core_fw_list_info *)pdsc->cmd_regs->data;
+
+	mutex_lock(&pdsc->devcmd_lock);
+	err = pdsc_devcmd_locked(pdsc, &cmd, &comp, pdsc->devcmd_timeout * 2);
+	listlen = fw_list->num_fw_slots;
+	for (i = 0; !err && i < listlen; i++) {
+		snprintf(buf, sz, "%s%s",
+			 fwprefix, fw_list->fw_names[i].slotname);
+		err = devlink_info_version_stored_put(req, buf,
+						      fw_list->fw_names[i].fw_version);
+	}
+	mutex_unlock(&pdsc->devcmd_lock);
+	if (err && err != -EIO)
+		return err;
+
+	err = devlink_info_version_running_put(req,
+					       DEVLINK_INFO_VERSION_GENERIC_FW,
+					       pdsc->dev_info.fw_version);
+	if (err)
+		return err;
+
+	snprintf(buf, sizeof(buf), "0x%x", pdsc->dev_info.asic_type);
+	err = devlink_info_version_fixed_put(req,
+					     DEVLINK_INFO_VERSION_GENERIC_ASIC_ID,
+					     buf);
+	if (err)
+		return err;
+
+	snprintf(buf, sizeof(buf), "0x%x", pdsc->dev_info.asic_rev);
+	err = devlink_info_version_fixed_put(req,
+					     DEVLINK_INFO_VERSION_GENERIC_ASIC_REV,
+					     buf);
+	if (err)
+		return err;
+
+	return devlink_info_serial_number_put(req, pdsc->dev_info.serial_num);
 }
 
 static const struct devlink_ops pdsc_dl_ops = {
diff --git a/drivers/net/ethernet/pensando/pds_core/main.c b/drivers/net/ethernet/pensando/pds_core/main.c
index 23f209d3375c..856704f8827a 100644
--- a/drivers/net/ethernet/pensando/pds_core/main.c
+++ b/drivers/net/ethernet/pensando/pds_core/main.c
@@ -158,6 +158,13 @@ static int pdsc_map_bars(struct pdsc *pdsc)
 	return err;
 }
 
+void __iomem *pdsc_map_dbpage(struct pdsc *pdsc, int page_num)
+{
+	return pci_iomap_range(pdsc->pdev,
+			       pdsc->bars[PDS_CORE_PCI_BAR_DBELL].res_index,
+			       (u64)page_num << PAGE_SHIFT, PAGE_SIZE);
+}
+
 static DEFINE_IDA(pdsc_pf_ida);
 
 #define PDSC_WQ_NAME_LEN 24
@@ -227,11 +234,15 @@ static int pdsc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* PDS device setup */
 	mutex_init(&pdsc->devcmd_lock);
 	mutex_init(&pdsc->config_lock);
+	spin_lock_init(&pdsc->adminq_lock);
 
 	mutex_lock(&pdsc->config_lock);
 	err = pdsc_setup(pdsc, PDSC_SETUP_INIT);
 	if (err)
 		goto err_out_unmap_bars;
+	err = pdsc_start(pdsc);
+	if (err)
+		goto err_out_teardown;
 
 	/* publish devlink device */
 	err = pdsc_dl_register(pdsc);
@@ -251,6 +262,8 @@ static int pdsc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	return 0;
 
 err_out:
+	pdsc_stop(pdsc);
+err_out_teardown:
 	pdsc_teardown(pdsc, PDSC_TEARDOWN_REMOVING);
 err_out_unmap_bars:
 	del_timer_sync(&pdsc->wdtimer);
@@ -299,6 +312,7 @@ static void pdsc_remove(struct pci_dev *pdev)
 	}
 
 	/* Device teardown */
+	pdsc_stop(pdsc);
 	pdsc_teardown(pdsc, PDSC_TEARDOWN_REMOVING);
 	pdsc_debugfs_del_dev(pdsc);
 	mutex_unlock(&pdsc->config_lock);
diff --git a/include/linux/pds/pds_adminq.h b/include/linux/pds/pds_adminq.h
new file mode 100644
index 000000000000..b06d28d0f906
--- /dev/null
+++ b/include/linux/pds/pds_adminq.h
@@ -0,0 +1,641 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#ifndef _PDS_CORE_ADMINQ_H_
+#define _PDS_CORE_ADMINQ_H_
+
+enum pds_core_adminq_flags {
+	PDS_AQ_FLAG_FASTPOLL	= BIT(1),	/* poll for completion at 1ms intervals */
+};
+
+/*
+ * enum pds_core_adminq_opcode - AdminQ command opcodes
+ * These commands are only processed on AdminQ, not available in devcmd
+ */
+enum pds_core_adminq_opcode {
+	PDS_AQ_CMD_NOP			= 0,
+
+	/* Client control */
+	PDS_AQ_CMD_CLIENT_REG		= 6,
+	PDS_AQ_CMD_CLIENT_UNREG		= 7,
+	PDS_AQ_CMD_CLIENT_CMD		= 8,
+
+	/* LIF commands */
+	PDS_AQ_CMD_LIF_IDENTIFY		= 20,
+	PDS_AQ_CMD_LIF_INIT		= 21,
+	PDS_AQ_CMD_LIF_RESET		= 22,
+	PDS_AQ_CMD_LIF_GETATTR		= 23,
+	PDS_AQ_CMD_LIF_SETATTR		= 24,
+	PDS_AQ_CMD_LIF_SETPHC		= 25,
+
+	PDS_AQ_CMD_RX_MODE_SET		= 30,
+	PDS_AQ_CMD_RX_FILTER_ADD	= 31,
+	PDS_AQ_CMD_RX_FILTER_DEL	= 32,
+
+	/* Queue commands */
+	PDS_AQ_CMD_Q_IDENTIFY		= 39,
+	PDS_AQ_CMD_Q_INIT		= 40,
+	PDS_AQ_CMD_Q_CONTROL		= 41,
+
+	/* RDMA commands */
+	PDS_AQ_CMD_RDMA_RESET_LIF	= 50,
+	PDS_AQ_CMD_RDMA_CREATE_EQ	= 51,
+	PDS_AQ_CMD_RDMA_CREATE_CQ	= 52,
+	PDS_AQ_CMD_RDMA_CREATE_ADMINQ	= 53,
+
+	/* SR/IOV commands */
+	PDS_AQ_CMD_VF_GETATTR		= 60,
+	PDS_AQ_CMD_VF_SETATTR		= 61,
+};
+
+/*
+ * enum pds_core_notifyq_opcode - NotifyQ event codes
+ */
+enum pds_core_notifyq_opcode {
+	PDS_EVENT_LINK_CHANGE		= 1,
+	PDS_EVENT_RESET			= 2,
+	PDS_EVENT_XCVR			= 5,
+	PDS_EVENT_CLIENT		= 6,
+};
+
+#define PDS_COMP_COLOR_MASK  0x80
+
+/**
+ * struct pds_core_notifyq_event - Generic event reporting structure
+ * @eid:   event number
+ * @ecode: event code
+ *
+ * This is the generic event report struct from which the other
+ * actual events will be formed.
+ */
+struct pds_core_notifyq_event {
+	__le64 eid;
+	__le16 ecode;
+};
+
+/**
+ * struct pds_core_link_change_event - Link change event notification
+ * @eid:		event number
+ * @ecode:		event code = PDS_EVENT_LINK_CHANGE
+ * @link_status:	link up/down, with error bits (enum pds_core_port_status)
+ * @link_speed:		speed of the network link
+ *
+ * Sent when the network link state changes between UP and DOWN
+ */
+struct pds_core_link_change_event {
+	__le64 eid;
+	__le16 ecode;
+	__le16 link_status;
+	__le32 link_speed;	/* units of 1Mbps: e.g. 10000 = 10Gbps */
+};
+
+/**
+ * struct pds_core_reset_event - Reset event notification
+ * @eid:		event number
+ * @ecode:		event code = PDS_EVENT_RESET
+ * @reset_code:		reset type
+ * @state:		0=pending, 1=complete, 2=error
+ *
+ * Sent when the NIC or some subsystem is going to be or
+ * has been reset.
+ */
+struct pds_core_reset_event {
+	__le64 eid;
+	__le16 ecode;
+	u8     reset_code;
+	u8     state;
+};
+
+/**
+ * struct pds_core_client_event - Client event notification
+ * @eid:		event number
+ * @ecode:		event code = PDS_EVENT_CLIENT
+ * @client_id:          client to send the event to
+ * @client_event:       wrapped event struct for the client
+ *
+ * Sent when an event needs to be passed on to a client
+ */
+struct pds_core_client_event {
+	__le64 eid;
+	__le16 ecode;
+	__le16 client_id;
+	u8     client_event[54];
+};
+
+/**
+ * struct pds_core_notifyq_cmd - Placeholder for building qcq
+ * @data:      anonymous field for building the qcq
+ */
+struct pds_core_notifyq_cmd {
+	__le32 data;	/* Not used but needed for qcq structure */
+};
+
+/*
+ * union pds_core_notifyq_comp - Overlay of notifyq event structures
+ */
+union pds_core_notifyq_comp {
+	struct {
+		__le64 eid;
+		__le16 ecode;
+	};
+	struct pds_core_notifyq_event     event;
+	struct pds_core_link_change_event link_change;
+	struct pds_core_reset_event       reset;
+	u8     data[64];
+};
+
+/**
+ * struct pds_core_client_reg_cmd - Register a new client with DSC
+ * @opcode:         opcode PDS_AQ_CMD_CLIENT_REG
+ * @rsvd:           word boundary padding
+ * @devname:        text name of client device
+ * @vif_type:       what type of device (enum pds_core_vif_types)
+ *
+ * Tell the DSC of the new client, and receive a client_id from DSC.
+ */
+struct pds_core_client_reg_cmd {
+	u8     opcode;
+	u8     rsvd[3];
+	char   devname[32];
+	u8     vif_type;
+};
+
+/**
+ * struct pds_core_client_reg_comp - Client registration completion
+ * @status:     Status of the command (enum pds_core_status_code)
+ * @rsvd:       Word boundary padding
+ * @comp_index: Index in the descriptor ring for which this is the completion
+ * @client_id:  New id assigned by DSC
+ * @rsvd1:      Word boundary padding
+ * @color:      Color bit
+ */
+
+struct pds_core_client_reg_comp {
+	u8     status;
+	u8     rsvd;
+	__le16 comp_index;
+	__le16 client_id;
+	u8     rsvd1[9];
+	u8     color;
+};
+
+/**
+ * struct pds_core_client_unreg_cmd - Unregister a client from DSC
+ * @opcode:     opcode PDS_AQ_CMD_CLIENT_UNREG
+ * @rsvd:       word boundary padding
+ * @client_id:  id of client being removed
+ *
+ * Tell the DSC this client is going away and remove its context
+ * This uses the generic completion.
+ */
+struct pds_core_client_unreg_cmd {
+	u8     opcode;
+	u8     rsvd;
+	__le16 client_id;
+};
+
+/**
+ * struct pds_core_client_request_cmd - Pass along a wrapped client AdminQ cmd
+ * @opcode:     opcode PDS_AQ_CMD_CLIENT_CMD
+ * @rsvd:       word boundary padding
+ * @client_id:  id of client whose command is being proxied
+ * @client_cmd: the wrapped client command
+ *
+ * Proxy post an adminq command for the client.
+ * This uses the generic completion.
+ */
+struct pds_core_client_request_cmd {
+	u8     opcode;
+	u8     rsvd;
+	__le16 client_id;
+	u8     client_cmd[60];
+};
+
+#define PDS_CORE_MAX_FRAGS		16
+
+#define PDS_CORE_QCQ_F_INITED		BIT(0)
+#define PDS_CORE_QCQ_F_SG		BIT(1)
+#define PDS_CORE_QCQ_F_INTR		BIT(2)
+#define PDS_CORE_QCQ_F_TX_STATS		BIT(3)
+#define PDS_CORE_QCQ_F_RX_STATS		BIT(4)
+#define PDS_CORE_QCQ_F_NOTIFYQ		BIT(5)
+#define PDS_CORE_QCQ_F_CMB_RINGS	BIT(6)
+#define PDS_CORE_QCQ_F_CORE		BIT(7)
+
+enum pds_core_lif_type {
+	PDS_CORE_LIF_TYPE_DEFAULT = 0,
+};
+
+/**
+ * union pds_core_lif_config - LIF configuration
+ * @state:	    LIF state (enum pds_core_lif_state)
+ * @rsvd:           Word boundary padding
+ * @name:	    LIF name
+ * @rsvd2:          Word boundary padding
+ * @features:	    LIF features active (enum pds_core_hw_features)
+ * @queue_count:    Queue counts per queue-type
+ * @words:          Full union buffer size
+ */
+union pds_core_lif_config {
+	struct {
+		u8     state;
+		u8     rsvd[3];
+		char   name[PDS_CORE_IFNAMSIZ];
+		u8     rsvd2[12];
+		__le64 features;
+		__le32 queue_count[PDS_CORE_QTYPE_MAX];
+	} __packed;
+	__le32 words[64];
+};
+
+/**
+ * struct pds_core_lif_status - LIF status register
+ * @eid:	     most recent NotifyQ event id
+ * @rsvd:            Word boundary padding
+ */
+struct pds_core_lif_status {
+	__le64 eid;
+	u8     rsvd[56];
+};
+
+/**
+ * struct pds_core_lif_info - LIF info structure
+ * @config:	LIF configuration structure
+ * @status:	LIF status structure
+ */
+struct pds_core_lif_info {
+	union pds_core_lif_config config;
+	struct pds_core_lif_status status;
+};
+
+/**
+ * struct pds_core_lif_identity - LIF identity information (type-specific)
+ * @features:		LIF features (see enum pds_core_hw_features)
+ * @version:		Identify structure version
+ * @hw_index:		LIF hardware index
+ * @rsvd:		Word boundary padding
+ * @max_nb_sessions:	Maximum number of sessions supported
+ * @rsvd2:		buffer padding
+ * @config:		LIF config struct with features, q counts
+ */
+struct pds_core_lif_identity {
+	__le64 features;
+	u8     version;
+	u8     hw_index;
+	u8     rsvd[2];
+	__le32 max_nb_sessions;
+	u8     rsvd2[120];
+	union pds_core_lif_config config;
+};
+
+/**
+ * struct pds_core_lif_identify_cmd - Get LIF identity info command
+ * @opcode:	Opcode PDS_AQ_CMD_LIF_IDENTIFY
+ * @type:	LIF type (enum pds_core_lif_type)
+ * @client_id:	Client identifier
+ * @ver:	Version of identify returned by device
+ * @rsvd:       Word boundary padding
+ * @ident_pa:	DMA address to receive identity info (struct pds_core_lif_identity)
+ *
+ * Firmware will copy LIF identity data (struct pds_core_lif_identity)
+ * into the buffer address given.
+ */
+struct pds_core_lif_identify_cmd {
+	u8     opcode;
+	u8     type;
+	__le16 client_id;
+	u8     ver;
+	u8     rsvd[3];
+	__le64 ident_pa;
+};
+
+/**
+ * struct pds_core_lif_identify_comp - LIF identify command completion
+ * @status:	Status of the command (enum pds_core_status_code)
+ * @ver:	Version of identify returned by device
+ * @bytes:	Bytes copied into the buffer
+ * @rsvd:       Word boundary padding
+ * @color:      Color bit
+ */
+struct pds_core_lif_identify_comp {
+	u8     status;
+	u8     ver;
+	__le16 bytes;
+	u8     rsvd[11];
+	u8     color;
+};
+
+/**
+ * struct pds_core_lif_init_cmd - LIF init command
+ * @opcode:	Opcode PDS_AQ_CMD_LIF_INIT
+ * @type:	LIF type (enum pds_core_lif_type)
+ * @client_id:	Client identifier
+ * @rsvd:       Word boundary padding
+ * @info_pa:	Destination address for LIF info (struct pds_core_lif_info)
+ */
+struct pds_core_lif_init_cmd {
+	u8     opcode;
+	u8     type;
+	__le16 client_id;
+	__le32 rsvd;
+	__le64 info_pa;
+};
+
+/**
+ * struct pds_core_lif_init_comp - LIF init command completion
+ * @status:	Status of the command (enum pds_core_status_code)
+ * @rsvd:       Word boundary padding
+ * @hw_index:	Hardware index of the initialized LIF
+ * @rsvd1:      Word boundary padding
+ * @color:      Color bit
+ */
+struct pds_core_lif_init_comp {
+	u8 status;
+	u8 rsvd;
+	__le16 hw_index;
+	u8     rsvd1[11];
+	u8     color;
+};
+
+/**
+ * struct pds_core_lif_reset_cmd - LIF reset command
+ * Will reset only the specified LIF.
+ * @opcode:	Opcode PDS_AQ_CMD_LIF_RESET
+ * @rsvd:       Word boundary padding
+ * @client_id:	Client identifier
+ */
+struct pds_core_lif_reset_cmd {
+	u8     opcode;
+	u8     rsvd;
+	__le16 client_id;
+};
+
+/**
+ * enum pds_core_lif_attr - List of LIF attributes
+ * @PDS_CORE_LIF_ATTR_STATE:		LIF state attribute
+ * @PDS_CORE_LIF_ATTR_NAME:		LIF name attribute
+ * @PDS_CORE_LIF_ATTR_FEATURES:		LIF features attribute
+ * @PDS_CORE_LIF_ATTR_STATS_CTRL:	LIF statistics control attribute
+ */
+enum pds_core_lif_attr {
+	PDS_CORE_LIF_ATTR_STATE		= 0,
+	PDS_CORE_LIF_ATTR_NAME		= 1,
+	PDS_CORE_LIF_ATTR_FEATURES	= 4,
+	PDS_CORE_LIF_ATTR_STATS_CTRL	= 6,
+};
+
+/**
+ * struct pds_core_lif_setattr_cmd - Set LIF attributes on the NIC
+ * @opcode:	Opcode PDS_AQ_CMD_LIF_SETATTR
+ * @attr:	Attribute type (enum pds_core_lif_attr)
+ * @client_id:	Client identifier
+ * @state:	LIF state (enum pds_core_lif_state)
+ * @name:	The name string, 0 terminated
+ * @features:	Features (enum pds_core_hw_features)
+ * @stats_ctl:	Stats control commands (enum pds_core_stats_ctl_cmd)
+ * @rsvd:       Command Buffer padding
+ */
+struct pds_core_lif_setattr_cmd {
+	u8     opcode;
+	u8     attr;
+	__le16 client_id;
+	union {
+		u8      state;
+		char    name[PDS_CORE_IFNAMSIZ];
+		__le64  features;
+		u8      stats_ctl;
+		u8      rsvd[60];
+	} __packed;
+};
+
+/**
+ * struct pds_core_lif_setattr_comp - LIF set attr command completion
+ * @status:	Status of the command (enum pds_core_status_code)
+ * @rsvd:       Word boundary padding
+ * @comp_index: Index in the descriptor ring for which this is the completion
+ * @features:	Features (enum pds_core_hw_features)
+ * @rsvd2:      Word boundary padding
+ * @color:	Color bit
+ */
+struct pds_core_lif_setattr_comp {
+	u8     status;
+	u8     rsvd;
+	__le16 comp_index;
+	union {
+		__le64  features;
+		u8      rsvd2[11];
+	} __packed;
+	u8     color;
+};
+
+/**
+ * struct pds_core_lif_getattr_cmd - Get LIF attributes from the NIC
+ * @opcode:	Opcode PDS_AQ_CMD_LIF_GETATTR
+ * @attr:	Attribute type (enum pds_core_lif_attr)
+ * @client_id:	Client identifier
+ */
+struct pds_core_lif_getattr_cmd {
+	u8     opcode;
+	u8     attr;
+	__le16 client_id;
+};
+
+/**
+ * struct pds_core_lif_getattr_comp - LIF get attr command completion
+ * @status:	Status of the command (enum pds_core_status_code)
+ * @rsvd:       Word boundary padding
+ * @comp_index: Index in the descriptor ring for which this is the completion
+ * @state:	LIF state (enum pds_core_lif_state)
+ * @name:	LIF name string, 0 terminated
+ * @features:	Features (enum pds_core_hw_features)
+ * @rsvd2:      Word boundary padding
+ * @color:	Color bit
+ */
+struct pds_core_lif_getattr_comp {
+	u8     status;
+	u8     rsvd;
+	__le16 comp_index;
+	union {
+		u8      state;
+		__le64  features;
+		u8      rsvd2[11];
+	} __packed;
+	u8     color;
+};
+
+/**
+ * struct pds_core_q_identity - Queue identity information
+ * @version:	Queue type version that can be used with FW
+ * @supported:	Bitfield of queue versions, first bit = ver 0
+ * @rsvd:       Word boundary padding
+ * @features:	Queue features
+ * @desc_sz:	Descriptor size
+ * @comp_sz:	Completion descriptor size
+ * @rsvd2:      Word boundary padding
+ */
+struct pds_core_q_identity {
+	u8      version;
+	u8      supported;
+	u8      rsvd[6];
+#define PDS_CORE_QIDENT_F_CQ	0x01	/* queue has completion ring */
+	__le64  features;
+	__le16  desc_sz;
+	__le16  comp_sz;
+	u8      rsvd2[6];
+};
+
+/**
+ * struct pds_core_q_identify_cmd - queue identify command
+ * @opcode:	Opcode PDS_AQ_CMD_Q_IDENTIFY
+ * @type:	Logical queue type (enum pds_core_logical_qtype)
+ * @client_id:	Client identifier
+ * @ver:	Highest queue type version that the driver supports
+ * @rsvd:       Word boundary padding
+ * @ident_pa:   DMA address to receive the data (struct pds_core_q_identity)
+ */
+struct pds_core_q_identify_cmd {
+	u8     opcode;
+	u8     type;
+	__le16 client_id;
+	u8     ver;
+	u8     rsvd[3];
+	__le64 ident_pa;
+};
+
+/**
+ * struct pds_core_q_identify_comp - queue identify command completion
+ * @status:	Status of the command (enum pds_core_status_code)
+ * @rsvd:       Word boundary padding
+ * @comp_index:	Index in the descriptor ring for which this is the completion
+ * @ver:	Queue type version that can be used with FW
+ * @rsvd1:      Word boundary padding
+ * @color:      Color bit
+ */
+struct pds_core_q_identify_comp {
+	u8     status;
+	u8     rsvd;
+	__le16 comp_index;
+	u8     ver;
+	u8     rsvd1[10];
+	u8     color;
+};
+
+/**
+ * struct pds_core_q_init_cmd - Queue init command
+ * @opcode:	  Opcode PDS_AQ_CMD_Q_INIT
+ * @type:	  Logical queue type
+ * @client_id:	  Client identifier
+ * @ver:	  Queue type version
+ * @rsvd:         Word boundary padding
+ * @index:	  (LIF, qtype) relative admin queue index
+ * @intr_index:	  Interrupt control register index, or Event queue index
+ * @pid:	  Process ID
+ * @flags:
+ *    IRQ:	  Interrupt requested on completion
+ *    ENA:	  Enable the queue.  If ENA=0 the queue is initialized
+ *		  but remains disabled, to be later enabled with the
+ *		  Queue Enable command. If ENA=1, then queue is
+ *		  initialized and then enabled.
+ * @cos:	  Class of service for this queue
+ * @ring_size:	  Queue ring size, encoded as a log2(size), in
+ *		  number of descriptors.  The actual ring size is
+ *		  (1 << ring_size).  For example, to select a ring size
+ *		  of 64 descriptors write ring_size = 6. The minimum
+ *		  ring_size value is 2 for a ring of 4 descriptors.
+ *		  The maximum ring_size value is 12 for a ring of 4k
+ *		  descriptors. Values of ring_size <2 and >12 are
+ *		  reserved.
+ * @ring_base:	  Queue ring base address
+ * @cq_ring_base: Completion queue ring base address
+ */
+struct pds_core_q_init_cmd {
+	u8     opcode;
+	u8     type;
+	__le16 client_id;
+	u8     ver;
+	u8     rsvd[3];
+	__le32 index;
+	__le16 pid;
+	__le16 intr_index;
+	__le16 flags;
+#define PDS_CORE_QINIT_F_IRQ	0x01	/* Request interrupt on completion */
+#define PDS_CORE_QINIT_F_ENA	0x02	/* Enable the queue */
+	u8     cos;
+#define PDS_CORE_QSIZE_MIN_LG2	2
+#define PDS_CORE_QSIZE_MAX_LG2	12
+	u8     ring_size;
+	__le64 ring_base;
+	__le64 cq_ring_base;
+} __packed;
+
+/**
+ * struct pds_core_q_init_comp - Queue init command completion
+ * @status:	Status of the command (enum pds_core_status_code)
+ * @rsvd:       Word boundary padding
+ * @comp_index:	Index in the descriptor ring for which this is the completion
+ * @hw_index:	Hardware Queue ID
+ * @hw_type:	Hardware Queue type
+ * @rsvd2:      Word boundary padding
+ * @color:	Color
+ */
+struct pds_core_q_init_comp {
+	u8     status;
+	u8     rsvd;
+	__le16 comp_index;
+	__le32 hw_index;
+	u8     hw_type;
+	u8     rsvd2[6];
+	u8     color;
+};
+
+union pds_core_adminq_cmd {
+	u8     opcode;
+	u8     bytes[64];
+
+	struct pds_core_client_reg_cmd     client_reg;
+	struct pds_core_client_unreg_cmd   client_unreg;
+	struct pds_core_client_request_cmd client_request;
+
+	struct pds_core_lif_identify_cmd  lif_ident;
+	struct pds_core_lif_init_cmd      lif_init;
+	struct pds_core_lif_reset_cmd     lif_reset;
+	struct pds_core_lif_setattr_cmd   lif_setattr;
+	struct pds_core_lif_getattr_cmd   lif_getattr;
+
+	struct pds_core_q_identify_cmd    q_ident;
+	struct pds_core_q_init_cmd        q_init;
+};
+
+union pds_core_adminq_comp {
+	struct {
+		u8     status;
+		u8     rsvd;
+		__le16 comp_index;
+		u8     rsvd2[11];
+		u8     color;
+	};
+	u32    words[4];
+
+	struct pds_core_client_reg_comp   client_reg;
+
+	struct pds_core_lif_identify_comp lif_ident;
+	struct pds_core_lif_init_comp     lif_init;
+	struct pds_core_lif_setattr_comp  lif_setattr;
+	struct pds_core_lif_getattr_comp  lif_getattr;
+
+	struct pds_core_q_identify_comp   q_ident;
+	struct pds_core_q_init_comp       q_init;
+};
+
+#ifndef __CHECKER__
+static_assert(sizeof(union pds_core_adminq_cmd) == 64);
+static_assert(sizeof(union pds_core_adminq_comp) == 16);
+static_assert(sizeof(union pds_core_notifyq_comp) == 64);
+#endif /* __CHECKER__ */
+
+static inline u8 pdsc_color_match(u8 color, u8 done_color)
+{
+	return (!!(color & PDS_COMP_COLOR_MASK)) == done_color;
+}
+
+#endif /* _PDS_CORE_ADMINQ_H_ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [RFC PATCH net-next 05/19] pds_core: Add adminq processing and commands
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
                   ` (3 preceding siblings ...)
  2022-11-18 22:56 ` [RFC PATCH net-next 04/19] pds_core: set up device and adminq Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 06/19] pds_core: add FW update feature to devlink Shannon Nelson
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

Add the service routines for submitting and processing
the adminq messages and for handling notifyq events.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 .../net/ethernet/pensando/pds_core/Makefile   |   1 +
 .../net/ethernet/pensando/pds_core/adminq.c   | 282 ++++++++++++++++++
 drivers/net/ethernet/pensando/pds_core/core.c |  11 -
 drivers/net/ethernet/pensando/pds_core/core.h |   6 +
 include/linux/pds/pds_adminq.h                |   2 +
 5 files changed, 291 insertions(+), 11 deletions(-)
 create mode 100644 drivers/net/ethernet/pensando/pds_core/adminq.c

diff --git a/drivers/net/ethernet/pensando/pds_core/Makefile b/drivers/net/ethernet/pensando/pds_core/Makefile
index 446054206b6a..c7a722f7d9b8 100644
--- a/drivers/net/ethernet/pensando/pds_core/Makefile
+++ b/drivers/net/ethernet/pensando/pds_core/Makefile
@@ -6,6 +6,7 @@ obj-$(CONFIG_PDS_CORE) := pds_core.o
 pds_core-y := main.o \
 	      devlink.o \
 	      dev.o \
+	      adminq.o \
 	      core.o
 
 pds_core-$(CONFIG_DEBUG_FS) += debugfs.o
diff --git a/drivers/net/ethernet/pensando/pds_core/adminq.c b/drivers/net/ethernet/pensando/pds_core/adminq.c
new file mode 100644
index 000000000000..ba9e84a7ca92
--- /dev/null
+++ b/drivers/net/ethernet/pensando/pds_core/adminq.c
@@ -0,0 +1,282 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/pci.h>
+
+#include "core.h"
+
+#include <linux/pds/pds_adminq.h>
+
+struct pdsc_wait_context {
+	struct pdsc_qcq *qcq;
+	struct completion wait_completion;
+};
+
+static int pdsc_process_notifyq(struct pdsc_qcq *qcq)
+{
+	union pds_core_notifyq_comp *comp;
+	struct pdsc *pdsc = qcq->pdsc;
+	struct pdsc_cq *cq = &qcq->cq;
+	struct pdsc_cq_info *cq_info;
+	int nq_work = 0;
+	u64 eid;
+
+	cq_info = &cq->info[cq->tail_idx];
+	comp = cq_info->comp;
+	eid = le64_to_cpu(comp->event.eid);
+	while (eid > pdsc->last_eid) {
+		u16 ecode = le16_to_cpu(comp->event.ecode);
+
+		switch (ecode) {
+		case PDS_EVENT_LINK_CHANGE:
+			dev_info(pdsc->dev, "NotifyQ LINK_CHANGE ecode %d eid %lld\n",
+				 ecode, eid);
+			break;
+
+		case PDS_EVENT_RESET:
+			dev_info(pdsc->dev, "NotifyQ RESET ecode %d eid %lld\n",
+				 ecode, eid);
+			break;
+
+		case PDS_EVENT_XCVR:
+			dev_info(pdsc->dev, "NotifyQ XCVR ecode %d eid %lld\n",
+				 ecode, eid);
+			break;
+
+		default:
+			dev_info(pdsc->dev, "NotifyQ ecode %d eid %lld\n",
+				 ecode, eid);
+			break;
+		}
+
+		pdsc->last_eid = eid;
+		cq->tail_idx = (cq->tail_idx + 1) & (cq->num_descs - 1);
+		cq_info = &cq->info[cq->tail_idx];
+		comp = cq_info->comp;
+		eid = le64_to_cpu(comp->event.eid);
+
+		nq_work++;
+	}
+
+	qcq->accum_work += nq_work;
+
+	return nq_work;
+}
+
+void pdsc_process_adminq(struct pdsc_qcq *qcq)
+{
+	union pds_core_adminq_comp *comp;
+	struct pdsc_queue *q = &qcq->q;
+	struct pdsc *pdsc = qcq->pdsc;
+	struct pdsc_cq *cq = &qcq->cq;
+	struct pdsc_q_info *q_info;
+	unsigned long irqflags;
+	int nq_work = 0;
+	int aq_work = 0;
+	int credits;
+	u32 index;
+
+	/* Check for NotifyQ event */
+	nq_work = pdsc_process_notifyq(&pdsc->notifyqcq);
+
+	/* Check for empty queue, which can happen if the interrupt was
+	 * for a NotifyQ event and there are no new AdminQ completions.
+	 */
+	if (q->tail_idx == q->head_idx)
+		goto credits;
+
+	/* Find the first completion to clean,
+	 * run the callback in the related q_info,
+	 * and continue while we still match done color
+	 */
+	spin_lock_irqsave(&pdsc->adminq_lock, irqflags);
+	comp = cq->info[cq->tail_idx].comp;
+	while (pdsc_color_match(comp->color, cq->done_color)) {
+		q_info = &q->info[q->tail_idx];
+		index = q->tail_idx;
+		q->tail_idx = (q->tail_idx + 1) & (q->num_descs - 1);
+
+		/* Copy out the completion data */
+		memcpy(q_info->dest, comp, sizeof(*comp));
+
+		complete_all(&q_info->wc->wait_completion);
+
+		if (cq->tail_idx == cq->num_descs - 1)
+			cq->done_color = !cq->done_color;
+		cq->tail_idx = (cq->tail_idx + 1) & (cq->num_descs - 1);
+		comp = cq->info[cq->tail_idx].comp;
+
+		aq_work++;
+	}
+	spin_unlock_irqrestore(&pdsc->adminq_lock, irqflags);
+
+	qcq->accum_work += aq_work;
+
+credits:
+	/* Return the interrupt credits, one for each completion */
+	credits = nq_work + aq_work;
+	if (credits)
+		pds_core_intr_credits(&pdsc->intr_ctrl[qcq->intx],
+				      credits,
+				      PDS_CORE_INTR_CRED_REARM);
+}
+
+void pdsc_work_thread(struct work_struct *work)
+{
+	struct pdsc_qcq *qcq = container_of(work, struct pdsc_qcq, work);
+
+	pdsc_process_adminq(qcq);
+}
+
+irqreturn_t pdsc_adminq_isr(int irq, void *data)
+{
+	struct pdsc_qcq *qcq = data;
+	struct pdsc *pdsc = qcq->pdsc;
+
+	/* Don't process AdminQ when shutting down */
+	if (pdsc->state & BIT_ULL(PDSC_S_STOPPING_DRIVER)) {
+		pr_err("%s: called while PDSC_S_STOPPING_DRIVER\n", __func__);
+		return IRQ_HANDLED;
+	}
+
+	queue_work(pdsc->wq, &qcq->work);
+	pds_core_intr_mask(&pdsc->intr_ctrl[irq], PDS_CORE_INTR_MASK_CLEAR);
+
+	return IRQ_HANDLED;
+}
+
+static int __pdsc_adminq_post(struct pdsc *pdsc,
+			      struct pdsc_qcq *qcq,
+			      union pds_core_adminq_cmd *cmd,
+			      union pds_core_adminq_comp *comp,
+			      struct pdsc_wait_context *wc)
+{
+	struct pdsc_queue *q = &qcq->q;
+	struct pdsc_q_info *q_info;
+	unsigned long irqflags;
+	unsigned int avail;
+	int ret = 0;
+	int index;
+
+	spin_lock_irqsave(&pdsc->adminq_lock, irqflags);
+
+	/* Check for space in the queue */
+	avail = q->tail_idx;
+	if (q->head_idx >= avail)
+		avail += q->num_descs - q->head_idx - 1;
+	else
+		avail -= q->head_idx + 1;
+	if (!avail) {
+		ret = -ENOSPC;
+		goto err_out;
+	}
+
+	/* Check that the FW is running */
+	if (!pdsc_is_fw_running(pdsc)) {
+		u8 fw_status = ioread8(&pdsc->info_regs->fw_status);
+
+		dev_info(pdsc->dev, "%s: post failed - fw not running %#02x:\n",
+			 __func__, fw_status);
+		ret = -ENXIO;
+
+		goto err_out;
+	}
+
+	/* Post the request */
+	index = q->head_idx;
+	q_info = &q->info[index];
+	q_info->wc = wc;
+	q_info->dest = comp;
+	memcpy(q_info->desc, cmd, sizeof(*cmd));
+
+	dev_dbg(pdsc->dev, "head_idx %d tail_idx %d\n", q->head_idx, q->tail_idx);
+	dev_dbg(pdsc->dev, "post admin queue command:\n");
+	dynamic_hex_dump("cmd ", DUMP_PREFIX_OFFSET, 16, 1,
+			 cmd, sizeof(*cmd), true);
+
+	q->head_idx = (q->head_idx + 1) & (q->num_descs - 1);
+
+	pds_core_dbell_ring(pdsc->kern_dbpage, q->hw_type, q->dbval | q->head_idx);
+	ret = index;
+
+err_out:
+	spin_unlock_irqrestore(&pdsc->adminq_lock, irqflags);
+	return ret;
+}
+
+int pdsc_adminq_post(struct pdsc *pdsc,
+		     struct pdsc_qcq *qcq,
+		     union pds_core_adminq_cmd *cmd,
+		     union pds_core_adminq_comp *comp,
+		     bool fast_poll)
+{
+	struct pdsc_wait_context wc = {
+		.wait_completion = COMPLETION_INITIALIZER_ONSTACK(wc.wait_completion),
+		.qcq = qcq,
+	};
+	unsigned long poll_interval = 1;
+	unsigned long time_limit;
+	unsigned long time_start;
+	unsigned long time_done;
+	unsigned long remaining;
+	int err = 0;
+	int index;
+
+	index = __pdsc_adminq_post(pdsc, qcq, cmd, comp, &wc);
+	if (index < 0) {
+		err = index;
+		goto out;
+	}
+
+	time_start = jiffies;
+	time_limit = time_start + HZ * pdsc->devcmd_timeout;
+	do {
+		/* Timeslice the actual wait to catch IO errors etc early */
+		remaining = wait_for_completion_timeout(&wc.wait_completion,
+							msecs_to_jiffies(poll_interval));
+		if (remaining)
+			break;
+
+		if (!pdsc_is_fw_running(pdsc)) {
+			u8 fw_status = ioread8(&pdsc->info_regs->fw_status);
+
+			dev_dbg(pdsc->dev, "%s: post wait failed - fw not running %#02x:\n",
+				__func__, fw_status);
+			err = -ENXIO;
+			break;
+		}
+
+		/* When fast_poll is not requested, prevent aggressive polling
+		 * on failures due to timeouts by doing exponential back off.
+		 */
+		if (!fast_poll && poll_interval < PDSC_ADMINQ_MAX_POLL_INTERVAL)
+			poll_interval <<= 1;
+	} while (time_before(jiffies, time_limit));
+	time_done = jiffies;
+	dev_dbg(pdsc->dev, "%s: elapsed %d msecs\n",
+		__func__, jiffies_to_msecs(time_done - time_start));
+
+	/* Check the results */
+	if (time_after_eq(time_done, time_limit))
+		err = -ETIMEDOUT;
+
+	dev_dbg(pdsc->dev, "read admin queue completion idx %d:\n", index);
+	dynamic_hex_dump("comp ", DUMP_PREFIX_OFFSET, 16, 1,
+			 comp, sizeof(*comp), true);
+
+	if (remaining && comp->status)
+		err = pdsc_err_to_errno(comp->status);
+
+out:
+	if (err) {
+		dev_dbg(pdsc->dev, "%s: opcode %d status %d err %pe\n",
+			__func__, cmd->opcode, comp->status, ERR_PTR(err));
+		if (err == -ENXIO || err == -ETIMEDOUT)
+			pdsc_queue_health_check(pdsc);
+	}
+
+	return err;
+}
diff --git a/drivers/net/ethernet/pensando/pds_core/core.c b/drivers/net/ethernet/pensando/pds_core/core.c
index 507f718bc8ab..e2017cee8284 100644
--- a/drivers/net/ethernet/pensando/pds_core/core.c
+++ b/drivers/net/ethernet/pensando/pds_core/core.c
@@ -12,17 +12,6 @@
 
 #include <linux/pds/pds_adminq.h>
 
-void pdsc_work_thread(struct work_struct *work)
-{
-	/* stub */
-}
-
-irqreturn_t pdsc_adminq_isr(int irq, void *data)
-{
-	/* stub */
-	return IRQ_HANDLED;
-}
-
 void pdsc_intr_free(struct pdsc *pdsc, int index)
 {
 	struct pdsc_intr_info *intr_info;
diff --git a/drivers/net/ethernet/pensando/pds_core/core.h b/drivers/net/ethernet/pensando/pds_core/core.h
index 6b816b7fb193..87b221aa7b44 100644
--- a/drivers/net/ethernet/pensando/pds_core/core.h
+++ b/drivers/net/ethernet/pensando/pds_core/core.h
@@ -263,6 +263,12 @@ int pds_devcmd_vf_start(struct pdsc *pdsc);
 int pdsc_dev_reinit(struct pdsc *pdsc);
 int pdsc_dev_init(struct pdsc *pdsc);
 
+int pdsc_adminq_post(struct pdsc *pdsc,
+		     struct pdsc_qcq *qcq,
+		     union pds_core_adminq_cmd *cmd,
+		     union pds_core_adminq_comp *comp,
+		     bool fast_poll);
+
 int pdsc_intr_alloc(struct pdsc *pdsc, char *name,
 		    irq_handler_t handler, void *data);
 void pdsc_intr_free(struct pdsc *pdsc, int index);
diff --git a/include/linux/pds/pds_adminq.h b/include/linux/pds/pds_adminq.h
index b06d28d0f906..19070099eb35 100644
--- a/include/linux/pds/pds_adminq.h
+++ b/include/linux/pds/pds_adminq.h
@@ -4,6 +4,8 @@
 #ifndef _PDS_CORE_ADMINQ_H_
 #define _PDS_CORE_ADMINQ_H_
 
+#define PDSC_ADMINQ_MAX_POLL_INTERVAL	256
+
 enum pds_core_adminq_flags {
 	PDS_AQ_FLAG_FASTPOLL	= BIT(1),	/* poll for completion at 1ms intervals */
 };
-- 
2.17.1



* [RFC PATCH net-next 06/19] pds_core: add FW update feature to devlink
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
                   ` (4 preceding siblings ...)
  2022-11-18 22:56 ` [RFC PATCH net-next 05/19] pds_core: Add adminq processing and commands Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-28 18:27   ` Jakub Kicinski
  2022-11-18 22:56 ` [RFC PATCH net-next 07/19] pds_core: set up the VIF definitions and defaults Shannon Nelson
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

Add support for doing firmware updates and for selecting the
next firmware image to boot, and tie them into the devlink
flash and parameter handling.  The FW flash process is the same
as in the ionic driver.  However, this device can also report
what is in its firmware slots and lets you select the slot to
use on the next device boot.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 .../net/ethernet/pensando/pds_core/Makefile   |   3 +-
 drivers/net/ethernet/pensando/pds_core/core.h |   8 +
 .../net/ethernet/pensando/pds_core/devlink.c  | 103 ++++++++++
 drivers/net/ethernet/pensando/pds_core/fw.c   | 192 ++++++++++++++++++
 4 files changed, 305 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/pensando/pds_core/fw.c

diff --git a/drivers/net/ethernet/pensando/pds_core/Makefile b/drivers/net/ethernet/pensando/pds_core/Makefile
index c7a722f7d9b8..06bd3da8c38b 100644
--- a/drivers/net/ethernet/pensando/pds_core/Makefile
+++ b/drivers/net/ethernet/pensando/pds_core/Makefile
@@ -7,6 +7,7 @@ pds_core-y := main.o \
 	      devlink.o \
 	      dev.o \
 	      adminq.o \
-	      core.o
+	      core.o \
+	      fw.o
 
 pds_core-$(CONFIG_DEBUG_FS) += debugfs.o
diff --git a/drivers/net/ethernet/pensando/pds_core/core.h b/drivers/net/ethernet/pensando/pds_core/core.h
index 87b221aa7b44..687e1debd079 100644
--- a/drivers/net/ethernet/pensando/pds_core/core.h
+++ b/drivers/net/ethernet/pensando/pds_core/core.h
@@ -123,6 +123,11 @@ struct pdsc_qcq {
 	struct dentry *dentry;
 };
 
+enum pdsc_devlink_param_id {
+	PDSC_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
+	PDSC_DEVLINK_PARAM_ID_FW_BOOT,
+};
+
 /* No state flags set means we are in a steady running state */
 enum pdsc_state_flags {
 	PDSC_S_FW_DEAD,		    /* fw stopped, waiting for startup or recovery */
@@ -287,4 +292,7 @@ void pdsc_process_adminq(struct pdsc_qcq *qcq);
 void pdsc_work_thread(struct work_struct *work);
 irqreturn_t pdsc_adminq_isr(int irq, void *data);
 
+int pdsc_firmware_update(struct pdsc *pdsc, const struct firmware *fw,
+			 struct netlink_ext_ack *extack);
+
 #endif /* _PDSC_H_ */
diff --git a/drivers/net/ethernet/pensando/pds_core/devlink.c b/drivers/net/ethernet/pensando/pds_core/devlink.c
index 42cf17229ace..0568e8b7391c 100644
--- a/drivers/net/ethernet/pensando/pds_core/devlink.c
+++ b/drivers/net/ethernet/pensando/pds_core/devlink.c
@@ -8,6 +8,100 @@
 
 #include "core.h"
 
+static char *slot_labels[] = { "fw.gold", "fw.mainfwa", "fw.mainfwb" };
+
+static int pdsc_dl_fw_boot_get(struct devlink *dl, u32 id,
+			       struct devlink_param_gset_ctx *ctx)
+{
+	struct pdsc *pdsc = devlink_priv(dl);
+	union pds_core_dev_cmd cmd = {
+		.fw_control.opcode = PDS_CORE_CMD_FW_CONTROL,
+		.fw_control.oper = PDS_CORE_FW_GET_BOOT,
+	};
+	union pds_core_dev_comp comp;
+	int err;
+
+	err = pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+	if (err) {
+		if (err == -EIO) {
+			snprintf(ctx->val.vstr, sizeof(ctx->val.vstr), "(unknown)");
+			return 0;
+		} else {
+			return err;
+		}
+	}
+
+	if (comp.fw_control.slot >= ARRAY_SIZE(slot_labels))
+		snprintf(ctx->val.vstr, sizeof(ctx->val.vstr),
+			 "fw.slot%02d", comp.fw_control.slot);
+	else
+		snprintf(ctx->val.vstr, sizeof(ctx->val.vstr),
+			 "%s", slot_labels[comp.fw_control.slot]);
+
+	return 0;
+}
+
+static int pdsc_dl_fw_boot_set(struct devlink *dl, u32 id,
+			       struct devlink_param_gset_ctx *ctx)
+{
+	struct pdsc *pdsc = devlink_priv(dl);
+	union pds_core_dev_cmd cmd = {
+		.fw_control.opcode = PDS_CORE_CMD_FW_CONTROL,
+		.fw_control.oper = PDS_CORE_FW_SET_BOOT,
+	};
+	union pds_core_dev_comp comp;
+	enum pds_core_fw_slot slot;
+	int timeout;
+
+	for (slot = 0; slot < ARRAY_SIZE(slot_labels); slot++)
+		if (!strcmp(ctx->val.vstr, slot_labels[slot]))
+			break;
+
+	if (slot >= ARRAY_SIZE(slot_labels))
+		return -EINVAL;
+
+	cmd.fw_control.slot = slot;
+
+	/* This is known to be a longer running command, so be sure
+	 * to use a larger timeout on the command than usual
+	 */
+#define PDSC_SET_BOOT_TIMEOUT	10
+	timeout = max_t(int, PDSC_SET_BOOT_TIMEOUT, pdsc->devcmd_timeout);
+	return pdsc_devcmd(pdsc, &cmd, &comp, timeout);
+}
+
+static int pdsc_dl_fw_boot_validate(struct devlink *dl, u32 id,
+				    union devlink_param_value val,
+				    struct netlink_ext_ack *extack)
+{
+	enum pds_core_fw_slot slot;
+
+	for (slot = 0; slot < ARRAY_SIZE(slot_labels); slot++)
+		if (!strcmp(val.vstr, slot_labels[slot]))
+			return 0;
+
+	return -EINVAL;
+}
+
+static const struct devlink_param pdsc_dl_params[] = {
+	DEVLINK_PARAM_DRIVER(PDSC_DEVLINK_PARAM_ID_FW_BOOT,
+			     "boot_fw",
+			     DEVLINK_PARAM_TYPE_STRING,
+			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			     pdsc_dl_fw_boot_get,
+			     pdsc_dl_fw_boot_set,
+			     pdsc_dl_fw_boot_validate),
+};
+
+static int pdsc_dl_flash_update(struct devlink *dl,
+				struct devlink_flash_update_params *params,
+				struct netlink_ext_ack *extack)
+{
+	struct pdsc *pdsc = devlink_priv(dl);
+
+	return pdsc_firmware_update(pdsc, params->fw, extack);
+}
+
 static int pdsc_dl_info_get(struct devlink *dl, struct devlink_info_req *req,
 			    struct netlink_ext_ack *extack)
 {
@@ -71,6 +165,7 @@ static int pdsc_dl_info_get(struct devlink *dl, struct devlink_info_req *req,
 
 static const struct devlink_ops pdsc_dl_ops = {
 	.info_get	= pdsc_dl_info_get,
+	.flash_update	= pdsc_dl_flash_update,
 };
 
 struct pdsc *pdsc_dl_alloc(struct device *dev)
@@ -94,6 +189,12 @@ void pdsc_dl_free(struct pdsc *pdsc)
 int pdsc_dl_register(struct pdsc *pdsc)
 {
 	struct devlink *dl = priv_to_devlink(pdsc);
+	int err;
+
+	err = devlink_params_register(dl, pdsc_dl_params,
+				      ARRAY_SIZE(pdsc_dl_params));
+	if (err)
+		return err;
 
 	devlink_register(dl);
 
@@ -105,4 +206,6 @@ void pdsc_dl_unregister(struct pdsc *pdsc)
 	struct devlink *dl = priv_to_devlink(pdsc);
 
 	devlink_unregister(dl);
+	devlink_params_unregister(dl, pdsc_dl_params,
+				  ARRAY_SIZE(pdsc_dl_params));
 }
diff --git a/drivers/net/ethernet/pensando/pds_core/fw.c b/drivers/net/ethernet/pensando/pds_core/fw.c
new file mode 100644
index 000000000000..3c64deef5549
--- /dev/null
+++ b/drivers/net/ethernet/pensando/pds_core/fw.c
@@ -0,0 +1,192 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/firmware.h>
+
+#include "core.h"
+
+/* The worst case wait for the install activity is about 25 minutes when
+ * installing a new CPLD, which is very seldom.  Normal is about 30-35
+ * seconds.  Since the driver can't tell if a CPLD update will happen we
+ * set the timeout for the ugly case.
+ */
+#define PDSC_FW_INSTALL_TIMEOUT	(25 * 60)
+#define PDSC_FW_SELECT_TIMEOUT	30
+
+/* Number of periodic log updates during fw file download */
+#define PDSC_FW_INTERVAL_FRACTION	32
+
+static int pdsc_devcmd_firmware_download(struct pdsc *pdsc, u64 addr,
+					 u32 offset, u32 length)
+{
+	union pds_core_dev_cmd cmd = {
+		.fw_download.opcode = PDS_CORE_CMD_FW_DOWNLOAD,
+		.fw_download.offset = cpu_to_le32(offset),
+		.fw_download.addr = cpu_to_le64(addr),
+		.fw_download.length = cpu_to_le32(length),
+	};
+	union pds_core_dev_comp comp;
+
+	return pdsc_devcmd_locked(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+}
+
+static int pdsc_devcmd_firmware_install(struct pdsc *pdsc)
+{
+	union pds_core_dev_cmd cmd = {
+		.fw_control.opcode = PDS_CORE_CMD_FW_CONTROL,
+		.fw_control.oper = PDS_CORE_FW_INSTALL_ASYNC
+	};
+	union pds_core_dev_comp comp;
+	int err;
+
+	err = pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+	if (err < 0)
+		return err;
+
+	return comp.fw_control.slot;
+}
+
+static int pdsc_devcmd_firmware_activate(struct pdsc *pdsc,
+					 enum pds_core_fw_slot slot)
+{
+	union pds_core_dev_cmd cmd = {
+		.fw_control.opcode = PDS_CORE_CMD_FW_CONTROL,
+		.fw_control.oper = PDS_CORE_FW_ACTIVATE_ASYNC,
+		.fw_control.slot = slot
+	};
+	union pds_core_dev_comp comp;
+
+	return pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+}
+
+static int pdsc_fw_status_long_wait(struct pdsc *pdsc,
+				    const char *label,
+				    unsigned long timeout,
+				    u8 fw_cmd,
+				    struct netlink_ext_ack *extack)
+{
+	union pds_core_dev_cmd cmd = {
+		.fw_control.opcode = PDS_CORE_CMD_FW_CONTROL,
+		.fw_control.oper = fw_cmd,
+	};
+	union pds_core_dev_comp comp;
+	unsigned long start_time;
+	unsigned long end_time;
+	int err;
+
+	/* Ping on the status of the long running async install
+	 * command.  We get EAGAIN while the command is still
+	 * running, else we get the final command status.
+	 */
+	start_time = jiffies;
+	end_time = start_time + (timeout * HZ);
+	do {
+		err = pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+		msleep(20);
+	} while (time_before(jiffies, end_time) &&
+		 (err == -EAGAIN || err == -ETIMEDOUT));
+
+	if (err == -EAGAIN || err == -ETIMEDOUT) {
+		NL_SET_ERR_MSG_MOD(extack, "Firmware wait timed out");
+		dev_err(pdsc->dev, "DEV_CMD firmware wait %s timed out\n", label);
+	} else if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Firmware wait failed");
+	}
+
+	return err;
+}
+
+int pdsc_firmware_update(struct pdsc *pdsc, const struct firmware *fw,
+			 struct netlink_ext_ack *extack)
+{
+	u32 buf_sz, copy_sz, offset;
+	struct devlink *dl;
+	int next_interval;
+	u64 data_addr;
+	int err = 0;
+	int fw_slot;
+
+	dev_info(pdsc->dev, "Installing firmware\n");
+
+	dl = priv_to_devlink(pdsc);
+	devlink_flash_update_status_notify(dl, "Preparing to flash", NULL, 0, 0);
+
+	buf_sz = sizeof(pdsc->cmd_regs->data);
+
+	dev_dbg(pdsc->dev,
+		"downloading firmware - size %d part_sz %d nparts %lu\n",
+		(int)fw->size, buf_sz, DIV_ROUND_UP(fw->size, buf_sz));
+
+	offset = 0;
+	next_interval = 0;
+	data_addr = offsetof(struct pds_core_dev_cmd_regs, data);
+	while (offset < fw->size) {
+		if (offset >= next_interval) {
+			devlink_flash_update_status_notify(dl, "Downloading", NULL,
+							   offset, fw->size);
+			next_interval = offset + (fw->size / PDSC_FW_INTERVAL_FRACTION);
+		}
+
+		copy_sz = min_t(unsigned int, buf_sz, fw->size - offset);
+		mutex_lock(&pdsc->devcmd_lock);
+		memcpy_toio(&pdsc->cmd_regs->data, fw->data + offset, copy_sz);
+		err = pdsc_devcmd_firmware_download(pdsc, data_addr, offset, copy_sz);
+		mutex_unlock(&pdsc->devcmd_lock);
+		if (err) {
+			dev_err(pdsc->dev,
+				"download failed offset 0x%x addr 0x%llx len 0x%x: %pe\n",
+				offset, data_addr, copy_sz, ERR_PTR(err));
+			NL_SET_ERR_MSG_MOD(extack, "Segment download failed");
+			goto err_out;
+		}
+		offset += copy_sz;
+	}
+	devlink_flash_update_status_notify(dl, "Downloading", NULL,
+					   fw->size, fw->size);
+
+	devlink_flash_update_timeout_notify(dl, "Installing", NULL,
+					    PDSC_FW_INSTALL_TIMEOUT);
+
+	fw_slot = pdsc_devcmd_firmware_install(pdsc);
+	if (fw_slot < 0) {
+		err = fw_slot;
+		dev_err(pdsc->dev, "install failed: %pe\n", ERR_PTR(err));
+		NL_SET_ERR_MSG_MOD(extack, "Failed to start firmware install");
+		goto err_out;
+	}
+
+	err = pdsc_fw_status_long_wait(pdsc, "Installing",
+				       PDSC_FW_INSTALL_TIMEOUT,
+				       PDS_CORE_FW_INSTALL_STATUS,
+				       extack);
+	if (err)
+		goto err_out;
+
+	devlink_flash_update_timeout_notify(dl, "Selecting", NULL,
+					    PDSC_FW_SELECT_TIMEOUT);
+
+	err = pdsc_devcmd_firmware_activate(pdsc, fw_slot);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed to start firmware select");
+		goto err_out;
+	}
+
+	err = pdsc_fw_status_long_wait(pdsc, "Selecting",
+				       PDSC_FW_SELECT_TIMEOUT,
+				       PDS_CORE_FW_ACTIVATE_STATUS,
+				       extack);
+	if (err)
+		goto err_out;
+
+	dev_info(pdsc->dev, "Firmware update completed, slot %d\n", fw_slot);
+
+err_out:
+	if (err)
+		devlink_flash_update_status_notify(dl, "Flash failed", NULL, 0, 0);
+	else
+		devlink_flash_update_status_notify(dl, "Flash done", NULL, 0, 0);
+	return err;
+}
-- 
2.17.1



* [RFC PATCH net-next 07/19] pds_core: set up the VIF definitions and defaults
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
                   ` (5 preceding siblings ...)
  2022-11-18 22:56 ` [RFC PATCH net-next 06/19] pds_core: add FW update feature to devlink Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 08/19] pds_core: initial VF configuration Shannon Nelson
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

The Virtual Interfaces (VIFs) supported by the DSC's
configuration (VFio Live Migration, vDPA, etc.) are reported
in the dev_ident struct.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 drivers/net/ethernet/pensando/pds_core/core.c | 54 +++++++++++++++++++
 drivers/net/ethernet/pensando/pds_core/core.h | 14 +++++
 .../net/ethernet/pensando/pds_core/debugfs.c  | 23 ++++++++
 include/linux/pds/pds_common.h                | 19 +++++++
 4 files changed, 110 insertions(+)

diff --git a/drivers/net/ethernet/pensando/pds_core/core.c b/drivers/net/ethernet/pensando/pds_core/core.c
index e2017cee8284..203a27a0fc5c 100644
--- a/drivers/net/ethernet/pensando/pds_core/core.c
+++ b/drivers/net/ethernet/pensando/pds_core/core.c
@@ -358,6 +358,47 @@ static int pdsc_core_init(struct pdsc *pdsc)
 	return err;
 }
 
+static struct pdsc_viftype pdsc_viftype_defaults[] = {
+	[PDS_DEV_TYPE_VDPA] = { .name = PDS_DEV_TYPE_VDPA_STR,
+				.vif_id = PDS_DEV_TYPE_VDPA,
+				.dl_id = DEVLINK_PARAM_GENERIC_ID_ENABLE_VNET },
+	[PDS_DEV_TYPE_LM]   = { .name = PDS_DEV_TYPE_LM_STR,
+				.vif_id = PDS_DEV_TYPE_LM,
+				.dl_id = PDSC_DEVLINK_PARAM_ID_LM },
+	[PDS_DEV_TYPE_MAX] = { 0 }
+};
+
+static int pdsc_viftypes_init(struct pdsc *pdsc)
+{
+	enum pds_core_vif_types vt;
+
+	pdsc->viftype_status = devm_kzalloc(pdsc->dev,
+					    sizeof(pdsc_viftype_defaults),
+					    GFP_KERNEL);
+	if (!pdsc->viftype_status)
+		return -ENOMEM;
+
+	for (vt = 0; vt < PDS_DEV_TYPE_MAX; vt++) {
+		bool vt_support;
+
+		if (!pdsc_viftype_defaults[vt].name)
+			continue;
+
+		/* Grab the defaults */
+		pdsc->viftype_status[vt] = pdsc_viftype_defaults[vt];
+
+		/* See what the Core device has for support */
+		vt_support = !!le16_to_cpu(pdsc->dev_ident.vif_types[vt]);
+		dev_dbg(pdsc->dev, "VIF %s is %ssupported\n",
+			pdsc->viftype_status[vt].name,
+			vt_support ? "" : "not ");
+
+		pdsc->viftype_status[vt].supported = vt_support;
+	}
+
+	return 0;
+}
+
 int pdsc_setup(struct pdsc *pdsc, bool init)
 {
 	int numdescs;
@@ -400,6 +441,14 @@ int pdsc_setup(struct pdsc *pdsc, bool init)
 	if (err)
 		goto err_out_teardown;
 
+	/* Set up the VIFs */
+	err = pdsc_viftypes_init(pdsc);
+	if (err)
+		goto err_out_teardown;
+
+	if (init)
+		pdsc_debugfs_add_viftype(pdsc);
+
 	clear_bit(PDSC_S_FW_DEAD, &pdsc->state);
 	return 0;
 
@@ -416,6 +465,11 @@ void pdsc_teardown(struct pdsc *pdsc, bool removing)
 	pdsc_qcq_free(pdsc, &pdsc->notifyqcq);
 	pdsc_qcq_free(pdsc, &pdsc->adminqcq);
 
+	if (pdsc->viftype_status) {
+		devm_kfree(pdsc->dev, pdsc->viftype_status);
+		pdsc->viftype_status = NULL;
+	}
+
 	if (pdsc->intr_info) {
 		for (i = 0; i < pdsc->nintrs; i++)
 			pdsc_intr_free(pdsc, i);
diff --git a/drivers/net/ethernet/pensando/pds_core/core.h b/drivers/net/ethernet/pensando/pds_core/core.h
index 687e1debd079..46d10afb0bde 100644
--- a/drivers/net/ethernet/pensando/pds_core/core.h
+++ b/drivers/net/ethernet/pensando/pds_core/core.h
@@ -123,8 +123,18 @@ struct pdsc_qcq {
 	struct dentry *dentry;
 };
 
+struct pdsc_viftype {
+	char *name;
+	bool supported;
+	bool enabled;
+	int dl_id;
+	int vif_id;
+	struct pds_auxiliary_dev *padev;
+};
+
 enum pdsc_devlink_param_id {
 	PDSC_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
+	PDSC_DEVLINK_PARAM_ID_LM,
 	PDSC_DEVLINK_PARAM_ID_FW_BOOT,
 };
 
@@ -178,6 +188,7 @@ struct pdsc {
 	struct pdsc_qcq adminqcq;
 	struct pdsc_qcq notifyqcq;
 	u64 last_eid;
+	struct pdsc_viftype *viftype_status;
 };
 
 /** enum pds_core_dbell_bits - bitwise composition of dbell values.
@@ -232,6 +243,7 @@ struct pdsc *pdsc_dl_alloc(struct device *dev);
 void pdsc_dl_free(struct pdsc *pdsc);
 int pdsc_dl_register(struct pdsc *pdsc);
 void pdsc_dl_unregister(struct pdsc *pdsc);
+int pdsc_dl_vif_add(struct pdsc *pdsc, enum pds_core_vif_types vt, const char *name);
 
 #ifdef CONFIG_DEBUG_FS
 void pdsc_debugfs_create(void);
@@ -239,6 +251,7 @@ void pdsc_debugfs_destroy(void);
 void pdsc_debugfs_add_dev(struct pdsc *pdsc);
 void pdsc_debugfs_del_dev(struct pdsc *pdsc);
 void pdsc_debugfs_add_ident(struct pdsc *pdsc);
+void pdsc_debugfs_add_viftype(struct pdsc *pdsc);
 void pdsc_debugfs_add_irqs(struct pdsc *pdsc);
 void pdsc_debugfs_add_qcq(struct pdsc *pdsc, struct pdsc_qcq *qcq);
 void pdsc_debugfs_del_qcq(struct pdsc_qcq *qcq);
@@ -248,6 +261,7 @@ static inline void pdsc_debugfs_destroy(void) { }
 static inline void pdsc_debugfs_add_dev(struct pdsc *pdsc) { }
 static inline void pdsc_debugfs_del_dev(struct pdsc *pdsc) { }
 static inline void pdsc_debugfs_add_ident(struct pdsc *pdsc) { }
+static inline void pdsc_debugfs_add_viftype(struct pdsc *pdsc) { }
 static inline void pdsc_debugfs_add_irqs(struct pdsc *pdsc) { }
 static inline void pdsc_debugfs_add_qcq(struct pdsc *pdsc, struct pdsc_qcq *qcq) { }
 static inline void pdsc_debugfs_del_qcq(struct pdsc_qcq *qcq) { }
diff --git a/drivers/net/ethernet/pensando/pds_core/debugfs.c b/drivers/net/ethernet/pensando/pds_core/debugfs.c
index 294bb97ca639..5b8d53912691 100644
--- a/drivers/net/ethernet/pensando/pds_core/debugfs.c
+++ b/drivers/net/ethernet/pensando/pds_core/debugfs.c
@@ -82,6 +82,29 @@ void pdsc_debugfs_add_ident(struct pdsc *pdsc)
 	debugfs_create_file("identity", 0400, pdsc->dentry, pdsc, &identity_fops);
 }
 
+static int viftype_show(struct seq_file *seq, void *v)
+{
+	struct pdsc *pdsc = seq->private;
+	int vt;
+
+	for (vt = 0; vt < PDS_DEV_TYPE_MAX; vt++) {
+		if (!pdsc->viftype_status[vt].name)
+			continue;
+
+		seq_printf(seq, "%s\t%d supported %d enabled\n",
+			   pdsc->viftype_status[vt].name,
+			   pdsc->viftype_status[vt].supported,
+			   pdsc->viftype_status[vt].enabled);
+	}
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(viftype);
+
+void pdsc_debugfs_add_viftype(struct pdsc *pdsc)
+{
+	debugfs_create_file("viftypes", 0400, pdsc->dentry, pdsc, &viftype_fops);
+}
+
 static int irqs_show(struct seq_file *seq, void *v)
 {
 	struct pdsc *pdsc = seq->private;
diff --git a/include/linux/pds/pds_common.h b/include/linux/pds/pds_common.h
index e7fe84379a2f..2fa4ec440ef5 100644
--- a/include/linux/pds/pds_common.h
+++ b/include/linux/pds/pds_common.h
@@ -50,6 +50,25 @@ enum pds_core_driver_type {
 	PDS_DRIVER_ESXI    = 6,
 };
 
+enum pds_core_vif_types {
+	PDS_DEV_TYPE_CORE	= 0,
+	PDS_DEV_TYPE_VDPA	= 1,
+	PDS_DEV_TYPE_VFIO	= 2,
+	PDS_DEV_TYPE_ETH	= 3,
+	PDS_DEV_TYPE_RDMA	= 4,
+	PDS_DEV_TYPE_LM		= 5,
+
+	/* new ones added before this line */
+	PDS_DEV_TYPE_MAX	= 16   /* don't change - used in struct size */
+};
+
+#define PDS_DEV_TYPE_CORE_STR	"Core"
+#define PDS_DEV_TYPE_VDPA_STR	"vDPA"
+#define PDS_DEV_TYPE_VFIO_STR	"VFio"
+#define PDS_DEV_TYPE_ETH_STR	"Eth"
+#define PDS_DEV_TYPE_RDMA_STR	"RDMA"
+#define PDS_DEV_TYPE_LM_STR	"LM"
+
 /* PDSC interface uses identity version 1 and PDSC uses 2 */
 #define PDSC_IDENTITY_VERSION_1		1
 #define PDSC_IDENTITY_VERSION_2		2
-- 
2.17.1



* [RFC PATCH net-next 08/19] pds_core: initial VF configuration
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
                   ` (6 preceding siblings ...)
  2022-11-18 22:56 ` [RFC PATCH net-next 07/19] pds_core: set up the VIF definitions and defaults Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-28 18:28   ` Jakub Kicinski
  2022-11-18 22:56 ` [RFC PATCH net-next 09/19] pds_core: add auxiliary_bus devices Shannon Nelson
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

In order to manage the VFs associated with the Core PF, we
need a PF netdev.  This netdev is a simple representor to make
available the "ip link set <if> vf ..." commands that are useful
for setting attributes used by the vDPA VF.  There is no packet
handling in this netdev as the Core device has no Tx/Rx queues.
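
The sort of configuration this representor enables looks like the following; these are standard iproute2 VF attribute commands, but the interface name, VF index, and values are purely illustrative:

```shell
# Show the representor and its VF attributes (interface name is hypothetical)
ip link show dev pdscpf0

# Set attributes on VF 0, e.g. for use by a vDPA VF
ip link set dev pdscpf0 vf 0 mac 00:de:ad:be:ef:01
ip link set dev pdscpf0 vf 0 vlan 10
ip link set dev pdscpf0 vf 0 trust on
```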

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 .../net/ethernet/pensando/pds_core/Makefile   |   1 +
 drivers/net/ethernet/pensando/pds_core/core.c |  17 +
 drivers/net/ethernet/pensando/pds_core/core.h |  24 +
 drivers/net/ethernet/pensando/pds_core/main.c |  65 +++
 .../net/ethernet/pensando/pds_core/netdev.c   | 504 ++++++++++++++++++
 5 files changed, 611 insertions(+)
 create mode 100644 drivers/net/ethernet/pensando/pds_core/netdev.c

diff --git a/drivers/net/ethernet/pensando/pds_core/Makefile b/drivers/net/ethernet/pensando/pds_core/Makefile
index 06bd3da8c38b..ee794cc08fda 100644
--- a/drivers/net/ethernet/pensando/pds_core/Makefile
+++ b/drivers/net/ethernet/pensando/pds_core/Makefile
@@ -8,6 +8,7 @@ pds_core-y := main.o \
 	      dev.o \
 	      adminq.o \
 	      core.o \
+	      netdev.o \
 	      fw.o
 
 pds_core-$(CONFIG_DEBUG_FS) += debugfs.o
diff --git a/drivers/net/ethernet/pensando/pds_core/core.c b/drivers/net/ethernet/pensando/pds_core/core.c
index 203a27a0fc5c..202f1a6b4605 100644
--- a/drivers/net/ethernet/pensando/pds_core/core.c
+++ b/drivers/net/ethernet/pensando/pds_core/core.c
@@ -449,6 +449,12 @@ int pdsc_setup(struct pdsc *pdsc, bool init)
 	if (init)
 		pdsc_debugfs_add_viftype(pdsc);
 
+	if (init) {
+		err = pdsc_init_netdev(pdsc);
+		if (err)
+			goto err_out_teardown;
+	}
+
 	clear_bit(PDSC_S_FW_DEAD, &pdsc->state);
 	return 0;
 
@@ -461,6 +467,12 @@ void pdsc_teardown(struct pdsc *pdsc, bool removing)
 {
 	int i;
 
+	if (removing && pdsc->netdev) {
+		unregister_netdev(pdsc->netdev);
+		free_netdev(pdsc->netdev);
+		pdsc->netdev = NULL;
+	}
+
 	pdsc_devcmd_reset(pdsc);
 	pdsc_qcq_free(pdsc, &pdsc->notifyqcq);
 	pdsc_qcq_free(pdsc, &pdsc->adminqcq);
@@ -528,6 +540,7 @@ static void pdsc_fw_down(struct pdsc *pdsc)
 		return;
 	}
 
+	netif_device_detach(pdsc->netdev);
 	pdsc_mask_interrupts(pdsc);
 	pdsc_teardown(pdsc, PDSC_TEARDOWN_RECOVERY);
 
@@ -554,8 +567,12 @@ static void pdsc_fw_up(struct pdsc *pdsc)
 	if (err)
 		goto err_out;
 
+	netif_device_attach(pdsc->netdev);
+
 	mutex_unlock(&pdsc->config_lock);
 
+	pdsc_vf_attr_replay(pdsc);
+
 	return;
 
 err_out:
diff --git a/drivers/net/ethernet/pensando/pds_core/core.h b/drivers/net/ethernet/pensando/pds_core/core.h
index 46d10afb0bde..07499a8aae21 100644
--- a/drivers/net/ethernet/pensando/pds_core/core.h
+++ b/drivers/net/ethernet/pensando/pds_core/core.h
@@ -30,6 +30,22 @@ struct pdsc_dev_bar {
 	int res_index;
 };
 
+struct pdsc_vf {
+	u16     index;
+	u8      macaddr[6];
+	__le32  maxrate;
+	__le16  vlanid;
+	u8      spoofchk;
+	u8      trusted;
+	u8      linkstate;
+	__le16  vif_types[PDS_DEV_TYPE_MAX];
+
+	struct pds_core_vf_stats stats;
+	dma_addr_t               stats_pa;
+
+	struct pds_auxiliary_dev *padev;
+};
+
 struct pdsc_devinfo {
 	u8 asic_type;
 	u8 asic_rev;
@@ -153,6 +169,7 @@ struct pdsc {
 	struct dentry *dentry;
 	struct device *dev;
 	struct pdsc_dev_bar bars[PDS_CORE_BARS_MAX];
+	struct pdsc_vf *vfs;
 	int num_vfs;
 	int hw_index;
 	int id;
@@ -172,6 +189,8 @@ struct pdsc {
 	struct pdsc_intr_info *intr_info;	/* array of nintrs elements */
 
 	struct workqueue_struct *wq;
+	struct net_device *netdev;
+	struct rw_semaphore vf_op_lock;	/* lock for VF operations */
 
 	unsigned int devcmd_timeout;
 	struct mutex devcmd_lock;	/* lock for dev_cmd operations */
@@ -309,4 +328,9 @@ irqreturn_t pdsc_adminq_isr(int irq, void *data);
 int pdsc_firmware_update(struct pdsc *pdsc, const struct firmware *fw,
 			 struct netlink_ext_ack *extack);
 
+int pdsc_init_netdev(struct pdsc *pdsc);
+int pdsc_set_vf_config(struct pdsc *pdsc, int vf,
+		       struct pds_core_vf_setattr_cmd *vfc);
+void pdsc_vf_attr_replay(struct pdsc *pdsc);
+
 #endif /* _PDSC_H_ */
diff --git a/drivers/net/ethernet/pensando/pds_core/main.c b/drivers/net/ethernet/pensando/pds_core/main.c
index 856704f8827a..36e330c49360 100644
--- a/drivers/net/ethernet/pensando/pds_core/main.c
+++ b/drivers/net/ethernet/pensando/pds_core/main.c
@@ -165,6 +165,67 @@ void __iomem *pdsc_map_dbpage(struct pdsc *pdsc, int page_num)
 			       (u64)page_num << PAGE_SHIFT, PAGE_SIZE);
 }
 
+static int pdsc_sriov_configure(struct pci_dev *pdev, int num_vfs)
+{
+	struct pds_core_vf_setattr_cmd vfc = { .attr = PDS_CORE_VF_ATTR_STATSADDR };
+	struct pdsc *pdsc = pci_get_drvdata(pdev);
+	struct device *dev = pdsc->dev;
+	struct pdsc_vf *v;
+	int ret = 0;
+	int i;
+
+	if (num_vfs > 0) {
+		pdsc->vfs = kcalloc(num_vfs, sizeof(struct pdsc_vf), GFP_KERNEL);
+		if (!pdsc->vfs)
+			return -ENOMEM;
+		pdsc->num_vfs = num_vfs;
+
+		for (i = 0; i < num_vfs; i++) {
+			v = &pdsc->vfs[i];
+			v->stats_pa = dma_map_single(pdsc->dev, &v->stats,
+						     sizeof(v->stats), DMA_FROM_DEVICE);
+			if (dma_mapping_error(pdsc->dev, v->stats_pa)) {
+				dev_err(pdsc->dev, "DMA mapping failed for vf[%d] stats\n", i);
+				v->stats_pa = 0;
+			} else {
+				vfc.stats.len = cpu_to_le32(sizeof(v->stats));
+				vfc.stats.pa = cpu_to_le64(v->stats_pa);
+				pdsc_set_vf_config(pdsc, i, &vfc);
+			}
+		}
+
+		ret = pci_enable_sriov(pdev, num_vfs);
+		if (ret) {
+			dev_err(dev, "Cannot enable SRIOV: %pe\n", ERR_PTR(ret));
+			goto no_vfs;
+		}
+
+		return num_vfs;
+	}
+
+no_vfs:
+	pci_disable_sriov(pdev);
+
+	for (i = pdsc->num_vfs - 1; i >= 0; i--) {
+		v = &pdsc->vfs[i];
+
+		if (v->stats_pa) {
+			vfc.stats.len = 0;
+			vfc.stats.pa = 0;
+			pdsc_set_vf_config(pdsc, i, &vfc);
+			dma_unmap_single(pdsc->dev, v->stats_pa,
+					 sizeof(v->stats), DMA_FROM_DEVICE);
+			v->stats_pa = 0;
+		}
+	}
+
+	kfree(pdsc->vfs);
+	pdsc->vfs = NULL;
+	pdsc->num_vfs = 0;
+
+	return ret;
+}
+
 static DEFINE_IDA(pdsc_pf_ida);
 
 #define PDSC_WQ_NAME_LEN 24
@@ -237,6 +298,7 @@ static int pdsc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	spin_lock_init(&pdsc->adminq_lock);
 
 	mutex_lock(&pdsc->config_lock);
+	init_rwsem(&pdsc->vf_op_lock);
 	err = pdsc_setup(pdsc, PDSC_SETUP_INIT);
 	if (err)
 		goto err_out_unmap_bars;
@@ -300,6 +362,8 @@ static void pdsc_remove(struct pci_dev *pdev)
 	 */
 	pdsc_dl_unregister(pdsc);
 
+	pdsc_sriov_configure(pdev, 0);
+
 	/* Now we can lock it up and tear it down */
 	mutex_lock(&pdsc->config_lock);
 	set_bit(PDSC_S_STOPPING_DRIVER, &pdsc->state);
@@ -337,6 +401,7 @@ static struct pci_driver pdsc_driver = {
 	.id_table = pdsc_id_table,
 	.probe = pdsc_probe,
 	.remove = pdsc_remove,
+	.sriov_configure = pdsc_sriov_configure,
 };
 
 static int __init pdsc_init_module(void)
diff --git a/drivers/net/ethernet/pensando/pds_core/netdev.c b/drivers/net/ethernet/pensando/pds_core/netdev.c
new file mode 100644
index 000000000000..0d7f9c2c7df8
--- /dev/null
+++ b/drivers/net/ethernet/pensando/pds_core/netdev.c
@@ -0,0 +1,504 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/pci.h>
+#include <linux/delay.h>
+#include <net/devlink.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+
+#include "core.h"
+
+static const char *pdsc_vf_attr_to_str(enum pds_core_vf_attr attr)
+{
+	switch (attr) {
+	case PDS_CORE_VF_ATTR_SPOOFCHK:
+		return "PDS_CORE_VF_ATTR_SPOOFCHK";
+	case PDS_CORE_VF_ATTR_TRUST:
+		return "PDS_CORE_VF_ATTR_TRUST";
+	case PDS_CORE_VF_ATTR_LINKSTATE:
+		return "PDS_CORE_VF_ATTR_LINKSTATE";
+	case PDS_CORE_VF_ATTR_MAC:
+		return "PDS_CORE_VF_ATTR_MAC";
+	case PDS_CORE_VF_ATTR_VLAN:
+		return "PDS_CORE_VF_ATTR_VLAN";
+	case PDS_CORE_VF_ATTR_RATE:
+		return "PDS_CORE_VF_ATTR_RATE";
+	case PDS_CORE_VF_ATTR_STATSADDR:
+		return "PDS_CORE_VF_ATTR_STATSADDR";
+	default:
+		return "PDS_CORE_VF_ATTR_UNKNOWN";
+	}
+}
+
+static int pdsc_get_vf_stats(struct net_device *netdev, int vf,
+			     struct ifla_vf_stats *vf_stats)
+{
+	struct pdsc *pdsc = *(struct pdsc **)netdev_priv(netdev);
+	struct pds_core_vf_stats *vs;
+	int err = 0;
+
+	if (!netif_device_present(netdev))
+		return -EBUSY;
+
+	down_read(&pdsc->vf_op_lock);
+
+	if (vf >= pci_num_vf(pdsc->pdev) || !pdsc->vfs) {
+		err = -EINVAL;
+	} else {
+		memset(vf_stats, 0, sizeof(*vf_stats));
+		vs = &pdsc->vfs[vf].stats;
+
+		vf_stats->rx_packets = le64_to_cpu(vs->rx_ucast_packets);
+		vf_stats->tx_packets = le64_to_cpu(vs->tx_ucast_packets);
+		vf_stats->rx_bytes   = le64_to_cpu(vs->rx_ucast_bytes);
+		vf_stats->tx_bytes   = le64_to_cpu(vs->tx_ucast_bytes);
+		vf_stats->broadcast  = le64_to_cpu(vs->rx_bcast_packets);
+		vf_stats->multicast  = le64_to_cpu(vs->rx_mcast_packets);
+		vf_stats->rx_dropped = le64_to_cpu(vs->rx_ucast_drop_packets) +
+				       le64_to_cpu(vs->rx_mcast_drop_packets) +
+				       le64_to_cpu(vs->rx_bcast_drop_packets);
+		vf_stats->tx_dropped = le64_to_cpu(vs->tx_ucast_drop_packets) +
+				       le64_to_cpu(vs->tx_mcast_drop_packets) +
+				       le64_to_cpu(vs->tx_bcast_drop_packets);
+	}
+
+	up_read(&pdsc->vf_op_lock);
+	return err;
+}
+
+static int pdsc_get_fw_vf_config(struct pdsc *pdsc, int vf, struct pdsc_vf *vfdata)
+{
+	struct pds_core_vf_getattr_comp comp = { 0 };
+	int err;
+	u8 attr;
+
+	attr = PDS_CORE_VF_ATTR_VLAN;
+	err = pdsc_dev_cmd_vf_getattr(pdsc, vf, attr, &comp);
+	if (err && comp.status != PDS_RC_ENOSUPP)
+		goto err_out;
+	if (!err)
+		vfdata->vlanid = comp.vlanid;
+
+	attr = PDS_CORE_VF_ATTR_SPOOFCHK;
+	err = pdsc_dev_cmd_vf_getattr(pdsc, vf, attr, &comp);
+	if (err && comp.status != PDS_RC_ENOSUPP)
+		goto err_out;
+	if (!err)
+		vfdata->spoofchk = comp.spoofchk;
+
+	attr = PDS_CORE_VF_ATTR_LINKSTATE;
+	err = pdsc_dev_cmd_vf_getattr(pdsc, vf, attr, &comp);
+	if (err && comp.status != PDS_RC_ENOSUPP)
+		goto err_out;
+	if (!err) {
+		switch (comp.linkstate) {
+		case PDS_CORE_VF_LINK_STATUS_UP:
+			vfdata->linkstate = IFLA_VF_LINK_STATE_ENABLE;
+			break;
+		case PDS_CORE_VF_LINK_STATUS_DOWN:
+			vfdata->linkstate = IFLA_VF_LINK_STATE_DISABLE;
+			break;
+		case PDS_CORE_VF_LINK_STATUS_AUTO:
+			vfdata->linkstate = IFLA_VF_LINK_STATE_AUTO;
+			break;
+		default:
+			dev_warn(pdsc->dev, "Unexpected link state %u\n", comp.linkstate);
+			break;
+		}
+	}
+
+	attr = PDS_CORE_VF_ATTR_RATE;
+	err = pdsc_dev_cmd_vf_getattr(pdsc, vf, attr, &comp);
+	if (err && comp.status != PDS_RC_ENOSUPP)
+		goto err_out;
+	if (!err)
+		vfdata->maxrate = comp.maxrate;
+
+	attr = PDS_CORE_VF_ATTR_TRUST;
+	err = pdsc_dev_cmd_vf_getattr(pdsc, vf, attr, &comp);
+	if (err && comp.status != PDS_RC_ENOSUPP)
+		goto err_out;
+	if (!err)
+		vfdata->trusted = comp.trust;
+
+	attr = PDS_CORE_VF_ATTR_MAC;
+	err = pdsc_dev_cmd_vf_getattr(pdsc, vf, attr, &comp);
+	if (err && comp.status != PDS_RC_ENOSUPP)
+		goto err_out;
+	if (!err)
+		ether_addr_copy(vfdata->macaddr, comp.macaddr);
+
+err_out:
+	if (err)
+		dev_err(pdsc->dev, "Failed to get %s for VF %d\n",
+			pdsc_vf_attr_to_str(attr), vf);
+
+	return err;
+}
+
+static int pdsc_get_vf_config(struct net_device *netdev,
+			      int vf, struct ifla_vf_info *ivf)
+{
+	struct pdsc *pdsc = *(struct pdsc **)netdev_priv(netdev);
+	struct pdsc_vf vfdata = { 0 };
+	int err = 0;
+
+	if (!netif_device_present(netdev))
+		return -EBUSY;
+
+	down_read(&pdsc->vf_op_lock);
+
+	if (vf >= pci_num_vf(pdsc->pdev) || !pdsc->vfs) {
+		err = -EINVAL;
+	} else {
+		ivf->vf = vf;
+		ivf->qos = 0;
+
+		err = pdsc_get_fw_vf_config(pdsc, vf, &vfdata);
+		if (!err) {
+			ivf->vlan         = le16_to_cpu(vfdata.vlanid);
+			ivf->spoofchk     = vfdata.spoofchk;
+			ivf->linkstate    = vfdata.linkstate;
+			ivf->max_tx_rate  = le32_to_cpu(vfdata.maxrate);
+			ivf->trusted      = vfdata.trusted;
+			ether_addr_copy(ivf->mac, vfdata.macaddr);
+		}
+	}
+
+	up_read(&pdsc->vf_op_lock);
+	return err;
+}
+
+int pdsc_set_vf_config(struct pdsc *pdsc, int vf,
+		       struct pds_core_vf_setattr_cmd *vfc)
+{
+	union pds_core_dev_comp comp = { 0 };
+	union pds_core_dev_cmd cmd = {
+		.vf_setattr.opcode = PDS_CORE_CMD_VF_SETATTR,
+		.vf_setattr.attr = vfc->attr,
+		.vf_setattr.vf_index = cpu_to_le16(vf),
+	};
+	int err;
+
+	if (vf >= pdsc->num_vfs)
+		return -EINVAL;
+
+	memcpy(cmd.vf_setattr.pad, vfc->pad, sizeof(vfc->pad));
+
+	err = pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+
+	return err;
+}
+
+static int pdsc_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
+{
+	struct pds_core_vf_setattr_cmd vfc = { .attr = PDS_CORE_VF_ATTR_MAC };
+	struct pdsc *pdsc = *(struct pdsc **)netdev_priv(netdev);
+	int err;
+
+	if (!(is_zero_ether_addr(mac) || is_valid_ether_addr(mac)))
+		return -EINVAL;
+
+	if (!netif_device_present(netdev))
+		return -EBUSY;
+
+	down_write(&pdsc->vf_op_lock);
+
+	if (vf >= pci_num_vf(pdsc->pdev) || !pdsc->vfs) {
+		err = -EINVAL;
+	} else {
+		ether_addr_copy(vfc.macaddr, mac);
+		dev_dbg(pdsc->dev, "%s: vf %d macaddr %pM\n",
+			__func__, vf, vfc.macaddr);
+
+		err = pdsc_set_vf_config(pdsc, vf, &vfc);
+		if (!err)
+			ether_addr_copy(pdsc->vfs[vf].macaddr, mac);
+	}
+
+	up_write(&pdsc->vf_op_lock);
+	return err;
+}
+
+static int pdsc_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan,
+			    u8 qos, __be16 proto)
+{
+	struct pds_core_vf_setattr_cmd vfc = { .attr = PDS_CORE_VF_ATTR_VLAN };
+	struct pdsc *pdsc = *(struct pdsc **)netdev_priv(netdev);
+	int err;
+
+	/* until someday when we support qos */
+	if (qos)
+		return -EINVAL;
+
+	if (vlan > 4095)
+		return -EINVAL;
+
+	if (!netif_device_present(netdev))
+		return -EBUSY;
+
+	down_write(&pdsc->vf_op_lock);
+
+	if (vf >= pci_num_vf(pdsc->pdev) || !pdsc->vfs) {
+		err = -EINVAL;
+	} else {
+		vfc.vlanid = cpu_to_le16(vlan);
+		dev_dbg(pdsc->dev, "%s: vf %d vlan %d\n",
+			__func__, vf, le16_to_cpu(vfc.vlanid));
+
+		err = pdsc_set_vf_config(pdsc, vf, &vfc);
+		if (!err)
+			pdsc->vfs[vf].vlanid = cpu_to_le16(vlan);
+	}
+
+	up_write(&pdsc->vf_op_lock);
+	return err;
+}
+
+static int pdsc_set_vf_trust(struct net_device *netdev, int vf, bool set)
+{
+	struct pds_core_vf_setattr_cmd vfc = { .attr = PDS_CORE_VF_ATTR_TRUST };
+	struct pdsc *pdsc = *(struct pdsc **)netdev_priv(netdev);
+	u8 data = set;  /* convert to u8 for config */
+	int err;
+
+	if (!netif_device_present(netdev))
+		return -EBUSY;
+
+	down_write(&pdsc->vf_op_lock);
+
+	if (vf >= pci_num_vf(pdsc->pdev) || !pdsc->vfs) {
+		err = -EINVAL;
+	} else {
+		vfc.trust = set;
+		dev_dbg(pdsc->dev, "%s: vf %d trust %d\n",
+			__func__, vf, vfc.trust);
+
+		err = pdsc_set_vf_config(pdsc, vf, &vfc);
+		if (!err)
+			pdsc->vfs[vf].trusted = data;
+	}
+
+	up_write(&pdsc->vf_op_lock);
+	return err;
+}
+
+static int pdsc_set_vf_rate(struct net_device *netdev, int vf,
+			    int tx_min, int tx_max)
+{
+	struct pds_core_vf_setattr_cmd vfc = { .attr = PDS_CORE_VF_ATTR_RATE };
+	struct pdsc *pdsc = *(struct pdsc **)netdev_priv(netdev);
+	int err;
+
+	/* setting the min just seems silly */
+	if (tx_min)
+		return -EINVAL;
+
+	if (!netif_device_present(netdev))
+		return -EBUSY;
+
+	down_write(&pdsc->vf_op_lock);
+
+	if (vf >= pci_num_vf(pdsc->pdev) || !pdsc->vfs) {
+		err = -EINVAL;
+	} else {
+		vfc.maxrate = cpu_to_le32(tx_max);
+		dev_dbg(pdsc->dev, "%s: vf %d maxrate %d\n",
+			__func__, vf, le32_to_cpu(vfc.maxrate));
+
+		err = pdsc_set_vf_config(pdsc, vf, &vfc);
+		if (!err)
+			pdsc->vfs[vf].maxrate = cpu_to_le32(tx_max);
+	}
+
+	up_write(&pdsc->vf_op_lock);
+	return err;
+}
+
+static int pdsc_set_vf_spoofchk(struct net_device *netdev, int vf, bool set)
+{
+	struct pds_core_vf_setattr_cmd vfc = { .attr = PDS_CORE_VF_ATTR_SPOOFCHK };
+	struct pdsc *pdsc = *(struct pdsc **)netdev_priv(netdev);
+	u8 data = set;  /* convert to u8 for config */
+	int err;
+
+	if (!netif_device_present(netdev))
+		return -EBUSY;
+
+	down_write(&pdsc->vf_op_lock);
+
+	if (vf >= pci_num_vf(pdsc->pdev) || !pdsc->vfs) {
+		err = -EINVAL;
+	} else {
+		vfc.spoofchk = set;
+		dev_dbg(pdsc->dev, "%s: vf %d spoof %d\n",
+			__func__, vf, vfc.spoofchk);
+
+		err = pdsc_set_vf_config(pdsc, vf, &vfc);
+		if (!err)
+			pdsc->vfs[vf].spoofchk = data;
+	}
+
+	up_write(&pdsc->vf_op_lock);
+	return err;
+}
+
+static int pdsc_set_vf_link_state(struct net_device *netdev, int vf, int set)
+{
+	struct pds_core_vf_setattr_cmd vfc = { .attr = PDS_CORE_VF_ATTR_LINKSTATE };
+	struct pdsc *pdsc = *(struct pdsc **)netdev_priv(netdev);
+	u8 data;
+	int err;
+
+	switch (set) {
+	case IFLA_VF_LINK_STATE_ENABLE:
+		data = PDS_CORE_VF_LINK_STATUS_UP;
+		break;
+	case IFLA_VF_LINK_STATE_DISABLE:
+		data = PDS_CORE_VF_LINK_STATUS_DOWN;
+		break;
+	case IFLA_VF_LINK_STATE_AUTO:
+		data = PDS_CORE_VF_LINK_STATUS_AUTO;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (!netif_device_present(netdev))
+		return -EBUSY;
+
+	down_write(&pdsc->vf_op_lock);
+
+	if (vf >= pci_num_vf(pdsc->pdev) || !pdsc->vfs) {
+		err = -EINVAL;
+	} else {
+		vfc.linkstate = data;
+		dev_dbg(pdsc->dev, "%s: vf %d linkstate %d\n",
+			__func__, vf, vfc.linkstate);
+
+		err = pdsc_set_vf_config(pdsc, vf, &vfc);
+		if (!err)
+			pdsc->vfs[vf].linkstate = set;
+	}
+
+	up_write(&pdsc->vf_op_lock);
+	return err;
+}
+
+static const struct net_device_ops pdsc_netdev_ops = {
+	.ndo_set_vf_vlan	= pdsc_set_vf_vlan,
+	.ndo_set_vf_mac		= pdsc_set_vf_mac,
+	.ndo_set_vf_trust	= pdsc_set_vf_trust,
+	.ndo_set_vf_rate	= pdsc_set_vf_rate,
+	.ndo_set_vf_spoofchk	= pdsc_set_vf_spoofchk,
+	.ndo_set_vf_link_state	= pdsc_set_vf_link_state,
+	.ndo_get_vf_config	= pdsc_get_vf_config,
+	.ndo_get_vf_stats       = pdsc_get_vf_stats,
+};
+
+static void pdsc_get_drvinfo(struct net_device *netdev,
+			     struct ethtool_drvinfo *drvinfo)
+{
+	struct pdsc *pdsc = *(struct pdsc **)netdev_priv(netdev);
+
+	strscpy(drvinfo->driver, PDS_CORE_DRV_NAME, sizeof(drvinfo->driver));
+	strscpy(drvinfo->fw_version, pdsc->dev_info.fw_version, sizeof(drvinfo->fw_version));
+	strscpy(drvinfo->bus_info, pci_name(pdsc->pdev), sizeof(drvinfo->bus_info));
+}
+
+static const struct ethtool_ops pdsc_ethtool_ops = {
+	.get_drvinfo = pdsc_get_drvinfo,
+};
+
+int pdsc_init_netdev(struct pdsc *pdsc)
+{
+	struct pdsc **p;
+
+	pdsc->netdev = alloc_netdev(sizeof(struct pdsc *), "pdsc%d",
+				    NET_NAME_UNKNOWN, ether_setup);
+	SET_NETDEV_DEV(pdsc->netdev, pdsc->dev);
+	pdsc->netdev->netdev_ops = &pdsc_netdev_ops;
+	pdsc->netdev->ethtool_ops = &pdsc_ethtool_ops;
+
+	p = netdev_priv(pdsc->netdev);
+	*p = pdsc;
+
+	netif_carrier_off(pdsc->netdev);
+
+	return register_netdev(pdsc->netdev);
+}
+
+void pdsc_vf_attr_replay(struct pdsc *pdsc)
+{
+	struct pds_core_vf_setattr_cmd vfc;
+	struct pdsc_vf *v;
+	int i;
+
+	if (!pdsc->vfs)
+		return;
+
+	down_read(&pdsc->vf_op_lock);
+
+	for (i = 0; i < pdsc->num_vfs; i++) {
+		v = &pdsc->vfs[i];
+
+		if (v->stats_pa) {
+			vfc.attr = PDS_CORE_VF_ATTR_STATSADDR;
+			vfc.stats.len = cpu_to_le32(sizeof(v->stats));
+			vfc.stats.pa = cpu_to_le64(v->stats_pa);
+			pdsc_set_vf_config(pdsc, i, &vfc);
+			vfc.stats.pa = 0;
+			vfc.stats.len = 0;
+		}
+
+		if (!is_zero_ether_addr(v->macaddr)) {
+			vfc.attr = PDS_CORE_VF_ATTR_MAC;
+			ether_addr_copy(vfc.macaddr, v->macaddr);
+			pdsc_set_vf_config(pdsc, i, &vfc);
+			eth_zero_addr(vfc.macaddr);
+		}
+
+		if (v->vlanid) {
+			vfc.attr = PDS_CORE_VF_ATTR_VLAN;
+			vfc.vlanid = v->vlanid;
+			pdsc_set_vf_config(pdsc, i, &vfc);
+			vfc.vlanid = 0;
+		}
+
+		if (v->maxrate) {
+			vfc.attr = PDS_CORE_VF_ATTR_RATE;
+			vfc.maxrate = v->maxrate;
+			pdsc_set_vf_config(pdsc, i, &vfc);
+			vfc.maxrate = 0;
+		}
+
+		if (v->spoofchk) {
+			vfc.attr = PDS_CORE_VF_ATTR_SPOOFCHK;
+			vfc.spoofchk = v->spoofchk;
+			pdsc_set_vf_config(pdsc, i, &vfc);
+			vfc.spoofchk = 0;
+		}
+
+		if (v->trusted) {
+			vfc.attr = PDS_CORE_VF_ATTR_TRUST;
+			vfc.trust = v->trusted;
+			pdsc_set_vf_config(pdsc, i, &vfc);
+			vfc.trust = 0;
+		}
+
+		if (v->linkstate) {
+			vfc.attr = PDS_CORE_VF_ATTR_LINKSTATE;
+			vfc.linkstate = v->linkstate;
+			pdsc_set_vf_config(pdsc, i, &vfc);
+			vfc.linkstate = 0;
+		}
+	}
+
+	up_read(&pdsc->vf_op_lock);
+
+	pds_devcmd_vf_start(pdsc);
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [RFC PATCH net-next 09/19] pds_core: add auxiliary_bus devices
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
                   ` (7 preceding siblings ...)
  2022-11-18 22:56 ` [RFC PATCH net-next 08/19] pds_core: initial VF configuration Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 10/19] pds_core: devlink params for enabling VIF support Shannon Nelson
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

An auxiliary_bus device is created for each VF, and the device
name is made up of the PF driver name, VIF name, and PCI BDF
of the VF.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 .../net/ethernet/pensando/pds_core/Makefile   |   1 +
 .../net/ethernet/pensando/pds_core/auxbus.c   | 126 ++++++++++++++++++
 drivers/net/ethernet/pensando/pds_core/core.h |   3 +
 drivers/net/ethernet/pensando/pds_core/main.c |  18 +++
 include/linux/pds/pds_auxbus.h                |  37 +++++
 5 files changed, 185 insertions(+)
 create mode 100644 drivers/net/ethernet/pensando/pds_core/auxbus.c
 create mode 100644 include/linux/pds/pds_auxbus.h

diff --git a/drivers/net/ethernet/pensando/pds_core/Makefile b/drivers/net/ethernet/pensando/pds_core/Makefile
index ee794cc08fda..22f23874354e 100644
--- a/drivers/net/ethernet/pensando/pds_core/Makefile
+++ b/drivers/net/ethernet/pensando/pds_core/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_PDS_CORE) := pds_core.o
 
 pds_core-y := main.o \
 	      devlink.o \
+	      auxbus.o \
 	      dev.o \
 	      adminq.o \
 	      core.o \
diff --git a/drivers/net/ethernet/pensando/pds_core/auxbus.c b/drivers/net/ethernet/pensando/pds_core/auxbus.c
new file mode 100644
index 000000000000..9b67cb4006d9
--- /dev/null
+++ b/drivers/net/ethernet/pensando/pds_core/auxbus.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/pci.h>
+
+#include "core.h"
+
+#include <linux/pds/pds_adminq.h>
+#include <linux/pds/pds_auxbus.h>
+
+static void pdsc_auxbus_dev_release(struct device *dev)
+{
+	struct pds_auxiliary_dev *padev =
+		container_of(dev, struct pds_auxiliary_dev, aux_dev.dev);
+
+	devm_kfree(dev->parent, padev);
+}
+
+static struct pds_auxiliary_dev *pdsc_auxbus_dev_register(struct pdsc *pdsc,
+							  char *name, u32 id,
+							  struct pci_dev *client_dev)
+{
+	struct auxiliary_device *aux_dev;
+	struct pds_auxiliary_dev *padev;
+	int err;
+
+	padev = devm_kzalloc(pdsc->dev, sizeof(*padev), GFP_KERNEL);
+	if (!padev)
+		return NULL;
+
+	padev->pcidev = client_dev;
+
+	aux_dev = &padev->aux_dev;
+	aux_dev->name = name;
+	aux_dev->id = id;
+	padev->id = id;
+	aux_dev->dev.parent = pdsc->dev;
+	aux_dev->dev.release = pdsc_auxbus_dev_release;
+
+	err = auxiliary_device_init(aux_dev);
+	if (err < 0) {
+		dev_warn(pdsc->dev, "auxiliary_device_init of %s id %d failed: %pe\n",
+			 name, id, ERR_PTR(err));
+		goto err_out;
+	}
+
+	err = auxiliary_device_add(aux_dev);
+	if (err) {
+		auxiliary_device_uninit(aux_dev);
+		dev_warn(pdsc->dev, "auxiliary_device_add of %s id %d failed: %pe\n",
+			 name, id, ERR_PTR(err));
+		goto err_out;
+	}
+
+	dev_dbg(pdsc->dev, "%s: name %s id %d pdsc %p\n",
+		__func__, padev->aux_dev.name, id, pdsc);
+
+	return padev;
+
+err_out:
+	devm_kfree(pdsc->dev, padev);
+	return NULL;
+}
+
+int pdsc_auxbus_dev_add_vf(struct pdsc *pdsc, int vf_id)
+{
+	struct pds_auxiliary_dev *padev;
+	enum pds_core_vif_types vt;
+	int err = 0;
+
+	if (!pdsc->vfs)
+		return -ENOTTY;
+
+	if (vf_id >= pdsc->num_vfs)
+		return -ERANGE;
+
+	if (pdsc->vfs[vf_id].padev) {
+		dev_info(pdsc->dev, "%s: vfid %d already running\n", __func__, vf_id);
+		return -ENODEV;
+	}
+
+	for (vt = 0; vt < PDS_DEV_TYPE_MAX; vt++) {
+		u16 vt_support;
+		u32 id;
+
+		/* Verify that the type is supported and enabled */
+		vt_support = !!le16_to_cpu(pdsc->dev_ident.vif_types[vt]);
+		if (!(vt_support &&
+		      pdsc->viftype_status[vt].supported &&
+		      pdsc->viftype_status[vt].enabled))
+			continue;
+
+		/* TODO: EXPORT pci_iov_virtfn_bus()
+		 *       so we don't need to assume the VF is on the same bus
+		 */
+		id = PCI_DEVID(pdsc->pdev->bus->number,
+			       pci_iov_virtfn_devfn(pdsc->pdev, vf_id));
+		padev = pdsc_auxbus_dev_register(pdsc, pdsc->viftype_status[vt].name, id,
+						 pdsc->pdev);
+		pdsc->vfs[vf_id].padev = padev;
+
+		/* We only support a single type per VF, so jump out here */
+		break;
+	}
+
+	return err;
+}
+
+int pdsc_auxbus_dev_del_vf(struct pdsc *pdsc, int vf_id)
+{
+	struct pds_auxiliary_dev *padev;
+
+	dev_info(pdsc->dev, "%s: vfid %d\n", __func__, vf_id);
+
+	padev = pdsc->vfs[vf_id].padev;
+	pdsc->vfs[vf_id].padev = NULL;
+	if (padev) {
+		auxiliary_device_delete(&padev->aux_dev);
+		auxiliary_device_uninit(&padev->aux_dev);
+	}
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/pensando/pds_core/core.h b/drivers/net/ethernet/pensando/pds_core/core.h
index 07499a8aae21..3ab314217464 100644
--- a/drivers/net/ethernet/pensando/pds_core/core.h
+++ b/drivers/net/ethernet/pensando/pds_core/core.h
@@ -321,6 +321,9 @@ int pdsc_start(struct pdsc *pdsc);
 void pdsc_stop(struct pdsc *pdsc);
 void pdsc_health_thread(struct work_struct *work);
 
+int pdsc_auxbus_dev_add_vf(struct pdsc *pdsc, int vf_id);
+int pdsc_auxbus_dev_del_vf(struct pdsc *pdsc, int vf_id);
+
 void pdsc_process_adminq(struct pdsc_qcq *qcq);
 void pdsc_work_thread(struct work_struct *work);
 irqreturn_t pdsc_adminq_isr(int irq, void *data);
diff --git a/drivers/net/ethernet/pensando/pds_core/main.c b/drivers/net/ethernet/pensando/pds_core/main.c
index 36e330c49360..95d2d25a0919 100644
--- a/drivers/net/ethernet/pensando/pds_core/main.c
+++ b/drivers/net/ethernet/pensando/pds_core/main.c
@@ -170,6 +170,8 @@ static int pdsc_sriov_configure(struct pci_dev *pdev, int num_vfs)
 	struct pds_core_vf_setattr_cmd vfc = { .attr = PDS_CORE_VF_ATTR_STATSADDR };
 	struct pdsc *pdsc = pci_get_drvdata(pdev);
 	struct device *dev = pdsc->dev;
+	enum pds_core_vif_types vt;
+	bool enabled = false;
 	struct pdsc_vf *v;
 	int ret = 0;
 	int i;
@@ -200,9 +202,21 @@ static int pdsc_sriov_configure(struct pci_dev *pdev, int num_vfs)
 			goto no_vfs;
 		}
 
+		/* If any VF types are enabled, start the VF aux devices */
+		for (vt = 0; vt < PDS_DEV_TYPE_MAX && !enabled; vt++)
+			enabled = pdsc->viftype_status[vt].supported &&
+				  pdsc->viftype_status[vt].enabled;
+		if (enabled)
+			for (i = 0; i < num_vfs; i++)
+				pdsc_auxbus_dev_add_vf(pdsc, i);
+
 		return num_vfs;
 	}
 
+	i = pci_num_vf(pdev);
+	while (i--)
+		pdsc_auxbus_dev_del_vf(pdsc, i);
+
 no_vfs:
 	pci_disable_sriov(pdev);
 
@@ -362,6 +376,10 @@ static void pdsc_remove(struct pci_dev *pdev)
 	 */
 	pdsc_dl_unregister(pdsc);
 
+	/* Remove the VFs and their aux_bus connections before other
+	 * cleanup so that the clients can use the AdminQ to cleanly
+	 * shut themselves down.
+	 */
 	pdsc_sriov_configure(pdev, 0);
 
 	/* Now we can lock it up and tear it down */
diff --git a/include/linux/pds/pds_auxbus.h b/include/linux/pds/pds_auxbus.h
new file mode 100644
index 000000000000..7ad66d726b01
--- /dev/null
+++ b/include/linux/pds/pds_auxbus.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+
+#ifndef _PDSC_AUXBUS_H_
+#define _PDSC_AUXBUS_H_
+
+#include <linux/auxiliary_bus.h>
+
+struct pds_auxiliary_dev;
+
+struct pds_auxiliary_drv {
+
+	/* .event_handler() - callback for receiving events
+	 * padev:  ptr to the client device info
+	 * event:  ptr to event data
+	 * The client can provide an event handler callback that can
+	 * receive DSC events.  The Core driver may generate its
+	 * own events which can notify the client of changes in the
+	 * DSC status, such as a RESET event generated when the Core
+	 * has lost contact with the FW - in this case the event.eid
+	 * field will be 0.
+	 */
+	void (*event_handler)(struct pds_auxiliary_dev *padev,
+			      union pds_core_notifyq_comp *event);
+};
+
+struct pds_auxiliary_dev {
+	struct auxiliary_device aux_dev;
+	struct pci_dev *pcidev;
+	u32 id;
+	u16 client_id;
+	void (*event_handler)(struct pds_auxiliary_dev *padev,
+			      union pds_core_notifyq_comp *event);
+	void *priv;
+};
+#endif /* _PDSC_AUXBUS_H_ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [RFC PATCH net-next 10/19] pds_core: devlink params for enabling VIF support
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
                   ` (8 preceding siblings ...)
  2022-11-18 22:56 ` [RFC PATCH net-next 09/19] pds_core: add auxiliary_bus devices Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-28 18:29   ` Jakub Kicinski
  2022-11-18 22:56 ` [RFC PATCH net-next 11/19] pds_core: add the aux client API Shannon Nelson
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

Now that we have the code to start and stop the VFs and
set up the auxiliary_bus devices, let's add the devlink
parameter switches so the user can enable the features.
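
With these parameters in place, a VIF type can be toggled at runtime
with the devlink tool.  The PCI address below is a hypothetical
example; "enable_vnet" is the generic ENABLE_VNET parameter and
"enable_lm" is the driver-specific parameter added here:

```
devlink dev param show pci/0000:3b:00.0 name enable_vnet
devlink dev param set pci/0000:3b:00.0 name enable_vnet \
        value true cmode runtime
devlink dev param set pci/0000:3b:00.0 name enable_lm \
        value true cmode runtime
```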

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 .../net/ethernet/pensando/pds_core/devlink.c  | 99 +++++++++++++++++++
 1 file changed, 99 insertions(+)

diff --git a/drivers/net/ethernet/pensando/pds_core/devlink.c b/drivers/net/ethernet/pensando/pds_core/devlink.c
index 0568e8b7391c..2d09643b9add 100644
--- a/drivers/net/ethernet/pensando/pds_core/devlink.c
+++ b/drivers/net/ethernet/pensando/pds_core/devlink.c
@@ -8,6 +8,75 @@
 
 #include "core.h"
 
+static struct pdsc_viftype *pdsc_dl_find_viftype_by_id(struct pdsc *pdsc,
+						       enum devlink_param_type dl_id)
+{
+	int vt;
+
+	for (vt = 0; vt < PDS_DEV_TYPE_MAX; vt++) {
+		if (pdsc->viftype_status[vt].dl_id == dl_id)
+			return &pdsc->viftype_status[vt];
+	}
+
+	return NULL;
+}
+
+static int pdsc_dl_enable_get(struct devlink *dl, u32 id,
+			      struct devlink_param_gset_ctx *ctx)
+{
+	struct pdsc *pdsc = devlink_priv(dl);
+	struct pdsc_viftype *vt_entry;
+
+	vt_entry = pdsc_dl_find_viftype_by_id(pdsc, id);
+	if (!vt_entry)
+		return -ENOENT;
+
+	ctx->val.vbool = vt_entry->enabled;
+
+	return 0;
+}
+
+static int pdsc_dl_enable_set(struct devlink *dl, u32 id,
+			      struct devlink_param_gset_ctx *ctx)
+{
+	struct pdsc *pdsc = devlink_priv(dl);
+	struct pdsc_viftype *vt_entry;
+	int err = 0;
+	int vf;
+
+	vt_entry = pdsc_dl_find_viftype_by_id(pdsc, id);
+	if (!vt_entry || !vt_entry->supported)
+		return -EOPNOTSUPP;
+
+	if (vt_entry->enabled == ctx->val.vbool)
+		return 0;
+
+	vt_entry->enabled = ctx->val.vbool;
+	for (vf = 0; vf < pdsc->num_vfs; vf++) {
+		err = ctx->val.vbool ? pdsc_auxbus_dev_add_vf(pdsc, vf) :
+				       pdsc_auxbus_dev_del_vf(pdsc, vf);
+	}
+
+	return err;
+}
+
+static int pdsc_dl_enable_validate(struct devlink *dl, u32 id,
+				   union devlink_param_value val,
+				   struct netlink_ext_ack *extack)
+{
+	struct pdsc *pdsc = devlink_priv(dl);
+	struct pdsc_viftype *vt_entry;
+
+	vt_entry = pdsc_dl_find_viftype_by_id(pdsc, id);
+	if (!vt_entry || !vt_entry->supported)
+		return -EOPNOTSUPP;
+
+	if (!pdsc->viftype_status[vt_entry->vif_id].supported)
+		return -ENODEV;
+
+	return 0;
+}
+
 static char *slot_labels[] = { "fw.gold", "fw.mainfwa", "fw.mainfwb" };
 
 static int pdsc_dl_fw_boot_get(struct devlink *dl, u32 id,
@@ -84,6 +153,18 @@ static int pdsc_dl_fw_boot_validate(struct devlink *dl, u32 id,
 }
 
 static const struct devlink_param pdsc_dl_params[] = {
+	DEVLINK_PARAM_GENERIC(ENABLE_VNET,
+			      BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			      pdsc_dl_enable_get,
+			      pdsc_dl_enable_set,
+			      pdsc_dl_enable_validate),
+	DEVLINK_PARAM_DRIVER(PDSC_DEVLINK_PARAM_ID_LM,
+			     "enable_lm",
+			     DEVLINK_PARAM_TYPE_BOOL,
+			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			     pdsc_dl_enable_get,
+			     pdsc_dl_enable_set,
+			     pdsc_dl_enable_validate),
 	DEVLINK_PARAM_DRIVER(PDSC_DEVLINK_PARAM_ID_FW_BOOT,
 			     "boot_fw",
 			     DEVLINK_PARAM_TYPE_STRING,
@@ -93,6 +174,23 @@ static const struct devlink_param pdsc_dl_params[] = {
 			     pdsc_dl_fw_boot_validate),
 };
 
+static void pdsc_dl_set_params_init_values(struct devlink *dl)
+{
+	struct pdsc *pdsc = devlink_priv(dl);
+	union devlink_param_value value;
+	int vt;
+
+	for (vt = 0; vt < PDS_DEV_TYPE_MAX; vt++) {
+		if (!pdsc->viftype_status[vt].dl_id)
+			continue;
+
+		value.vbool = pdsc->viftype_status[vt].enabled;
+		devlink_param_driverinit_value_set(dl,
+						   pdsc->viftype_status[vt].dl_id,
+						   value);
+	}
+}
+
 static int pdsc_dl_flash_update(struct devlink *dl,
 				struct devlink_flash_update_params *params,
 				struct netlink_ext_ack *extack)
@@ -195,6 +293,7 @@ int pdsc_dl_register(struct pdsc *pdsc)
 				      ARRAY_SIZE(pdsc_dl_params));
 	if (err)
 		return err;
+	pdsc_dl_set_params_init_values(dl);
 
 	devlink_register(dl);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [RFC PATCH net-next 11/19] pds_core: add the aux client API
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
                   ` (9 preceding siblings ...)
  2022-11-18 22:56 ` [RFC PATCH net-next 10/19] pds_core: devlink params for enabling VIF support Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 12/19] pds_core: publish events to the clients Shannon Nelson
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

Add the client API operations for registering, unregistering,
and running adminq commands.  We expect to add operations for
other clients in the future, including requests for additional
private adminqs and IRQs, but don't have the need for those yet.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 .../net/ethernet/pensando/pds_core/auxbus.c   | 144 +++++++++++++++++-
 include/linux/pds/pds_auxbus.h                |  51 +++++++
 2 files changed, 193 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/pensando/pds_core/auxbus.c b/drivers/net/ethernet/pensando/pds_core/auxbus.c
index 9b67cb4006d9..1fcfe8ae9971 100644
--- a/drivers/net/ethernet/pensando/pds_core/auxbus.c
+++ b/drivers/net/ethernet/pensando/pds_core/auxbus.c
@@ -11,6 +11,144 @@
 #include <linux/pds/pds_adminq.h>
 #include <linux/pds/pds_auxbus.h>
 
+/**
+ * pds_client_register - Register the client with the device
+ * @padev:  ptr to the client device info
+ * @padrv:  ptr to the client driver info
+ *
+ * Register the client with the core and with the DSC.  The core
+ * will fill in the client padev->client_id for use in calls
+ * to the DSC AdminQ.
+ */
+static int pds_client_register(struct pds_auxiliary_dev *padev,
+			       struct pds_auxiliary_drv *padrv)
+{
+	union pds_core_adminq_comp comp = { 0 };
+	union pds_core_adminq_cmd cmd = { 0 };
+	struct device *dev;
+	struct pdsc *pdsc;
+	int err;
+	u16 ci;
+
+	pdsc = (struct pdsc *)dev_get_drvdata(padev->aux_dev.dev.parent);
+	dev = pdsc->dev;
+
+	if (pdsc->state)
+		return -ENXIO;
+
+	cmd.client_reg.opcode = PDS_AQ_CMD_CLIENT_REG;
+	strscpy(cmd.client_reg.devname, dev_name(&padev->aux_dev.dev),
+		sizeof(cmd.client_reg.devname));
+
+	err = pdsc_adminq_post(pdsc, &pdsc->adminqcq, &cmd, &comp, false);
+	if (err) {
+		dev_info(dev, "register dev_name %s with DSC failed, status %d: %pe\n",
+			 dev_name(&padev->aux_dev.dev), comp.status, ERR_PTR(err));
+		return err;
+	}
+
+	ci = le16_to_cpu(comp.client_reg.client_id);
+	if (!ci) {
+		dev_err(dev, "%s: device returned null client_id\n", __func__);
+		return -EIO;
+	}
+
+	padev->client_id = ci;
+	padev->event_handler = padrv->event_handler;
+
+	return 0;
+}
+
+/**
+ * pds_client_unregister - Disconnect the client from the device
+ * @padev:  ptr to the client device info
+ *
+ * Disconnect the client from the core and from the DSC.
+ */
+static int pds_client_unregister(struct pds_auxiliary_dev *padev)
+{
+	union pds_core_adminq_comp comp = { 0 };
+	union pds_core_adminq_cmd cmd = { 0 };
+	struct device *dev;
+	struct pdsc *pdsc;
+	int err;
+
+	pdsc = (struct pdsc *)dev_get_drvdata(padev->aux_dev.dev.parent);
+	dev = pdsc->dev;
+
+	if (pdsc->state)
+		return -ENXIO;
+
+	cmd.client_unreg.opcode = PDS_AQ_CMD_CLIENT_UNREG;
+	cmd.client_unreg.client_id = cpu_to_le16(padev->client_id);
+
+	err = pdsc_adminq_post(pdsc, &pdsc->adminqcq, &cmd, &comp, false);
+	if (err)
+		dev_info(dev, "unregister dev_name %s failed, status %d: %pe\n",
+			 dev_name(&padev->aux_dev.dev), comp.status, ERR_PTR(err));
+
+	padev->client_id = 0;
+
+	return err;
+}
+
+/**
+ * pds_client_adminq_cmd - Process an adminq request for the client
+ * @padev:   ptr to the client device
+ * @req:     ptr to buffer with request
+ * @req_len: length of actual struct used for request
+ * @resp:    ptr to buffer where answer is to be copied
+ * @flags:   optional flags from pds_core_adminq_flags
+ *
+ * Return: 0 on success, or
+ *         negative for error
+ *
+ * Client sends pointers to request and response buffers
+ * Core copies request data into pds_core_client_request_cmd
+ * Core sets other fields as needed
+ * Core posts to AdminQ
+ * Core copies completion data into response buffer
+ */
+static int pds_client_adminq_cmd(struct pds_auxiliary_dev *padev,
+				 union pds_core_adminq_cmd *req,
+				 size_t req_len,
+				 union pds_core_adminq_comp *resp,
+				 u64 flags)
+{
+	union pds_core_adminq_cmd cmd = { 0 };
+	struct device *dev;
+	struct pdsc *pdsc;
+	size_t cp_len;
+	int err;
+
+	pdsc = (struct pdsc *)dev_get_drvdata(padev->aux_dev.dev.parent);
+	dev = pdsc->dev;
+
+	dev_dbg(dev, "%s: %s opcode %d\n",
+		__func__, dev_name(&padev->aux_dev.dev), req->opcode);
+
+	if (pdsc->state)
+		return -ENXIO;
+
+	/* Wrap the client's request */
+	cmd.client_request.opcode = PDS_AQ_CMD_CLIENT_CMD;
+	cmd.client_request.client_id = cpu_to_le16(padev->client_id);
+	cp_len = min_t(size_t, req_len, sizeof(cmd.client_request.client_cmd));
+	memcpy(cmd.client_request.client_cmd, req, cp_len);
+
+	err = pdsc_adminq_post(pdsc, &pdsc->adminqcq, &cmd, resp, !!(flags & PDS_AQ_FLAG_FASTPOLL));
+	if (err && err != -EAGAIN)
+		dev_info(dev, "client admin cmd failed: %pe\n", ERR_PTR(err));
+
+	return err;
+}
+
+static struct pds_core_ops pds_core_ops = {
+	.register_client = pds_client_register,
+	.unregister_client = pds_client_unregister,
+	.adminq_cmd = pds_client_adminq_cmd,
+};
+
 static void pdsc_auxbus_dev_release(struct device *dev)
 {
 	struct pds_auxiliary_dev *padev =
@@ -21,7 +159,8 @@ static void pdsc_auxbus_dev_release(struct device *dev)
 
 static struct pds_auxiliary_dev *pdsc_auxbus_dev_register(struct pdsc *pdsc,
 							  char *name, u32 id,
-							  struct pci_dev *client_dev)
+							  struct pci_dev *client_dev,
+							  struct pds_core_ops *ops)
 {
 	struct auxiliary_device *aux_dev;
 	struct pds_auxiliary_dev *padev;
@@ -31,6 +170,7 @@ static struct pds_auxiliary_dev *pdsc_auxbus_dev_register(struct pdsc *pdsc,
 	if (!padev)
 		return NULL;
 
+	padev->ops = ops;
 	padev->pcidev = client_dev;
 
 	aux_dev = &padev->aux_dev;
@@ -99,7 +239,7 @@ int pdsc_auxbus_dev_add_vf(struct pdsc *pdsc, int vf_id)
 		id = PCI_DEVID(pdsc->pdev->bus->number,
 			       pci_iov_virtfn_devfn(pdsc->pdev, vf_id));
 		padev = pdsc_auxbus_dev_register(pdsc, pdsc->viftype_status[vt].name, id,
-						 pdsc->pdev);
+						 pdsc->pdev, &pds_core_ops);
 		pdsc->vfs[vf_id].padev = padev;
 
 		/* We only support a single type per VF, so jump out here */
diff --git a/include/linux/pds/pds_auxbus.h b/include/linux/pds/pds_auxbus.h
index 7ad66d726b01..ac121b44c71a 100644
--- a/include/linux/pds/pds_auxbus.h
+++ b/include/linux/pds/pds_auxbus.h
@@ -27,6 +27,7 @@ struct pds_auxiliary_drv {
 
 struct pds_auxiliary_dev {
 	struct auxiliary_device aux_dev;
+	struct pds_core_ops *ops;
 	struct pci_dev *pcidev;
 	u32 id;
 	u16 client_id;
@@ -34,4 +35,54 @@ struct pds_auxiliary_dev {
 			      union pds_core_notifyq_comp *event);
 	void *priv;
 };
+
+struct pds_fw_state {
+	unsigned long last_fw_time;
+	u32 fw_heartbeat;
+	u8  fw_status;
+};
+
+/*
+ *   ptrs to functions to be used by the client for core services
+ */
+struct pds_core_ops {
+
+	/* .register_client() - register the client with the device
+	 * padev:  ptr to the client device info
+	 * padrv:  ptr to the client driver info
+	 * Register the client with the core and with the DSC.  The core
+	 * will fill in the client padev->client_id for use in calls
+	 * to the DSC AdminQ.
+	 */
+	int (*register_client)(struct pds_auxiliary_dev *padev,
+			       struct pds_auxiliary_drv *padrv);
+
+	/* .unregister_client() - disconnect the client from the device
+	 * padev:  ptr to the client device info
+	 * Disconnect the client from the core and from the DSC.
+	 */
+	int (*unregister_client)(struct pds_auxiliary_dev *padev);
+
+	/* .adminq_cmd() - process an adminq request for the client
+	 * padev:  ptr to the client device
+	 * req:     ptr to buffer with request
+	 * req_len: length of actual struct used for request
+	 * resp:    ptr to buffer where answer is to be copied
+	 * flags:   optional flags defined by enum pds_core_adminq_flags
+	 *	    and used for more flexible adminq behavior
+	 *
+	 * returns 0 on success, or
+	 *         negative for error
+	 * Client sends pointers to request and response buffers
+	 * Core copies request data into pds_core_client_request_cmd
+	 * Core sets other fields as needed
+	 * Core posts to AdminQ
+	 * Core copies completion data into response buffer
+	 */
+	int (*adminq_cmd)(struct pds_auxiliary_dev *padev,
+			  union pds_core_adminq_cmd *req,
+			  size_t req_len,
+			  union pds_core_adminq_comp *resp,
+			  u64 flags);
+};
 #endif /* _PDSC_AUXBUS_H_ */
-- 
2.17.1



* [RFC PATCH net-next 12/19] pds_core: publish events to the clients
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
                   ` (10 preceding siblings ...)
  2022-11-18 22:56 ` [RFC PATCH net-next 11/19] pds_core: add the aux client API Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 13/19] pds_core: Kconfig and pds_core.rst Shannon Nelson
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

When the Core device gets an event from the device, or notices
that the device FW has come up or gone down, it needs to pass
those events on to the clients that have registered an event
handler.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 .../net/ethernet/pensando/pds_core/adminq.c   | 17 ++++++++
 .../net/ethernet/pensando/pds_core/auxbus.c   | 40 +++++++++++++++++++
 drivers/net/ethernet/pensando/pds_core/core.c | 15 +++++++
 drivers/net/ethernet/pensando/pds_core/core.h |  3 ++
 4 files changed, 75 insertions(+)

diff --git a/drivers/net/ethernet/pensando/pds_core/adminq.c b/drivers/net/ethernet/pensando/pds_core/adminq.c
index ba9e84a7ca92..4d2d69ce81f4 100644
--- a/drivers/net/ethernet/pensando/pds_core/adminq.c
+++ b/drivers/net/ethernet/pensando/pds_core/adminq.c
@@ -34,11 +34,13 @@ static int pdsc_process_notifyq(struct pdsc_qcq *qcq)
 		case PDS_EVENT_LINK_CHANGE:
 			dev_info(pdsc->dev, "NotifyQ LINK_CHANGE ecode %d eid %lld\n",
 				 ecode, eid);
+			pdsc_auxbus_publish(pdsc, PDSC_ALL_CLIENT_IDS, comp);
 			break;
 
 		case PDS_EVENT_RESET:
 			dev_info(pdsc->dev, "NotifyQ RESET ecode %d eid %lld\n",
 				 ecode, eid);
+			pdsc_auxbus_publish(pdsc, PDSC_ALL_CLIENT_IDS, comp);
 			break;
 
 		case PDS_EVENT_XCVR:
@@ -46,6 +48,21 @@ static int pdsc_process_notifyq(struct pdsc_qcq *qcq)
 				 ecode, eid);
 			break;
 
+		case PDS_EVENT_CLIENT:
+		{
+			struct pds_core_client_event *ce;
+			union pds_core_notifyq_comp *cc;
+			u16 client_id;
+
+			ce = (struct pds_core_client_event *)comp;
+			cc = (union pds_core_notifyq_comp *)&ce->client_event;
+			client_id = le16_to_cpu(ce->client_id);
+			dev_info(pdsc->dev, "NotifyQ CLIENT %d ecode %d eid %lld cc->ecode %d\n",
+				 client_id, ecode, eid, le16_to_cpu(cc->ecode));
+			pdsc_auxbus_publish(pdsc, client_id, cc);
+			break;
+		}
+
 		default:
 			dev_info(pdsc->dev, "NotifyQ ecode %d eid %lld\n",
 				 ecode, eid);
diff --git a/drivers/net/ethernet/pensando/pds_core/auxbus.c b/drivers/net/ethernet/pensando/pds_core/auxbus.c
index 1fcfe8ae9971..53c1164565b8 100644
--- a/drivers/net/ethernet/pensando/pds_core/auxbus.c
+++ b/drivers/net/ethernet/pensando/pds_core/auxbus.c
@@ -205,6 +205,46 @@ static struct pds_auxiliary_dev *pdsc_auxbus_dev_register(struct pdsc *pdsc,
 	return NULL;
 }
 
+static int pdsc_core_match(struct device *dev, const void *data)
+{
+	struct pds_auxiliary_dev *curr_padev;
+	struct pdsc *curr_pdsc;
+	const struct pdsc *pdsc;
+
+	/* Match the core device searching for its clients */
+	curr_padev = container_of(dev, struct pds_auxiliary_dev, aux_dev.dev);
+	curr_pdsc = (struct pdsc *)dev_get_drvdata(curr_padev->aux_dev.dev.parent);
+	pdsc = data;
+
+	if (curr_pdsc == pdsc)
+		return 1;
+
+	return 0;
+}
+
+int pdsc_auxbus_publish(struct pdsc *pdsc, u16 client_id,
+			union pds_core_notifyq_comp *event)
+{
+	struct pds_auxiliary_dev *padev;
+	struct auxiliary_device *aux_dev;
+
+	/* Search aux bus for this core's devices */
+	aux_dev = auxiliary_find_device(NULL, pdsc, pdsc_core_match);
+	while (aux_dev) {
+		padev = container_of(aux_dev, struct pds_auxiliary_dev, aux_dev);
+		if ((padev->client_id == client_id ||
+		     client_id == PDSC_ALL_CLIENT_IDS) &&
+		    padev->event_handler)
+			padev->event_handler(padev, event);
+		put_device(&aux_dev->dev);
+
+		aux_dev = auxiliary_find_device(&aux_dev->dev,
+						pdsc, pdsc_core_match);
+	}
+
+	return 0;
+}
+
 int pdsc_auxbus_dev_add_vf(struct pdsc *pdsc, int vf_id)
 {
 	struct pds_auxiliary_dev *padev;
diff --git a/drivers/net/ethernet/pensando/pds_core/core.c b/drivers/net/ethernet/pensando/pds_core/core.c
index 202f1a6b4605..d1ef6acd8dd0 100644
--- a/drivers/net/ethernet/pensando/pds_core/core.c
+++ b/drivers/net/ethernet/pensando/pds_core/core.c
@@ -532,6 +532,11 @@ void pdsc_stop(struct pdsc *pdsc)
 
 static void pdsc_fw_down(struct pdsc *pdsc)
 {
+	union pds_core_notifyq_comp reset_event = {
+		.reset.ecode = cpu_to_le16(PDS_EVENT_RESET),
+		.reset.state = 0,
+	};
+
 	mutex_lock(&pdsc->config_lock);
 
 	if (test_and_set_bit(PDSC_S_FW_DEAD, &pdsc->state)) {
@@ -540,6 +545,9 @@ static void pdsc_fw_down(struct pdsc *pdsc)
 		return;
 	}
 
+	/* Notify clients of fw_down */
+	pdsc_auxbus_publish(pdsc, PDSC_ALL_CLIENT_IDS, &reset_event);
+
 	netif_device_detach(pdsc->netdev);
 	pdsc_mask_interrupts(pdsc);
 	pdsc_teardown(pdsc, PDSC_TEARDOWN_RECOVERY);
@@ -549,6 +557,10 @@ static void pdsc_fw_down(struct pdsc *pdsc)
 
 static void pdsc_fw_up(struct pdsc *pdsc)
 {
+	union pds_core_notifyq_comp reset_event = {
+		.reset.ecode = cpu_to_le16(PDS_EVENT_RESET),
+		.reset.state = 1,
+	};
 	int err;
 
 	mutex_lock(&pdsc->config_lock);
@@ -573,6 +585,9 @@ static void pdsc_fw_up(struct pdsc *pdsc)
 
 	pdsc_vf_attr_replay(pdsc);
 
+	/* Notify clients of fw_up */
+	pdsc_auxbus_publish(pdsc, PDSC_ALL_CLIENT_IDS, &reset_event);
+
 	return;
 
 err_out:
diff --git a/drivers/net/ethernet/pensando/pds_core/core.h b/drivers/net/ethernet/pensando/pds_core/core.h
index 3ab314217464..25f09f4f035d 100644
--- a/drivers/net/ethernet/pensando/pds_core/core.h
+++ b/drivers/net/ethernet/pensando/pds_core/core.h
@@ -321,6 +321,9 @@ int pdsc_start(struct pdsc *pdsc);
 void pdsc_stop(struct pdsc *pdsc);
 void pdsc_health_thread(struct work_struct *work);
 
+#define PDSC_ALL_CLIENT_IDS   0xffff
+int pdsc_auxbus_publish(struct pdsc *pdsc, u16 client_id,
+			union pds_core_notifyq_comp *event);
 int pdsc_auxbus_dev_add_vf(struct pdsc *pdsc, int vf_id);
 int pdsc_auxbus_dev_del_vf(struct pdsc *pdsc, int vf_id);
 
-- 
2.17.1



* [RFC PATCH net-next 13/19] pds_core: Kconfig and pds_core.rst
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
                   ` (11 preceding siblings ...)
  2022-11-18 22:56 ` [RFC PATCH net-next 12/19] pds_core: publish events to the clients Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 14/19] pds_vdpa: Add new PCI VF device for PDS vDPA services Shannon Nelson
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

Documentation and Kconfig hook for building the driver.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 .../ethernet/pensando/pds_core.rst            | 162 ++++++++++++++++++
 MAINTAINERS                                   |   3 +-
 drivers/net/ethernet/pensando/Kconfig         |  12 ++
 3 files changed, 176 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/networking/device_drivers/ethernet/pensando/pds_core.rst

diff --git a/Documentation/networking/device_drivers/ethernet/pensando/pds_core.rst b/Documentation/networking/device_drivers/ethernet/pensando/pds_core.rst
new file mode 100644
index 000000000000..9c2c0c866e0a
--- /dev/null
+++ b/Documentation/networking/device_drivers/ethernet/pensando/pds_core.rst
@@ -0,0 +1,162 @@
+.. SPDX-License-Identifier: GPL-2.0+
+.. note: can be edited and viewed with /usr/bin/formiko-vim
+
+========================================================
+Linux Driver for the Pensando(R) DSC adapter family
+========================================================
+
+Pensando Linux Core driver.
+Copyright(c) 2022 Pensando Systems, Inc
+
+Identifying the Adapter
+=======================
+
+To find if one or more Pensando PCI Core devices are installed on the
+host, check for the PCI devices::
+
+  # lspci -d 1dd8:100c
+  39:00.0 Processing accelerators: Pensando Systems Device 100c
+  3a:00.0 Processing accelerators: Pensando Systems Device 100c
+
+If such devices are listed as above, then the pds_core.ko driver should find
+and configure them for use.  There should be log entries in the kernel
+messages such as these::
+
+  $ dmesg | grep pds_core
+  pds_core 0000:b5:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
+  pds_core 0000:b5:00.0: FW: 1.51.0-73
+  pds_core 0000:b6:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
+  pds_core 0000:b5:00.0: FW: 1.51.0-73
+
+Driver and firmware version information can be gathered with devlink::
+
+  $ devlink dev info pci/0000:b5:00.0
+  pci/0000:b5:00.0:
+    driver pds_core
+    serial_number FLM18420073
+    versions:
+        fixed:
+          asic.id 0x0
+          asic.rev 0x0
+        running:
+          fw 1.51.0-73
+        stored:
+          fw.goldfw 1.15.9-C-22
+          fw.mainfwa 1.51.0-73
+          fw.mainfwb 1.51.0-57
+
+
+Info versions
+=============
+
+The ``pds_core`` driver reports the following versions
+
+.. list-table:: devlink info versions implemented
+   :widths: 5 5 90
+
+   * - Name
+     - Type
+     - Description
+   * - ``fw``
+     - running
+     - Version of firmware running on the device
+   * - ``fw.goldfw``
+     - stored
+     - Version of firmware stored in the goldfw slot
+   * - ``fw.mainfwa``
+     - stored
+     - Version of firmware stored in the mainfwa slot
+   * - ``fw.mainfwb``
+     - stored
+     - Version of firmware stored in the mainfwb slot
+   * - ``asic.id``
+     - fixed
+     - The ASIC type for this device
+   * - ``asic.rev``
+     - fixed
+     - The revision of the ASIC for this device
+
+
+Parameters
+==========
+
+The ``pds_core`` driver implements the following generic
+parameters for controlling the functionality to be made available
+as auxiliary_bus devices.
+
+.. list-table:: Generic parameters implemented
+   :widths: 5 5 8 82
+
+   * - Name
+     - Mode
+     - Type
+     - Description
+   * - ``enable_vnet``
+     - runtime
+     - Boolean
+     - Enables vDPA functionality through an auxiliary_bus device
+
+
+The ``pds_core`` driver also implements the following driver-specific
+parameters for similar uses, as well as for selecting the next boot firmware:
+
+.. list-table:: Driver-specific parameters implemented
+   :widths: 5 5 8 82
+
+   * - Name
+     - Mode
+     - Type
+     - Description
+   * - ``enable_lm``
+     - runtime
+     - Boolean
+     - Enables Live Migration functionality through an auxiliary_bus device
+   * - ``boot_fw``
+     - runtime
+     - String
+     - Selects the Firmware slot to use for the next DSC boot
+
+
+Firmware Management
+===================
+
+The DSC firmware can be updated using the ``devlink`` utility's ``flash``
+command.  The downloaded firmware will be loaded into whichever of the
+mainfwa and mainfwb firmware slots is not currently in use, and that slot
+will then be selected for the next boot.  The firmware currently in use
+can be found in the ``running`` section of the devlink dev info output.
+
+The ``boot_fw`` parameter can be used to inspect and select the firmware
+slot to be used in the next DSC boot.  The mainfwa and mainfwb slots are
+used for normal operations, and the goldfw slot should only be selected
+for recovery if both of the other slots hold bad or corrupted firmware.
+
+
+Enabling the driver
+===================
+
+The driver is enabled via the standard kernel configuration system,
+using the make command::
+
+  make oldconfig/menuconfig/etc.
+
+The driver is located in the menu structure at:
+
+  -> Device Drivers
+    -> Network device support (NETDEVICES [=y])
+      -> Ethernet driver support
+        -> Pensando devices
+          -> Pensando Ethernet PDS_CORE Support
+
+Support
+=======
+
+For general Linux networking support, please use the netdev mailing
+list, which is monitored by Pensando personnel::
+
+  netdev@vger.kernel.org
+
+For more specific support needs, please use the Pensando driver support
+email::
+
+  drivers@pensando.io
diff --git a/MAINTAINERS b/MAINTAINERS
index 14ee1c72d01a..a4f989fa8192 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16150,8 +16150,9 @@ M:	Shannon Nelson <snelson@pensando.io>
 M:	drivers@pensando.io
 L:	netdev@vger.kernel.org
 S:	Supported
-F:	Documentation/networking/device_drivers/ethernet/pensando/ionic.rst
+F:	Documentation/networking/device_drivers/ethernet/pensando/
 F:	drivers/net/ethernet/pensando/
+F:	include/linux/pds/
 
 PER-CPU MEMORY ALLOCATOR
 M:	Dennis Zhou <dennis@kernel.org>
diff --git a/drivers/net/ethernet/pensando/Kconfig b/drivers/net/ethernet/pensando/Kconfig
index 3f7519e435b8..d9e8973d54f6 100644
--- a/drivers/net/ethernet/pensando/Kconfig
+++ b/drivers/net/ethernet/pensando/Kconfig
@@ -17,6 +17,18 @@ config NET_VENDOR_PENSANDO
 
 if NET_VENDOR_PENSANDO
 
+config PDS_CORE
+	tristate "Pensando Data Systems Core Device Support"
+	depends on 64BIT && PCI
+	help
+	  This enables the support for the Pensando Core device family of
+	  adapters.  More specific information on this driver can be
+	  found in
+	  <file:Documentation/networking/device_drivers/ethernet/pensando/pds_core.rst>.
+
+	  To compile this driver as a module, choose M here. The module
+	  will be called pds_core.
+
 config IONIC
 	tristate "Pensando Ethernet IONIC Support"
 	depends on 64BIT && PCI
-- 
2.17.1



* [RFC PATCH net-next 14/19] pds_vdpa: Add new PCI VF device for PDS vDPA services
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
                   ` (12 preceding siblings ...)
  2022-11-18 22:56 ` [RFC PATCH net-next 13/19] pds_core: Kconfig and pds_core.rst Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-22  3:53   ` Jason Wang
  2022-11-18 22:56 ` [RFC PATCH net-next 15/19] pds_vdpa: virtio bar setup for vdpa Shannon Nelson
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

This is the initial PCI driver framework for the new pds_vdpa VF
device driver, an auxiliary_bus client of the pds_core driver.
This does the very basics of registering for the new PCI
device 1dd8:100b, setting up debugfs entries, and registering
with devlink.

The new PCI device id has not made it to the official PCI ID Repository
yet, but will soon be registered there.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 drivers/vdpa/pds/Makefile       |   7 +
 drivers/vdpa/pds/debugfs.c      |  44 +++++++
 drivers/vdpa/pds/debugfs.h      |  22 ++++
 drivers/vdpa/pds/pci_drv.c      | 143 +++++++++++++++++++++
 drivers/vdpa/pds/pci_drv.h      |  46 +++++++
 include/linux/pds/pds_core_if.h |   1 +
 include/linux/pds/pds_vdpa.h    | 219 ++++++++++++++++++++++++++++++++
 7 files changed, 482 insertions(+)
 create mode 100644 drivers/vdpa/pds/Makefile
 create mode 100644 drivers/vdpa/pds/debugfs.c
 create mode 100644 drivers/vdpa/pds/debugfs.h
 create mode 100644 drivers/vdpa/pds/pci_drv.c
 create mode 100644 drivers/vdpa/pds/pci_drv.h
 create mode 100644 include/linux/pds/pds_vdpa.h

diff --git a/drivers/vdpa/pds/Makefile b/drivers/vdpa/pds/Makefile
new file mode 100644
index 000000000000..3ba28a875574
--- /dev/null
+++ b/drivers/vdpa/pds/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0-only
+# Copyright(c) 2022 Pensando Systems, Inc
+
+obj-$(CONFIG_PDS_VDPA) := pds_vdpa.o
+
+pds_vdpa-y := pci_drv.o	\
+	      debugfs.o
diff --git a/drivers/vdpa/pds/debugfs.c b/drivers/vdpa/pds/debugfs.c
new file mode 100644
index 000000000000..f5b6654ae89b
--- /dev/null
+++ b/drivers/vdpa/pds/debugfs.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/types.h>
+
+#include <linux/pds/pds_core_if.h>
+#include <linux/pds/pds_vdpa.h>
+
+#include "pci_drv.h"
+#include "debugfs.h"
+
+#ifdef CONFIG_DEBUG_FS
+
+static struct dentry *dbfs_dir;
+
+void
+pds_vdpa_debugfs_create(void)
+{
+	dbfs_dir = debugfs_create_dir(PDS_VDPA_DRV_NAME, NULL);
+}
+
+void
+pds_vdpa_debugfs_destroy(void)
+{
+	debugfs_remove_recursive(dbfs_dir);
+	dbfs_dir = NULL;
+}
+
+void
+pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev)
+{
+	vdpa_pdev->dentry = debugfs_create_dir(pci_name(vdpa_pdev->pdev), dbfs_dir);
+}
+
+void
+pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev)
+{
+	debugfs_remove_recursive(vdpa_pdev->dentry);
+	vdpa_pdev->dentry = NULL;
+}
+
+#endif /* CONFIG_DEBUG_FS */
diff --git a/drivers/vdpa/pds/debugfs.h b/drivers/vdpa/pds/debugfs.h
new file mode 100644
index 000000000000..ac31ab47746b
--- /dev/null
+++ b/drivers/vdpa/pds/debugfs.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#ifndef _PDS_VDPA_DEBUGFS_H_
+#define _PDS_VDPA_DEBUGFS_H_
+
+#include <linux/debugfs.h>
+
+#ifdef CONFIG_DEBUG_FS
+
+void pds_vdpa_debugfs_create(void);
+void pds_vdpa_debugfs_destroy(void);
+void pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev);
+void pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev);
+#else
+static inline void pds_vdpa_debugfs_create(void) { }
+static inline void pds_vdpa_debugfs_destroy(void) { }
+static inline void pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev) { }
+static inline void pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev) { }
+#endif
+
+#endif /* _PDS_VDPA_DEBUGFS_H_ */
diff --git a/drivers/vdpa/pds/pci_drv.c b/drivers/vdpa/pds/pci_drv.c
new file mode 100644
index 000000000000..369e11153f21
--- /dev/null
+++ b/drivers/vdpa/pds/pci_drv.c
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/aer.h>
+#include <linux/types.h>
+#include <linux/vdpa.h>
+
+#include <linux/pds/pds_core_if.h>
+#include <linux/pds/pds_vdpa.h>
+
+#include "pci_drv.h"
+#include "debugfs.h"
+
+static void
+pds_vdpa_dma_action(void *data)
+{
+	pci_free_irq_vectors((struct pci_dev *)data);
+}
+
+static int
+pds_vdpa_pci_probe(struct pci_dev *pdev,
+		   const struct pci_device_id *id)
+{
+	struct pds_vdpa_pci_device *vdpa_pdev;
+	struct device *dev = &pdev->dev;
+	int err;
+
+	vdpa_pdev = kzalloc(sizeof(*vdpa_pdev), GFP_KERNEL);
+	if (!vdpa_pdev)
+		return -ENOMEM;
+	pci_set_drvdata(pdev, vdpa_pdev);
+
+	vdpa_pdev->pdev = pdev;
+	vdpa_pdev->vf_id = pci_iov_vf_id(pdev);
+	vdpa_pdev->pci_id = PCI_DEVID(pdev->bus->number, pdev->devfn);
+
+	/* Query system for DMA addressing limitation for the device. */
+	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(PDS_CORE_ADDR_LEN));
+	if (err) {
+		dev_err(dev, "Unable to obtain 64-bit DMA for consistent allocations, aborting. %pe\n",
+			ERR_PTR(err));
+		goto err_out_free_mem;
+	}
+
+	pci_enable_pcie_error_reporting(pdev);
+
+	/* Use devres management */
+	err = pcim_enable_device(pdev);
+	if (err) {
+		dev_err(dev, "Cannot enable PCI device: %pe\n", ERR_PTR(err));
+		goto err_out_free_mem;
+	}
+
+	err = devm_add_action_or_reset(dev, pds_vdpa_dma_action, pdev);
+	if (err) {
+		dev_err(dev, "Failed adding devres for freeing irq vectors: %pe\n",
+			ERR_PTR(err));
+		goto err_out_pci_release_device;
+	}
+
+	pci_set_master(pdev);
+
+	pds_vdpa_debugfs_add_pcidev(vdpa_pdev);
+
+	dev_info(dev, "%s: PF %#04x VF %#04x (%d) vf_id %d domain %d vdpa_aux %p vdpa_pdev %p\n",
+		 __func__, pci_dev_id(vdpa_pdev->pdev->physfn),
+		 vdpa_pdev->pci_id, vdpa_pdev->pci_id, vdpa_pdev->vf_id,
+		 pci_domain_nr(pdev->bus), vdpa_pdev->vdpa_aux, vdpa_pdev);
+
+	return 0;
+
+err_out_pci_release_device:
+	pci_disable_device(pdev);
+err_out_free_mem:
+	pci_disable_pcie_error_reporting(pdev);
+	kfree(vdpa_pdev);
+	return err;
+}
+
+static void
+pds_vdpa_pci_remove(struct pci_dev *pdev)
+{
+	struct pds_vdpa_pci_device *vdpa_pdev = pci_get_drvdata(pdev);
+
+	pds_vdpa_debugfs_del_pcidev(vdpa_pdev);
+	pci_clear_master(pdev);
+	pci_disable_pcie_error_reporting(pdev);
+	pci_disable_device(pdev);
+	kfree(vdpa_pdev);
+
+	dev_info(&pdev->dev, "Removed\n");
+}
+
+static const struct pci_device_id
+pds_vdpa_pci_table[] = {
+	{ PCI_VDEVICE(PENSANDO, PCI_DEVICE_ID_PENSANDO_VDPA_VF) },
+	{ 0, }
+};
+MODULE_DEVICE_TABLE(pci, pds_vdpa_pci_table);
+
+static struct pci_driver
+pds_vdpa_pci_driver = {
+	.name = PDS_VDPA_DRV_NAME,
+	.id_table = pds_vdpa_pci_table,
+	.probe = pds_vdpa_pci_probe,
+	.remove = pds_vdpa_pci_remove
+};
+
+static void __exit
+pds_vdpa_pci_cleanup(void)
+{
+	pci_unregister_driver(&pds_vdpa_pci_driver);
+
+	pds_vdpa_debugfs_destroy();
+}
+module_exit(pds_vdpa_pci_cleanup);
+
+static int __init
+pds_vdpa_pci_init(void)
+{
+	int err;
+
+	pds_vdpa_debugfs_create();
+
+	err = pci_register_driver(&pds_vdpa_pci_driver);
+	if (err) {
+		pr_err("%s: pci driver register failed: %pe\n", __func__, ERR_PTR(err));
+		goto err_pci;
+	}
+
+	return 0;
+
+err_pci:
+	pds_vdpa_debugfs_destroy();
+	return err;
+}
+module_init(pds_vdpa_pci_init);
+
+MODULE_DESCRIPTION(PDS_VDPA_DRV_DESCRIPTION);
+MODULE_AUTHOR("Pensando Systems, Inc");
+MODULE_LICENSE("GPL");
diff --git a/drivers/vdpa/pds/pci_drv.h b/drivers/vdpa/pds/pci_drv.h
new file mode 100644
index 000000000000..747809e0df9e
--- /dev/null
+++ b/drivers/vdpa/pds/pci_drv.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#ifndef _PCI_DRV_H
+#define _PCI_DRV_H
+
+#include <linux/pci.h>
+#include <linux/virtio_pci_modern.h>
+
+#define PDS_VDPA_DRV_NAME           "pds_vdpa"
+#define PDS_VDPA_DRV_DESCRIPTION    "Pensando vDPA VF Device Driver"
+
+#define PDS_VDPA_BAR_BASE	0
+#define PDS_VDPA_BAR_INTR	2
+#define PDS_VDPA_BAR_DBELL	4
+
+struct pds_dev_bar {
+	int           index;
+	void __iomem  *vaddr;
+	phys_addr_t   pa;
+	unsigned long len;
+};
+
+struct pds_vdpa_intr_info {
+	int index;
+	int irq;
+	int qid;
+	char name[32];
+};
+
+struct pds_vdpa_pci_device {
+	struct pci_dev *pdev;
+	struct pds_vdpa_aux *vdpa_aux;
+
+	int vf_id;
+	int pci_id;
+
+	int nintrs;
+	struct pds_vdpa_intr_info *intrs;
+
+	struct dentry *dentry;
+
+	struct virtio_pci_modern_device vd_mdev;
+};
+
+#endif /* _PCI_DRV_H */
diff --git a/include/linux/pds/pds_core_if.h b/include/linux/pds/pds_core_if.h
index 6333ec351e14..6e92697657e4 100644
--- a/include/linux/pds/pds_core_if.h
+++ b/include/linux/pds/pds_core_if.h
@@ -8,6 +8,7 @@
 
 #define PCI_VENDOR_ID_PENSANDO			0x1dd8
 #define PCI_DEVICE_ID_PENSANDO_CORE_PF		0x100c
+#define PCI_DEVICE_ID_PENSANDO_VDPA_VF          0x100b
 
 #define PDS_CORE_BARS_MAX			4
 #define PDS_CORE_PCI_BAR_DBELL			1
diff --git a/include/linux/pds/pds_vdpa.h b/include/linux/pds/pds_vdpa.h
new file mode 100644
index 000000000000..7ecef890f175
--- /dev/null
+++ b/include/linux/pds/pds_vdpa.h
@@ -0,0 +1,219 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#ifndef _PDS_VDPA_IF_H_
+#define _PDS_VDPA_IF_H_
+
+#include <linux/pds/pds_common.h>
+
+#define PDS_DEV_TYPE_VDPA_STR	"vDPA"
+#define PDS_VDPA_DEV_NAME	PDS_CORE_DRV_NAME "." PDS_DEV_TYPE_VDPA_STR
+
+/*
+ * enum pds_vdpa_cmd_opcode - vDPA Device commands
+ */
+enum pds_vdpa_cmd_opcode {
+	PDS_VDPA_CMD_INIT		= 48,
+	PDS_VDPA_CMD_IDENT		= 49,
+	PDS_VDPA_CMD_RESET		= 51,
+	PDS_VDPA_CMD_VQ_RESET		= 52,
+	PDS_VDPA_CMD_VQ_INIT		= 53,
+	PDS_VDPA_CMD_STATUS_UPDATE	= 54,
+	PDS_VDPA_CMD_SET_FEATURES	= 55,
+	PDS_VDPA_CMD_SET_ATTR		= 56,
+};
+
+/**
+ * struct pds_vdpa_cmd - generic command
+ * @opcode:	Opcode
+ * @vdpa_index:	Index for vdpa subdevice
+ * @vf_id:	VF id
+ */
+struct pds_vdpa_cmd {
+	u8     opcode;
+	u8     vdpa_index;
+	__le16 vf_id;
+};
+
+/**
+ * struct pds_vdpa_comp - generic command completion
+ * @status:	Status of the command (enum pds_core_status_code)
+ * @rsvd:	Word boundary padding
+ * @color:	Color bit
+ */
+struct pds_vdpa_comp {
+	u8 status;
+	u8 rsvd[14];
+	u8 color;
+};
+
+/**
+ * struct pds_vdpa_init_cmd - INIT command
+ * @opcode:	Opcode PDS_VDPA_CMD_INIT
+ * @vdpa_index: Index for vdpa subdevice
+ * @vf_id:	VF id
+ * @len:	length of config info DMA space
+ * @config_pa:	address for DMA of virtio_net_config struct
+ */
+struct pds_vdpa_init_cmd {
+	u8     opcode;
+	u8     vdpa_index;
+	__le16 vf_id;
+	__le32 len;
+	__le64 config_pa;
+};
+
+/**
+ * struct pds_vdpa_ident - vDPA identification data
+ * @hw_features:	vDPA features supported by device
+ * @max_vqs:		max queues available (2 queues for a single queue pair)
+ * @max_qlen:		log(2) of maximum number of descriptors
+ * @min_qlen:		log(2) of minimum number of descriptors
+ *
+ * This struct is used in a DMA block that is set up for the PDS_VDPA_CMD_IDENT
+ * transaction.  Set up the DMA block and send its address in the IDENT cmd
+ * data; the DSC will write the ident information, and the DMA block can be
+ * removed after reading the answer.  If the completion status is 0, the
+ * information is valid; otherwise there was an error and the data should be
+ * considered invalid.
+ */
+struct pds_vdpa_ident {
+	__le64 hw_features;
+	__le16 max_vqs;
+	__le16 max_qlen;
+	__le16 min_qlen;
+};
+
+/**
+ * struct pds_vdpa_ident_cmd - IDENT command
+ * @opcode:	Opcode PDS_VDPA_CMD_IDENT
+ * @rsvd:       Word boundary padding
+ * @vf_id:	VF id
+ * @len:	length of ident info DMA space
+ * @ident_pa:	address for DMA of ident info (struct pds_vdpa_ident)
+ *			only used for this transaction, then forgotten by DSC
+ */
+struct pds_vdpa_ident_cmd {
+	u8     opcode;
+	u8     rsvd;
+	__le16 vf_id;
+	__le32 len;
+	__le64 ident_pa;
+};
+
+/**
+ * struct pds_vdpa_status_cmd - STATUS_UPDATE command
+ * @opcode:	Opcode PDS_VDPA_CMD_STATUS_UPDATE
+ * @vdpa_index: Index for vdpa subdevice
+ * @vf_id:	VF id
+ * @status:	new status bits
+ */
+struct pds_vdpa_status_cmd {
+	u8     opcode;
+	u8     vdpa_index;
+	__le16 vf_id;
+	u8     status;
+};
+
+/**
+ * enum pds_vdpa_attr - List of VDPA device attributes
+ * @PDS_VDPA_ATTR_MAC:          MAC address
+ * @PDS_VDPA_ATTR_MAX_VQ_PAIRS: Max virtqueue pairs
+ */
+enum pds_vdpa_attr {
+	PDS_VDPA_ATTR_MAC          = 1,
+	PDS_VDPA_ATTR_MAX_VQ_PAIRS = 2,
+};
+
+/**
+ * struct pds_vdpa_setattr_cmd - SET_ATTR command
+ * @opcode:		Opcode PDS_VDPA_CMD_SET_ATTR
+ * @vdpa_index:		Index for vdpa subdevice
+ * @vf_id:		VF id
+ * @attr:		attribute to be changed (enum pds_vdpa_attr)
+ * @pad:		Word boundary padding
+ * @mac:		new mac address to be assigned as vdpa device address
+ * @max_vq_pairs:	new limit of virtqueue pairs
+ */
+struct pds_vdpa_setattr_cmd {
+	u8     opcode;
+	u8     vdpa_index;
+	__le16 vf_id;
+	u8     attr;
+	u8     pad[3];
+	union {
+		u8 mac[6];
+		__le16 max_vq_pairs;
+	} __packed;
+};
+
+/**
+ * struct pds_vdpa_vq_init_cmd - queue init command
+ * @opcode: Opcode PDS_VDPA_CMD_VQ_INIT
+ * @vdpa_index:	Index for vdpa subdevice
+ * @vf_id:	VF id
+ * @qid:	Queue id (bit0 clear = rx, bit0 set = tx, qid=N is ctrlq)
+ * @len:	log(2) of max descriptor count
+ * @desc_addr:	DMA address of descriptor area
+ * @avail_addr:	DMA address of available descriptors (aka driver area)
+ * @used_addr:	DMA address of used descriptors (aka device area)
+ * @intr_index:	interrupt index
+ */
+struct pds_vdpa_vq_init_cmd {
+	u8     opcode;
+	u8     vdpa_index;
+	__le16 vf_id;
+	__le16 qid;
+	__le16 len;
+	__le64 desc_addr;
+	__le64 avail_addr;
+	__le64 used_addr;
+	__le16 intr_index;
+};
+
+/**
+ * struct pds_vdpa_vq_init_comp - queue init completion
+ * @status:	Status of the command (enum pds_core_status_code)
+ * @hw_qtype:	HW queue type, used in doorbell selection
+ * @hw_qindex:	HW queue index, used in doorbell selection
+ * @rsvd:	Word boundary padding
+ * @color:	Color bit
+ */
+struct pds_vdpa_vq_init_comp {
+	u8     status;
+	u8     hw_qtype;
+	__le16 hw_qindex;
+	u8     rsvd[11];
+	u8     color;
+};
+
+/**
+ * struct pds_vdpa_vq_reset_cmd - queue reset command
+ * @opcode:	Opcode PDS_VDPA_CMD_VQ_RESET
+ * @vdpa_index:	Index for vdpa subdevice
+ * @vf_id:	VF id
+ * @qid:	Queue id
+ */
+struct pds_vdpa_vq_reset_cmd {
+	u8     opcode;
+	u8     vdpa_index;
+	__le16 vf_id;
+	__le16 qid;
+};
+
+/**
+ * struct pds_vdpa_set_features_cmd - set hw features
+ * @opcode: Opcode PDS_VDPA_CMD_SET_FEATURES
+ * @vdpa_index:	Index for vdpa subdevice
+ * @vf_id:	VF id
+ * @rsvd:       Word boundary padding
+ * @features:	Feature bit mask
+ */
+struct pds_vdpa_set_features_cmd {
+	u8     opcode;
+	u8     vdpa_index;
+	__le16 vf_id;
+	__le32 rsvd;
+	__le64 features;
+};
+
+#endif /* _PDS_VDPA_IF_H_ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [RFC PATCH net-next 15/19] pds_vdpa: virtio bar setup for vdpa
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
                   ` (13 preceding siblings ...)
  2022-11-18 22:56 ` [RFC PATCH net-next 14/19] pds_vdpa: Add new PCI VF device for PDS vDPA services Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-22  3:32   ` Jason Wang
  2022-11-18 22:56 ` [RFC PATCH net-next 16/19] pds_vdpa: add auxiliary driver Shannon Nelson
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

The PDS vDPA device has a virtio BAR for describing itself, and
the pds_vdpa driver needs to access it.  Here we copy liberally
from the existing drivers/virtio/virtio_pci_modern_dev.c, as it
has what we need, but modified so that it can work with our
device id and use our own DMA mask.

We suspect there is room for making the existing code a little
more flexible, so we offer this as a starting point for that
discussion.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 drivers/vdpa/pds/Makefile     |   3 +-
 drivers/vdpa/pds/pci_drv.c    |  10 ++
 drivers/vdpa/pds/pci_drv.h    |   2 +
 drivers/vdpa/pds/virtio_pci.c | 283 ++++++++++++++++++++++++++++++++++
 4 files changed, 297 insertions(+), 1 deletion(-)
 create mode 100644 drivers/vdpa/pds/virtio_pci.c

diff --git a/drivers/vdpa/pds/Makefile b/drivers/vdpa/pds/Makefile
index 3ba28a875574..b8376ab165bc 100644
--- a/drivers/vdpa/pds/Makefile
+++ b/drivers/vdpa/pds/Makefile
@@ -4,4 +4,5 @@
 obj-$(CONFIG_PDS_VDPA) := pds_vdpa.o
 
 pds_vdpa-y := pci_drv.o	\
-	      debugfs.o
+	      debugfs.o \
+	      virtio_pci.o
diff --git a/drivers/vdpa/pds/pci_drv.c b/drivers/vdpa/pds/pci_drv.c
index 369e11153f21..10491e22778c 100644
--- a/drivers/vdpa/pds/pci_drv.c
+++ b/drivers/vdpa/pds/pci_drv.c
@@ -44,6 +44,14 @@ pds_vdpa_pci_probe(struct pci_dev *pdev,
 		goto err_out_free_mem;
 	}
 
+	vdpa_pdev->vd_mdev.pci_dev = pdev;
+	err = pds_vdpa_probe_virtio(&vdpa_pdev->vd_mdev);
+	if (err) {
+		dev_err(dev, "Unable to probe for virtio configuration: %pe\n",
+			ERR_PTR(err));
+		goto err_out_free_mem;
+	}
+
 	pci_enable_pcie_error_reporting(pdev);
 
 	/* Use devres management */
@@ -74,6 +82,7 @@ pds_vdpa_pci_probe(struct pci_dev *pdev,
 err_out_pci_release_device:
 	pci_disable_device(pdev);
 err_out_free_mem:
+	pds_vdpa_remove_virtio(&vdpa_pdev->vd_mdev);
 	pci_disable_pcie_error_reporting(pdev);
 	kfree(vdpa_pdev);
 	return err;
@@ -88,6 +97,7 @@ pds_vdpa_pci_remove(struct pci_dev *pdev)
 	pci_clear_master(pdev);
 	pci_disable_pcie_error_reporting(pdev);
 	pci_disable_device(pdev);
+	pds_vdpa_remove_virtio(&vdpa_pdev->vd_mdev);
 	kfree(vdpa_pdev);
 
 	dev_info(&pdev->dev, "Removed\n");
diff --git a/drivers/vdpa/pds/pci_drv.h b/drivers/vdpa/pds/pci_drv.h
index 747809e0df9e..15f3b34fafa9 100644
--- a/drivers/vdpa/pds/pci_drv.h
+++ b/drivers/vdpa/pds/pci_drv.h
@@ -43,4 +43,6 @@ struct pds_vdpa_pci_device {
 	struct virtio_pci_modern_device vd_mdev;
 };
 
+int pds_vdpa_probe_virtio(struct virtio_pci_modern_device *mdev);
+void pds_vdpa_remove_virtio(struct virtio_pci_modern_device *mdev);
 #endif /* _PCI_DRV_H */
diff --git a/drivers/vdpa/pds/virtio_pci.c b/drivers/vdpa/pds/virtio_pci.c
new file mode 100644
index 000000000000..0f4ac9467199
--- /dev/null
+++ b/drivers/vdpa/pds/virtio_pci.c
@@ -0,0 +1,283 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+/*
+ * adapted from drivers/virtio/virtio_pci_modern_dev.c, v6.0-rc1
+ */
+
+#include <linux/virtio_pci_modern.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/delay.h>
+
+#include "pci_drv.h"
+
+/*
+ * pds_vdpa_map_capability - map a part of virtio pci capability
+ * @mdev: the modern virtio-pci device
+ * @off: offset of the capability
+ * @minlen: minimal length of the capability
+ * @align: align requirement
+ * @start: start from the capability
+ * @size: map size
+ * @len: the length that is actually mapped
+ * @pa: physical address of the capability
+ *
+ * Returns the I/O address for the mapped part of the capability
+ */
+static void __iomem *
+pds_vdpa_map_capability(struct virtio_pci_modern_device *mdev, int off,
+			 size_t minlen, u32 align, u32 start, u32 size,
+			 size_t *len, resource_size_t *pa)
+{
+	struct pci_dev *dev = mdev->pci_dev;
+	u8 bar;
+	u32 offset, length;
+	void __iomem *p;
+
+	pci_read_config_byte(dev, off + offsetof(struct virtio_pci_cap,
+						 bar),
+			     &bar);
+	pci_read_config_dword(dev, off + offsetof(struct virtio_pci_cap, offset),
+			     &offset);
+	pci_read_config_dword(dev, off + offsetof(struct virtio_pci_cap, length),
+			      &length);
+
+	/* Check if the BAR may have changed since we requested the region. */
+	if (bar >= PCI_STD_NUM_BARS || !(mdev->modern_bars & (1 << bar))) {
+		dev_err(&dev->dev,
+			"virtio_pci: bar unexpectedly changed to %u\n", bar);
+		return NULL;
+	}
+
+	if (length <= start) {
+		dev_err(&dev->dev,
+			"virtio_pci: bad capability len %u (>%u expected)\n",
+			length, start);
+		return NULL;
+	}
+
+	if (length - start < minlen) {
+		dev_err(&dev->dev,
+			"virtio_pci: bad capability len %u (>=%zu expected)\n",
+			length, minlen);
+		return NULL;
+	}
+
+	length -= start;
+
+	if (start + offset < offset) {
+		dev_err(&dev->dev,
+			"virtio_pci: map wrap-around %u+%u\n",
+			start, offset);
+		return NULL;
+	}
+
+	offset += start;
+
+	if (offset & (align - 1)) {
+		dev_err(&dev->dev,
+			"virtio_pci: offset %u not aligned to %u\n",
+			offset, align);
+		return NULL;
+	}
+
+	if (length > size)
+		length = size;
+
+	if (len)
+		*len = length;
+
+	if (minlen + offset < minlen ||
+	    minlen + offset > pci_resource_len(dev, bar)) {
+		dev_err(&dev->dev,
+			"virtio_pci: map virtio %zu@%u out of range on bar %i length %lu\n",
+			minlen, offset,
+			bar, (unsigned long)pci_resource_len(dev, bar));
+		return NULL;
+	}
+
+	p = pci_iomap_range(dev, bar, offset, length);
+	if (!p)
+		dev_err(&dev->dev,
+			"virtio_pci: unable to map virtio %u@%u on bar %i\n",
+			length, offset, bar);
+	else if (pa)
+		*pa = pci_resource_start(dev, bar) + offset;
+
+	return p;
+}
+
+/**
+ * virtio_pci_find_capability - walk capabilities to find device info.
+ * @dev: the pci device
+ * @cfg_type: the VIRTIO_PCI_CAP_* value we seek
+ * @ioresource_types: IORESOURCE_MEM and/or IORESOURCE_IO.
+ * @bars: the bitmask of BARs
+ *
+ * Returns offset of the capability, or 0.
+ */
+static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 cfg_type,
+					     u32 ioresource_types, int *bars)
+{
+	int pos;
+
+	for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
+	     pos > 0;
+	     pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
+		u8 type, bar;
+
+		pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
+							 cfg_type),
+				     &type);
+		pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
+							 bar),
+				     &bar);
+
+		/* Ignore structures with reserved BAR values */
+		if (bar >= PCI_STD_NUM_BARS)
+			continue;
+
+		if (type == cfg_type) {
+			if (pci_resource_len(dev, bar) &&
+			    pci_resource_flags(dev, bar) & ioresource_types) {
+				*bars |= (1 << bar);
+				return pos;
+			}
+		}
+	}
+	return 0;
+}
+
+/*
+ * pds_vdpa_probe_virtio: probe the modern virtio pci device; note that the
+ * caller is required to enable the PCI device before calling this function.
+ * @mdev: the modern virtio-pci device
+ *
+ * Returns 0 on success, negative error code on failure
+ */
+int pds_vdpa_probe_virtio(struct virtio_pci_modern_device *mdev)
+{
+	struct pci_dev *pci_dev = mdev->pci_dev;
+	int err, common, isr, notify, device;
+	u32 notify_length;
+	u32 notify_offset;
+
+	/* check for a common config: if not, use legacy mode (bar 0). */
+	common = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_COMMON_CFG,
+					    IORESOURCE_IO | IORESOURCE_MEM,
+					    &mdev->modern_bars);
+	if (!common) {
+		dev_info(&pci_dev->dev,
+			 "virtio_pci: missing common config\n");
+		return -ENODEV;
+	}
+
+	/* If common is there, these should be too... */
+	isr = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_ISR_CFG,
+					 IORESOURCE_IO | IORESOURCE_MEM,
+					 &mdev->modern_bars);
+	notify = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_NOTIFY_CFG,
+					    IORESOURCE_IO | IORESOURCE_MEM,
+					    &mdev->modern_bars);
+	if (!isr || !notify) {
+		dev_err(&pci_dev->dev,
+			"virtio_pci: missing capabilities %i/%i/%i\n",
+			common, isr, notify);
+		return -EINVAL;
+	}
+
+	/* Device capability is only mandatory for devices that have
+	 * device-specific configuration.
+	 */
+	device = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_DEVICE_CFG,
+					    IORESOURCE_IO | IORESOURCE_MEM,
+					    &mdev->modern_bars);
+
+	err = pci_request_selected_regions(pci_dev, mdev->modern_bars,
+					   "virtio-pci-modern");
+	if (err)
+		return err;
+
+	err = -EINVAL;
+	mdev->common = pds_vdpa_map_capability(mdev, common,
+				      sizeof(struct virtio_pci_common_cfg), 4,
+				      0, sizeof(struct virtio_pci_common_cfg),
+				      NULL, NULL);
+	if (!mdev->common)
+		goto err_map_common;
+	mdev->isr = pds_vdpa_map_capability(mdev, isr, sizeof(u8), 1,
+					     0, 1,
+					     NULL, NULL);
+	if (!mdev->isr)
+		goto err_map_isr;
+
+	/* Read notify_off_multiplier from config space. */
+	pci_read_config_dword(pci_dev,
+			      notify + offsetof(struct virtio_pci_notify_cap,
+						notify_off_multiplier),
+			      &mdev->notify_offset_multiplier);
+	/* Read notify length and offset from config space. */
+	pci_read_config_dword(pci_dev,
+			      notify + offsetof(struct virtio_pci_notify_cap,
+						cap.length),
+			      &notify_length);
+
+	pci_read_config_dword(pci_dev,
+			      notify + offsetof(struct virtio_pci_notify_cap,
+						cap.offset),
+			      &notify_offset);
+
+	/* We don't know ahead of time how many VQs we'll map.
+	 * If notify length is small, map it all now.
+	 * Otherwise, map each VQ individually later.
+	 */
+	if ((u64)notify_length + (notify_offset % PAGE_SIZE) <= PAGE_SIZE) {
+		mdev->notify_base = pds_vdpa_map_capability(mdev, notify,
+							     2, 2,
+							     0, notify_length,
+							     &mdev->notify_len,
+							     &mdev->notify_pa);
+		if (!mdev->notify_base)
+			goto err_map_notify;
+	} else {
+		mdev->notify_map_cap = notify;
+	}
+
+	/* Again, we don't know how much we should map, but PAGE_SIZE
+	 * is more than enough for all existing devices.
+	 */
+	if (device) {
+		mdev->device = pds_vdpa_map_capability(mdev, device, 0, 4,
+							0, PAGE_SIZE,
+							&mdev->device_len,
+							NULL);
+		if (!mdev->device)
+			goto err_map_device;
+	}
+
+	return 0;
+
+err_map_device:
+	if (mdev->notify_base)
+		pci_iounmap(pci_dev, mdev->notify_base);
+err_map_notify:
+	pci_iounmap(pci_dev, mdev->isr);
+err_map_isr:
+	pci_iounmap(pci_dev, mdev->common);
+err_map_common:
+	pci_release_selected_regions(pci_dev, mdev->modern_bars);
+	return err;
+}
+
+void pds_vdpa_remove_virtio(struct virtio_pci_modern_device *mdev)
+{
+	struct pci_dev *pci_dev = mdev->pci_dev;
+
+	if (mdev->device)
+		pci_iounmap(pci_dev, mdev->device);
+	if (mdev->notify_base)
+		pci_iounmap(pci_dev, mdev->notify_base);
+	pci_iounmap(pci_dev, mdev->isr);
+	pci_iounmap(pci_dev, mdev->common);
+	pci_release_selected_regions(pci_dev, mdev->modern_bars);
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [RFC PATCH net-next 16/19] pds_vdpa: add auxiliary driver
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
                   ` (14 preceding siblings ...)
  2022-11-18 22:56 ` [RFC PATCH net-next 15/19] pds_vdpa: virtio bar setup for vdpa Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 17/19] pds_vdpa: add vdpa config client commands Shannon Nelson
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

The auxiliary_bus driver is registered after the PCI driver is
loaded.  Once the VF has been enabled and pds_core has created
an auxiliary device for it, this driver gets probed.  It then
registers itself with the DSC through pds_core in order to start
the firmware services for this device.
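In pds_vdpa_aux_probe() below, the driver finds its VF PCI device by unpacking the bus number and devfn that pds_core packed into the auxiliary device id.  The following is a minimal userspace sketch of that unpacking, with PCI_BUS_NUM()/PCI_SLOT()/PCI_FUNC() redefined locally so it compiles outside the kernel; the encoding assumed here (bus in bits 15:8, devfn in bits 7:0) is the one the probe code implies.

```c
#include <stdint.h>

/* Local redefinitions of the kernel's PCI address helpers. */
#define SKETCH_PCI_BUS_NUM(x)	(((x) >> 8) & 0xff)
#define SKETCH_PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define SKETCH_PCI_FUNC(devfn)	((devfn) & 0x07)

struct vf_location {
	int      busnr;		/* PCI bus number of the VF */
	uint16_t devfn;		/* device/function within that bus */
};

/* Split an auxiliary device id into the VF's bus number and devfn,
 * mirroring the busnr/devfn computation in pds_vdpa_aux_probe().
 */
static struct vf_location decode_aux_id(uint32_t id)
{
	struct vf_location loc = {
		.busnr = SKETCH_PCI_BUS_NUM(id),
		.devfn = (uint16_t)(id & 0xff),
	};
	return loc;
}
```

With the decoded pair, the kernel code then uses pci_find_bus() and pci_get_slot() to retrieve the actual struct pci_dev.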

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 drivers/vdpa/pds/Makefile  |   3 +-
 drivers/vdpa/pds/aux_drv.c | 123 +++++++++++++++++++++++++++++++++++++
 drivers/vdpa/pds/aux_drv.h |  28 +++++++++
 drivers/vdpa/pds/debugfs.c |  23 +++++++
 drivers/vdpa/pds/debugfs.h |   2 +
 drivers/vdpa/pds/pci_drv.c |  19 ++++++
 drivers/vdpa/pds/pci_drv.h |   1 +
 7 files changed, 198 insertions(+), 1 deletion(-)
 create mode 100644 drivers/vdpa/pds/aux_drv.c
 create mode 100644 drivers/vdpa/pds/aux_drv.h

diff --git a/drivers/vdpa/pds/Makefile b/drivers/vdpa/pds/Makefile
index b8376ab165bc..82ee258f6122 100644
--- a/drivers/vdpa/pds/Makefile
+++ b/drivers/vdpa/pds/Makefile
@@ -3,6 +3,7 @@
 
 obj-$(CONFIG_PDS_VDPA) := pds_vdpa.o
 
-pds_vdpa-y := pci_drv.o	\
+pds_vdpa-y := aux_drv.o \
+	      pci_drv.o	\
 	      debugfs.o \
 	      virtio_pci.o
diff --git a/drivers/vdpa/pds/aux_drv.c b/drivers/vdpa/pds/aux_drv.c
new file mode 100644
index 000000000000..aef3c984dc90
--- /dev/null
+++ b/drivers/vdpa/pds/aux_drv.c
@@ -0,0 +1,123 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/auxiliary_bus.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/vdpa.h>
+
+#include <linux/pds/pds_core_if.h>
+#include <linux/pds/pds_adminq.h>
+#include <linux/pds/pds_auxbus.h>
+#include <linux/pds/pds_vdpa.h>
+
+#include "aux_drv.h"
+#include "pci_drv.h"
+#include "debugfs.h"
+
+static const
+struct auxiliary_device_id pds_vdpa_aux_id_table[] = {
+	{ .name = PDS_VDPA_DEV_NAME, },
+	{},
+};
+
+static void
+pds_vdpa_aux_notify_handler(struct pds_auxiliary_dev *padev,
+			    union pds_core_notifyq_comp *event)
+{
+	struct device *dev = &padev->aux_dev.dev;
+	u16 ecode = le16_to_cpu(event->ecode);
+
+	dev_info(dev, "%s: event code %d\n", __func__, ecode);
+}
+
+static int
+pds_vdpa_aux_probe(struct auxiliary_device *aux_dev,
+		   const struct auxiliary_device_id *id)
+
+{
+	struct pds_auxiliary_dev *padev =
+		container_of(aux_dev, struct pds_auxiliary_dev, aux_dev);
+	struct device *dev = &aux_dev->dev;
+	struct pds_vdpa_aux *vdpa_aux;
+	struct pci_dev *pdev;
+	struct pci_bus *bus;
+	int busnr;
+	u16 devfn;
+	int err;
+
+	vdpa_aux = kzalloc(sizeof(*vdpa_aux), GFP_KERNEL);
+	if (!vdpa_aux)
+		return -ENOMEM;
+
+	vdpa_aux->padev = padev;
+	auxiliary_set_drvdata(aux_dev, vdpa_aux);
+
+	/* Find our VF PCI device */
+	busnr = PCI_BUS_NUM(padev->id);
+	devfn = padev->id & 0xff;
+	bus = pci_find_bus(0, busnr);
+	pdev = pci_get_slot(bus, devfn);
+
+	vdpa_aux->vdpa_vf = pci_get_drvdata(pdev);
+	vdpa_aux->vdpa_vf->vdpa_aux = vdpa_aux;
+	pdev = vdpa_aux->vdpa_vf->pdev;
+	if (!pds_vdpa_is_vdpa_pci_driver(pdev)) {
+		dev_err(&pdev->dev, "%s: PCI driver is not pds_vdpa_pci_driver\n", __func__);
+		err = -EINVAL;
+		goto err_invalid_driver;
+	}
+
+	dev_info(dev, "%s: id %#04x busnr %#x devfn %#x bus %p vdpa_vf %p\n",
+		 __func__, padev->id, busnr, devfn, bus, vdpa_aux->vdpa_vf);
+
+	/* Register our PDS client with the pds_core */
+	vdpa_aux->padrv.event_handler = pds_vdpa_aux_notify_handler;
+	err = padev->ops->register_client(padev, &vdpa_aux->padrv);
+	if (err) {
+		dev_err(dev, "%s: Failed to register as client: %pe\n",
+			__func__, ERR_PTR(err));
+		goto err_register_client;
+	}
+
+	pds_vdpa_debugfs_add_ident(vdpa_aux);
+
+	return 0;
+
+err_register_client:
+	auxiliary_set_drvdata(aux_dev, NULL);
+err_invalid_driver:
+	kfree(vdpa_aux);
+
+	return err;
+}
+
+static void
+pds_vdpa_aux_remove(struct auxiliary_device *aux_dev)
+{
+	struct pds_vdpa_aux *vdpa_aux = auxiliary_get_drvdata(aux_dev);
+	struct device *dev = &aux_dev->dev;
+
+	vdpa_aux->padev->ops->unregister_client(vdpa_aux->padev);
+	if (vdpa_aux->vdpa_vf)
+		pci_dev_put(vdpa_aux->vdpa_vf->pdev);
+
+	kfree(vdpa_aux);
+	auxiliary_set_drvdata(aux_dev, NULL);
+
+	dev_info(dev, "Removed\n");
+}
+
+static struct auxiliary_driver
+pds_vdpa_aux_driver = {
+	.name = PDS_DEV_TYPE_VDPA_STR,
+	.probe = pds_vdpa_aux_probe,
+	.remove = pds_vdpa_aux_remove,
+	.id_table = pds_vdpa_aux_id_table,
+};
+
+struct auxiliary_driver *
+pds_vdpa_aux_driver_info(void)
+{
+	return &pds_vdpa_aux_driver;
+}
diff --git a/drivers/vdpa/pds/aux_drv.h b/drivers/vdpa/pds/aux_drv.h
new file mode 100644
index 000000000000..a6bd644fb957
--- /dev/null
+++ b/drivers/vdpa/pds/aux_drv.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#ifndef _AUX_DRV_H_
+#define _AUX_DRV_H_
+
+#include <linux/auxiliary_bus.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+
+struct pds_vdpa_pci_device;
+
+struct pds_vdpa_aux {
+	struct pds_auxiliary_dev *padev;
+	struct pds_auxiliary_drv padrv;
+
+	struct pds_vdpa_pci_device *vdpa_vf;
+	struct vdpa_mgmt_dev vdpa_mdev;
+	struct pds_vdpa_device *pdsv;
+
+	struct pds_vdpa_ident ident;
+	bool local_mac_bit;
+};
+
+struct auxiliary_driver *
+pds_vdpa_aux_driver_info(void);
+
+#endif /* _AUX_DRV_H_ */
diff --git a/drivers/vdpa/pds/debugfs.c b/drivers/vdpa/pds/debugfs.c
index f5b6654ae89b..f766412209df 100644
--- a/drivers/vdpa/pds/debugfs.c
+++ b/drivers/vdpa/pds/debugfs.c
@@ -4,10 +4,14 @@
 #include <linux/module.h>
 #include <linux/pci.h>
 #include <linux/types.h>
+#include <linux/vdpa.h>
 
 #include <linux/pds/pds_core_if.h>
+#include <linux/pds/pds_adminq.h>
+#include <linux/pds/pds_auxbus.h>
 #include <linux/pds/pds_vdpa.h>
 
+#include "aux_drv.h"
 #include "pci_drv.h"
 #include "debugfs.h"
 
@@ -41,4 +45,23 @@ pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev)
 	vdpa_pdev->dentry = NULL;
 }
 
+static int
+identity_show(struct seq_file *seq, void *v)
+{
+	struct pds_vdpa_aux *vdpa_aux = seq->private;
+
+	seq_printf(seq, "aux_dev:            %s\n",
+		   dev_name(&vdpa_aux->padev->aux_dev.dev));
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(identity);
+
+void
+pds_vdpa_debugfs_add_ident(struct pds_vdpa_aux *vdpa_aux)
+{
+	debugfs_create_file("identity", 0400, vdpa_aux->vdpa_vf->dentry,
+			    vdpa_aux, &identity_fops);
+}
+
 #endif /* CONFIG_DEBUG_FS */
diff --git a/drivers/vdpa/pds/debugfs.h b/drivers/vdpa/pds/debugfs.h
index ac31ab47746b..939a4c248aac 100644
--- a/drivers/vdpa/pds/debugfs.h
+++ b/drivers/vdpa/pds/debugfs.h
@@ -12,11 +12,13 @@ void pds_vdpa_debugfs_create(void);
 void pds_vdpa_debugfs_destroy(void);
 void pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev);
 void pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev);
+void pds_vdpa_debugfs_add_ident(struct pds_vdpa_aux *vdpa_aux);
 #else
 static inline void pds_vdpa_debugfs_create(void) { }
 static inline void pds_vdpa_debugfs_destroy(void) { }
 static inline void pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev) { }
 static inline void pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev) { }
+static inline void pds_vdpa_debugfs_add_ident(struct pds_vdpa_aux *vdpa_aux) { }
 #endif
 
 #endif /* _PDS_VDPA_DEBUGFS_H_ */
diff --git a/drivers/vdpa/pds/pci_drv.c b/drivers/vdpa/pds/pci_drv.c
index 10491e22778c..54a73ae023f9 100644
--- a/drivers/vdpa/pds/pci_drv.c
+++ b/drivers/vdpa/pds/pci_drv.c
@@ -6,11 +6,15 @@
 #include <linux/aer.h>
 #include <linux/types.h>
 #include <linux/vdpa.h>
+#include <linux/auxiliary_bus.h>
 
 #include <linux/pds/pds_core_if.h>
+#include <linux/pds/pds_adminq.h>
+#include <linux/pds/pds_auxbus.h>
 #include <linux/pds/pds_vdpa.h>
 
 #include "pci_drv.h"
+#include "aux_drv.h"
 #include "debugfs.h"
 
 static void
@@ -118,9 +122,16 @@ pds_vdpa_pci_driver = {
 	.remove = pds_vdpa_pci_remove
 };
 
+bool
+pds_vdpa_is_vdpa_pci_driver(struct pci_dev *pdev)
+{
+	return (to_pci_driver(pdev->dev.driver) == &pds_vdpa_pci_driver);
+}
+
 static void __exit
 pds_vdpa_pci_cleanup(void)
 {
+	auxiliary_driver_unregister(pds_vdpa_aux_driver_info());
 	pci_unregister_driver(&pds_vdpa_pci_driver);
 
 	pds_vdpa_debugfs_destroy();
@@ -140,8 +151,16 @@ pds_vdpa_pci_init(void)
 		goto err_pci;
 	}
 
+	err = auxiliary_driver_register(pds_vdpa_aux_driver_info());
+	if (err) {
+		pr_err("%s: aux driver register failed: %pe\n", __func__, ERR_PTR(err));
+		goto err_aux;
+	}
+
 	return 0;
 
+err_aux:
+	pci_unregister_driver(&pds_vdpa_pci_driver);
 err_pci:
 	pds_vdpa_debugfs_destroy();
 	return err;
diff --git a/drivers/vdpa/pds/pci_drv.h b/drivers/vdpa/pds/pci_drv.h
index 15f3b34fafa9..97ba75a7ce50 100644
--- a/drivers/vdpa/pds/pci_drv.h
+++ b/drivers/vdpa/pds/pci_drv.h
@@ -43,6 +43,7 @@ struct pds_vdpa_pci_device {
 	struct virtio_pci_modern_device vd_mdev;
 };
 
+bool pds_vdpa_is_vdpa_pci_driver(struct pci_dev *pdev);
 int pds_vdpa_probe_virtio(struct virtio_pci_modern_device *mdev);
 void pds_vdpa_remove_virtio(struct virtio_pci_modern_device *mdev);
 #endif /* _PCI_DRV_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [RFC PATCH net-next 17/19] pds_vdpa: add vdpa config client commands
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
                   ` (15 preceding siblings ...)
  2022-11-18 22:56 ` [RFC PATCH net-next 16/19] pds_vdpa: add auxiliary driver Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-22  6:32   ` Jason Wang
  2022-11-18 22:56 ` [RFC PATCH net-next 18/19] pds_vdpa: add support for vdpa and vdpamgmt interfaces Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 19/19] pds_vdpa: add Kconfig entry and pds_vdpa.rst Shannon Nelson
  18 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

These are the adminq commands that will be needed for
setting up and using the vDPA device.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 drivers/vdpa/pds/Makefile   |   1 +
 drivers/vdpa/pds/cmds.c     | 266 ++++++++++++++++++++++++++++++++++++
 drivers/vdpa/pds/cmds.h     |  17 +++
 drivers/vdpa/pds/vdpa_dev.h |  60 ++++++++
 4 files changed, 344 insertions(+)
 create mode 100644 drivers/vdpa/pds/cmds.c
 create mode 100644 drivers/vdpa/pds/cmds.h
 create mode 100644 drivers/vdpa/pds/vdpa_dev.h

diff --git a/drivers/vdpa/pds/Makefile b/drivers/vdpa/pds/Makefile
index 82ee258f6122..fafd356ddf86 100644
--- a/drivers/vdpa/pds/Makefile
+++ b/drivers/vdpa/pds/Makefile
@@ -4,6 +4,7 @@
 obj-$(CONFIG_PDS_VDPA) := pds_vdpa.o
 
 pds_vdpa-y := aux_drv.o \
+	      cmds.o \
 	      pci_drv.o	\
 	      debugfs.o \
 	      virtio_pci.o
diff --git a/drivers/vdpa/pds/cmds.c b/drivers/vdpa/pds/cmds.c
new file mode 100644
index 000000000000..2428ecdcf671
--- /dev/null
+++ b/drivers/vdpa/pds/cmds.c
@@ -0,0 +1,266 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/interrupt.h>
+#include <linux/pci.h>
+#include <linux/io.h>
+#include <linux/types.h>
+#include <linux/vdpa.h>
+
+#include <linux/pds/pds_core_if.h>
+#include <linux/pds/pds_adminq.h>
+#include <linux/pds/pds_auxbus.h>
+#include <linux/pds/pds_vdpa.h>
+
+#include "vdpa_dev.h"
+#include "aux_drv.h"
+#include "pci_drv.h"
+#include "cmds.h"
+
+static void
+pds_vdpa_check_needs_reset(struct pds_vdpa_device *pdsv, int err)
+{
+	if (err == -ENXIO)
+		pdsv->hw.status |= VIRTIO_CONFIG_S_NEEDS_RESET;
+}
+
+int
+pds_vdpa_init_hw(struct pds_vdpa_device *pdsv)
+{
+	struct pds_auxiliary_dev *padev = pdsv->vdpa_aux->padev;
+	struct device *dev = &padev->aux_dev.dev;
+	struct pds_vdpa_init_cmd init_cmd = {
+		.opcode = PDS_VDPA_CMD_INIT,
+		.vdpa_index = pdsv->hw.vdpa_index,
+		.vf_id = cpu_to_le16(pdsv->vdpa_aux->vdpa_vf->vf_id),
+		.len = cpu_to_le32(sizeof(pdsv->vn_config)),
+		.config_pa = cpu_to_le64(pdsv->vn_config_pa),
+	};
+	struct pds_vdpa_comp init_comp = {0};
+	int err;
+
+	/* Initialize the vdpa/virtio device */
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&init_cmd,
+				     sizeof(init_cmd),
+				     (union pds_core_adminq_comp *)&init_comp,
+				     0);
+	if (err) {
+		dev_err(dev, "Failed to init hw, status %d: %pe\n",
+			init_comp.status, ERR_PTR(err));
+		pds_vdpa_check_needs_reset(pdsv, err);
+	}
+
+	return err;
+}
+
+int
+pds_vdpa_cmd_reset(struct pds_vdpa_device *pdsv)
+{
+	struct pds_auxiliary_dev *padev = pdsv->vdpa_aux->padev;
+	struct device *dev = &padev->aux_dev.dev;
+	struct pds_vdpa_cmd cmd = {
+		.opcode = PDS_VDPA_CMD_RESET,
+		.vdpa_index = pdsv->hw.vdpa_index,
+		.vf_id = cpu_to_le16(pdsv->vdpa_aux->vdpa_vf->vf_id),
+	};
+	struct pds_vdpa_comp comp = {0};
+	int err;
+
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     0);
+	if (err) {
+		dev_err(dev, "Failed to reset hw, status %d: %pe\n",
+			comp.status, ERR_PTR(err));
+		pds_vdpa_check_needs_reset(pdsv, err);
+	}
+
+	return err;
+}
+
+int
+pds_vdpa_cmd_set_status(struct pds_vdpa_device *pdsv, u8 status)
+{
+	struct pds_auxiliary_dev *padev = pdsv->vdpa_aux->padev;
+	struct device *dev = &padev->aux_dev.dev;
+	struct pds_vdpa_status_cmd cmd = {
+		.opcode = PDS_VDPA_CMD_STATUS_UPDATE,
+		.vdpa_index = pdsv->hw.vdpa_index,
+		.vf_id = cpu_to_le16(pdsv->vdpa_aux->vdpa_vf->vf_id),
+		.status = status
+	};
+	struct pds_vdpa_comp comp = {0};
+	int err;
+
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     0);
+	if (err) {
+		dev_err(dev, "Failed to set status update %#x, status %d: %pe\n",
+			status, comp.status, ERR_PTR(err));
+		pds_vdpa_check_needs_reset(pdsv, err);
+	}
+
+	return err;
+}
+
+int
+pds_vdpa_cmd_set_mac(struct pds_vdpa_device *pdsv, u8 *mac)
+{
+	struct pds_auxiliary_dev *padev = pdsv->vdpa_aux->padev;
+	struct device *dev = &padev->aux_dev.dev;
+	struct pds_vdpa_setattr_cmd cmd = {
+		.opcode = PDS_VDPA_CMD_SET_ATTR,
+		.vdpa_index = pdsv->hw.vdpa_index,
+		.vf_id = cpu_to_le16(pdsv->vdpa_aux->vdpa_vf->vf_id),
+		.attr = PDS_VDPA_ATTR_MAC,
+	};
+	struct pds_vdpa_comp comp = {0};
+	int err;
+
+	ether_addr_copy(cmd.mac, mac);
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     0);
+	if (err) {
+		dev_err(dev, "Failed to set mac address %pM, status %d: %pe\n",
+			mac, comp.status, ERR_PTR(err));
+		pds_vdpa_check_needs_reset(pdsv, err);
+	}
+
+	return err;
+}
+
+int
+pds_vdpa_cmd_set_max_vq_pairs(struct pds_vdpa_device *pdsv, u16 max_vqp)
+{
+	struct pds_auxiliary_dev *padev = pdsv->vdpa_aux->padev;
+	struct device *dev = &padev->aux_dev.dev;
+	struct pds_vdpa_setattr_cmd cmd = {
+		.opcode = PDS_VDPA_CMD_SET_ATTR,
+		.vdpa_index = pdsv->hw.vdpa_index,
+		.vf_id = cpu_to_le16(pdsv->vdpa_aux->vdpa_vf->vf_id),
+		.attr = PDS_VDPA_ATTR_MAX_VQ_PAIRS,
+		.max_vq_pairs = cpu_to_le16(max_vqp),
+	};
+	struct pds_vdpa_comp comp = {0};
+	int err;
+
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     0);
+	if (err) {
+		dev_err(dev, "Failed to set max vq pairs %u, status %d: %pe\n",
+			max_vqp, comp.status, ERR_PTR(err));
+		pds_vdpa_check_needs_reset(pdsv, err);
+	}
+
+	return err;
+}
+
+int
+pds_vdpa_cmd_init_vq(struct pds_vdpa_device *pdsv, u16 qid,
+		     struct pds_vdpa_vq_info *vq_info)
+{
+	struct pds_auxiliary_dev *padev = pdsv->vdpa_aux->padev;
+	struct device *dev = &padev->aux_dev.dev;
+	struct pds_vdpa_vq_init_comp comp = {0};
+	struct pds_vdpa_vq_init_cmd cmd = {
+		.opcode = PDS_VDPA_CMD_VQ_INIT,
+		.vdpa_index = pdsv->hw.vdpa_index,
+		.vf_id = cpu_to_le16(pdsv->vdpa_aux->vdpa_vf->vf_id),
+		.qid = cpu_to_le16(qid),
+		.len = cpu_to_le16(ilog2(vq_info->q_len)),
+		.desc_addr = cpu_to_le64(vq_info->desc_addr),
+		.avail_addr = cpu_to_le64(vq_info->avail_addr),
+		.used_addr = cpu_to_le64(vq_info->used_addr),
+		.intr_index = cpu_to_le16(vq_info->intr_index),
+	};
+	int err;
+
+	dev_dbg(dev, "%s: qid %d len %d desc_addr %#llx avail_addr %#llx used_addr %#llx intr_index %d\n",
+		 __func__, qid, ilog2(vq_info->q_len),
+		 vq_info->desc_addr, vq_info->avail_addr,
+		 vq_info->used_addr, vq_info->intr_index);
+
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     0);
+	if (err) {
+		dev_err(dev, "Failed to init vq %d, status %d: %pe\n",
+			qid, comp.status, ERR_PTR(err));
+		pds_vdpa_check_needs_reset(pdsv, err);
+	} else {
+		vq_info->hw_qtype = comp.hw_qtype;
+		vq_info->hw_qindex = le16_to_cpu(comp.hw_qindex);
+	}
+
+	return err;
+}
+
+int
+pds_vdpa_cmd_reset_vq(struct pds_vdpa_device *pdsv, u16 qid)
+{
+	struct pds_auxiliary_dev *padev = pdsv->vdpa_aux->padev;
+	struct device *dev = &padev->aux_dev.dev;
+	struct pds_vdpa_vq_reset_cmd cmd = {
+		.opcode = PDS_VDPA_CMD_VQ_RESET,
+		.vdpa_index = pdsv->hw.vdpa_index,
+		.vf_id = cpu_to_le16(pdsv->vdpa_aux->vdpa_vf->vf_id),
+		.qid = cpu_to_le16(qid),
+	};
+	struct pds_vdpa_comp comp = {0};
+	int err;
+
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     0);
+	if (err) {
+		dev_err(dev, "Failed to reset vq %d, status %d: %pe\n",
+			qid, comp.status, ERR_PTR(err));
+		pds_vdpa_check_needs_reset(pdsv, err);
+	}
+
+	return err;
+}
+
+int
+pds_vdpa_cmd_set_features(struct pds_vdpa_device *pdsv, u64 features)
+{
+	struct pds_auxiliary_dev *padev = pdsv->vdpa_aux->padev;
+	struct device *dev = &padev->aux_dev.dev;
+	struct pds_vdpa_set_features_cmd cmd = {
+		.opcode = PDS_VDPA_CMD_SET_FEATURES,
+		.vdpa_index = pdsv->hw.vdpa_index,
+		.vf_id = cpu_to_le16(pdsv->vdpa_aux->vdpa_vf->vf_id),
+		.features = cpu_to_le64(features),
+	};
+	struct pds_vdpa_comp comp = {0};
+	int err;
+
+	err = padev->ops->adminq_cmd(padev,
+				     (union pds_core_adminq_cmd *)&cmd,
+				     sizeof(cmd),
+				     (union pds_core_adminq_comp *)&comp,
+				     0);
+	if (err) {
+		dev_err(dev, "Failed to set features %#llx, status %d: %pe\n",
+			features, comp.status, ERR_PTR(err));
+		pds_vdpa_check_needs_reset(pdsv, err);
+	}
+
+	return err;
+}
diff --git a/drivers/vdpa/pds/cmds.h b/drivers/vdpa/pds/cmds.h
new file mode 100644
index 000000000000..88ecc9b33646
--- /dev/null
+++ b/drivers/vdpa/pds/cmds.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#ifndef _VDPA_CMDS_H_
+#define _VDPA_CMDS_H_
+
+int pds_vdpa_init_hw(struct pds_vdpa_device *pdsv);
+
+int pds_vdpa_cmd_reset(struct pds_vdpa_device *pdsv);
+int pds_vdpa_cmd_set_status(struct pds_vdpa_device *pdsv, u8 status);
+int pds_vdpa_cmd_set_mac(struct pds_vdpa_device *pdsv, u8 *mac);
+int pds_vdpa_cmd_set_max_vq_pairs(struct pds_vdpa_device *pdsv, u16 max_vqp);
+int pds_vdpa_cmd_init_vq(struct pds_vdpa_device *pdsv, u16 qid,
+			 struct pds_vdpa_vq_info *vq_info);
+int pds_vdpa_cmd_reset_vq(struct pds_vdpa_device *pdsv, u16 qid);
+int pds_vdpa_cmd_set_features(struct pds_vdpa_device *pdsv, u64 features);
+#endif /* _VDPA_CMDS_H_ */
diff --git a/drivers/vdpa/pds/vdpa_dev.h b/drivers/vdpa/pds/vdpa_dev.h
new file mode 100644
index 000000000000..ac881687dc3e
--- /dev/null
+++ b/drivers/vdpa/pds/vdpa_dev.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#ifndef _VDPA_DEV_H_
+#define _VDPA_DEV_H_
+
+#include <linux/pci.h>
+#include <linux/vdpa.h>
+
+
+struct pds_vdpa_aux;
+
+struct pds_vdpa_vq_info {
+	bool ready;
+	u64 desc_addr;
+	u64 avail_addr;
+	u64 used_addr;
+	u32 q_len;
+	u16 qid;
+
+	void __iomem *notify;
+	dma_addr_t notify_pa;
+
+	u64 doorbell;
+	u16 avail_idx;
+	u16 used_idx;
+	int intr_index;
+
+	u8 hw_qtype;
+	u16 hw_qindex;
+
+	struct vdpa_callback event_cb;
+	struct pds_vdpa_device *pdsv;
+};
+
+#define PDS_VDPA_MAX_QUEUES	65
+#define PDS_VDPA_MAX_QLEN	32768
+struct pds_vdpa_hw {
+	struct pds_vdpa_vq_info vqs[PDS_VDPA_MAX_QUEUES];
+	u64 req_features;		/* features requested by vdpa */
+	u64 actual_features;		/* features negotiated and in use */
+	u8 vdpa_index;			/* rsvd for future subdevice use */
+	u8 num_vqs;			/* num vqs in use */
+	u16 status;			/* vdpa status */
+	struct vdpa_callback config_cb;
+};
+
+struct pds_vdpa_device {
+	struct vdpa_device vdpa_dev;
+	struct pds_vdpa_aux *vdpa_aux;
+	struct pds_vdpa_hw hw;
+
+	struct virtio_net_config vn_config;
+	dma_addr_t vn_config_pa;
+	struct dentry *dentry;
+};
+
+int pds_vdpa_get_mgmt_info(struct pds_vdpa_aux *vdpa_aux);
+
+#endif /* _VDPA_DEV_H_ */
-- 
2.17.1



* [RFC PATCH net-next 18/19] pds_vdpa: add support for vdpa and vdpamgmt interfaces
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
  2022-11-18 22:56 ` [RFC PATCH net-next 17/19] pds_vdpa: add vdpa config client commands Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-22  6:32   ` Jason Wang
  2022-11-18 22:56 ` [RFC PATCH net-next 19/19] pds_vdpa: add Kconfig entry and pds_vdpa.rst Shannon Nelson
  18 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

This adds the vDPA device support, where we advertise the virtio
queues that we can support and handle the configuration work
through pds_core's adminq.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 drivers/vdpa/pds/Makefile   |   3 +-
 drivers/vdpa/pds/aux_drv.c  |  33 ++
 drivers/vdpa/pds/debugfs.c  | 167 ++++++++
 drivers/vdpa/pds/debugfs.h  |   4 +
 drivers/vdpa/pds/vdpa_dev.c | 796 ++++++++++++++++++++++++++++++++++++
 5 files changed, 1002 insertions(+), 1 deletion(-)
 create mode 100644 drivers/vdpa/pds/vdpa_dev.c

diff --git a/drivers/vdpa/pds/Makefile b/drivers/vdpa/pds/Makefile
index fafd356ddf86..7fde4a4a1620 100644
--- a/drivers/vdpa/pds/Makefile
+++ b/drivers/vdpa/pds/Makefile
@@ -7,4 +7,5 @@ pds_vdpa-y := aux_drv.o \
 	      cmds.o \
 	      pci_drv.o	\
 	      debugfs.o \
-	      virtio_pci.o
+	      virtio_pci.o \
+	      vdpa_dev.o
diff --git a/drivers/vdpa/pds/aux_drv.c b/drivers/vdpa/pds/aux_drv.c
index aef3c984dc90..83b9a5a79325 100644
--- a/drivers/vdpa/pds/aux_drv.c
+++ b/drivers/vdpa/pds/aux_drv.c
@@ -12,6 +12,7 @@
 #include <linux/pds/pds_vdpa.h>
 
 #include "aux_drv.h"
+#include "vdpa_dev.h"
 #include "pci_drv.h"
 #include "debugfs.h"
 
@@ -25,10 +26,25 @@ static void
 pds_vdpa_aux_notify_handler(struct pds_auxiliary_dev *padev,
 			    union pds_core_notifyq_comp *event)
 {
+	struct pds_vdpa_device *pdsv = padev->priv;
 	struct device *dev = &padev->aux_dev.dev;
 	u16 ecode = le16_to_cpu(event->ecode);
 
 	dev_info(dev, "%s: event code %d\n", __func__, ecode);
+
+	/* Give the upper layers a hint that something interesting
+	 * may have happened.  It seems that the only thing this
+	 * triggers in the virtio-net drivers above us is a check
+	 * of link status.
+	 *
+	 * We don't set the NEEDS_RESET flag for EVENT_RESET
+	 * because we're likely going through a recovery or
+	 * fw_update and will be back up and running soon.
+	 */
+	if (ecode == PDS_EVENT_RESET || ecode == PDS_EVENT_LINK_CHANGE) {
+		if (pdsv->hw.config_cb.callback)
+			pdsv->hw.config_cb.callback(pdsv->hw.config_cb.private);
+	}
 }
 
 static int
@@ -80,10 +96,25 @@ pds_vdpa_aux_probe(struct auxiliary_device *aux_dev,
 		goto err_register_client;
 	}
 
+	/* Get device ident info and set up the vdpa_mgmt_dev */
+	err = pds_vdpa_get_mgmt_info(vdpa_aux);
+	if (err)
+		goto err_register_client;
+
+	/* Let vdpa know that we can provide devices */
+	err = vdpa_mgmtdev_register(&vdpa_aux->vdpa_mdev);
+	if (err) {
+		dev_err(dev, "%s: Failed to initialize vdpa_mgmt interface: %pe\n",
+			__func__, ERR_PTR(err));
+		goto err_mgmt_reg;
+	}
+
 	pds_vdpa_debugfs_add_ident(vdpa_aux);
 
 	return 0;
 
+err_mgmt_reg:
+	padev->ops->unregister_client(padev);
 err_register_client:
 	auxiliary_set_drvdata(aux_dev, NULL);
 err_invalid_driver:
@@ -98,6 +129,8 @@ pds_vdpa_aux_remove(struct auxiliary_device *aux_dev)
 	struct pds_vdpa_aux *vdpa_aux = auxiliary_get_drvdata(aux_dev);
 	struct device *dev = &aux_dev->dev;
 
+	vdpa_mgmtdev_unregister(&vdpa_aux->vdpa_mdev);
+
 	vdpa_aux->padev->ops->unregister_client(vdpa_aux->padev);
 	if (vdpa_aux->vdpa_vf)
 		pci_dev_put(vdpa_aux->vdpa_vf->pdev);
diff --git a/drivers/vdpa/pds/debugfs.c b/drivers/vdpa/pds/debugfs.c
index f766412209df..aa3143126a7e 100644
--- a/drivers/vdpa/pds/debugfs.c
+++ b/drivers/vdpa/pds/debugfs.c
@@ -11,6 +11,7 @@
 #include <linux/pds/pds_auxbus.h>
 #include <linux/pds/pds_vdpa.h>
 
+#include "vdpa_dev.h"
 #include "aux_drv.h"
 #include "pci_drv.h"
 #include "debugfs.h"
@@ -19,6 +20,72 @@
 
 static struct dentry *dbfs_dir;
 
+#define PRINT_SBIT_NAME(__seq, __f, __name)                     \
+	do {                                                    \
+		if (__f & __name)                               \
+			seq_printf(__seq, " %s", &#__name[16]); \
+	} while (0)
+
+static void
+print_status_bits(struct seq_file *seq, u16 status)
+{
+	seq_puts(seq, "status:");
+	PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_ACKNOWLEDGE);
+	PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_DRIVER);
+	PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_DRIVER_OK);
+	PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_FEATURES_OK);
+	PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_NEEDS_RESET);
+	PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_FAILED);
+	seq_puts(seq, "\n");
+}
+
+#define PRINT_FBIT_NAME(__seq, __f, __name)                \
+	do {                                               \
+		if (__f & BIT_ULL(__name))                 \
+			seq_printf(__seq, " %s", #__name); \
+	} while (0)
+
+static void
+print_feature_bits(struct seq_file *seq, u64 features)
+{
+	seq_puts(seq, "features:");
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CSUM);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_CSUM);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_MTU);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_MAC);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_TSO4);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_TSO6);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_ECN);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_UFO);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HOST_TSO4);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HOST_TSO6);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HOST_ECN);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HOST_UFO);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_MRG_RXBUF);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_STATUS);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_VQ);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_RX);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_VLAN);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_RX_EXTRA);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_ANNOUNCE);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_MQ);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_MAC_ADDR);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HASH_REPORT);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_RSS);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_RSC_EXT);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_STANDBY);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_SPEED_DUPLEX);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_F_NOTIFY_ON_EMPTY);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_F_ANY_LAYOUT);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_F_VERSION_1);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_F_ACCESS_PLATFORM);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_F_RING_PACKED);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_F_ORDER_PLATFORM);
+	PRINT_FBIT_NAME(seq, features, VIRTIO_F_SR_IOV);
+	seq_puts(seq, "\n");
+}
+
 void
 pds_vdpa_debugfs_create(void)
 {
@@ -49,10 +116,18 @@ static int
 identity_show(struct seq_file *seq, void *v)
 {
 	struct pds_vdpa_aux *vdpa_aux = seq->private;
+	struct vdpa_mgmt_dev *mgmt;
 
 	seq_printf(seq, "aux_dev:            %s\n",
 		   dev_name(&vdpa_aux->padev->aux_dev.dev));
 
+	mgmt = &vdpa_aux->vdpa_mdev;
+	seq_printf(seq, "max_vqs:            %d\n", mgmt->max_supported_vqs);
+	seq_printf(seq, "config_attr_mask:   %#llx\n", mgmt->config_attr_mask);
+	seq_printf(seq, "supported_features: %#llx\n", mgmt->supported_features);
+	print_feature_bits(seq, mgmt->supported_features);
+	seq_printf(seq, "local_mac_bit:      %d\n", vdpa_aux->local_mac_bit);
+
 	return 0;
 }
 DEFINE_SHOW_ATTRIBUTE(identity);
@@ -64,4 +139,96 @@ pds_vdpa_debugfs_add_ident(struct pds_vdpa_aux *vdpa_aux)
 			    vdpa_aux, &identity_fops);
 }
 
+static int
+config_show(struct seq_file *seq, void *v)
+{
+	struct pds_vdpa_device *pdsv = seq->private;
+	struct virtio_net_config *vc = &pdsv->vn_config;
+
+	seq_printf(seq, "mac:                  %pM\n", vc->mac);
+	seq_printf(seq, "max_virtqueue_pairs:  %d\n",
+		   __virtio16_to_cpu(true, vc->max_virtqueue_pairs));
+	seq_printf(seq, "mtu:                  %d\n", __virtio16_to_cpu(true, vc->mtu));
+	seq_printf(seq, "speed:                %d\n", le32_to_cpu(vc->speed));
+	seq_printf(seq, "duplex:               %d\n", vc->duplex);
+	seq_printf(seq, "rss_max_key_size:     %d\n", vc->rss_max_key_size);
+	seq_printf(seq, "rss_max_indirection_table_length: %d\n",
+		   le16_to_cpu(vc->rss_max_indirection_table_length));
+	seq_printf(seq, "supported_hash_types: %#x\n",
+		   le32_to_cpu(vc->supported_hash_types));
+	seq_printf(seq, "vn_status:            %#x\n",
+		   __virtio16_to_cpu(true, vc->status));
+	print_status_bits(seq, __virtio16_to_cpu(true, vc->status));
+
+	seq_printf(seq, "hw_status:            %#x\n", pdsv->hw.status);
+	print_status_bits(seq, pdsv->hw.status);
+	seq_printf(seq, "req_features:         %#llx\n", pdsv->hw.req_features);
+	print_feature_bits(seq, pdsv->hw.req_features);
+	seq_printf(seq, "actual_features:      %#llx\n", pdsv->hw.actual_features);
+	print_feature_bits(seq, pdsv->hw.actual_features);
+	seq_printf(seq, "vdpa_index:           %d\n", pdsv->hw.vdpa_index);
+	seq_printf(seq, "num_vqs:              %d\n", pdsv->hw.num_vqs);
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(config);
+
+static int
+vq_show(struct seq_file *seq, void *v)
+{
+	struct pds_vdpa_vq_info *vq = seq->private;
+	struct pds_vdpa_intr_info *intrs;
+
+	seq_printf(seq, "ready:      %d\n", vq->ready);
+	seq_printf(seq, "desc_addr:  %#llx\n", vq->desc_addr);
+	seq_printf(seq, "avail_addr: %#llx\n", vq->avail_addr);
+	seq_printf(seq, "used_addr:  %#llx\n", vq->used_addr);
+	seq_printf(seq, "q_len:      %d\n", vq->q_len);
+	seq_printf(seq, "qid:        %d\n", vq->qid);
+
+	seq_printf(seq, "doorbell:   %#llx\n", vq->doorbell);
+	seq_printf(seq, "avail_idx:  %d\n", vq->avail_idx);
+	seq_printf(seq, "used_idx:   %d\n", vq->used_idx);
+	seq_printf(seq, "intr_index: %d\n", vq->intr_index);
+
+	intrs = vq->pdsv->vdpa_aux->vdpa_vf->intrs;
+	seq_printf(seq, "irq:        %d\n", intrs[vq->intr_index].irq);
+	seq_printf(seq, "irq-name:   %s\n", intrs[vq->intr_index].name);
+
+	seq_printf(seq, "hw_qtype:   %d\n", vq->hw_qtype);
+	seq_printf(seq, "hw_qindex:  %d\n", vq->hw_qindex);
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(vq);
+
+void
+pds_vdpa_debugfs_add_vdpadev(struct pds_vdpa_device *pdsv)
+{
+	struct dentry *dentry;
+	const char *name;
+	int i;
+
+	dentry = pdsv->vdpa_aux->vdpa_vf->dentry;
+	name = dev_name(&pdsv->vdpa_dev.dev);
+
+	pdsv->dentry = debugfs_create_dir(name, dentry);
+
+	debugfs_create_file("config", 0400, pdsv->dentry, pdsv, &config_fops);
+
+	for (i = 0; i < pdsv->hw.num_vqs; i++) {
+		char name[8];
+
+		snprintf(name, sizeof(name), "vq%02d", i);
+		debugfs_create_file(name, 0400, pdsv->dentry, &pdsv->hw.vqs[i], &vq_fops);
+	}
+}
+
+void
+pds_vdpa_debugfs_del_vdpadev(struct pds_vdpa_device *pdsv)
+{
+	debugfs_remove_recursive(pdsv->dentry);
+	pdsv->dentry = NULL;
+}
+
 #endif /* CONFIG_DEBUG_FS */
diff --git a/drivers/vdpa/pds/debugfs.h b/drivers/vdpa/pds/debugfs.h
index 939a4c248aac..f0567e4ee4e4 100644
--- a/drivers/vdpa/pds/debugfs.h
+++ b/drivers/vdpa/pds/debugfs.h
@@ -13,12 +13,16 @@ void pds_vdpa_debugfs_destroy(void);
 void pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev);
 void pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev);
 void pds_vdpa_debugfs_add_ident(struct pds_vdpa_aux *vdpa_aux);
+void pds_vdpa_debugfs_add_vdpadev(struct pds_vdpa_device *pdsv);
+void pds_vdpa_debugfs_del_vdpadev(struct pds_vdpa_device *pdsv);
 #else
 static inline void pds_vdpa_debugfs_create(void) { }
 static inline void pds_vdpa_debugfs_destroy(void) { }
 static inline void pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev) { }
 static inline void pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev) { }
 static inline void pds_vdpa_debugfs_add_ident(struct pds_vdpa_aux *vdpa_aux) { }
+static inline void pds_vdpa_debugfs_add_vdpadev(struct pds_vdpa_device *pdsv) { }
+static inline void pds_vdpa_debugfs_del_vdpadev(struct pds_vdpa_device *pdsv) { }
 #endif
 
 #endif /* _PDS_VDPA_DEBUGFS_H_ */
diff --git a/drivers/vdpa/pds/vdpa_dev.c b/drivers/vdpa/pds/vdpa_dev.c
new file mode 100644
index 000000000000..824be42aff0d
--- /dev/null
+++ b/drivers/vdpa/pds/vdpa_dev.c
@@ -0,0 +1,796 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2022 Pensando Systems, Inc */
+
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/sysfs.h>
+#include <linux/types.h>
+#include <linux/vdpa.h>
+#include <uapi/linux/virtio_pci.h>
+#include <uapi/linux/vdpa.h>
+
+#include <linux/pds/pds_intr.h>
+#include <linux/pds/pds_core_if.h>
+#include <linux/pds/pds_adminq.h>
+#include <linux/pds/pds_auxbus.h>
+#include <linux/pds/pds_vdpa.h>
+
+#include "vdpa_dev.h"
+#include "pci_drv.h"
+#include "aux_drv.h"
+#include "cmds.h"
+#include "debugfs.h"
+
+static int
+pds_vdpa_setup_driver(struct pds_vdpa_device *pdsv)
+{
+	struct device *dev = &pdsv->vdpa_dev.dev;
+	int err = 0;
+	int i;
+
+	/* Verify all vqs[] are in ready state */
+	for (i = 0; i < pdsv->hw.num_vqs; i++) {
+		if (!pdsv->hw.vqs[i].ready) {
+			dev_warn(dev, "%s: qid %d not ready\n", __func__, i);
+			err = -ENOENT;
+		}
+	}
+
+	return err;
+}
+
+static struct pds_vdpa_device *
+vdpa_to_pdsv(struct vdpa_device *vdpa_dev)
+{
+	return container_of(vdpa_dev, struct pds_vdpa_device, vdpa_dev);
+}
+
+static struct pds_vdpa_hw *
+vdpa_to_hw(struct vdpa_device *vdpa_dev)
+{
+	struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
+
+	return &pdsv->hw;
+}
+
+static int
+pds_vdpa_set_vq_address(struct vdpa_device *vdpa_dev, u16 qid,
+			u64 desc_addr, u64 driver_addr, u64 device_addr)
+{
+	struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
+
+	hw->vqs[qid].desc_addr = desc_addr;
+	hw->vqs[qid].avail_addr = driver_addr;
+	hw->vqs[qid].used_addr = device_addr;
+
+	return 0;
+}
+
+static void
+pds_vdpa_set_vq_num(struct vdpa_device *vdpa_dev, u16 qid, u32 num)
+{
+	struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
+
+	hw->vqs[qid].q_len = num;
+}
+
+static void
+pds_vdpa_kick_vq(struct vdpa_device *vdpa_dev, u16 qid)
+{
+	struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
+
+	iowrite16(qid, pdsv->hw.vqs[qid].notify);
+}
+
+static void
+pds_vdpa_set_vq_cb(struct vdpa_device *vdpa_dev, u16 qid,
+		   struct vdpa_callback *cb)
+{
+	struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
+
+	hw->vqs[qid].event_cb = *cb;
+}
+
+static irqreturn_t
+pds_vdpa_isr(int irq, void *data)
+{
+	struct pds_core_intr __iomem *intr_ctrl;
+	struct pds_vdpa_device *pdsv;
+	struct pds_vdpa_vq_info *vq;
+
+	vq = data;
+	pdsv = vq->pdsv;
+
+	if (vq->event_cb.callback)
+		vq->event_cb.callback(vq->event_cb.private);
+
+	/* Since we don't actually know how many vq descriptors are
+	 * covered in this interrupt cycle, we simply clean all the
+	 * credits and re-enable the interrupt.
+	 */
+	intr_ctrl = (struct pds_core_intr __iomem *)pdsv->vdpa_aux->vdpa_vf->vd_mdev.isr;
+	pds_core_intr_clean_flags(&intr_ctrl[vq->intr_index],
+				  PDS_CORE_INTR_CRED_REARM);
+
+	return IRQ_HANDLED;
+}
+
+static void
+pds_vdpa_release_irq(struct pds_vdpa_device *pdsv, int qid)
+{
+	struct pds_vdpa_intr_info *intrs = pdsv->vdpa_aux->vdpa_vf->intrs;
+	struct pci_dev *pdev = pdsv->vdpa_aux->vdpa_vf->pdev;
+	struct pds_core_intr __iomem *intr_ctrl;
+	int index;
+
+	intr_ctrl = (struct pds_core_intr __iomem *)pdsv->vdpa_aux->vdpa_vf->vd_mdev.isr;
+	index = pdsv->hw.vqs[qid].intr_index;
+	if (index == VIRTIO_MSI_NO_VECTOR)
+		return;
+
+	if (intrs[index].irq == VIRTIO_MSI_NO_VECTOR)
+		return;
+
+	if (qid & 0x1) {
+		pdsv->hw.vqs[qid].intr_index = VIRTIO_MSI_NO_VECTOR;
+	} else {
+		pds_core_intr_mask(&intr_ctrl[index], PDS_CORE_INTR_MASK_SET);
+		devm_free_irq(&pdev->dev, intrs[index].irq, &pdsv->hw.vqs[qid]);
+		pdsv->hw.vqs[qid].intr_index = VIRTIO_MSI_NO_VECTOR;
+		intrs[index].irq = VIRTIO_MSI_NO_VECTOR;
+	}
+}
+
+static void
+pds_vdpa_set_vq_ready(struct vdpa_device *vdpa_dev, u16 qid, bool ready)
+{
+	struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
+	struct pci_dev *pdev = pdsv->vdpa_aux->vdpa_vf->pdev;
+	struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
+	struct device *dev = &pdsv->vdpa_dev.dev;
+	struct pds_core_intr __iomem *intr_ctrl;
+	int err;
+
+	dev_dbg(dev, "%s: qid %d ready %d => %d\n",
+		 __func__, qid, hw->vqs[qid].ready, ready);
+	if (ready == hw->vqs[qid].ready)
+		return;
+
+	intr_ctrl = (struct pds_core_intr __iomem *)pdsv->vdpa_aux->vdpa_vf->vd_mdev.isr;
+	if (ready) {
+		struct pds_vdpa_intr_info *intrs = pdsv->vdpa_aux->vdpa_vf->intrs;
+		int index = VIRTIO_MSI_NO_VECTOR;
+		int i;
+
+		/*  Tx and Rx queues share interrupts, and they start with
+		 *  even numbers, so only find an interrupt for the even numbered
+		 *  qid, and let the odd number use what the previous queue got.
+		 */
+		if (qid & 0x1) {
+			int even = qid & ~0x1;
+
+			index = hw->vqs[even].intr_index;
+		} else {
+			for (i = 0; i < pdsv->vdpa_aux->vdpa_vf->nintrs; i++) {
+				if (intrs[i].irq == VIRTIO_MSI_NO_VECTOR) {
+					index = i;
+					break;
+				}
+			}
+		}
+
+		if (qid & 0x1) {
+			hw->vqs[qid].intr_index = index;
+		} else if (index != VIRTIO_MSI_NO_VECTOR) {
+			int irq;
+
+			irq = pci_irq_vector(pdev, index);
+			snprintf(intrs[index].name, sizeof(intrs[index].name),
+				 "vdpa-%s-%d", dev_name(dev), index);
+
+			err = devm_request_irq(&pdev->dev, irq, pds_vdpa_isr, 0,
+					       intrs[index].name, &hw->vqs[qid]);
+			if (err) {
+				dev_info(dev, "%s: no irq for qid %d: %pe\n",
+					 __func__, qid, ERR_PTR(err));
+			} else {
+				intrs[index].irq = irq;
+				hw->vqs[qid].intr_index = index;
+				pds_core_intr_mask(&intr_ctrl[index],
+						   PDS_CORE_INTR_MASK_CLEAR);
+			}
+		} else {
+			dev_info(dev, "%s: no intr slot for qid %d\n",
+				 __func__, qid);
+		}
+
+		/* Pass vq setup info to DSC */
+		err = pds_vdpa_cmd_init_vq(pdsv, qid, &hw->vqs[qid]);
+		if (err) {
+			pds_vdpa_release_irq(pdsv, qid);
+			ready = false;
+		}
+	} else {
+		pds_vdpa_release_irq(pdsv, qid);
+		(void) pds_vdpa_cmd_reset_vq(pdsv, qid);
+	}
+
+	hw->vqs[qid].ready = ready;
+}
+
+static bool
+pds_vdpa_get_vq_ready(struct vdpa_device *vdpa_dev, u16 qid)
+{
+	struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
+
+	return hw->vqs[qid].ready;
+}
+
+static int
+pds_vdpa_set_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
+		      const struct vdpa_vq_state *state)
+{
+	struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
+
+	hw->vqs[qid].used_idx = state->split.avail_index;
+	hw->vqs[qid].avail_idx = state->split.avail_index;
+
+	return 0;
+}
+
+static int
+pds_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
+		      struct vdpa_vq_state *state)
+{
+	struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
+
+	state->split.avail_index = hw->vqs[qid].avail_idx;
+
+	return 0;
+}
+
+static struct vdpa_notification_area
+pds_vdpa_get_vq_notification(struct vdpa_device *vdpa_dev, u16 qid)
+{
+	struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
+	struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
+	struct virtio_pci_modern_device *vd_mdev;
+	struct vdpa_notification_area area;
+
+	area.addr = hw->vqs[qid].notify_pa;
+
+	vd_mdev = &pdsv->vdpa_aux->vdpa_vf->vd_mdev;
+	if (!vd_mdev->notify_offset_multiplier)
+		area.size = PAGE_SIZE;
+	else
+		area.size = vd_mdev->notify_offset_multiplier;
+
+	return area;
+}
+
+static int
+pds_vdpa_get_vq_irq(struct vdpa_device *vdpa_dev, u16 qid)
+{
+	struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
+	struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
+	int irq = VIRTIO_MSI_NO_VECTOR;
+	int index;
+
+	if (pdsv->vdpa_aux->vdpa_vf->intrs) {
+		index = hw->vqs[qid].intr_index;
+		irq = pdsv->vdpa_aux->vdpa_vf->intrs[index].irq;
+	}
+
+	return irq;
+}
+
+static u32
+pds_vdpa_get_vq_align(struct vdpa_device *vdpa_dev)
+{
+	return PAGE_SIZE;
+}
+
+static u32
+pds_vdpa_get_vq_group(struct vdpa_device *vdpa_dev, u16 idx)
+{
+	return 0;
+}
+
+static u64
+pds_vdpa_get_device_features(struct vdpa_device *vdpa_dev)
+{
+	struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
+
+	return le64_to_cpu(pdsv->vdpa_aux->ident.hw_features);
+}
+
+static int
+pds_vdpa_set_driver_features(struct vdpa_device *vdpa_dev, u64 features)
+{
+	struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
+	struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
+	struct device *dev = &pdsv->vdpa_dev.dev;
+	u64 nego_features;
+	u64 set_features;
+	u64 missing;
+	int err;
+
+	if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)) && features) {
+		dev_err(dev, "VIRTIO_F_ACCESS_PLATFORM is not negotiated\n");
+		return -EOPNOTSUPP;
+	}
+
+	hw->req_features = features;
+
+	/* Check for valid feature bits */
+	nego_features = features & le64_to_cpu(pdsv->vdpa_aux->ident.hw_features);
+	missing = hw->req_features & ~nego_features;
+	if (missing) {
+		dev_err(dev, "Can't support all requested features in %#llx, missing %#llx features\n",
+			hw->req_features, missing);
+		return -EOPNOTSUPP;
+	}
+
+	dev_dbg(dev, "%s: %#llx => %#llx\n",
+		 __func__, hw->actual_features, nego_features);
+
+	if (hw->actual_features == nego_features)
+		return 0;
+
+	/* Update hw feature configuration, strip MAC bit if locally set */
+	if (pdsv->vdpa_aux->local_mac_bit)
+		set_features = nego_features & ~BIT_ULL(VIRTIO_NET_F_MAC);
+	else
+		set_features = nego_features;
+	err = pds_vdpa_cmd_set_features(pdsv, set_features);
+	if (!err)
+		hw->actual_features = nego_features;
+
+	return err;
+}
+
+static u64
+pds_vdpa_get_driver_features(struct vdpa_device *vdpa_dev)
+{
+	struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
+
+	return hw->actual_features;
+}
+
+static void
+pds_vdpa_set_config_cb(struct vdpa_device *vdpa_dev, struct vdpa_callback *cb)
+{
+	struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
+
+	hw->config_cb.callback = cb->callback;
+	hw->config_cb.private = cb->private;
+}
+
+static u16
+pds_vdpa_get_vq_num_max(struct vdpa_device *vdpa_dev)
+{
+	struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
+	u32 max_qlen;
+
+	max_qlen = min_t(u32, PDS_VDPA_MAX_QLEN,
+			      1 << le16_to_cpu(pdsv->vdpa_aux->ident.max_qlen));
+
+	return (u16)max_qlen;
+}
+
+static u32
+pds_vdpa_get_device_id(struct vdpa_device *vdpa_dev)
+{
+	return VIRTIO_ID_NET;
+}
+
+static u32
+pds_vdpa_get_vendor_id(struct vdpa_device *vdpa_dev)
+{
+	return PCI_VENDOR_ID_PENSANDO;
+}
+
+static u8
+pds_vdpa_get_status(struct vdpa_device *vdpa_dev)
+{
+	struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
+
+	return hw->status;
+}
+
+static void
+pds_vdpa_set_status(struct vdpa_device *vdpa_dev, u8 status)
+{
+	struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
+	struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
+	struct device *dev = &pdsv->vdpa_dev.dev;
+	int err;
+
+	if (hw->status == status)
+		return;
+
+	/* If the DRIVER_OK bit turns on, time to start the queues */
+	if ((status ^ hw->status) & VIRTIO_CONFIG_S_DRIVER_OK) {
+		if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+			err = pds_vdpa_setup_driver(pdsv);
+			if (err) {
+				dev_err(dev, "failed to setup driver: %pe\n", ERR_PTR(err));
+				status = hw->status | VIRTIO_CONFIG_S_FAILED;
+			}
+		} else {
+			dev_warn(dev, "did not expect DRIVER_OK to be cleared\n");
+		}
+	}
+
+	err = pds_vdpa_cmd_set_status(pdsv, status);
+	if (!err)
+		hw->status = status;
+}
+
+static int
+pds_vdpa_reset(struct vdpa_device *vdpa_dev)
+{
+	struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
+	struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
+	int i;
+
+	if (hw->status == 0)
+		return 0;
+
+	if (hw->status & VIRTIO_CONFIG_S_DRIVER_OK) {
+		/* Reset the vqs */
+		for (i = 0; i < hw->num_vqs; i++) {
+			pds_vdpa_release_irq(pdsv, i);
+			(void) pds_vdpa_cmd_reset_vq(pdsv, i);
+
+			memset(&pdsv->hw.vqs[i], 0, sizeof(pdsv->hw.vqs[0]));
+			pdsv->hw.vqs[i].ready = false;
+		}
+	}
+
+	hw->status = 0;
+	(void) pds_vdpa_cmd_set_status(pdsv, 0);
+
+	return 0;
+}
+
+static size_t
+pds_vdpa_get_config_size(struct vdpa_device *vdpa_dev)
+{
+	return sizeof(struct virtio_net_config);
+}
+
+static void
+pds_vdpa_get_config(struct vdpa_device *vdpa_dev,
+		    unsigned int offset,
+		    void *buf, unsigned int len)
+{
+	struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
+
+	if (offset + len <= sizeof(struct virtio_net_config))
+		memcpy(buf, (u8 *)&pdsv->vn_config + offset, len);
+}
+
+static void
+pds_vdpa_set_config(struct vdpa_device *vdpa_dev,
+		    unsigned int offset, const void *buf,
+		    unsigned int len)
+{
+	/* In the virtio_net context, this callback seems to be called
+	 * only by drivers supporting the older non-VERSION_1 API, so
+	 * we can leave it as an empty function; but we need to define
+	 * the function in case it does get called, as there are
+	 * currently no checks for its existence before calling in
+	 * that path.
+	 *
+	 * The implementation would be something like:
+	 * if (offset + len <= sizeof(struct virtio_net_config))
+	 *	memcpy((u8 *)&pdsv->vn_config + offset, buf, len);
+	 */
+}
+
+static const struct vdpa_config_ops pds_vdpa_ops = {
+	.set_vq_address		= pds_vdpa_set_vq_address,
+	.set_vq_num		= pds_vdpa_set_vq_num,
+	.kick_vq		= pds_vdpa_kick_vq,
+	.set_vq_cb		= pds_vdpa_set_vq_cb,
+	.set_vq_ready		= pds_vdpa_set_vq_ready,
+	.get_vq_ready		= pds_vdpa_get_vq_ready,
+	.set_vq_state		= pds_vdpa_set_vq_state,
+	.get_vq_state		= pds_vdpa_get_vq_state,
+	.get_vq_notification	= pds_vdpa_get_vq_notification,
+	.get_vq_irq		= pds_vdpa_get_vq_irq,
+	.get_vq_align		= pds_vdpa_get_vq_align,
+	.get_vq_group		= pds_vdpa_get_vq_group,
+
+	.get_device_features	= pds_vdpa_get_device_features,
+	.set_driver_features	= pds_vdpa_set_driver_features,
+	.get_driver_features	= pds_vdpa_get_driver_features,
+	.set_config_cb		= pds_vdpa_set_config_cb,
+	.get_vq_num_max		= pds_vdpa_get_vq_num_max,
+/*	.get_vq_num_min (optional) */
+	.get_device_id		= pds_vdpa_get_device_id,
+	.get_vendor_id		= pds_vdpa_get_vendor_id,
+	.get_status		= pds_vdpa_get_status,
+	.set_status		= pds_vdpa_set_status,
+	.reset			= pds_vdpa_reset,
+	.get_config_size	= pds_vdpa_get_config_size,
+	.get_config		= pds_vdpa_get_config,
+	.set_config		= pds_vdpa_set_config,
+
+/*	.get_generation (optional) */
+/*	.get_iova_range (optional) */
+/*	.set_group_asid */
+/*	.set_map (optional) */
+/*	.dma_map (optional) */
+/*	.dma_unmap (optional) */
+/*	.free (optional) */
+};
+
+static struct virtio_device_id pds_vdpa_id_table[] = {
+	{VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID},
+	{0},
+};
+
+static int
+pds_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
+		 const struct vdpa_dev_set_config *add_config)
+{
+	struct pds_vdpa_aux *vdpa_aux;
+	struct pds_vdpa_device *pdsv;
+	struct vdpa_mgmt_dev *mgmt;
+	u16 fw_max_vqs, vq_pairs;
+	struct device *dma_dev;
+	struct pds_vdpa_hw *hw;
+	struct pci_dev *pdev;
+	struct device *dev;
+	u8 mac[ETH_ALEN];
+	int err;
+	int i;
+
+	vdpa_aux = container_of(mdev, struct pds_vdpa_aux, vdpa_mdev);
+	dev = &vdpa_aux->padev->aux_dev.dev;
+	mgmt = &vdpa_aux->vdpa_mdev;
+
+	if (vdpa_aux->pdsv) {
+		dev_warn(dev, "Multiple vDPA devices on a VF is not supported.\n");
+		return -EOPNOTSUPP;
+	}
+
+	pdsv = vdpa_alloc_device(struct pds_vdpa_device, vdpa_dev,
+				 dev, &pds_vdpa_ops, 1, 1, name, false);
+	if (IS_ERR(pdsv)) {
+		dev_err(dev, "Failed to allocate vDPA structure: %pe\n", pdsv);
+		return PTR_ERR(pdsv);
+	}
+
+	vdpa_aux->pdsv = pdsv;
+	pdsv->vdpa_aux = vdpa_aux;
+	pdsv->vdpa_aux->padev->priv = pdsv;
+
+	pdev = vdpa_aux->vdpa_vf->pdev;
+	pdsv->vdpa_dev.dma_dev = &pdev->dev;
+	dma_dev = pdsv->vdpa_dev.dma_dev;
+	hw = &pdsv->hw;
+
+	pdsv->vn_config_pa = dma_map_single(dma_dev, &pdsv->vn_config,
+					    sizeof(pdsv->vn_config), DMA_FROM_DEVICE);
+	if (dma_mapping_error(dma_dev, pdsv->vn_config_pa)) {
+		dev_err(dma_dev, "Failed to map vn_config space\n");
+		pdsv->vn_config_pa = 0;
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+	err = pds_vdpa_init_hw(pdsv);
+	if (err) {
+		dev_err(dev, "Failed to init hw: %pe\n", ERR_PTR(err));
+		goto err_unmap;
+	}
+
+	fw_max_vqs = le16_to_cpu(pdsv->vdpa_aux->ident.max_vqs);
+	vq_pairs = fw_max_vqs / 2;
+
+	/* Make sure we have the queues being requested */
+	if (add_config->mask & (1 << VDPA_ATTR_DEV_NET_CFG_MAX_VQP))
+		vq_pairs = add_config->net.max_vq_pairs;
+
+	hw->num_vqs = 2 * vq_pairs;
+	if (mgmt->supported_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
+		hw->num_vqs++;
+
+	if (hw->num_vqs > fw_max_vqs) {
+		dev_err(dev, "%s: queue count requested %u greater than max %u\n",
+			 __func__, hw->num_vqs, fw_max_vqs);
+		err = -ENOSPC;
+		goto err_unmap;
+	}
+
+	if (hw->num_vqs != fw_max_vqs) {
+		err = pds_vdpa_cmd_set_max_vq_pairs(pdsv, vq_pairs);
+		if (err == -ERANGE) {
+			hw->num_vqs = fw_max_vqs;
+			dev_warn(dev, "Known FW issue - overriding to use max_vq_pairs %d\n",
+				 hw->num_vqs / 2);
+		} else if (err) {
+			dev_err(dev, "Failed to update max_vq_pairs: %pe\n",
+				ERR_PTR(err));
+			goto err_unmap;
+		}
+	}
+
+	/* Set a mac: use the value from the user config if provided,
+	 * or set a random mac if the device default is 00:..:00
+	 */
+	if (add_config->mask & (1 << VDPA_ATTR_DEV_NET_CFG_MACADDR)) {
+		ether_addr_copy(mac, add_config->net.mac);
+		pds_vdpa_cmd_set_mac(pdsv, mac);
+	} else if (is_zero_ether_addr(pdsv->vn_config.mac)) {
+		eth_random_addr(mac);
+		pds_vdpa_cmd_set_mac(pdsv, mac);
+	}
+
+	for (i = 0; i < hw->num_vqs; i++) {
+		hw->vqs[i].qid = i;
+		hw->vqs[i].pdsv = pdsv;
+		hw->vqs[i].intr_index = VIRTIO_MSI_NO_VECTOR;
+		hw->vqs[i].notify = vp_modern_map_vq_notify(&pdsv->vdpa_aux->vdpa_vf->vd_mdev,
+							    i, &hw->vqs[i].notify_pa);
+	}
+
+	pdsv->vdpa_dev.mdev = &vdpa_aux->vdpa_mdev;
+
+	/* We use the _vdpa_register_device() call rather than the
+	 * vdpa_register_device() to avoid a deadlock because this
+	 * dev_add() is called with the vdpa_dev_lock already set
+	 * by vdpa_nl_cmd_dev_add_set_doit()
+	 */
+	err = _vdpa_register_device(&pdsv->vdpa_dev, hw->num_vqs);
+	if (err) {
+		dev_err(dev, "Failed to register to vDPA bus: %pe\n", ERR_PTR(err));
+		goto err_unmap;
+	}
+
+	pds_vdpa_debugfs_add_vdpadev(pdsv);
+	dev_info(&pdsv->vdpa_dev.dev, "Added with mac %pM\n", pdsv->vn_config.mac);
+
+	return 0;
+
+err_unmap:
+	dma_unmap_single(dma_dev, pdsv->vn_config_pa,
+			 sizeof(pdsv->vn_config), DMA_FROM_DEVICE);
+err_out:
+	put_device(&pdsv->vdpa_dev.dev);
+	vdpa_aux->pdsv = NULL;
+	return err;
+}
+
+static void
+pds_vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *vdpa_dev)
+{
+	struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
+	struct pds_vdpa_aux *vdpa_aux;
+
+	dev_info(&vdpa_dev->dev, "Removed\n");
+
+	vdpa_aux = container_of(mdev, struct pds_vdpa_aux, vdpa_mdev);
+	_vdpa_unregister_device(vdpa_dev);
+	pds_vdpa_debugfs_del_vdpadev(pdsv);
+
+	if (vdpa_aux->pdsv->vn_config_pa)
+		dma_unmap_single(vdpa_dev->dma_dev, vdpa_aux->pdsv->vn_config_pa,
+				 sizeof(vdpa_aux->pdsv->vn_config), DMA_FROM_DEVICE);
+
+	vdpa_aux->pdsv = NULL;
+}
+
+static const struct vdpa_mgmtdev_ops pds_vdpa_mgmt_dev_ops = {
+	.dev_add = pds_vdpa_dev_add,
+	.dev_del = pds_vdpa_dev_del
+};
+
+int
+pds_vdpa_get_mgmt_info(struct pds_vdpa_aux *vdpa_aux)
+{
+	struct pds_vdpa_pci_device *vdpa_pdev;
+	struct pds_vdpa_ident_cmd ident_cmd = {
+		.opcode = PDS_VDPA_CMD_IDENT,
+		.vf_id = cpu_to_le16(vdpa_aux->vdpa_vf->vf_id),
+	};
+	struct pds_vdpa_comp ident_comp = {0};
+	struct vdpa_mgmt_dev *mgmt;
+	struct device *dma_dev;
+	dma_addr_t ident_pa;
+	struct pci_dev *pdev;
+	struct device *dev;
+	__le64 mac_bit;
+	u16 max_vqs;
+	int err;
+	int i;
+
+	vdpa_pdev = vdpa_aux->vdpa_vf;
+	pdev = vdpa_pdev->pdev;
+	dev = &vdpa_aux->padev->aux_dev.dev;
+	mgmt = &vdpa_aux->vdpa_mdev;
+
+	/* Get resource info from the device */
+	dma_dev = &pdev->dev;
+	ident_pa = dma_map_single(dma_dev, &vdpa_aux->ident,
+				  sizeof(vdpa_aux->ident), DMA_FROM_DEVICE);
+	if (dma_mapping_error(dma_dev, ident_pa)) {
+		dev_err(dma_dev, "Failed to map ident space\n");
+		return -ENOMEM;
+	}
+
+	ident_cmd.ident_pa = cpu_to_le64(ident_pa);
+	ident_cmd.len = cpu_to_le32(sizeof(vdpa_aux->ident));
+	err = vdpa_aux->padev->ops->adminq_cmd(vdpa_aux->padev,
+					       (union pds_core_adminq_cmd *)&ident_cmd,
+					       sizeof(ident_cmd),
+					       (union pds_core_adminq_comp *)&ident_comp,
+					       0);
+	dma_unmap_single(dma_dev, ident_pa,
+			 sizeof(vdpa_aux->ident), DMA_FROM_DEVICE);
+	if (err) {
+		dev_err(dev, "Failed to ident hw, status %d: %pe\n",
+			ident_comp.status, ERR_PTR(err));
+		return err;
+	}
+
+	/* The driver adds a default mac address if the device doesn't,
+	 * so we need to be sure we advertise VIRTIO_NET_F_MAC
+	 */
+	mac_bit = cpu_to_le64(BIT_ULL(VIRTIO_NET_F_MAC));
+	if (!(vdpa_aux->ident.hw_features & mac_bit)) {
+		vdpa_aux->ident.hw_features |= mac_bit;
+		vdpa_aux->local_mac_bit = true;
+	}
+
+	max_vqs = le16_to_cpu(vdpa_aux->ident.max_vqs);
+	mgmt->max_supported_vqs = min_t(u16, PDS_VDPA_MAX_QUEUES, max_vqs);
+	if (max_vqs > PDS_VDPA_MAX_QUEUES)
+		dev_info(dev, "FYI - Device supports more vqs (%d) than driver (%d)\n",
+			 max_vqs, PDS_VDPA_MAX_QUEUES);
+
+	mgmt->ops = &pds_vdpa_mgmt_dev_ops;
+	mgmt->id_table = pds_vdpa_id_table;
+	mgmt->device = dev;
+	mgmt->supported_features = le64_to_cpu(vdpa_aux->ident.hw_features);
+	mgmt->config_attr_mask = BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR);
+	mgmt->config_attr_mask |= BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP);
+
+	/* Set up interrupts now that we know how many we might want.
+	 * TX and RX pairs will share interrupts, so halve the vq count,
+	 * and add another for a control queue if supported.
+	 */
+	vdpa_pdev->nintrs = mgmt->max_supported_vqs / 2;
+	if (mgmt->supported_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
+		vdpa_pdev->nintrs++;
+
+	err = pci_alloc_irq_vectors(pdev, vdpa_pdev->nintrs, vdpa_pdev->nintrs,
+				    PCI_IRQ_MSIX);
+	if (err < 0) {
+		dev_err(dma_dev, "Couldn't get %d msix vectors: %pe\n",
+			vdpa_pdev->nintrs, ERR_PTR(err));
+		return err;
+	}
+	vdpa_pdev->nintrs = err;
+	err = 0;
+
+	vdpa_pdev->intrs = devm_kcalloc(&pdev->dev, vdpa_pdev->nintrs,
+					sizeof(*vdpa_pdev->intrs),
+					GFP_KERNEL);
+	if (!vdpa_pdev->intrs) {
+		vdpa_pdev->nintrs = 0;
+		pci_free_irq_vectors(pdev);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < vdpa_pdev->nintrs; i++)
+		vdpa_pdev->intrs[i].irq = VIRTIO_MSI_NO_VECTOR;
+
+	return 0;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [RFC PATCH net-next 19/19] pds_vdpa: add Kconfig entry and pds_vdpa.rst
  2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
                   ` (17 preceding siblings ...)
  2022-11-18 22:56 ` [RFC PATCH net-next 18/19] pds_vdpa: add support for vdpa and vdpamgmt interfaces Shannon Nelson
@ 2022-11-18 22:56 ` Shannon Nelson
  2022-11-22  6:35   ` Jason Wang
  18 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-18 22:56 UTC (permalink / raw)
  To: netdev, davem, kuba, mst, jasowang, virtualization
  Cc: drivers, Shannon Nelson

Signed-off-by: Shannon Nelson <snelson@pensando.io>
---
 .../ethernet/pensando/pds_vdpa.rst            | 85 +++++++++++++++++++
 MAINTAINERS                                   |  1 +
 drivers/vdpa/Kconfig                          |  7 ++
 3 files changed, 93 insertions(+)
 create mode 100644 Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst

diff --git a/Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst b/Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst
new file mode 100644
index 000000000000..c517f337d212
--- /dev/null
+++ b/Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst
@@ -0,0 +1,85 @@
+.. SPDX-License-Identifier: GPL-2.0+
+.. note: can be edited and viewed with /usr/bin/formiko-vim
+
+==========================================================
+PCI vDPA driver for the Pensando(R) DSC adapter family
+==========================================================
+
+Pensando vDPA VF Device Driver
+Copyright(c) 2022 Pensando Systems, Inc
+
+Overview
+========
+
+The ``pds_vdpa`` driver is a PCI and auxiliary bus driver and supplies
+a vDPA device for use by the virtio network stack.  It is used with
+the Pensando Virtual Function devices that offer vDPA and virtio queue
+services.  It depends on the ``pds_core`` driver and hardware for the PF
+and for device configuration services.
+
+Using the device
+================
+
+The ``pds_vdpa`` device is enabled via multiple configuration steps and
+depends on the ``pds_core`` driver to create and enable SR-IOV Virtual
+Function devices.
+
+Shown below are the steps to bind the driver to a VF and also to the
+associated auxiliary device created by the ``pds_core`` driver. This
+example assumes the pds_core and pds_vdpa modules are already
+loaded.
+
+.. code-block:: bash
+
+  #!/bin/bash
+
+  modprobe pds_core
+  modprobe pds_vdpa
+
+  PF_BDF=`grep "vDPA.*1" /sys/kernel/debug/pds_core/*/viftypes | head -1 | awk -F / '{print $6}'`
+
+  # Enable vDPA VF auxiliary device(s) in the PF
+  devlink dev param set pci/$PF_BDF name enable_vnet value true cmode runtime
+
+  # Create a VF for vDPA use
+  echo 1 > /sys/bus/pci/drivers/pds_core/$PF_BDF/sriov_numvfs
+
+  # Find the vDPA services/devices available
+  PDS_VDPA_MGMT=`vdpa mgmtdev show | grep vDPA | head -1 | cut -d: -f1`
+
+  # Create a vDPA device for use in virtio network configurations
+  vdpa dev add name vdpa1 mgmtdev $PDS_VDPA_MGMT mac 00:11:22:33:44:55
+
+  # Set up an ethernet interface on the vdpa device
+  modprobe virtio_vdpa
+
+
+Enabling the driver
+===================
+
+The driver is enabled via the standard kernel configuration system,
+using the make command::
+
+  make oldconfig/menuconfig/etc.
+
+The driver is located in the menu structure at:
+
+  -> Device Drivers
+    -> vDPA drivers (VDPA [=y])
+      -> vDPA driver for Pensando DSC devices (PDS_VDPA [=m])
+
+Support
+=======
+
+For general Linux networking support, please use the netdev mailing
+list, which is monitored by Pensando personnel::
+
+  netdev@vger.kernel.org
+
+For more specific support needs, please use the Pensando driver support
+email::
+
+  drivers@pensando.io
diff --git a/MAINTAINERS b/MAINTAINERS
index a4f989fa8192..a4d96e854757 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16152,6 +16152,7 @@ L:	netdev@vger.kernel.org
 S:	Supported
 F:	Documentation/networking/device_drivers/ethernet/pensando/
 F:	drivers/net/ethernet/pensando/
+F:	drivers/vdpa/pds/
 F:	include/linux/pds/
 
 PER-CPU MEMORY ALLOCATOR
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index 50f45d037611..1c44df18f3da 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -86,4 +86,11 @@ config ALIBABA_ENI_VDPA
 	  VDPA driver for Alibaba ENI (Elastic Network Interface) which is built upon
 	  virtio 0.9.5 specification.
 
+config PDS_VDPA
+	tristate "vDPA driver for Pensando DSC devices"
+	select VHOST_RING
+	depends on PDS_CORE
+	help
+	  VDPA network driver for Pensando's PDS Core devices.
+
 endif # VDPA
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 15/19] pds_vdpa: virtio bar setup for vdpa
  2022-11-18 22:56 ` [RFC PATCH net-next 15/19] pds_vdpa: virtio bar setup for vdpa Shannon Nelson
@ 2022-11-22  3:32   ` Jason Wang
  2022-11-22  6:36     ` Jason Wang
  0 siblings, 1 reply; 61+ messages in thread
From: Jason Wang @ 2022-11-22  3:32 UTC (permalink / raw)
  To: Shannon Nelson, netdev, davem, kuba, mst, virtualization; +Cc: drivers


On 2022/11/19 06:56, Shannon Nelson wrote:
> The PDS vDPA device has a virtio BAR for describing itself, and
> the pds_vdpa driver needs to access it.  Here we copy liberally
> from the existing drivers/virtio/virtio_pci_modern_dev.c as it
> has what we need, but we need to modify it so that it can work
> with our device id and so we can use our own DMA mask.
>
> We suspect there is room for discussion here about making the
> existing code a little more flexible, but we thought we'd at
> least start the discussion here.


Exactly. Since virtio_pci_modern_dev.c is a library, we could tweak it
to allow the caller to pass in its own device_id check and DMA mask. Then
we can avoid code/bug duplication here.

Thanks
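[Editor's note: the tweak suggested above could look something like the
sketch below. This is only an illustration, not the RFC's code: the
`device_id_check` callback and `dma_mask` field names are assumptions,
the kernel types are stubbed for userspace, and transitional virtio IDs
are elided.]

```c
#include <assert.h>
#include <stdint.h>

/* Stub standing in for the kernel's struct pci_dev */
struct pci_dev { uint16_t device; };

/* Sketch of an extended struct virtio_pci_modern_device: if the caller
 * sets device_id_check, the library uses it instead of the standard
 * virtio PCI ID rules, so a vendor VF like pds_vdpa's 1dd8:100b can
 * reuse the probe code. Field names here are illustrative. */
struct virtio_pci_modern_device {
	struct pci_dev *pci_dev;
	uint32_t (*device_id_check)(struct pci_dev *pdev);
	uint64_t dma_mask;	/* 0 means keep the library's default */
};

#define ID_INVALID UINT32_MAX	/* stands in for returning -ENODEV */

/* Default check: modern virtio devices use PCI device ids
 * 0x1040-0x107f, and the virtio device type is the id minus 0x1040. */
static uint32_t default_id_check(struct pci_dev *pdev)
{
	if (pdev->device < 0x1040 || pdev->device > 0x107f)
		return ID_INVALID;
	return pdev->device - 0x1040;
}

static uint32_t probe_device_id(struct virtio_pci_modern_device *mdev)
{
	if (mdev->device_id_check)
		return mdev->device_id_check(mdev->pci_dev);
	return default_id_check(mdev->pci_dev);
}

/* A pds_vdpa-style override: the VF id is outside the virtio range,
 * so the driver reports the virtio device type itself. */
static uint32_t pds_id_check(struct pci_dev *pdev)
{
	return pdev->device == 0x100b ? 1 /* VIRTIO_ID_NET */ : ID_INVALID;
}
```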


>
> Signed-off-by: Shannon Nelson <snelson@pensando.io>
> ---
>   drivers/vdpa/pds/Makefile     |   3 +-
>   drivers/vdpa/pds/pci_drv.c    |  10 ++
>   drivers/vdpa/pds/pci_drv.h    |   2 +
>   drivers/vdpa/pds/virtio_pci.c | 283 ++++++++++++++++++++++++++++++++++
>   4 files changed, 297 insertions(+), 1 deletion(-)
>   create mode 100644 drivers/vdpa/pds/virtio_pci.c
>
> diff --git a/drivers/vdpa/pds/Makefile b/drivers/vdpa/pds/Makefile
> index 3ba28a875574..b8376ab165bc 100644
> --- a/drivers/vdpa/pds/Makefile
> +++ b/drivers/vdpa/pds/Makefile
> @@ -4,4 +4,5 @@
>   obj-$(CONFIG_PDS_VDPA) := pds_vdpa.o
>   
>   pds_vdpa-y := pci_drv.o	\
> -	      debugfs.o
> +	      debugfs.o \
> +	      virtio_pci.o
> diff --git a/drivers/vdpa/pds/pci_drv.c b/drivers/vdpa/pds/pci_drv.c
> index 369e11153f21..10491e22778c 100644
> --- a/drivers/vdpa/pds/pci_drv.c
> +++ b/drivers/vdpa/pds/pci_drv.c
> @@ -44,6 +44,14 @@ pds_vdpa_pci_probe(struct pci_dev *pdev,
>   		goto err_out_free_mem;
>   	}
>   
> +	vdpa_pdev->vd_mdev.pci_dev = pdev;
> +	err = pds_vdpa_probe_virtio(&vdpa_pdev->vd_mdev);
> +	if (err) {
> +		dev_err(dev, "Unable to probe for virtio configuration: %pe\n",
> +			ERR_PTR(err));
> +		goto err_out_free_mem;
> +	}
> +
>   	pci_enable_pcie_error_reporting(pdev);
>   
>   	/* Use devres management */
> @@ -74,6 +82,7 @@ pds_vdpa_pci_probe(struct pci_dev *pdev,
>   err_out_pci_release_device:
>   	pci_disable_device(pdev);
>   err_out_free_mem:
> +	pds_vdpa_remove_virtio(&vdpa_pdev->vd_mdev);
>   	pci_disable_pcie_error_reporting(pdev);
>   	kfree(vdpa_pdev);
>   	return err;
> @@ -88,6 +97,7 @@ pds_vdpa_pci_remove(struct pci_dev *pdev)
>   	pci_clear_master(pdev);
>   	pci_disable_pcie_error_reporting(pdev);
>   	pci_disable_device(pdev);
> +	pds_vdpa_remove_virtio(&vdpa_pdev->vd_mdev);
>   	kfree(vdpa_pdev);
>   
>   	dev_info(&pdev->dev, "Removed\n");
> diff --git a/drivers/vdpa/pds/pci_drv.h b/drivers/vdpa/pds/pci_drv.h
> index 747809e0df9e..15f3b34fafa9 100644
> --- a/drivers/vdpa/pds/pci_drv.h
> +++ b/drivers/vdpa/pds/pci_drv.h
> @@ -43,4 +43,6 @@ struct pds_vdpa_pci_device {
>   	struct virtio_pci_modern_device vd_mdev;
>   };
>   
> +int pds_vdpa_probe_virtio(struct virtio_pci_modern_device *mdev);
> +void pds_vdpa_remove_virtio(struct virtio_pci_modern_device *mdev);
>   #endif /* _PCI_DRV_H */
> diff --git a/drivers/vdpa/pds/virtio_pci.c b/drivers/vdpa/pds/virtio_pci.c
> new file mode 100644
> index 000000000000..0f4ac9467199
> --- /dev/null
> +++ b/drivers/vdpa/pds/virtio_pci.c
> @@ -0,0 +1,283 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +/*
> + * adapted from drivers/virtio/virtio_pci_modern_dev.c, v6.0-rc1
> + */
> +
> +#include <linux/virtio_pci_modern.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/delay.h>
> +
> +#include "pci_drv.h"
> +
> +/*
> + * pds_vdpa_map_capability - map a part of virtio pci capability
> + * @mdev: the modern virtio-pci device
> + * @off: offset of the capability
> + * @minlen: minimal length of the capability
> + * @align: align requirement
> + * @start: start from the capability
> + * @size: map size
> + * @len: the length that is actually mapped
> + * @pa: physical address of the capability
> + *
> + * Returns the io address of for the part of the capability
> + */
> +static void __iomem *
> +pds_vdpa_map_capability(struct virtio_pci_modern_device *mdev, int off,
> +			 size_t minlen, u32 align, u32 start, u32 size,
> +			 size_t *len, resource_size_t *pa)
> +{
> +	struct pci_dev *dev = mdev->pci_dev;
> +	u8 bar;
> +	u32 offset, length;
> +	void __iomem *p;
> +
> +	pci_read_config_byte(dev, off + offsetof(struct virtio_pci_cap,
> +						 bar),
> +			     &bar);
> +	pci_read_config_dword(dev, off + offsetof(struct virtio_pci_cap, offset),
> +			     &offset);
> +	pci_read_config_dword(dev, off + offsetof(struct virtio_pci_cap, length),
> +			      &length);
> +
> +	/* Check if the BAR may have changed since we requested the region. */
> +	if (bar >= PCI_STD_NUM_BARS || !(mdev->modern_bars & (1 << bar))) {
> +		dev_err(&dev->dev,
> +			"virtio_pci: bar unexpectedly changed to %u\n", bar);
> +		return NULL;
> +	}
> +
> +	if (length <= start) {
> +		dev_err(&dev->dev,
> +			"virtio_pci: bad capability len %u (>%u expected)\n",
> +			length, start);
> +		return NULL;
> +	}
> +
> +	if (length - start < minlen) {
> +		dev_err(&dev->dev,
> +			"virtio_pci: bad capability len %u (>=%zu expected)\n",
> +			length, minlen);
> +		return NULL;
> +	}
> +
> +	length -= start;
> +
> +	if (start + offset < offset) {
> +		dev_err(&dev->dev,
> +			"virtio_pci: map wrap-around %u+%u\n",
> +			start, offset);
> +		return NULL;
> +	}
> +
> +	offset += start;
> +
> +	if (offset & (align - 1)) {
> +		dev_err(&dev->dev,
> +			"virtio_pci: offset %u not aligned to %u\n",
> +			offset, align);
> +		return NULL;
> +	}
> +
> +	if (length > size)
> +		length = size;
> +
> +	if (len)
> +		*len = length;
> +
> +	if (minlen + offset < minlen ||
> +	    minlen + offset > pci_resource_len(dev, bar)) {
> +		dev_err(&dev->dev,
> +			"virtio_pci: map virtio %zu@%u out of range on bar %i length %lu\n",
> +			minlen, offset,
> +			bar, (unsigned long)pci_resource_len(dev, bar));
> +		return NULL;
> +	}
> +
> +	p = pci_iomap_range(dev, bar, offset, length);
> +	if (!p)
> +		dev_err(&dev->dev,
> +			"virtio_pci: unable to map virtio %u@%u on bar %i\n",
> +			length, offset, bar);
> +	else if (pa)
> +		*pa = pci_resource_start(dev, bar) + offset;
> +
> +	return p;
> +}
> +
> +/**
> + * virtio_pci_find_capability - walk capabilities to find device info.
> + * @dev: the pci device
> + * @cfg_type: the VIRTIO_PCI_CAP_* value we seek
> + * @ioresource_types: IORESOURCE_MEM and/or IORESOURCE_IO.
> + * @bars: the bitmask of BARs
> + *
> + * Returns offset of the capability, or 0.
> + */
> +static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 cfg_type,
> +					     u32 ioresource_types, int *bars)
> +{
> +	int pos;
> +
> +	for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
> +	     pos > 0;
> +	     pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
> +		u8 type, bar;
> +
> +		pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
> +							 cfg_type),
> +				     &type);
> +		pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
> +							 bar),
> +				     &bar);
> +
> +		/* Ignore structures with reserved BAR values */
> +		if (bar >= PCI_STD_NUM_BARS)
> +			continue;
> +
> +		if (type == cfg_type) {
> +			if (pci_resource_len(dev, bar) &&
> +			    pci_resource_flags(dev, bar) & ioresource_types) {
> +				*bars |= (1 << bar);
> +				return pos;
> +			}
> +		}
> +	}
> +	return 0;
> +}
> +
> +/*
> + * pds_vdpa_probe_virtio: probe the modern virtio pci device, note that the
> + * caller is required to enable PCI device before calling this function.
> + * @mdev: the modern virtio-pci device
> + *
> + * Return 0 on succeed otherwise fail
> + */
> +int pds_vdpa_probe_virtio(struct virtio_pci_modern_device *mdev)
> +{
> +	struct pci_dev *pci_dev = mdev->pci_dev;
> +	int err, common, isr, notify, device;
> +	u32 notify_length;
> +	u32 notify_offset;
> +
> +	/* check for a common config: if not, use legacy mode (bar 0). */
> +	common = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_COMMON_CFG,
> +					    IORESOURCE_IO | IORESOURCE_MEM,
> +					    &mdev->modern_bars);
> +	if (!common) {
> +		dev_info(&pci_dev->dev,
> +			 "virtio_pci: missing common config\n");
> +		return -ENODEV;
> +	}
> +
> +	/* If common is there, these should be too... */
> +	isr = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_ISR_CFG,
> +					 IORESOURCE_IO | IORESOURCE_MEM,
> +					 &mdev->modern_bars);
> +	notify = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_NOTIFY_CFG,
> +					    IORESOURCE_IO | IORESOURCE_MEM,
> +					    &mdev->modern_bars);
> +	if (!isr || !notify) {
> +		dev_err(&pci_dev->dev,
> +			"virtio_pci: missing capabilities %i/%i/%i\n",
> +			common, isr, notify);
> +		return -EINVAL;
> +	}
> +
> +	/* Device capability is only mandatory for devices that have
> +	 * device-specific configuration.
> +	 */
> +	device = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_DEVICE_CFG,
> +					    IORESOURCE_IO | IORESOURCE_MEM,
> +					    &mdev->modern_bars);
> +
> +	err = pci_request_selected_regions(pci_dev, mdev->modern_bars,
> +					   "virtio-pci-modern");
> +	if (err)
> +		return err;
> +
> +	err = -EINVAL;
> +	mdev->common = pds_vdpa_map_capability(mdev, common,
> +				      sizeof(struct virtio_pci_common_cfg), 4,
> +				      0, sizeof(struct virtio_pci_common_cfg),
> +				      NULL, NULL);
> +	if (!mdev->common)
> +		goto err_map_common;
> +	mdev->isr = pds_vdpa_map_capability(mdev, isr, sizeof(u8), 1,
> +					     0, 1,
> +					     NULL, NULL);
> +	if (!mdev->isr)
> +		goto err_map_isr;
> +
> +	/* Read notify_off_multiplier from config space. */
> +	pci_read_config_dword(pci_dev,
> +			      notify + offsetof(struct virtio_pci_notify_cap,
> +						notify_off_multiplier),
> +			      &mdev->notify_offset_multiplier);
> +	/* Read notify length and offset from config space. */
> +	pci_read_config_dword(pci_dev,
> +			      notify + offsetof(struct virtio_pci_notify_cap,
> +						cap.length),
> +			      &notify_length);
> +
> +	pci_read_config_dword(pci_dev,
> +			      notify + offsetof(struct virtio_pci_notify_cap,
> +						cap.offset),
> +			      &notify_offset);
> +
> +	/* We don't know how many VQs we'll map, ahead of the time.
> +	 * If notify length is small, map it all now.
> +	 * Otherwise, map each VQ individually later.
> +	 */
> +	if ((u64)notify_length + (notify_offset % PAGE_SIZE) <= PAGE_SIZE) {
> +		mdev->notify_base = pds_vdpa_map_capability(mdev, notify,
> +							     2, 2,
> +							     0, notify_length,
> +							     &mdev->notify_len,
> +							     &mdev->notify_pa);
> +		if (!mdev->notify_base)
> +			goto err_map_notify;
> +	} else {
> +		mdev->notify_map_cap = notify;
> +	}
> +
> +	/* Again, we don't know how much we should map, but PAGE_SIZE
> +	 * is more than enough for all existing devices.
> +	 */
> +	if (device) {
> +		mdev->device = pds_vdpa_map_capability(mdev, device, 0, 4,
> +							0, PAGE_SIZE,
> +							&mdev->device_len,
> +							NULL);
> +		if (!mdev->device)
> +			goto err_map_device;
> +	}
> +
> +	return 0;
> +
> +err_map_device:
> +	if (mdev->notify_base)
> +		pci_iounmap(pci_dev, mdev->notify_base);
> +err_map_notify:
> +	pci_iounmap(pci_dev, mdev->isr);
> +err_map_isr:
> +	pci_iounmap(pci_dev, mdev->common);
> +err_map_common:
> +	pci_release_selected_regions(pci_dev, mdev->modern_bars);
> +	return err;
> +}
> +
> +void pds_vdpa_remove_virtio(struct virtio_pci_modern_device *mdev)
> +{
> +	struct pci_dev *pci_dev = mdev->pci_dev;
> +
> +	if (mdev->device)
> +		pci_iounmap(pci_dev, mdev->device);
> +	if (mdev->notify_base)
> +		pci_iounmap(pci_dev, mdev->notify_base);
> +	pci_iounmap(pci_dev, mdev->isr);
> +	pci_iounmap(pci_dev, mdev->common);
> +	pci_release_selected_regions(pci_dev, mdev->modern_bars);
> +}


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 14/19] pds_vdpa: Add new PCI VF device for PDS vDPA services
  2022-11-18 22:56 ` [RFC PATCH net-next 14/19] pds_vdpa: Add new PCI VF device for PDS vDPA services Shannon Nelson
@ 2022-11-22  3:53   ` Jason Wang
  2022-11-29 22:24     ` Shannon Nelson
  0 siblings, 1 reply; 61+ messages in thread
From: Jason Wang @ 2022-11-22  3:53 UTC (permalink / raw)
  To: Shannon Nelson, netdev, davem, kuba, mst, virtualization; +Cc: drivers


On 2022/11/19 06:56, Shannon Nelson wrote:
> This is the initial PCI driver framework for the new pds_vdpa VF
> device driver, an auxiliary_bus client of the pds_core driver.
> This does the very basics of registering for the new PCI
> device 1dd8:100b, setting up debugfs entries, and registering
> with devlink.
>
> The new PCI device id has not made it to the official PCI ID Repository
> yet, but will soon be registered there.
>
> Signed-off-by: Shannon Nelson <snelson@pensando.io>
> ---
>   drivers/vdpa/pds/Makefile       |   7 +
>   drivers/vdpa/pds/debugfs.c      |  44 +++++++
>   drivers/vdpa/pds/debugfs.h      |  22 ++++
>   drivers/vdpa/pds/pci_drv.c      | 143 +++++++++++++++++++++
>   drivers/vdpa/pds/pci_drv.h      |  46 +++++++
>   include/linux/pds/pds_core_if.h |   1 +
>   include/linux/pds/pds_vdpa.h    | 219 ++++++++++++++++++++++++++++++++
>   7 files changed, 482 insertions(+)
>   create mode 100644 drivers/vdpa/pds/Makefile
>   create mode 100644 drivers/vdpa/pds/debugfs.c
>   create mode 100644 drivers/vdpa/pds/debugfs.h
>   create mode 100644 drivers/vdpa/pds/pci_drv.c
>   create mode 100644 drivers/vdpa/pds/pci_drv.h
>   create mode 100644 include/linux/pds/pds_vdpa.h
>
> diff --git a/drivers/vdpa/pds/Makefile b/drivers/vdpa/pds/Makefile
> new file mode 100644
> index 000000000000..3ba28a875574
> --- /dev/null
> +++ b/drivers/vdpa/pds/Makefile
> @@ -0,0 +1,7 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +# Copyright(c) 2022 Pensando Systems, Inc
> +
> +obj-$(CONFIG_PDS_VDPA) := pds_vdpa.o
> +
> +pds_vdpa-y := pci_drv.o	\
> +	      debugfs.o
> diff --git a/drivers/vdpa/pds/debugfs.c b/drivers/vdpa/pds/debugfs.c
> new file mode 100644
> index 000000000000..f5b6654ae89b
> --- /dev/null
> +++ b/drivers/vdpa/pds/debugfs.c
> @@ -0,0 +1,44 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2022 Pensando Systems, Inc */
> +
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/types.h>
> +
> +#include <linux/pds/pds_core_if.h>
> +#include <linux/pds/pds_vdpa.h>
> +
> +#include "pci_drv.h"
> +#include "debugfs.h"
> +
> +#ifdef CONFIG_DEBUG_FS
> +
> +static struct dentry *dbfs_dir;
> +
> +void
> +pds_vdpa_debugfs_create(void)
> +{
> +	dbfs_dir = debugfs_create_dir(PDS_VDPA_DRV_NAME, NULL);
> +}
> +
> +void
> +pds_vdpa_debugfs_destroy(void)
> +{
> +	debugfs_remove_recursive(dbfs_dir);
> +	dbfs_dir = NULL;
> +}
> +
> +void
> +pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev)
> +{
> +	vdpa_pdev->dentry = debugfs_create_dir(pci_name(vdpa_pdev->pdev), dbfs_dir);
> +}
> +
> +void
> +pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev)
> +{
> +	debugfs_remove_recursive(vdpa_pdev->dentry);
> +	vdpa_pdev->dentry = NULL;
> +}
> +
> +#endif /* CONFIG_DEBUG_FS */
> diff --git a/drivers/vdpa/pds/debugfs.h b/drivers/vdpa/pds/debugfs.h
> new file mode 100644
> index 000000000000..ac31ab47746b
> --- /dev/null
> +++ b/drivers/vdpa/pds/debugfs.h
> @@ -0,0 +1,22 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright(c) 2022 Pensando Systems, Inc */
> +
> +#ifndef _PDS_VDPA_DEBUGFS_H_
> +#define _PDS_VDPA_DEBUGFS_H_
> +
> +#include <linux/debugfs.h>
> +
> +#ifdef CONFIG_DEBUG_FS
> +
> +void pds_vdpa_debugfs_create(void);
> +void pds_vdpa_debugfs_destroy(void);
> +void pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev);
> +void pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev);
> +#else
> +static inline void pds_vdpa_debugfs_create(void) { }
> +static inline void pds_vdpa_debugfs_destroy(void) { }
> +static inline void pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev) { }
> +static inline void pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev) { }
> +#endif
> +
> +#endif /* _PDS_VDPA_DEBUGFS_H_ */
> diff --git a/drivers/vdpa/pds/pci_drv.c b/drivers/vdpa/pds/pci_drv.c
> new file mode 100644
> index 000000000000..369e11153f21
> --- /dev/null
> +++ b/drivers/vdpa/pds/pci_drv.c
> @@ -0,0 +1,143 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2022 Pensando Systems, Inc */
> +
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/aer.h>
> +#include <linux/types.h>
> +#include <linux/vdpa.h>
> +
> +#include <linux/pds/pds_core_if.h>
> +#include <linux/pds/pds_vdpa.h>
> +
> +#include "pci_drv.h"
> +#include "debugfs.h"
> +
> +static void
> +pds_vdpa_dma_action(void *data)
> +{
> +	pci_free_irq_vectors((struct pci_dev *)data);
> +}


Nit: since we're releasing irq vectors, it might be better to name this 
"pds_vdpa_irq_action".


> +
> +static int
> +pds_vdpa_pci_probe(struct pci_dev *pdev,
> +		   const struct pci_device_id *id)
> +{
> +	struct pds_vdpa_pci_device *vdpa_pdev;
> +	struct device *dev = &pdev->dev;
> +	int err;
> +
> +	vdpa_pdev = kzalloc(sizeof(*vdpa_pdev), GFP_KERNEL);
> +	if (!vdpa_pdev)
> +		return -ENOMEM;
> +	pci_set_drvdata(pdev, vdpa_pdev);
> +
> +	vdpa_pdev->pdev = pdev;
> +	vdpa_pdev->vf_id = pci_iov_vf_id(pdev);
> +	vdpa_pdev->pci_id = PCI_DEVID(pdev->bus->number, pdev->devfn);
> +
> +	/* Query system for DMA addressing limitation for the device. */
> +	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(PDS_CORE_ADDR_LEN));
> +	if (err) {
> +		dev_err(dev, "Unable to obtain 64-bit DMA for consistent allocations, aborting. %pe\n",
> +			ERR_PTR(err));
> +		goto err_out_free_mem;
> +	}
> +
> +	pci_enable_pcie_error_reporting(pdev);
> +
> +	/* Use devres management */
> +	err = pcim_enable_device(pdev);
> +	if (err) {
> +		dev_err(dev, "Cannot enable PCI device: %pe\n", ERR_PTR(err));
> +		goto err_out_free_mem;
> +	}
> +
> +	err = devm_add_action_or_reset(dev, pds_vdpa_dma_action, pdev);
> +	if (err) {
> +		dev_err(dev, "Failed adding devres for freeing irq vectors: %pe\n",
> +			ERR_PTR(err));
> +		goto err_out_pci_release_device;
> +	}
> +
> +	pci_set_master(pdev);
> +
> +	pds_vdpa_debugfs_add_pcidev(vdpa_pdev);
> +
> +	dev_info(dev, "%s: PF %#04x VF %#04x (%d) vf_id %d domain %d vdpa_aux %p vdpa_pdev %p\n",
> +		 __func__, pci_dev_id(vdpa_pdev->pdev->physfn),
> +		 vdpa_pdev->pci_id, vdpa_pdev->pci_id, vdpa_pdev->vf_id,
> +		 pci_domain_nr(pdev->bus), vdpa_pdev->vdpa_aux, vdpa_pdev);
> +
> +	return 0;
> +
> +err_out_pci_release_device:
> +	pci_disable_device(pdev);


Do we still need to care about this, considering we use 
devres/pcim_enable_device()?


> +err_out_free_mem:
> +	pci_disable_pcie_error_reporting(pdev);
> +	kfree(vdpa_pdev);
> +	return err;
> +}
> +
> +static void
> +pds_vdpa_pci_remove(struct pci_dev *pdev)
> +{
> +	struct pds_vdpa_pci_device *vdpa_pdev = pci_get_drvdata(pdev);
> +
> +	pds_vdpa_debugfs_del_pcidev(vdpa_pdev);
> +	pci_clear_master(pdev);
> +	pci_disable_pcie_error_reporting(pdev);
> +	pci_disable_device(pdev);
> +	kfree(vdpa_pdev);
> +
> +	dev_info(&pdev->dev, "Removed\n");
> +}
> +
> +static const struct pci_device_id
> +pds_vdpa_pci_table[] = {
> +	{ PCI_VDEVICE(PENSANDO, PCI_DEVICE_ID_PENSANDO_VDPA_VF) },
> +	{ 0, }
> +};
> +MODULE_DEVICE_TABLE(pci, pds_vdpa_pci_table);
> +
> +static struct pci_driver
> +pds_vdpa_pci_driver = {
> +	.name = PDS_VDPA_DRV_NAME,
> +	.id_table = pds_vdpa_pci_table,
> +	.probe = pds_vdpa_pci_probe,
> +	.remove = pds_vdpa_pci_remove
> +};
> +
> +static void __exit
> +pds_vdpa_pci_cleanup(void)
> +{
> +	pci_unregister_driver(&pds_vdpa_pci_driver);
> +
> +	pds_vdpa_debugfs_destroy();
> +}
> +module_exit(pds_vdpa_pci_cleanup);
> +
> +static int __init
> +pds_vdpa_pci_init(void)
> +{
> +	int err;
> +
> +	pds_vdpa_debugfs_create();
> +
> +	err = pci_register_driver(&pds_vdpa_pci_driver);
> +	if (err) {
> +		pr_err("%s: pci driver register failed: %pe\n", __func__, ERR_PTR(err));
> +		goto err_pci;
> +	}
> +
> +	return 0;
> +
> +err_pci:
> +	pds_vdpa_debugfs_destroy();
> +	return err;
> +}
> +module_init(pds_vdpa_pci_init);
> +
> +MODULE_DESCRIPTION(PDS_VDPA_DRV_DESCRIPTION);
> +MODULE_AUTHOR("Pensando Systems, Inc");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/vdpa/pds/pci_drv.h b/drivers/vdpa/pds/pci_drv.h
> new file mode 100644
> index 000000000000..747809e0df9e
> --- /dev/null
> +++ b/drivers/vdpa/pds/pci_drv.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright(c) 2022 Pensando Systems, Inc */
> +
> +#ifndef _PCI_DRV_H
> +#define _PCI_DRV_H
> +
> +#include <linux/pci.h>
> +#include <linux/virtio_pci_modern.h>
> +
> +#define PDS_VDPA_DRV_NAME           "pds_vdpa"
> +#define PDS_VDPA_DRV_DESCRIPTION    "Pensando vDPA VF Device Driver"
> +
> +#define PDS_VDPA_BAR_BASE	0
> +#define PDS_VDPA_BAR_INTR	2
> +#define PDS_VDPA_BAR_DBELL	4
> +
> +struct pds_dev_bar {
> +	int           index;
> +	void __iomem  *vaddr;
> +	phys_addr_t   pa;
> +	unsigned long len;
> +};
> +
> +struct pds_vdpa_intr_info {
> +	int index;
> +	int irq;
> +	int qid;
> +	char name[32];
> +};
> +
> +struct pds_vdpa_pci_device {
> +	struct pci_dev *pdev;
> +	struct pds_vdpa_aux *vdpa_aux;
> +
> +	int vf_id;
> +	int pci_id;
> +
> +	int nintrs;
> +	struct pds_vdpa_intr_info *intrs;
> +
> +	struct dentry *dentry;
> +
> +	struct virtio_pci_modern_device vd_mdev;
> +};
> +
> +#endif /* _PCI_DRV_H */
> diff --git a/include/linux/pds/pds_core_if.h b/include/linux/pds/pds_core_if.h
> index 6333ec351e14..6e92697657e4 100644
> --- a/include/linux/pds/pds_core_if.h
> +++ b/include/linux/pds/pds_core_if.h
> @@ -8,6 +8,7 @@
>   
>   #define PCI_VENDOR_ID_PENSANDO			0x1dd8
>   #define PCI_DEVICE_ID_PENSANDO_CORE_PF		0x100c
> +#define PCI_DEVICE_ID_PENSANDO_VDPA_VF          0x100b
>   
>   #define PDS_CORE_BARS_MAX			4
>   #define PDS_CORE_PCI_BAR_DBELL			1
> diff --git a/include/linux/pds/pds_vdpa.h b/include/linux/pds/pds_vdpa.h
> new file mode 100644
> index 000000000000..7ecef890f175
> --- /dev/null
> +++ b/include/linux/pds/pds_vdpa.h
> @@ -0,0 +1,219 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright(c) 2022 Pensando Systems, Inc */
> +
> +#ifndef _PDS_VDPA_IF_H_
> +#define _PDS_VDPA_IF_H_
> +
> +#include <linux/pds/pds_common.h>
> +
> +#define PDS_DEV_TYPE_VDPA_STR	"vDPA"
> +#define PDS_VDPA_DEV_NAME	PDS_CORE_DRV_NAME "." PDS_DEV_TYPE_VDPA_STR
> +
> +/*
> + * enum pds_vdpa_cmd_opcode - vDPA Device commands
> + */
> +enum pds_vdpa_cmd_opcode {
> +	PDS_VDPA_CMD_INIT		= 48,
> +	PDS_VDPA_CMD_IDENT		= 49,
> +	PDS_VDPA_CMD_RESET		= 51,
> +	PDS_VDPA_CMD_VQ_RESET		= 52,
> +	PDS_VDPA_CMD_VQ_INIT		= 53,
> +	PDS_VDPA_CMD_STATUS_UPDATE	= 54,
> +	PDS_VDPA_CMD_SET_FEATURES	= 55,
> +	PDS_VDPA_CMD_SET_ATTR		= 56,
> +};
> +
> +/**
> + * struct pds_vdpa_cmd - generic command
> + * @opcode:	Opcode
> + * @vdpa_index:	Index for vdpa subdevice
> + * @vf_id:	VF id
> + */
> +struct pds_vdpa_cmd {
> +	u8     opcode;
> +	u8     vdpa_index;
> +	__le16 vf_id;
> +};
> +
> +/**
> + * struct pds_vdpa_comp - generic command completion
> + * @status:	Status of the command (enum pds_core_status_code)
> + * @rsvd:	Word boundary padding
> + * @color:	Color bit
> + */
> +struct pds_vdpa_comp {
> +	u8 status;
> +	u8 rsvd[14];
> +	u8 color;
> +};
> +
> +/**
> + * struct pds_vdpa_init_cmd - INIT command
> + * @opcode:	Opcode PDS_VDPA_CMD_INIT
> + * @vdpa_index: Index for vdpa subdevice
> + * @vf_id:	VF id
> + * @len:	length of config info DMA space
> + * @config_pa:	address for DMA of virtio_net_config struct


Looks like the structure is not specific to net; if so, we may want to 
tweak the above comment to say it's the address of the device 
configuration space.


> + */
> +struct pds_vdpa_init_cmd {
> +	u8     opcode;
> +	u8     vdpa_index;
> +	__le16 vf_id;
> +	__le32 len;
> +	__le64 config_pa;
> +};
> +
> +/**
> + * struct pds_vdpa_ident - vDPA identification data
> + * @hw_features:	vDPA features supported by device
> + * @max_vqs:		max queues available (2 queues for a single queuepair)
> + * @max_qlen:		log(2) of maximum number of descriptors
> + * @min_qlen:		log(2) of minimum number of descriptors


Note that if you plan to support packed virtqueues, the qlen is 
not necessarily a power of 2.


> + *
> + * This struct is used in a DMA block that is set up for the PDS_VDPA_CMD_IDENT
> + * transaction.  Set up the DMA block and send the address in the IDENT cmd
> + * data, the DSC will write the ident information, then we can remove the DMA
> + * block after reading the answer.  If the completion status is 0, then there
> + * is valid information, else there was an error and the data should be invalid.
> + */
> +struct pds_vdpa_ident {
> +	__le64 hw_features;
> +	__le16 max_vqs;
> +	__le16 max_qlen;
> +	__le16 min_qlen;
> +};
> +
> +/**
> + * struct pds_vdpa_ident_cmd - IDENT command
> + * @opcode:	Opcode PDS_VDPA_CMD_IDENT
> + * @rsvd:       Word boundary padding
> + * @vf_id:	VF id
> + * @len:	length of ident info DMA space
> + * @ident_pa:	address for DMA of ident info (struct pds_vdpa_ident)
> + *			only used for this transaction, then forgotten by DSC
> + */
> +struct pds_vdpa_ident_cmd {
> +	u8     opcode;
> +	u8     rsvd;
> +	__le16 vf_id;
> +	__le32 len;
> +	__le64 ident_pa;
> +};
> +
> +/**
> + * struct pds_vdpa_status_cmd - STATUS_UPDATE command
> + * @opcode:	Opcode PDS_VDPA_CMD_STATUS_UPDATE
> + * @vdpa_index: Index for vdpa subdevice
> + * @vf_id:	VF id
> + * @status:	new status bits
> + */
> +struct pds_vdpa_status_cmd {
> +	u8     opcode;
> +	u8     vdpa_index;
> +	__le16 vf_id;
> +	u8     status;
> +};
> +
> +/**
> + * enum pds_vdpa_attr - List of VDPA device attributes
> + * @PDS_VDPA_ATTR_MAC:          MAC address
> + * @PDS_VDPA_ATTR_MAX_VQ_PAIRS: Max virtqueue pairs
> + */
> +enum pds_vdpa_attr {
> +	PDS_VDPA_ATTR_MAC          = 1,
> +	PDS_VDPA_ATTR_MAX_VQ_PAIRS = 2,
> +};
> +
> +/**
> + * struct pds_vdpa_setattr_cmd - SET_ATTR command
> + * @opcode:		Opcode PDS_VDPA_CMD_SET_ATTR
> + * @vdpa_index:		Index for vdpa subdevice
> + * @vf_id:		VF id
> + * @attr:		attribute to be changed (enum pds_vdpa_attr)
> + * @pad:		Word boundary padding
> + * @mac:		new mac address to be assigned as vdpa device address
> + * @max_vq_pairs:	new limit of virtqueue pairs
> + */
> +struct pds_vdpa_setattr_cmd {
> +	u8     opcode;
> +	u8     vdpa_index;
> +	__le16 vf_id;
> +	u8     attr;
> +	u8     pad[3];
> +	union {
> +		u8 mac[6];
> +		__le16 max_vq_pairs;


So does this mean that if we want to set both mac and max_vq_pairs, we 
need two commands? That seems less efficient, since the mgmt layer could 
provision more attributes here. Can we pack all attributes into a single 
command?


> +	} __packed;
> +};
> +
> +/**
> + * struct pds_vdpa_vq_init_cmd - queue init command
> + * @opcode: Opcode PDS_VDPA_CMD_VQ_INIT
> + * @vdpa_index:	Index for vdpa subdevice
> + * @vf_id:	VF id
> + * @qid:	Queue id (bit0 clear = rx, bit0 set = tx, qid=N is ctrlq)


I wonder if there is any reason we need to design it like this; it would 
be better to make it general enough to be used by other types of virtio 
devices.


> + * @len:	log(2) of max descriptor count
> + * @desc_addr:	DMA address of descriptor area
> + * @avail_addr:	DMA address of available descriptors (aka driver area)
> + * @used_addr:	DMA address of used descriptors (aka device area)
> + * @intr_index:	interrupt index


Is this something like an MSI-X vector?


> + */
> +struct pds_vdpa_vq_init_cmd {
> +	u8     opcode;
> +	u8     vdpa_index;
> +	__le16 vf_id;
> +	__le16 qid;
> +	__le16 len;
> +	__le64 desc_addr;
> +	__le64 avail_addr;
> +	__le64 used_addr;
> +	__le16 intr_index;
> +};
> +
> +/**
> + * struct pds_vdpa_vq_init_comp - queue init completion
> + * @status:	Status of the command (enum pds_core_status_code)
> + * @hw_qtype:	HW queue type, used in doorbell selection
> + * @hw_qindex:	HW queue index, used in doorbell selection
> + * @rsvd:	Word boundary padding
> + * @color:	Color bit


More comments are needed to explain how to use this color bit.


> + */
> +struct pds_vdpa_vq_init_comp {
> +	u8     status;
> +	u8     hw_qtype;
> +	__le16 hw_qindex;
> +	u8     rsvd[11];
> +	u8     color;
> +};
> +
> +/**
> + * struct pds_vdpa_vq_reset_cmd - queue reset command
> + * @opcode:	Opcode PDS_VDPA_CMD_VQ_RESET


Is there a chance that we could have more types of opcodes here?

Thanks


> + * @vdpa_index:	Index for vdpa subdevice
> + * @vf_id:	VF id
> + * @qid:	Queue id
> + */
> +struct pds_vdpa_vq_reset_cmd {
> +	u8     opcode;
> +	u8     vdpa_index;
> +	__le16 vf_id;
> +	__le16 qid;
> +};
> +
> +/**
> + * struct pds_vdpa_set_features_cmd - set hw features
> + * @opcode: Opcode PDS_VDPA_CMD_SET_FEATURES
> + * @vdpa_index:	Index for vdpa subdevice
> + * @vf_id:	VF id
> + * @rsvd:       Word boundary padding
> + * @features:	Feature bit mask
> + */
> +struct pds_vdpa_set_features_cmd {
> +	u8     opcode;
> +	u8     vdpa_index;
> +	__le16 vf_id;
> +	__le32 rsvd;
> +	__le64 features;
> +};
> +
> +#endif /* _PDS_VDPA_IF_H_ */


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 17/19] pds_vdpa: add vdpa config client commands
  2022-11-18 22:56 ` [RFC PATCH net-next 17/19] pds_vdpa: add vdpa config client commands Shannon Nelson
@ 2022-11-22  6:32   ` Jason Wang
  2022-11-29 23:16     ` Shannon Nelson
  0 siblings, 1 reply; 61+ messages in thread
From: Jason Wang @ 2022-11-22  6:32 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev, davem, kuba, mst, virtualization, drivers

On Sat, Nov 19, 2022 at 6:57 AM Shannon Nelson <snelson@pensando.io> wrote:
>
> These are the adminq commands that will be needed for
> setting up and using the vDPA device.
>
> Signed-off-by: Shannon Nelson <snelson@pensando.io>
> ---
>  drivers/vdpa/pds/Makefile   |   1 +
>  drivers/vdpa/pds/cmds.c     | 266 ++++++++++++++++++++++++++++++++++++
>  drivers/vdpa/pds/cmds.h     |  17 +++
>  drivers/vdpa/pds/vdpa_dev.h |  60 ++++++++
>  4 files changed, 344 insertions(+)
>  create mode 100644 drivers/vdpa/pds/cmds.c
>  create mode 100644 drivers/vdpa/pds/cmds.h
>  create mode 100644 drivers/vdpa/pds/vdpa_dev.h
>

[...]

> +struct pds_vdpa_device {
> +       struct vdpa_device vdpa_dev;
> +       struct pds_vdpa_aux *vdpa_aux;
> +       struct pds_vdpa_hw hw;
> +
> +       struct virtio_net_config vn_config;
> +       dma_addr_t vn_config_pa;

So this is the DMA address, not necessarily a PA; we'd better drop the "pa" suffix.

Thanks

> +       struct dentry *dentry;
> +};
> +
> +int pds_vdpa_get_mgmt_info(struct pds_vdpa_aux *vdpa_aux);
> +
> +#endif /* _VDPA_DEV_H_ */
> --
> 2.17.1
>



* Re: [RFC PATCH net-next 18/19] pds_vdpa: add support for vdpa and vdpamgmt interfaces
  2022-11-18 22:56 ` [RFC PATCH net-next 18/19] pds_vdpa: add support for vdpa and vdpamgmt interfaces Shannon Nelson
@ 2022-11-22  6:32   ` Jason Wang
  2022-11-30  0:11     ` Shannon Nelson
  0 siblings, 1 reply; 61+ messages in thread
From: Jason Wang @ 2022-11-22  6:32 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev, davem, kuba, mst, virtualization, drivers

On Sat, Nov 19, 2022 at 6:57 AM Shannon Nelson <snelson@pensando.io> wrote:
>
> This is the vDPA device support, where we advertise that we can
> support the virtio queues and deal with the configuration work
> through the pds_core's adminq.
>
> Signed-off-by: Shannon Nelson <snelson@pensando.io>
> ---
>  drivers/vdpa/pds/Makefile   |   3 +-
>  drivers/vdpa/pds/aux_drv.c  |  33 ++
>  drivers/vdpa/pds/debugfs.c  | 167 ++++++++
>  drivers/vdpa/pds/debugfs.h  |   4 +
>  drivers/vdpa/pds/vdpa_dev.c | 796 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 1002 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/vdpa/pds/vdpa_dev.c
>
> diff --git a/drivers/vdpa/pds/Makefile b/drivers/vdpa/pds/Makefile
> index fafd356ddf86..7fde4a4a1620 100644
> --- a/drivers/vdpa/pds/Makefile
> +++ b/drivers/vdpa/pds/Makefile
> @@ -7,4 +7,5 @@ pds_vdpa-y := aux_drv.o \
>               cmds.o \
>               pci_drv.o \
>               debugfs.o \
> -             virtio_pci.o
> +             virtio_pci.o \
> +             vdpa_dev.o
> diff --git a/drivers/vdpa/pds/aux_drv.c b/drivers/vdpa/pds/aux_drv.c
> index aef3c984dc90..83b9a5a79325 100644
> --- a/drivers/vdpa/pds/aux_drv.c
> +++ b/drivers/vdpa/pds/aux_drv.c
> @@ -12,6 +12,7 @@
>  #include <linux/pds/pds_vdpa.h>
>
>  #include "aux_drv.h"
> +#include "vdpa_dev.h"
>  #include "pci_drv.h"
>  #include "debugfs.h"
>
> @@ -25,10 +26,25 @@ static void
>  pds_vdpa_aux_notify_handler(struct pds_auxiliary_dev *padev,
>                             union pds_core_notifyq_comp *event)
>  {
> +       struct pds_vdpa_device *pdsv = padev->priv;
>         struct device *dev = &padev->aux_dev.dev;
>         u16 ecode = le16_to_cpu(event->ecode);
>
>         dev_info(dev, "%s: event code %d\n", __func__, ecode);
> +
> +       /* Give the upper layers a hint that something interesting
> +        * may have happened.  It seems that the only thing this
> +        * triggers in the virtio-net drivers above us is a check
> +        * of link status.
> +        *
> +        * We don't set the NEEDS_RESET flag for EVENT_RESET
> +        * because we're likely going through a recovery or
> +        * fw_update and will be back up and running soon.
> +        */
> +       if (ecode == PDS_EVENT_RESET || ecode == PDS_EVENT_LINK_CHANGE) {
> +               if (pdsv->hw.config_cb.callback)
> +                       pdsv->hw.config_cb.callback(pdsv->hw.config_cb.private);
> +       }
>  }
>
>  static int
> @@ -80,10 +96,25 @@ pds_vdpa_aux_probe(struct auxiliary_device *aux_dev,
>                 goto err_register_client;
>         }
>
> +       /* Get device ident info and set up the vdpa_mgmt_dev */
> +       err = pds_vdpa_get_mgmt_info(vdpa_aux);
> +       if (err)
> +               goto err_register_client;
> +
> +       /* Let vdpa know that we can provide devices */
> +       err = vdpa_mgmtdev_register(&vdpa_aux->vdpa_mdev);
> +       if (err) {
> +               dev_err(dev, "%s: Failed to initialize vdpa_mgmt interface: %pe\n",
> +                       __func__, ERR_PTR(err));
> +               goto err_mgmt_reg;
> +       }
> +
>         pds_vdpa_debugfs_add_ident(vdpa_aux);
>
>         return 0;
>
> +err_mgmt_reg:
> +       padev->ops->unregister_client(padev);
>  err_register_client:
>         auxiliary_set_drvdata(aux_dev, NULL);
>  err_invalid_driver:
> @@ -98,6 +129,8 @@ pds_vdpa_aux_remove(struct auxiliary_device *aux_dev)
>         struct pds_vdpa_aux *vdpa_aux = auxiliary_get_drvdata(aux_dev);
>         struct device *dev = &aux_dev->dev;
>
> +       vdpa_mgmtdev_unregister(&vdpa_aux->vdpa_mdev);
> +
>         vdpa_aux->padev->ops->unregister_client(vdpa_aux->padev);
>         if (vdpa_aux->vdpa_vf)
>                 pci_dev_put(vdpa_aux->vdpa_vf->pdev);
> diff --git a/drivers/vdpa/pds/debugfs.c b/drivers/vdpa/pds/debugfs.c
> index f766412209df..aa3143126a7e 100644
> --- a/drivers/vdpa/pds/debugfs.c
> +++ b/drivers/vdpa/pds/debugfs.c
> @@ -11,6 +11,7 @@
>  #include <linux/pds/pds_auxbus.h>
>  #include <linux/pds/pds_vdpa.h>
>
> +#include "vdpa_dev.h"
>  #include "aux_drv.h"
>  #include "pci_drv.h"
>  #include "debugfs.h"
> @@ -19,6 +20,72 @@
>
>  static struct dentry *dbfs_dir;
>
> +#define PRINT_SBIT_NAME(__seq, __f, __name)                     \
> +       do {                                                    \
> +               if (__f & __name)                               \
> +                       seq_printf(__seq, " %s", &#__name[16]); \
> +       } while (0)
> +
> +static void
> +print_status_bits(struct seq_file *seq, u16 status)
> +{
> +       seq_puts(seq, "status:");
> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_DRIVER);
> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_DRIVER_OK);
> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_FEATURES_OK);
> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_NEEDS_RESET);
> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_FAILED);
> +       seq_puts(seq, "\n");
> +}
> +
> +#define PRINT_FBIT_NAME(__seq, __f, __name)                \
> +       do {                                               \
> +               if (__f & BIT_ULL(__name))                 \
> +                       seq_printf(__seq, " %s", #__name); \
> +       } while (0)
> +
> +static void
> +print_feature_bits(struct seq_file *seq, u64 features)
> +{
> +       seq_puts(seq, "features:");
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CSUM);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_CSUM);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_MTU);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_MAC);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_TSO4);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_TSO6);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_ECN);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_UFO);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HOST_TSO4);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HOST_TSO6);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HOST_ECN);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HOST_UFO);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_MRG_RXBUF);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_STATUS);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_VQ);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_RX);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_VLAN);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_RX_EXTRA);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_ANNOUNCE);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_MQ);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_MAC_ADDR);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HASH_REPORT);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_RSS);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_RSC_EXT);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_STANDBY);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_SPEED_DUPLEX);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_NOTIFY_ON_EMPTY);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_ANY_LAYOUT);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_VERSION_1);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_ACCESS_PLATFORM);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_RING_PACKED);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_ORDER_PLATFORM);
> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_SR_IOV);
> +       seq_puts(seq, "\n");
> +}
> +
>  void
>  pds_vdpa_debugfs_create(void)
>  {
> @@ -49,10 +116,18 @@ static int
>  identity_show(struct seq_file *seq, void *v)
>  {
>         struct pds_vdpa_aux *vdpa_aux = seq->private;
> +       struct vdpa_mgmt_dev *mgmt;
>
>         seq_printf(seq, "aux_dev:            %s\n",
>                    dev_name(&vdpa_aux->padev->aux_dev.dev));
>
> +       mgmt = &vdpa_aux->vdpa_mdev;
> +       seq_printf(seq, "max_vqs:            %d\n", mgmt->max_supported_vqs);
> +       seq_printf(seq, "config_attr_mask:   %#llx\n", mgmt->config_attr_mask);
> +       seq_printf(seq, "supported_features: %#llx\n", mgmt->supported_features);
> +       print_feature_bits(seq, mgmt->supported_features);
> +       seq_printf(seq, "local_mac_bit:      %d\n", vdpa_aux->local_mac_bit);
> +
>         return 0;
>  }
>  DEFINE_SHOW_ATTRIBUTE(identity);
> @@ -64,4 +139,96 @@ pds_vdpa_debugfs_add_ident(struct pds_vdpa_aux *vdpa_aux)
>                             vdpa_aux, &identity_fops);
>  }
>
> +static int
> +config_show(struct seq_file *seq, void *v)
> +{
> +       struct pds_vdpa_device *pdsv = seq->private;
> +       struct virtio_net_config *vc = &pdsv->vn_config;
> +
> +       seq_printf(seq, "mac:                  %pM\n", vc->mac);
> +       seq_printf(seq, "max_virtqueue_pairs:  %d\n",
> +                  __virtio16_to_cpu(true, vc->max_virtqueue_pairs));
> +       seq_printf(seq, "mtu:                  %d\n", __virtio16_to_cpu(true, vc->mtu));
> +       seq_printf(seq, "speed:                %d\n", le32_to_cpu(vc->speed));
> +       seq_printf(seq, "duplex:               %d\n", vc->duplex);
> +       seq_printf(seq, "rss_max_key_size:     %d\n", vc->rss_max_key_size);
> +       seq_printf(seq, "rss_max_indirection_table_length: %d\n",
> +                  le16_to_cpu(vc->rss_max_indirection_table_length));
> +       seq_printf(seq, "supported_hash_types: %#x\n",
> +                  le32_to_cpu(vc->supported_hash_types));
> +       seq_printf(seq, "vn_status:            %#x\n",
> +                  __virtio16_to_cpu(true, vc->status));
> +       print_status_bits(seq, __virtio16_to_cpu(true, vc->status));
> +
> +       seq_printf(seq, "hw_status:            %#x\n", pdsv->hw.status);
> +       print_status_bits(seq, pdsv->hw.status);
> +       seq_printf(seq, "req_features:         %#llx\n", pdsv->hw.req_features);
> +       print_feature_bits(seq, pdsv->hw.req_features);
> +       seq_printf(seq, "actual_features:      %#llx\n", pdsv->hw.actual_features);
> +       print_feature_bits(seq, pdsv->hw.actual_features);
> +       seq_printf(seq, "vdpa_index:           %d\n", pdsv->hw.vdpa_index);
> +       seq_printf(seq, "num_vqs:              %d\n", pdsv->hw.num_vqs);
> +
> +       return 0;
> +}
> +DEFINE_SHOW_ATTRIBUTE(config);
> +
> +static int
> +vq_show(struct seq_file *seq, void *v)
> +{
> +       struct pds_vdpa_vq_info *vq = seq->private;
> +       struct pds_vdpa_intr_info *intrs;
> +
> +       seq_printf(seq, "ready:      %d\n", vq->ready);
> +       seq_printf(seq, "desc_addr:  %#llx\n", vq->desc_addr);
> +       seq_printf(seq, "avail_addr: %#llx\n", vq->avail_addr);
> +       seq_printf(seq, "used_addr:  %#llx\n", vq->used_addr);
> +       seq_printf(seq, "q_len:      %d\n", vq->q_len);
> +       seq_printf(seq, "qid:        %d\n", vq->qid);
> +
> +       seq_printf(seq, "doorbell:   %#llx\n", vq->doorbell);
> +       seq_printf(seq, "avail_idx:  %d\n", vq->avail_idx);
> +       seq_printf(seq, "used_idx:   %d\n", vq->used_idx);
> +       seq_printf(seq, "intr_index: %d\n", vq->intr_index);
> +
> +       intrs = vq->pdsv->vdpa_aux->vdpa_vf->intrs;
> +       seq_printf(seq, "irq:        %d\n", intrs[vq->intr_index].irq);
> +       seq_printf(seq, "irq-name:   %s\n", intrs[vq->intr_index].name);
> +
> +       seq_printf(seq, "hw_qtype:   %d\n", vq->hw_qtype);
> +       seq_printf(seq, "hw_qindex:  %d\n", vq->hw_qindex);
> +
> +       return 0;
> +}
> +DEFINE_SHOW_ATTRIBUTE(vq);
> +
> +void
> +pds_vdpa_debugfs_add_vdpadev(struct pds_vdpa_device *pdsv)
> +{
> +       struct dentry *dentry;
> +       const char *name;
> +       int i;
> +
> +       dentry = pdsv->vdpa_aux->vdpa_vf->dentry;
> +       name = dev_name(&pdsv->vdpa_dev.dev);
> +
> +       pdsv->dentry = debugfs_create_dir(name, dentry);
> +
> +       debugfs_create_file("config", 0400, pdsv->dentry, pdsv, &config_fops);
> +
> +       for (i = 0; i < pdsv->hw.num_vqs; i++) {
> +               char name[8];
> +
> +               snprintf(name, sizeof(name), "vq%02d", i);
> +               debugfs_create_file(name, 0400, pdsv->dentry, &pdsv->hw.vqs[i], &vq_fops);
> +       }
> +}
> +
> +void
> +pds_vdpa_debugfs_del_vdpadev(struct pds_vdpa_device *pdsv)
> +{
> +       debugfs_remove_recursive(pdsv->dentry);
> +       pdsv->dentry = NULL;
> +}
> +
>  #endif /* CONFIG_DEBUG_FS */
> diff --git a/drivers/vdpa/pds/debugfs.h b/drivers/vdpa/pds/debugfs.h
> index 939a4c248aac..f0567e4ee4e4 100644
> --- a/drivers/vdpa/pds/debugfs.h
> +++ b/drivers/vdpa/pds/debugfs.h
> @@ -13,12 +13,16 @@ void pds_vdpa_debugfs_destroy(void);
>  void pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev);
>  void pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev);
>  void pds_vdpa_debugfs_add_ident(struct pds_vdpa_aux *vdpa_aux);
> +void pds_vdpa_debugfs_add_vdpadev(struct pds_vdpa_device *pdsv);
> +void pds_vdpa_debugfs_del_vdpadev(struct pds_vdpa_device *pdsv);
>  #else
>  static inline void pds_vdpa_debugfs_create(void) { }
>  static inline void pds_vdpa_debugfs_destroy(void) { }
>  static inline void pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev) { }
>  static inline void pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev) { }
>  static inline void pds_vdpa_debugfs_add_ident(struct pds_vdpa_aux *vdpa_aux) { }
> +static inline void pds_vdpa_debugfs_add_vdpadev(struct pds_vdpa_device *pdsv) { }
> +static inline void pds_vdpa_debugfs_del_vdpadev(struct pds_vdpa_device *pdsv) { }
>  #endif
>
>  #endif /* _PDS_VDPA_DEBUGFS_H_ */
> diff --git a/drivers/vdpa/pds/vdpa_dev.c b/drivers/vdpa/pds/vdpa_dev.c
> new file mode 100644
> index 000000000000..824be42aff0d
> --- /dev/null
> +++ b/drivers/vdpa/pds/vdpa_dev.c
> @@ -0,0 +1,796 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2022 Pensando Systems, Inc */
> +
> +#include <linux/interrupt.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/sysfs.h>
> +#include <linux/types.h>
> +#include <linux/vdpa.h>
> +#include <uapi/linux/virtio_pci.h>
> +#include <uapi/linux/vdpa.h>
> +
> +#include <linux/pds/pds_intr.h>
> +#include <linux/pds/pds_core_if.h>
> +#include <linux/pds/pds_adminq.h>
> +#include <linux/pds/pds_auxbus.h>
> +#include <linux/pds/pds_vdpa.h>
> +
> +#include "vdpa_dev.h"
> +#include "pci_drv.h"
> +#include "aux_drv.h"
> +#include "pci_drv.h"
> +#include "cmds.h"
> +#include "debugfs.h"
> +
> +static int
> +pds_vdpa_setup_driver(struct pds_vdpa_device *pdsv)
> +{
> +       struct device *dev = &pdsv->vdpa_dev.dev;
> +       int err = 0;
> +       int i;
> +
> +       /* Verify all vqs[] are in ready state */
> +       for (i = 0; i < pdsv->hw.num_vqs; i++) {
> +               if (!pdsv->hw.vqs[i].ready) {
> +                       dev_warn(dev, "%s: qid %d not ready\n", __func__, i);
> +                       err = -ENOENT;
> +               }
> +       }
> +
> +       return err;
> +}
> +
> +static struct pds_vdpa_device *
> +vdpa_to_pdsv(struct vdpa_device *vdpa_dev)
> +{
> +       return container_of(vdpa_dev, struct pds_vdpa_device, vdpa_dev);
> +}
> +
> +static struct pds_vdpa_hw *
> +vdpa_to_hw(struct vdpa_device *vdpa_dev)
> +{
> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> +
> +       return &pdsv->hw;
> +}
> +
> +static int
> +pds_vdpa_set_vq_address(struct vdpa_device *vdpa_dev, u16 qid,
> +                       u64 desc_addr, u64 driver_addr, u64 device_addr)
> +{
> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> +
> +       hw->vqs[qid].desc_addr = desc_addr;
> +       hw->vqs[qid].avail_addr = driver_addr;
> +       hw->vqs[qid].used_addr = device_addr;
> +
> +       return 0;
> +}
> +
> +static void
> +pds_vdpa_set_vq_num(struct vdpa_device *vdpa_dev, u16 qid, u32 num)
> +{
> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> +
> +       hw->vqs[qid].q_len = num;
> +}
> +
> +static void
> +pds_vdpa_kick_vq(struct vdpa_device *vdpa_dev, u16 qid)
> +{
> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> +
> +       iowrite16(qid, pdsv->hw.vqs[qid].notify);
> +}
> +
> +static void
> +pds_vdpa_set_vq_cb(struct vdpa_device *vdpa_dev, u16 qid,
> +                  struct vdpa_callback *cb)
> +{
> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> +
> +       hw->vqs[qid].event_cb = *cb;
> +}
> +
> +static irqreturn_t
> +pds_vdpa_isr(int irq, void *data)
> +{
> +       struct pds_core_intr __iomem *intr_ctrl;
> +       struct pds_vdpa_device *pdsv;
> +       struct pds_vdpa_vq_info *vq;
> +
> +       vq = data;
> +       pdsv = vq->pdsv;
> +
> +       if (vq->event_cb.callback)
> +               vq->event_cb.callback(vq->event_cb.private);
> +
> +       /* Since we don't actually know how many vq descriptors are
> +        * covered in this interrupt cycle, we simply clean all the
> +        * credits and re-enable the interrupt.
> +        */
> +       intr_ctrl = (struct pds_core_intr __iomem *)pdsv->vdpa_aux->vdpa_vf->vd_mdev.isr;
> +       pds_core_intr_clean_flags(&intr_ctrl[vq->intr_index],
> +                                 PDS_CORE_INTR_CRED_REARM);
> +
> +       return IRQ_HANDLED;
> +}
> +
> +static void
> +pds_vdpa_release_irq(struct pds_vdpa_device *pdsv, int qid)
> +{
> +       struct pds_vdpa_intr_info *intrs = pdsv->vdpa_aux->vdpa_vf->intrs;
> +       struct pci_dev *pdev = pdsv->vdpa_aux->vdpa_vf->pdev;
> +       struct pds_core_intr __iomem *intr_ctrl;
> +       int index;
> +
> +       intr_ctrl = (struct pds_core_intr __iomem *)pdsv->vdpa_aux->vdpa_vf->vd_mdev.isr;
> +       index = pdsv->hw.vqs[qid].intr_index;
> +       if (index == VIRTIO_MSI_NO_VECTOR)
> +               return;
> +
> +       if (intrs[index].irq == VIRTIO_MSI_NO_VECTOR)
> +               return;
> +
> +       if (qid & 0x1) {
> +               pdsv->hw.vqs[qid].intr_index = VIRTIO_MSI_NO_VECTOR;
> +       } else {
> +               pds_core_intr_mask(&intr_ctrl[index], PDS_CORE_INTR_MASK_SET);
> +               devm_free_irq(&pdev->dev, intrs[index].irq, &pdsv->hw.vqs[qid]);
> +               pdsv->hw.vqs[qid].intr_index = VIRTIO_MSI_NO_VECTOR;
> +               intrs[index].irq = VIRTIO_MSI_NO_VECTOR;
> +       }
> +}
> +
> +static void
> +pds_vdpa_set_vq_ready(struct vdpa_device *vdpa_dev, u16 qid, bool ready)
> +{
> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> +       struct pci_dev *pdev = pdsv->vdpa_aux->vdpa_vf->pdev;
> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> +       struct device *dev = &pdsv->vdpa_dev.dev;
> +       struct pds_core_intr __iomem *intr_ctrl;
> +       int err;
> +
> +       dev_dbg(dev, "%s: qid %d ready %d => %d\n",
> +                __func__, qid, hw->vqs[qid].ready, ready);
> +       if (ready == hw->vqs[qid].ready)
> +               return;
> +
> +       intr_ctrl = (struct pds_core_intr __iomem *)pdsv->vdpa_aux->vdpa_vf->vd_mdev.isr;

It looks to me like pds has a different layout/semantics for isr than the
virtio spec. I'd suggest not reusing the spec's isr here, to avoid confusion.

> +       if (ready) {

The spec says there are no interrupts before DRIVER_OK, so it looks simpler
to move the interrupt setup to set_status():

E.g. we could then know in advance whether we have sufficient vectors
and choose different mapping policies accordingly.

> +               struct pds_vdpa_intr_info *intrs = pdsv->vdpa_aux->vdpa_vf->intrs;
> +               int index = VIRTIO_MSI_NO_VECTOR;
> +               int i;
> +
> +               /*  Tx and Rx queues share interrupts, and they start with
> +                *  even numbers, so only find an interrupt for the even numbered
> +                *  qid, and let the odd number use what the previous queue got.
> +                */
> +               if (qid & 0x1) {
> +                       int even = qid & ~0x1;
> +
> +                       index = hw->vqs[even].intr_index;
> +               } else {
> +                       for (i = 0; i < pdsv->vdpa_aux->vdpa_vf->nintrs; i++) {
> +                               if (intrs[i].irq == VIRTIO_MSI_NO_VECTOR) {
> +                                       index = i;
> +                                       break;
> +                               }
> +                       }
> +               }
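
As an aside for anyone following along, the sharing policy described in the
comment above reduces to a little mask arithmetic. A userspace-only sketch
(vector_owner is a made-up name for illustration, not driver API):

```c
#include <stdint.h>

/* Hypothetical helper, not driver API: Tx/Rx queue pairs share one
 * interrupt, so the vector "owner" for any qid is the even qid of its
 * pair: qid 1 (tx0) reuses qid 0's (rx0) vector, qid 3 reuses qid 2's. */
static uint16_t vector_owner(uint16_t qid)
{
	return qid & ~0x1;
}
```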
> +
> +               if (qid & 0x1) {
> +                       hw->vqs[qid].intr_index = index;
> +               } else if (index != VIRTIO_MSI_NO_VECTOR) {
> +                       int irq;
> +
> +                       irq = pci_irq_vector(pdev, index);
> +                       snprintf(intrs[index].name, sizeof(intrs[index].name),
> +                                "vdpa-%s-%d", dev_name(dev), index);
> +
> +                       err = devm_request_irq(&pdev->dev, irq, pds_vdpa_isr, 0,
> +                                              intrs[index].name, &hw->vqs[qid]);
> +                       if (err) {
> +                               dev_info(dev, "%s: no irq for qid %d: %pe\n",
> +                                        __func__, qid, ERR_PTR(err));

Should we fail?

> +                       } else {
> +                               intrs[index].irq = irq;
> +                               hw->vqs[qid].intr_index = index;
> +                               pds_core_intr_mask(&intr_ctrl[index],
> +                                                  PDS_CORE_INTR_MASK_CLEAR);

I guess the reason you don't simply use the VF's MSI-X directly is that
the DPU may need to support vDPA subdevices in the future?

> +                       }
> +               } else {
> +                       dev_info(dev, "%s: no intr slot for qid %d\n",
> +                                __func__, qid);

Do we need to fail here?

> +               }
> +
> +               /* Pass vq setup info to DSC */
> +               err = pds_vdpa_cmd_init_vq(pdsv, qid, &hw->vqs[qid]);
> +               if (err) {
> +                       pds_vdpa_release_irq(pdsv, qid);
> +                       ready = false;
> +               }
> +       } else {
> +               pds_vdpa_release_irq(pdsv, qid);
> +               (void) pds_vdpa_cmd_reset_vq(pdsv, qid);
> +       }
> +
> +       hw->vqs[qid].ready = ready;
> +}
> +
> +static bool
> +pds_vdpa_get_vq_ready(struct vdpa_device *vdpa_dev, u16 qid)
> +{
> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> +
> +       return hw->vqs[qid].ready;
> +}
> +
> +static int
> +pds_vdpa_set_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
> +                     const struct vdpa_vq_state *state)
> +{
> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> +
> +       hw->vqs[qid].used_idx = state->split.avail_index;
> +       hw->vqs[qid].avail_idx = state->split.avail_index;
> +
> +       return 0;
> +}
> +
> +static int
> +pds_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
> +                     struct vdpa_vq_state *state)
> +{
> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> +
> +       state->split.avail_index = hw->vqs[qid].avail_idx;

Who is in charge of reading avail_idx from the hardware?

> +
> +       return 0;
> +}
> +
> +static struct vdpa_notification_area
> +pds_vdpa_get_vq_notification(struct vdpa_device *vdpa_dev, u16 qid)
> +{
> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> +       struct virtio_pci_modern_device *vd_mdev;
> +       struct vdpa_notification_area area;
> +
> +       area.addr = hw->vqs[qid].notify_pa;
> +
> +       vd_mdev = &pdsv->vdpa_aux->vdpa_vf->vd_mdev;
> +       if (!vd_mdev->notify_offset_multiplier)
> +               area.size = PAGE_SIZE;
> +       else
> +               area.size = vd_mdev->notify_offset_multiplier;
> +
> +       return area;
> +}
> +
> +static int
> +pds_vdpa_get_vq_irq(struct vdpa_device *vdpa_dev, u16 qid)
> +{
> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> +       int irq = VIRTIO_MSI_NO_VECTOR;
> +       int index;
> +
> +       if (pdsv->vdpa_aux->vdpa_vf->intrs) {
> +               index = hw->vqs[qid].intr_index;
> +               irq = pdsv->vdpa_aux->vdpa_vf->intrs[index].irq;

The notification area mapping might only work well when each vq has
its own irq. Otherwise the guest may see spurious interrupts, which can
degrade performance.

> +       }
> +
> +       return irq;
> +}
> +
> +static u32
> +pds_vdpa_get_vq_align(struct vdpa_device *vdpa_dev)
> +{
> +
> +       return PAGE_SIZE;
> +}
> +
> +static u32
> +pds_vdpa_get_vq_group(struct vdpa_device *vdpa_dev, u16 idx)
> +{
> +       return 0;
> +}
> +
> +static u64
> +pds_vdpa_get_device_features(struct vdpa_device *vdpa_dev)
> +{
> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> +
> +       return le64_to_cpu(pdsv->vdpa_aux->ident.hw_features);
> +}
> +
> +static int
> +pds_vdpa_set_driver_features(struct vdpa_device *vdpa_dev, u64 features)
> +{
> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> +       struct device *dev = &pdsv->vdpa_dev.dev;
> +       u64 nego_features;
> +       u64 set_features;
> +       u64 missing;
> +       int err;
> +
> +       if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)) && features) {
> +               dev_err(dev, "VIRTIO_F_ACCESS_PLATFORM is not negotiated\n");
> +               return -EOPNOTSUPP;

Should we fail FEATURES_OK in this case, and in the other error
conditions below?

> +       }
> +
> +       hw->req_features = features;
> +
> +       /* Check for valid feature bits */
> +       nego_features = features & le64_to_cpu(pdsv->vdpa_aux->ident.hw_features);
> +       missing = hw->req_features & ~nego_features;
> +       if (missing) {
> +               dev_err(dev, "Can't support all requested features in %#llx, missing %#llx features\n",
> +                       hw->req_features, missing);
> +               return -EOPNOTSUPP;
> +       }
> +
> +       dev_dbg(dev, "%s: %#llx => %#llx\n",
> +                __func__, hw->actual_features, nego_features);
> +
> +       if (hw->actual_features == nego_features)
> +               return 0;
> +
> +       /* Update hw feature configuration, strip MAC bit if locally set */
> +       if (pdsv->vdpa_aux->local_mac_bit)
> +               set_features = nego_features & ~BIT_ULL(VIRTIO_NET_F_MAC);

This needs some documentation to explain how local_mac_bit works.

> +       else
> +               set_features = nego_features;
> +       err = pds_vdpa_cmd_set_features(pdsv, set_features);
> +       if (!err)
> +               hw->actual_features = nego_features;
> +
> +       return err;
> +}
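
For what it's worth, the validity check above is just mask arithmetic;
here is a self-contained userspace sketch of the relationship it enforces
(helper names are illustrative, not driver API):

```c
#include <stdint.h>

/* Negotiated set: the intersection of what the driver requested and
 * what the device advertised. */
static uint64_t nego_features(uint64_t requested, uint64_t hw)
{
	return requested & hw;
}

/* Missing set: requested bits the device cannot provide; the driver
 * above rejects the request with -EOPNOTSUPP when this is non-zero. */
static uint64_t missing_features(uint64_t requested, uint64_t hw)
{
	return requested & ~nego_features(requested, hw);
}
```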
> +
> +static u64
> +pds_vdpa_get_driver_features(struct vdpa_device *vdpa_dev)
> +{
> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> +
> +       return hw->actual_features;
> +}
> +
> +static void
> +pds_vdpa_set_config_cb(struct vdpa_device *vdpa_dev, struct vdpa_callback *cb)
> +{
> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> +
> +       hw->config_cb.callback = cb->callback;
> +       hw->config_cb.private = cb->private;
> +}
> +
> +static u16
> +pds_vdpa_get_vq_num_max(struct vdpa_device *vdpa_dev)
> +{
> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> +       u32 max_qlen;
> +
> +       max_qlen = min_t(u32, PDS_VDPA_MAX_QLEN,
> +                             1 << le16_to_cpu(pdsv->vdpa_aux->ident.max_qlen));

Assuming we can fetch the max_qlen from the device, is there any reason
to have another layer like PDS_VDPA_MAX_QLEN?

> +
> +       return (u16)max_qlen;
> +}
> +
> +static u32
> +pds_vdpa_get_device_id(struct vdpa_device *vdpa_dev)
> +{
> +       return VIRTIO_ID_NET;
> +}
> +
> +static u32
> +pds_vdpa_get_vendor_id(struct vdpa_device *vdpa_dev)
> +{
> +       return PCI_VENDOR_ID_PENSANDO;
> +}
> +
> +static u8
> +pds_vdpa_get_status(struct vdpa_device *vdpa_dev)
> +{
> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> +
> +       return hw->status;

How is this synchronized with the device, or is it fully emulated by this driver?

> +}
> +
> +static void
> +pds_vdpa_set_status(struct vdpa_device *vdpa_dev, u8 status)
> +{
> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> +       struct device *dev = &pdsv->vdpa_dev.dev;
> +       int err;
> +
> +       if (hw->status == status)
> +               return;
> +
> +       /* If the DRIVER_OK bit turns on, time to start the queues */
> +       if ((status ^ hw->status) & VIRTIO_CONFIG_S_DRIVER_OK) {
> +               if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +                       err = pds_vdpa_setup_driver(pdsv);
> +                       if (err) {
> +                               dev_err(dev, "failed to setup driver: %pe\n", ERR_PTR(err));
> +                               status = hw->status | VIRTIO_CONFIG_S_FAILED;
> +                       }
> +               } else {
> +                       dev_warn(dev, "did not expect DRIVER_OK to be cleared\n");
> +               }
> +       }
> +
> +       err = pds_vdpa_cmd_set_status(pdsv, status);
> +       if (!err)
> +               hw->status = status;
> +}
> +
> +static int
> +pds_vdpa_reset(struct vdpa_device *vdpa_dev)
> +{
> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> +       int i;
> +
> +       if (hw->status == 0)
> +               return 0;
> +
> +       if (hw->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +
> +               /* Reset the vqs */
> +               for (i = 0; i < hw->num_vqs; i++) {
> +                       pds_vdpa_release_irq(pdsv, i);
> +                       (void) pds_vdpa_cmd_reset_vq(pdsv, i);

(void) is unnecessary.

> +
> +                       memset(&pdsv->hw.vqs[i], 0, sizeof(pdsv->hw.vqs[0]));
> +                       pdsv->hw.vqs[i].ready = false;
> +               }
> +       }
> +
> +       hw->status = 0;
> +       (void) pds_vdpa_cmd_set_status(pdsv, 0);
> +
> +       return 0;
> +}
> +
> +static size_t
> +pds_vdpa_get_config_size(struct vdpa_device *vdpa_dev)
> +{
> +       return sizeof(struct virtio_net_config);
> +}
> +
> +static void
> +pds_vdpa_get_config(struct vdpa_device *vdpa_dev,
> +                   unsigned int offset,
> +                   void *buf, unsigned int len)
> +{
> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> +
> +       if (offset + len <= sizeof(struct virtio_net_config))
> +               memcpy(buf, (u8 *)&pdsv->vn_config + offset, len);
> +}
> +
> +static void
> +pds_vdpa_set_config(struct vdpa_device *vdpa_dev,
> +                   unsigned int offset, const void *buf,
> +                   unsigned int len)
> +{
> +       /* In the virtio_net context, this callback seems to only be
> +        * called in drivers supporting the older non-VERSION_1 API,
> +        * so we can leave this an empty function, but we need to
> +        * define the function in case it does get called, as there
> +        * are currently no checks for existence before calling in
> +        * that path.
> +        *
> +        * The implementation would be something like:
> +        * if (offset + len <= sizeof(struct virtio_net_config))
> +        *      memcpy((u8 *)&pdsv->vn_config + offset, buf, len);
> +        */

And we need to notify the hardware that the config has changed.
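
Something like the sketch below, shown as a self-contained userspace mock;
notify_hw() stands in for whatever device command would apply, and none of
these names are actual driver API:

```c
#include <string.h>
#include <stdint.h>

#define CONFIG_SIZE 12	/* stand-in for sizeof(struct virtio_net_config) */

static uint8_t vn_config[CONFIG_SIZE];
static int hw_notified;

/* Placeholder for a real "config changed" device command */
static void notify_hw(void)
{
	hw_notified = 1;
}

/* Bounds-checked config write that also tells the device it changed */
static int set_config(unsigned int offset, const void *buf, unsigned int len)
{
	if (offset > CONFIG_SIZE || len > CONFIG_SIZE - offset)
		return -1;
	memcpy(vn_config + offset, buf, len);
	notify_hw();
	return 0;
}
```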

> +}
> +
> +static const struct vdpa_config_ops pds_vdpa_ops = {
> +       .set_vq_address         = pds_vdpa_set_vq_address,
> +       .set_vq_num             = pds_vdpa_set_vq_num,
> +       .kick_vq                = pds_vdpa_kick_vq,
> +       .set_vq_cb              = pds_vdpa_set_vq_cb,
> +       .set_vq_ready           = pds_vdpa_set_vq_ready,
> +       .get_vq_ready           = pds_vdpa_get_vq_ready,
> +       .set_vq_state           = pds_vdpa_set_vq_state,
> +       .get_vq_state           = pds_vdpa_get_vq_state,
> +       .get_vq_notification    = pds_vdpa_get_vq_notification,
> +       .get_vq_irq             = pds_vdpa_get_vq_irq,
> +       .get_vq_align           = pds_vdpa_get_vq_align,
> +       .get_vq_group           = pds_vdpa_get_vq_group,
> +
> +       .get_device_features    = pds_vdpa_get_device_features,
> +       .set_driver_features    = pds_vdpa_set_driver_features,
> +       .get_driver_features    = pds_vdpa_get_driver_features,
> +       .set_config_cb          = pds_vdpa_set_config_cb,
> +       .get_vq_num_max         = pds_vdpa_get_vq_num_max,
> +/*     .get_vq_num_min (optional) */
> +       .get_device_id          = pds_vdpa_get_device_id,
> +       .get_vendor_id          = pds_vdpa_get_vendor_id,
> +       .get_status             = pds_vdpa_get_status,
> +       .set_status             = pds_vdpa_set_status,
> +       .reset                  = pds_vdpa_reset,
> +       .get_config_size        = pds_vdpa_get_config_size,
> +       .get_config             = pds_vdpa_get_config,
> +       .set_config             = pds_vdpa_set_config,
> +
> +/*     .get_generation (optional) */
> +/*     .get_iova_range (optional) */
> +/*     .set_group_asid */
> +/*     .set_map (optional) */
> +/*     .dma_map (optional) */
> +/*     .dma_unmap (optional) */
> +/*     .free (optional) */
> +};
> +static struct virtio_device_id pds_vdpa_id_table[] = {
> +       {VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID},
> +       {0},
> +};
> +
> +static int
> +pds_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
> +                const struct vdpa_dev_set_config *add_config)
> +{
> +       struct pds_vdpa_aux *vdpa_aux;
> +       struct pds_vdpa_device *pdsv;
> +       struct vdpa_mgmt_dev *mgmt;
> +       u16 fw_max_vqs, vq_pairs;
> +       struct device *dma_dev;
> +       struct pds_vdpa_hw *hw;
> +       struct pci_dev *pdev;
> +       struct device *dev;
> +       u8 mac[ETH_ALEN];
> +       int err;
> +       int i;
> +
> +       vdpa_aux = container_of(mdev, struct pds_vdpa_aux, vdpa_mdev);
> +       dev = &vdpa_aux->padev->aux_dev.dev;
> +       mgmt = &vdpa_aux->vdpa_mdev;
> +
> +       if (vdpa_aux->pdsv) {
> +               dev_warn(dev, "Multiple vDPA devices on a VF is not supported.\n");
> +               return -EOPNOTSUPP;
> +       }
> +
> +       pdsv = vdpa_alloc_device(struct pds_vdpa_device, vdpa_dev,
> +                                dev, &pds_vdpa_ops, 1, 1, name, false);
> +       if (IS_ERR(pdsv)) {
> +               dev_err(dev, "Failed to allocate vDPA structure: %pe\n", pdsv);
> +               return PTR_ERR(pdsv);
> +       }
> +
> +       vdpa_aux->pdsv = pdsv;
> +       pdsv->vdpa_aux = vdpa_aux;
> +       pdsv->vdpa_aux->padev->priv = pdsv;
> +
> +       pdev = vdpa_aux->vdpa_vf->pdev;
> +       pdsv->vdpa_dev.dma_dev = &pdev->dev;
> +       dma_dev = pdsv->vdpa_dev.dma_dev;
> +       hw = &pdsv->hw;
> +
> +       pdsv->vn_config_pa = dma_map_single(dma_dev, &pdsv->vn_config,
> +                                           sizeof(pdsv->vn_config), DMA_FROM_DEVICE);

I think we should use a coherent mapping instead of a streaming mapping;
otherwise we may end up with coherency issues when accessing the device
configuration space.
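
Concretely, that would look something like this untested kernel-style
sketch; note it also means turning vn_config from an embedded struct into
a pointer in struct pds_vdpa_device:

```c
/* In dev_add(), instead of dma_map_single() on the embedded struct: */
pdsv->vn_config = dma_alloc_coherent(dma_dev, sizeof(*pdsv->vn_config),
				     &pdsv->vn_config_pa, GFP_KERNEL);
if (!pdsv->vn_config) {
	err = -ENOMEM;
	goto err_out;
}

/* ...and the matching release in dev_del(): */
dma_free_coherent(dma_dev, sizeof(*pdsv->vn_config),
		  pdsv->vn_config, pdsv->vn_config_pa);
```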

> +       if (dma_mapping_error(dma_dev, pdsv->vn_config_pa)) {
> +               dev_err(dma_dev, "Failed to map vn_config space\n");
> +               pdsv->vn_config_pa = 0;
> +               err = -ENOMEM;
> +               goto err_out;
> +       }
> +
> +       err = pds_vdpa_init_hw(pdsv);
> +       if (err) {
> +               dev_err(dev, "Failed to init hw: %pe\n", ERR_PTR(err));
> +               goto err_unmap;
> +       }
> +
> +       fw_max_vqs = le16_to_cpu(pdsv->vdpa_aux->ident.max_vqs);
> +       vq_pairs = fw_max_vqs / 2;
> +
> +       /* Make sure we have the queues being requested */
> +       if (add_config->mask & (1 << VDPA_ATTR_DEV_NET_CFG_MAX_VQP))
> +               vq_pairs = add_config->net.max_vq_pairs;
> +
> +       hw->num_vqs = 2 * vq_pairs;
> +       if (mgmt->supported_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
> +               hw->num_vqs++;
> +
> +       if (hw->num_vqs > fw_max_vqs) {
> +               dev_err(dev, "%s: queue count requested %u greater than max %u\n",
> +                        __func__, hw->num_vqs, fw_max_vqs);
> +               err = -ENOSPC;
> +               goto err_unmap;
> +       }
> +
> +       if (hw->num_vqs != fw_max_vqs) {
> +               err = pds_vdpa_cmd_set_max_vq_pairs(pdsv, vq_pairs);
> +               if (err == -ERANGE) {
> +                       hw->num_vqs = fw_max_vqs;
> +                       dev_warn(dev, "Known FW issue - overriding to use max_vq_pairs %d\n",
> +                                hw->num_vqs / 2);

Should we fail here, since the device ends up with a different max_vqp than expected?

> +               } else if (err) {
> +                       dev_err(dev, "Failed to update max_vq_pairs: %pe\n",
> +                               ERR_PTR(err));
> +                       goto err_unmap;
> +               }
> +       }
> +
> +       /* Set a mac, either from the user config if provided
> +        * or set a random mac if default is 00:..:00
> +        */
> +       if (add_config->mask & (1 << VDPA_ATTR_DEV_NET_CFG_MACADDR)) {
> +               ether_addr_copy(mac, add_config->net.mac);
> +               pds_vdpa_cmd_set_mac(pdsv, mac);
> +       } else if (is_zero_ether_addr(pdsv->vn_config.mac)) {
> +               eth_random_addr(mac);
> +               pds_vdpa_cmd_set_mac(pdsv, mac);
> +       }
> +
> +       for (i = 0; i < hw->num_vqs; i++) {
> +               hw->vqs[i].qid = i;
> +               hw->vqs[i].pdsv = pdsv;
> +               hw->vqs[i].intr_index = VIRTIO_MSI_NO_VECTOR;

Let's rename this as msix_vector to be aligned with the virtio spec.

> +               hw->vqs[i].notify = vp_modern_map_vq_notify(&pdsv->vdpa_aux->vdpa_vf->vd_mdev,
> +                                                           i, &hw->vqs[i].notify_pa);
> +       }
> +
> +       pdsv->vdpa_dev.mdev = &vdpa_aux->vdpa_mdev;
> +
> +       /* We use the _vdpa_register_device() call rather than the
> +        * vdpa_register_device() to avoid a deadlock because this
> +        * dev_add() is called with the vdpa_dev_lock already set
> +        * by vdpa_nl_cmd_dev_add_set_doit()
> +        */
> +       err = _vdpa_register_device(&pdsv->vdpa_dev, hw->num_vqs);
> +       if (err) {
> +               dev_err(dev, "Failed to register to vDPA bus: %pe\n", ERR_PTR(err));
> +               goto err_unmap;
> +       }
> +
> +       pds_vdpa_debugfs_add_vdpadev(pdsv);
> +       dev_info(&pdsv->vdpa_dev.dev, "Added with mac %pM\n", pdsv->vn_config.mac);

dev_dbg?

> +
> +       return 0;
> +
> +err_unmap:
> +       dma_unmap_single(dma_dev, pdsv->vn_config_pa,
> +                        sizeof(pdsv->vn_config), DMA_FROM_DEVICE);
> +err_out:
> +       put_device(&pdsv->vdpa_dev.dev);
> +       vdpa_aux->pdsv = NULL;
> +       return err;
> +}
> +
> +static void
> +pds_vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *vdpa_dev)
> +{
> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> +       struct pds_vdpa_aux *vdpa_aux;
> +
> +       dev_info(&vdpa_dev->dev, "Removed\n");
> +
> +       vdpa_aux = container_of(mdev, struct pds_vdpa_aux, vdpa_mdev);
> +       _vdpa_unregister_device(vdpa_dev);
> +       pds_vdpa_debugfs_del_vdpadev(pdsv);
> +
> +       if (vdpa_aux->pdsv->vn_config_pa)
> +               dma_unmap_single(vdpa_dev->dma_dev, vdpa_aux->pdsv->vn_config_pa,
> +                                sizeof(vdpa_aux->pdsv->vn_config), DMA_FROM_DEVICE);
> +
> +       vdpa_aux->pdsv = NULL;
> +}
> +
> +static const struct vdpa_mgmtdev_ops pds_vdpa_mgmt_dev_ops = {
> +       .dev_add = pds_vdpa_dev_add,
> +       .dev_del = pds_vdpa_dev_del
> +};
> +
> +int
> +pds_vdpa_get_mgmt_info(struct pds_vdpa_aux *vdpa_aux)
> +{
> +       struct pds_vdpa_pci_device *vdpa_pdev;
> +       struct pds_vdpa_ident_cmd ident_cmd = {
> +               .opcode = PDS_VDPA_CMD_IDENT,
> +               .vf_id = cpu_to_le16(vdpa_aux->vdpa_vf->vf_id),
> +       };
> +       struct pds_vdpa_comp ident_comp = {0};
> +       struct vdpa_mgmt_dev *mgmt;
> +       struct device *dma_dev;
> +       dma_addr_t ident_pa;
> +       struct pci_dev *pdev;
> +       struct device *dev;
> +       __le64 mac_bit;
> +       u16 max_vqs;
> +       int err;
> +       int i;
> +
> +       vdpa_pdev = vdpa_aux->vdpa_vf;
> +       pdev = vdpa_pdev->pdev;
> +       dev = &vdpa_aux->padev->aux_dev.dev;
> +       mgmt = &vdpa_aux->vdpa_mdev;
> +
> +       /* Get resource info from the device */
> +       dma_dev = &pdev->dev;
> +       ident_pa = dma_map_single(dma_dev, &vdpa_aux->ident,
> +                                 sizeof(vdpa_aux->ident), DMA_FROM_DEVICE);

I wonder how this works. The ident_pa is mapped through the VF, but the
command is sent to the PF adminq, if I understand correctly. If so, this
might work but looks tricky. We'd better explain why it is safe, e.g. the
vDPA device is not yet created so no userspace can use it. Or I wonder if
we could just piggyback the ident on the adminq response, so we don't
need to worry about the security implications.

Thanks

> +       if (dma_mapping_error(dma_dev, ident_pa)) {
> +               dev_err(dma_dev, "Failed to map ident space\n");
> +               return -ENOMEM;
> +       }
> +
> +       ident_cmd.ident_pa = cpu_to_le64(ident_pa);
> +       ident_cmd.len = cpu_to_le32(sizeof(vdpa_aux->ident));
> +       err = vdpa_aux->padev->ops->adminq_cmd(vdpa_aux->padev,
> +                                              (union pds_core_adminq_cmd *)&ident_cmd,
> +                                              sizeof(ident_cmd),
> +                                              (union pds_core_adminq_comp *)&ident_comp,
> +                                              0);
> +       dma_unmap_single(dma_dev, ident_pa,
> +                        sizeof(vdpa_aux->ident), DMA_FROM_DEVICE);
> +       if (err) {
> +               dev_err(dev, "Failed to ident hw, status %d: %pe\n",
> +                       ident_comp.status, ERR_PTR(err));
> +               return err;
> +       }
> +
> +       /* The driver adds a default mac address if the device doesn't,
> +        * so we need to be sure we advertise VIRTIO_NET_F_MAC
> +        */
> +       mac_bit = cpu_to_le64(BIT_ULL(VIRTIO_NET_F_MAC));
> +       if (!(vdpa_aux->ident.hw_features & mac_bit)) {
> +               vdpa_aux->ident.hw_features |= mac_bit;
> +               vdpa_aux->local_mac_bit = true;
> +       }
> +
> +       max_vqs = le16_to_cpu(vdpa_aux->ident.max_vqs);
> +       mgmt->max_supported_vqs = min_t(u16, PDS_VDPA_MAX_QUEUES, max_vqs);
> +       if (max_vqs > PDS_VDPA_MAX_QUEUES)
> +               dev_info(dev, "FYI - Device supports more vqs (%d) than driver (%d)\n",
> +                        max_vqs, PDS_VDPA_MAX_QUEUES);
> +
> +       mgmt->ops = &pds_vdpa_mgmt_dev_ops;
> +       mgmt->id_table = pds_vdpa_id_table;
> +       mgmt->device = dev;
> +       mgmt->supported_features = le64_to_cpu(vdpa_aux->ident.hw_features);
> +       mgmt->config_attr_mask = BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR);
> +       mgmt->config_attr_mask |= BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP);
> +
> +       /* Set up interrupts now that we know how many we might want
> +        * TX and RX pairs will share interrupts, so halve the vq count
> +        * Add another for a control queue if supported
> +        */
> +       vdpa_pdev->nintrs = mgmt->max_supported_vqs / 2;
> +       if (mgmt->supported_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
> +               vdpa_pdev->nintrs++;
> +
> +       err = pci_alloc_irq_vectors(pdev, vdpa_pdev->nintrs, vdpa_pdev->nintrs,
> +                                   PCI_IRQ_MSIX);
> +       if (err < 0) {
> +               dev_err(dma_dev, "Couldn't get %d msix vectors: %pe\n",
> +                       vdpa_pdev->nintrs, ERR_PTR(err));
> +               return err;
> +       }
> +       vdpa_pdev->nintrs = err;
> +       err = 0;
> +
> +       vdpa_pdev->intrs = devm_kcalloc(&pdev->dev, vdpa_pdev->nintrs,
> +                                       sizeof(*vdpa_pdev->intrs),
> +                                       GFP_KERNEL);
> +       if (!vdpa_pdev->intrs) {
> +               vdpa_pdev->nintrs = 0;
> +               pci_free_irq_vectors(pdev);
> +               return -ENOMEM;
> +       }
> +
> +       for (i = 0; i < vdpa_pdev->nintrs; i++)
> +               vdpa_pdev->intrs[i].irq = VIRTIO_MSI_NO_VECTOR;
> +
> +       return 0;
> +}
> --
> 2.17.1
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 19/19] pds_vdpa: add Kconfig entry and pds_vdpa.rst
  2022-11-18 22:56 ` [RFC PATCH net-next 19/19] pds_vdpa: add Kconfig entry and pds_vdpa.rst Shannon Nelson
@ 2022-11-22  6:35   ` Jason Wang
  2022-11-22 22:33     ` Shannon Nelson
  2022-11-30  0:13     ` Shannon Nelson
  0 siblings, 2 replies; 61+ messages in thread
From: Jason Wang @ 2022-11-22  6:35 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev, davem, kuba, mst, virtualization, drivers

On Sat, Nov 19, 2022 at 6:57 AM Shannon Nelson <snelson@pensando.io> wrote:
>
> Signed-off-by: Shannon Nelson <snelson@pensando.io>
> ---
>  .../ethernet/pensando/pds_vdpa.rst            | 85 +++++++++++++++++++
>  MAINTAINERS                                   |  1 +
>  drivers/vdpa/Kconfig                          |  7 ++
>  3 files changed, 93 insertions(+)
>  create mode 100644 Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst
>
> diff --git a/Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst b/Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst
> new file mode 100644
> index 000000000000..c517f337d212
> --- /dev/null
> +++ b/Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst
> @@ -0,0 +1,85 @@
> +.. SPDX-License-Identifier: GPL-2.0+
> +.. note: can be edited and viewed with /usr/bin/formiko-vim
> +
> +==========================================================
> +PCI vDPA driver for the Pensando(R) DSC adapter family
> +==========================================================
> +
> +Pensando vDPA VF Device Driver
> +Copyright(c) 2022 Pensando Systems, Inc
> +
> +Overview
> +========
> +
> +The ``pds_vdpa`` driver is a PCI and auxiliary bus driver and supplies
> +a vDPA device for use by the virtio network stack.  It is used with
> +the Pensando Virtual Function devices that offer vDPA and virtio queue
> +services.  It depends on the ``pds_core`` driver and hardware for the PF
> +and for device configuration services.
> +
> +Using the device
> +================
> +
> +The ``pds_vdpa`` device is enabled via multiple configuration steps and
> +depends on the ``pds_core`` driver to create and enable SR-IOV Virtual
> +Function devices.
> +
> +Shown below are the steps to bind the driver to a VF and also to the
> +associated auxiliary device created by the ``pds_core`` driver. This
> +example assumes the pds_core and pds_vdpa modules are already
> +loaded.
> +
> +.. code-block:: bash
> +
> +  #!/bin/bash
> +
> +  modprobe pds_core
> +  modprobe pds_vdpa
> +
> +  PF_BDF=`grep "vDPA.*1" /sys/kernel/debug/pds_core/*/viftypes | head -1 | awk -F / '{print $6}'`
> +
> +  # Enable vDPA VF auxiliary device(s) in the PF
> +  devlink dev param set pci/$PF_BDF name enable_vnet value true cmode runtime
> +
> +  # Create a VF for vDPA use
> +  echo 1 > /sys/bus/pci/drivers/pds_core/$PF_BDF/sriov_numvfs
> +
> +  # Find the vDPA services/devices available
> +  PDS_VDPA_MGMT=`vdpa mgmtdev show | grep vDPA | head -1 | cut -d: -f1`
> +
> +  # Create a vDPA device for use in virtio network configurations
> +  vdpa dev add name vdpa1 mgmtdev $PDS_VDPA_MGMT mac 00:11:22:33:44:55
> +
> +  # Set up an ethernet interface on the vdpa device
> +  modprobe virtio_vdpa
> +
> +
> +
> +Enabling the driver
> +===================
> +
> +The driver is enabled via the standard kernel configuration system,
> +using the make command::
> +
> +  make oldconfig/menuconfig/etc.
> +
> +The driver is located in the menu structure at:
> +
> +  -> Device Drivers
> +    -> Network device support (NETDEVICES [=y])
> +      -> Ethernet driver support
> +        -> Pensando devices
> +          -> Pensando Ethernet PDS_VDPA Support
> +
> +Support
> +=======
> +
> +For general Linux networking support, please use the netdev mailing
> +list, which is monitored by Pensando personnel::
> +
> +  netdev@vger.kernel.org
> +
> +For more specific support needs, please use the Pensando driver support
> +email::
> +
> +  drivers@pensando.io
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a4f989fa8192..a4d96e854757 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16152,6 +16152,7 @@ L:      netdev@vger.kernel.org
>  S:     Supported
>  F:     Documentation/networking/device_drivers/ethernet/pensando/
>  F:     drivers/net/ethernet/pensando/
> +F:     drivers/vdpa/pds/
>  F:     include/linux/pds/
>
>  PER-CPU MEMORY ALLOCATOR
> diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
> index 50f45d037611..1c44df18f3da 100644
> --- a/drivers/vdpa/Kconfig
> +++ b/drivers/vdpa/Kconfig
> @@ -86,4 +86,11 @@ config ALIBABA_ENI_VDPA
>           VDPA driver for Alibaba ENI (Elastic Network Interface) which is built upon
>           virtio 0.9.5 specification.
>
> +config PDS_VDPA
> +       tristate "vDPA driver for Pensando DSC devices"
> +       select VHOST_RING

Any reason it needs to select on vringh?

Thanks

> +       depends on PDS_CORE
> +       help
> +         VDPA network driver for Pensando's PDS Core devices.
> +
>  endif # VDPA
> --
> 2.17.1
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 15/19] pds_vdpa: virtio bar setup for vdpa
  2022-11-22  3:32   ` Jason Wang
@ 2022-11-22  6:36     ` Jason Wang
  2022-11-29 23:02       ` Shannon Nelson
  0 siblings, 1 reply; 61+ messages in thread
From: Jason Wang @ 2022-11-22  6:36 UTC (permalink / raw)
  To: Shannon Nelson, netdev, davem, kuba, mst, virtualization; +Cc: drivers

On Tue, Nov 22, 2022 at 11:32 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2022/11/19 06:56, Shannon Nelson 写道:
> > The PDS vDPA device has a virtio BAR for describing itself, and
> > the pds_vdpa driver needs to access it.  Here we copy liberally
> > from the existing drivers/virtio/virtio_pci_modern_dev.c as it
> > has what we need, but we need to modify it so that it can work
> > with our device id and so we can use our own DMA mask.
> >
> > We suspect there is room for discussion here about making the
> > existing code a little more flexible, but we thought we'd at
> > least start the discussion here.
>
>
> Exactly, since the virtio_pci_modern_dev.c is a library, we could tweak
> it to allow the caller to pass the device_id with the DMA mask. Then we
> can avoid code/bug duplication here.

Btw, I found that only isr/notification are used, but not the others. If
this is true, we can avoid mapping those capabilities.

Thanks

>
> Thanks
>
>
> >
> > Signed-off-by: Shannon Nelson <snelson@pensando.io>
> > ---
> >   drivers/vdpa/pds/Makefile     |   3 +-
> >   drivers/vdpa/pds/pci_drv.c    |  10 ++
> >   drivers/vdpa/pds/pci_drv.h    |   2 +
> >   drivers/vdpa/pds/virtio_pci.c | 283 ++++++++++++++++++++++++++++++++++
> >   4 files changed, 297 insertions(+), 1 deletion(-)
> >   create mode 100644 drivers/vdpa/pds/virtio_pci.c
> >
> > diff --git a/drivers/vdpa/pds/Makefile b/drivers/vdpa/pds/Makefile
> > index 3ba28a875574..b8376ab165bc 100644
> > --- a/drivers/vdpa/pds/Makefile
> > +++ b/drivers/vdpa/pds/Makefile
> > @@ -4,4 +4,5 @@
> >   obj-$(CONFIG_PDS_VDPA) := pds_vdpa.o
> >
> >   pds_vdpa-y := pci_drv.o     \
> > -           debugfs.o
> > +           debugfs.o \
> > +           virtio_pci.o
> > diff --git a/drivers/vdpa/pds/pci_drv.c b/drivers/vdpa/pds/pci_drv.c
> > index 369e11153f21..10491e22778c 100644
> > --- a/drivers/vdpa/pds/pci_drv.c
> > +++ b/drivers/vdpa/pds/pci_drv.c
> > @@ -44,6 +44,14 @@ pds_vdpa_pci_probe(struct pci_dev *pdev,
> >               goto err_out_free_mem;
> >       }
> >
> > +     vdpa_pdev->vd_mdev.pci_dev = pdev;
> > +     err = pds_vdpa_probe_virtio(&vdpa_pdev->vd_mdev);
> > +     if (err) {
> > +             dev_err(dev, "Unable to probe for virtio configuration: %pe\n",
> > +                     ERR_PTR(err));
> > +             goto err_out_free_mem;
> > +     }
> > +
> >       pci_enable_pcie_error_reporting(pdev);
> >
> >       /* Use devres management */
> > @@ -74,6 +82,7 @@ pds_vdpa_pci_probe(struct pci_dev *pdev,
> >   err_out_pci_release_device:
> >       pci_disable_device(pdev);
> >   err_out_free_mem:
> > +     pds_vdpa_remove_virtio(&vdpa_pdev->vd_mdev);
> >       pci_disable_pcie_error_reporting(pdev);
> >       kfree(vdpa_pdev);
> >       return err;
> > @@ -88,6 +97,7 @@ pds_vdpa_pci_remove(struct pci_dev *pdev)
> >       pci_clear_master(pdev);
> >       pci_disable_pcie_error_reporting(pdev);
> >       pci_disable_device(pdev);
> > +     pds_vdpa_remove_virtio(&vdpa_pdev->vd_mdev);
> >       kfree(vdpa_pdev);
> >
> >       dev_info(&pdev->dev, "Removed\n");
> > diff --git a/drivers/vdpa/pds/pci_drv.h b/drivers/vdpa/pds/pci_drv.h
> > index 747809e0df9e..15f3b34fafa9 100644
> > --- a/drivers/vdpa/pds/pci_drv.h
> > +++ b/drivers/vdpa/pds/pci_drv.h
> > @@ -43,4 +43,6 @@ struct pds_vdpa_pci_device {
> >       struct virtio_pci_modern_device vd_mdev;
> >   };
> >
> > +int pds_vdpa_probe_virtio(struct virtio_pci_modern_device *mdev);
> > +void pds_vdpa_remove_virtio(struct virtio_pci_modern_device *mdev);
> >   #endif /* _PCI_DRV_H */
> > diff --git a/drivers/vdpa/pds/virtio_pci.c b/drivers/vdpa/pds/virtio_pci.c
> > new file mode 100644
> > index 000000000000..0f4ac9467199
> > --- /dev/null
> > +++ b/drivers/vdpa/pds/virtio_pci.c
> > @@ -0,0 +1,283 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +
> > +/*
> > + * adapted from drivers/virtio/virtio_pci_modern_dev.c, v6.0-rc1
> > + */
> > +
> > +#include <linux/virtio_pci_modern.h>
> > +#include <linux/module.h>
> > +#include <linux/pci.h>
> > +#include <linux/delay.h>
> > +
> > +#include "pci_drv.h"
> > +
> > +/*
> > + * pds_vdpa_map_capability - map a part of virtio pci capability
> > + * @mdev: the modern virtio-pci device
> > + * @off: offset of the capability
> > + * @minlen: minimal length of the capability
> > + * @align: align requirement
> > + * @start: start from the capability
> > + * @size: map size
> > + * @len: the length that is actually mapped
> > + * @pa: physical address of the capability
> > + *
> > + * Returns the io address for the part of the capability
> > + */
> > +static void __iomem *
> > +pds_vdpa_map_capability(struct virtio_pci_modern_device *mdev, int off,
> > +                      size_t minlen, u32 align, u32 start, u32 size,
> > +                      size_t *len, resource_size_t *pa)
> > +{
> > +     struct pci_dev *dev = mdev->pci_dev;
> > +     u8 bar;
> > +     u32 offset, length;
> > +     void __iomem *p;
> > +
> > +     pci_read_config_byte(dev, off + offsetof(struct virtio_pci_cap,
> > +                                              bar),
> > +                          &bar);
> > +     pci_read_config_dword(dev, off + offsetof(struct virtio_pci_cap, offset),
> > +                          &offset);
> > +     pci_read_config_dword(dev, off + offsetof(struct virtio_pci_cap, length),
> > +                           &length);
> > +
> > +     /* Check if the BAR may have changed since we requested the region. */
> > +     if (bar >= PCI_STD_NUM_BARS || !(mdev->modern_bars & (1 << bar))) {
> > +             dev_err(&dev->dev,
> > +                     "virtio_pci: bar unexpectedly changed to %u\n", bar);
> > +             return NULL;
> > +     }
> > +
> > +     if (length <= start) {
> > +             dev_err(&dev->dev,
> > +                     "virtio_pci: bad capability len %u (>%u expected)\n",
> > +                     length, start);
> > +             return NULL;
> > +     }
> > +
> > +     if (length - start < minlen) {
> > +             dev_err(&dev->dev,
> > +                     "virtio_pci: bad capability len %u (>=%zu expected)\n",
> > +                     length, minlen);
> > +             return NULL;
> > +     }
> > +
> > +     length -= start;
> > +
> > +     if (start + offset < offset) {
> > +             dev_err(&dev->dev,
> > +                     "virtio_pci: map wrap-around %u+%u\n",
> > +                     start, offset);
> > +             return NULL;
> > +     }
> > +
> > +     offset += start;
> > +
> > +     if (offset & (align - 1)) {
> > +             dev_err(&dev->dev,
> > +                     "virtio_pci: offset %u not aligned to %u\n",
> > +                     offset, align);
> > +             return NULL;
> > +     }
> > +
> > +     if (length > size)
> > +             length = size;
> > +
> > +     if (len)
> > +             *len = length;
> > +
> > +     if (minlen + offset < minlen ||
> > +         minlen + offset > pci_resource_len(dev, bar)) {
> > +             dev_err(&dev->dev,
> > +                     "virtio_pci: map virtio %zu@%u out of range on bar %i length %lu\n",
> > +                     minlen, offset,
> > +                     bar, (unsigned long)pci_resource_len(dev, bar));
> > +             return NULL;
> > +     }
> > +
> > +     p = pci_iomap_range(dev, bar, offset, length);
> > +     if (!p)
> > +             dev_err(&dev->dev,
> > +                     "virtio_pci: unable to map virtio %u@%u on bar %i\n",
> > +                     length, offset, bar);
> > +     else if (pa)
> > +             *pa = pci_resource_start(dev, bar) + offset;
> > +
> > +     return p;
> > +}
> > +
> > +/**
> > + * virtio_pci_find_capability - walk capabilities to find device info.
> > + * @dev: the pci device
> > + * @cfg_type: the VIRTIO_PCI_CAP_* value we seek
> > + * @ioresource_types: IORESOURCE_MEM and/or IORESOURCE_IO.
> > + * @bars: the bitmask of BARs
> > + *
> > + * Returns offset of the capability, or 0.
> > + */
> > +static inline int virtio_pci_find_capability(struct pci_dev *dev, u8 cfg_type,
> > +                                          u32 ioresource_types, int *bars)
> > +{
> > +     int pos;
> > +
> > +     for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
> > +          pos > 0;
> > +          pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
> > +             u8 type, bar;
> > +
> > +             pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
> > +                                                      cfg_type),
> > +                                  &type);
> > +             pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap,
> > +                                                      bar),
> > +                                  &bar);
> > +
> > +             /* Ignore structures with reserved BAR values */
> > +             if (bar >= PCI_STD_NUM_BARS)
> > +                     continue;
> > +
> > +             if (type == cfg_type) {
> > +                     if (pci_resource_len(dev, bar) &&
> > +                         pci_resource_flags(dev, bar) & ioresource_types) {
> > +                             *bars |= (1 << bar);
> > +                             return pos;
> > +                     }
> > +             }
> > +     }
> > +     return 0;
> > +}
> > +
> > +/*
> > + * pds_vdpa_probe_virtio: probe the modern virtio pci device, note that the
> > + * caller is required to enable PCI device before calling this function.
> > + * @mdev: the modern virtio-pci device
> > + *
> > + * Returns 0 on success, negative errno on failure
> > + */
> > +int pds_vdpa_probe_virtio(struct virtio_pci_modern_device *mdev)
> > +{
> > +     struct pci_dev *pci_dev = mdev->pci_dev;
> > +     int err, common, isr, notify, device;
> > +     u32 notify_length;
> > +     u32 notify_offset;
> > +
> > +     /* check for a common config: if not, use legacy mode (bar 0). */
> > +     common = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_COMMON_CFG,
> > +                                         IORESOURCE_IO | IORESOURCE_MEM,
> > +                                         &mdev->modern_bars);
> > +     if (!common) {
> > +             dev_info(&pci_dev->dev,
> > +                      "virtio_pci: missing common config\n");
> > +             return -ENODEV;
> > +     }
> > +
> > +     /* If common is there, these should be too... */
> > +     isr = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_ISR_CFG,
> > +                                      IORESOURCE_IO | IORESOURCE_MEM,
> > +                                      &mdev->modern_bars);
> > +     notify = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_NOTIFY_CFG,
> > +                                         IORESOURCE_IO | IORESOURCE_MEM,
> > +                                         &mdev->modern_bars);
> > +     if (!isr || !notify) {
> > +             dev_err(&pci_dev->dev,
> > +                     "virtio_pci: missing capabilities %i/%i/%i\n",
> > +                     common, isr, notify);
> > +             return -EINVAL;
> > +     }
> > +
> > +     /* Device capability is only mandatory for devices that have
> > +      * device-specific configuration.
> > +      */
> > +     device = virtio_pci_find_capability(pci_dev, VIRTIO_PCI_CAP_DEVICE_CFG,
> > +                                         IORESOURCE_IO | IORESOURCE_MEM,
> > +                                         &mdev->modern_bars);
> > +
> > +     err = pci_request_selected_regions(pci_dev, mdev->modern_bars,
> > +                                        "virtio-pci-modern");
> > +     if (err)
> > +             return err;
> > +
> > +     err = -EINVAL;
> > +     mdev->common = pds_vdpa_map_capability(mdev, common,
> > +                                   sizeof(struct virtio_pci_common_cfg), 4,
> > +                                   0, sizeof(struct virtio_pci_common_cfg),
> > +                                   NULL, NULL);
> > +     if (!mdev->common)
> > +             goto err_map_common;
> > +     mdev->isr = pds_vdpa_map_capability(mdev, isr, sizeof(u8), 1,
> > +                                          0, 1,
> > +                                          NULL, NULL);
> > +     if (!mdev->isr)
> > +             goto err_map_isr;
> > +
> > +     /* Read notify_off_multiplier from config space. */
> > +     pci_read_config_dword(pci_dev,
> > +                           notify + offsetof(struct virtio_pci_notify_cap,
> > +                                             notify_off_multiplier),
> > +                           &mdev->notify_offset_multiplier);
> > +     /* Read notify length and offset from config space. */
> > +     pci_read_config_dword(pci_dev,
> > +                           notify + offsetof(struct virtio_pci_notify_cap,
> > +                                             cap.length),
> > +                           &notify_length);
> > +
> > +     pci_read_config_dword(pci_dev,
> > +                           notify + offsetof(struct virtio_pci_notify_cap,
> > +                                             cap.offset),
> > +                           &notify_offset);
> > +
> > +     /* We don't know how many VQs we'll map, ahead of time.
> > +      * If notify length is small, map it all now.
> > +      * Otherwise, map each VQ individually later.
> > +      */
> > +     if ((u64)notify_length + (notify_offset % PAGE_SIZE) <= PAGE_SIZE) {
> > +             mdev->notify_base = pds_vdpa_map_capability(mdev, notify,
> > +                                                          2, 2,
> > +                                                          0, notify_length,
> > +                                                          &mdev->notify_len,
> > +                                                          &mdev->notify_pa);
> > +             if (!mdev->notify_base)
> > +                     goto err_map_notify;
> > +     } else {
> > +             mdev->notify_map_cap = notify;
> > +     }
> > +
> > +     /* Again, we don't know how much we should map, but PAGE_SIZE
> > +      * is more than enough for all existing devices.
> > +      */
> > +     if (device) {
> > +             mdev->device = pds_vdpa_map_capability(mdev, device, 0, 4,
> > +                                                     0, PAGE_SIZE,
> > +                                                     &mdev->device_len,
> > +                                                     NULL);
> > +             if (!mdev->device)
> > +                     goto err_map_device;
> > +     }
> > +
> > +     return 0;
> > +
> > +err_map_device:
> > +     if (mdev->notify_base)
> > +             pci_iounmap(pci_dev, mdev->notify_base);
> > +err_map_notify:
> > +     pci_iounmap(pci_dev, mdev->isr);
> > +err_map_isr:
> > +     pci_iounmap(pci_dev, mdev->common);
> > +err_map_common:
> > +     pci_release_selected_regions(pci_dev, mdev->modern_bars);
> > +     return err;
> > +}
> > +
> > +void pds_vdpa_remove_virtio(struct virtio_pci_modern_device *mdev)
> > +{
> > +     struct pci_dev *pci_dev = mdev->pci_dev;
> > +
> > +     if (mdev->device)
> > +             pci_iounmap(pci_dev, mdev->device);
> > +     if (mdev->notify_base)
> > +             pci_iounmap(pci_dev, mdev->notify_base);
> > +     pci_iounmap(pci_dev, mdev->isr);
> > +     pci_iounmap(pci_dev, mdev->common);
> > +     pci_release_selected_regions(pci_dev, mdev->modern_bars);
> > +}



* Re: [RFC PATCH net-next 19/19] pds_vdpa: add Kconfig entry and pds_vdpa.rst
  2022-11-22  6:35   ` Jason Wang
@ 2022-11-22 22:33     ` Shannon Nelson
  2022-11-30  0:13     ` Shannon Nelson
  1 sibling, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-22 22:33 UTC (permalink / raw)
  To: Jason Wang, Shannon Nelson
  Cc: netdev, davem, kuba, mst, virtualization, drivers

On 11/21/22 10:35 PM, Jason Wang wrote:
> 
> On Sat, Nov 19, 2022 at 6:57 AM Shannon Nelson <snelson@pensando.io> wrote:
>>
>> Signed-off-by: Shannon Nelson <snelson@pensando.io>
>> ---
>>   .../ethernet/pensando/pds_vdpa.rst            | 85 +++++++++++++++++++
>>   MAINTAINERS                                   |  1 +
>>   drivers/vdpa/Kconfig                          |  7 ++
>>   3 files changed, 93 insertions(+)
>>   create mode 100644 Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst
>>
>> diff --git a/Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst b/Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst
>> new file mode 100644
>> index 000000000000..c517f337d212
>> --- /dev/null
>> +++ b/Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst
>> @@ -0,0 +1,85 @@
>> +.. SPDX-License-Identifier: GPL-2.0+
>> +.. note: can be edited and viewed with /usr/bin/formiko-vim
>> +
>> +==========================================================
>> +PCI vDPA driver for the Pensando(R) DSC adapter family
>> +==========================================================
>> +
>> +Pensando vDPA VF Device Driver
>> +Copyright(c) 2022 Pensando Systems, Inc
>> +
>> +Overview
>> +========
>> +
>> +The ``pds_vdpa`` driver is a PCI and auxiliary bus driver and supplies
>> +a vDPA device for use by the virtio network stack.  It is used with
>> +the Pensando Virtual Function devices that offer vDPA and virtio queue
>> +services.  It depends on the ``pds_core`` driver and hardware for the PF
>> +and for device configuration services.
>> +
>> +Using the device
>> +================
>> +
>> +The ``pds_vdpa`` device is enabled via multiple configuration steps and
>> +depends on the ``pds_core`` driver to create and enable SR-IOV Virtual
>> +Function devices.
>> +
>> +Shown below are the steps to bind the driver to a VF and also to the
>> +associated auxiliary device created by the ``pds_core`` driver. This
>> +example assumes the pds_core and pds_vdpa modules are already
>> +loaded.
>> +
>> +.. code-block:: bash
>> +
>> +  #!/bin/bash
>> +
>> +  modprobe pds_core
>> +  modprobe pds_vdpa
>> +
>> +  PF_BDF=`grep "vDPA.*1" /sys/kernel/debug/pds_core/*/viftypes | head -1 | awk -F / '{print $6}'`
>> +
>> +  # Enable vDPA VF auxiliary device(s) in the PF
>> +  devlink dev param set pci/$PF_BDF name enable_vnet value true cmode runtime
>> +
>> +  # Create a VF for vDPA use
>> +  echo 1 > /sys/bus/pci/drivers/pds_core/$PF_BDF/sriov_numvfs
>> +
>> +  # Find the vDPA services/devices available
>> +  PDS_VDPA_MGMT=`vdpa mgmtdev show | grep vDPA | head -1 | cut -d: -f1`
>> +
>> +  # Create a vDPA device for use in virtio network configurations
>> +  vdpa dev add name vdpa1 mgmtdev $PDS_VDPA_MGMT mac 00:11:22:33:44:55
>> +
>> +  # Set up an ethernet interface on the vdpa device
>> +  modprobe virtio_vdpa
>> +
>> +
>> +
>> +Enabling the driver
>> +===================
>> +
>> +The driver is enabled via the standard kernel configuration system,
>> +using the make command::
>> +
>> +  make oldconfig/menuconfig/etc.
>> +
>> +The driver is located in the menu structure at:
>> +
>> +  -> Device Drivers
>> +    -> Network device support (NETDEVICES [=y])
>> +      -> Ethernet driver support
>> +        -> Pensando devices
>> +          -> Pensando Ethernet PDS_VDPA Support
>> +
>> +Support
>> +=======
>> +
>> +For general Linux networking support, please use the netdev mailing
>> +list, which is monitored by Pensando personnel::
>> +
>> +  netdev@vger.kernel.org
>> +
>> +For more specific support needs, please use the Pensando driver support
>> +email::
>> +
>> +  drivers@pensando.io
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index a4f989fa8192..a4d96e854757 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -16152,6 +16152,7 @@ L:      netdev@vger.kernel.org
>>   S:     Supported
>>   F:     Documentation/networking/device_drivers/ethernet/pensando/
>>   F:     drivers/net/ethernet/pensando/
>> +F:     drivers/vdpa/pds/
>>   F:     include/linux/pds/
>>
>>   PER-CPU MEMORY ALLOCATOR
>> diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
>> index 50f45d037611..1c44df18f3da 100644
>> --- a/drivers/vdpa/Kconfig
>> +++ b/drivers/vdpa/Kconfig
>> @@ -86,4 +86,11 @@ config ALIBABA_ENI_VDPA
>>            VDPA driver for Alibaba ENI (Elastic Network Interface) which is built upon
>>            virtio 0.9.5 specification.
>>
>> +config PDS_VDPA
>> +       tristate "vDPA driver for Pensando DSC devices"
>> +       select VHOST_RING
> 
> Any reason it needs to select on vringh?
> 
> Thanks
> 

Hi Jason,

Thanks for your comments, I appreciate the time.  I'll be able to 
respond to them more fully next week when I'm back from the holidays.

sln



* Re: [RFC PATCH net-next 06/19] pds_core: add FW update feature to devlink
  2022-11-18 22:56 ` [RFC PATCH net-next 06/19] pds_core: add FW update feature to devlink Shannon Nelson
@ 2022-11-28 18:27   ` Jakub Kicinski
  2022-11-28 22:25     ` Shannon Nelson
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Kicinski @ 2022-11-28 18:27 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev, davem, mst, jasowang, virtualization, drivers

On Fri, 18 Nov 2022 14:56:43 -0800 Shannon Nelson wrote:
> Add in the support for doing firmware updates, and for selecting
> the next firmware image to boot on, and tie them into the
> devlink flash and parameter handling.  The FW flash is the same
> as in the ionic driver.  However, this device has the ability
> to report what is in the firmware slots on the device and
> allows you to select the slot to use on the next device boot.

This is hardly vendor specific. Intel does a similar thing, IIUC.
Please work on a common interface.


* Re: [RFC PATCH net-next 08/19] pds_core: initial VF configuration
  2022-11-18 22:56 ` [RFC PATCH net-next 08/19] pds_core: initial VF configuration Shannon Nelson
@ 2022-11-28 18:28   ` Jakub Kicinski
  2022-11-28 22:25     ` Shannon Nelson
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Kicinski @ 2022-11-28 18:28 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev, davem, mst, jasowang, virtualization, drivers

On Fri, 18 Nov 2022 14:56:45 -0800 Shannon Nelson wrote:
> +	.ndo_set_vf_vlan	= pdsc_set_vf_vlan,
> +	.ndo_set_vf_mac		= pdsc_set_vf_mac,
> +	.ndo_set_vf_trust	= pdsc_set_vf_trust,
> +	.ndo_set_vf_rate	= pdsc_set_vf_rate,
> +	.ndo_set_vf_spoofchk	= pdsc_set_vf_spoofchk,
> +	.ndo_set_vf_link_state	= pdsc_set_vf_link_state,
> +	.ndo_get_vf_config	= pdsc_get_vf_config,
> +	.ndo_get_vf_stats       = pdsc_get_vf_stats,

These are legacy, you're adding a fancy SmartNIC (or whatever your
marketing decided to call it) driver. Please don't use these at all.


* Re: [RFC PATCH net-next 10/19] pds_core: devlink params for enabling VIF support
  2022-11-18 22:56 ` [RFC PATCH net-next 10/19] pds_core: devlink params for enabling VIF support Shannon Nelson
@ 2022-11-28 18:29   ` Jakub Kicinski
  2022-11-28 22:26     ` Shannon Nelson
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Kicinski @ 2022-11-28 18:29 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: netdev, davem, mst, jasowang, virtualization, drivers

On Fri, 18 Nov 2022 14:56:47 -0800 Shannon Nelson wrote:
> +	DEVLINK_PARAM_DRIVER(PDSC_DEVLINK_PARAM_ID_LM,
> +			     "enable_lm",
> +			     DEVLINK_PARAM_TYPE_BOOL,
> +			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
> +			     pdsc_dl_enable_get,
> +			     pdsc_dl_enable_set,
> +			     pdsc_dl_enable_validate),

Terrible name, not vendor specific.


* Re: [RFC PATCH net-next 06/19] pds_core: add FW update feature to devlink
  2022-11-28 18:27   ` Jakub Kicinski
@ 2022-11-28 22:25     ` Shannon Nelson
  2022-11-28 23:33       ` Jakub Kicinski
  0 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-28 22:25 UTC (permalink / raw)
  To: Jakub Kicinski, Shannon Nelson
  Cc: netdev, davem, mst, jasowang, virtualization, drivers

On 11/28/22 10:27 AM, Jakub Kicinski wrote:
> On Fri, 18 Nov 2022 14:56:43 -0800 Shannon Nelson wrote:
>> Add in the support for doing firmware updates, and for selecting
>> the next firmware image to boot on, and tie them into the
>> devlink flash and parameter handling.  The FW flash is the same
>> as in the ionic driver.  However, this device has the ability
>> to report what is in the firmware slots on the device and
>> allows you to select the slot to use on the next device boot.
> 
> This is hardly vendor specific. Intel does a similar thing, IIUC.
> Please work on a common interface.

I don't think Intel selects which FW image to boot, but it looks like 
mlxsw and nfp use the PARAM_GENERIC_FW_LOAD_POLICY to select between 
DRIVER, FLASH, or DISK.  Shall I add a couple of generic SLOT_x items to 
the enum devlink_param_fw_load_policy_value and use this API?  For example:

	DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_0,
	DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_1,
	DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_2,
	DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_3,

I could then modify the devlink dev info printed to refer to fw.slot_0, 
fw.slot_1, and fw.slot_2 instead of our vendor specific names.

Cheers,
sln


* Re: [RFC PATCH net-next 08/19] pds_core: initial VF configuration
  2022-11-28 18:28   ` Jakub Kicinski
@ 2022-11-28 22:25     ` Shannon Nelson
  2022-11-28 23:37       ` Jakub Kicinski
  0 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-28 22:25 UTC (permalink / raw)
  To: Jakub Kicinski, Shannon Nelson
  Cc: netdev, davem, mst, jasowang, virtualization, drivers

On 11/28/22 10:28 AM, Jakub Kicinski wrote:
> On Fri, 18 Nov 2022 14:56:45 -0800 Shannon Nelson wrote:
>> +     .ndo_set_vf_vlan        = pdsc_set_vf_vlan,
>> +     .ndo_set_vf_mac         = pdsc_set_vf_mac,
>> +     .ndo_set_vf_trust       = pdsc_set_vf_trust,
>> +     .ndo_set_vf_rate        = pdsc_set_vf_rate,
>> +     .ndo_set_vf_spoofchk    = pdsc_set_vf_spoofchk,
>> +     .ndo_set_vf_link_state  = pdsc_set_vf_link_state,
>> +     .ndo_get_vf_config      = pdsc_get_vf_config,
>> +     .ndo_get_vf_stats       = pdsc_get_vf_stats,
> 
> These are legacy, you're adding a fancy SmartNIC (or whatever your
> marketing decided to call it) driver. Please don't use these at all.

Since these are the existing APIs that I am aware of for doing this kind 
of VF configuration, it seemed to be the right choice.  I'm not aware of 
any other obvious solutions.  Do you have an alternate suggestion?

Cheers,
sln


* Re: [RFC PATCH net-next 10/19] pds_core: devlink params for enabling VIF support
  2022-11-28 18:29   ` Jakub Kicinski
@ 2022-11-28 22:26     ` Shannon Nelson
  2022-11-28 22:57       ` Andrew Lunn
  0 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-28 22:26 UTC (permalink / raw)
  To: Jakub Kicinski, Shannon Nelson
  Cc: netdev, davem, mst, jasowang, virtualization, drivers

On 11/28/22 10:29 AM, Jakub Kicinski wrote:
> On Fri, 18 Nov 2022 14:56:47 -0800 Shannon Nelson wrote:
>> +     DEVLINK_PARAM_DRIVER(PDSC_DEVLINK_PARAM_ID_LM,
>> +                          "enable_lm",
>> +                          DEVLINK_PARAM_TYPE_BOOL,
>> +                          BIT(DEVLINK_PARAM_CMODE_RUNTIME),
>> +                          pdsc_dl_enable_get,
>> +                          pdsc_dl_enable_set,
>> +                          pdsc_dl_enable_validate),
> 
> Terrible name, not vendor specific.

... but useful for starting a conversation.

How about we add
	DEVLINK_PARAM_GENERIC_ID_ENABLE_LM,

to live along with the existing
	DEVLINK_PARAM_GENERIC_ID_ENABLE_RDMA,
	DEVLINK_PARAM_GENERIC_ID_ENABLE_VNET,

By the way, thanks for your time looking through these patches.

Cheers,
sln


* Re: [RFC PATCH net-next 10/19] pds_core: devlink params for enabling VIF support
  2022-11-28 22:26     ` Shannon Nelson
@ 2022-11-28 22:57       ` Andrew Lunn
  2022-11-28 23:07         ` Shannon Nelson
  0 siblings, 1 reply; 61+ messages in thread
From: Andrew Lunn @ 2022-11-28 22:57 UTC (permalink / raw)
  To: Shannon Nelson
  Cc: Jakub Kicinski, Shannon Nelson, netdev, davem, mst, jasowang,
	virtualization, drivers

On Mon, Nov 28, 2022 at 02:26:26PM -0800, Shannon Nelson wrote:
> On 11/28/22 10:29 AM, Jakub Kicinski wrote:
> > On Fri, 18 Nov 2022 14:56:47 -0800 Shannon Nelson wrote:
> > > +     DEVLINK_PARAM_DRIVER(PDSC_DEVLINK_PARAM_ID_LM,
> > > +                          "enable_lm",
> > > +                          DEVLINK_PARAM_TYPE_BOOL,
> > > +                          BIT(DEVLINK_PARAM_CMODE_RUNTIME),
> > > +                          pdsc_dl_enable_get,
> > > +                          pdsc_dl_enable_set,
> > > +                          pdsc_dl_enable_validate),
> > 
> > Terrible name, not vendor specific.
> 
> ... but useful for starting a conversation.
> 
> How about we add
> 	DEVLINK_PARAM_GENERIC_ID_ENABLE_LM,

I know we are running short of short acronyms and we have to recycle
them, rfc5513 and all, so could you actually use
DEVLINK_PARAM_GENERIC_ID_ENABLE_LIST_MANAGER making it clear your
Smart NIC is running majordomo and will soon replace vger.

      Andrew


* Re: [RFC PATCH net-next 10/19] pds_core: devlink params for enabling VIF support
  2022-11-28 22:57       ` Andrew Lunn
@ 2022-11-28 23:07         ` Shannon Nelson
  2022-11-28 23:29           ` Andrew Lunn
  0 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-28 23:07 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, Shannon Nelson, netdev, davem, mst, jasowang,
	virtualization, drivers

On 11/28/22 2:57 PM, Andrew Lunn wrote:
> On Mon, Nov 28, 2022 at 02:26:26PM -0800, Shannon Nelson wrote:
>> On 11/28/22 10:29 AM, Jakub Kicinski wrote:
>>> On Fri, 18 Nov 2022 14:56:47 -0800 Shannon Nelson wrote:
>>>> +     DEVLINK_PARAM_DRIVER(PDSC_DEVLINK_PARAM_ID_LM,
>>>> +                          "enable_lm",
>>>> +                          DEVLINK_PARAM_TYPE_BOOL,
>>>> +                          BIT(DEVLINK_PARAM_CMODE_RUNTIME),
>>>> +                          pdsc_dl_enable_get,
>>>> +                          pdsc_dl_enable_set,
>>>> +                          pdsc_dl_enable_validate),
>>>
>>> Terrible name, not vendor specific.
>>
>> ... but useful for starting a conversation.
>>
>> How about we add
>>        DEVLINK_PARAM_GENERIC_ID_ENABLE_LM,
> 
> I know we are running short of short acronyms and we have to recycle
> them, rfc5513 and all, so could you actually use
> DEVLINK_PARAM_GENERIC_ID_ENABLE_LIST_MANAGER making it clear your
> Smart NIC is running majordomo and will soon replace vger.
> 
>        Andrew

Oh, hush, someone might hear you speak of our plan to take over the 
email world!  You never know who might be listening...

On the other hand, "LM" could be expanded to "LIVE_MIGRATION", but that 
is soooo many letters to type... perhaps I'll need to set up some 
additional vim macros.

How about:
	DEVLINK_PARAM_GENERIC_ID_ENABLE_LIVE_MIGRATION

Cheers,
sln

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 10/19] pds_core: devlink params for enabling VIF support
  2022-11-28 23:07         ` Shannon Nelson
@ 2022-11-28 23:29           ` Andrew Lunn
  2022-11-28 23:39             ` Jakub Kicinski
  0 siblings, 1 reply; 61+ messages in thread
From: Andrew Lunn @ 2022-11-28 23:29 UTC (permalink / raw)
  To: Shannon Nelson
  Cc: Jakub Kicinski, Shannon Nelson, netdev, davem, mst, jasowang,
	virtualization, drivers

> > I know we are running short of short acronyms and we have to recycle
> > them, rfc5513 and all, so could you actually use
> > DEVLINK_PARAM_GENERIC_ID_ENABLE_LIST_MANAGER making it clear your
> > Smart NIC is running majordomo and will soon replace vger.
> > 
> >        Andrew
> 
> Oh, hush, someone might hear you speak of our plan to take over the email
> world!

It seems like something a Smart NIC would be ideal to do. Here is an
email body and 10,000 email addresses I recently acquired; go send
spam to them at line rate.

> How about:
> 	DEVLINK_PARAM_GENERIC_ID_ENABLE_LIVE_MIGRATION

Much better.

     Andrew

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 06/19] pds_core: add FW update feature to devlink
  2022-11-28 22:25     ` Shannon Nelson
@ 2022-11-28 23:33       ` Jakub Kicinski
  2022-11-28 23:45         ` Shannon Nelson
  2022-11-29  0:13         ` Keller, Jacob E
  0 siblings, 2 replies; 61+ messages in thread
From: Jakub Kicinski @ 2022-11-28 23:33 UTC (permalink / raw)
  To: Shannon Nelson
  Cc: Shannon Nelson, netdev, davem, mst, jasowang, virtualization,
	drivers, Jacob Keller

On Mon, 28 Nov 2022 14:25:46 -0800 Shannon Nelson wrote:
> I don't think Intel selects which FW image to boot, but it looks like 
> mlxsw and nfp use the PARAM_GENERIC_FW_LOAD_POLICY to select between 
> DRIVER, FLASH, or DISK.  Shall I add a couple of generic SLOT_x items to 
> the enum devlink_param_fw_load_policy_value and use this API?  For example:
> 
> 	DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_0,
> 	DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_1,
> 	DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_2,
> 	DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_3,

Not the worst idea, although I presume normal FW flashing should switch
between slots to activate the new image by default? Which means the
action of fw flashing would alter the policy set by the user. A little
awkward from an API purist standpoint.

I'd just expose the active "bank" via netlink directly.

> I could then modify the devlink dev info printed to refer to fw.slot_0, 
> fw.slot_1, and fw.slot_2 instead of our vendor specific names.

Jake, didn't you have a similar capability in ice?

Knowing my memory I may have acquiesced to something in another driver
already. That said - I think it's cleaner if we just list the stored
versions per bank, no?
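The awkward interaction described above can be sketched as a toy model: flashing a new image implicitly changes which slot boots next, so a slot-valued load policy the user set by hand gets silently rewritten. This is plain illustrative C, not the kernel's devlink code; the enum names extend the thread's suggestion and are hypothetical.

```c
#include <assert.h>

/* Hypothetical extension of the generic fw_load_policy values with
 * per-slot entries, as floated in this thread.  Values are illustrative. */
enum fw_load_policy_value {
	FW_LOAD_POLICY_DRIVER,
	FW_LOAD_POLICY_FLASH,
	FW_LOAD_POLICY_DISK,
	FW_LOAD_POLICY_SLOT_0,
	FW_LOAD_POLICY_SLOT_1,
	FW_LOAD_POLICY_SLOT_2,
	FW_LOAD_POLICY_SLOT_3,
};

/* Flashing writes the image to a slot and, by default, activates it on
 * next boot, so it overwrites any slot policy the user previously set
 * by hand (the "API purist" concern raised above). */
int flash_image(enum fw_load_policy_value *policy, int target_slot)
{
	*policy = FW_LOAD_POLICY_SLOT_0 + target_slot;
	return target_slot;
}
```

In this model the user's earlier choice of SLOT_1 disappears the moment a flash targets SLOT_2, which is why exposing the active bank directly may be the cleaner interface.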

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 08/19] pds_core: initial VF configuration
  2022-11-28 22:25     ` Shannon Nelson
@ 2022-11-28 23:37       ` Jakub Kicinski
  2022-11-29  0:37         ` Shannon Nelson
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Kicinski @ 2022-11-28 23:37 UTC (permalink / raw)
  To: Shannon Nelson
  Cc: Shannon Nelson, netdev, davem, mst, jasowang, virtualization, drivers

On Mon, 28 Nov 2022 14:25:56 -0800 Shannon Nelson wrote:
> On 11/28/22 10:28 AM, Jakub Kicinski wrote:
> > On Fri, 18 Nov 2022 14:56:45 -0800 Shannon Nelson wrote:  
> >> +     .ndo_set_vf_vlan        = pdsc_set_vf_vlan,
> >> +     .ndo_set_vf_mac         = pdsc_set_vf_mac,
> >> +     .ndo_set_vf_trust       = pdsc_set_vf_trust,
> >> +     .ndo_set_vf_rate        = pdsc_set_vf_rate,
> >> +     .ndo_set_vf_spoofchk    = pdsc_set_vf_spoofchk,
> >> +     .ndo_set_vf_link_state  = pdsc_set_vf_link_state,
> >> +     .ndo_get_vf_config      = pdsc_get_vf_config,
> >> +     .ndo_get_vf_stats       = pdsc_get_vf_stats,  
> > 
> > These are legacy, you're adding a fancy SmartNIC (or whatever your
> > marketing decided to call it) driver. Please don't use these at all.  
> 
> Since these are the existing APIs that I am aware of for doing this kind 
> of VF configuration, it seemed to be the right choice.  I'm not aware of 
> any other obvious solutions.  Do you have an alternate suggestion?

If this is a "SmartNIC" there should be alternative solution based on
representors for each of those callbacks, and the device should support
forwarding using proper netdev constructs like bridge, routing, or tc.

This has been our high level guidance for a few years now. It's quite
hard to move the ball forward since all major vendors have a single
driver for multiple generations of HW :(

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 10/19] pds_core: devlink params for enabling VIF support
  2022-11-28 23:29           ` Andrew Lunn
@ 2022-11-28 23:39             ` Jakub Kicinski
  2022-11-29  9:00               ` Leon Romanovsky
  2022-11-29  9:13               ` Jiri Pirko
  0 siblings, 2 replies; 61+ messages in thread
From: Jakub Kicinski @ 2022-11-28 23:39 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Shannon Nelson, Shannon Nelson, netdev, davem, mst, jasowang,
	virtualization, drivers

On Tue, 29 Nov 2022 00:29:42 +0100 Andrew Lunn wrote:
> > How about:
> > 	DEVLINK_PARAM_GENERIC_ID_ENABLE_LIVE_MIGRATION  
> 
> Much better.

+1, although I care much less about the define name which is stupidly
long anyway and more about the actual value that the user will see

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 06/19] pds_core: add FW update feature to devlink
  2022-11-28 23:33       ` Jakub Kicinski
@ 2022-11-28 23:45         ` Shannon Nelson
  2022-11-29  0:18           ` Keller, Jacob E
  2022-11-29  0:13         ` Keller, Jacob E
  1 sibling, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-28 23:45 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Shannon Nelson, netdev, davem, mst, jasowang, virtualization,
	drivers, Jacob Keller

On 11/28/22 3:33 PM, Jakub Kicinski wrote:
> On Mon, 28 Nov 2022 14:25:46 -0800 Shannon Nelson wrote:
>> I don't think Intel selects which FW image to boot, but it looks like
>> mlxsw and nfp use the PARAM_GENERIC_FW_LOAD_POLICY to select between
>> DRIVER, FLASH, or DISK.  Shall I add a couple of generic SLOT_x items to
>> the enum devlink_param_fw_load_policy_value and use this API?  For example:
>>
>>        DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_0,
>>        DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_1,
>>        DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_2,
>>        DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_3,
> 
> Not the worst idea, although I presume normal FW flashing should switch
> between slots to activate the new image by default? Which means the
> action of fw flashing would alter the policy set by the user. A little
> awkward from an API purist standpoint.

Yes, the action of flashing will set the new bank/slot to use on the 
next boot.  However, we have the ability to select from multiple valid 
images and we want to pass this flexibility to the user rather than 
force them to go through a whole flash sequence just to get to the other 
bank.

> 
> I'd just expose the active "bank" via netlink directly.
> 
>> I could then modify the devlink dev info printed to refer to fw.slot_0,
>> fw.slot_1, and fw.slot_2 instead of our vendor specific names.
> 
> Jake, didn't you have a similar capability in ice?
> 
> Knowing my memory I may have acquiesced to something in another driver
> already. That said - I think it's cleaner if we just list the stored
> versions per bank, no?

We are listing the stored images in the devlink dev info output, just 
want to let the user choose which of those valid images to use next.

Cheers,
sln

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [RFC PATCH net-next 06/19] pds_core: add FW update feature to devlink
  2022-11-28 23:33       ` Jakub Kicinski
  2022-11-28 23:45         ` Shannon Nelson
@ 2022-11-29  0:13         ` Keller, Jacob E
  1 sibling, 0 replies; 61+ messages in thread
From: Keller, Jacob E @ 2022-11-29  0:13 UTC (permalink / raw)
  To: Jakub Kicinski, Shannon Nelson
  Cc: Shannon Nelson, netdev, davem, mst, jasowang, virtualization, drivers



> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Monday, November 28, 2022 3:33 PM
> To: Shannon Nelson <shnelson@amd.com>
> Cc: Shannon Nelson <snelson@pensando.io>; netdev@vger.kernel.org;
> davem@davemloft.net; mst@redhat.com; jasowang@redhat.com;
> virtualization@lists.linux-foundation.org; drivers@pensando.io; Keller, Jacob E
> <jacob.e.keller@intel.com>
> Subject: Re: [RFC PATCH net-next 06/19] pds_core: add FW update feature to
> devlink
> 
> On Mon, 28 Nov 2022 14:25:46 -0800 Shannon Nelson wrote:
> > I don't think Intel selects which FW image to boot, but it looks like
> > mlxsw and nfp use the PARAM_GENERIC_FW_LOAD_POLICY to select between
> > DRIVER, FLASH, or DISK.  Shall I add a couple of generic SLOT_x items to
> > the enum devlink_param_fw_load_policy_value and use this API?  For example:
> >
> > 	DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_0,
> > 	DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_1,
> > 	DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_2,
> > 	DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_3,
> 
> Not the worst idea, although I presume normal FW flashing should switch
> between slots to activate the new image by default? Which means the
> action of fw flashing would alter the policy set by the user. A little
> awkward from an API purist standpoint.
> 
> I'd just expose the active "bank" via netlink directly.
> 
> > I could then modify the devlink dev info printed to refer to fw.slot_0,
> > fw.slot_1, and fw.slot_2 instead of our vendor specific names.
> 
> Jake, didn't you have a similar capability in ice?
> 

We have two banks of flash, the active bank, and an inactive bank used for updates. We can determine the active bank from the Shadow RAM contents which are generated as the EMP firmware boots up.

> Knowing my memory I may have acquiesced to something in another driver
> already. That said - I think it's cleaner if we just list the stored
> versions per bank, no?

I think it would make sense to store them per bank and make the bank number an index attribute, rather than something separate like DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_<X>, where each <X> becomes a separate parameter.

Currently devlink info reports "stored" and "active", which aligns with our current use of the active vs inactive flash bank. We could be explicit and indicate which bank it is, though it's a bit tricky since most of the firmware interface deals with it in terms of "active" and "inactive" rather than the absolute position of "bank 0 or bank 1".

Especially if another device has more than 2 banks, I think it's a good extension to devlink info, and we could probably get away with something like a new info attribute that specifies the bank index, and an attribute that indicates whether it's active or not.

Thanks,
Jake
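The per-bank reporting idea sketched above could be modeled roughly as follows. This is an illustrative userspace sketch, not the actual devlink info API; the struct fields and names are assumptions for the sake of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Each stored firmware image is reported with its absolute bank index
 * plus a flag saying whether it is the bank the device booted from. */
struct fw_bank_info {
	int index;		/* absolute position: bank 0, bank 1, ... */
	bool active;		/* true for the bank currently running */
	const char *version;	/* stored version string for that bank */
};

/* Walk the reported banks and return the running version, mirroring
 * how "active" would be derived from per-bank attributes. */
const char *active_fw_version(const struct fw_bank_info *banks, int nbanks)
{
	for (int i = 0; i < nbanks; i++)
		if (banks[i].active)
			return banks[i].version;
	return NULL;		/* no active bank reported */
}
```

This keeps "active"/"inactive" as a derived property of each bank rather than a separate reporting channel, which generalizes naturally to devices with more than two banks.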

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [RFC PATCH net-next 06/19] pds_core: add FW update feature to devlink
  2022-11-28 23:45         ` Shannon Nelson
@ 2022-11-29  0:18           ` Keller, Jacob E
  0 siblings, 0 replies; 61+ messages in thread
From: Keller, Jacob E @ 2022-11-29  0:18 UTC (permalink / raw)
  To: Shannon Nelson, Jakub Kicinski
  Cc: Shannon Nelson, netdev, davem, mst, jasowang, virtualization, drivers



> -----Original Message-----
> From: Shannon Nelson <shnelson@amd.com>
> Sent: Monday, November 28, 2022 3:46 PM
> To: Jakub Kicinski <kuba@kernel.org>
> Cc: Shannon Nelson <snelson@pensando.io>; netdev@vger.kernel.org;
> davem@davemloft.net; mst@redhat.com; jasowang@redhat.com;
> virtualization@lists.linux-foundation.org; drivers@pensando.io; Keller, Jacob E
> <jacob.e.keller@intel.com>
> Subject: Re: [RFC PATCH net-next 06/19] pds_core: add FW update feature to
> devlink
> 
> On 11/28/22 3:33 PM, Jakub Kicinski wrote:
> > On Mon, 28 Nov 2022 14:25:46 -0800 Shannon Nelson wrote:
> >> I don't think Intel selects which FW image to boot, but it looks like
> >> mlxsw and nfp use the PARAM_GENERIC_FW_LOAD_POLICY to select between
> >> DRIVER, FLASH, or DISK.  Shall I add a couple of generic SLOT_x items to
> >> the enum devlink_param_fw_load_policy_value and use this API?  For
> example:
> >>
> >>        DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_0,
> >>        DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_1,
> >>        DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_2,
> >>        DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_SLOT_3,
> >
> > Not the worst idea, although I presume normal FW flashing should switch
> > between slots to activate the new image by default? Which means the
> > action of fw flashing would alter the policy set by the user. A little
> > awkward from an API purist standpoint.

This could potentially be handled by having DEVLINK_PARAM_FW_LOAD_POLICY_FLASH be the automatic "select best version" mode, and if a user has set a manual value, then don't allow flashing until a reboot or until the value is set back to POLICY_FLASH?

> 
> Yes, the action of flashing will set the new bank/slot to use on the
> next boot.  However, we have the ability to select from multiple valid
> images and we want to pass this flexibility to the user rather than
> force them to go through a whole flash sequence just to get to the other
> bank.
> 
> >
> > I'd just expose the active "bank" via netlink directly.
> >
> >> I could then modify the devlink dev info printed to refer to fw.slot_0,
> >> fw.slot_1, and fw.slot_2 instead of our vendor specific names.
> >
> > Jake, didn't you have a similar capability in ice?
> >
> > Knowing my memory I may have acquiesced to something in another driver
> > already. That said - I think it's cleaner if we just list the stored
> > versions per bank, no?
> 
> We are listing the stored images in the devlink dev info output, just
> want to let the user choose which of those valid images to use next.
> 
> Cheers,
> sln

Technically I think we could do something similar in ice to switch between the banks, at least as long as there is a valid image in the bank. The big trick is that I am not sure we can verify ahead of time whether we have a valid image, and you might happen to boot into an invalid or blank image. There is some recovery firmware that should activate in that case, but I think our current driver doesn't implement enough of a recovery mode to actually handle this and allow the user to switch back.

Still, I think the ability to select the bank is valuable, and finding the right way to expose it is good.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 08/19] pds_core: initial VF configuration
  2022-11-28 23:37       ` Jakub Kicinski
@ 2022-11-29  0:37         ` Shannon Nelson
  2022-11-29  0:55           ` Jakub Kicinski
  0 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-29  0:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Shannon Nelson, netdev, davem, mst, jasowang, virtualization, drivers

On 11/28/22 3:37 PM, Jakub Kicinski wrote:
> On Mon, 28 Nov 2022 14:25:56 -0800 Shannon Nelson wrote:
>> On 11/28/22 10:28 AM, Jakub Kicinski wrote:
>>> On Fri, 18 Nov 2022 14:56:45 -0800 Shannon Nelson wrote:
>>>> +     .ndo_set_vf_vlan        = pdsc_set_vf_vlan,
>>>> +     .ndo_set_vf_mac         = pdsc_set_vf_mac,
>>>> +     .ndo_set_vf_trust       = pdsc_set_vf_trust,
>>>> +     .ndo_set_vf_rate        = pdsc_set_vf_rate,
>>>> +     .ndo_set_vf_spoofchk    = pdsc_set_vf_spoofchk,
>>>> +     .ndo_set_vf_link_state  = pdsc_set_vf_link_state,
>>>> +     .ndo_get_vf_config      = pdsc_get_vf_config,
>>>> +     .ndo_get_vf_stats       = pdsc_get_vf_stats,
>>>
>>> These are legacy, you're adding a fancy SmartNIC (or whatever your
>>> marketing decided to call it) driver. Please don't use these at all.
>>
>> Since these are the existing APIs that I am aware of for doing this kind
>> of VF configuration, it seemed to be the right choice.  I'm not aware of
>> any other obvious solutions.  Do you have an alternate suggestion?
> 
> If this is a "SmartNIC" there should be alternative solution based on
> representors for each of those callbacks, and the device should support
> forwarding using proper netdev constructs like bridge, routing, or tc.
> 
> This has been our high level guidance for a few years now. It's quite
> hard to move the ball forward since all major vendors have a single
> driver for multiple generations of HW :(

Absolutely, if the device presented to the host is a SmartNIC and has 
these bridge and router capabilities, by all means it should use the 
newer APIs, but that's not the case here.

In this case we are making devices available to baremetal platforms in a 
cloud vendor setting where the majority of the network configuration is 
controlled outside of the host machine's purview.  There is no bridging, 
routing, or filtering control available to the baremetal client other 
than the basic VF configurations.

The device model presented to the host is a simple PF with VFs, not a 
SmartNIC, thus the pds_core driver sets up a simple PF netdev 
"representor" for using the existing VF control API: easy to use, 
everyone knows how to use it, keeps code simple.

I suppose we could have the PF create a representor netdev for each 
individual VF to set mac address and read stats, but that seems 
redundant, and as far as I know that still would be missing the other VF 
controls.  Do we have alternate ways for the user to set things like 
trust and spoofchk?

sln
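For context on the legacy interface being debated, the .ndo_set_vf_* callbacks amount to pushing a small per-VF attribute set through the PF netdev. A toy userspace model of one such callback (not kernel code; struct layout, names, and the fixed VF count are all hypothetical) looks like:

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_VFS 4

/* The handful of per-VF knobs the legacy .ndo_set_vf_* API covers. */
struct vf_cfg {
	unsigned char mac[6];	/* .ndo_set_vf_mac */
	unsigned short vlan;	/* .ndo_set_vf_vlan */
	bool trusted;		/* .ndo_set_vf_trust */
	bool spoofchk;		/* .ndo_set_vf_spoofchk */
};

struct vf_cfg vf_table[NUM_VFS];

/* Models .ndo_set_vf_trust: validate the VF index, then store the
 * setting for the device's control plane to apply. */
int set_vf_trust(int vf, bool trusted)
{
	if (vf < 0 || vf >= NUM_VFS)
		return -1;	/* the kernel would return -EINVAL */
	vf_table[vf].trusted = trusted;
	return 0;
}
```

The representor-based alternative being advocated would instead expose one netdev per VF and configure such attributes through standard netdev constructs rather than a flat per-index table on the PF.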

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 08/19] pds_core: initial VF configuration
  2022-11-29  0:37         ` Shannon Nelson
@ 2022-11-29  0:55           ` Jakub Kicinski
  2022-11-29  1:08             ` Shannon Nelson
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Kicinski @ 2022-11-29  0:55 UTC (permalink / raw)
  To: Shannon Nelson
  Cc: Shannon Nelson, netdev, davem, mst, jasowang, virtualization, drivers

On Mon, 28 Nov 2022 16:37:45 -0800 Shannon Nelson wrote:
> > If this is a "SmartNIC" there should be alternative solution based on
> > representors for each of those callbacks, and the device should support
> > forwarding using proper netdev constructs like bridge, routing, or tc.
> > 
> > This has been our high level guidance for a few years now. It's quite
> > hard to move the ball forward since all major vendors have a single
> > driver for multiple generations of HW :(  
> 
> Absolutely, if the device presented to the host is a SmartNIC and has 
> these bridge and router capabilities, by all means it should use the 
> newer APIs, but that's not the case here.
> 
> In this case we are making devices available to baremetal platforms in a 
> cloud vendor setting where the majority of the network configuration is 
> controlled outside of the host machine's purview.  There is no bridging, 
> routing, or filtering control available to the baremetal client other 
> than the basic VF configurations.

Don't even start with the "our device is simple and only needs 
the legacy API" line of arguing :/

> The device model presented to the host is a simple PF with VFs, not a 
> SmartNIC, thus the pds_core driver sets up a simple PF netdev 
> "representor" for using the existing VF control API: easy to use, 
> everyone knows how to use it, keeps code simple.
> 
> I suppose we could have the PF create a representor netdev for each 
> individual VF to set mac address and read stats, but that seems 

Oh, so the "representor" you mention in the cover letter is for the PF?

> redundant, and as far as I know that still would be missing the other VF 
> controls.  Do we have alternate ways for the user to set things like 
> trust and spoofchk?


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 08/19] pds_core: initial VF configuration
  2022-11-29  0:55           ` Jakub Kicinski
@ 2022-11-29  1:08             ` Shannon Nelson
  2022-11-29  1:54               ` Jakub Kicinski
  0 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-29  1:08 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Shannon Nelson, netdev, davem, mst, jasowang, virtualization, drivers

On 11/28/22 4:55 PM, Jakub Kicinski wrote:
> On Mon, 28 Nov 2022 16:37:45 -0800 Shannon Nelson wrote:
>>> If this is a "SmartNIC" there should be alternative solution based on
>>> representors for each of those callbacks, and the device should support
>>> forwarding using proper netdev constructs like bridge, routing, or tc.
>>>
>>> This has been our high level guidance for a few years now. It's quite
>>> hard to move the ball forward since all major vendors have a single
>>> driver for multiple generations of HW :(
>>
>> Absolutely, if the device presented to the host is a SmartNIC and has
>> these bridge and router capabilities, by all means it should use the
>> newer APIs, but that's not the case here.
>>
>> In this case we are making devices available to baremetal platforms in a
>> cloud vendor setting where the majority of the network configuration is
>> controlled outside of the host machine's purview.  There is no bridging,
>> routing, or filtering control available to the baremetal client other
>> than the basic VF configurations.
> 
> Don't even start with the "our device is simple and only needs
> the legacy API" line of arguing :/

I'm not sure what else to say here - yes, we have a fancy and complex 
piece of hardware plugged into the PCI slot, but the device that shows 
up on the PCI bus is a very constrained model that doesn't know anything 
about switchdev kinds of things.

> 
>> The device model presented to the host is a simple PF with VFs, not a
>> SmartNIC, thus the pds_core driver sets up a simple PF netdev
>> "representor" for using the existing VF control API: easy to use,
>> everyone knows how to use it, keeps code simple.
>>
>> I suppose we could have the PF create a representor netdev for each
>> individual VF to set mac address and read stats, but that seems
> 
> Oh, so the "representor" you mention in the cover letter is for the PF?

Yes, a PF representor simply so we can get access to the .ndo_set_vf_xxx 
interfaces.  There is no network traffic running through the PF.

sln

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 08/19] pds_core: initial VF configuration
  2022-11-29  1:08             ` Shannon Nelson
@ 2022-11-29  1:54               ` Jakub Kicinski
  2022-11-29 17:57                 ` Shannon Nelson
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Kicinski @ 2022-11-29  1:54 UTC (permalink / raw)
  To: Shannon Nelson
  Cc: Shannon Nelson, netdev, davem, mst, jasowang, virtualization, drivers

On Mon, 28 Nov 2022 17:08:28 -0800 Shannon Nelson wrote:
> > Don't even start with the "our device is simple and only needs
> > the legacy API" line of arguing :/  
> 
> I'm not sure what else to say here - yes, we have a fancy and complex 
> piece of hardware plugged into the PCI slot, but the device that shows 
> up on the PCI bus is a very constrained model that doesn't know anything 
> about switchdev kinds of things.

Today it is, but I presume it's all FW underneath. So a year from now
you'll be back asking for extensions because FW devs added features.

> >> The device model presented to the host is a simple PF with VFs, not a
> >> SmartNIC, thus the pds_core driver sets up a simple PF netdev
> >> "representor" for using the existing VF control API: easy to use,
> >> everyone knows how to use it, keeps code simple.
> >>
> >> I suppose we could have the PF create a representor netdev for each
> >> individual VF to set mac address and read stats, but that seems  
> > 
> > Oh, so the "representor" you mention in the cover letter is for the PF?  
> 
> Yes, a PF representor simply so we can get access to the .ndo_set_vf_xxx 
> interfaces.  There is no network traffic running through the PF.

In that case not only have you come up with your own name for 
a SmartNIC, you also managed to misuse one of our existing terms 
in your own way! It can't pass any traffic; it's just a dummy to hook
the legacy vf ndos to. It's the opposite of what a repr is.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 10/19] pds_core: devlink params for enabling VIF support
  2022-11-28 23:39             ` Jakub Kicinski
@ 2022-11-29  9:00               ` Leon Romanovsky
  2022-11-29  9:13               ` Jiri Pirko
  1 sibling, 0 replies; 61+ messages in thread
From: Leon Romanovsky @ 2022-11-29  9:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrew Lunn, Shannon Nelson, Shannon Nelson, netdev, davem, mst,
	jasowang, virtualization, drivers

On Mon, Nov 28, 2022 at 03:39:22PM -0800, Jakub Kicinski wrote:
> On Tue, 29 Nov 2022 00:29:42 +0100 Andrew Lunn wrote:
> > > How about:
> > > 	DEVLINK_PARAM_GENERIC_ID_ENABLE_LIVE_MIGRATION  
> > 
> > Much better.
> 
> +1, although I care much less about the define name which is stupidly
> long anyway and more about the actual value that the user will see

We have an enable/disable devlink live migration knob in our queue. Saeed
thought to send it next week.

Thanks

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 10/19] pds_core: devlink params for enabling VIF support
  2022-11-28 23:39             ` Jakub Kicinski
  2022-11-29  9:00               ` Leon Romanovsky
@ 2022-11-29  9:13               ` Jiri Pirko
  2022-11-29 17:16                 ` Shannon Nelson
  1 sibling, 1 reply; 61+ messages in thread
From: Jiri Pirko @ 2022-11-29  9:13 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrew Lunn, Shannon Nelson, Shannon Nelson, netdev, davem, mst,
	jasowang, virtualization, drivers

Tue, Nov 29, 2022 at 12:39:22AM CET, kuba@kernel.org wrote:
>On Tue, 29 Nov 2022 00:29:42 +0100 Andrew Lunn wrote:
>> > How about:
>> > 	DEVLINK_PARAM_GENERIC_ID_ENABLE_LIVE_MIGRATION  
>> 
>> Much better.
>
>+1, although I care much less about the define name which is stupidly
>long anyway and more about the actual value that the user will see

We have patches that introduce live migration as a generic port function
capability bit. It is an attribute of the function.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 10/19] pds_core: devlink params for enabling VIF support
  2022-11-29  9:13               ` Jiri Pirko
@ 2022-11-29 17:16                 ` Shannon Nelson
  0 siblings, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-29 17:16 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski
  Cc: Andrew Lunn, Shannon Nelson, netdev, davem, mst, jasowang,
	virtualization, drivers

On 11/29/22 1:13 AM, Jiri Pirko wrote:
> Tue, Nov 29, 2022 at 12:39:22AM CET, kuba@kernel.org wrote:
>> On Tue, 29 Nov 2022 00:29:42 +0100 Andrew Lunn wrote:
>>>> How about:
>>>>     DEVLINK_PARAM_GENERIC_ID_ENABLE_LIVE_MIGRATION
>>>
>>> Much better.
>>
>> +1, although I care much less about the define name which is stupidly
>> long anyway and more about the actual value that the user will see
> 
> We have patches that introduce live migration as a generic port function
> capability bit. It is an attribute of the function.
> 

Thanks Leon and Jiri, we'll keep an eye out for it.

sln

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 08/19] pds_core: initial VF configuration
  2022-11-29  1:54               ` Jakub Kicinski
@ 2022-11-29 17:57                 ` Shannon Nelson
  2022-11-30  2:02                   ` Jakub Kicinski
  0 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-29 17:57 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Shannon Nelson, netdev, davem, mst, jasowang, virtualization, drivers

On 11/28/22 5:54 PM, Jakub Kicinski wrote:
> On Mon, 28 Nov 2022 17:08:28 -0800 Shannon Nelson wrote:
>>> Don't even start with the "our device is simple and only needs
>>> the legacy API" line of arguing :/
>>
>> I'm not sure what else to say here - yes, we have a fancy and complex
>> piece of hardware plugged into the PCI slot, but the device that shows
>> up on the PCI bus is a very constrained model that doesn't know anything
>> about switchdev kinds of things.
> 
> Today it is, but I presume it's all FW underneath. So a year from now
> you'll be back asking for extensions because FW devs added features.

Sure, and that will be the time to add the APIs and code for handling 
the more complex switching and filtering needs.  We leave it out for now 
so as to not have unneeded code waiting for future features that might 
never actually appear, as driver writers are often reminded.

> 
>>>> The device model presented to the host is a simple PF with VFs, not a
>>>> SmartNIC, thus the pds_core driver sets up a simple PF netdev
>>>> "representor" for using the existing VF control API: easy to use,
>>>> everyone knows how to use it, keeps code simple.
>>>>
>>>> I suppose we could have the PF create a representor netdev for each
>>>> individual VF to set mac address and read stats, but that seems
>>>
>>> Oh, so the "representor" you mention in the cover letter is for the PF?
>>
>> Yes, a PF representor simply so we can get access to the .ndo_set_vf_xxx
>> interfaces.  There is no network traffic running through the PF.
> 
> In that case not only have you come up with your own name for
> a SmartNIC, you also managed to misuse one of our existing terms
> in your own way! It can't pass any traffic it's just a dummy to hook
> the legacy vf ndos to. It's the opposite of what a repr is.

Sorry, this seemed to me a reasonable use of the term.  Is there an 
alternative wording we should use for this case?

Are there other existing methods we can use for getting the VF 
configurations from the user, or does this make sense to keep in our 
current simple model?

Thanks,
sln

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 14/19] pds_vdpa: Add new PCI VF device for PDS vDPA services
  2022-11-22  3:53   ` Jason Wang
@ 2022-11-29 22:24     ` Shannon Nelson
  0 siblings, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-29 22:24 UTC (permalink / raw)
  To: Jason Wang, Shannon Nelson, netdev, davem, kuba, mst, virtualization
  Cc: drivers

On 11/21/22 7:53 PM, Jason Wang wrote:
> On 2022/11/19 06:56, Shannon Nelson wrote:
>> This is the initial PCI driver framework for the new pds_vdpa VF
>> device driver, an auxiliary_bus client of the pds_core driver.
>> This does the very basics of registering for the new PCI
>> device 1dd8:100b, setting up debugfs entries, and registering
>> with devlink.
>>
>> The new PCI device id has not made it to the official PCI ID Repository
>> yet, but will soon be registered there.
>>
>> Signed-off-by: Shannon Nelson <snelson@pensando.io>
>> ---
>>   drivers/vdpa/pds/Makefile       |   7 +
>>   drivers/vdpa/pds/debugfs.c      |  44 +++++++
>>   drivers/vdpa/pds/debugfs.h      |  22 ++++
>>   drivers/vdpa/pds/pci_drv.c      | 143 +++++++++++++++++++++
>>   drivers/vdpa/pds/pci_drv.h      |  46 +++++++
>>   include/linux/pds/pds_core_if.h |   1 +
>>   include/linux/pds/pds_vdpa.h    | 219 ++++++++++++++++++++++++++++++++
>>   7 files changed, 482 insertions(+)
>>   create mode 100644 drivers/vdpa/pds/Makefile
>>   create mode 100644 drivers/vdpa/pds/debugfs.c
>>   create mode 100644 drivers/vdpa/pds/debugfs.h
>>   create mode 100644 drivers/vdpa/pds/pci_drv.c
>>   create mode 100644 drivers/vdpa/pds/pci_drv.h
>>   create mode 100644 include/linux/pds/pds_vdpa.h
>>
>> diff --git a/drivers/vdpa/pds/Makefile b/drivers/vdpa/pds/Makefile
>> new file mode 100644
>> index 000000000000..3ba28a875574
>> --- /dev/null
>> +++ b/drivers/vdpa/pds/Makefile
>> @@ -0,0 +1,7 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +# Copyright(c) 2022 Pensando Systems, Inc
>> +
>> +obj-$(CONFIG_PDS_VDPA) := pds_vdpa.o
>> +
>> +pds_vdpa-y := pci_drv.o      \
>> +           debugfs.o
>> diff --git a/drivers/vdpa/pds/debugfs.c b/drivers/vdpa/pds/debugfs.c
>> new file mode 100644
>> index 000000000000..f5b6654ae89b
>> --- /dev/null
>> +++ b/drivers/vdpa/pds/debugfs.c
>> @@ -0,0 +1,44 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/* Copyright(c) 2022 Pensando Systems, Inc */
>> +
>> +#include <linux/module.h>
>> +#include <linux/pci.h>
>> +#include <linux/types.h>
>> +
>> +#include <linux/pds/pds_core_if.h>
>> +#include <linux/pds/pds_vdpa.h>
>> +
>> +#include "pci_drv.h"
>> +#include "debugfs.h"
>> +
>> +#ifdef CONFIG_DEBUG_FS
>> +
>> +static struct dentry *dbfs_dir;
>> +
>> +void
>> +pds_vdpa_debugfs_create(void)
>> +{
>> +     dbfs_dir = debugfs_create_dir(PDS_VDPA_DRV_NAME, NULL);
>> +}
>> +
>> +void
>> +pds_vdpa_debugfs_destroy(void)
>> +{
>> +     debugfs_remove_recursive(dbfs_dir);
>> +     dbfs_dir = NULL;
>> +}
>> +
>> +void
>> +pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev)
>> +{
>> +     vdpa_pdev->dentry = debugfs_create_dir(pci_name(vdpa_pdev->pdev), dbfs_dir);
>> +}
>> +
>> +void
>> +pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev)
>> +{
>> +     debugfs_remove_recursive(vdpa_pdev->dentry);
>> +     vdpa_pdev->dentry = NULL;
>> +}
>> +
>> +#endif /* CONFIG_DEBUG_FS */
>> diff --git a/drivers/vdpa/pds/debugfs.h b/drivers/vdpa/pds/debugfs.h
>> new file mode 100644
>> index 000000000000..ac31ab47746b
>> --- /dev/null
>> +++ b/drivers/vdpa/pds/debugfs.h
>> @@ -0,0 +1,22 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/* Copyright(c) 2022 Pensando Systems, Inc */
>> +
>> +#ifndef _PDS_VDPA_DEBUGFS_H_
>> +#define _PDS_VDPA_DEBUGFS_H_
>> +
>> +#include <linux/debugfs.h>
>> +
>> +#ifdef CONFIG_DEBUG_FS
>> +
>> +void pds_vdpa_debugfs_create(void);
>> +void pds_vdpa_debugfs_destroy(void);
>> +void pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev);
>> +void pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev);
>> +#else
>> +static inline void pds_vdpa_debugfs_create(void) { }
>> +static inline void pds_vdpa_debugfs_destroy(void) { }
>> +static inline void pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev) { }
>> +static inline void pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev) { }
>> +#endif
>> +
>> +#endif /* _PDS_VDPA_DEBUGFS_H_ */
>> diff --git a/drivers/vdpa/pds/pci_drv.c b/drivers/vdpa/pds/pci_drv.c
>> new file mode 100644
>> index 000000000000..369e11153f21
>> --- /dev/null
>> +++ b/drivers/vdpa/pds/pci_drv.c
>> @@ -0,0 +1,143 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/* Copyright(c) 2022 Pensando Systems, Inc */
>> +
>> +#include <linux/module.h>
>> +#include <linux/pci.h>
>> +#include <linux/aer.h>
>> +#include <linux/types.h>
>> +#include <linux/vdpa.h>
>> +
>> +#include <linux/pds/pds_core_if.h>
>> +#include <linux/pds/pds_vdpa.h>
>> +
>> +#include "pci_drv.h"
>> +#include "debugfs.h"
>> +
>> +static void
>> +pds_vdpa_dma_action(void *data)
>> +{
>> +     pci_free_irq_vectors((struct pci_dev *)data);
>> +}
> 
> 
> Nit: since we're releasing irq vectors, it might be better to use
> "pds_vdpa_irq_action"

Sure.

> 
> 
>> +
>> +static int
>> +pds_vdpa_pci_probe(struct pci_dev *pdev,
>> +                const struct pci_device_id *id)
>> +{
>> +     struct pds_vdpa_pci_device *vdpa_pdev;
>> +     struct device *dev = &pdev->dev;
>> +     int err;
>> +
>> +     vdpa_pdev = kzalloc(sizeof(*vdpa_pdev), GFP_KERNEL);
>> +     if (!vdpa_pdev)
>> +             return -ENOMEM;
>> +     pci_set_drvdata(pdev, vdpa_pdev);
>> +
>> +     vdpa_pdev->pdev = pdev;
>> +     vdpa_pdev->vf_id = pci_iov_vf_id(pdev);
>> +     vdpa_pdev->pci_id = PCI_DEVID(pdev->bus->number, pdev->devfn);
>> +
>> +     /* Query system for DMA addressing limitation for the device. */
>> +     err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(PDS_CORE_ADDR_LEN));
>> +     if (err) {
>> +             dev_err(dev, "Unable to obtain 64-bit DMA for consistent allocations, aborting. %pe\n",
>> +                     ERR_PTR(err));
>> +             goto err_out_free_mem;
>> +     }
>> +
>> +     pci_enable_pcie_error_reporting(pdev);
>> +
>> +     /* Use devres management */
>> +     err = pcim_enable_device(pdev);
>> +     if (err) {
>> +             dev_err(dev, "Cannot enable PCI device: %pe\n", ERR_PTR(err));
>> +             goto err_out_free_mem;
>> +     }
>> +
>> +     err = devm_add_action_or_reset(dev, pds_vdpa_dma_action, pdev);
>> +     if (err) {
>> +             dev_err(dev, "Failed adding devres for freeing irq vectors: %pe\n",
>> +                     ERR_PTR(err));
>> +             goto err_out_pci_release_device;
>> +     }
>> +
>> +     pci_set_master(pdev);
>> +
>> +     pds_vdpa_debugfs_add_pcidev(vdpa_pdev);
>> +
>> +     dev_info(dev, "%s: PF %#04x VF %#04x (%d) vf_id %d domain %d vdpa_aux %p vdpa_pdev %p\n",
>> +              __func__, pci_dev_id(vdpa_pdev->pdev->physfn),
>> +              vdpa_pdev->pci_id, vdpa_pdev->pci_id, vdpa_pdev->vf_id,
>> +              pci_domain_nr(pdev->bus), vdpa_pdev->vdpa_aux, vdpa_pdev);
>> +
>> +     return 0;
>> +
>> +err_out_pci_release_device:
>> +     pci_disable_device(pdev);
> 
> 
> Do we still need to care about this consider we use
> devres/pcim_enable_device()?

It isn't absolutely necessary...  I like how the pcim/devm machinery 
cleans up lost items at removal time, but I also prefer to keep lost 
items from happening in the first place.

> 
> 
>> +err_out_free_mem:
>> +     pci_disable_pcie_error_reporting(pdev);
>> +     kfree(vdpa_pdev);
>> +     return err;
>> +}
>> +
>> +static void
>> +pds_vdpa_pci_remove(struct pci_dev *pdev)
>> +{
>> +     struct pds_vdpa_pci_device *vdpa_pdev = pci_get_drvdata(pdev);
>> +
>> +     pds_vdpa_debugfs_del_pcidev(vdpa_pdev);
>> +     pci_clear_master(pdev);
>> +     pci_disable_pcie_error_reporting(pdev);
>> +     pci_disable_device(pdev);
>> +     kfree(vdpa_pdev);
>> +
>> +     dev_info(&pdev->dev, "Removed\n");
>> +}
>> +
>> +static const struct pci_device_id
>> +pds_vdpa_pci_table[] = {
>> +     { PCI_VDEVICE(PENSANDO, PCI_DEVICE_ID_PENSANDO_VDPA_VF) },
>> +     { 0, }
>> +};
>> +MODULE_DEVICE_TABLE(pci, pds_vdpa_pci_table);
>> +
>> +static struct pci_driver
>> +pds_vdpa_pci_driver = {
>> +     .name = PDS_VDPA_DRV_NAME,
>> +     .id_table = pds_vdpa_pci_table,
>> +     .probe = pds_vdpa_pci_probe,
>> +     .remove = pds_vdpa_pci_remove
>> +};
>> +
>> +static void __exit
>> +pds_vdpa_pci_cleanup(void)
>> +{
>> +     pci_unregister_driver(&pds_vdpa_pci_driver);
>> +
>> +     pds_vdpa_debugfs_destroy();
>> +}
>> +module_exit(pds_vdpa_pci_cleanup);
>> +
>> +static int __init
>> +pds_vdpa_pci_init(void)
>> +{
>> +     int err;
>> +
>> +     pds_vdpa_debugfs_create();
>> +
>> +     err = pci_register_driver(&pds_vdpa_pci_driver);
>> +     if (err) {
>> +             pr_err("%s: pci driver register failed: %pe\n", __func__, ERR_PTR(err));
>> +             goto err_pci;
>> +     }
>> +
>> +     return 0;
>> +
>> +err_pci:
>> +     pds_vdpa_debugfs_destroy();
>> +     return err;
>> +}
>> +module_init(pds_vdpa_pci_init);
>> +
>> +MODULE_DESCRIPTION(PDS_VDPA_DRV_DESCRIPTION);
>> +MODULE_AUTHOR("Pensando Systems, Inc");
>> +MODULE_LICENSE("GPL");
>> diff --git a/drivers/vdpa/pds/pci_drv.h b/drivers/vdpa/pds/pci_drv.h
>> new file mode 100644
>> index 000000000000..747809e0df9e
>> --- /dev/null
>> +++ b/drivers/vdpa/pds/pci_drv.h
>> @@ -0,0 +1,46 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/* Copyright(c) 2022 Pensando Systems, Inc */
>> +
>> +#ifndef _PCI_DRV_H
>> +#define _PCI_DRV_H
>> +
>> +#include <linux/pci.h>
>> +#include <linux/virtio_pci_modern.h>
>> +
>> +#define PDS_VDPA_DRV_NAME           "pds_vdpa"
>> +#define PDS_VDPA_DRV_DESCRIPTION    "Pensando vDPA VF Device Driver"
>> +
>> +#define PDS_VDPA_BAR_BASE    0
>> +#define PDS_VDPA_BAR_INTR    2
>> +#define PDS_VDPA_BAR_DBELL   4
>> +
>> +struct pds_dev_bar {
>> +     int           index;
>> +     void __iomem  *vaddr;
>> +     phys_addr_t   pa;
>> +     unsigned long len;
>> +};
>> +
>> +struct pds_vdpa_intr_info {
>> +     int index;
>> +     int irq;
>> +     int qid;
>> +     char name[32];
>> +};
>> +
>> +struct pds_vdpa_pci_device {
>> +     struct pci_dev *pdev;
>> +     struct pds_vdpa_aux *vdpa_aux;
>> +
>> +     int vf_id;
>> +     int pci_id;
>> +
>> +     int nintrs;
>> +     struct pds_vdpa_intr_info *intrs;
>> +
>> +     struct dentry *dentry;
>> +
>> +     struct virtio_pci_modern_device vd_mdev;
>> +};
>> +
>> +#endif /* _PCI_DRV_H */
>> diff --git a/include/linux/pds/pds_core_if.h b/include/linux/pds/pds_core_if.h
>> index 6333ec351e14..6e92697657e4 100644
>> --- a/include/linux/pds/pds_core_if.h
>> +++ b/include/linux/pds/pds_core_if.h
>> @@ -8,6 +8,7 @@
>>
>>   #define PCI_VENDOR_ID_PENSANDO                      0x1dd8
>>   #define PCI_DEVICE_ID_PENSANDO_CORE_PF              0x100c
>> +#define PCI_DEVICE_ID_PENSANDO_VDPA_VF          0x100b
>>
>>   #define PDS_CORE_BARS_MAX                   4
>>   #define PDS_CORE_PCI_BAR_DBELL                      1
>> diff --git a/include/linux/pds/pds_vdpa.h b/include/linux/pds/pds_vdpa.h
>> new file mode 100644
>> index 000000000000..7ecef890f175
>> --- /dev/null
>> +++ b/include/linux/pds/pds_vdpa.h
>> @@ -0,0 +1,219 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/* Copyright(c) 2022 Pensando Systems, Inc */
>> +
>> +#ifndef _PDS_VDPA_IF_H_
>> +#define _PDS_VDPA_IF_H_
>> +
>> +#include <linux/pds/pds_common.h>
>> +
>> +#define PDS_DEV_TYPE_VDPA_STR        "vDPA"
>> +#define PDS_VDPA_DEV_NAME    PDS_CORE_DRV_NAME "." PDS_DEV_TYPE_VDPA_STR
>> +
>> +/*
>> + * enum pds_vdpa_cmd_opcode - vDPA Device commands
>> + */
>> +enum pds_vdpa_cmd_opcode {
>> +     PDS_VDPA_CMD_INIT               = 48,
>> +     PDS_VDPA_CMD_IDENT              = 49,
>> +     PDS_VDPA_CMD_RESET              = 51,
>> +     PDS_VDPA_CMD_VQ_RESET           = 52,
>> +     PDS_VDPA_CMD_VQ_INIT            = 53,
>> +     PDS_VDPA_CMD_STATUS_UPDATE      = 54,
>> +     PDS_VDPA_CMD_SET_FEATURES       = 55,
>> +     PDS_VDPA_CMD_SET_ATTR           = 56,
>> +};
>> +
>> +/**
>> + * struct pds_vdpa_cmd - generic command
>> + * @opcode:  Opcode
>> + * @vdpa_index:      Index for vdpa subdevice
>> + * @vf_id:   VF id
>> + */
>> +struct pds_vdpa_cmd {
>> +     u8     opcode;
>> +     u8     vdpa_index;
>> +     __le16 vf_id;
>> +};
>> +
>> +/**
>> + * struct pds_vdpa_comp - generic command completion
>> + * @status:  Status of the command (enum pds_core_status_code)
>> + * @rsvd:    Word boundary padding
>> + * @color:   Color bit
>> + */
>> +struct pds_vdpa_comp {
>> +     u8 status;
>> +     u8 rsvd[14];
>> +     u8 color;
>> +};
>> +
>> +/**
>> + * struct pds_vdpa_init_cmd - INIT command
>> + * @opcode:  Opcode PDS_VDPA_CMD_INIT
>> + * @vdpa_index: Index for vdpa subdevice
>> + * @vf_id:   VF id
>> + * @len:     length of config info DMA space
>> + * @config_pa:       address for DMA of virtio_net_config struct
> 
> 
> Looks like the structure is not specific to net, if yes, we may tweak
> the above comment to say it's the address of the device configuration 
> space.

We're not expecting to do anything other than net, but yes we can update 
this comment.

> 
> 
>> + */
>> +struct pds_vdpa_init_cmd {
>> +     u8     opcode;
>> +     u8     vdpa_index;
>> +     __le16 vf_id;
>> +     __le32 len;
>> +     __le64 config_pa;
>> +};
>> +
>> +/**
>> + * struct pds_vdpa_ident - vDPA identification data
>> + * @hw_features:     vDPA features supported by device
>> + * @max_vqs:         max queues available (2 queues for a single queuepair)
>> + * @max_qlen:                log(2) of maximum number of descriptors
>> + * @min_qlen:                log(2) of minimum number of descriptors
> 
> 
> Note that if you have a plan to support packed virtqueue, the qlen is
> not necessarily a power of 2.

Shouldn't be a problem - this is only a way of giving us the max queue 
len in an 8-bit value, it doesn't mean we can only deal with power-of-2 
actual use values.


> 
> 
>> + *
>> + * This struct is used in a DMA block that is set up for the PDS_VDPA_CMD_IDENT
>> + * transaction.  Set up the DMA block and send the address in the IDENT cmd
>> + * data, the DSC will write the ident information, then we can remove the DMA
>> + * block after reading the answer.  If the completion status is 0, then there
>> + * is valid information, else there was an error and the data should be invalid.
>> + */
>> +struct pds_vdpa_ident {
>> +     __le64 hw_features;
>> +     __le16 max_vqs;
>> +     __le16 max_qlen;
>> +     __le16 min_qlen;
>> +};
>> +
>> +/**
>> + * struct pds_vdpa_ident_cmd - IDENT command
>> + * @opcode:  Opcode PDS_VDPA_CMD_IDENT
>> + * @rsvd:       Word boundary padding
>> + * @vf_id:   VF id
>> + * @len:     length of ident info DMA space
>> + * @ident_pa:        address for DMA of ident info (struct pds_vdpa_ident)
>> + *                   only used for this transaction, then forgotten by DSC
>> + */
>> +struct pds_vdpa_ident_cmd {
>> +     u8     opcode;
>> +     u8     rsvd;
>> +     __le16 vf_id;
>> +     __le32 len;
>> +     __le64 ident_pa;
>> +};
>> +
>> +/**
>> + * struct pds_vdpa_status_cmd - STATUS_UPDATE command
>> + * @opcode:  Opcode PDS_VDPA_CMD_STATUS_UPDATE
>> + * @vdpa_index: Index for vdpa subdevice
>> + * @vf_id:   VF id
>> + * @status:  new status bits
>> + */
>> +struct pds_vdpa_status_cmd {
>> +     u8     opcode;
>> +     u8     vdpa_index;
>> +     __le16 vf_id;
>> +     u8     status;
>> +};
>> +
>> +/**
>> + * enum pds_vdpa_attr - List of VDPA device attributes
>> + * @PDS_VDPA_ATTR_MAC:          MAC address
>> + * @PDS_VDPA_ATTR_MAX_VQ_PAIRS: Max virtqueue pairs
>> + */
>> +enum pds_vdpa_attr {
>> +     PDS_VDPA_ATTR_MAC          = 1,
>> +     PDS_VDPA_ATTR_MAX_VQ_PAIRS = 2,
>> +};
>> +
>> +/**
>> + * struct pds_vdpa_setattr_cmd - SET_ATTR command
>> + * @opcode:          Opcode PDS_VDPA_CMD_SET_ATTR
>> + * @vdpa_index:              Index for vdpa subdevice
>> + * @vf_id:           VF id
>> + * @attr:            attribute to be changed (enum pds_vdpa_attr)
>> + * @pad:             Word boundary padding
>> + * @mac:             new mac address to be assigned as vdpa device address
>> + * @max_vq_pairs:    new limit of virtqueue pairs
>> + */
>> +struct pds_vdpa_setattr_cmd {
>> +     u8     opcode;
>> +     u8     vdpa_index;
>> +     __le16 vf_id;
>> +     u8     attr;
>> +     u8     pad[3];
>> +     union {
>> +             u8 mac[6];
>> +             __le16 max_vq_pairs;
> 
> 
> So does this mean if we want to set both mac and max_vq_pairs, we need
> two commands? That seems to be less efficient, since the mgmt layer could
> provision more attributes here. Can we pack all attributes into a single
> command?

Yes, that is how the cmd is set up, similar to how other setattr 
commands work in our firmware.  This was driven originally by our 
ionic's struct ionic_lif_setattr_cmd, where you can see that the 
combined fields in the union would be much greater than the available 
64 bytes for the request.  This is a new device, but we wanted to keep 
the cmd operations similar.

> 
> 
>> +     } __packed;
>> +};
>> +
>> +/**
>> + * struct pds_vdpa_vq_init_cmd - queue init command
>> + * @opcode: Opcode PDS_VDPA_CMD_VQ_INIT
>> + * @vdpa_index:      Index for vdpa subdevice
>> + * @vf_id:   VF id
>> + * @qid:     Queue id (bit0 clear = rx, bit0 set = tx, qid=N is ctrlq)
> 
> 
> I wonder if there is any reason we need to design it like this; it would
> be better to make it general enough to be used by other types of virtio devices.

There's no plan to go beyond net devices, but if we find we need to 
revisit that, we can version the interface and add what is needed.

> 
> 
>> + * @len:     log(2) of max descriptor count
>> + * @desc_addr:       DMA address of descriptor area
>> + * @avail_addr:      DMA address of available descriptors (aka driver area)
>> + * @used_addr:       DMA address of used descriptors (aka device area)
>> + * @intr_index:      interrupt index
> 
> 
> Is this something like MSI-X vector?

It is an index into the VF's list of MSI-x vectors

> 
> 
>> + */
>> +struct pds_vdpa_vq_init_cmd {
>> +     u8     opcode;
>> +     u8     vdpa_index;
>> +     __le16 vf_id;
>> +     __le16 qid;
>> +     __le16 len;
>> +     __le64 desc_addr;
>> +     __le64 avail_addr;
>> +     __le64 used_addr;
>> +     __le16 intr_index;
>> +};
>> +
>> +/**
>> + * struct pds_vdpa_vq_init_comp - queue init completion
>> + * @status:  Status of the command (enum pds_core_status_code)
>> + * @hw_qtype:        HW queue type, used in doorbell selection
>> + * @hw_qindex:       HW queue index, used in doorbell selection
>> + * @rsvd:    Word boundary padding
>> + * @color:   Color bit
> 
> 
> More commentary is needed to explain how to use this color bit.

Sure.  I'll add something somewhere about how this is used in place of a 
tail pointer.

> 
> 
>> + */
>> +struct pds_vdpa_vq_init_comp {
>> +     u8     status;
>> +     u8     hw_qtype;
>> +     __le16 hw_qindex;
>> +     u8     rsvd[11];
>> +     u8     color;
>> +};
>> +
>> +/**
>> + * struct pds_vdpa_vq_reset_cmd - queue reset command
>> + * @opcode:  Opcode PDS_VDPA_CMD_VQ_RESET
> 
> 
> Is there a chance that we could have more types of opcode here?

I'm not sure I understand this question... yes it is possible we can add 
to the enum pds_vdpa_cmd_opcode list.

sln

> 
> Thanks
> 
> 
>> + * @vdpa_index:      Index for vdpa subdevice
>> + * @vf_id:   VF id
>> + * @qid:     Queue id
>> + */
>> +struct pds_vdpa_vq_reset_cmd {
>> +     u8     opcode;
>> +     u8     vdpa_index;
>> +     __le16 vf_id;
>> +     __le16 qid;
>> +};
>> +
>> +/**
>> + * struct pds_vdpa_set_features_cmd - set hw features
>> + * @opcode: Opcode PDS_VDPA_CMD_SET_FEATURES
>> + * @vdpa_index:      Index for vdpa subdevice
>> + * @vf_id:   VF id
>> + * @rsvd:       Word boundary padding
>> + * @features:        Feature bit mask
>> + */
>> +struct pds_vdpa_set_features_cmd {
>> +     u8     opcode;
>> +     u8     vdpa_index;
>> +     __le16 vf_id;
>> +     __le32 rsvd;
>> +     __le64 features;
>> +};
>> +
>> +#endif /* _PDS_VDPA_IF_H_ */
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 15/19] pds_vdpa: virtio bar setup for vdpa
  2022-11-22  6:36     ` Jason Wang
@ 2022-11-29 23:02       ` Shannon Nelson
  0 siblings, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-29 23:02 UTC (permalink / raw)
  To: Jason Wang, Shannon Nelson, netdev, davem, kuba, mst, virtualization
  Cc: drivers

On 11/21/22 10:36 PM, Jason Wang wrote:
> On Tue, Nov 22, 2022 at 11:32 AM Jason Wang <jasowang@redhat.com> wrote:
>> On 2022/11/19 06:56, Shannon Nelson wrote:
>>> The PDS vDPA device has a virtio BAR for describing itself, and
>>> the pds_vdpa driver needs to access it.  Here we copy liberally
>>> from the existing drivers/virtio/virtio_pci_modern_dev.c as it
>>> has what we need, but we need to modify it so that it can work
>>> with our device id and so we can use our own DMA mask.
>>>
>>> We suspect there is room for discussion here about making the
>>> existing code a little more flexible, but we thought we'd at
>>> least start the discussion here.
>>
>>
>> Exactly, since the virtio_pci_modern_dev.c is a library, we could tweak
>> it to allow the caller to pass the device_id with the DMA mask. Then we
>> can avoid code/bug duplication here.

I'll look into possible mods for it, although I'm not sure how quickly I 
can get to that... maybe a v+1 along the way.

> 
> Btw, I found only isr/notification were used but not the others? If
> this is true, we can avoid mapping those capabilities.

I'll keep this in mind.

sln

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 17/19] pds_vdpa: add vdpa config client commands
  2022-11-22  6:32   ` Jason Wang
@ 2022-11-29 23:16     ` Shannon Nelson
  0 siblings, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-29 23:16 UTC (permalink / raw)
  To: Jason Wang, Shannon Nelson
  Cc: netdev, davem, kuba, mst, virtualization, drivers

On 11/21/22 10:32 PM, Jason Wang wrote:
> On Sat, Nov 19, 2022 at 6:57 AM Shannon Nelson <snelson@pensando.io> wrote:
>>
>> These are the adminq commands that will be needed for
>> setting up and using the vDPA device.
>>
>> Signed-off-by: Shannon Nelson <snelson@pensando.io>
>> ---
>>   drivers/vdpa/pds/Makefile   |   1 +
>>   drivers/vdpa/pds/cmds.c     | 266 ++++++++++++++++++++++++++++++++++++
>>   drivers/vdpa/pds/cmds.h     |  17 +++
>>   drivers/vdpa/pds/vdpa_dev.h |  60 ++++++++
>>   4 files changed, 344 insertions(+)
>>   create mode 100644 drivers/vdpa/pds/cmds.c
>>   create mode 100644 drivers/vdpa/pds/cmds.h
>>   create mode 100644 drivers/vdpa/pds/vdpa_dev.h
>>
> 
> [...]
> 
>> +struct pds_vdpa_device {
>> +       struct vdpa_device vdpa_dev;
>> +       struct pds_vdpa_aux *vdpa_aux;
>> +       struct pds_vdpa_hw hw;
>> +
>> +       struct virtio_net_config vn_config;
>> +       dma_addr_t vn_config_pa;
> 
> So this is the dma address not necessarily pa, we'd better drop the "pa" suffix.

Yeah, strictly speaking I suppose it isn't necessarily pa, but _pa is 
the moniker we've used throughout our drivers for this kind of thing - 
maybe not perfect, but this is where we are for now.

sln

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 18/19] pds_vdpa: add support for vdpa and vdpamgmt interfaces
  2022-11-22  6:32   ` Jason Wang
@ 2022-11-30  0:11     ` Shannon Nelson
  2022-12-05  7:40       ` Jason Wang
  0 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-11-30  0:11 UTC (permalink / raw)
  To: Jason Wang, Shannon Nelson
  Cc: netdev, davem, kuba, mst, virtualization, drivers

On 11/21/22 10:32 PM, Jason Wang wrote:
> On Sat, Nov 19, 2022 at 6:57 AM Shannon Nelson <snelson@pensando.io> wrote:
>>
>> This is the vDPA device support, where we advertise that we can
>> support the virtio queues and deal with the configuration work
>> through the pds_core's adminq.
>>
>> Signed-off-by: Shannon Nelson <snelson@pensando.io>
>> ---
>>   drivers/vdpa/pds/Makefile   |   3 +-
>>   drivers/vdpa/pds/aux_drv.c  |  33 ++
>>   drivers/vdpa/pds/debugfs.c  | 167 ++++++++
>>   drivers/vdpa/pds/debugfs.h  |   4 +
>>   drivers/vdpa/pds/vdpa_dev.c | 796 ++++++++++++++++++++++++++++++++++++
>>   5 files changed, 1002 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/vdpa/pds/vdpa_dev.c
>>
>> diff --git a/drivers/vdpa/pds/Makefile b/drivers/vdpa/pds/Makefile
>> index fafd356ddf86..7fde4a4a1620 100644
>> --- a/drivers/vdpa/pds/Makefile
>> +++ b/drivers/vdpa/pds/Makefile
>> @@ -7,4 +7,5 @@ pds_vdpa-y := aux_drv.o \
>>                cmds.o \
>>                pci_drv.o \
>>                debugfs.o \
>> -             virtio_pci.o
>> +             virtio_pci.o \
>> +             vdpa_dev.o
>> diff --git a/drivers/vdpa/pds/aux_drv.c b/drivers/vdpa/pds/aux_drv.c
>> index aef3c984dc90..83b9a5a79325 100644
>> --- a/drivers/vdpa/pds/aux_drv.c
>> +++ b/drivers/vdpa/pds/aux_drv.c
>> @@ -12,6 +12,7 @@
>>   #include <linux/pds/pds_vdpa.h>
>>
>>   #include "aux_drv.h"
>> +#include "vdpa_dev.h"
>>   #include "pci_drv.h"
>>   #include "debugfs.h"
>>
>> @@ -25,10 +26,25 @@ static void
>>   pds_vdpa_aux_notify_handler(struct pds_auxiliary_dev *padev,
>>                              union pds_core_notifyq_comp *event)
>>   {
>> +       struct pds_vdpa_device *pdsv = padev->priv;
>>          struct device *dev = &padev->aux_dev.dev;
>>          u16 ecode = le16_to_cpu(event->ecode);
>>
>>          dev_info(dev, "%s: event code %d\n", __func__, ecode);
>> +
>> +       /* Give the upper layers a hint that something interesting
>> +        * may have happened.  It seems that the only thing this
>> +        * triggers in the virtio-net drivers above us is a check
>> +        * of link status.
>> +        *
>> +        * We don't set the NEEDS_RESET flag for EVENT_RESET
>> +        * because we're likely going through a recovery or
>> +        * fw_update and will be back up and running soon.
>> +        */
>> +       if (ecode == PDS_EVENT_RESET || ecode == PDS_EVENT_LINK_CHANGE) {
>> +               if (pdsv->hw.config_cb.callback)
>> +                       pdsv->hw.config_cb.callback(pdsv->hw.config_cb.private);
>> +       }
>>   }
>>
>>   static int
>> @@ -80,10 +96,25 @@ pds_vdpa_aux_probe(struct auxiliary_device *aux_dev,
>>                  goto err_register_client;
>>          }
>>
>> +       /* Get device ident info and set up the vdpa_mgmt_dev */
>> +       err = pds_vdpa_get_mgmt_info(vdpa_aux);
>> +       if (err)
>> +               goto err_register_client;
>> +
>> +       /* Let vdpa know that we can provide devices */
>> +       err = vdpa_mgmtdev_register(&vdpa_aux->vdpa_mdev);
>> +       if (err) {
>> +               dev_err(dev, "%s: Failed to initialize vdpa_mgmt interface: %pe\n",
>> +                       __func__, ERR_PTR(err));
>> +               goto err_mgmt_reg;
>> +       }
>> +
>>          pds_vdpa_debugfs_add_ident(vdpa_aux);
>>
>>          return 0;
>>
>> +err_mgmt_reg:
>> +       padev->ops->unregister_client(padev);
>>   err_register_client:
>>          auxiliary_set_drvdata(aux_dev, NULL);
>>   err_invalid_driver:
>> @@ -98,6 +129,8 @@ pds_vdpa_aux_remove(struct auxiliary_device *aux_dev)
>>          struct pds_vdpa_aux *vdpa_aux = auxiliary_get_drvdata(aux_dev);
>>          struct device *dev = &aux_dev->dev;
>>
>> +       vdpa_mgmtdev_unregister(&vdpa_aux->vdpa_mdev);
>> +
>>          vdpa_aux->padev->ops->unregister_client(vdpa_aux->padev);
>>          if (vdpa_aux->vdpa_vf)
>>                  pci_dev_put(vdpa_aux->vdpa_vf->pdev);
>> diff --git a/drivers/vdpa/pds/debugfs.c b/drivers/vdpa/pds/debugfs.c
>> index f766412209df..aa3143126a7e 100644
>> --- a/drivers/vdpa/pds/debugfs.c
>> +++ b/drivers/vdpa/pds/debugfs.c
>> @@ -11,6 +11,7 @@
>>   #include <linux/pds/pds_auxbus.h>
>>   #include <linux/pds/pds_vdpa.h>
>>
>> +#include "vdpa_dev.h"
>>   #include "aux_drv.h"
>>   #include "pci_drv.h"
>>   #include "debugfs.h"
>> @@ -19,6 +20,72 @@
>>
>>   static struct dentry *dbfs_dir;
>>
>> +#define PRINT_SBIT_NAME(__seq, __f, __name)                     \
>> +       do {                                                    \
>> +               if (__f & __name)                               \
>> +                       seq_printf(__seq, " %s", &#__name[16]); \
>> +       } while (0)
>> +
>> +static void
>> +print_status_bits(struct seq_file *seq, u16 status)
>> +{
>> +       seq_puts(seq, "status:");
>> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_DRIVER);
>> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_DRIVER_OK);
>> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_FEATURES_OK);
>> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_NEEDS_RESET);
>> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_FAILED);
>> +       seq_puts(seq, "\n");
>> +}
>> +
>> +#define PRINT_FBIT_NAME(__seq, __f, __name)                \
>> +       do {                                               \
>> +               if (__f & BIT_ULL(__name))                 \
>> +                       seq_printf(__seq, " %s", #__name); \
>> +       } while (0)
>> +
>> +static void
>> +print_feature_bits(struct seq_file *seq, u64 features)
>> +{
>> +       seq_puts(seq, "features:");
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CSUM);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_CSUM);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_MTU);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_MAC);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_TSO4);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_TSO6);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_ECN);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_UFO);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HOST_TSO4);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HOST_TSO6);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HOST_ECN);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HOST_UFO);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_MRG_RXBUF);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_STATUS);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_VQ);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_RX);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_VLAN);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_RX_EXTRA);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_ANNOUNCE);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_MQ);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_MAC_ADDR);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HASH_REPORT);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_RSS);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_RSC_EXT);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_STANDBY);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_SPEED_DUPLEX);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_NOTIFY_ON_EMPTY);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_ANY_LAYOUT);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_VERSION_1);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_ACCESS_PLATFORM);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_RING_PACKED);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_ORDER_PLATFORM);
>> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_SR_IOV);
>> +       seq_puts(seq, "\n");
>> +}
>> +
>>   void
>>   pds_vdpa_debugfs_create(void)
>>   {
>> @@ -49,10 +116,18 @@ static int
>>   identity_show(struct seq_file *seq, void *v)
>>   {
>>          struct pds_vdpa_aux *vdpa_aux = seq->private;
>> +       struct vdpa_mgmt_dev *mgmt;
>>
>>          seq_printf(seq, "aux_dev:            %s\n",
>>                     dev_name(&vdpa_aux->padev->aux_dev.dev));
>>
>> +       mgmt = &vdpa_aux->vdpa_mdev;
>> +       seq_printf(seq, "max_vqs:            %d\n", mgmt->max_supported_vqs);
>> +       seq_printf(seq, "config_attr_mask:   %#llx\n", mgmt->config_attr_mask);
>> +       seq_printf(seq, "supported_features: %#llx\n", mgmt->supported_features);
>> +       print_feature_bits(seq, mgmt->supported_features);
>> +       seq_printf(seq, "local_mac_bit:      %d\n", vdpa_aux->local_mac_bit);
>> +
>>          return 0;
>>   }
>>   DEFINE_SHOW_ATTRIBUTE(identity);
>> @@ -64,4 +139,96 @@ pds_vdpa_debugfs_add_ident(struct pds_vdpa_aux *vdpa_aux)
>>                              vdpa_aux, &identity_fops);
>>   }
>>
>> +static int
>> +config_show(struct seq_file *seq, void *v)
>> +{
>> +       struct pds_vdpa_device *pdsv = seq->private;
>> +       struct virtio_net_config *vc = &pdsv->vn_config;
>> +
>> +       seq_printf(seq, "mac:                  %pM\n", vc->mac);
>> +       seq_printf(seq, "max_virtqueue_pairs:  %d\n",
>> +                  __virtio16_to_cpu(true, vc->max_virtqueue_pairs));
>> +       seq_printf(seq, "mtu:                  %d\n", __virtio16_to_cpu(true, vc->mtu));
>> +       seq_printf(seq, "speed:                %d\n", le32_to_cpu(vc->speed));
>> +       seq_printf(seq, "duplex:               %d\n", vc->duplex);
>> +       seq_printf(seq, "rss_max_key_size:     %d\n", vc->rss_max_key_size);
>> +       seq_printf(seq, "rss_max_indirection_table_length: %d\n",
>> +                  le16_to_cpu(vc->rss_max_indirection_table_length));
>> +       seq_printf(seq, "supported_hash_types: %#x\n",
>> +                  le32_to_cpu(vc->supported_hash_types));
>> +       seq_printf(seq, "vn_status:            %#x\n",
>> +                  __virtio16_to_cpu(true, vc->status));
>> +       print_status_bits(seq, __virtio16_to_cpu(true, vc->status));
>> +
>> +       seq_printf(seq, "hw_status:            %#x\n", pdsv->hw.status);
>> +       print_status_bits(seq, pdsv->hw.status);
>> +       seq_printf(seq, "req_features:         %#llx\n", pdsv->hw.req_features);
>> +       print_feature_bits(seq, pdsv->hw.req_features);
>> +       seq_printf(seq, "actual_features:      %#llx\n", pdsv->hw.actual_features);
>> +       print_feature_bits(seq, pdsv->hw.actual_features);
>> +       seq_printf(seq, "vdpa_index:           %d\n", pdsv->hw.vdpa_index);
>> +       seq_printf(seq, "num_vqs:              %d\n", pdsv->hw.num_vqs);
>> +
>> +       return 0;
>> +}
>> +DEFINE_SHOW_ATTRIBUTE(config);
>> +
>> +static int
>> +vq_show(struct seq_file *seq, void *v)
>> +{
>> +       struct pds_vdpa_vq_info *vq = seq->private;
>> +       struct pds_vdpa_intr_info *intrs;
>> +
>> +       seq_printf(seq, "ready:      %d\n", vq->ready);
>> +       seq_printf(seq, "desc_addr:  %#llx\n", vq->desc_addr);
>> +       seq_printf(seq, "avail_addr: %#llx\n", vq->avail_addr);
>> +       seq_printf(seq, "used_addr:  %#llx\n", vq->used_addr);
>> +       seq_printf(seq, "q_len:      %d\n", vq->q_len);
>> +       seq_printf(seq, "qid:        %d\n", vq->qid);
>> +
>> +       seq_printf(seq, "doorbell:   %#llx\n", vq->doorbell);
>> +       seq_printf(seq, "avail_idx:  %d\n", vq->avail_idx);
>> +       seq_printf(seq, "used_idx:   %d\n", vq->used_idx);
>> +       seq_printf(seq, "intr_index: %d\n", vq->intr_index);
>> +
>> +       intrs = vq->pdsv->vdpa_aux->vdpa_vf->intrs;
>> +       seq_printf(seq, "irq:        %d\n", intrs[vq->intr_index].irq);
>> +       seq_printf(seq, "irq-name:   %s\n", intrs[vq->intr_index].name);
>> +
>> +       seq_printf(seq, "hw_qtype:   %d\n", vq->hw_qtype);
>> +       seq_printf(seq, "hw_qindex:  %d\n", vq->hw_qindex);
>> +
>> +       return 0;
>> +}
>> +DEFINE_SHOW_ATTRIBUTE(vq);
>> +
>> +void
>> +pds_vdpa_debugfs_add_vdpadev(struct pds_vdpa_device *pdsv)
>> +{
>> +       struct dentry *dentry;
>> +       const char *name;
>> +       int i;
>> +
>> +       dentry = pdsv->vdpa_aux->vdpa_vf->dentry;
>> +       name = dev_name(&pdsv->vdpa_dev.dev);
>> +
>> +       pdsv->dentry = debugfs_create_dir(name, dentry);
>> +
>> +       debugfs_create_file("config", 0400, pdsv->dentry, pdsv, &config_fops);
>> +
>> +       for (i = 0; i < pdsv->hw.num_vqs; i++) {
>> +               char name[8];
>> +
>> +               snprintf(name, sizeof(name), "vq%02d", i);
>> +               debugfs_create_file(name, 0400, pdsv->dentry, &pdsv->hw.vqs[i], &vq_fops);
>> +       }
>> +}
>> +
>> +void
>> +pds_vdpa_debugfs_del_vdpadev(struct pds_vdpa_device *pdsv)
>> +{
>> +       debugfs_remove_recursive(pdsv->dentry);
>> +       pdsv->dentry = NULL;
>> +}
>> +
>>   #endif /* CONFIG_DEBUG_FS */
>> diff --git a/drivers/vdpa/pds/debugfs.h b/drivers/vdpa/pds/debugfs.h
>> index 939a4c248aac..f0567e4ee4e4 100644
>> --- a/drivers/vdpa/pds/debugfs.h
>> +++ b/drivers/vdpa/pds/debugfs.h
>> @@ -13,12 +13,16 @@ void pds_vdpa_debugfs_destroy(void);
>>   void pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev);
>>   void pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev);
>>   void pds_vdpa_debugfs_add_ident(struct pds_vdpa_aux *vdpa_aux);
>> +void pds_vdpa_debugfs_add_vdpadev(struct pds_vdpa_device *pdsv);
>> +void pds_vdpa_debugfs_del_vdpadev(struct pds_vdpa_device *pdsv);
>>   #else
>>   static inline void pds_vdpa_debugfs_create(void) { }
>>   static inline void pds_vdpa_debugfs_destroy(void) { }
>>   static inline void pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev) { }
>>   static inline void pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev) { }
>>   static inline void pds_vdpa_debugfs_add_ident(struct pds_vdpa_aux *vdpa_aux) { }
>> +static inline void pds_vdpa_debugfs_add_vdpadev(struct pds_vdpa_device *pdsv) { }
>> +static inline void pds_vdpa_debugfs_del_vdpadev(struct pds_vdpa_device *pdsv) { }
>>   #endif
>>
>>   #endif /* _PDS_VDPA_DEBUGFS_H_ */
>> diff --git a/drivers/vdpa/pds/vdpa_dev.c b/drivers/vdpa/pds/vdpa_dev.c
>> new file mode 100644
>> index 000000000000..824be42aff0d
>> --- /dev/null
>> +++ b/drivers/vdpa/pds/vdpa_dev.c
>> @@ -0,0 +1,796 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/* Copyright(c) 2022 Pensando Systems, Inc */
>> +
>> +#include <linux/interrupt.h>
>> +#include <linux/module.h>
>> +#include <linux/pci.h>
>> +#include <linux/sysfs.h>
>> +#include <linux/types.h>
>> +#include <linux/vdpa.h>
>> +#include <uapi/linux/virtio_pci.h>
>> +#include <uapi/linux/vdpa.h>
>> +
>> +#include <linux/pds/pds_intr.h>
>> +#include <linux/pds/pds_core_if.h>
>> +#include <linux/pds/pds_adminq.h>
>> +#include <linux/pds/pds_auxbus.h>
>> +#include <linux/pds/pds_vdpa.h>
>> +
>> +#include "vdpa_dev.h"
>> +#include "pci_drv.h"
>> +#include "aux_drv.h"
>> +#include "cmds.h"
>> +#include "debugfs.h"
>> +
>> +static int
>> +pds_vdpa_setup_driver(struct pds_vdpa_device *pdsv)
>> +{
>> +       struct device *dev = &pdsv->vdpa_dev.dev;
>> +       int err = 0;
>> +       int i;
>> +
>> +       /* Verify all vqs[] are in ready state */
>> +       for (i = 0; i < pdsv->hw.num_vqs; i++) {
>> +               if (!pdsv->hw.vqs[i].ready) {
>> +                       dev_warn(dev, "%s: qid %d not ready\n", __func__, i);
>> +                       err = -ENOENT;
>> +               }
>> +       }
>> +
>> +       return err;
>> +}
>> +
>> +static struct pds_vdpa_device *
>> +vdpa_to_pdsv(struct vdpa_device *vdpa_dev)
>> +{
>> +       return container_of(vdpa_dev, struct pds_vdpa_device, vdpa_dev);
>> +}
>> +
>> +static struct pds_vdpa_hw *
>> +vdpa_to_hw(struct vdpa_device *vdpa_dev)
>> +{
>> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
>> +
>> +       return &pdsv->hw;
>> +}
>> +
>> +static int
>> +pds_vdpa_set_vq_address(struct vdpa_device *vdpa_dev, u16 qid,
>> +                       u64 desc_addr, u64 driver_addr, u64 device_addr)
>> +{
>> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
>> +
>> +       hw->vqs[qid].desc_addr = desc_addr;
>> +       hw->vqs[qid].avail_addr = driver_addr;
>> +       hw->vqs[qid].used_addr = device_addr;
>> +
>> +       return 0;
>> +}
>> +
>> +static void
>> +pds_vdpa_set_vq_num(struct vdpa_device *vdpa_dev, u16 qid, u32 num)
>> +{
>> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
>> +
>> +       hw->vqs[qid].q_len = num;
>> +}
>> +
>> +static void
>> +pds_vdpa_kick_vq(struct vdpa_device *vdpa_dev, u16 qid)
>> +{
>> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
>> +
>> +       iowrite16(qid, pdsv->hw.vqs[qid].notify);
>> +}
>> +
>> +static void
>> +pds_vdpa_set_vq_cb(struct vdpa_device *vdpa_dev, u16 qid,
>> +                  struct vdpa_callback *cb)
>> +{
>> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
>> +
>> +       hw->vqs[qid].event_cb = *cb;
>> +}
>> +
>> +static irqreturn_t
>> +pds_vdpa_isr(int irq, void *data)
>> +{
>> +       struct pds_core_intr __iomem *intr_ctrl;
>> +       struct pds_vdpa_device *pdsv;
>> +       struct pds_vdpa_vq_info *vq;
>> +
>> +       vq = data;
>> +       pdsv = vq->pdsv;
>> +
>> +       if (vq->event_cb.callback)
>> +               vq->event_cb.callback(vq->event_cb.private);
>> +
>> +       /* Since we don't actually know how many vq descriptors are
>> +        * covered in this interrupt cycle, we simply clean all the
>> +        * credits and re-enable the interrupt.
>> +        */
>> +       intr_ctrl = (struct pds_core_intr __iomem *)pdsv->vdpa_aux->vdpa_vf->vd_mdev.isr;
>> +       pds_core_intr_clean_flags(&intr_ctrl[vq->intr_index],
>> +                                 PDS_CORE_INTR_CRED_REARM);
>> +
>> +       return IRQ_HANDLED;
>> +}
>> +
>> +static void
>> +pds_vdpa_release_irq(struct pds_vdpa_device *pdsv, int qid)
>> +{
>> +       struct pds_vdpa_intr_info *intrs = pdsv->vdpa_aux->vdpa_vf->intrs;
>> +       struct pci_dev *pdev = pdsv->vdpa_aux->vdpa_vf->pdev;
>> +       struct pds_core_intr __iomem *intr_ctrl;
>> +       int index;
>> +
>> +       intr_ctrl = (struct pds_core_intr __iomem *)pdsv->vdpa_aux->vdpa_vf->vd_mdev.isr;
>> +       index = pdsv->hw.vqs[qid].intr_index;
>> +       if (index == VIRTIO_MSI_NO_VECTOR)
>> +               return;
>> +
>> +       if (intrs[index].irq == VIRTIO_MSI_NO_VECTOR)
>> +               return;
>> +
>> +       if (qid & 0x1) {
>> +               pdsv->hw.vqs[qid].intr_index = VIRTIO_MSI_NO_VECTOR;
>> +       } else {
>> +               pds_core_intr_mask(&intr_ctrl[index], PDS_CORE_INTR_MASK_SET);
>> +               devm_free_irq(&pdev->dev, intrs[index].irq, &pdsv->hw.vqs[qid]);
>> +               pdsv->hw.vqs[qid].intr_index = VIRTIO_MSI_NO_VECTOR;
>> +               intrs[index].irq = VIRTIO_MSI_NO_VECTOR;
>> +       }
>> +}
>> +
>> +static void
>> +pds_vdpa_set_vq_ready(struct vdpa_device *vdpa_dev, u16 qid, bool ready)
>> +{
>> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
>> +       struct pci_dev *pdev = pdsv->vdpa_aux->vdpa_vf->pdev;
>> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
>> +       struct device *dev = &pdsv->vdpa_dev.dev;
>> +       struct pds_core_intr __iomem *intr_ctrl;
>> +       int err;
>> +
>> +       dev_dbg(dev, "%s: qid %d ready %d => %d\n",
>> +                __func__, qid, hw->vqs[qid].ready, ready);
>> +       if (ready == hw->vqs[qid].ready)
>> +               return;
>> +
>> +       intr_ctrl = (struct pds_core_intr __iomem *)pdsv->vdpa_aux->vdpa_vf->vd_mdev.isr;
> 
> It looks to me like pds has a different layout/semantics for isr than
> the virtio spec. I'd suggest not reusing the spec's isr here, to avoid confusion.

Hmm, yes, that needs some straightening out.

> 
>> +       if (ready) {
> 
> The spec says no interrupts before DRIVER_OK, so it looks simpler if we
> move the interrupt setup to set_status():
> 
> E.g. we can then know in advance whether we have sufficient vectors
> and choose among different mapping policies.

I'll look at that.

> 
>> +               struct pds_vdpa_intr_info *intrs = pdsv->vdpa_aux->vdpa_vf->intrs;
>> +               int index = VIRTIO_MSI_NO_VECTOR;
>> +               int i;
>> +
>> +               /*  Tx and Rx queues share interrupts, and they start with
>> +                *  even numbers, so only find an interrupt for the even numbered
>> +                *  qid, and let the odd number use what the previous queue got.
>> +                */
>> +               if (qid & 0x1) {
>> +                       int even = qid & ~0x1;
>> +
>> +                       index = hw->vqs[even].intr_index;
>> +               } else {
>> +                       for (i = 0; i < pdsv->vdpa_aux->vdpa_vf->nintrs; i++) {
>> +                               if (intrs[i].irq == VIRTIO_MSI_NO_VECTOR) {
>> +                                       index = i;
>> +                                       break;
>> +                               }
>> +                       }
>> +               }
>> +
>> +               if (qid & 0x1) {
>> +                       hw->vqs[qid].intr_index = index;
>> +               } else if (index != VIRTIO_MSI_NO_VECTOR) {
>> +                       int irq;
>> +
>> +                       irq = pci_irq_vector(pdev, index);
>> +                       snprintf(intrs[index].name, sizeof(intrs[index].name),
>> +                                "vdpa-%s-%d", dev_name(dev), index);
>> +
>> +                       err = devm_request_irq(&pdev->dev, irq, pds_vdpa_isr, 0,
>> +                                              intrs[index].name, &hw->vqs[qid]);
>> +                       if (err) {
>> +                               dev_info(dev, "%s: no irq for qid %d: %pe\n",
>> +                                        __func__, qid, ERR_PTR(err));
> 
> Should we fail?
> 
>> +                       } else {
>> +                               intrs[index].irq = irq;
>> +                               hw->vqs[qid].intr_index = index;
>> +                               pds_core_intr_mask(&intr_ctrl[index],
>> +                                                  PDS_CORE_INTR_MASK_CLEAR);
> 
> I guess the reason you don't simply use the VF MSI-X directly is that
> the DPU may support vDPA subdevices in the future?
> 
>> +                       }
>> +               } else {
>> +                       dev_info(dev, "%s: no intr slot for qid %d\n",
>> +                                __func__, qid);
> 
> Do we need to fail here?
> 
>> +               }
>> +
>> +               /* Pass vq setup info to DSC */
>> +               err = pds_vdpa_cmd_init_vq(pdsv, qid, &hw->vqs[qid]);
>> +               if (err) {
>> +                       pds_vdpa_release_irq(pdsv, qid);
>> +                       ready = false;
>> +               }
>> +       } else {
>> +               pds_vdpa_release_irq(pdsv, qid);
>> +               (void) pds_vdpa_cmd_reset_vq(pdsv, qid);
>> +       }
>> +
>> +       hw->vqs[qid].ready = ready;
>> +}
>> +
>> +static bool
>> +pds_vdpa_get_vq_ready(struct vdpa_device *vdpa_dev, u16 qid)
>> +{
>> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
>> +
>> +       return hw->vqs[qid].ready;
>> +}
>> +
>> +static int
>> +pds_vdpa_set_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
>> +                     const struct vdpa_vq_state *state)
>> +{
>> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
>> +
>> +       hw->vqs[qid].used_idx = state->split.avail_index;
>> +       hw->vqs[qid].avail_idx = state->split.avail_index;
>> +
>> +       return 0;
>> +}
>> +
>> +static int
>> +pds_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
>> +                     struct vdpa_vq_state *state)
>> +{
>> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
>> +
>> +       state->split.avail_index = hw->vqs[qid].avail_idx;
> 
> Who is in charge of reading avail_idx from the hardware?

We didn't have that available in the early FW, so it isn't here yet.
Work in progress.

> 
>> +
>> +       return 0;
>> +}
>> +
>> +static struct vdpa_notification_area
>> +pds_vdpa_get_vq_notification(struct vdpa_device *vdpa_dev, u16 qid)
>> +{
>> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
>> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
>> +       struct virtio_pci_modern_device *vd_mdev;
>> +       struct vdpa_notification_area area;
>> +
>> +       area.addr = hw->vqs[qid].notify_pa;
>> +
>> +       vd_mdev = &pdsv->vdpa_aux->vdpa_vf->vd_mdev;
>> +       if (!vd_mdev->notify_offset_multiplier)
>> +               area.size = PAGE_SIZE;
>> +       else
>> +               area.size = vd_mdev->notify_offset_multiplier;
>> +
>> +       return area;
>> +}
>> +
>> +static int
>> +pds_vdpa_get_vq_irq(struct vdpa_device *vdpa_dev, u16 qid)
>> +{
>> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
>> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
>> +       int irq = VIRTIO_MSI_NO_VECTOR;
>> +       int index;
>> +
>> +       if (pdsv->vdpa_aux->vdpa_vf->intrs) {
>> +               index = hw->vqs[qid].intr_index;
>> +               irq = pdsv->vdpa_aux->vdpa_vf->intrs[index].irq;
> 
> The notification area mapping might only work well when each vq has
> its own irq. Otherwise the guest may see spurious interrupts, which
> can degrade performance.

We haven't been expecting to use shared interrupts - are we being overly 
optimistic?


> 
>> +       }
>> +
>> +       return irq;
>> +}
>> +
>> +static u32
>> +pds_vdpa_get_vq_align(struct vdpa_device *vdpa_dev)
>> +{
>> +       return PAGE_SIZE;
>> +}
>> +
>> +static u32
>> +pds_vdpa_get_vq_group(struct vdpa_device *vdpa_dev, u16 idx)
>> +{
>> +       return 0;
>> +}
>> +
>> +static u64
>> +pds_vdpa_get_device_features(struct vdpa_device *vdpa_dev)
>> +{
>> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
>> +
>> +       return le64_to_cpu(pdsv->vdpa_aux->ident.hw_features);
>> +}
>> +
>> +static int
>> +pds_vdpa_set_driver_features(struct vdpa_device *vdpa_dev, u64 features)
>> +{
>> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
>> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
>> +       struct device *dev = &pdsv->vdpa_dev.dev;
>> +       u64 nego_features;
>> +       u64 set_features;
>> +       u64 missing;
>> +       int err;
>> +
>> +       if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)) && features) {
>> +               dev_err(dev, "VIRTIO_F_ACCESS_PLATFORM is not negotiated\n");
>> +               return -EOPNOTSUPP;
> 
> Should we fail FEATURES_OK in this case, and in the other error
> conditions below?

Perhaps I'm missing a nuance in the interface... isn't that what we're
doing by returning a non-zero status?

> 
>> +       }
>> +
>> +       hw->req_features = features;
>> +
>> +       /* Check for valid feature bits */
>> +       nego_features = features & le64_to_cpu(pdsv->vdpa_aux->ident.hw_features);
>> +       missing = hw->req_features & ~nego_features;
>> +       if (missing) {
>> +               dev_err(dev, "Can't support all requested features in %#llx, missing %#llx features\n",
>> +                       hw->req_features, missing);
>> +               return -EOPNOTSUPP;
>> +       }
>> +
>> +       dev_dbg(dev, "%s: %#llx => %#llx\n",
>> +                __func__, hw->actual_features, nego_features);
>> +
>> +       if (hw->actual_features == nego_features)
>> +               return 0;
>> +
>> +       /* Update hw feature configuration, strip MAC bit if locally set */
>> +       if (pdsv->vdpa_aux->local_mac_bit)
>> +               set_features = nego_features & ~BIT_ULL(VIRTIO_NET_F_MAC);
> 
> Need some documentation to explain how local_mac_bit works.

I'll look at expanding the comment in pds_vdpa_get_mgmt_info() where 
this would get set to true.  Basically, it is tracking whether or not 
the driver is faking the VIRTIO_NET_F_MAC capability for the device.

> 
>> +       else
>> +               set_features = nego_features;
>> +       err = pds_vdpa_cmd_set_features(pdsv, set_features);
>> +       if (!err)
>> +               hw->actual_features = nego_features;
>> +
>> +       return err;
>> +}
>> +
>> +static u64
>> +pds_vdpa_get_driver_features(struct vdpa_device *vdpa_dev)
>> +{
>> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
>> +
>> +       return hw->actual_features;
>> +}
>> +
>> +static void
>> +pds_vdpa_set_config_cb(struct vdpa_device *vdpa_dev, struct vdpa_callback *cb)
>> +{
>> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
>> +
>> +       hw->config_cb.callback = cb->callback;
>> +       hw->config_cb.private = cb->private;
>> +}
>> +
>> +static u16
>> +pds_vdpa_get_vq_num_max(struct vdpa_device *vdpa_dev)
>> +{
>> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
>> +       u32 max_qlen;
>> +
>> +       max_qlen = min_t(u32, PDS_VDPA_MAX_QLEN,
>> +                             1 << le16_to_cpu(pdsv->vdpa_aux->ident.max_qlen));
> 
> Assuming we can fetch the max_qlen from the device, is there any reason
> to have another layer like PDS_VDPA_MAX_QLEN?

Yes, I think this can be simplified to just the read.

> 
>> +
>> +       return (u16)max_qlen;
>> +}
>> +
>> +static u32
>> +pds_vdpa_get_device_id(struct vdpa_device *vdpa_dev)
>> +{
>> +       return VIRTIO_ID_NET;
>> +}
>> +
>> +static u32
>> +pds_vdpa_get_vendor_id(struct vdpa_device *vdpa_dev)
>> +{
>> +       return PCI_VENDOR_ID_PENSANDO;
>> +}
>> +
>> +static u8
>> +pds_vdpa_get_status(struct vdpa_device *vdpa_dev)
>> +{
>> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
>> +
>> +       return hw->status;
> 
> How is this synchronized with the device, or is it fully emulated by this driver?

For now this has been fronted by the driver, but it probably should
come out of the virtio_net_config block.

> 
>> +}
>> +
>> +static void
>> +pds_vdpa_set_status(struct vdpa_device *vdpa_dev, u8 status)
>> +{
>> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
>> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
>> +       struct device *dev = &pdsv->vdpa_dev.dev;
>> +       int err;
>> +
>> +       if (hw->status == status)
>> +               return;
>> +
>> +       /* If the DRIVER_OK bit turns on, time to start the queues */
>> +       if ((status ^ hw->status) & VIRTIO_CONFIG_S_DRIVER_OK) {
>> +               if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
>> +                       err = pds_vdpa_setup_driver(pdsv);
>> +                       if (err) {
>> +                               dev_err(dev, "failed to setup driver: %pe\n", ERR_PTR(err));
>> +                               status = hw->status | VIRTIO_CONFIG_S_FAILED;
>> +                       }
>> +               } else {
>> +                       dev_warn(dev, "did not expect DRIVER_OK to be cleared\n");
>> +               }
>> +       }
>> +
>> +       err = pds_vdpa_cmd_set_status(pdsv, status);
>> +       if (!err)
>> +               hw->status = status;
>> +}
>> +
>> +static int
>> +pds_vdpa_reset(struct vdpa_device *vdpa_dev)
>> +{
>> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
>> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
>> +       int i;
>> +
>> +       if (hw->status == 0)
>> +               return 0;
>> +
>> +       if (hw->status & VIRTIO_CONFIG_S_DRIVER_OK) {
>> +
>> +               /* Reset the vqs */
>> +               for (i = 0; i < hw->num_vqs; i++) {
>> +                       pds_vdpa_release_irq(pdsv, i);
>> +                       (void) pds_vdpa_cmd_reset_vq(pdsv, i);
> 
> (void) is unnecessary.

yep

> 
>> +
>> +                       memset(&pdsv->hw.vqs[i], 0, sizeof(pdsv->hw.vqs[0]));
>> +                       pdsv->hw.vqs[i].ready = false;
>> +               }
>> +       }
>> +
>> +       hw->status = 0;
>> +       (void) pds_vdpa_cmd_set_status(pdsv, 0);
>> +
>> +       return 0;
>> +}
>> +
>> +static size_t
>> +pds_vdpa_get_config_size(struct vdpa_device *vdpa_dev)
>> +{
>> +       return sizeof(struct virtio_net_config);
>> +}
>> +
>> +static void
>> +pds_vdpa_get_config(struct vdpa_device *vdpa_dev,
>> +                   unsigned int offset,
>> +                   void *buf, unsigned int len)
>> +{
>> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
>> +
>> +       if (offset + len <= sizeof(struct virtio_net_config))
>> +               memcpy(buf, (u8 *)&pdsv->vn_config + offset, len);
>> +}
>> +
>> +static void
>> +pds_vdpa_set_config(struct vdpa_device *vdpa_dev,
>> +                   unsigned int offset, const void *buf,
>> +                   unsigned int len)
>> +{
>> +       /* In the virtio_net context, this callback seems to only be
>> +        * called in drivers supporting the older non-VERSION_1 API,
>> +        * so we can leave this an empty function, but we need to
>> +        * define the function in case it does get called, as there
>> +        * are currently no checks for existence before calling in
>> +        * that path.
>> +        *
>> +        * The implementation would be something like:
>> +        * if (offset + len <= sizeof(struct virtio_net_config))
>> +        *      memcpy((u8 *)&pdsv->vn_config + offset, buf, len);
>> +        */
> 
> And we need to notify the hardware that config has been changed.

Sure

> 
>> +}
>> +
>> +static const struct vdpa_config_ops pds_vdpa_ops = {
>> +       .set_vq_address         = pds_vdpa_set_vq_address,
>> +       .set_vq_num             = pds_vdpa_set_vq_num,
>> +       .kick_vq                = pds_vdpa_kick_vq,
>> +       .set_vq_cb              = pds_vdpa_set_vq_cb,
>> +       .set_vq_ready           = pds_vdpa_set_vq_ready,
>> +       .get_vq_ready           = pds_vdpa_get_vq_ready,
>> +       .set_vq_state           = pds_vdpa_set_vq_state,
>> +       .get_vq_state           = pds_vdpa_get_vq_state,
>> +       .get_vq_notification    = pds_vdpa_get_vq_notification,
>> +       .get_vq_irq             = pds_vdpa_get_vq_irq,
>> +       .get_vq_align           = pds_vdpa_get_vq_align,
>> +       .get_vq_group           = pds_vdpa_get_vq_group,
>> +
>> +       .get_device_features    = pds_vdpa_get_device_features,
>> +       .set_driver_features    = pds_vdpa_set_driver_features,
>> +       .get_driver_features    = pds_vdpa_get_driver_features,
>> +       .set_config_cb          = pds_vdpa_set_config_cb,
>> +       .get_vq_num_max         = pds_vdpa_get_vq_num_max,
>> +/*     .get_vq_num_min (optional) */
>> +       .get_device_id          = pds_vdpa_get_device_id,
>> +       .get_vendor_id          = pds_vdpa_get_vendor_id,
>> +       .get_status             = pds_vdpa_get_status,
>> +       .set_status             = pds_vdpa_set_status,
>> +       .reset                  = pds_vdpa_reset,
>> +       .get_config_size        = pds_vdpa_get_config_size,
>> +       .get_config             = pds_vdpa_get_config,
>> +       .set_config             = pds_vdpa_set_config,
>> +
>> +/*     .get_generation (optional) */
>> +/*     .get_iova_range (optional) */
>> +/*     .set_group_asid */
>> +/*     .set_map (optional) */
>> +/*     .dma_map (optional) */
>> +/*     .dma_unmap (optional) */
>> +/*     .free (optional) */
>> +};
>> +static struct virtio_device_id pds_vdpa_id_table[] = {
>> +       {VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID},
>> +       {0},
>> +};
>> +
>> +static int
>> +pds_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
>> +                const struct vdpa_dev_set_config *add_config)
>> +{
>> +       struct pds_vdpa_aux *vdpa_aux;
>> +       struct pds_vdpa_device *pdsv;
>> +       struct vdpa_mgmt_dev *mgmt;
>> +       u16 fw_max_vqs, vq_pairs;
>> +       struct device *dma_dev;
>> +       struct pds_vdpa_hw *hw;
>> +       struct pci_dev *pdev;
>> +       struct device *dev;
>> +       u8 mac[ETH_ALEN];
>> +       int err;
>> +       int i;
>> +
>> +       vdpa_aux = container_of(mdev, struct pds_vdpa_aux, vdpa_mdev);
>> +       dev = &vdpa_aux->padev->aux_dev.dev;
>> +       mgmt = &vdpa_aux->vdpa_mdev;
>> +
>> +       if (vdpa_aux->pdsv) {
>> +               dev_warn(dev, "Multiple vDPA devices on a VF is not supported.\n");
>> +               return -EOPNOTSUPP;
>> +       }
>> +
>> +       pdsv = vdpa_alloc_device(struct pds_vdpa_device, vdpa_dev,
>> +                                dev, &pds_vdpa_ops, 1, 1, name, false);
>> +       if (IS_ERR(pdsv)) {
>> +               dev_err(dev, "Failed to allocate vDPA structure: %pe\n", pdsv);
>> +               return PTR_ERR(pdsv);
>> +       }
>> +
>> +       vdpa_aux->pdsv = pdsv;
>> +       pdsv->vdpa_aux = vdpa_aux;
>> +       pdsv->vdpa_aux->padev->priv = pdsv;
>> +
>> +       pdev = vdpa_aux->vdpa_vf->pdev;
>> +       pdsv->vdpa_dev.dma_dev = &pdev->dev;
>> +       dma_dev = pdsv->vdpa_dev.dma_dev;
>> +       hw = &pdsv->hw;
>> +
>> +       pdsv->vn_config_pa = dma_map_single(dma_dev, &pdsv->vn_config,
>> +                                           sizeof(pdsv->vn_config), DMA_FROM_DEVICE);
> 
> I think we should use coherent mapping instead of streaming mapping
> otherwise we may end up with coherency issues when accessing the
> device configuration space.

Makes sense
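For illustration, a coherent allocation might look roughly like the following kernel-style fragment (not compilable standalone; it also assumes vn_config becomes a pointer rather than the embedded struct in the posted patch):

```
/* Sketch only: replace the streaming dma_map_single() of vn_config
 * with a coherent allocation so device writes are always visible. */
pdsv->vn_config = dma_alloc_coherent(dma_dev, sizeof(*pdsv->vn_config),
				     &pdsv->vn_config_pa, GFP_KERNEL);
if (!pdsv->vn_config) {
	err = -ENOMEM;
	goto err_out;
}

/* ... and on teardown: */
dma_free_coherent(dma_dev, sizeof(*pdsv->vn_config),
		  pdsv->vn_config, pdsv->vn_config_pa);
```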

> 
>> +       if (dma_mapping_error(dma_dev, pdsv->vn_config_pa)) {
>> +               dev_err(dma_dev, "Failed to map vn_config space\n");
>> +               pdsv->vn_config_pa = 0;
>> +               err = -ENOMEM;
>> +               goto err_out;
>> +       }
>> +
>> +       err = pds_vdpa_init_hw(pdsv);
>> +       if (err) {
>> +               dev_err(dev, "Failed to init hw: %pe\n", ERR_PTR(err));
>> +               goto err_unmap;
>> +       }
>> +
>> +       fw_max_vqs = le16_to_cpu(pdsv->vdpa_aux->ident.max_vqs);
>> +       vq_pairs = fw_max_vqs / 2;
>> +
>> +       /* Make sure we have the queues being requested */
>> +       if (add_config->mask & (1 << VDPA_ATTR_DEV_NET_CFG_MAX_VQP))
>> +               vq_pairs = add_config->net.max_vq_pairs;
>> +
>> +       hw->num_vqs = 2 * vq_pairs;
>> +       if (mgmt->supported_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
>> +               hw->num_vqs++;
>> +
>> +       if (hw->num_vqs > fw_max_vqs) {
>> +               dev_err(dev, "%s: queue count requested %u greater than max %u\n",
>> +                        __func__, hw->num_vqs, fw_max_vqs);
>> +               err = -ENOSPC;
>> +               goto err_unmap;
>> +       }
>> +
>> +       if (hw->num_vqs != fw_max_vqs) {
>> +               err = pds_vdpa_cmd_set_max_vq_pairs(pdsv, vq_pairs);
>> +               if (err == -ERANGE) {
>> +                       hw->num_vqs = fw_max_vqs;
>> +                       dev_warn(dev, "Known FW issue - overriding to use max_vq_pairs %d\n",
>> +                                hw->num_vqs / 2);
> 
> Should we fail here, since the device has a different max_vqp than expected?

I wasn't sure if we should annoy users with a failure here, or try to
adjust and continue on with something that should work.

> 
>> +               } else if (err) {
>> +                       dev_err(dev, "Failed to update max_vq_pairs: %pe\n",
>> +                               ERR_PTR(err));
>> +                       goto err_unmap;
>> +               }
>> +       }
>> +
>> +       /* Set a mac, either from the user config if provided
>> +        * or set a random mac if default is 00:..:00
>> +        */
>> +       if (add_config->mask & (1 << VDPA_ATTR_DEV_NET_CFG_MACADDR)) {
>> +               ether_addr_copy(mac, add_config->net.mac);
>> +               pds_vdpa_cmd_set_mac(pdsv, mac);
>> +       } else if (is_zero_ether_addr(pdsv->vn_config.mac)) {
>> +               eth_random_addr(mac);
>> +               pds_vdpa_cmd_set_mac(pdsv, mac);
>> +       }
>> +
>> +       for (i = 0; i < hw->num_vqs; i++) {
>> +               hw->vqs[i].qid = i;
>> +               hw->vqs[i].pdsv = pdsv;
>> +               hw->vqs[i].intr_index = VIRTIO_MSI_NO_VECTOR;
> 
> Let's rename this as msix_vector to be aligned with the virtio spec.

Sure.

> 
>> +               hw->vqs[i].notify = vp_modern_map_vq_notify(&pdsv->vdpa_aux->vdpa_vf->vd_mdev,
>> +                                                           i, &hw->vqs[i].notify_pa);
>> +       }
>> +
>> +       pdsv->vdpa_dev.mdev = &vdpa_aux->vdpa_mdev;
>> +
>> +       /* We use the _vdpa_register_device() call rather than the
>> +        * vdpa_register_device() to avoid a deadlock because this
>> +        * dev_add() is called with the vdpa_dev_lock already set
>> +        * by vdpa_nl_cmd_dev_add_set_doit()
>> +        */
>> +       err = _vdpa_register_device(&pdsv->vdpa_dev, hw->num_vqs);
>> +       if (err) {
>> +               dev_err(dev, "Failed to register to vDPA bus: %pe\n", ERR_PTR(err));
>> +               goto err_unmap;
>> +       }
>> +
>> +       pds_vdpa_debugfs_add_vdpadev(pdsv);
>> +       dev_info(&pdsv->vdpa_dev.dev, "Added with mac %pM\n", pdsv->vn_config.mac);
> 
> dev_dbg?

That was the eventual idea.

> 
>> +
>> +       return 0;
>> +
>> +err_unmap:
>> +       dma_unmap_single(dma_dev, pdsv->vn_config_pa,
>> +                        sizeof(pdsv->vn_config), DMA_FROM_DEVICE);
>> +err_out:
>> +       put_device(&pdsv->vdpa_dev.dev);
>> +       vdpa_aux->pdsv = NULL;
>> +       return err;
>> +}
>> +
>> +static void
>> +pds_vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *vdpa_dev)
>> +{
>> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
>> +       struct pds_vdpa_aux *vdpa_aux;
>> +
>> +       dev_info(&vdpa_dev->dev, "Removed\n");
>> +
>> +       vdpa_aux = container_of(mdev, struct pds_vdpa_aux, vdpa_mdev);
>> +       _vdpa_unregister_device(vdpa_dev);
>> +       pds_vdpa_debugfs_del_vdpadev(pdsv);
>> +
>> +       if (vdpa_aux->pdsv->vn_config_pa)
>> +               dma_unmap_single(vdpa_dev->dma_dev, vdpa_aux->pdsv->vn_config_pa,
>> +                                sizeof(vdpa_aux->pdsv->vn_config), DMA_FROM_DEVICE);
>> +
>> +       vdpa_aux->pdsv = NULL;
>> +}
>> +
>> +static const struct vdpa_mgmtdev_ops pds_vdpa_mgmt_dev_ops = {
>> +       .dev_add = pds_vdpa_dev_add,
>> +       .dev_del = pds_vdpa_dev_del
>> +};
>> +
>> +int
>> +pds_vdpa_get_mgmt_info(struct pds_vdpa_aux *vdpa_aux)
>> +{
>> +       struct pds_vdpa_pci_device *vdpa_pdev;
>> +       struct pds_vdpa_ident_cmd ident_cmd = {
>> +               .opcode = PDS_VDPA_CMD_IDENT,
>> +               .vf_id = cpu_to_le16(vdpa_aux->vdpa_vf->vf_id),
>> +       };
>> +       struct pds_vdpa_comp ident_comp = {0};
>> +       struct vdpa_mgmt_dev *mgmt;
>> +       struct device *dma_dev;
>> +       dma_addr_t ident_pa;
>> +       struct pci_dev *pdev;
>> +       struct device *dev;
>> +       __le64 mac_bit;
>> +       u16 max_vqs;
>> +       int err;
>> +       int i;
>> +
>> +       vdpa_pdev = vdpa_aux->vdpa_vf;
>> +       pdev = vdpa_pdev->pdev;
>> +       dev = &vdpa_aux->padev->aux_dev.dev;
>> +       mgmt = &vdpa_aux->vdpa_mdev;
>> +
>> +       /* Get resource info from the device */
>> +       dma_dev = &pdev->dev;
>> +       ident_pa = dma_map_single(dma_dev, &vdpa_aux->ident,
>> +                                 sizeof(vdpa_aux->ident), DMA_FROM_DEVICE);
> 
> I wonder how this work. The ident_pa is mapped through VF, but the
> command is sent to PF adminq if I understand correctly. If yes, this
> might work but looks tricky. We'd better explain this is safe since
> vDPA is not yet created so no userspace can use that. Or I wonder if
> we can just piggyback the ident via the adminq response so we don't
> need to worry the security implications.

I'll work with our FW folks on this.

Thanks for all the comments,
sln

> 
> Thanks
> 
>> +       if (dma_mapping_error(dma_dev, ident_pa)) {
>> +               dev_err(dma_dev, "Failed to map ident space\n");
>> +               return -ENOMEM;
>> +       }
>> +
>> +       ident_cmd.ident_pa = cpu_to_le64(ident_pa);
>> +       ident_cmd.len = cpu_to_le32(sizeof(vdpa_aux->ident));
>> +       err = vdpa_aux->padev->ops->adminq_cmd(vdpa_aux->padev,
>> +                                              (union pds_core_adminq_cmd *)&ident_cmd,
>> +                                              sizeof(ident_cmd),
>> +                                              (union pds_core_adminq_comp *)&ident_comp,
>> +                                              0);
>> +       dma_unmap_single(dma_dev, ident_pa,
>> +                        sizeof(vdpa_aux->ident), DMA_FROM_DEVICE);
>> +       if (err) {
>> +               dev_err(dev, "Failed to ident hw, status %d: %pe\n",
>> +                       ident_comp.status, ERR_PTR(err));
>> +               return err;
>> +       }
>> +
>> +       /* The driver adds a default mac address if the device doesn't,
>> +        * so we need to be sure we advertise VIRTIO_NET_F_MAC
>> +        */
>> +       mac_bit = cpu_to_le64(BIT_ULL(VIRTIO_NET_F_MAC));
>> +       if (!(vdpa_aux->ident.hw_features & mac_bit)) {
>> +               vdpa_aux->ident.hw_features |= mac_bit;
>> +               vdpa_aux->local_mac_bit = true;
>> +       }
>> +
>> +       max_vqs = le16_to_cpu(vdpa_aux->ident.max_vqs);
>> +       mgmt->max_supported_vqs = min_t(u16, PDS_VDPA_MAX_QUEUES, max_vqs);
>> +       if (max_vqs > PDS_VDPA_MAX_QUEUES)
>> +               dev_info(dev, "FYI - Device supports more vqs (%d) than driver (%d)\n",
>> +                        max_vqs, PDS_VDPA_MAX_QUEUES);
>> +
>> +       mgmt->ops = &pds_vdpa_mgmt_dev_ops;
>> +       mgmt->id_table = pds_vdpa_id_table;
>> +       mgmt->device = dev;
>> +       mgmt->supported_features = le64_to_cpu(vdpa_aux->ident.hw_features);
>> +       mgmt->config_attr_mask = BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR);
>> +       mgmt->config_attr_mask |= BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP);
>> +
>> +       /* Set up interrupts now that we know how many we might want
>> +        * TX and RX pairs will share interrupts, so halve the vq count
>> +        * Add another for a control queue if supported
>> +        */
>> +       vdpa_pdev->nintrs = mgmt->max_supported_vqs / 2;
>> +       if (mgmt->supported_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
>> +               vdpa_pdev->nintrs++;
>> +
>> +       err = pci_alloc_irq_vectors(pdev, vdpa_pdev->nintrs, vdpa_pdev->nintrs,
>> +                                   PCI_IRQ_MSIX);
>> +       if (err < 0) {
>> +               dev_err(dma_dev, "Couldn't get %d msix vectors: %pe\n",
>> +                       vdpa_pdev->nintrs, ERR_PTR(err));
>> +               return err;
>> +       }
>> +       vdpa_pdev->nintrs = err;
>> +       err = 0;
>> +
>> +       vdpa_pdev->intrs = devm_kcalloc(&pdev->dev, vdpa_pdev->nintrs,
>> +                                       sizeof(*vdpa_pdev->intrs),
>> +                                       GFP_KERNEL);
>> +       if (!vdpa_pdev->intrs) {
>> +               vdpa_pdev->nintrs = 0;
>> +               pci_free_irq_vectors(pdev);
>> +               return -ENOMEM;
>> +       }
>> +
>> +       for (i = 0; i < vdpa_pdev->nintrs; i++)
>> +               vdpa_pdev->intrs[i].irq = VIRTIO_MSI_NO_VECTOR;
>> +
>> +       return 0;
>> +}
>> --
>> 2.17.1
>>
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC PATCH net-next 19/19] pds_vdpa: add Kconfig entry and pds_vdpa.rst
  2022-11-22  6:35   ` Jason Wang
  2022-11-22 22:33     ` Shannon Nelson
@ 2022-11-30  0:13     ` Shannon Nelson
  1 sibling, 0 replies; 61+ messages in thread
From: Shannon Nelson @ 2022-11-30  0:13 UTC (permalink / raw)
  To: Jason Wang, Shannon Nelson
  Cc: netdev, davem, kuba, mst, virtualization, drivers

On 11/21/22 10:35 PM, Jason Wang wrote:
> On Sat, Nov 19, 2022 at 6:57 AM Shannon Nelson <snelson@pensando.io> wrote:
>>
>> Signed-off-by: Shannon Nelson <snelson@pensando.io>
>> ---
>>   .../ethernet/pensando/pds_vdpa.rst            | 85 +++++++++++++++++++
>>   MAINTAINERS                                   |  1 +
>>   drivers/vdpa/Kconfig                          |  7 ++
>>   3 files changed, 93 insertions(+)
>>   create mode 100644 Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst
>>
>> diff --git a/Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst b/Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst
>> new file mode 100644
>> index 000000000000..c517f337d212
>> --- /dev/null
>> +++ b/Documentation/networking/device_drivers/ethernet/pensando/pds_vdpa.rst
>> @@ -0,0 +1,85 @@
>> +.. SPDX-License-Identifier: GPL-2.0+
>> +.. note: can be edited and viewed with /usr/bin/formiko-vim
>> +
>> +==========================================================
>> +PCI vDPA driver for the Pensando(R) DSC adapter family
>> +==========================================================
>> +
>> +Pensando vDPA VF Device Driver
>> +Copyright(c) 2022 Pensando Systems, Inc
>> +
>> +Overview
>> +========
>> +
>> +The ``pds_vdpa`` driver is a PCI and auxiliary bus driver and supplies
>> +a vDPA device for use by the virtio network stack.  It is used with
>> +the Pensando Virtual Function devices that offer vDPA and virtio queue
>> +services.  It depends on the ``pds_core`` driver and hardware for the PF
>> +and for device configuration services.
>> +
>> +Using the device
>> +================
>> +
>> +The ``pds_vdpa`` device is enabled via multiple configuration steps and
>> +depends on the ``pds_core`` driver to create and enable SR-IOV Virtual
>> +Function devices.
>> +
>> +Shown below are the steps to bind the driver to a VF and also to the
>> +associated auxiliary device created by the ``pds_core`` driver. This
>> +example assumes the pds_core and pds_vdpa modules are already
>> +loaded.
>> +
>> +.. code-block:: bash
>> +
>> +  #!/bin/bash
>> +
>> +  modprobe pds_core
>> +  modprobe pds_vdpa
>> +
>> +  PF_BDF=`grep "vDPA.*1" /sys/kernel/debug/pds_core/*/viftypes | head -1 | awk -F / '{print $6}'`
>> +
>> +  # Enable vDPA VF auxiliary device(s) in the PF
>> +  devlink dev param set pci/$PF_BDF name enable_vnet value true cmode runtime
>> +
>> +  # Create a VF for vDPA use
>> +  echo 1 > /sys/bus/pci/drivers/pds_core/$PF_BDF/sriov_numvfs
>> +
>> +  # Find the vDPA services/devices available
>> +  PDS_VDPA_MGMT=`vdpa mgmtdev show | grep vDPA | head -1 | cut -d: -f1`
>> +
>> +  # Create a vDPA device for use in virtio network configurations
>> +  vdpa dev add name vdpa1 mgmtdev $PDS_VDPA_MGMT mac 00:11:22:33:44:55
>> +
>> +  # Set up an ethernet interface on the vdpa device
>> +  modprobe virtio_vdpa
>> +
>> +
>> +
>> +Enabling the driver
>> +===================
>> +
>> +The driver is enabled via the standard kernel configuration system,
>> +using the make command::
>> +
>> +  make oldconfig/menuconfig/etc.
>> +
>> +The driver is located in the menu structure at:
>> +
>> +  -> Device Drivers
>> +    -> Network device support (NETDEVICES [=y])
>> +      -> Ethernet driver support
>> +        -> Pensando devices
>> +          -> Pensando Ethernet PDS_VDPA Support
>> +
>> +Support
>> +=======
>> +
>> +For general Linux networking support, please use the netdev mailing
>> +list, which is monitored by Pensando personnel::
>> +
>> +  netdev@vger.kernel.org
>> +
>> +For more specific support needs, please use the Pensando driver support
>> +email::
>> +
>> +  drivers@pensando.io
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index a4f989fa8192..a4d96e854757 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -16152,6 +16152,7 @@ L:      netdev@vger.kernel.org
>>   S:     Supported
>>   F:     Documentation/networking/device_drivers/ethernet/pensando/
>>   F:     drivers/net/ethernet/pensando/
>> +F:     drivers/vdpa/pds/
>>   F:     include/linux/pds/
>>
>>   PER-CPU MEMORY ALLOCATOR
>> diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
>> index 50f45d037611..1c44df18f3da 100644
>> --- a/drivers/vdpa/Kconfig
>> +++ b/drivers/vdpa/Kconfig
>> @@ -86,4 +86,11 @@ config ALIBABA_ENI_VDPA
>>            VDPA driver for Alibaba ENI (Elastic Network Interface) which is built upon
>>            virtio 0.9.5 specification.
>>
>> +config PDS_VDPA
>> +       tristate "vDPA driver for Pensando DSC devices"
>> +       select VHOST_RING
> 
> Any reason it needs to select on vringh?

Copied from an example... maybe not needed.

sln

> 
> Thanks
> 
>> +       depends on PDS_CORE
>> +       help
>> +         VDPA network driver for Pensando's PDS Core devices.
>> +
>>   endif # VDPA
>> --
>> 2.17.1
>>


* Re: [RFC PATCH net-next 08/19] pds_core: initial VF configuration
  2022-11-29 17:57                 ` Shannon Nelson
@ 2022-11-30  2:02                   ` Jakub Kicinski
  2022-12-01  0:12                     ` Shannon Nelson
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Kicinski @ 2022-11-30  2:02 UTC (permalink / raw)
  To: Shannon Nelson
  Cc: Shannon Nelson, netdev, davem, mst, jasowang, virtualization, drivers

On Tue, 29 Nov 2022 09:57:25 -0800 Shannon Nelson wrote:
> >> Yes, a PF representor simply so we can get access to the .ndo_set_vf_xxx
> >> interfaces.  There is no network traffic running through the PF.  
> > 
> > In that case not only have you come up with your own name for
> > a SmartNIC, you also managed to misuse one of our existing terms
> > in your own way! It can't pass any traffic it's just a dummy to hook
> > the legacy vf ndos to. It's the opposite of what a repr is.  
> 
> Sorry, this seemed to me a reasonable use of the term.  Is there an 
> alternative wording we should use for this case?
> 
> Are there other existing methods we can use for getting the VF 
> configurations from the user, or does this make sense to keep in our 
> current simple model?

Enough back and forth. I'm not going to come up with a special model
just for you when a model already exists, and you present no technical
argument against it.

I am against merging your code, if you want to override find other
vendors and senior upstream reviewers who will side with you.


* Re: [RFC PATCH net-next 08/19] pds_core: initial VF configuration
  2022-11-30  2:02                   ` Jakub Kicinski
@ 2022-12-01  0:12                     ` Shannon Nelson
  2022-12-01  3:45                       ` Jakub Kicinski
  0 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-12-01  0:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Shannon Nelson, netdev, davem, mst, jasowang, virtualization, drivers

On 11/29/22 6:02 PM, Jakub Kicinski wrote:
> On Tue, 29 Nov 2022 09:57:25 -0800 Shannon Nelson wrote:
>>>> Yes, a PF representor simply so we can get access to the .ndo_set_vf_xxx
>>>> interfaces.  There is no network traffic running through the PF.
>>>
>>> In that case not only have you come up with your own name for
>>> a SmartNIC, you also managed to misuse one of our existing terms
>>> in your own way! It can't pass any traffic it's just a dummy to hook
>>> the legacy vf ndos to. It's the opposite of what a repr is.
>>
>> Sorry, this seemed to me a reasonable use of the term.  Is there an
>> alternative wording we should use for this case?
>>
>> Are there other existing methods we can use for getting the VF
>> configurations from the user, or does this make sense to keep in our
>> current simple model?
> 
> Enough back and forth. I'm not going to come up with a special model
> just for you when a model already exists, and you present no technical
> argument against it.
> 
> I am against merging your code, if you want to override find other
> vendors and senior upstream reviewers who will side with you.

We're not asking for a special model, just to use the PF interface to 
configure VFs as has been the practice in the past.

Anyway, this feature can wait and we can work out alternatives later. 
For now, we'll drop the netdev portion from the driver and rework the 
other bits as discussed in other messages.  I'll likely have a v2 for 
comments sometime next week.

Thanks for your help,
sln


* Re: [RFC PATCH net-next 08/19] pds_core: initial VF configuration
  2022-12-01  0:12                     ` Shannon Nelson
@ 2022-12-01  3:45                       ` Jakub Kicinski
  2022-12-01 19:19                         ` Shannon Nelson
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Kicinski @ 2022-12-01  3:45 UTC (permalink / raw)
  To: Shannon Nelson
  Cc: Shannon Nelson, netdev, davem, mst, jasowang, virtualization, drivers

On Wed, 30 Nov 2022 16:12:23 -0800 Shannon Nelson wrote:
> > Enough back and forth. I'm not going to come up with a special model
> > just for you when a model already exists, and you present no technical
> > argument against it.
> > 
> > I am against merging your code, if you want to override find other
> > vendors and senior upstream reviewers who will side with you.  
> 
> We're not asking for a special model, just to use the PF interface to 
> configure VFs as has been the practice in the past.

It simply does not compute for me. You're exposing a very advanced vDPA
interface, and yet you say you don't need any network configuration
beyond what Niantic had.

There are no upstream-minded users of IPUs, if it was up to me I'd flat
out ban them from the kernel.

> Anyway, this feature can wait and we can work out alternatives later. 
> For now, we'll drop the netdev portion from the driver and rework the 
> other bits as discussed in other messages.  I'll likely have a v2 for 
> comments sometime next week.

Seems reasonable, if it doesn't live in networking and doesn't use any
networking APIs it won't matter to me.


* Re: [RFC PATCH net-next 08/19] pds_core: initial VF configuration
  2022-12-01  3:45                       ` Jakub Kicinski
@ 2022-12-01 19:19                         ` Shannon Nelson
  2022-12-01 22:29                           ` Jakub Kicinski
  0 siblings, 1 reply; 61+ messages in thread
From: Shannon Nelson @ 2022-12-01 19:19 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Shannon Nelson, netdev, davem, mst, jasowang, virtualization, drivers

On 11/30/22 7:45 PM, Jakub Kicinski wrote:
> On Wed, 30 Nov 2022 16:12:23 -0800 Shannon Nelson wrote:
>>
>> We're not asking for a special model, just to use the PF interface to
>> configure VFs as has been the practice in the past.
> 
> It simply does not compute for me. You're exposing a very advanced vDPA
> interface, and yet you say you don't need any network configuration
> beyond what Niantic had.

Would you have the same responses if we were trying to do this same kind 
of PF netdev on a simple Niantic-like device (simple sr-iov support, 
little filtering capability)?

> 
> There are no upstream-minded users of IPUs, if it was up to me I'd flat
> out ban them from the kernel.

Yeah, there's a lot of hidden magic going on behind the PCI devices 
presented to the host, and a lot of it depends on the use cases 
attempting to be addressed by the different product vendors and their 
various cloud and enterprise customers.  I tend to think that the most 
friction here comes from us being more familiar and comfortable with the 
enterprise use cases where we typically own the whole host, and not so 
comfortable with these newer cloud use cases, with control and configuration 
coming from outside the host.

sln



* Re: [RFC PATCH net-next 08/19] pds_core: initial VF configuration
  2022-12-01 19:19                         ` Shannon Nelson
@ 2022-12-01 22:29                           ` Jakub Kicinski
  0 siblings, 0 replies; 61+ messages in thread
From: Jakub Kicinski @ 2022-12-01 22:29 UTC (permalink / raw)
  To: Shannon Nelson
  Cc: Shannon Nelson, netdev, davem, mst, jasowang, virtualization, drivers

On Thu, 1 Dec 2022 11:19:51 -0800 Shannon Nelson wrote:
> > It simply does not compute for me. You're exposing a very advanced vDPA
> > interface, and yet you say you don't need any network configuration
> > beyond what Niantic had.  
> 
> Would you have the same responses if we were trying to do this same kind 
> of PF netdev on a simple Niantic-like device (simple sr-iov support, 
> little filtering capability)?

It is really hard for me to imagine someone building a Niantic-like
device today.

Recently I was thought-experiment-designing the simplest Niantic-like
device for container workloads. And my conclusion was that yes, TC would
probably be the best way to control forwarding. (Sorry, not really an
answer to your question, I don't know of any real Niantics of the day)

> > There are no upstream-minded users of IPUs, if it was up to me I'd flat
> > out ban them from the kernel.  
> 
> Yeah, there's a lot of hidden magic going on behind the PCI devices 
> presented to the host, and a lot of it depends on the use cases 
> attempting to be addressed by the different product vendors and their 
> various cloud and enterprise customers.  I tend to think that the most 
> friction here comes from us being more familiar and comfortable with the 
> enterprise use cases where we typically own the whole host, and not so 
> comfortable these newer cloud use cases with control and configuration 
> coming from outside the host.

I know about cloud as much as I know about enterprise, being a Meta 
employee. But those who do have public clouds seem to develop all the
meaningful tech behind closed doors, under NDAs. And at best "bless" 
us with a code dump which is under an open source license.

The community is where various developers should come together and
design together. If you do a full design internally and then come
upstream to just ship the code then it's a SW distribution channel,
not an open source project. That's what I am not comfortable with.


* Re: [RFC PATCH net-next 18/19] pds_vdpa: add support for vdpa and vdpamgmt interfaces
  2022-11-30  0:11     ` Shannon Nelson
@ 2022-12-05  7:40       ` Jason Wang
  0 siblings, 0 replies; 61+ messages in thread
From: Jason Wang @ 2022-12-05  7:40 UTC (permalink / raw)
  To: Shannon Nelson
  Cc: Shannon Nelson, netdev, davem, kuba, mst, virtualization, drivers

On Wed, Nov 30, 2022 at 8:11 AM Shannon Nelson <shnelson@amd.com> wrote:
>
> On 11/21/22 10:32 PM, Jason Wang wrote:
> > On Sat, Nov 19, 2022 at 6:57 AM Shannon Nelson <snelson@pensando.io> wrote:
> >>
> >> This is the vDPA device support, where we advertise that we can
> >> support the virtio queues and deal with the configuration work
> >> through the pds_core's adminq.
> >>
> >> Signed-off-by: Shannon Nelson <snelson@pensando.io>
> >> ---
> >>   drivers/vdpa/pds/Makefile   |   3 +-
> >>   drivers/vdpa/pds/aux_drv.c  |  33 ++
> >>   drivers/vdpa/pds/debugfs.c  | 167 ++++++++
> >>   drivers/vdpa/pds/debugfs.h  |   4 +
> >>   drivers/vdpa/pds/vdpa_dev.c | 796 ++++++++++++++++++++++++++++++++++++
> >>   5 files changed, 1002 insertions(+), 1 deletion(-)
> >>   create mode 100644 drivers/vdpa/pds/vdpa_dev.c
> >>
> >> diff --git a/drivers/vdpa/pds/Makefile b/drivers/vdpa/pds/Makefile
> >> index fafd356ddf86..7fde4a4a1620 100644
> >> --- a/drivers/vdpa/pds/Makefile
> >> +++ b/drivers/vdpa/pds/Makefile
> >> @@ -7,4 +7,5 @@ pds_vdpa-y := aux_drv.o \
> >>                cmds.o \
> >>                pci_drv.o \
> >>                debugfs.o \
> >> -             virtio_pci.o
> >> +             virtio_pci.o \
> >> +             vdpa_dev.o
> >> diff --git a/drivers/vdpa/pds/aux_drv.c b/drivers/vdpa/pds/aux_drv.c
> >> index aef3c984dc90..83b9a5a79325 100644
> >> --- a/drivers/vdpa/pds/aux_drv.c
> >> +++ b/drivers/vdpa/pds/aux_drv.c
> >> @@ -12,6 +12,7 @@
> >>   #include <linux/pds/pds_vdpa.h>
> >>
> >>   #include "aux_drv.h"
> >> +#include "vdpa_dev.h"
> >>   #include "pci_drv.h"
> >>   #include "debugfs.h"
> >>
> >> @@ -25,10 +26,25 @@ static void
> >>   pds_vdpa_aux_notify_handler(struct pds_auxiliary_dev *padev,
> >>                              union pds_core_notifyq_comp *event)
> >>   {
> >> +       struct pds_vdpa_device *pdsv = padev->priv;
> >>          struct device *dev = &padev->aux_dev.dev;
> >>          u16 ecode = le16_to_cpu(event->ecode);
> >>
> >>          dev_info(dev, "%s: event code %d\n", __func__, ecode);
> >> +
> >> +       /* Give the upper layers a hint that something interesting
> >> +        * may have happened.  It seems that the only thing this
> >> +        * triggers in the virtio-net drivers above us is a check
> >> +        * of link status.
> >> +        *
> >> +        * We don't set the NEEDS_RESET flag for EVENT_RESET
> >> +        * because we're likely going through a recovery or
> >> +        * fw_update and will be back up and running soon.
> >> +        */
> >> +       if (ecode == PDS_EVENT_RESET || ecode == PDS_EVENT_LINK_CHANGE) {
> >> +               if (pdsv->hw.config_cb.callback)
> >> +                       pdsv->hw.config_cb.callback(pdsv->hw.config_cb.private);
> >> +       }
> >>   }
> >>
> >>   static int
> >> @@ -80,10 +96,25 @@ pds_vdpa_aux_probe(struct auxiliary_device *aux_dev,
> >>                  goto err_register_client;
> >>          }
> >>
> >> +       /* Get device ident info and set up the vdpa_mgmt_dev */
> >> +       err = pds_vdpa_get_mgmt_info(vdpa_aux);
> >> +       if (err)
> >> +               goto err_register_client;
> >> +
> >> +       /* Let vdpa know that we can provide devices */
> >> +       err = vdpa_mgmtdev_register(&vdpa_aux->vdpa_mdev);
> >> +       if (err) {
> >> +               dev_err(dev, "%s: Failed to initialize vdpa_mgmt interface: %pe\n",
> >> +                       __func__, ERR_PTR(err));
> >> +               goto err_mgmt_reg;
> >> +       }
> >> +
> >>          pds_vdpa_debugfs_add_ident(vdpa_aux);
> >>
> >>          return 0;
> >>
> >> +err_mgmt_reg:
> >> +       padev->ops->unregister_client(padev);
> >>   err_register_client:
> >>          auxiliary_set_drvdata(aux_dev, NULL);
> >>   err_invalid_driver:
> >> @@ -98,6 +129,8 @@ pds_vdpa_aux_remove(struct auxiliary_device *aux_dev)
> >>          struct pds_vdpa_aux *vdpa_aux = auxiliary_get_drvdata(aux_dev);
> >>          struct device *dev = &aux_dev->dev;
> >>
> >> +       vdpa_mgmtdev_unregister(&vdpa_aux->vdpa_mdev);
> >> +
> >>          vdpa_aux->padev->ops->unregister_client(vdpa_aux->padev);
> >>          if (vdpa_aux->vdpa_vf)
> >>                  pci_dev_put(vdpa_aux->vdpa_vf->pdev);
> >> diff --git a/drivers/vdpa/pds/debugfs.c b/drivers/vdpa/pds/debugfs.c
> >> index f766412209df..aa3143126a7e 100644
> >> --- a/drivers/vdpa/pds/debugfs.c
> >> +++ b/drivers/vdpa/pds/debugfs.c
> >> @@ -11,6 +11,7 @@
> >>   #include <linux/pds/pds_auxbus.h>
> >>   #include <linux/pds/pds_vdpa.h>
> >>
> >> +#include "vdpa_dev.h"
> >>   #include "aux_drv.h"
> >>   #include "pci_drv.h"
> >>   #include "debugfs.h"
> >> @@ -19,6 +20,72 @@
> >>
> >>   static struct dentry *dbfs_dir;
> >>
> >> +#define PRINT_SBIT_NAME(__seq, __f, __name)                     \
> >> +       do {                                                    \
> >> +               if (__f & __name)                               \
> >> +                       seq_printf(__seq, " %s", &#__name[16]); \
> >> +       } while (0)
> >> +
> >> +static void
> >> +print_status_bits(struct seq_file *seq, u16 status)
> >> +{
> >> +       seq_puts(seq, "status:");
> >> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> >> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_DRIVER);
> >> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_DRIVER_OK);
> >> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_FEATURES_OK);
> >> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_NEEDS_RESET);
> >> +       PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_FAILED);
> >> +       seq_puts(seq, "\n");
> >> +}
> >> +
> >> +#define PRINT_FBIT_NAME(__seq, __f, __name)                \
> >> +       do {                                               \
> >> +               if (__f & BIT_ULL(__name))                 \
> >> +                       seq_printf(__seq, " %s", #__name); \
> >> +       } while (0)
> >> +
> >> +static void
> >> +print_feature_bits(struct seq_file *seq, u64 features)
> >> +{
> >> +       seq_puts(seq, "features:");
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CSUM);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_CSUM);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_MTU);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_MAC);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_TSO4);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_TSO6);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_ECN);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_UFO);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HOST_TSO4);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HOST_TSO6);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HOST_ECN);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HOST_UFO);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_MRG_RXBUF);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_STATUS);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_VQ);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_RX);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_VLAN);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_RX_EXTRA);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_GUEST_ANNOUNCE);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_MQ);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_CTRL_MAC_ADDR);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_HASH_REPORT);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_RSS);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_RSC_EXT);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_STANDBY);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_NET_F_SPEED_DUPLEX);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_NOTIFY_ON_EMPTY);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_ANY_LAYOUT);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_VERSION_1);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_ACCESS_PLATFORM);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_RING_PACKED);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_ORDER_PLATFORM);
> >> +       PRINT_FBIT_NAME(seq, features, VIRTIO_F_SR_IOV);
> >> +       seq_puts(seq, "\n");
> >> +}
> >> +
> >>   void
> >>   pds_vdpa_debugfs_create(void)
> >>   {
> >> @@ -49,10 +116,18 @@ static int
> >>   identity_show(struct seq_file *seq, void *v)
> >>   {
> >>          struct pds_vdpa_aux *vdpa_aux = seq->private;
> >> +       struct vdpa_mgmt_dev *mgmt;
> >>
> >>          seq_printf(seq, "aux_dev:            %s\n",
> >>                     dev_name(&vdpa_aux->padev->aux_dev.dev));
> >>
> >> +       mgmt = &vdpa_aux->vdpa_mdev;
> >> +       seq_printf(seq, "max_vqs:            %d\n", mgmt->max_supported_vqs);
> >> +       seq_printf(seq, "config_attr_mask:   %#llx\n", mgmt->config_attr_mask);
> >> +       seq_printf(seq, "supported_features: %#llx\n", mgmt->supported_features);
> >> +       print_feature_bits(seq, mgmt->supported_features);
> >> +       seq_printf(seq, "local_mac_bit:      %d\n", vdpa_aux->local_mac_bit);
> >> +
> >>          return 0;
> >>   }
> >>   DEFINE_SHOW_ATTRIBUTE(identity);
> >> @@ -64,4 +139,96 @@ pds_vdpa_debugfs_add_ident(struct pds_vdpa_aux *vdpa_aux)
> >>                              vdpa_aux, &identity_fops);
> >>   }
> >>
> >> +static int
> >> +config_show(struct seq_file *seq, void *v)
> >> +{
> >> +       struct pds_vdpa_device *pdsv = seq->private;
> >> +       struct virtio_net_config *vc = &pdsv->vn_config;
> >> +
> >> +       seq_printf(seq, "mac:                  %pM\n", vc->mac);
> >> +       seq_printf(seq, "max_virtqueue_pairs:  %d\n",
> >> +                  __virtio16_to_cpu(true, vc->max_virtqueue_pairs));
> >> +       seq_printf(seq, "mtu:                  %d\n", __virtio16_to_cpu(true, vc->mtu));
> >> +       seq_printf(seq, "speed:                %d\n", le32_to_cpu(vc->speed));
> >> +       seq_printf(seq, "duplex:               %d\n", vc->duplex);
> >> +       seq_printf(seq, "rss_max_key_size:     %d\n", vc->rss_max_key_size);
> >> +       seq_printf(seq, "rss_max_indirection_table_length: %d\n",
> >> +                  le16_to_cpu(vc->rss_max_indirection_table_length));
> >> +       seq_printf(seq, "supported_hash_types: %#x\n",
> >> +                  le32_to_cpu(vc->supported_hash_types));
> >> +       seq_printf(seq, "vn_status:            %#x\n",
> >> +                  __virtio16_to_cpu(true, vc->status));
> >> +       print_status_bits(seq, __virtio16_to_cpu(true, vc->status));
> >> +
> >> +       seq_printf(seq, "hw_status:            %#x\n", pdsv->hw.status);
> >> +       print_status_bits(seq, pdsv->hw.status);
> >> +       seq_printf(seq, "req_features:         %#llx\n", pdsv->hw.req_features);
> >> +       print_feature_bits(seq, pdsv->hw.req_features);
> >> +       seq_printf(seq, "actual_features:      %#llx\n", pdsv->hw.actual_features);
> >> +       print_feature_bits(seq, pdsv->hw.actual_features);
> >> +       seq_printf(seq, "vdpa_index:           %d\n", pdsv->hw.vdpa_index);
> >> +       seq_printf(seq, "num_vqs:              %d\n", pdsv->hw.num_vqs);
> >> +
> >> +       return 0;
> >> +}
> >> +DEFINE_SHOW_ATTRIBUTE(config);
> >> +
> >> +static int
> >> +vq_show(struct seq_file *seq, void *v)
> >> +{
> >> +       struct pds_vdpa_vq_info *vq = seq->private;
> >> +       struct pds_vdpa_intr_info *intrs;
> >> +
> >> +       seq_printf(seq, "ready:      %d\n", vq->ready);
> >> +       seq_printf(seq, "desc_addr:  %#llx\n", vq->desc_addr);
> >> +       seq_printf(seq, "avail_addr: %#llx\n", vq->avail_addr);
> >> +       seq_printf(seq, "used_addr:  %#llx\n", vq->used_addr);
> >> +       seq_printf(seq, "q_len:      %d\n", vq->q_len);
> >> +       seq_printf(seq, "qid:        %d\n", vq->qid);
> >> +
> >> +       seq_printf(seq, "doorbell:   %#llx\n", vq->doorbell);
> >> +       seq_printf(seq, "avail_idx:  %d\n", vq->avail_idx);
> >> +       seq_printf(seq, "used_idx:   %d\n", vq->used_idx);
> >> +       seq_printf(seq, "intr_index: %d\n", vq->intr_index);
> >> +
> >> +       intrs = vq->pdsv->vdpa_aux->vdpa_vf->intrs;
> >> +       seq_printf(seq, "irq:        %d\n", intrs[vq->intr_index].irq);
> >> +       seq_printf(seq, "irq-name:   %s\n", intrs[vq->intr_index].name);
> >> +
> >> +       seq_printf(seq, "hw_qtype:   %d\n", vq->hw_qtype);
> >> +       seq_printf(seq, "hw_qindex:  %d\n", vq->hw_qindex);
> >> +
> >> +       return 0;
> >> +}
> >> +DEFINE_SHOW_ATTRIBUTE(vq);
> >> +
> >> +void
> >> +pds_vdpa_debugfs_add_vdpadev(struct pds_vdpa_device *pdsv)
> >> +{
> >> +       struct dentry *dentry;
> >> +       const char *name;
> >> +       int i;
> >> +
> >> +       dentry = pdsv->vdpa_aux->vdpa_vf->dentry;
> >> +       name = dev_name(&pdsv->vdpa_dev.dev);
> >> +
> >> +       pdsv->dentry = debugfs_create_dir(name, dentry);
> >> +
> >> +       debugfs_create_file("config", 0400, pdsv->dentry, pdsv, &config_fops);
> >> +
> >> +       for (i = 0; i < pdsv->hw.num_vqs; i++) {
> >> +               char name[8];
> >> +
> >> +               snprintf(name, sizeof(name), "vq%02d", i);
> >> +               debugfs_create_file(name, 0400, pdsv->dentry, &pdsv->hw.vqs[i], &vq_fops);
> >> +       }
> >> +}
> >> +
> >> +void
> >> +pds_vdpa_debugfs_del_vdpadev(struct pds_vdpa_device *pdsv)
> >> +{
> >> +       debugfs_remove_recursive(pdsv->dentry);
> >> +       pdsv->dentry = NULL;
> >> +}
> >> +
> >>   #endif /* CONFIG_DEBUG_FS */
> >> diff --git a/drivers/vdpa/pds/debugfs.h b/drivers/vdpa/pds/debugfs.h
> >> index 939a4c248aac..f0567e4ee4e4 100644
> >> --- a/drivers/vdpa/pds/debugfs.h
> >> +++ b/drivers/vdpa/pds/debugfs.h
> >> @@ -13,12 +13,16 @@ void pds_vdpa_debugfs_destroy(void);
> >>   void pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev);
> >>   void pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev);
> >>   void pds_vdpa_debugfs_add_ident(struct pds_vdpa_aux *vdpa_aux);
> >> +void pds_vdpa_debugfs_add_vdpadev(struct pds_vdpa_device *pdsv);
> >> +void pds_vdpa_debugfs_del_vdpadev(struct pds_vdpa_device *pdsv);
> >>   #else
> >>   static inline void pds_vdpa_debugfs_create(void) { }
> >>   static inline void pds_vdpa_debugfs_destroy(void) { }
> >>   static inline void pds_vdpa_debugfs_add_pcidev(struct pds_vdpa_pci_device *vdpa_pdev) { }
> >>   static inline void pds_vdpa_debugfs_del_pcidev(struct pds_vdpa_pci_device *vdpa_pdev) { }
> >>   static inline void pds_vdpa_debugfs_add_ident(struct pds_vdpa_aux *vdpa_aux) { }
> >> +static inline void pds_vdpa_debugfs_add_vdpadev(struct pds_vdpa_device *pdsv) { }
> >> +static inline void pds_vdpa_debugfs_del_vdpadev(struct pds_vdpa_device *pdsv) { }
> >>   #endif
> >>
> >>   #endif /* _PDS_VDPA_DEBUGFS_H_ */
> >> diff --git a/drivers/vdpa/pds/vdpa_dev.c b/drivers/vdpa/pds/vdpa_dev.c
> >> new file mode 100644
> >> index 000000000000..824be42aff0d
> >> --- /dev/null
> >> +++ b/drivers/vdpa/pds/vdpa_dev.c
> >> @@ -0,0 +1,796 @@
> >> +// SPDX-License-Identifier: GPL-2.0-only
> >> +/* Copyright(c) 2022 Pensando Systems, Inc */
> >> +
> >> +#include <linux/interrupt.h>
> >> +#include <linux/module.h>
> >> +#include <linux/pci.h>
> >> +#include <linux/sysfs.h>
> >> +#include <linux/types.h>
> >> +#include <linux/vdpa.h>
> >> +#include <uapi/linux/virtio_pci.h>
> >> +#include <uapi/linux/vdpa.h>
> >> +
> >> +#include <linux/pds/pds_intr.h>
> >> +#include <linux/pds/pds_core_if.h>
> >> +#include <linux/pds/pds_adminq.h>
> >> +#include <linux/pds/pds_auxbus.h>
> >> +#include <linux/pds/pds_vdpa.h>
> >> +
> >> +#include "vdpa_dev.h"
> >> +#include "pci_drv.h"
> >> +#include "aux_drv.h"
> >> +#include "cmds.h"
> >> +#include "debugfs.h"
> >> +
> >> +static int
> >> +pds_vdpa_setup_driver(struct pds_vdpa_device *pdsv)
> >> +{
> >> +       struct device *dev = &pdsv->vdpa_dev.dev;
> >> +       int err = 0;
> >> +       int i;
> >> +
> >> +       /* Verify all vqs[] are in ready state */
> >> +       for (i = 0; i < pdsv->hw.num_vqs; i++) {
> >> +               if (!pdsv->hw.vqs[i].ready) {
> >> +                       dev_warn(dev, "%s: qid %d not ready\n", __func__, i);
> >> +                       err = -ENOENT;
> >> +               }
> >> +       }
> >> +
> >> +       return err;
> >> +}
> >> +
> >> +static struct pds_vdpa_device *
> >> +vdpa_to_pdsv(struct vdpa_device *vdpa_dev)
> >> +{
> >> +       return container_of(vdpa_dev, struct pds_vdpa_device, vdpa_dev);
> >> +}
> >> +
> >> +static struct pds_vdpa_hw *
> >> +vdpa_to_hw(struct vdpa_device *vdpa_dev)
> >> +{
> >> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> >> +
> >> +       return &pdsv->hw;
> >> +}
> >> +
> >> +static int
> >> +pds_vdpa_set_vq_address(struct vdpa_device *vdpa_dev, u16 qid,
> >> +                       u64 desc_addr, u64 driver_addr, u64 device_addr)
> >> +{
> >> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> >> +
> >> +       hw->vqs[qid].desc_addr = desc_addr;
> >> +       hw->vqs[qid].avail_addr = driver_addr;
> >> +       hw->vqs[qid].used_addr = device_addr;
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +static void
> >> +pds_vdpa_set_vq_num(struct vdpa_device *vdpa_dev, u16 qid, u32 num)
> >> +{
> >> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> >> +
> >> +       hw->vqs[qid].q_len = num;
> >> +}
> >> +
> >> +static void
> >> +pds_vdpa_kick_vq(struct vdpa_device *vdpa_dev, u16 qid)
> >> +{
> >> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> >> +
> >> +       iowrite16(qid, pdsv->hw.vqs[qid].notify);
> >> +}
> >> +
> >> +static void
> >> +pds_vdpa_set_vq_cb(struct vdpa_device *vdpa_dev, u16 qid,
> >> +                  struct vdpa_callback *cb)
> >> +{
> >> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> >> +
> >> +       hw->vqs[qid].event_cb = *cb;
> >> +}
> >> +
> >> +static irqreturn_t
> >> +pds_vdpa_isr(int irq, void *data)
> >> +{
> >> +       struct pds_core_intr __iomem *intr_ctrl;
> >> +       struct pds_vdpa_device *pdsv;
> >> +       struct pds_vdpa_vq_info *vq;
> >> +
> >> +       vq = data;
> >> +       pdsv = vq->pdsv;
> >> +
> >> +       if (vq->event_cb.callback)
> >> +               vq->event_cb.callback(vq->event_cb.private);
> >> +
> >> +       /* Since we don't actually know how many vq descriptors are
> >> +        * covered in this interrupt cycle, we simply clean all the
> >> +        * credits and re-enable the interrupt.
> >> +        */
> >> +       intr_ctrl = (struct pds_core_intr __iomem *)pdsv->vdpa_aux->vdpa_vf->vd_mdev.isr;
> >> +       pds_core_intr_clean_flags(&intr_ctrl[vq->intr_index],
> >> +                                 PDS_CORE_INTR_CRED_REARM);
> >> +
> >> +       return IRQ_HANDLED;
> >> +}
> >> +
> >> +static void
> >> +pds_vdpa_release_irq(struct pds_vdpa_device *pdsv, int qid)
> >> +{
> >> +       struct pds_vdpa_intr_info *intrs = pdsv->vdpa_aux->vdpa_vf->intrs;
> >> +       struct pci_dev *pdev = pdsv->vdpa_aux->vdpa_vf->pdev;
> >> +       struct pds_core_intr __iomem *intr_ctrl;
> >> +       int index;
> >> +
> >> +       intr_ctrl = (struct pds_core_intr __iomem *)pdsv->vdpa_aux->vdpa_vf->vd_mdev.isr;
> >> +       index = pdsv->hw.vqs[qid].intr_index;
> >> +       if (index == VIRTIO_MSI_NO_VECTOR)
> >> +               return;
> >> +
> >> +       if (intrs[index].irq == VIRTIO_MSI_NO_VECTOR)
> >> +               return;
> >> +
> >> +       if (qid & 0x1) {
> >> +               pdsv->hw.vqs[qid].intr_index = VIRTIO_MSI_NO_VECTOR;
> >> +       } else {
> >> +               pds_core_intr_mask(&intr_ctrl[index], PDS_CORE_INTR_MASK_SET);
> >> +               devm_free_irq(&pdev->dev, intrs[index].irq, &pdsv->hw.vqs[qid]);
> >> +               pdsv->hw.vqs[qid].intr_index = VIRTIO_MSI_NO_VECTOR;
> >> +               intrs[index].irq = VIRTIO_MSI_NO_VECTOR;
> >> +       }
> >> +}
> >> +
> >> +static void
> >> +pds_vdpa_set_vq_ready(struct vdpa_device *vdpa_dev, u16 qid, bool ready)
> >> +{
> >> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> >> +       struct pci_dev *pdev = pdsv->vdpa_aux->vdpa_vf->pdev;
> >> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> >> +       struct device *dev = &pdsv->vdpa_dev.dev;
> >> +       struct pds_core_intr __iomem *intr_ctrl;
> >> +       int err;
> >> +
> >> +       dev_dbg(dev, "%s: qid %d ready %d => %d\n",
> >> +                __func__, qid, hw->vqs[qid].ready, ready);
> >> +       if (ready == hw->vqs[qid].ready)
> >> +               return;
> >> +
> >> +       intr_ctrl = (struct pds_core_intr __iomem *)pdsv->vdpa_aux->vdpa_vf->vd_mdev.isr;
> >
> > It looks to me pds has a different layout/semantic for isr than virtio
> > spec. I'd suggest to not reuse spec isr here to avoid confusion.
>
> Hmm, yes, that needs some straightening out.
>
> >
> >> +       if (ready) {
> >
> > Spec said no interrupt before DRIVER_OK, it looks more simple if we
> > move the interrupt setup to set_status():
> >
> > E.g we can know if we have sufficient vectors and use different
> > mapping policies in advance.
>
> I'll look at that.
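One way to read the suggestion — decide the mapping policy once, at DRIVER_OK, based on how many vectors are actually available — can be sketched as a userspace toy model.  The names and the "+1 for config" assumption below are illustrative only, not the pds_vdpa code:

```c
#include <stdint.h>

/* Toy policy chooser, assuming one vector per vq pair plus one for
 * config-change notifications.  If the device exposes at least that
 * many MSI-X vectors we can map everything up front at DRIVER_OK;
 * otherwise fall back to sharing a single vector across all vqs.
 */
enum intr_policy { INTR_PER_PAIR, INTR_SHARED };

static enum intr_policy choose_intr_policy(uint16_t num_vq_pairs,
					   uint16_t nvectors)
{
	uint16_t needed = num_vq_pairs + 1; /* +1 for the config interrupt */

	return (nvectors >= needed) ? INTR_PER_PAIR : INTR_SHARED;
}
```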
>
> >
> >> +               struct pds_vdpa_intr_info *intrs = pdsv->vdpa_aux->vdpa_vf->intrs;
> >> +               int index = VIRTIO_MSI_NO_VECTOR;
> >> +               int i;
> >> +
> >> +               /*  Tx and Rx queues share interrupts, and they start with
> >> +                *  even numbers, so only find an interrupt for the even numbered
> >> +                *  qid, and let the odd number use what the previous queue got.
> >> +                */
> >> +               if (qid & 0x1) {
> >> +                       int even = qid & ~0x1;
> >> +
> >> +                       index = hw->vqs[even].intr_index;
> >> +               } else {
> >> +                       for (i = 0; i < pdsv->vdpa_aux->vdpa_vf->nintrs; i++) {
> >> +                               if (intrs[i].irq == VIRTIO_MSI_NO_VECTOR) {
> >> +                                       index = i;
> >> +                                       break;
> >> +                               }
> >> +                       }
> >> +               }
> >> +
> >> +               if (qid & 0x1) {
> >> +                       hw->vqs[qid].intr_index = index;
> >> +               } else if (index != VIRTIO_MSI_NO_VECTOR) {
> >> +                       int irq;
> >> +
> >> +                       irq = pci_irq_vector(pdev, index);
> >> +                       snprintf(intrs[index].name, sizeof(intrs[index].name),
> >> +                                "vdpa-%s-%d", dev_name(dev), index);
> >> +
> >> +                       err = devm_request_irq(&pdev->dev, irq, pds_vdpa_isr, 0,
> >> +                                              intrs[index].name, &hw->vqs[qid]);
> >> +                       if (err) {
> >> +                               dev_info(dev, "%s: no irq for qid %d: %pe\n",
> >> +                                        __func__, qid, ERR_PTR(err));
> >
> > Should we fail?
> >
> >> +                       } else {
> >> +                               intrs[index].irq = irq;
> >> +                               hw->vqs[qid].intr_index = index;
> >> +                               pds_core_intr_mask(&intr_ctrl[index],
> >> +                                                  PDS_CORE_INTR_MASK_CLEAR);
> >
> > I guess the reason that you don't simply use VF MSI-X is the DPU can
> > support vDPA subdevice in the future?
> >
> >> +                       }
> >> +               } else {
> >> +                       dev_info(dev, "%s: no intr slot for qid %d\n",
> >> +                                __func__, qid);
> >
> > Do we need to fail here?
> >
> >> +               }
> >> +
> >> +               /* Pass vq setup info to DSC */
> >> +               err = pds_vdpa_cmd_init_vq(pdsv, qid, &hw->vqs[qid]);
> >> +               if (err) {
> >> +                       pds_vdpa_release_irq(pdsv, qid);
> >> +                       ready = false;
> >> +               }
> >> +       } else {
> >> +               pds_vdpa_release_irq(pdsv, qid);
> >> +               (void) pds_vdpa_cmd_reset_vq(pdsv, qid);
> >> +       }
> >> +
> >> +       hw->vqs[qid].ready = ready;
> >> +}
> >> +
> >> +static bool
> >> +pds_vdpa_get_vq_ready(struct vdpa_device *vdpa_dev, u16 qid)
> >> +{
> >> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> >> +
> >> +       return hw->vqs[qid].ready;
> >> +}
> >> +
> >> +static int
> >> +pds_vdpa_set_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
> >> +                     const struct vdpa_vq_state *state)
> >> +{
> >> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> >> +
> >> +       hw->vqs[qid].used_idx = state->split.avail_index;
> >> +       hw->vqs[qid].avail_idx = state->split.avail_index;
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +static int
> >> +pds_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
> >> +                     struct vdpa_vq_state *state)
> >> +{
> >> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> >> +
> >> +       state->split.avail_index = hw->vqs[qid].avail_idx;
> >
> > Who is in charge of reading avail_idx from the hardware?
>
> We didn't have that available in the early FW, so it isn't here yet.
> Work in progress.
>
> >
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +static struct vdpa_notification_area
> >> +pds_vdpa_get_vq_notification(struct vdpa_device *vdpa_dev, u16 qid)
> >> +{
> >> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> >> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> >> +       struct virtio_pci_modern_device *vd_mdev;
> >> +       struct vdpa_notification_area area;
> >> +
> >> +       area.addr = hw->vqs[qid].notify_pa;
> >> +
> >> +       vd_mdev = &pdsv->vdpa_aux->vdpa_vf->vd_mdev;
> >> +       if (!vd_mdev->notify_offset_multiplier)
> >> +               area.size = PAGE_SIZE;
> >> +       else
> >> +               area.size = vd_mdev->notify_offset_multiplier;
> >> +
> >> +       return area;
> >> +}
> >> +
> >> +static int
> >> +pds_vdpa_get_vq_irq(struct vdpa_device *vdpa_dev, u16 qid)
> >> +{
> >> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> >> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> >> +       int irq = VIRTIO_MSI_NO_VECTOR;
> >> +       int index;
> >> +
> >> +       if (pdsv->vdpa_aux->vdpa_vf->intrs) {
> >> +               index = hw->vqs[qid].intr_index;
> >> +               irq = pdsv->vdpa_aux->vdpa_vf->intrs[index].irq;
> >
> > The notification area mapping might only work well when each vq has
> > its own irq. Otherwise the guest may see spurious interrupts, which
> > may degrade performance.
>
> We haven't been expecting to use shared interrupts - are we being overly
> optimistic?

So at least from the code above, I think we may end up with e.g. two
queues that are using the same irq? And the comment said:

               /*  Tx and Rx queues share interrupts, and they start with
                *  even numbers, so only find an interrupt for the even numbered
                *  qid, and let the odd number use what the previous queue got.
                */
               if (qid & 0x1) {
                       int even = qid & ~0x1;

                       index = hw->vqs[even].intr_index;

It said Tx and Rx share interrupts.
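The even/odd pairing in that comment boils down to: an odd (Tx) qid reuses whatever slot its even (Rx) partner was given, so each pair shares one interrupt.  As a standalone sketch (hypothetical helper; the driver's allocator actually searches for a free slot rather than computing the index directly):

```c
#include <stdint.h>

/* One interrupt slot per vq pair: qids 0 and 1 map to slot 0,
 * qids 2 and 3 to slot 1, and so on.  Clearing the low bit pairs
 * the odd qid with its even partner.
 */
static uint16_t vq_intr_slot(uint16_t qid)
{
	return (uint16_t)((qid & ~0x1) / 2);
}
```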

>
>
> >
> >> +       }
> >> +
> >> +       return irq;
> >> +}
> >> +
> >> +static u32
> >> +pds_vdpa_get_vq_align(struct vdpa_device *vdpa_dev)
> >> +{
> >> +
> >> +       return PAGE_SIZE;
> >> +}
> >> +
> >> +static u32
> >> +pds_vdpa_get_vq_group(struct vdpa_device *vdpa_dev, u16 idx)
> >> +{
> >> +       return 0;
> >> +}
> >> +
> >> +static u64
> >> +pds_vdpa_get_device_features(struct vdpa_device *vdpa_dev)
> >> +{
> >> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> >> +
> >> +       return le64_to_cpu(pdsv->vdpa_aux->ident.hw_features);
> >> +}
> >> +
> >> +static int
> >> +pds_vdpa_set_driver_features(struct vdpa_device *vdpa_dev, u64 features)
> >> +{
> >> +       struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
> >> +       struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);
> >> +       struct device *dev = &pdsv->vdpa_dev.dev;
> >> +       u64 nego_features;
> >> +       u64 set_features;
> >> +       u64 missing;
> >> +       int err;
> >> +
> >> +       if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)) && features) {
> >> +               dev_err(dev, "VIRTIO_F_ACCESS_PLATFORM is not negotiated\n");
> >> +               return -EOPNOTSUPP;
> >
> > Should we fail the FEATURE_OK in this case and all the other below
> > error conditions?
>
> Perhaps I'm missing a nuance in the interface... isn't that what we're
> doing by returning a non-zero status?

Kind of, but to be compliant with the spec, a subsequent get_status
should return the status without FEATURES_OK set, and I'm not sure this
can be guaranteed:

static u8
pds_vdpa_get_status(struct vdpa_device *vdpa_dev)
{
        struct pds_vdpa_hw *hw = vdpa_to_hw(vdpa_dev);

        return hw->status;
}
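The spec behaviour in question — FEATURES_OK must not read back as set if the device did not accept the negotiated features — can be modeled in plain C.  The struct and helper names below are illustrative, not the driver's actual fields:

```c
#include <stdint.h>

#define F_FEATURES_OK 0x08	/* VIRTIO_CONFIG_S_FEATURES_OK per the virtio spec */

struct toy_dev {
	uint8_t  status;
	uint64_t supported;	/* features the device offers */
	uint64_t negotiated;	/* features the driver asked for */
};

/* If the driver tries to latch FEATURES_OK while asking for features
 * the device does not support, refuse to set the bit, so a later
 * get_status read reflects the failed negotiation.
 */
static void toy_set_status(struct toy_dev *d, uint8_t status)
{
	if ((status & F_FEATURES_OK) && (d->negotiated & ~d->supported))
		status &= ~F_FEATURES_OK;
	d->status = status;
}

static uint8_t toy_get_status(const struct toy_dev *d)
{
	return d->status;
}
```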

> >> +                       dev_warn(dev, "Known FW issue - overriding to use max_vq_pairs %d\n",
> >> +                                hw->num_vqs / 2);
> >
> > Should we fail here? Since the device has a different max_vqp than expected.
>
> Wasn't sure if we should annoy users with a fail here, or try to adjust
> and continue on with something that should work.

I think it's better to fail since it's the behaviour of other vDPA
devices and software virtio devices.
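The fail-instead-of-override behaviour could look roughly like this; the helper is hypothetical, with -EOPNOTSUPP chosen only to match the error used elsewhere in the patch:

```c
#include <stdint.h>
#include <errno.h>

/* Reject a max_virtqueue_pairs mismatch outright instead of silently
 * adjusting, matching what other vDPA and software virtio devices do.
 */
static int toy_check_max_vqp(uint16_t dev_max_vqp, uint16_t requested_vqp)
{
	if (requested_vqp > dev_max_vqp)
		return -EOPNOTSUPP;	/* fail loudly rather than adjust */
	return 0;
}
```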

Thanks



end of thread, other threads:[~2022-12-05  7:42 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-18 22:56 [RFC PATCH net-next 00/19] pds core and vdpa drivers Shannon Nelson
2022-11-18 22:56 ` [RFC PATCH net-next 01/19] pds_core: initial framework for pds_core driver Shannon Nelson
2022-11-18 22:56 ` [RFC PATCH net-next 02/19] pds_core: add devcmd device interfaces Shannon Nelson
2022-11-18 22:56 ` [RFC PATCH net-next 03/19] pds_core: health timer and workqueue Shannon Nelson
2022-11-18 22:56 ` [RFC PATCH net-next 04/19] pds_core: set up device and adminq Shannon Nelson
2022-11-18 22:56 ` [RFC PATCH net-next 05/19] pds_core: Add adminq processing and commands Shannon Nelson
2022-11-18 22:56 ` [RFC PATCH net-next 06/19] pds_core: add FW update feature to devlink Shannon Nelson
2022-11-28 18:27   ` Jakub Kicinski
2022-11-28 22:25     ` Shannon Nelson
2022-11-28 23:33       ` Jakub Kicinski
2022-11-28 23:45         ` Shannon Nelson
2022-11-29  0:18           ` Keller, Jacob E
2022-11-29  0:13         ` Keller, Jacob E
2022-11-18 22:56 ` [RFC PATCH net-next 07/19] pds_core: set up the VIF definitions and defaults Shannon Nelson
2022-11-18 22:56 ` [RFC PATCH net-next 08/19] pds_core: initial VF configuration Shannon Nelson
2022-11-28 18:28   ` Jakub Kicinski
2022-11-28 22:25     ` Shannon Nelson
2022-11-28 23:37       ` Jakub Kicinski
2022-11-29  0:37         ` Shannon Nelson
2022-11-29  0:55           ` Jakub Kicinski
2022-11-29  1:08             ` Shannon Nelson
2022-11-29  1:54               ` Jakub Kicinski
2022-11-29 17:57                 ` Shannon Nelson
2022-11-30  2:02                   ` Jakub Kicinski
2022-12-01  0:12                     ` Shannon Nelson
2022-12-01  3:45                       ` Jakub Kicinski
2022-12-01 19:19                         ` Shannon Nelson
2022-12-01 22:29                           ` Jakub Kicinski
2022-11-18 22:56 ` [RFC PATCH net-next 09/19] pds_core: add auxiliary_bus devices Shannon Nelson
2022-11-18 22:56 ` [RFC PATCH net-next 10/19] pds_core: devlink params for enabling VIF support Shannon Nelson
2022-11-28 18:29   ` Jakub Kicinski
2022-11-28 22:26     ` Shannon Nelson
2022-11-28 22:57       ` Andrew Lunn
2022-11-28 23:07         ` Shannon Nelson
2022-11-28 23:29           ` Andrew Lunn
2022-11-28 23:39             ` Jakub Kicinski
2022-11-29  9:00               ` Leon Romanovsky
2022-11-29  9:13               ` Jiri Pirko
2022-11-29 17:16                 ` Shannon Nelson
2022-11-18 22:56 ` [RFC PATCH net-next 11/19] pds_core: add the aux client API Shannon Nelson
2022-11-18 22:56 ` [RFC PATCH net-next 12/19] pds_core: publish events to the clients Shannon Nelson
2022-11-18 22:56 ` [RFC PATCH net-next 13/19] pds_core: Kconfig and pds_core.rst Shannon Nelson
2022-11-18 22:56 ` [RFC PATCH net-next 14/19] pds_vdpa: Add new PCI VF device for PDS vDPA services Shannon Nelson
2022-11-22  3:53   ` Jason Wang
2022-11-29 22:24     ` Shannon Nelson
2022-11-18 22:56 ` [RFC PATCH net-next 15/19] pds_vdpa: virtio bar setup for vdpa Shannon Nelson
2022-11-22  3:32   ` Jason Wang
2022-11-22  6:36     ` Jason Wang
2022-11-29 23:02       ` Shannon Nelson
2022-11-18 22:56 ` [RFC PATCH net-next 16/19] pds_vdpa: add auxiliary driver Shannon Nelson
2022-11-18 22:56 ` [RFC PATCH net-next 17/19] pds_vdpa: add vdpa config client commands Shannon Nelson
2022-11-22  6:32   ` Jason Wang
2022-11-29 23:16     ` Shannon Nelson
2022-11-18 22:56 ` [RFC PATCH net-next 18/19] pds_vdpa: add support for vdpa and vdpamgmt interfaces Shannon Nelson
2022-11-22  6:32   ` Jason Wang
2022-11-30  0:11     ` Shannon Nelson
2022-12-05  7:40       ` Jason Wang
2022-11-18 22:56 ` [RFC PATCH net-next 19/19] pds_vdpa: add Kconfig entry and pds_vdpa.rst Shannon Nelson
2022-11-22  6:35   ` Jason Wang
2022-11-22 22:33     ` Shannon Nelson
2022-11-30  0:13     ` Shannon Nelson
