From: Bjorn Helgaas <helgaas@kernel.org>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
Dan Williams <dan.j.williams@intel.com>,
Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
Ben Widawsky <ben.widawsky@intel.com>,
Chris Browy <cbrowy@avery-design.com>,
linux-acpi@vger.kernel.org, alison.schofield@intel.com,
vishal.l.verma@intel.com, ira.weiny@intel.com,
linuxarm@huawei.com, Fangjian <f.fangjian@huawei.com>
Subject: Re: [RFC PATCH v2 2/4] PCI/doe: Initial support PCI Data Object Exchange
Date: Tue, 13 Apr 2021 14:49:27 -0500 [thread overview]
Message-ID: <20210413194927.GA2241331@bjorn-Precision-5520> (raw)
In-Reply-To: <20210413160159.935663-3-Jonathan.Cameron@huawei.com>
To match subject line history:
PCI/DOE: Add Data Object Exchange support
On Wed, Apr 14, 2021 at 12:01:57AM +0800, Jonathan Cameron wrote:
> Introduced in a PCI ECN [1], DOE provides a config space
> based mailbox with standard protocol discovery. Each mailbox
> is accessed through a DOE PCIE Extended Capability.
s/PCIE//
I don't think DOE depends on anything specific to PCIe.
> A device may have 1 or more DOE mailboxes, each of which is allowed
> to support any number of protocols (some DOE protocols
s/DOE protocols/DOE protocol/
> specifications apply additional restrictions). A given protocol
> may be supported on more than one DOE mailbox on a given function.
>
> If a driver wishes to access any number of DOE instances / protocols
> it makes a single call to pcie_doe_register_all() which will find
> available DOEs, create the required infrastructure and cache the
> protocols they support. pcie_doe_find() can then retrieve a
> pointer to an appropriate DOE instance.
Maybe mention the run-time interface (pcie_doe_sync()) here as well.
Can we make that name more descriptive? It'd be nice to include a
verb.
I think we should globally s/pcie/pci/ in filenames, struct,
interfaces, config option, etc.
> Testing conducted against QEMU using:
>
> https://lore.kernel.org/qemu-devel/1612900760-7361-1-git-send-email-cbrowy@avery-design.com/
> + fix for interrupt flag mentioned in that thread and a whole load
> of hacks to exercise error paths etc.
>
> [1] https://members.pcisig.com/wg/PCI-SIG/document/14143
Can you please include the title and data of approval here? The URL
is useless except to PCI-SIG members, and the URL is likely to become
stale eventually even to them.
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>
> Thanks to Dan Williams for reviewing v1.
> Note code has changed a lot, so I may well have missed stuff in this
> log. Details on approaches to the DOE serialization discussed in
> the cover letter.
>
> Changes since v1:
> Major stuff:
> * Rework the registration of DOE. DOE instances are added to a list in
> the struct pci_dev. A single call to pcie_doe_register_all() finds
> all DOEs present on a device and caches which protocols they support.
> The lifetime of these can then all be managed alongside the
> struct pci_dev avoiding need for reference counting etc whilst allowing
> a driver not to care if different protocols are using the same DOE
> instance, or different ones. The driver issues a call to
> pcie_doe_find() to get a pointer to the first appropriate DOE.
> Note, a side effect of minimizing impact of this on struct pci_dev
> is that we can't build DOE support as a module.
> * Instead of serializing on a mutex, move to a more complex state machine
> based scheme in which the DOE config space registers are only
> accessed from a single delayed work item. The state machine uses
> two states for normal operation, IDLE and WAITING. Two states also
> exist for ABORT to indicate if it was called in response to error,
> or at startup. Interrupts result in mod_delayed_work()
> to force an immediate check on the status register.
> Message queuing is implemented with a list of tasks and kicking
> the state machine work item as needed.
> * Decide, inside the core DOE code, whether to poll based
> on if the MSI/MSIX interrupts are enabled and the DOE
> instance advertises an interrupt.
> * Define a struct pcie_doe_exchange to encapsulate header info
> with payloads etc. Specific fields used for VID and protocol +
> lengths so the DOE header can be constructed inside the DOE
> functions rather than relying on callers to do that.
> * Don't hide away devm_ stuff in calls that don't make it obvious.
>
> Minor stuff:
> * pcie_doe_init() and pcie_doe_register separation to cover the
> bits that can fail in register() and the bits that can't in init()
> * Many typos
> * Rename cap_offset field to cap
> * Use new PCI_VENDOR_ID_PCI_SIG define to avoid open coding the
> value
> * Add defines for timeouts etc.
> * Missing EXPORT_SYMBOL_GPL() added to allow DOE access from
> modules.
>
> drivers/pci/pcie/Kconfig | 8 +
> drivers/pci/pcie/Makefile | 1 +
> drivers/pci/pcie/doe.c | 590 ++++++++++++++++++++++++++++++++++
I'd move this up to drivers/pci/ since I don't think there's any
actual PCIe dependency.
> include/linux/pci.h | 3 +
> include/linux/pcie-doe.h | 85 +++++
> include/uapi/linux/pci_regs.h | 29 +-
> 6 files changed, 715 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> index 45a2ef702b45..1b566d08bfef 100644
> --- a/drivers/pci/pcie/Kconfig
> +++ b/drivers/pci/pcie/Kconfig
> @@ -142,3 +142,11 @@ config PCIE_EDR
> the PCI Firmware Specification r3.2. Enable this if you want to
> support hybrid DPC model which uses both firmware and OS to
> implement DPC.
> +
> +config PCIE_DOE
> + bool
> + help
> + This enables library support PCI Data Object Exchange capability.
> + DOE provides a simple mailbox in PCI express config space that is
> + used by a number of different protocols.
> + It is defined in the Data Object Exchange ECN to PCI 5.0.
s/support PCI/support for the PCI/
s/PCI express/PCI/
> diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
> index b2980db88cc0..801fdd5fbfc1 100644
> --- a/drivers/pci/pcie/Makefile
> +++ b/drivers/pci/pcie/Makefile
> @@ -13,3 +13,4 @@ obj-$(CONFIG_PCIE_PME) += pme.o
> obj-$(CONFIG_PCIE_DPC) += dpc.o
> obj-$(CONFIG_PCIE_PTM) += ptm.o
> obj-$(CONFIG_PCIE_EDR) += edr.o
> +obj-$(CONFIG_PCIE_DOE) += doe.o
> diff --git a/drivers/pci/pcie/doe.c b/drivers/pci/pcie/doe.c
> new file mode 100644
> index 000000000000..b8a27c744d09
> --- /dev/null
> +++ b/drivers/pci/pcie/doe.c
> @@ -0,0 +1,590 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Data Object Exchange ECN
> + * https://members.pcisig.com/wg/PCI-SIG/document/14143
> + *
> + * Copyright (C) 2021 Huawei
> + * Jonathan Cameron <Jonathan.Cameron@huawei.com>
> + */
> +
> +#include <linux/bitfield.h>
> +#include <linux/delay.h>
> +#include <linux/jiffies.h>
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/pci.h>
> +#include <linux/pcie-doe.h>
> +#include <linux/workqueue.h>
> +
> +#define PCI_DOE_PROTOCOL_DISCOVERY 0
> +
> +#define PCI_DOE_BUSY_MAX_RETRIES 16
> +#define PCI_DOE_POLL_INTERVAL (HZ / 128)
> +
> +/* Timeout of 1 second from 6.xx.1 ECN - Data Object Exchange */
Unfortunate that the ECN has two 6.xx.1 sections and no 6.xx.2
section ;) I think this is from the second 6.xx.1 section, i.e., the
one that *should* have been 6.xx.2.
> +#define PCI_DOE_TIMEOUT HZ
> +
> +static irqreturn_t pcie_doe_irq(int irq, void *data)
> +{
> + struct pcie_doe *doe = data;
> + struct pci_dev *pdev = doe->pdev;
> + u32 val;
> +
> + pci_read_config_dword(pdev, doe->cap + PCI_DOE_STATUS, &val);
> + if (FIELD_GET(PCI_DOE_STATUS_INT_STATUS, val)) {
> + pci_write_config_dword(pdev, doe->cap + PCI_DOE_STATUS, val);
> + mod_delayed_work(system_wq, &doe->statemachine, 0);
> + return IRQ_HANDLED;
> + }
> + /* Leave the error case to be handled outside irq */
> + if (FIELD_GET(PCI_DOE_STATUS_ERROR, val)) {
> + mod_delayed_work(system_wq, &doe->statemachine, 0);
> + return IRQ_HANDLED;
> + }
> +
> + return IRQ_NONE;
> +}
> +
> +/*
> + * Only call when safe to directly access the DOE, either because no tasks
> + * yet queued, or called from doe_statemachine_work() which has exclusive
> + * access to the DOE config space.
> + */
> +static void pcie_doe_abort_start(struct pcie_doe *doe)
> +{
> + struct pci_dev *pdev = doe->pdev;
> +
> + pci_write_config_dword(pdev, doe->cap + PCI_DOE_CTRL,
> + PCI_DOE_CTRL_ABORT);
> +
> + doe->timeout_jiffies = jiffies + HZ;
> + schedule_delayed_work(&doe->statemachine, HZ);
> +}
> +
> +static int pcie_doe_send_req(struct pcie_doe *doe, struct pcie_doe_exchange *ex)
> +{
> + struct pci_dev *pdev = doe->pdev;
> + u32 val;
> + int i;
> +
> + /*
> + * Check the DOE busy bit is not set.
> + * If it is set, this could indicate someone other than Linux is
> + * using the mailbox.
Rewrap to fill 80 columns (or add blank line if you want two
paragraphs). What are the possibilities other than Linux? Do you
envision user-space usage? Is there a denial-of-service attack here
if that other user leaves the busy bit set?
> + */
> + pci_read_config_dword(pdev, doe->cap + PCI_DOE_STATUS, &val);
> + if (FIELD_GET(PCI_DOE_STATUS_BUSY, val))
> + return -EBUSY;
> +
> + if (FIELD_GET(PCI_DOE_STATUS_ERROR, val))
> + return -EIO;
> +
> + /* Write DOE Header */
> + val = FIELD_PREP(PCI_DOE_DATA_OBJECT_HEADER_1_VID, ex->vid) |
> + FIELD_PREP(PCI_DOE_DATA_OBJECT_HEADER_1_TYPE, ex->protocol);
> + pci_write_config_dword(pdev, doe->cap + PCI_DOE_WRITE, val);
> + /* Length is 2 DW of header + length of payload in DW */
> + pci_write_config_dword(pdev, doe->cap + PCI_DOE_WRITE,
> + FIELD_PREP(PCI_DOE_DATA_OBJECT_HEADER_2_LENGTH,
> + 2 + ex->request_pl_sz / 4));
> + for (i = 0; i < ex->request_pl_sz / 4; i++)
> + pci_write_config_dword(pdev, doe->cap + PCI_DOE_WRITE,
> + ex->request_pl[i]);
> +
> + pci_write_config_dword(pdev, doe->cap + PCI_DOE_CTRL, PCI_DOE_CTRL_GO);
> + /* Request is sent - now wait for poll or irq */
> + return 0;
> +}
> +
> +static int pcie_doe_recv_resp(struct pcie_doe *doe, struct pcie_doe_exchange *ex)
> +{
> + struct pci_dev *pdev = doe->pdev;
> + size_t length;
> + u32 val;
> + int i;
> +
> + /* Read the first two dwords to get the length and protocol */
> + pci_read_config_dword(pdev, doe->cap + PCI_DOE_READ, &val);
> + if ((FIELD_GET(PCI_DOE_DATA_OBJECT_HEADER_1_VID, val) != ex->vid) ||
> + (FIELD_GET(PCI_DOE_DATA_OBJECT_HEADER_1_TYPE, val) != ex->protocol)) {
> + dev_err(&pdev->dev, "Expected [VID, Protocol] = [%x, %x], got [%x, %x]\n",
pci_err() (one more below)
> + ex->vid, ex->protocol,
> + FIELD_GET(PCI_DOE_DATA_OBJECT_HEADER_1_VID, val),
> + FIELD_GET(PCI_DOE_DATA_OBJECT_HEADER_1_TYPE, val));
> + return -EIO;
> + }
> +
> + pci_write_config_dword(pdev, doe->cap + PCI_DOE_READ, 0);
> + pci_read_config_dword(pdev, doe->cap + PCI_DOE_READ, &val);
> + pci_write_config_dword(pdev, doe->cap + PCI_DOE_READ, 0);
> +
> + length = FIELD_GET(PCI_DOE_DATA_OBJECT_HEADER_2_LENGTH, val);
> + if (length > SZ_1M)
> + return -EIO;
> +
> + /* Read the rest of the response payload */
> + for (i = 0; i < min(length, ex->response_pl_sz / 4); i++) {
> + pci_read_config_dword(pdev, doe->cap + PCI_DOE_READ,
> + &ex->response_pl[i]);
> + pci_write_config_dword(pdev, doe->cap + PCI_DOE_READ, 0);
> + }
> +
> + /* Flush excess length */
> + for (; i < length; i++) {
> + pci_read_config_dword(pdev, doe->cap + PCI_DOE_READ, &val);
> + pci_write_config_dword(pdev, doe->cap + PCI_DOE_READ, 0);
> + }
> + /* Final error check to pick up on any since Data Object Ready */
> + pci_read_config_dword(pdev, doe->cap + PCI_DOE_STATUS, &val);
> + if (FIELD_GET(PCI_DOE_STATUS_ERROR, val))
> + return -EIO;
> +
> + return min(length, ex->response_pl_sz / 4) * 4;
> +}
> +
> +static void pcie_doe_task_complete(void *private)
> +{
> + complete(private);
> +}
> +
> +/**
> + * struct pcie_doe_task - description of a query / response task
> + * @h: Head to add it to the list of outstanding tasks.
> + * @ex: The info details of the task to be done.
> + * @rv: Return value. Length of received response or error.
> + * @cb: Callback for completion of task.
> + * @private: Private data passed to callback on completion.
Some doc has terminating periods, some doesn't.
> + */
> +struct pcie_doe_task {
> + struct list_head h;
> + struct pcie_doe_exchange *ex;
> + int rv;
> + void (*cb)(void *private);
> + void *private;
> +};
> +
> +/**
> + * pcie_doe_sync() - Send a request, then wait for and receive response.
> + * @doe: DOE mailbox state structure
> + * @ex: Description of the buffers and vid, type used in this
> + * request/response pair
s/vid/Vendor ID/ (I think?)
> + * Excess data will be discarded.
> + *
> + * Return: payload in bytes on success, < 0 on error
> + */
> +int pcie_doe_sync(struct pcie_doe *doe, struct pcie_doe_exchange *ex)
> +{
> + struct pcie_doe_task task;
> + DECLARE_COMPLETION_ONSTACK(c);
> + bool only_task;
list_empty() actually returns "int".
> + /* DOE requests must be a whole number of DW */
> + if (ex->request_pl_sz % sizeof(u32))
> + return -EINVAL;
> +
> + task.ex = ex;
> + task.cb = pcie_doe_task_complete;
> + task.private = &c;
> +
> + mutex_lock(&doe->tasks_lock);
> + if (doe->dead) {
> + mutex_unlock(&doe->tasks_lock);
> + return -EIO;
> + }
> + only_task = list_empty(&doe->tasks);
> + list_add_tail(&task.h, &doe->tasks);
> + if (only_task)
> + schedule_delayed_work(&doe->statemachine, 0);
> + mutex_unlock(&doe->tasks_lock);
> + wait_for_completion(&c);
> +
> + return task.rv;
> +}
> +
> +static void doe_statemachine_work(struct work_struct *work)
> +{
> + struct delayed_work *w = to_delayed_work(work);
> + struct pcie_doe *doe = container_of(w, struct pcie_doe, statemachine);
> + struct pci_dev *pdev = doe->pdev;
> + struct pcie_doe_task *task;
> + bool abort;
> + u32 val;
> + int rc;
> +
> + mutex_lock(&doe->tasks_lock);
> + task = list_first_entry_or_null(&doe->tasks, struct pcie_doe_task, h);
> + abort = doe->abort;
> + doe->abort = false;
> + mutex_unlock(&doe->tasks_lock);
> +
> + if (abort) {
> + /*
> + * Currently only used during init - care needed if we want to generally
> + * expose pcie_doe_abort() as it would impact queries in flight.
Wrap to fit in 80 columns (more below).
> + */
> + WARN_ON(task);
> + doe->state = DOE_WAIT_ABORT;
> + pcie_doe_abort_start(doe);
> + return;
> + }
> +
> + switch (doe->state) {
> + case DOE_IDLE:
> + if (task == NULL)
> + return;
> +
> + /* Nothing currently in flight so queue a task */
> + rc = pcie_doe_send_req(doe, task->ex);
> + /*
> + * The specification does not provide any guidance on how long some other
> + * entity could keep the DOE busy, so try for 1 second then fail.
> + * Busy handling is best effort only, because there is not way of avoiding
> + * racing against another user of the DOE.
s/not way/no way/
> + */
> + if (rc == -EBUSY) {
> + doe->busy_retries++;
> + if (doe->busy_retries == PCI_DOE_BUSY_MAX_RETRIES) {
> + /* Long enough, fail this request */
> + doe->busy_retries = 0;
> + goto busy;
> + }
> + schedule_delayed_work(w, HZ / PCI_DOE_BUSY_MAX_RETRIES);
> + return;
> + }
> + if (rc)
> + goto abort;
> + doe->busy_retries = 0;
> +
> + doe->state = DOE_WAIT_RESP;
> + doe->timeout_jiffies = jiffies + HZ;
> + /* Now poll or wait for IRQ with timeout */
Capitalize "IRQ" elsewhere to match this usage.
> + if (doe->irq > 0)
> + schedule_delayed_work(w, PCI_DOE_TIMEOUT);
> + else
> + schedule_delayed_work(w, PCI_DOE_POLL_INTERVAL);
> + return;
> +
> + case DOE_WAIT_RESP:
> + /* Not possible to get here with NULL task */
> + pci_read_config_dword(pdev, doe->cap + PCI_DOE_STATUS, &val);
> + if (FIELD_GET(PCI_DOE_STATUS_ERROR, val)) {
> + rc = -EIO;
> + goto abort;
> + }
> +
> + if (!FIELD_GET(PCI_DOE_STATUS_DATA_OBJECT_READY, val)) {
> + /* If not yet at timeout reschedule otherwise abort */
> + if (time_after(jiffies, doe->timeout_jiffies)) {
> + rc = -ETIMEDOUT;
> + goto abort;
> + }
> + schedule_delayed_work(w, PCI_DOE_POLL_INTERVAL);
> + return;
> + }
> +
> + rc = pcie_doe_recv_resp(doe, task->ex);
> + if (rc < 0)
> + goto abort;
> +
> + doe->state = DOE_IDLE;
> +
> + mutex_lock(&doe->tasks_lock);
> + list_del(&task->h);
> + if (!list_empty(&doe->tasks))
> + schedule_delayed_work(w, 0);
> + mutex_unlock(&doe->tasks_lock);
> +
> + /* Set the return value to the length of received payload */
> + task->rv = rc;
> + task->cb(task->private);
> + return;
> +
> + case DOE_WAIT_ABORT:
> + case DOE_WAIT_ABORT_ON_ERR:
> + pci_read_config_dword(pdev, doe->cap + PCI_DOE_STATUS, &val);
> +
> + if (!FIELD_GET(PCI_DOE_STATUS_ERROR, val) &&
> + !FIELD_GET(PCI_DOE_STATUS_BUSY, val)) {
> + /* Back to normal state - carry on */
> + mutex_lock(&doe->tasks_lock);
> + if (!list_empty(&doe->tasks))
> + schedule_delayed_work(w, 0);
> + mutex_unlock(&doe->tasks_lock);
> +
> + /* For deliberately triggered abort, someone is waiting */
> + if (doe->state == DOE_WAIT_ABORT)
> + complete(&doe->abort_c);
> + doe->state = DOE_IDLE;
> +
> + return;
> + }
> + if (time_after(jiffies, doe->timeout_jiffies)) {
> + struct pcie_doe_task *t, *n;
> +
> + /* We are dead - abort all queued tasks */
> + dev_err(&pdev->dev, "DOE ABORT timed out\n");
> + mutex_lock(&doe->tasks_lock);
> + doe->dead = true;
> + list_for_each_entry_safe(t, n, &doe->tasks, h) {
> + t->rv = -EIO;
> + t->cb(t->private);
> + list_del(&t->h);
> + }
> +
> + mutex_unlock(&doe->tasks_lock);
> + if (doe->state == DOE_WAIT_ABORT)
> + complete(&doe->abort_c);
> + }
> + return;
> + }
> +
> +abort:
> + pcie_doe_abort_start(doe);
> + doe->state = DOE_WAIT_ABORT_ON_ERR;
> +busy:
> + mutex_lock(&doe->tasks_lock);
> + list_del(&task->h);
> + mutex_unlock(&doe->tasks_lock);
> +
> + task->rv = rc;
> + task->cb(task->private);
> + /* If we got here via busy, and the queue isn't empty then we need to go again */
> + if (doe->state == DOE_IDLE) {
> + mutex_lock(&doe->tasks_lock);
> + if (!list_empty(&doe->tasks))
> + schedule_delayed_work(w, 0);
> + mutex_unlock(&doe->tasks_lock);
> + }
> +}
> +
> +static int pcie_doe_discovery(struct pcie_doe *doe, u8 *index, u16 *vid,
> + u8 *protocol)
> +{
> + u32 request_pl = FIELD_PREP(PCI_DOE_DATA_OBJECT_DISC_REQ_3_INDEX, *index);
> + u32 response_pl;
> + struct pcie_doe_exchange ex = {
> + .vid = PCI_VENDOR_ID_PCI_SIG,
> + .protocol = PCI_DOE_PROTOCOL_DISCOVERY,
> + .request_pl = &request_pl,
> + .request_pl_sz = sizeof(request_pl),
> + .response_pl = &response_pl,
> + .response_pl_sz = sizeof(response_pl),
> + };
> + int ret;
> +
> + ret = pcie_doe_sync(doe, &ex);
> + if (ret < 0)
> + return ret;
> +
> + if (ret != sizeof(response_pl))
> + return -EIO;
> +
> + *vid = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID, response_pl);
> + *protocol = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL, response_pl);
> + *index = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_NEXT_INDEX, response_pl);
> +
> + return 0;
> +}
> +
> +static int pcie_doe_cache_protocols(struct pcie_doe *doe)
> +{
> + u8 index = 0;
> + int rc;
> +
> + /* Discovery protocol must always be supported and must report itself */
> + doe->num_prots = 1;
> + doe->prots = kzalloc(sizeof(*doe->prots) * doe->num_prots, GFP_KERNEL);
> + if (doe->prots == NULL)
> + return -ENOMEM;
> +
> + do {
> + struct pcie_doe_prot *prot, *prot_new;
> +
> + prot = &doe->prots[doe->num_prots - 1];
> + rc = pcie_doe_discovery(doe, &index, &prot->vid, &prot->type);
> + if (rc)
> + goto free_prots;
> +
> + if (index) {
> + prot_new = krealloc(doe->prots,
> + sizeof(*doe->prots) * doe->num_prots,
> + GFP_KERNEL);
> + if (prot_new == NULL) {
> + rc = -ENOMEM;
> + goto free_prots;
> + }
> + doe->prots = prot_new;
> + doe->num_prots++;
> + }
> + } while (index);
> +
> + return 0;
> +
> +free_prots:
> + kfree(doe->prots);
> + return rc;
> +}
> +
> +static void pcie_doe_init(struct pcie_doe *doe, struct pci_dev *pdev, int doe_offset)
> +{
> + mutex_init(&doe->tasks_lock);
> + init_completion(&doe->abort_c);
> + doe->cap = doe_offset;
> + doe->pdev = pdev;
> + INIT_LIST_HEAD(&doe->tasks);
> + INIT_DELAYED_WORK(&doe->statemachine, doe_statemachine_work);
> +}
> +
> +static int pcie_doe_abort(struct pcie_doe *doe)
> +{
> + reinit_completion(&doe->abort_c);
> + mutex_lock(&doe->tasks_lock);
> + doe->abort = true;
> + mutex_unlock(&doe->tasks_lock);
> + schedule_delayed_work(&doe->statemachine, 0);
> + wait_for_completion(&doe->abort_c);
> +
> + if (doe->dead)
> + return -EIO;
> +
> + return 0;
> +}
> +
> +static int pcie_doe_register(struct pcie_doe *doe)
> +{
> + struct pci_dev *pdev = doe->pdev;
> + bool poll = !pci_dev_msi_enabled(pdev);
> + int rc, irq;
> + u32 val;
> +
> + pci_read_config_dword(pdev, doe->cap + PCI_DOE_CAP, &val);
> +
> + if (!poll && FIELD_GET(PCI_DOE_CAP_INT, val)) {
> + irq = pci_irq_vector(pdev, FIELD_GET(PCI_DOE_CAP_IRQ, val));
> + if (irq < 0)
> + return irq;
> +
> + rc = request_irq(irq, pcie_doe_irq, 0, "DOE", doe);
> + if (rc)
> + return rc;
> +
> + doe->irq = irq;
> + pci_write_config_dword(pdev, doe->cap + PCI_DOE_CTRL,
> + FIELD_PREP(PCI_DOE_CTRL_INT_EN, 1));
> + }
> +
> + /* Reset the mailbox by issuing an abort */
> + rc = pcie_doe_abort(doe);
> + if (rc)
> + goto err_free_irqs;
> +
> + return 0;
> +
> +err_free_irqs:
> + if (doe->irq > 0)
> + free_irq(doe->irq, doe);
> +
> + return rc;
> +}
> +
> +static void pcie_doe_unregister(struct pcie_doe *doe)
> +{
> + if (doe->irq > 0)
> + free_irq(doe->irq, doe);
> +}
> +
> +void pcie_doe_unregister_all(struct pci_dev *pdev)
> +{
> + struct pcie_doe *doe, *next;
> +
> + list_for_each_entry_safe(doe, next, &pdev->doe_list, h) {
> + /* First halt the state machine */
> + cancel_delayed_work_sync(&doe->statemachine);
> + kfree(doe->prots);
> + pcie_doe_unregister(doe);
> + kfree(doe);
> + }
> +}
> +EXPORT_SYMBOL_GPL(pcie_doe_unregister_all);
> +
> +/**
> + * pcie_doe_register_all() - Find and register all DOE mailboxes
> + * @pdev: PCI device whose DOE mailboxes we are finding.
> + *
> + * Will locate any DOE mailboxes present on the device and cache the
> + * protocols so that pcie_doe_find() can be used to retrieve a suitable DOE
> + * instance.
> + * DOE mailboxes are available until pcie_doe_unregister_all()
> + * is called.
> + *
> + * RETURNS: 0 on success, < 0 on error
Different styles, pick one:
* Return: payload in bytes on success, < 0 on error
* RETURNS: 0 on success, < 0 on error
* RETURNS: Pointer to DOE instance, or NULL if no suitable instance available.
> + */
> +int pcie_doe_register_all(struct pci_dev *pdev)
> +{
> + struct pcie_doe *doe;
> + int pos = 0;
> + int rc;
> +
> + INIT_LIST_HEAD(&pdev->doe_list);
> +
> + /* Walk the DOE extended capabilities and add to per pci_dev list */
> + while (true) {
> +
> + pos = pci_find_next_ext_capability(pdev, pos, PCI_EXT_CAP_ID_DOE);
> + if (!pos)
> + return 0;
> +
> + doe = kzalloc(sizeof(*doe), GFP_KERNEL);
> + if (!doe) {
> + rc = -ENOMEM;
> + goto free_does;
> + }
> +
> + pcie_doe_init(doe, pdev, pos);
> + rc = pcie_doe_register(doe);
> + if (rc) {
> + kfree(doe);
> + goto free_does;
> + }
> +
> + rc = pcie_doe_cache_protocols(doe);
> + if (rc) {
> + pcie_doe_unregister(doe);
> + kfree(doe);
> + goto free_does;
> + }
> +
> + list_add(&doe->h, &pdev->doe_list);
> + }
> +
> +free_does:
> + pcie_doe_unregister_all(pdev);
> +
> + return rc;
> +}
> +EXPORT_SYMBOL_GPL(pcie_doe_register_all);
> +
> +/**
> + * pcie_doe_find() - Find the first DOE instance that supports a given protocol
> + * @pdev: Device on which to find the DOE instance.
> + * @vid: Vendor ID.
> + * @type: Specific protocol for this vendor.
> + *
> + * RETURNS: Pointer to DOE instance, or NULL if no suitable instance available.
> + */
> +struct pcie_doe *pcie_doe_find(struct pci_dev *pdev, u16 vid, u8 type)
> +{
> + struct pcie_doe *doe;
> + int i;
> +
> + list_for_each_entry(doe, &pdev->doe_list, h) {
> + for (i = 0; i < doe->num_prots; i++)
> + if ((doe->prots[i].vid == vid) &&
> + (doe->prots[i].type == type))
> + return doe;
> + }
> +
> + return NULL;
> +}
> +EXPORT_SYMBOL_GPL(pcie_doe_find);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 86c799c97b77..5d958a63e8c7 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -332,6 +332,9 @@ struct pci_dev {
> #ifdef CONFIG_PCIEPORTBUS
> struct rcec_ea *rcec_ea; /* RCEC cached endpoint association */
> struct pci_dev *rcec; /* Associated RCEC device */
> +#endif
> +#ifdef CONFIG_PCIE_DOE
> + struct list_head doe_list; /* Data Object Exchange mailboxes */
> #endif
> u8 pcie_cap; /* PCIe capability offset */
> u8 msi_cap; /* MSI capability offset */
> diff --git a/include/linux/pcie-doe.h b/include/linux/pcie-doe.h
> new file mode 100644
> index 000000000000..794d6f090779
> --- /dev/null
> +++ b/include/linux/pcie-doe.h
> @@ -0,0 +1,85 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Data Object Exchange was added to the PCI spec as an ECN to 5.0.
> + *
> + * Copyright (C) 2021 Huawei
> + * Jonathan Cameron <Jonathan.Cameron@huawei.com>
> + */
> +
> +#include <linux/completion.h>
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +
> +#ifndef LINUX_PCIE_DOE_H
> +#define LINUX_PCIE_DOE_H
> +
> +struct pcie_doe_prot {
> + u16 vid;
> + u8 type;
> +};
> +
> +struct workqueue_struct;
> +
> +enum pcie_doe_state {
> + DOE_IDLE,
> + DOE_WAIT_RESP,
> + DOE_WAIT_ABORT,
> + DOE_WAIT_ABORT_ON_ERR,
> +};
> +
> +struct pcie_doe_exchange {
> + u16 vid;
> + u8 protocol;
> + u32 *request_pl;
> + size_t request_pl_sz;
> + u32 *response_pl;
> + size_t response_pl_sz;
> +};
> +
> +/**
> + * struct pcie_doe - State to support use of DOE mailbox
> + * @cap: Config space offset to base of DOE capability.
> + * @pdev: PCI device that hosts this DOE.
> + * @abort_c: Completion used for initial abort handling.
> + * @irq: Interrupt used for signaling DOE ready or abort.
> + * @prots: Cache of identifiers for protocols supported.
> + * @num_prots: Size of prots cache.
> + * @h: Used for DOE instance lifetime management.
> + * @wq: Workqueue used to handle state machine and polling / timeouts
> + * @tasks: List of task in flight + pending.
> + * @tasks_lock: Protect the tasks list.
> + * @statemachine: Work item for the DOE statemachine
> + * @state: Current state of this DOE.
> + * @timeout_jiffies: 1 second after GO set.
> + * @busy_retries: Count of retry attempts
> + * @abort: Request a manual abort (e.g. on init)
> + * @dead: Used to mark a DOE for which an ABORT has timed out.
> + * Further messages will immediately be aborted with error.
> + */
> +struct pcie_doe {
> + int cap;
> + struct pci_dev *pdev;
> + struct completion abort_c;
> + int irq;
> + struct pcie_doe_prot *prots;
> + int num_prots;
> + struct list_head h;
> +
> + struct workqueue_struct *wq;
> + struct list_head tasks;
> + struct mutex tasks_lock;
> + struct delayed_work statemachine;
> + enum pcie_doe_state state;
> + unsigned long timeout_jiffies;
> + unsigned int busy_retries;
> + bool abort;
> + bool dead;
Slight preference for "unsigned int abort:1" in a struct:
https://lore.kernel.org/r/CA+55aFzKQ6Pj18TB8p4Yr0M4t+S+BsiHH=BJNmn=76-NcjTj-g@mail.gmail.com/
> +};
> +
> +int pcie_doe_register_all(struct pci_dev *pdev);
> +void pcie_doe_unregister_all(struct pci_dev *pdev);
> +struct pcie_doe *pcie_doe_find(struct pci_dev *pdev, u16 vid, u8 type);
> +
> +int pcie_doe_sync(struct pcie_doe *doe, struct pcie_doe_exchange *ex);
> +
> +#endif
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index e709ae8235e7..4d8a5fee2cdf 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -730,7 +730,8 @@
> #define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */
> #define PCI_EXT_CAP_ID_DLF 0x25 /* Data Link Feature */
> #define PCI_EXT_CAP_ID_PL_16GT 0x26 /* Physical Layer 16.0 GT/s */
> -#define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_PL_16GT
> +#define PCI_EXT_CAP_ID_DOE 0x2E /* Data Object Exchange */
> +#define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_DOE
>
> #define PCI_EXT_CAP_DSN_SIZEOF 12
> #define PCI_EXT_CAP_MCAST_ENDPOINT_SIZEOF 40
> @@ -1092,4 +1093,30 @@
> #define PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_MASK 0x000000F0
> #define PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_SHIFT 4
>
> +/* Data Object Exchange */
> +#define PCI_DOE_CAP 0x04 /* DOE Capabilities Register */
> +#define PCI_DOE_CAP_INT 0x00000001 /* Interrupt Support */
> +#define PCI_DOE_CAP_IRQ 0x00000ffe /* Interrupt Message Number */
> +#define PCI_DOE_CTRL 0x08 /* DOE Control Register */
> +#define PCI_DOE_CTRL_ABORT 0x00000001 /* DOE Abort */
> +#define PCI_DOE_CTRL_INT_EN 0x00000002 /* DOE Interrupt Enable */
> +#define PCI_DOE_CTRL_GO 0x80000000 /* DOE Go */
> +#define PCI_DOE_STATUS 0x0C /* DOE Status Register */
> +#define PCI_DOE_STATUS_BUSY 0x00000001 /* DOE Busy */
> +#define PCI_DOE_STATUS_INT_STATUS 0x00000002 /* DOE Interrupt Status */
> +#define PCI_DOE_STATUS_ERROR 0x00000004 /* DOE Error */
> +#define PCI_DOE_STATUS_DATA_OBJECT_READY 0x80000000 /* Data Object Ready */
> +#define PCI_DOE_WRITE 0x10 /* DOE Write Data Mailbox Register */
> +#define PCI_DOE_READ 0x14 /* DOE Read Data Mailbox Register */
> +
> +/* DOE Data Object - note not actually registers */
> +#define PCI_DOE_DATA_OBJECT_HEADER_1_VID 0x0000FFFF
> +#define PCI_DOE_DATA_OBJECT_HEADER_1_TYPE 0x00FF0000
> +#define PCI_DOE_DATA_OBJECT_HEADER_2_LENGTH 0x0003FFFF
> +
> +#define PCI_DOE_DATA_OBJECT_DISC_REQ_3_INDEX 0x000000FF
> +#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID 0x0000FFFF
> +#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL 0x00FF0000
> +#define PCI_DOE_DATA_OBJECT_DISC_RSP_3_NEXT_INDEX 0xFF000000
This file is a mix, but the consensus is lower-case in hex. I'd leave
PCI_EXT_CAP_ID_DOE as-is to match the local context and use
lower-case consistently in this hunk.
> #endif /* LINUX_PCI_REGS_H */
> --
> 2.19.1
>
next prev parent reply other threads:[~2021-04-13 19:49 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-13 16:01 [RFC PATCH v2 0/4] PCI Data Object Exchange support + CXL CDAT Jonathan Cameron
2021-04-13 16:01 ` [RFC PATCH v2 1/4] PCI: Add vendor define ID for the PCI SIG Jonathan Cameron
2021-04-13 16:34 ` Bjorn Helgaas
2021-04-13 18:21 ` Jonathan Cameron
2021-04-13 18:38 ` Bjorn Helgaas
2021-04-13 16:01 ` [RFC PATCH v2 2/4] PCI/doe: Initial support PCI Data Object Exchange Jonathan Cameron
2021-04-13 19:49 ` Bjorn Helgaas [this message]
2021-04-13 16:01 ` [RFC PATCH v2 3/4] cxl/mem: Add CDAT table reading from DOE Jonathan Cameron
2021-04-14 18:14 ` Konrad Rzeszutek Wilk
2021-04-14 18:50 ` Ben Widawsky
2021-04-13 16:01 ` [RFC PATCH v2 4/4] cxl/mem: Add a debug parser for CDAT commands Jonathan Cameron
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210413194927.GA2241331@bjorn-Precision-5520 \
--to=helgaas@kernel.org \
--cc=Jonathan.Cameron@huawei.com \
--cc=alison.schofield@intel.com \
--cc=ben.widawsky@intel.com \
--cc=cbrowy@avery-design.com \
--cc=dan.j.williams@intel.com \
--cc=f.fangjian@huawei.com \
--cc=ira.weiny@intel.com \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=linuxarm@huawei.com \
--cc=lorenzo.pieralisi@arm.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).