All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: "Weiny, Ira" <ira.weiny@intel.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Ben Widawsky <ben.widawsky@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-cxl@vger.kernel.org, Linux PCI <linux-pci@vger.kernel.org>
Subject: Re: [PATCH V6 03/10] PCI/DOE: Add Data Object Exchange Aux Driver
Date: Wed, 9 Feb 2022 08:26:43 -0800	[thread overview]
Message-ID: <CAPcyv4g2nNHKPuYVOEH3TbJtCiB1rkRNCVbfDWHnWkotvTAcJg@mail.gmail.com> (raw)
In-Reply-To: <20220209101320.00000473@Huawei.com>

On Wed, Feb 9, 2022 at 2:13 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
[..]
> > > + *
> > > + * @doe_dev: The DOE Auxiliary device being driven
> > > + * @abort_c: Completion used for initial abort handling
> > > + * @irq: Interrupt used for signaling DOE ready or abort
> > > + * @irq_name: Name used to identify the irq for a particular DOE
> > > + * @prots: Array of identifiers for protocols supported
> >
> > "prot" already has a meaning in the kernel, just spell out
> > "protocols". This also looks like something that can be allocated
> > inline rather than out of line i.e.:
> >
> > struct pci_doe {
> > ...
> >     int nr_protocols
> >     struct pci_doe_protocol protocols[];
> > }
> >
> > ...and then use struct_size() to allocate it.
>
> Can't do that. The size isn't known when we first start using
> this structure - We need to use it to query what protocols are
> supported.  It's initially set to 1 to cover the discovery
> protocol and then we realloc to expand it as we discover more
> protocols.

Ok.

>
> >
> > > + * @num_prots: Size of prots array
> > > + * @cur_task: Current task the state machine is working on
> > > + * @wq: Wait queue to wait on if a query is in progress
> > > + * @state_lock: Protect the state of cur_task, abort, and dead
> > > + * @statemachine: Work item for the DOE state machine
> > > + * @state: Current state of this DOE
> > > + * @timeout_jiffies: 1 second after GO set
> > > + * @busy_retries: Count of retry attempts
> > > + * @abort: Request a manual abort (e.g. on init)
> > > + * @dead: Used to mark a DOE for which an ABORT has timed out. Further messages
> > > + *        will immediately be aborted with error
> > > + */
> > > +struct pci_doe {
> > > +       struct pci_doe_dev *doe_dev;
> > > +       struct completion abort_c;
> > > +       int irq;
> > > +       char *irq_name;
> > > +       struct pci_doe_protocol *prots;
> > > +       int num_prots;
> > > +
> > > +       struct pci_doe_task *cur_task;
> > > +       wait_queue_head_t wq;
> > > +       struct mutex state_lock;
> > > +       struct delayed_work statemachine;
> > > +       enum pci_doe_state state;
> > > +       unsigned long timeout_jiffies;
> > > +       unsigned int busy_retries;
> > > +       unsigned int abort:1;
> > > +       unsigned int dead:1;
> > > +};
> > > +
> > > +static irqreturn_t pci_doe_irq(int irq, void *data)
> > > +{
> > > +       struct pci_doe *doe = data;
> > > +       struct pci_dev *pdev = doe->doe_dev->pdev;
> > > +       int offset = doe->doe_dev->cap_offset;
> > > +       u32 val;
> > > +
> > > +       pci_read_config_dword(pdev, offset + PCI_DOE_STATUS, &val);
> > > +
> > > +       /* Leave the error case to be handled outside IRQ */
> > > +       if (FIELD_GET(PCI_DOE_STATUS_ERROR, val)) {
> > > +               mod_delayed_work(system_wq, &doe->statemachine, 0);
> > > +               return IRQ_HANDLED;
> > > +       }
> > > +
> > > +       if (FIELD_GET(PCI_DOE_STATUS_INT_STATUS, val)) {
> > > +               pci_write_config_dword(pdev, offset + PCI_DOE_STATUS,
> > > +                                       PCI_DOE_STATUS_INT_STATUS);
> > > +               mod_delayed_work(system_wq, &doe->statemachine, 0);
> > > +               return IRQ_HANDLED;
> > > +       }
> > > +
> > > +       return IRQ_NONE;
> > > +}
> > > +
> > > +/*
> > > + * Only call when safe to directly access the DOE, either because no tasks yet
> > > + * queued, or called from doe_statemachine_work() which has exclusive access to
> > > + * the DOE config space.
> >
> > It doesn't have exclusive access unless the patch to lock out
> > userspace config writes are revived. Instead, I like Bjorn's idea of
> > tracking and warning / tainting, but not blocking conflicting
> > userspace access to sensitive configuration registers.
> >
> > Yes, it was somewhat of a throw-away comment from Bjorn in that
> > thread, "(and IMO should taint the kernel)", but DOE can do so much
> > subtle damage (compliance test modes, link-encryption / disruption,
> > vendor private who-knows-what...) that I think it behooves us as
> > kernel developers to know when we are debugging system behavior that
> > may be the result of non-kernel mitigated DOE access. The proposal is
> > that when kernel lockdown is not enabled, use the approach from the
> > exclusive config access patch [2] to trap, warn (once per device?),
> > and taint when userspace writes to DOE registers that have been
> > claimed by the kernel. This lets strict environments use
> > kernel-lockdown to block userspace DOE access altogether, in
> > non-strict environment it discourages userspace from clobbering DOE
> > driver state, and it allows a warn-free path if userspace takes the
> > step of at least unbinding the kernel DOE driver before running
> > userspace DOE cycles.
> >
> > [1]: https://lore.kernel.org/r/20211203235617.GA3036259@bhelgaas
> > [2]: https://lore.kernel.org/all/161663543465.1867664.5674061943008380442.stgit@dwillia2-desk3.amr.corp.intel.com/
>
> Good info. I'd missed some of the subtle parts of that discussion.
>
> >
> > > + */
> > > +static void pci_doe_abort_start(struct pci_doe *doe)
> > > +{
> > > +       struct pci_dev *pdev = doe->doe_dev->pdev;
> > > +       int offset = doe->doe_dev->cap_offset;
> > > +       u32 val;
> > > +
> > > +       val = PCI_DOE_CTRL_ABORT;
> > > +       if (doe->irq)
> > > +               val |= PCI_DOE_CTRL_INT_EN;
> > > +       pci_write_config_dword(pdev, offset + PCI_DOE_CTRL, val);
> > > +
> > > +       doe->timeout_jiffies = jiffies + HZ;
> > > +       schedule_delayed_work(&doe->statemachine, HZ);
> >
> > Given the spec timeout is 1 second and the device clock might be
> > slightly off from the host clock how about make this a more generous
> > 1.5 or 2 seconds?
>
> Makes sense. Though if a clock is bad enough we need 2 seconds that
> is pretty awful! :)

Perhaps, though DOE is not a fast path, and block I/O defaults to a 30
second timeout, so I don't think anyone would blink at a 2 seconds for
DOE.

>
> >
> > > +}
> > > +
> > > +static int pci_doe_send_req(struct pci_doe *doe, struct pci_doe_exchange *ex)
> >
> > The relationship between tasks, requests, responses, and exchanges is
> > not immediately clear to me. For example, can this helper be renamed
> > in terms of its relationship to a task? A theory of operation document
> > would help, but it seems there is also room for the implementation to
> > be more self documenting.
>
> Not totally sure what such naming would be.
>
> A task is the management wrapper around an exchange which is a request
> + response pair.  In the sense you queue a task which will carry out
> and exchange by sending a request and receiving a response.
>
> Could rename this pci_doe_start_exchange() but that then obscures
> that we mean send the request to the hardware and removes the resemblance
> to what I recall the specification uses.

I'm not a big fan of copying spec names *if* Linux has a more
idiomatic name for the concept. I am mainly reviewing this from the
perspective that 'struct bio' and 'struct request' naming /
organization is idiomatic for Linux driver transaction flows. Up to
this point in the review I was mapping tasks to bios and exchanges to
requests but then the usage of "req" in this function name threw off
my ontology. At a minimum a decoder ring style comment, like your
reply, about the relationship between these terms would help avoid
this exercise again.

> > > +       case DOE_WAIT_ABORT:
> > > +       case DOE_WAIT_ABORT_ON_ERR:
> > > +               pci_read_config_dword(pdev, offset + PCI_DOE_STATUS, &val);
> > > +
> > > +               if (!FIELD_GET(PCI_DOE_STATUS_ERROR, val) &&
> > > +                   !FIELD_GET(PCI_DOE_STATUS_BUSY, val)) {
> > > +                       /* Back to normal state - carry on */
> > > +                       mutex_lock(&doe->state_lock);
> > > +                       doe->cur_task = NULL;
> > > +                       mutex_unlock(&doe->state_lock);
> > > +                       wake_up_interruptible(&doe->wq);
> > > +
> > > +                       /*
> > > +                        * For deliberately triggered abort, someone is
> > > +                        * waiting.
> > > +                        */
> > > +                       if (doe->state == DOE_WAIT_ABORT)
> > > +                               complete(&doe->abort_c);
> >
> > Why is a completion and waitqueue needed? I.e. a waiter could simply
> > look for an abort completion flag to be set instead.
>
> You mean use the main completion (the one for the non abort case)
> and a flag?
>
> Or a wait_event() with appropriate check?
>
> Could do that but I'm not sure I understand why we care either way?

Just reduction in machinery that needs to be maintained /
comprehended. 2 wait primitives when one will do will always be a
tempting cleanup target.

[..]
> > > +/**
> > > + * pci_doe_exchange_sync() - Send a request, then wait for and receive a
> > > + *                          response
> > > + * @doe_dev: DOE mailbox state structure
> > > + * @ex: Description of the buffers and Vendor ID + type used in this
> > > + *      request/response pair
> > > + *
> > > + * Excess data will be discarded.
> > > + *
> > > + * RETURNS: payload in bytes on success, < 0 on error
> > > + */
> > > +int pci_doe_exchange_sync(struct pci_doe_dev *doe_dev,
> > > +                         struct pci_doe_exchange *ex)
> > > +{
> > > +       struct pci_doe *doe = dev_get_drvdata(&doe_dev->adev.dev);
> > > +       struct pci_doe_task task;
> > > +       DECLARE_COMPLETION_ONSTACK(c);
> > > +
> > > +       if (!doe)
> > > +               return -EAGAIN;
> > > +
> > > +       /* DOE requests must be a whole number of DW */
> > > +       if (ex->request_pl_sz % sizeof(u32))
> > > +               return -EINVAL;
> > > +
> > > +       task.ex = ex;
> > > +       task.cb = pci_doe_task_complete;
> > > +       task.private = &c;
> > > +
> > > +again:
> > > +       mutex_lock(&doe->state_lock);
> > > +       if (doe->cur_task) {
> > > +               mutex_unlock(&doe->state_lock);
> > > +               wait_event_interruptible(doe->wq, doe->cur_task == NULL);
> > > +               goto again;
> > > +       }
> > > +
> > > +       if (doe->dead) {
> > > +               mutex_unlock(&doe->state_lock);
> > > +               return -EIO;
> > > +       }
> > > +       doe->cur_task = &task;
> > > +       schedule_delayed_work(&doe->statemachine, 0);
> > > +       mutex_unlock(&doe->state_lock);
> > > +
> > > +       wait_for_completion(&c);
> >
> > I would expect that the caller of this routine would want to specify
> > the task and end_task() callback and use that as the completion
> > signal. It may also want "no wait" behavior where it is prepared for
> > the DOE result to come back sometime later. With that change the
> > exchange fields can move into the task directly.
>
> This is the simple synchronous wrapper around an async core.
> If we want an async path at somepoint in the future where we have
> someone using it then sure, we can have an async version that
> takes the callback.

It just seems an unnecessary hunk of code for the core to carry when
it's trivial for a client of the core to do:

task->private = &completion;
task->end_task = complete_completion;
submit_task()
wait_for_completion(&completion);

> > > +       return task.rv;
> > > +}
> > > +EXPORT_SYMBOL_GPL(pci_doe_exchange_sync);
> > > +
> > > +/**
> > > + * pci_doe_supports_prot() - Return if the DOE instance supports the given
> > > + *                          protocol
> > > + * @pdev: Device on which to find the DOE instance
> > > + * @vid: Protocol Vendor ID
> > > + * @type: protocol type
> > > + *
> > > + * This device can then be passed to pci_doe_exchange_sync() to execute a
> > > + * mailbox exchange through that DOE mailbox.
> > > + *
> > > + * RETURNS: True if the DOE device supports the protocol specified
> > > + */
> > > +bool pci_doe_supports_prot(struct pci_doe_dev *doe_dev, u16 vid, u8 type)
> > > +{
> > > +       struct pci_doe *doe = dev_get_drvdata(&doe_dev->adev.dev);
> > > +       int i;
> > > +
> > > +       if (!doe)
> > > +               return false;
> > > +
> > > +       for (i = 0; i < doe->num_prots; i++)
> > > +               if ((doe->prots[i].vid == vid) &&
> > > +                   (doe->prots[i].type == type))
> > > +                       return true;
> > > +
> > > +       return false;
> > > +}
> > > +EXPORT_SYMBOL_GPL(pci_doe_supports_prot);
> > > +
> > > +static int pci_doe_discovery(struct pci_doe *doe, u8 *index, u16 *vid,
> > > +                            u8 *protocol)
> > > +{
> > > +       u32 request_pl = FIELD_PREP(PCI_DOE_DATA_OBJECT_DISC_REQ_3_INDEX,
> > > +                                   *index);
> > > +       u32 response_pl;
> > > +       struct pci_doe_exchange ex = {
> > > +               .prot.vid = PCI_VENDOR_ID_PCI_SIG,
> > > +               .prot.type = PCI_DOE_PROTOCOL_DISCOVERY,
> > > +               .request_pl = &request_pl,
> > > +               .request_pl_sz = sizeof(request_pl),
> > > +               .response_pl = &response_pl,
> > > +               .response_pl_sz = sizeof(response_pl),
> > > +       };
> > > +       int ret;
> > > +
> > > +       ret = pci_doe_exchange_sync(doe->doe_dev, &ex);
> > > +       if (ret < 0)
> > > +               return ret;
> > > +
> > > +       if (ret != sizeof(response_pl))
> > > +               return -EIO;
> > > +
> > > +       *vid = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID, response_pl);
> > > +       *protocol = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL,
> > > +                             response_pl);
> > > +       *index = FIELD_GET(PCI_DOE_DATA_OBJECT_DISC_RSP_3_NEXT_INDEX,
> > > +                          response_pl);
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static int pci_doe_cache_protocols(struct pci_doe *doe)
> > > +{
> > > +       u8 index = 0;
> > > +       int num_prots;
> > > +       int rc;
> > > +
> > > +       /* Discovery protocol must always be supported and must report itself */
> > > +       num_prots = 1;
> > > +       doe->prots = devm_kcalloc(&doe->doe_dev->adev.dev, num_prots,
> > > +                                 sizeof(*doe->prots), GFP_KERNEL);
> > > +       if (doe->prots == NULL)
> > > +               return -ENOMEM;
> > > +
> > > +       do {
> > > +               struct pci_doe_protocol *prot;
> > > +
> > > +               prot = &doe->prots[num_prots - 1];
> > > +               rc = pci_doe_discovery(doe, &index, &prot->vid, &prot->type);
> > > +               if (rc)
> > > +                       return rc;
> > > +
> > > +               if (index) {
> > > +                       struct pci_doe_protocol *prot_new;
> > > +
> > > +                       num_prots++;
> > > +                       prot_new = devm_krealloc(&doe->doe_dev->adev.dev,
> > > +                                                doe->prots,
> > > +                                                sizeof(*doe->prots) *
> > > +                                                       num_prots,
> > > +                                                GFP_KERNEL);
> > > +                       if (prot_new == NULL)
> > > +                               return -ENOMEM;
> > > +                       doe->prots = prot_new;
> > > +               }
> > > +       } while (index);
> > > +
> > > +       doe->num_prots = num_prots;
> > > +       return 0;
> > > +}
> > > +
> > > +static int pci_doe_abort(struct pci_doe *doe)
> > > +{
> > > +       reinit_completion(&doe->abort_c);
> > > +       mutex_lock(&doe->state_lock);
> > > +       doe->abort = true;
> >
> > Why not a flags field where atomic bitops can be used without need for a mutex.
>
> I'll go the other way, why bother with atomics when this isn't a high performance
> path or something expected to happen often?

It obfuscates what the lock is protecting if it's used for state
management and atomic flag management, but I am not holding the pen
here, so I can let this arbitrary trade-off go.

  reply	other threads:[~2022-02-09 16:27 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-01  7:19 [PATCH V6 00/10] CXL: Read CDAT and DSMAS data from the device ira.weiny
2022-02-01  7:19 ` [PATCH V6 01/10] PCI: Add vendor ID for the PCI SIG ira.weiny
2022-02-03 17:11   ` Bjorn Helgaas
2022-02-03 20:28     ` Ira Weiny
2022-02-01  7:19 ` [PATCH V6 02/10] PCI: Replace magic constant for PCI Sig Vendor ID ira.weiny
2022-02-04 21:16   ` Dan Williams
2022-02-04 21:49   ` Bjorn Helgaas
2022-03-15 21:48     ` Ira Weiny
2022-02-01  7:19 ` [PATCH V6 03/10] PCI/DOE: Add Data Object Exchange Aux Driver ira.weiny
2022-02-03 22:40   ` Bjorn Helgaas
2022-03-15 21:48     ` Ira Weiny
2022-02-09  0:59   ` Dan Williams
2022-02-09 10:13     ` Jonathan Cameron
2022-02-09 16:26       ` Dan Williams [this message]
2022-02-09 16:57         ` Jonathan Cameron
2022-02-09 19:57           ` Dan Williams
2022-02-10 21:51             ` Ira Weiny
2022-03-16 22:50     ` Ira Weiny
2022-03-17 19:37       ` Ira Weiny
2022-02-01  7:19 ` [PATCH V6 04/10] PCI/DOE: Introduce pci_doe_create_doe_devices ira.weiny
2022-02-03 22:44   ` Bjorn Helgaas
2022-02-04 14:51     ` Jonathan Cameron
2022-02-04 16:27       ` Bjorn Helgaas
2022-02-11  2:54         ` Dan Williams
2022-03-24  0:26     ` Ira Weiny
2022-03-24 14:05       ` Jonathan Cameron
2022-03-24 23:44         ` Ira Weiny
2022-03-25 12:02           ` Jonathan Cameron
2022-02-01  7:19 ` [PATCH V6 05/10] cxl/pci: Create DOE auxiliary devices ira.weiny
2022-02-01  7:19 ` [PATCH V6 06/10] cxl/pci: Find the DOE mailbox which supports CDAT ira.weiny
2022-02-01 18:49   ` Ben Widawsky
2022-02-01 22:18     ` Ira Weiny
2022-02-04 14:04       ` Jonathan Cameron
2022-02-01  7:19 ` [PATCH V6 07/10] cxl/mem: Read CDAT table ira.weiny
2022-02-04 13:46   ` Jonathan Cameron
2022-02-01  7:19 ` [PATCH V6 08/10] cxl/cdat: Introduce cdat_hdr_valid() ira.weiny
2022-02-01 18:56   ` Ben Widawsky
2022-02-01 22:29     ` Ira Weiny
2022-02-04 13:17       ` Jonathan Cameron
2022-02-01  7:19 ` [PATCH V6 09/10] cxl/mem: Retry reading CDAT on failure ira.weiny
2022-02-01 18:59   ` Ben Widawsky
2022-02-01 22:31     ` Ira Weiny
2022-02-04 13:20       ` Jonathan Cameron
2022-02-01  7:19 ` [PATCH V6 10/10] cxl/cdat: Parse out DSMAS data from CDAT table ira.weiny
2022-02-01 19:05   ` Ben Widawsky
2022-02-01 22:37     ` Ira Weiny
2022-02-04 13:33       ` Jonathan Cameron
2022-02-04 13:41       ` Jonathan Cameron
2022-02-04 13:40   ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPcyv4g2nNHKPuYVOEH3TbJtCiB1rkRNCVbfDWHnWkotvTAcJg@mail.gmail.com \
    --to=dan.j.williams@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=ben.widawsky@intel.com \
    --cc=bhelgaas@google.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.