All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Ben Widawsky <ben.widawsky@intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	linux-cxl@vger.kernel.org, Linux PCI <linux-pci@vger.kernel.org>,
	Alison Schofield <alison.schofield@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>
Subject: Re: [PATCH 04/23] cxl/pci: Implement Interface Ready Timeout
Date: Wed, 24 Nov 2021 23:14:04 -0800	[thread overview]
Message-ID: <CAPcyv4h8a9HD8GPJVd07471s+6RmR77Gk4B4vjgPYOE3x2gDoQ@mail.gmail.com> (raw)
In-Reply-To: <20211125061718.xarzhrm7slyjybvv@intel.com>

On Wed, Nov 24, 2021 at 10:17 PM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> On 21-11-24 11:56:36, Dan Williams wrote:
> > On Mon, Nov 22, 2021 at 9:54 AM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:
> > >
> > > On Mon, 22 Nov 2021 09:17:31 -0800
> > > Ben Widawsky <ben.widawsky@intel.com> wrote:
> > >
> > > > On 21-11-22 15:02:27, Jonathan Cameron wrote:
> > > > > On Fri, 19 Nov 2021 16:02:31 -0800
> > > > > Ben Widawsky <ben.widawsky@intel.com> wrote:
> > > > >
> > > > > > The original driver implementation used the doorbell timeout for the
> > > > > > Mailbox Interface Ready bit to piggy back off of, since the latter
> > > > > > doesn't have a defined timeout. This functionality, introduced in
> > > > > > 8adaf747c9f0 ("cxl/mem: Find device capabilities"), can now be improved
> > > > > > since a timeout has been defined with an ECN to the 2.0 spec.
> > > > > >
> > > > > > While devices implemented prior to the ECN could have an arbitrarily
> > > > > > long wait and still be within spec, the max ECN value (256s) is chosen
> > > > > > as the default for all devices. All vendors in the consortium agreed to
> > > > > > this amount and so it is reasonable to assume no devices made will
> > > > > > exceed this amount.
> > > > >
> > > > > Optimistic :)
> > > > >
> > > >
> > > > Reasonable to assume is certainly not the same as "in reality". I can soften
> > > > this wording.
> > > >
> > > > > >
> > > > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > > > > ---
> > > > > > This patch did not exist in RFCv2
> > > > > > ---
> > > > > >  drivers/cxl/pci.c | 29 +++++++++++++++++++++++++++++
> > > > > >  1 file changed, 29 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > > > > index 6c8d09fb3a17..2cef9fec8599 100644
> > > > > > --- a/drivers/cxl/pci.c
> > > > > > +++ b/drivers/cxl/pci.c
> > > > > > @@ -2,6 +2,7 @@
> > > > > >  /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> > > > > >  #include <linux/io-64-nonatomic-lo-hi.h>
> > > > > >  #include <linux/module.h>
> > > > > > +#include <linux/delay.h>
> > > > > >  #include <linux/sizes.h>
> > > > > >  #include <linux/mutex.h>
> > > > > >  #include <linux/list.h>
> > > > > > @@ -298,6 +299,34 @@ static int cxl_pci_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *c
> > > > > >  static int cxl_pci_setup_mailbox(struct cxl_dev_state *cxlds)
> > > > > >  {
> > > > > >   const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET);
> > > > > > + unsigned long timeout;
> > > > > > + u64 md_status;
> > > > > > + int rc;
> > > > > > +
> > > > > > + /*
> > > > > > +  * CXL 2.0 ECN "Add Mailbox Ready Time" defines a capability field to
> > > > > > +  * dictate how long to wait for the mailbox to become ready. For
> > > > > > +  * simplicity, and to handle devices that might have been implemented
> > > > >
> > > > > I'm not keen on the 'for simplicity' argument here.  If the device is advertising
> > > > > a lower value, then that is what we should use.  It's fine to wait the max time
> > > > > if nothing is specified.  It'll cost us a few lines of code at most unless
> > > > > I am missing something...
> > > > >
> > > > > Jonathan
> > > > >
> > > >
> > > > Let me pose it a different way, if a device advertises 1s, but for whatever
> > > > takes 4s to come up, should we penalize it over the device advertising 256s?
> > >
> > > Yes, because it is buggy.  A compliance test should have failed on this anyway.
> > >
> > > > The
> > > > way this field is defined in the spec would [IMHO] lead vendors to simply put
> > > > the max field in there to game the driver, so why not start off with just
> > > > insisting they don't?
> > >
> > > Given reading this value and getting a big number gives the implication that
> > > the device is meant to be really slow to initialize, I'd expect that to push
> > > vendors a little in the directly of putting realistic values in).
> > >
> > > Maybe we should print the value in the log to make them look silly ;)
> >
> > A print message on the way to a static default timeout value is about
> > all a device's self reported timeout is useful.
> >
> > "device not ready after waiting %d seconds, continuing to wait up to %d seconds"
> >
> > It's also not clear to me that the Linux default timeout should be so
> > generous at 256 seconds. It might be suitable to just complain about
> > devices that are taking more than 60 seconds to initialize with an
> > option to override that timeout for odd outliers. Otherwise encourage
> > hardware implementations to beat the Linux timeout value to get
> > support out of the box.
> >
> > I notice that not even libata waits more than a minute for a given
> > device to finish post-reset shenanigans, so might as well set 60
> > seconds as what the driver will tolerate out of the box.
>
> 60s is infinity, so 4x * infinity doesn't really make much difference does it
> :P?

1 minute is half the hung task timeout in case something accidentally
did an uninterruptible sleep on probe completion event. 4 minutes is
on the order of what it takes a large server to boot. A single device
needs as much time as a server to boot?

> In my opinion if we're going to pick a limit, might as well tie it to a spec
> definition rather than 60s.. Perhaps 60s has some relevance I'm unaware of, but
> it seems equally arbitrary to me.

4 minutes just seems an unreasonable amount of time to wait to make a
decision that something is likely broken. If the industry actually
builds devices that nominally take multiple minutes to boot it's
already going to be in the realm of something custom in terms of
application expectations for when the server is ready. I'll buy you a
beverage of your choice if someone actually builds such a thing.

  reply	other threads:[~2021-11-25  7:15 UTC|newest]

Thread overview: 133+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-20  0:02 [PATCH 00/23] Add drivers for CXL ports and mem devices Ben Widawsky
2021-11-20  0:02 ` [PATCH 01/23] cxl: Rename CXL_MEM to CXL_PCI Ben Widawsky
2021-11-22 14:47   ` Jonathan Cameron
2021-11-24  4:15   ` Dan Williams
2021-11-20  0:02 ` [PATCH 02/23] cxl: Flesh out register names Ben Widawsky
2021-11-22 14:49   ` Jonathan Cameron
2021-11-24  4:24   ` Dan Williams
2021-11-20  0:02 ` [PATCH 03/23] cxl/pci: Extract device status check Ben Widawsky
2021-11-22 15:03   ` Jonathan Cameron
2021-11-24 19:30   ` Dan Williams
2021-11-20  0:02 ` [PATCH 04/23] cxl/pci: Implement Interface Ready Timeout Ben Widawsky
2021-11-22 15:02   ` Jonathan Cameron
2021-11-22 17:17     ` Ben Widawsky
2021-11-22 17:53       ` Jonathan Cameron
2021-11-24 19:56         ` Dan Williams
2021-11-25  6:17           ` Ben Widawsky
2021-11-25  7:14             ` Dan Williams [this message]
2021-11-20  0:02 ` [PATCH 05/23] cxl/pci: Don't poll doorbell for mailbox access Ben Widawsky
2021-11-22 15:11   ` Jonathan Cameron
2021-11-22 17:24     ` Ben Widawsky
2021-11-24 21:55   ` Dan Williams
2021-11-29 18:33     ` Ben Widawsky
2021-11-29 19:02       ` Dan Williams
2021-11-29 19:11         ` Ben Widawsky
2021-11-29 19:18           ` Dan Williams
2021-11-29 19:31             ` Ben Widawsky
2021-11-29 19:37               ` Dan Williams
2021-11-29 19:50                 ` Ben Widawsky
2021-11-20  0:02 ` [PATCH 06/23] cxl/pci: Don't check media status for mbox access Ben Widawsky
2021-11-22 15:19   ` Jonathan Cameron
2021-11-24 21:58   ` Dan Williams
2021-11-20  0:02 ` [PATCH 07/23] cxl/pci: Add new DVSEC definitions Ben Widawsky
2021-11-22 15:22   ` Jonathan Cameron
2021-11-22 17:32     ` Ben Widawsky
2021-11-24 22:03       ` Dan Williams
2021-11-20  0:02 ` [PATCH 08/23] cxl/acpi: Map component registers for Root Ports Ben Widawsky
2021-11-22 15:51   ` Jonathan Cameron
2021-11-22 19:28     ` Ben Widawsky
2021-11-24 22:18   ` Dan Williams
2021-11-20  0:02 ` [PATCH 09/23] cxl: Introduce module_cxl_driver Ben Widawsky
2021-11-22 15:54   ` Jonathan Cameron
2021-11-24 22:22   ` Dan Williams
2021-11-20  0:02 ` [PATCH 10/23] cxl/core: Convert decoder range to resource Ben Widawsky
2021-11-22 16:08   ` Jonathan Cameron
2021-11-24 22:41   ` Dan Williams
2021-11-20  0:02 ` [PATCH 11/23] cxl/core: Document and tighten up decoder APIs Ben Widawsky
2021-11-22 16:13   ` Jonathan Cameron
2021-11-24 22:55   ` Dan Williams
2021-11-20  0:02 ` [PATCH 12/23] cxl: Introduce endpoint decoders Ben Widawsky
2021-11-22 16:20   ` Jonathan Cameron
2021-11-22 19:37     ` Ben Widawsky
2021-11-25  0:07       ` Dan Williams
2021-11-29 20:05         ` Ben Widawsky
2021-11-29 20:07           ` Dan Williams
2021-11-29 20:12             ` Ben Widawsky
2021-11-20  0:02 ` [PATCH 13/23] cxl/core: Move target population locking to caller Ben Widawsky
2021-11-22 16:33   ` Jonathan Cameron
2021-11-22 21:58     ` Ben Widawsky
2021-11-23 11:05       ` Jonathan Cameron
2021-11-25  0:34   ` Dan Williams
2021-11-20  0:02 ` [PATCH 14/23] cxl: Introduce topology host registration Ben Widawsky
2021-11-22 18:20   ` Jonathan Cameron
2021-11-22 22:30     ` Ben Widawsky
2021-11-25  1:09   ` Dan Williams
2021-11-29 21:23     ` Ben Widawsky
2021-11-20  0:02 ` [PATCH 15/23] cxl/core: Store global list of root ports Ben Widawsky
2021-11-22 18:22   ` Jonathan Cameron
2021-11-22 22:32     ` Ben Widawsky
2021-11-20  0:02 ` [PATCH 16/23] cxl/pci: Cache device DVSEC offset Ben Widawsky
2021-11-22 16:46   ` Jonathan Cameron
2021-11-22 22:34     ` Ben Widawsky
2021-11-20  0:02 ` [PATCH 17/23] cxl: Cache and pass DVSEC ranges Ben Widawsky
2021-11-20  4:29   ` kernel test robot
2021-11-20  4:29     ` kernel test robot
2021-11-22 17:00   ` Jonathan Cameron
2021-11-22 22:50     ` Ben Widawsky
2021-11-26 11:37   ` Jonathan Cameron
2021-11-20  0:02 ` [PATCH 18/23] cxl/pci: Implement wait for media active Ben Widawsky
2021-11-22 17:03   ` Jonathan Cameron
2021-11-22 22:57     ` Ben Widawsky
2021-11-23 11:09       ` Jonathan Cameron
2021-11-23 16:04         ` Ben Widawsky
2021-11-23 17:48           ` Bjorn Helgaas
2021-11-23 19:37             ` Ben Widawsky
2021-11-26 11:36     ` Jonathan Cameron
2021-11-20  0:02 ` [PATCH 19/23] cxl/pci: Store component register base in cxlds Ben Widawsky
2021-11-20  7:28   ` kernel test robot
2021-11-20  7:28     ` kernel test robot
2021-11-22 17:11   ` Jonathan Cameron
2021-11-22 23:01     ` Ben Widawsky
2021-11-20  0:02 ` [PATCH 20/23] cxl/port: Introduce a port driver Ben Widawsky
2021-11-20  3:14   ` kernel test robot
2021-11-20  3:14     ` kernel test robot
2021-11-20  5:38   ` kernel test robot
2021-11-20  5:38     ` kernel test robot
2021-11-22 17:41   ` Jonathan Cameron
2021-11-22 23:38     ` Ben Widawsky
2021-11-23 11:38       ` Jonathan Cameron
2021-11-23 16:14         ` Ben Widawsky
2021-11-23 18:21   ` Bjorn Helgaas
2021-11-23 22:03     ` Ben Widawsky
2021-11-23 22:36       ` Dan Williams
2021-11-23 23:38         ` Ben Widawsky
2021-11-23 23:55         ` Bjorn Helgaas
2021-11-24  0:40           ` Dan Williams
2021-11-24  6:33             ` Christoph Hellwig
2021-11-24  7:17               ` Dan Williams
2021-11-24  7:28                 ` Christoph Hellwig
2021-11-24  7:33                   ` Greg Kroah-Hartman
2021-11-24  7:54                     ` Dan Williams
2021-11-24  8:21                       ` Greg Kroah-Hartman
2021-11-24 18:24                         ` Dan Williams
2021-12-02 21:24                 ` Bjorn Helgaas
2021-12-03  1:38                   ` Dan Williams
2021-12-03 22:03                     ` Bjorn Helgaas
2021-12-04  1:24                       ` Dan Williams
2021-12-07  2:56                         ` Bjorn Helgaas
2021-12-07  4:48                           ` Dan Williams
2021-11-24 21:31       ` Bjorn Helgaas
2021-11-20  0:02 ` [PATCH 21/23] cxl: Unify port enumeration for decoders Ben Widawsky
2021-11-22 17:48   ` Jonathan Cameron
2021-11-22 23:44     ` Ben Widawsky
2021-11-20  0:02 ` [PATCH 22/23] cxl/mem: Introduce cxl_mem driver Ben Widawsky
2021-11-20  0:40   ` Randy Dunlap
2021-11-21  3:55     ` Ben Widawsky
2021-11-22 18:17   ` Jonathan Cameron
2021-11-23  0:05     ` Ben Widawsky
2021-11-20  0:02 ` [PATCH 23/23] cxl/mem: Disable switch hierarchies for now Ben Widawsky
2021-11-22 18:19   ` Jonathan Cameron
2021-11-22 19:17     ` Ben Widawsky
2021-11-25 21:53 [PATCH 14/23] cxl: Introduce topology host registration kernel test robot
2021-11-29 11:42 ` Dan Carpenter
2021-11-29 11:42 ` Dan Carpenter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPcyv4h8a9HD8GPJVd07471s+6RmR77Gk4B4vjgPYOE3x2gDoQ@mail.gmail.com \
    --to=dan.j.williams@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=ben.widawsky@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.