From: Patrick Venture
Date: Tue, 18 Sep 2018 11:42:40 -0700
Subject: Re: [PATCH v2] ipmi: looped device detection
To: Corey Minyard
Cc: Arnd Bergmann, Greg KH, openipmi-developer@lists.sourceforge.net, Linux Kernel Mailing List, OpenBMC Maillist
References: <20180911225630.124502-1-venture@google.com> <585d1c3a-6121-c20d-f6d6-7567595cd1af@acm.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 12, 2018 at 3:54 PM Patrick Venture wrote:
>
> On Wed, Sep 12, 2018 at 3:10 PM Corey Minyard wrote:
> >
> > On 09/11/2018 05:56 PM, Patrick Venture wrote:
> > > Try to get the device ID repeatedly during initialization before giving up.
> > > The BMC isn't always responsive, and this allows it to be slightly flaky
> > > during early boot.
> > >
> > > Tested: Installed on a system with the BMC software disabled
> > > such that it was non-responsive. The driver correctly detected this
> > > and gave up as expected. Then I re-enabled the BMC software unloaded
> > > and reloaded the driver and it was detected properly.
> >
> > The patch looks fine, but I wonder if this is something that is really
> > valuable.
> > I have wondered about this before.
> >
> > The question is: If the BMC is unavailable, what are the chances of it
> > becoming
> > available by the time you do 5 attempts? I would guess that is a pretty
> > small
> > chance, which is why I haven't done this already.

Friendly ping. I'd like to get a sense of whether you're likely to
accept this. If not, it's fine; I'll close out the patch in the current
downstream rebase.

Thanks

>
> This patch was actually critical for us to provide a reliable IPMI
> interface. The version of OpenBMC or the state of the BMC at the
> point the kernel was loading was flaky, so following the example in
> the BIOS source, we just re-try a few times. We also can hold boot X
> seconds until it's responding, but, this avoided some issues inherent
> with that.
> >
> > You could have something that re-tested periodically, but there are so many
> > systems with IPMI specified in ACPI or SMBIOS that is wrong, and it would
> > try forever. Also not really a good thing.
>
> If we did a periodic check, it could check X times, but I felt going
> for a simple solution was ideal -- and this idea was proved out on a
> few platforms. We have other drivers that are loaded by the kernel
> (not at run-time) and they depend on IPMI, and without this patch they
> would then have a non-trivial probability of failure.
> >
> > So I've left it to reload the driver or use the hotmod interface.
> >
> > -corey
> >
> > > Signed-off-by: Patrick Venture
> > > ---
> > > v2:
> > >  - removed extra variable that was set but not used.
> > > ---
> > >  drivers/char/ipmi/ipmi_si_intf.c | 23 ++++++++++++++++++++++-
> > >  1 file changed, 22 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
> > > index 90ec010bffbd..5fed96897fe8 100644
> > > --- a/drivers/char/ipmi/ipmi_si_intf.c
> > > +++ b/drivers/char/ipmi/ipmi_si_intf.c
> > > @@ -1918,11 +1918,13 @@ int ipmi_si_add_smi(struct si_sm_io *io)
> > >   * held, primarily to keep smi_num consistent, we only one to do these
> > >   * one at a time.
> > >   */
> > > +#define GET_DEVICE_ID_ATTEMPTS 5
> > >  static int try_smi_init(struct smi_info *new_smi)
> > >  {
> > >  	int rv = 0;
> > >  	int i;
> > >  	char *init_name = NULL;
> > > +	unsigned long sleep_rm;
> > >
> > >  	pr_info(PFX "Trying %s-specified %s state machine at %s address 0x%lx, slave address 0x%x, irq %d\n",
> > >  		ipmi_addr_src_to_str(new_smi->io.addr_source),
> > > @@ -2003,7 +2005,26 @@ static int try_smi_init(struct smi_info *new_smi)
> > >  	 * Attempt a get device id command. If it fails, we probably
> > >  	 * don't have a BMC here.
> > >  	 */
> > > -	rv = try_get_dev_id(new_smi);
> > > +	for (i = 0; i < GET_DEVICE_ID_ATTEMPTS; i++) {
> > > +		pr_info(PFX "Attempting to read BMC device ID\n");
> > > +		rv = try_get_dev_id(new_smi);
> > > +		/* If it succeeded, stop trying */
> > > +		if (!rv)
> > > +			break;
> > > +
> > > +		/* Sleep for ~0.25s before trying again instead of hammering
> > > +		 * the BMC.
> > > +		 */
> > > +		sleep_rm = msleep_interruptible(250);
> > > +		if (sleep_rm != 0) {
> > > +			pr_info(PFX "Find BMC interrupted\n");
> > > +			rv = -EINTR;
> > > +			goto out_err;
> > > +		}
> > > +	}
> > > +
> > > +	/* If we exited the loop above and rv is non-zero we ran out of tries.
> > > +	 */
> > >  	if (rv) {
> > >  		if (new_smi->io.addr_source)
> > >  			dev_err(new_smi->io.dev,
> > >
>
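
For anyone following the thread without a kernel tree handy, the bounded
retry in the quoted patch can be sketched in userspace C roughly as below.
This is only an illustration under stated assumptions, not the driver code:
probe_with_retry(), sleep_ms() and flaky_probe() are hypothetical names,
nanosleep() stands in for msleep_interruptible(), and flaky_probe() stands
in for try_get_dev_id() talking to a BMC that is slow to come up. One
deliberate difference from the patch is noted in the comments: the sketch
does not pause after the final failed attempt.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() */
#endif
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns 0 on success, nonzero on failure. */
typedef int (*probe_fn)(void *ctx);

/* Sleep for ms milliseconds; returns 0, or -EINTR if a signal cut it short. */
static int sleep_ms(long ms)
{
	struct timespec ts;

	ts.tv_sec = ms / 1000;
	ts.tv_nsec = (ms % 1000) * 1000000L;

	if (nanosleep(&ts, NULL) == -1 && errno == EINTR)
		return -EINTR;
	return 0;
}

/*
 * Call probe() up to `attempts` times, pausing delay_ms between failed
 * tries. Returns 0 on the first success, -EINTR if a pause was
 * interrupted, or the last probe error once the attempts run out.
 */
static int probe_with_retry(probe_fn probe, void *ctx, int attempts,
			    long delay_ms)
{
	int rv = -EINVAL;	/* returned unchanged if attempts <= 0 */
	int i;

	for (i = 0; i < attempts; i++) {
		rv = probe(ctx);
		if (!rv)
			break;	/* success, stop trying */

		/* Unlike the quoted patch, skip the pause after the last try. */
		if (i + 1 < attempts && sleep_ms(delay_ms))
			return -EINTR;
	}

	return rv;
}

/* Stand-in for a flaky BMC: fails until the counter in ctx reaches zero. */
static int flaky_probe(void *ctx)
{
	int *remaining_failures = ctx;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return -ENODEV;
	}
	return 0;
}
```

With a counter initialised to 2, probe_with_retry(flaky_probe, &counter, 5,
250) fails twice, pauses twice, and returns 0 on the third attempt. With a
counter larger than the attempt limit it returns -ENODEV after five tries,
which corresponds to the "ran out of tries" path in the patch.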