linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Prarit Bhargava <prarit@redhat.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Leon Romanovsky <leon@kernel.org>,
	bhelgaas@google.com, corbet@lwn.net, linux-doc@vger.kernel.org,
	linux-pci@vger.kernel.org, mstowe@redhat.com
Subject: Re: [PATCH] pci-driver: Add driver load messages
Date: Thu, 18 Feb 2021 13:36:35 -0500	[thread overview]
Message-ID: <dea3517c-8f2f-9a18-81d2-ab6468354040@redhat.com> (raw)
In-Reply-To: <20210126151259.GA2886142@bjorn-Precision-5520>



On 1/26/21 10:12 AM, Bjorn Helgaas wrote:
> Hi Prarit,
> 
> On Tue, Jan 26, 2021 at 09:05:23AM -0500, Prarit Bhargava wrote:
>> On 1/26/21 8:53 AM, Leon Romanovsky wrote:
>>> On Tue, Jan 26, 2021 at 08:42:12AM -0500, Prarit Bhargava wrote:
>>>> On 1/26/21 8:14 AM, Leon Romanovsky wrote:
>>>>> On Tue, Jan 26, 2021 at 07:54:46AM -0500, Prarit Bhargava wrote:
>>>>>>   Leon Romanovsky <leon@kernel.org> wrote:
>>>>>>> On Mon, Jan 25, 2021 at 02:41:38PM -0500, Prarit Bhargava wrote:
>>>>>>>> There are two situations where driver load messages are helpful.
>>>>>>>>
>>>>>>>> 1) Some drivers silently load on devices and debugging driver or system
>>>>>>>> failures in these cases is difficult.  While some drivers (networking
>>>>>>>> for example) may not completely initialize when the PCI driver probe() function
>>>>>>>> has returned, it is still useful to have some idea of driver completion.
>>>>>>>
>>>>>>> Sorry, probably it is me, but I don't understand this use case.
>>>>>>> Are you adding global to whole kernel command line boot argument to debug
>>>>>>> what and when?
>>>>>>>
>>>>>>> During boot:
>>>>>>> If device success, you will see it in /sys/bus/pci/[drivers|devices]/*.
>>>>>>> If device fails, you should get an error from that device (fix the
>>>>>>> device to return an error), or something immediately won't work and
>>>>>>> you won't see it in sysfs.
>>>>>>
>>>>>> What if there is a panic during boot?  There's no way to get to sysfs.
>>>>>> That's the case where this is helpful.
>>>>>
>>>>> How? If you have kernel panic, it means you have much more worse problem
>>>>> than not-supported device. If kernel panic was caused by the driver, you
>>>>> will see call trace related to it. If kernel panic was caused by
>>>>> something else, supported/not supported won't help here.
>>>>
>>>> I still have no idea *WHICH* device it was that the panic occurred on.
>>>
>>> The kernel panic is printed from the driver. There is one driver loaded
>>> for all same PCI devices which are probed without relation to their
>>> number.>
>>> If you have host with ten same cards, you will see one driver and this
>>> is where the problem and not in supported/not-supported device.
>>
>> That's true, but you can also have different cards loading the same driver.
>> See, for example, any PCI_IDs list in a driver.
>>
>> For example,
>>
>> 10:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] (rev 02)
>> 20:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)
>>
>> Both load the megaraid driver and have different profiles within the
>> driver.  I have no idea which one actually panicked until removing
>> one card.
>>
>> It's MUCH worse when debugging new hardware and getting a panic
>> from, for example, the uncore code which binds to a PCI mapped
>> device.  One device might work and the next one doesn't.  And then
>> you can multiply that by seeing *many* panics at once and trying to
>> determine if the problem was on one specific socket, die, or core.
> 
> Would a dev_panic() interface that identified the device and driver
> help with this?
> 

^^ the more I look at this problem, the more a dev_panic() that would output a
device specific message at panic time is what I really need.

> For driver_load_messages, it doesn't seem necessarily PCI-specific.
> If we want a message like that, maybe it could be in
> driver_probe_device() or similar?  There are already a few pr_debug()
> calls in that path.  There are some enabled by initcall_debug that
> include the return value from the probe; would those be close to what
> you're looking for?

I took a look at those, and unfortunately they do not meet my requirements.
Ultimately, at panic time, I need to know that a driver was loaded on a device
at a specific location in the PCI space.

The driver_probe_device() pr_debug calls tell me the location and the driver,
but not anything to uniquely identify the device (ie, the PCI vendor and device
IDs).

It sounds like you've had some thoughts about a dev_panic() implementation.
Care to share them with me?  I'm more than willing to implement it but just want
to get your more experienced view of what is needed.

P.

> 
> Bjorn
> 


  parent reply	other threads:[~2021-02-18 19:09 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-25 19:41 [PATCH] pci-driver: Add driver load messages Prarit Bhargava
2021-01-26  6:39 ` Leon Romanovsky
2021-01-26 12:54   ` Prarit Bhargava
2021-01-26 13:14     ` Leon Romanovsky
2021-01-26 13:42       ` Prarit Bhargava
2021-01-26 13:53         ` Leon Romanovsky
2021-01-26 14:05           ` Prarit Bhargava
2021-01-26 15:12             ` Bjorn Helgaas
2021-01-29 18:38               ` Prarit Bhargava
2021-02-18 18:36               ` Prarit Bhargava [this message]
2021-02-18 19:06                 ` Bjorn Helgaas
2021-03-04 14:42                   ` Prarit Bhargava
2021-03-04 15:50                     ` Bjorn Helgaas
2021-03-05 18:20                       ` Prarit Bhargava
  -- strict thread matches above, loose matches on Subject: below --
2021-01-25 19:21 Prarit Bhargava

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dea3517c-8f2f-9a18-81d2-ab6468354040@redhat.com \
    --to=prarit@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=corbet@lwn.net \
    --cc=helgaas@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mstowe@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).