linux-pci.vger.kernel.org archive mirror
From: Prarit Bhargava <prarit@redhat.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Leon Romanovsky <leon@kernel.org>,
	bhelgaas@google.com, corbet@lwn.net, linux-doc@vger.kernel.org,
	linux-pci@vger.kernel.org, mstowe@redhat.com
Subject: Re: [PATCH] pci-driver: Add driver load messages
Date: Fri, 5 Mar 2021 13:20:40 -0500	[thread overview]
Message-ID: <940bfcbb-2672-74b8-432b-cf7b33bc036a@redhat.com> (raw)
In-Reply-To: <20210304155040.GA844982@bjorn-Precision-5520>



On 3/4/21 10:50 AM, Bjorn Helgaas wrote:
> On Thu, Mar 04, 2021 at 09:42:44AM -0500, Prarit Bhargava wrote:
>>
>>
>> On 2/18/21 2:06 PM, Bjorn Helgaas wrote:
>>> On Thu, Feb 18, 2021 at 01:36:35PM -0500, Prarit Bhargava wrote:
>>>> On 1/26/21 10:12 AM, Bjorn Helgaas wrote:
>>>>> On Tue, Jan 26, 2021 at 09:05:23AM -0500, Prarit Bhargava wrote:
>>>>>> On 1/26/21 8:53 AM, Leon Romanovsky wrote:
>>>>>>> On Tue, Jan 26, 2021 at 08:42:12AM -0500, Prarit Bhargava wrote:
>>>>>>>> On 1/26/21 8:14 AM, Leon Romanovsky wrote:
>>>>>>>>> On Tue, Jan 26, 2021 at 07:54:46AM -0500, Prarit Bhargava wrote:
>>>>>>>>>>   Leon Romanovsky <leon@kernel.org> wrote:
>>>>>>>>>>> On Mon, Jan 25, 2021 at 02:41:38PM -0500, Prarit Bhargava wrote:
>>>>>>>>>>>> There are two situations where driver load messages are helpful.
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Some drivers load silently on devices, and debugging driver or system
>>>>>>>>>>>> failures in those cases is difficult.  While some drivers (networking,
>>>>>>>>>>>> for example) may not be completely initialized by the time the PCI driver's
>>>>>>>>>>>> probe() function returns, it is still useful to have some indication of
>>>>>>>>>>>> driver completion.
>>>>>>>>>>>
>>>>>>>>>>> Sorry, it is probably me, but I don't understand this use case.
>>>>>>>>>>> Are you adding a global, kernel-wide command line boot argument to
>>>>>>>>>>> debug what, and when?
>>>>>>>>>>>
>>>>>>>>>>> During boot:
>>>>>>>>>>> If the device succeeds, you will see it in /sys/bus/pci/[drivers|devices]/*.
>>>>>>>>>>> If the device fails, you should get an error from that device (fix the
>>>>>>>>>>> device to return an error), or something immediately won't work and
>>>>>>>>>>> you won't see it in sysfs.
>>>>>>>>>>
>>>>>>>>>> What if there is a panic during boot?  There's no way to get to sysfs.
>>>>>>>>>> That's the case where this is helpful.
>>>>>>>>>
>>>>>>>>> How? If you have a kernel panic, you have a much worse problem than
>>>>>>>>> an unsupported device. If the kernel panic was caused by the driver,
>>>>>>>>> you will see a call trace related to it. If the kernel panic was
>>>>>>>>> caused by something else, supported/not-supported won't help here.
>>>>>>>>
>>>>>>>> I still have no idea *WHICH* device it was that the panic occurred on.
>>>>>>>
>>>>>>> The kernel panic is printed from the driver. There is one driver
>>>>>>> loaded for all of the same PCI devices, which are probed without
>>>>>>> regard to their number.
>>>>>>>
>>>>>>> If you have a host with ten of the same card, you will see one driver,
>>>>>>> and that is where the problem is, not in a supported/not-supported
>>>>>>> device.
>>>>>>
>>>>>> That's true, but you can also have different cards loading the same driver.
>>>>>> See, for example, any PCI_IDs list in a driver.
>>>>>>
>>>>>> For example,
>>>>>>
>>>>>> 10:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] (rev 02)
>>>>>> 20:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)
>>>>>>
>>>>>> Both load the megaraid driver and have different profiles within the
>>>>>> driver.  I have no idea which one actually panicked until removing
>>>>>> one card.
>>>>>>
>>>>>> It's MUCH worse when debugging new hardware and getting a panic
>>>>>> from, for example, the uncore code which binds to a PCI mapped
>>>>>> device.  One device might work and the next one doesn't.  And
>>>>>> then you can multiply that by seeing *many* panics at once and
>>>>>> trying to determine if the problem was on one specific socket,
>>>>>> die, or core.
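
The situation described above, where one driver claims several distinct cards, can be sketched in plain userspace C. The ID values below are illustrative stand-ins modeled on the MegaRAID example, not copied from the real megaraid_sas table:

```c
/* Simplified model of a PCI ID table: one driver claims many distinct
 * vendor/device pairs, so the driver name alone does not say which
 * card was being probed when something went wrong. Vendor 0x1000 is
 * Broadcom/LSI; the device IDs here are illustrative. */
struct pci_id { unsigned vendor, device; };

static const struct pci_id megaraid_ids[] = {
    { 0x1000, 0x005f },  /* SAS-3 3008 "Fury"    (illustrative) */
    { 0x1000, 0x005d },  /* SAS-3 3108 "Invader" (illustrative) */
    { 0, 0 }             /* table terminator */
};

/* Return 1 if this driver's table claims vendor:device, else 0. */
static int driver_matches(const struct pci_id *tbl,
                          unsigned vendor, unsigned device)
{
    for (; tbl->vendor; tbl++)
        if (tbl->vendor == vendor && tbl->device == device)
            return 1;
    return 0;
}
```

Both example cards match the same table, so a panic message naming only the driver cannot distinguish them.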
>>>>>
>>>>> Would a dev_panic() interface that identified the device and
>>>>> driver help with this?
>>>>
>>>> ^^ the more I look at this problem, the more a dev_panic() that
>>>> would output a device specific message at panic time is what I
>>>> really need.
>>
>> I went down this road a bit and had a realization.  The issue isn't
>> with printing something at panic time, but the *data* that is
>> output.  Each PCI device is associated with a struct device, and that
>> device's name is what dev_dbg() and friends print.  The PCI
>> subsystem sets the device name at drivers/pci/probe.c:1799:
>>
>>         dev_set_name(&dev->dev, "%04x:%02x:%02x.%d", pci_domain_nr(dev->bus),
>>                      dev->bus->number, PCI_SLOT(dev->devfn),
>>                      PCI_FUNC(dev->devfn));
>>
>> My problem really is that the above information is insufficient when
>> I (or a user) need to debug a system.  The complexities of debugging
>> multiple broken driver loads would be much easier if I didn't have
>> to constantly add this output manually :).
> 
> This *should* already be in the dmesg log:
> 
>   pci 0000:00:00.0: [8086:5910] type 00 class 0x060000
>   pci 0000:00:01.0: [8086:1901] type 01 class 0x060400
>   pci 0000:00:02.0: [8086:591b] type 00 class 0x030000
> 
> So if you had a dev_panic(), that message would include the
> bus/device/function number, and that would be enough to find the
> vendor/device ID from when the device was first enumerated.
> 
> Or are you saying you can't get the part of the dmesg log that
> contains those vendor/device IDs?
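
Bjorn's point is that the BDF printed at panic time can be cross-referenced against the enumeration lines already in dmesg. A minimal sketch of that lookup, parsing one enumeration line of the form shown above:

```c
#include <stdio.h>

/* Parse a dmesg enumeration line such as
 *   "pci 0000:00:02.0: [8086:591b] type 00 class 0x030000"
 * into its BDF components and vendor/device IDs, so a panic that
 * reports only the bus/device/function number can be mapped back
 * to the exact card. Returns 1 on a successful parse, else 0. */
static int parse_enum_line(const char *line,
                           unsigned *dom, unsigned *bus,
                           unsigned *slot, unsigned *func,
                           unsigned *vendor, unsigned *device)
{
    return sscanf(line, "pci %x:%x:%x.%x: [%x:%x]",
                  dom, bus, slot, func, vendor, device) == 6;
}
```

Given the log line for 00:02.0 above, this recovers vendor 0x8086 and device 0x591b from the BDF alone.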

/me hangs head in shame

I didn't notice that until now. :)

Uh, thanks for the polite hit with the cluebat :)  I *think* that will work.
Let me try some additional driver failure tests.

P.

> 
>> Would you be okay with adding a *debug* parameter to expand the
>> device name to include the vendor & device ID pair?  FWIW, I'm
>> somewhat against yet-another-kernel-option but that's really the
>> information I need.  I could then add dev_dbg() statements in the
>> local_pci_probe() function.
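
The expanded debug name being proposed might look something like the following; the exact format (appending the [vendor:device] pair to the standard name) is an assumption for illustration, not taken from any posted patch:

```c
#include <stdio.h>

/* Hypothetical "expanded" device name for debugging: the usual
 * domain:bus:slot.function string from dev_set_name(), followed by
 * the [vendor:device] ID pair. Format is an assumption. */
static void pci_debug_name(char *buf, size_t len,
                           unsigned domain, unsigned bus,
                           unsigned slot, unsigned func,
                           unsigned vendor, unsigned device)
{
    snprintf(buf, len, "%04x:%02x:%02x.%d [%04x:%04x]",
             domain, bus, slot, func, vendor, device);
}
```

For a card at 10:00.0 with a hypothetical 1000:005f ID pair, this yields "0000:10:00.0 [1000:005f]", which would make dev_dbg() output self-identifying without a dmesg cross-reference.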
> 



Thread overview: 15+ messages
2021-01-25 19:41 [PATCH] pci-driver: Add driver load messages Prarit Bhargava
2021-01-26  6:39 ` Leon Romanovsky
2021-01-26 12:54   ` Prarit Bhargava
2021-01-26 13:14     ` Leon Romanovsky
2021-01-26 13:42       ` Prarit Bhargava
2021-01-26 13:53         ` Leon Romanovsky
2021-01-26 14:05           ` Prarit Bhargava
2021-01-26 15:12             ` Bjorn Helgaas
2021-01-29 18:38               ` Prarit Bhargava
2021-02-18 18:36               ` Prarit Bhargava
2021-02-18 19:06                 ` Bjorn Helgaas
2021-03-04 14:42                   ` Prarit Bhargava
2021-03-04 15:50                     ` Bjorn Helgaas
2021-03-05 18:20                       ` Prarit Bhargava [this message]
  -- strict thread matches above, loose matches on Subject: below --
2021-01-25 19:21 Prarit Bhargava
