From: Rajat Khandelwal <rajat.khandelwal@linux.intel.com>
To: Pankaj Raghav <p.raghav@samsung.com>
Cc: Keith Busch <kbusch@kernel.org>, Christoph Hellwig <hch@lst.de>,
	axboe@fb.com, sagi@grimberg.me, linux-nvme@lists.infradead.org,
	"Khandelwal, Rajat" <rajat.khandelwal@intel.com>,
	javier.gonz@samsung.com, monish.kumar.r@intel.com
Subject: Re: [BUG] nvme-pci: NVMe probe fails with ENODEV
Date: Mon, 13 Mar 2023 22:46:03 +0530
Message-ID: <2b6163a8-61d5-729d-17a5-764e25ce1c07@linux.intel.com>
In-Reply-To: <20230313094944.nsonmbtpmgh4rtng@blixen>

Hi,

On 3/13/2023 3:19 PM, Pankaj Raghav wrote:
> On Thu, Mar 09, 2023 at 11:43:33PM +0530, Rajat Khandelwal wrote:
>>>>>>> I have tried 5.10 and 6.1.15 kernels.
>>>>>> So we have a quirk for a device called Samsung X5 in core.c, which is a
>>>>>> bit of an unusual match.  Can you check that it gets applied for the
>>>>>> device that you are testing?
>>>>>>
>>>>>> Also if it gets applied, can you test this patch?
>>>>> That won't help here. The driver should be bailing on the device in
>>>>> nvme_pci_enable() before we do the ready check:
>>>>>
>>>>> static int nvme_pci_enable(struct nvme_dev *dev)
>>>>> {
>>>>> ...
>>>>>            if (readl(dev->bar + NVME_REG_CSTS) == -1) {
>>>>>                    result = -ENODEV;
>>>>>                    goto disable;
>>>>>            }
>>>>>
>>>>> It sounds like the bridge has a valid memory window, and the kernel assigned it
>>>>> to the device, but for some reason the device didn't apply it to its BAR. Maybe
>>>>> the device just doesn't support hotplug?
>>>> The issue is sporadic in nature, witnessed even during reboots with the device
>>>> attached.
>>>> Is such a scenario even possible (BAR not getting written by the hardware)?
>>> It's not supposed to be possible, but your analysis checking the BAR register
>>> with setpci seems pretty convincing that that is happening.
> A bit more context on this issue FWIW:
>
> Monish contacted me a while ago regarding this issue happening in
> Samsung X5. I failed to reproduce this issue in an Intel 6th gen
> (skylake) laptop. I tried hotplugging the device multiple times but the
> device came up without any issue. That laptop used a JHL6540 Thunderbolt 3
> Bridge. I get from your email that you started seeing this issue from Alderlake.
>
> To isolate if this is an issue with the device, I repeated the same
> steps on an Apple Mac M1 but couldn't reproduce this error.

Yes, Monish is part of our team and initiated this a while ago.
This is probably the first time the issue has been raised on an open forum to
gather input and suggestions from the kernel side.

On the first point: the issue is also seen during reboots (cold and warm).

IIRC, the SSD was also provided to the core Linux team so they could try to
reproduce the issue, and they were able to reproduce it on reboots.

>
> Unfortunately this device is already EOL, so our Firmware team is unable
> to help here.
>
> --
> Pankaj

Since the point here is that the BAR ends up with a garbage value, can we expect
any traction on this bug (keeping in mind the f/w team may not be able to help here)?
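
For anyone following along, here is a minimal user-space sketch of the two
checks discussed above. It is only an illustration, not driver code: the helper
names csts_read_to_errno() and decode_mem_bar() are made up for this mail. The
first mirrors the readl(dev->bar + NVME_REG_CSTS) == -1 test quoted from
nvme_pci_enable(); the second decodes a memory BAR from the two config-space
dwords one would dump with setpci (offsets 0x10 and 0x14 for BAR0).

```c
#include <errno.h>
#include <stdint.h>

/* Offset of the Controller Status register within the NVMe BAR0 registers. */
#define NVME_REG_CSTS 0x1c

/*
 * Hypothetical helper: classify a 32-bit value read from CSTS.
 * A PCIe read that no device completes comes back as all 1s, which is
 * what the "== -1" comparison in the quoted driver code detects.
 */
static int csts_read_to_errno(uint32_t csts)
{
	if (csts == UINT32_MAX)
		return -ENODEV;
	return 0;
}

/*
 * Hypothetical helper: decode a memory BAR from its low and high dwords.
 * Bits 3:0 of the low dword are flags, not address bits; bits 2:1 == 10b
 * mark a 64-bit BAR whose upper half lives in the next dword.
 */
static uint64_t decode_mem_bar(uint32_t lo, uint32_t hi)
{
	uint64_t addr = lo & ~0xfu;

	if ((lo & 0x6) == 0x4)
		addr |= (uint64_t)hi << 32;
	return addr;
}
```

If the decoded address falls outside the bridge's memory window, reads through
the BAR would not reach the device, which would be consistent with CSTS reading
back as all 1s.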

AFAIK, this device is currently sold commercially, and we need to decide
whether to proceed with this or not.

Thanks
Rajat


