* "nvme nvmeX: IO queues not created" with "Amazon.com, Inc. NVMe SSD Controller" from 5.19.y (issue bisected)
@ 2022-09-07  6:05 Jaroslav Pulchart
  2022-09-07  6:29 ` Christoph Hellwig
  2022-09-08 10:41 ` "nvme nvmeX: IO queues not created" with "Amazon.com, Inc. NVMe SSD Controller" from 5.19.y (issue bisected) #forregzbot Thorsten Leemhuis
  0 siblings, 2 replies; 10+ messages in thread
From: Jaroslav Pulchart @ 2022-09-07  6:05 UTC (permalink / raw)
  To: linux-nvme; +Cc: niklas.cassel, hch

Hello,

I would like to report a regression in the NVMe driver in 5.19.y.

The issue is reproducible on AWS EC2 instances with local NVMe storage
such as "r5d.*". The kernel reports "IO queues not created":
[    2.936641] nvme nvme0: 2/0/0 default/read/poll queues  <- EBS volume
[    2.939493] nvme nvme1: 2/0/0 default/read/poll queues  <- EBS volume
[    2.940797] nvme nvme2: IO queues not created              <- Local volume
with 5.19.y (kernel 5.19), and the NVMe storage cannot be used.

I bisected the issue to commit
"aa41d2fe60ee2e4452b0f9ca9f0f6d80a4ff9f9d" (nvme: set controller
enable bit in a separate write). Reverting it makes the NVMe device
work again:
[    3.025599] nvme nvme0: 2/0/0 default/read/poll queues
[    3.032467] nvme nvme2: 8/0/0 default/read/poll queues
[    3.040040] nvme nvme1: 2/0/0 default/read/poll queues
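
A minimal sketch, in Python, of how the two kinds of probe messages shown in this report could be told apart when scanning a boot log. The regular expressions are assumptions based only on the log lines quoted here, and the name `classify_nvme_lines` is made up for illustration; this is not an existing tool.

```python
import re

# Patterns inferred from the log lines quoted in this report (an assumption,
# not an exhaustive description of what the nvme driver can print).
QUEUES_RE = re.compile(
    r"nvme (nvme\d+): (\d+)/(\d+)/(\d+) default/read/poll queues"
)
NO_QUEUES_RE = re.compile(r"nvme (nvme\d+): IO queues not created")


def classify_nvme_lines(lines):
    """Map each controller name to its queue counts, or None if it has no IO queues."""
    status = {}
    for line in lines:
        match = QUEUES_RE.search(line)
        if match:
            # (default, read, poll) queue counts as integers
            status[match.group(1)] = tuple(int(n) for n in match.groups()[1:])
            continue
        match = NO_QUEUES_RE.search(line)
        if match:
            status[match.group(1)] = None
    return status


sample = [
    "[    2.936641] nvme nvme0: 2/0/0 default/read/poll queues",
    "[    2.939493] nvme nvme1: 2/0/0 default/read/poll queues",
    "[    2.940797] nvme nvme2: IO queues not created",
]
result = classify_nvme_lines(sample)
broken = [name for name, queues in result.items() if queues is None]
print(broken)
```

On an affected instance, the same function could be fed the lines of the kernel log to list the controllers that came up without IO queues.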

Best,
--
Jaroslav Pulchart
Sr. Principal SW Engineer
GoodData



* Re: "nvme nvmeX: IO queues not created" with "Amazon.com, Inc. NVMe SSD Controller" from 5.19.y (issue bisected)
  2022-09-07  6:05 "nvme nvmeX: IO queues not created" with "Amazon.com, Inc. NVMe SSD Controller" from 5.19.y (issue bisected) Jaroslav Pulchart
@ 2022-09-07  6:29 ` Christoph Hellwig
  2022-09-07  9:13   ` Sironi, Filippo
  2022-09-08 10:41 ` "nvme nvmeX: IO queues not created" with "Amazon.com, Inc. NVMe SSD Controller" from 5.19.y (issue bisected) #forregzbot Thorsten Leemhuis
  1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2022-09-07  6:29 UTC (permalink / raw)
  To: Jaroslav Pulchart; +Cc: linux-nvme, niklas.cassel, Filippo Sironi

Filippo,

can you help figure out what is going on with this Amazon controller?


* Re: "nvme nvmeX: IO queues not created" with "Amazon.com, Inc. NVMe SSD Controller" from 5.19.y (issue bisected)
  2022-09-07  6:29 ` Christoph Hellwig
@ 2022-09-07  9:13   ` Sironi, Filippo
  2022-09-07  9:58     ` Sironi, Filippo
  0 siblings, 1 reply; 10+ messages in thread
From: Sironi, Filippo @ 2022-09-07  9:13 UTC (permalink / raw)
  To: Christoph Hellwig, Jaroslav Pulchart, Greenberg, Aviv
  Cc: linux-nvme, niklas.cassel

I'm aware of a customer contact regarding this issue and I know that the team responsible for the NVMe implementation is investigating.
Adding Aviv since he's closer to this space than I am and may have more insights into how the investigation is progressing. 

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




* Re: "nvme nvmeX: IO queues not created" with "Amazon.com, Inc. NVMe SSD Controller" from 5.19.y (issue bisected)
  2022-09-07  9:13   ` Sironi, Filippo
@ 2022-09-07  9:58     ` Sironi, Filippo
  2022-09-20  9:41       ` Thorsten Leemhuis
  0 siblings, 1 reply; 10+ messages in thread
From: Sironi, Filippo @ 2022-09-07  9:58 UTC (permalink / raw)
  To: Christoph Hellwig, Jaroslav Pulchart, Greenberg, Aviv, Dutta,
	Soumyaroop, Priescu, Valentin, Machulsky, Zorik
  Cc: linux-nvme, niklas.cassel

Adding more folks (Soumyaroop, Valentin, and Zorik) involved in the investigation, which is by now concluded.


* Re: "nvme nvmeX: IO queues not created" with "Amazon.com, Inc. NVMe SSD Controller" from 5.19.y (issue bisected) #forregzbot
  2022-09-07  6:05 "nvme nvmeX: IO queues not created" with "Amazon.com, Inc. NVMe SSD Controller" from 5.19.y (issue bisected) Jaroslav Pulchart
  2022-09-07  6:29 ` Christoph Hellwig
@ 2022-09-08 10:41 ` Thorsten Leemhuis
  2022-09-21 10:03   ` Thorsten Leemhuis
  1 sibling, 1 reply; 10+ messages in thread
From: Thorsten Leemhuis @ 2022-09-08 10:41 UTC (permalink / raw)
  To: regressions; +Cc: linux-nvme

TWIMC: this mail is primarily sent for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject, to make them easy to spot and filter.

[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few template
paragraphs you might have encountered already in similar form.]

Hi, this is your Linux kernel regression tracker.

CCing the regression mailing list, as it should be in the loop for all
regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

Thanks for the report. To be sure this issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression
tracking bot:

#regzbot ^introduced aa41d2fe60ee2e4
#regzbot title block: nvme: AWS EC2 instances with local NVMe storage fail
#regzbot ignore-activity
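
A minimal sketch, in Python, of how the single-line `#regzbot` commands in a mail body like this one could be picked out. It only illustrates the line format visible in this thread; it is not regzbot's actual parser, the function name `extract_regzbot_commands` is hypothetical, and commands whose argument continues on the next line are deliberately not joined.

```python
def extract_regzbot_commands(body):
    """Return (command, argument) pairs for unquoted '#regzbot' lines."""
    commands = []
    for line in body.splitlines():
        stripped = line.strip()
        # Quoted lines start with '>' and are skipped by this check.
        if not stripped.startswith("#regzbot "):
            continue
        rest = stripped[len("#regzbot "):].strip()
        command, _, argument = rest.partition(" ")
        # Some commands are written with a trailing colon, e.g. 'invalid:'.
        commands.append((command.rstrip(":"), argument.strip()))
    return commands


mail_body = """\
Thanks for the report.

#regzbot ^introduced aa41d2fe60ee2e4
#regzbot title block: nvme: AWS EC2 instances with local NVMe storage fail
#regzbot ignore-activity
> #regzbot poke
"""
parsed = extract_regzbot_commands(mail_body)
for command, argument in parsed:
    print(command, "->", argument)
```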

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply -- ideally while also
telling regzbot about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/

Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (the mail this one replies to), as explained in
the Linux kernel's documentation; the webpage above explains why this is
important for tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.


* Re: "nvme nvmeX: IO queues not created" with "Amazon.com, Inc. NVMe SSD Controller" from 5.19.y (issue bisected)
  2022-09-07  9:58     ` Sironi, Filippo
@ 2022-09-20  9:41       ` Thorsten Leemhuis
  2022-09-20 19:42         ` Keith Busch
  2022-09-20 19:53         ` Dutta, Soumyaroop
  0 siblings, 2 replies; 10+ messages in thread
From: Thorsten Leemhuis @ 2022-09-20  9:41 UTC (permalink / raw)
  To: Sironi, Filippo, Christoph Hellwig, Jaroslav Pulchart, Greenberg,
	Aviv, Dutta, Soumyaroop, Priescu, Valentin, Machulsky, Zorik
  Cc: linux-nvme, niklas.cassel

Hi, this is your Linux kernel regression tracker.

On 07.09.22 11:58, Sironi, Filippo wrote:
> Adding more folks (Soumyaroop, Valentin, and Zorik) involved in the investigation, which is by now concluded.

Has any progress been made to get this regression fixed? I might be
missing something, but from here it looks like nothing has happened for
two weeks. Thing is: ideally it shouldn't take this long to fix
regressions in production releases, as explained in
https://docs.kernel.org/process/handling-regressions.html

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

#regzbot poke


* Re: "nvme nvmeX: IO queues not created" with "Amazon.com, Inc. NVMe SSD Controller" from 5.19.y (issue bisected)
  2022-09-20  9:41       ` Thorsten Leemhuis
@ 2022-09-20 19:42         ` Keith Busch
  2022-09-21  8:59           ` Sironi, Filippo
  2022-09-20 19:53         ` Dutta, Soumyaroop
  1 sibling, 1 reply; 10+ messages in thread
From: Keith Busch @ 2022-09-20 19:42 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Sironi, Filippo, Christoph Hellwig, Jaroslav Pulchart, Greenberg,
	Aviv, Dutta, Soumyaroop, Priescu, Valentin, Machulsky, Zorik,
	linux-nvme, niklas.cassel

On Tue, Sep 20, 2022 at 11:41:58AM +0200, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker.
> 
> On 07.09.22 11:58, Sironi, Filippo wrote:
> > Adding more folks (Soumyaroop, Valentin, and Zorik) involved in the investigation, which is by now concluded.
> 
> Has any progress been made to get this regression fixed? I might be
> missing something, but from here it looks like nothing has happened for
> two weeks. Thing is: ideally it shouldn't take this long to fix
> regressions in production releases, as explained in
> https://docs.kernel.org/process/handling-regressions.html

This is a device bug, not a kernel one. The ideal fix will come from the
device's vendor, bringing the device into protocol compliance.

We do work around these types of problems when necessary, but we usually want a
statement from the vendor that they can't/won't fix it before we create new
quirks to maintain. The vendor has said they are investigating this and will
update with their conclusions. As far as I know, the ball is still in their
court.
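
For readers unfamiliar with the bisected commit, whose title is "nvme: set controller enable bit in a separate write", here is a toy Python model of the difference in register write order that the title describes. The bit positions follow the Controller Configuration (CC) register layout in the NVMe base specification; the `FakeController` class and the helper names are hypothetical, and nothing here models the vendor's firmware issue, which this thread does not describe.

```python
# Toy model of NVMe Controller Configuration (CC) register writes.
CC_EN = 1 << 0        # CC.EN: controller enable
CC_CSS_NVM = 0 << 4   # CC.CSS: NVM command set
CC_MPS_SHIFT = 7      # CC.MPS: memory page size is 2^(12 + MPS) bytes
CC_AMS_RR = 0 << 11   # CC.AMS: round-robin arbitration
CC_IOSQES = 6 << 16   # 2^6 = 64-byte submission queue entries
CC_IOCQES = 4 << 20   # 2^4 = 16-byte completion queue entries


class FakeController:
    """Hypothetical stand-in that records every value written to CC."""

    def __init__(self):
        self.cc_writes = []

    def write_cc(self, value):
        self.cc_writes.append(value)


def build_config(page_shift=12):
    """Assemble the configuration bits without the enable bit."""
    return (CC_CSS_NVM | ((page_shift - 12) << CC_MPS_SHIFT)
            | CC_AMS_RR | CC_IOSQES | CC_IOCQES)


def enable_single_write(ctrl):
    # Older ordering: configuration and enable bit in one register write.
    ctrl.write_cc(build_config() | CC_EN)


def enable_separate_write(ctrl):
    # Ordering suggested by the commit title: configure first, enable second.
    config = build_config()
    ctrl.write_cc(config)
    ctrl.write_cc(config | CC_EN)


old, new = FakeController(), FakeController()
enable_single_write(old)
enable_separate_write(new)
print([hex(v) for v in old.cc_writes])
print([hex(v) for v in new.cc_writes])
```

Both orderings end with the same final register value; the only difference is that the controller first sees a write with the enable bit still clear. A controller that follows the specification should accept either sequence.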



* Re: "nvme nvmeX: IO queues not created" with "Amazon.com, Inc. NVMe SSD Controller" from 5.19.y (issue bisected)
  2022-09-20  9:41       ` Thorsten Leemhuis
  2022-09-20 19:42         ` Keith Busch
@ 2022-09-20 19:53         ` Dutta, Soumyaroop
  1 sibling, 0 replies; 10+ messages in thread
From: Dutta, Soumyaroop @ 2022-09-20 19:53 UTC (permalink / raw)
  To: Thorsten Leemhuis, Sironi, Filippo, Christoph Hellwig,
	Jaroslav Pulchart, Greenberg, Aviv, Priescu, Valentin, Machulsky,
	Zorik
  Cc: linux-nvme, niklas.cassel

Hello,

We apologize for the delayed reply. Our investigation and testing of the proposed fix have since concluded.
We identified a latent issue in our NVMe controller firmware that, coupled with the recent changes in Linux 5.19, resulted in the IO queues not being created after controller enablement.
We have identified the fix and tested it across different versions. The new release, which includes this fix, is now being deployed and should be available across the AWS fleet over the next few weeks.

Thanks
Soumyaroop


* Re: "nvme nvmeX: IO queues not created" with "Amazon.com, Inc. NVMe SSD Controller" from 5.19.y (issue bisected)
  2022-09-20 19:42         ` Keith Busch
@ 2022-09-21  8:59           ` Sironi, Filippo
  0 siblings, 0 replies; 10+ messages in thread
From: Sironi, Filippo @ 2022-09-21  8:59 UTC (permalink / raw)
  To: Keith Busch, Thorsten Leemhuis
  Cc: Christoph Hellwig, Jaroslav Pulchart, Greenberg, Aviv, Dutta,
	Soumyaroop, Priescu, Valentin, Machulsky, Zorik, linux-nvme,
	niklas.cassel

On 20.09.22, 21:57, "Keith Busch" <kbusch@kernel.org> wrote:
>  This is a device bug, not a kernel one. The ideal fix will come from the
>  device's vendor, bringing the device into protocol compliance.
>
>  We do work around these types of problems when necessary, but we usually want a
>  statement from the vendor that they can't/won't fix it before we create new
>  quirks to maintain. The vendor has said they are investigating this and will
>  update with their conclusions. As far as I know, the ball is still in their
>  court.

Soumyaroop replied a few minutes before your email.

This has been root-caused to a latent NVMe controller firmware issue
that was uncovered by the recent changes in Linux 5.19. We identified
a fix for this issue and it is rolling out in our fleet as we speak. The
rollout will finish in the next few weeks.





* Re: "nvme nvmeX: IO queues not created" with "Amazon.com, Inc. NVMe SSD Controller" from 5.19.y (issue bisected) #forregzbot
  2022-09-08 10:41 ` "nvme nvmeX: IO queues not created" with "Amazon.com, Inc. NVMe SSD Controller" from 5.19.y (issue bisected) #forregzbot Thorsten Leemhuis
@ 2022-09-21 10:03   ` Thorsten Leemhuis
  0 siblings, 0 replies; 10+ messages in thread
From: Thorsten Leemhuis @ 2022-09-21 10:03 UTC (permalink / raw)
  To: regressions; +Cc: linux-nvme

On 08.09.22 12:41, Thorsten Leemhuis wrote:
> #regzbot ^introduced aa41d2fe60ee2e4
> #regzbot title block: nvme: AWS EC2 instances with local NVMe storage fail
> #regzbot ignore-activity

#regzbot invalid: will be fixed by a rollout of a new firmware, for
details see the answers to this mail:
https://lore.kernel.org/linux-nvme/18e67378-1365-5e36-981b-13cada73bcda@leemhuis.info/
