linux-block.vger.kernel.org archive mirror
* Re: [nvme] f9c499bbbf: nvme nvme0: Identify Controller failed (16641)
       [not found] <20211103141454.GA30634@xsang-OptiPlex-9020>
@ 2021-11-03 19:51 ` Jens Axboe
  2021-11-03 20:52   ` Chaitanya Kulkarni
  2021-11-03 21:38   ` Keith Busch
  0 siblings, 2 replies; 5+ messages in thread
From: Jens Axboe @ 2021-11-03 19:51 UTC
  To: kernel test robot; +Cc: lkp, lkp, linux-block, hch

On 11/3/21 8:14 AM, kernel test robot wrote:
> 
> 
> Greeting,
> 
> FYI, we noticed the following commit (built with gcc-9):
> 
> commit: f9c499bbbf603389abad60d1931c16b2f96dee06 ("[PATCH 1/2] nvme: move command clear into the various setup helpers")
> url: https://github.com/0day-ci/linux/commits/Jens-Axboe/nvme-move-command-clear-into-the-various-setup-helpers/20211018-214956
> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 519d81956ee277b4419c723adfb154603c2565ba
> patch link: https://lore.kernel.org/linux-block/20211018124934.235658-2-axboe@kernel.dk
> 
> in testcase: will-it-scale
> version: will-it-scale-x86_64-a34a85c-1_20211029
> with following parameters:
> 
> 	nr_task: 50%
> 	mode: process
> 	test: readseek1
> 	cpufreq_governor: performance
> 	ucode: 0x700001e
> 
> test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> test-url: https://github.com/antonblanchard/will-it-scale
> 
> 
> on test machine: 144 threads 4 sockets Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory
> 
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> 
> 
> 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <oliver.sang@intel.com>
> 
> 
> [   38.907274][  T868] nvme nvme0: pci function 0000:24:00.0
> [   38.924627][ T1103] scsi host0: ahci
> [   38.948010][  T773] nvme nvme0: Identify Controller failed (16641)
> [   38.951220][ T1103] scsi host1: ahci
> [   38.954193][  T773] nvme nvme0: Removing after probe failure status: -5

This is odd, looks like it's saying invalid opcode. Looking at the probe
path, it's pretty standard and the command passed in is cleared already.
So not quite sure why the patch would make a difference here. I'll
poke at it.

-- 
Jens Axboe



* Re: [nvme] f9c499bbbf: nvme nvme0: Identify Controller failed (16641)
  2021-11-03 19:51 ` [nvme] f9c499bbbf: nvme nvme0: Identify Controller failed (16641) Jens Axboe
@ 2021-11-03 20:52   ` Chaitanya Kulkarni
  2021-11-03 21:38   ` Keith Busch
  1 sibling, 0 replies; 5+ messages in thread
From: Chaitanya Kulkarni @ 2021-11-03 20:52 UTC
  To: kernel test robot; +Cc: lkp, lkp, linux-block, hch, Jens Axboe

On 11/3/21 12:51, Jens Axboe wrote:
> On 11/3/21 8:14 AM, kernel test robot wrote:
>>
>>
>> Greeting,
>>
>> FYI, we noticed the following commit (built with gcc-9):
>>
>> commit: f9c499bbbf603389abad60d1931c16b2f96dee06 ("[PATCH 1/2] nvme: move command clear into the various setup helpers")
>> url: https://github.com/0day-ci/linux/commits/Jens-Axboe/nvme-move-command-clear-into-the-various-setup-helpers/20211018-214956
>> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 519d81956ee277b4419c723adfb154603c2565ba
>> patch link: https://lore.kernel.org/linux-block/20211018124934.235658-2-axboe@kernel.dk
>>
>> in testcase: will-it-scale
>> version: will-it-scale-x86_64-a34a85c-1_20211029
>> with following parameters:
>>
>>        nr_task: 50%
>>        mode: process
>>        test: readseek1
>>        cpufreq_governor: performance
>>        ucode: 0x700001e
>>
>> test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
>> test-url: https://github.com/antonblanchard/will-it-scale
>>
>>
>> on test machine: 144 threads 4 sockets Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory
>>
>> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>>
>>
>>
>>
>> If you fix the issue, kindly add following tag
>> Reported-by: kernel test robot <oliver.sang@intel.com>
>>
>>
>> [   38.907274][  T868] nvme nvme0: pci function 0000:24:00.0
>> [   38.924627][ T1103] scsi host0: ahci
>> [   38.948010][  T773] nvme nvme0: Identify Controller failed (16641)
>> [   38.951220][ T1103] scsi host1: ahci
>> [   38.954193][  T773] nvme nvme0: Removing after probe failure status: -5
> 

For a PCIe controller, I don't see any reason for the Identify
Controller command to fail, unless this is a controller issue.

Can you please tell us what type of controller you are using?

Also, can this be reproduced on different controllers?




* Re: [nvme] f9c499bbbf: nvme nvme0: Identify Controller failed (16641)
  2021-11-03 19:51 ` [nvme] f9c499bbbf: nvme nvme0: Identify Controller failed (16641) Jens Axboe
  2021-11-03 20:52   ` Chaitanya Kulkarni
@ 2021-11-03 21:38   ` Keith Busch
  2021-11-03 21:47     ` Keith Busch
  1 sibling, 1 reply; 5+ messages in thread
From: Keith Busch @ 2021-11-03 21:38 UTC
  To: Jens Axboe; +Cc: kernel test robot, lkp, lkp, linux-block, hch

On Wed, Nov 03, 2021 at 01:51:18PM -0600, Jens Axboe wrote:
> On 11/3/21 8:14 AM, kernel test robot wrote:
> > 
> > 
> > Greeting,
> > 
> > FYI, we noticed the following commit (built with gcc-9):
> > 
> > commit: f9c499bbbf603389abad60d1931c16b2f96dee06 ("[PATCH 1/2] nvme: move command clear into the various setup helpers")
> > url: https://github.com/0day-ci/linux/commits/Jens-Axboe/nvme-move-command-clear-into-the-various-setup-helpers/20211018-214956
> > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 519d81956ee277b4419c723adfb154603c2565ba
> > patch link: https://lore.kernel.org/linux-block/20211018124934.235658-2-axboe@kernel.dk
> > 
> > in testcase: will-it-scale
> > version: will-it-scale-x86_64-a34a85c-1_20211029
> > with following parameters:
> > 
> > 	nr_task: 50%
> > 	mode: process
> > 	test: readseek1
> > 	cpufreq_governor: performance
> > 	ucode: 0x700001e
> > 
> > test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> > test-url: https://github.com/antonblanchard/will-it-scale
> > 
> > 
> > on test machine: 144 threads 4 sockets Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory
> > 
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > 
> > 
> > 
> > 
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > 
> > 
> > [   38.907274][  T868] nvme nvme0: pci function 0000:24:00.0
> > [   38.924627][ T1103] scsi host0: ahci
> > [   38.948010][  T773] nvme nvme0: Identify Controller failed (16641)
> > [   38.951220][ T1103] scsi host1: ahci
> > [   38.954193][  T773] nvme nvme0: Removing after probe failure status: -5
> 
> This is odd, looks like it's saying invalid opcode. Looking at the probe
> path, it's pretty standard and the command passed in is cleared already.
> So not quite sure why the patch would make a difference here. I'll
> poke at it.

It's actually an Invalid Queue Identifier error (0x4101). That error
makes no sense for an Identify command, so it sounds like the controller
observed a different opcode than the driver intended to send, which
seems odd; I didn't observe any problems and I'm pretty sure I'm running
the same code. I'll take a second look as well.
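
A minimal sketch of how the logged value 16641 decodes, assuming the
status layout used by the Linux driver (the completion status with the
phase bit already shifted out: SC in bits 0-7, SCT in bits 8-10, CRD in
bits 11-12, More in bit 13, DNR in bit 14). The function names and the
small lookup table are illustrative only and are not taken from the
kernel.

```python
# Decode an NVMe status value as printed by the Linux driver.
# Assumed layout: SC bits 0-7, SCT bits 8-10, CRD bits 11-12,
# More bit 13, DNR bit 14.

def decode_nvme_status(status: int) -> dict:
    """Split a driver-format NVMe status value into its fields."""
    return {
        "sc": status & 0xFF,            # Status Code
        "sct": (status >> 8) & 0x7,     # Status Code Type
        "crd": (status >> 11) & 0x3,    # Command Retry Delay
        "more": bool(status & 0x2000),  # More information available
        "dnr": bool(status & 0x4000),   # Do Not Retry
    }

# Illustrative subset only: (SCT, SC) -> name.
KNOWN_STATUS = {
    (0, 0x01): "Invalid Command Opcode",    # generic status
    (1, 0x01): "Invalid Queue Identifier",  # command-specific status
}

def describe(status: int) -> str:
    """Return a short human-readable description of the status."""
    fields = decode_nvme_status(status)
    name = KNOWN_STATUS.get((fields["sct"], fields["sc"]), "unknown")
    return name + (" (DNR)" if fields["dnr"] else "")

if __name__ == "__main__":
    print(hex(16641), decode_nvme_status(16641), describe(16641))
```

Both statuses in the table share the low byte 0x01 and differ only in
the status code type, which may be why the value is easy to misread as
an invalid opcode at first glance.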


* Re: [nvme] f9c499bbbf: nvme nvme0: Identify Controller failed (16641)
  2021-11-03 21:38   ` Keith Busch
@ 2021-11-03 21:47     ` Keith Busch
  2021-11-03 23:28       ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2021-11-03 21:47 UTC
  To: Jens Axboe; +Cc: kernel test robot, lkp, lkp, linux-block, hch

On Wed, Nov 03, 2021 at 02:38:53PM -0700, Keith Busch wrote:
> On Wed, Nov 03, 2021 at 01:51:18PM -0600, Jens Axboe wrote:
> > On 11/3/21 8:14 AM, kernel test robot wrote:
> > > 
> > > 
> > > Greeting,
> > > 
> > > FYI, we noticed the following commit (built with gcc-9):
> > > 
> > > commit: f9c499bbbf603389abad60d1931c16b2f96dee06 ("[PATCH 1/2] nvme: move command clear into the various setup helpers")
> > > url: https://github.com/0day-ci/linux/commits/Jens-Axboe/nvme-move-command-clear-into-the-various-setup-helpers/20211018-214956
> > > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 519d81956ee277b4419c723adfb154603c2565ba
> > > patch link: https://lore.kernel.org/linux-block/20211018124934.235658-2-axboe@kernel.dk
> > > 
> > > in testcase: will-it-scale
> > > version: will-it-scale-x86_64-a34a85c-1_20211029
> > > with following parameters:
> > > 
> > > 	nr_task: 50%
> > > 	mode: process
> > > 	test: readseek1
> > > 	cpufreq_governor: performance
> > > 	ucode: 0x700001e
> > > 
> > > test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> > > test-url: https://github.com/antonblanchard/will-it-scale
> > > 
> > > 
> > > on test machine: 144 threads 4 sockets Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory
> > > 
> > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > > 
> > > 
> > > 
> > > 
> > > If you fix the issue, kindly add following tag
> > > Reported-by: kernel test robot <oliver.sang@intel.com>
> > > 
> > > 
> > > [   38.907274][  T868] nvme nvme0: pci function 0000:24:00.0
> > > [   38.924627][ T1103] scsi host0: ahci
> > > [   38.948010][  T773] nvme nvme0: Identify Controller failed (16641)
> > > [   38.951220][ T1103] scsi host1: ahci
> > > [   38.954193][  T773] nvme nvme0: Removing after probe failure status: -5
> > 
> > This is odd, looks like it's saying invalid opcode. Looking at the probe
> > path, it's pretty standard and the command passed in is cleared already.
> > So not quite sure why the patch would make a difference here. I'll
> > poke at it.
> 
> It's actually an Invalid Queue Identifier error (0x4101). That error
> makes no sense for an Identify command, so it sounds like the controller
> observed a different opcode than the driver intended to send, which
> seems odd; I didn't observe any problems and I'm pretty sure I'm running
> the same code. I'll take a second look as well.

The git url that was used in this test points to commit:

  https://github.com/0day-ci/linux/commit/f9c499bbbf603389abad60d1931c16b2f96dee06

And that commit has an extra memset in the REQ_OP_DRV_IN/OUT case, and
it doesn't belong there. I don't see that memset in the upstream commit.
Did the bot pick up the wrong patch?
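
A toy model of why a stray memset in the passthrough case could produce
exactly this error. This is a sketch under stated assumptions, not the
driver code: for REQ_OP_DRV_IN/OUT the caller has already built the
command, so zeroing it again leaves opcode 00h, which in the admin
command set is Delete I/O Submission Queue, and a queue identifier of 0.
The helper names and the simulated controller are hypothetical; only the
opcode values (00h, 06h) and the status code come from the NVMe
specification, and setting DNR is an assumption made to match the log.

```python
# Toy model: an extra memset on a prebuilt passthrough command.
NVME_ADMIN_DELETE_SQ = 0x00  # admin opcode 00h: Delete I/O Submission Queue
NVME_ADMIN_IDENTIFY = 0x06   # admin opcode 06h: Identify
STATUS_DNR = 0x4000          # Do Not Retry bit (driver-format status)
STATUS_QID_INVALID = 0x101   # command-specific: Invalid Queue Identifier
STATUS_BAD_OPCODE = 0x001    # generic: Invalid Command Opcode

def build_identify_controller() -> bytearray:
    """Caller-built 64-byte command: Identify with CNS = 1 in CDW10."""
    cmd = bytearray(64)
    cmd[0] = NVME_ADMIN_IDENTIFY
    cmd[40:44] = (1).to_bytes(4, "little")  # CDW10 starts at byte 40
    return cmd

def setup_passthrough(cmd: bytearray, extra_memset: bool) -> bytearray:
    """Hypothetical stand-in for the REQ_OP_DRV_IN/OUT setup case."""
    if extra_memset:
        cmd[:] = bytes(len(cmd))  # the memset that does not belong here
    return cmd

def toy_admin_controller(cmd: bytearray) -> int:
    """Hypothetical controller: returns a driver-format status value."""
    opcode = cmd[0]
    if opcode == NVME_ADMIN_IDENTIFY:
        return 0
    if opcode == NVME_ADMIN_DELETE_SQ:
        qid = int.from_bytes(cmd[40:42], "little")  # QID in CDW10[15:0]
        # Queue 0 is the admin queue and cannot be deleted.
        return STATUS_DNR | STATUS_QID_INVALID if qid == 0 else 0
    return STATUS_DNR | STATUS_BAD_OPCODE

if __name__ == "__main__":
    good = toy_admin_controller(setup_passthrough(build_identify_controller(), False))
    bad = toy_admin_controller(setup_passthrough(build_identify_controller(), True))
    print(good, bad)
```

In this model the cleared command reaches the controller as a Delete I/O
Submission Queue for queue 0, which is consistent with the controller
having seen a different opcode than the driver intended to send.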


* Re: [nvme] f9c499bbbf: nvme nvme0: Identify Controller failed (16641)
  2021-11-03 21:47     ` Keith Busch
@ 2021-11-03 23:28       ` Jens Axboe
  0 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2021-11-03 23:28 UTC
  To: Keith Busch; +Cc: kernel test robot, lkp, lkp, linux-block, hch

On 11/3/21 3:47 PM, Keith Busch wrote:
> On Wed, Nov 03, 2021 at 02:38:53PM -0700, Keith Busch wrote:
>> On Wed, Nov 03, 2021 at 01:51:18PM -0600, Jens Axboe wrote:
>>> On 11/3/21 8:14 AM, kernel test robot wrote:
>>>>
>>>>
>>>> Greeting,
>>>>
>>>> FYI, we noticed the following commit (built with gcc-9):
>>>>
>>>> commit: f9c499bbbf603389abad60d1931c16b2f96dee06 ("[PATCH 1/2] nvme: move command clear into the various setup helpers")
>>>> url: https://github.com/0day-ci/linux/commits/Jens-Axboe/nvme-move-command-clear-into-the-various-setup-helpers/20211018-214956
>>>> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 519d81956ee277b4419c723adfb154603c2565ba
>>>> patch link: https://lore.kernel.org/linux-block/20211018124934.235658-2-axboe@kernel.dk
>>>>
>>>> in testcase: will-it-scale
>>>> version: will-it-scale-x86_64-a34a85c-1_20211029
>>>> with following parameters:
>>>>
>>>> 	nr_task: 50%
>>>> 	mode: process
>>>> 	test: readseek1
>>>> 	cpufreq_governor: performance
>>>> 	ucode: 0x700001e
>>>>
>>>> test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
>>>> test-url: https://github.com/antonblanchard/will-it-scale
>>>>
>>>>
>>>> on test machine: 144 threads 4 sockets Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory
>>>>
>>>> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>>>>
>>>>
>>>>
>>>>
>>>> If you fix the issue, kindly add following tag
>>>> Reported-by: kernel test robot <oliver.sang@intel.com>
>>>>
>>>>
>>>> [   38.907274][  T868] nvme nvme0: pci function 0000:24:00.0
>>>> [   38.924627][ T1103] scsi host0: ahci
>>>> [   38.948010][  T773] nvme nvme0: Identify Controller failed (16641)
>>>> [   38.951220][ T1103] scsi host1: ahci
>>>> [   38.954193][  T773] nvme nvme0: Removing after probe failure status: -5
>>>
>>> This is odd, looks like it's saying invalid opcode. Looking at the probe
>>> path, it's pretty standard and the command passed in is cleared already.
>>> So not quite sure why the patch would make a difference here. I'll
>>> poke at it.
>>
>> It's actually an Invalid Queue Identifier error (0x4101). That error
>> makes no sense for an Identify command, so it sounds like the controller
>> observed a different opcode than the driver intended to send, which
>> seems odd; I didn't observe any problems and I'm pretty sure I'm running
>> the same code. I'll take a second look as well.
> 
> The git url that was used in this test points to commit:
> 
>   https://github.com/0day-ci/linux/commit/f9c499bbbf603389abad60d1931c16b2f96dee06
> 
> And that commit has an extra memset in the REQ_OP_DRV_IN/OUT case, and
> it doesn't belong there. I don't see that memset in the upstream commit.
> Did the bot pick up the wrong patch?

Ah, good catch, it's picking up a previous, broken version. Good question
why that might be; that's counterproductive...

In any case, we can ignore it.

-- 
Jens Axboe

