* [PATCH] NVMe: Call nvme_pci_disable on error path of nvme_probe_work
@ 2016-11-01 15:27 Rashika Kheria
  2016-11-12 17:41 ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Rashika Kheria @ 2016-11-01 15:27 UTC (permalink / raw)
  To: linux-kernel

Commit d5537e988eec ("NVMe: Don't unmap controller registers on reset")
introduced a regression: it did not replace nvme_dev_unmap() with
nvme_pci_disable() in the error path of nvme_probe_work().

This led to the following NVMe driver crash on systems where the devices
did not initialise on the first try.

BUG: unable to handle kernel paging request at ffffc90006da001c
IP: [<ffffffffa027b6bb>] nvme_dev_remove+0x5b/0xf0 [nvme]
RIP: e030:[<ffffffffa027b6bb>]  [<ffffffffa027b6bb>]
nvme_dev_remove+0x5b/0xf0 [nvme]
RSP: e02b:ffff8806659c3cb8  EFLAGS: 00010286
RAX: ffffc90006da0000 RBX: ffff88067cbc3000 RCX: 0000000000000006
RDX: 0000000000000007 RSI: 0000000000000007 RDI: ffff8806864eda40
RBP: ffff8806659c3cd8 R08: 0000000000000006 R09: 000000000000fffe
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88067e087000
R13: ffffffffa0281d20 R14: ffff88067e087098 R15: ffff8806799d8598
FS:  00007f880d5ba700(0000) GS:ffff8806864e0000(0000)
knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc90006da001c CR3: 0000000676a97000 CR4: 0000000000042660
Call Trace:
[<ffffffffa027b7ea>] nvme_remove+0x9a/0x140 [nvme]
[<ffffffff813503ef>] pci_device_remove+0x3f/0xc0
[<ffffffff81449869>] ? __pm_runtime_idle+0x89/0x90
[<ffffffff8143ed4f>] __device_release_driver+0xaf/0x140
[<ffffffff8143eec8>] device_release_driver+0x28/0x40
[<ffffffff8143db66>] unbind_store+0x96/0xb0
[<ffffffff8143d027>] drv_attr_store+0x27/0x30
[<ffffffff8122e279>] sysfs_kf_write+0x39/0x40
[<ffffffff8122d9e4>] kernfs_fop_write+0xe4/0x160
[<ffffffff811b15df>] __vfs_write+0x2f/0x100
[<ffffffff81003640>] ? syscall_slow_exit_work+0x140/0x180
[<ffffffff81161db9>] ? vm_mmap_pgoff+0xb9/0xe0
[<ffffffff810af981>] ? percpu_down_read+0x11/0x60
[<ffffffff811b2bce>] vfs_write+0xbe/0x190
[<ffffffff811b2d81>] SyS_write+0x51/0xb0
[<ffffffff815b8aee>] entry_SYSCALL_64_fastpath+0x12/0x71

Cc: stable@vger.kernel.org # 4.4.y
Cc: Jens Axboe <axboe@fb.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: linux-nvme@lists.infradead.org
Fixes: d5537e988eec ("NVMe: Don't unmap controller registers on reset")
Signed-off-by: Rashika Kheria <rashika@amazon.de>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c851bc5..f5d1579 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3184,7 +3184,7 @@ static void nvme_probe_work(struct work_struct *work)
 	nvme_disable_queue(dev, 0);
 	nvme_dev_list_remove(dev);
  unmap:
-	nvme_dev_unmap(dev);
+	nvme_pci_disable(dev);
  out:
 	if (!work_busy(&dev->reset_work))
 		nvme_dead_ctrl(dev);
-- 
2.10.2
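
The failure mode described in the commit message can be modelled in user
space. The sketch below is a minimal illustration, not driver code. All
names in it (struct fake_dev, fake_dev_map(), fake_dev_unmap(),
fake_pci_disable(), fake_probe_fail(), fake_dev_remove(), fault_offset())
are hypothetical. It assumes that nvme_dev_unmap() tears down the register
mapping while leaving dev->bar pointing at it, and that nvme_pci_disable()
disables the device but keeps the mapping, as the title of d5537e988eec
suggests. It also assumes that RAX in the oops held the BAR base; under
that assumption CR2 - RAX is 0x1c, the offset of the NVMe Controller
Status (CSTS) register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REG_CSTS 0x1c   /* NVMe Controller Status register offset */
#define BAR_SIZE 0x1000

/* Backing store that stands in for the device's register window. */
static uint32_t fake_regs[BAR_SIZE / 4];

/* Hypothetical stand-in for struct nvme_dev; only the mapping matters. */
struct fake_dev {
	uint32_t *bar;     /* models the ioremap()ed register window */
	bool bar_mapped;   /* true while a read through bar would be valid */
	bool pci_enabled;
};

static void fake_dev_map(struct fake_dev *dev)
{
	dev->bar = fake_regs;
	dev->bar_mapped = true;
	dev->pci_enabled = true;
}

/* Pre-patch error path (assumed): mapping goes away, pointer stays stale. */
static void fake_dev_unmap(struct fake_dev *dev)
{
	dev->bar_mapped = false;
	dev->pci_enabled = false;
}

/* Patched error path (assumed): device is disabled, mapping is kept. */
static void fake_pci_disable(struct fake_dev *dev)
{
	dev->pci_enabled = false;
}

/* Models the tail of a failed probe, with or without the one-line fix. */
static void fake_probe_fail(struct fake_dev *dev, bool use_fix)
{
	if (use_fix)
		fake_pci_disable(dev);
	else
		fake_dev_unmap(dev);
}

/*
 * Models a later driver unbind that reads CSTS through dev->bar.
 * Returns 0 if the read is safe, -1 where the kernel would fault.
 */
static int fake_dev_remove(struct fake_dev *dev)
{
	if (!dev->bar_mapped)
		return -1;              /* stale pointer: paging request BUG */
	(void)dev->bar[REG_CSTS / 4];   /* safe register read */
	return 0;
}

/* Offset of a faulting address from an assumed mapping base. */
static uint64_t fault_offset(uint64_t cr2, uint64_t base)
{
	assert(cr2 >= base);
	return cr2 - base;
}
```

Calling fake_probe_fail(dev, false) followed by fake_dev_remove(dev)
reports the fault case, while fake_probe_fail(dev, true) leaves the later
register read valid. The model deliberately tracks validity with a flag
instead of freeing memory, so it never performs the invalid access itself.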


* [PATCH] NVMe: Call nvme_pci_disable on error path of nvme_probe_work
  2016-11-01 15:27 [PATCH] NVMe: Call nvme_pci_disable on error path of nvme_probe_work Rashika Kheria
@ 2016-11-12 17:41 ` Christoph Hellwig
  2016-11-14  8:57   ` Rashika Kheria
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2016-11-12 17:41 UTC (permalink / raw)


Bouncing to Keith and linux-nvme

On Tue, Nov 01, 2016 at 04:27:56PM +0100, Rashika Kheria wrote:
> Commit d5537e988eec ("NVMe: Don't unmap controller registers on reset"),
> introduced a regression in which it did not replace nvme_dev_unmap()
> with nvme_pci_disable() in the error path of nvme_probe_work().
> 
> This led to the following NVMe driver crash on systems where the devices
> did not initialise in the first try.
> 
> BUG: unable to handle kernel paging request at ffffc90006da001c
> IP: [<ffffffffa027b6bb>] nvme_dev_remove+0x5b/0xf0 [nvme]
> RIP: e030:[<ffffffffa027b6bb>]  [<ffffffffa027b6bb>]
> nvme_dev_remove+0x5b/0xf0 [nvme]
> RSP: e02b:ffff8806659c3cb8  EFLAGS: 00010286
> RAX: ffffc90006da0000 RBX: ffff88067cbc3000 RCX: 0000000000000006
> RDX: 0000000000000007 RSI: 0000000000000007 RDI: ffff8806864eda40
> RBP: ffff8806659c3cd8 R08: 0000000000000006 R09: 000000000000fffe
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88067e087000
> R13: ffffffffa0281d20 R14: ffff88067e087098 R15: ffff8806799d8598
> FS:  00007f880d5ba700(0000) GS:ffff8806864e0000(0000)
> knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffc90006da001c CR3: 0000000676a97000 CR4: 0000000000042660
> Call Trace:
> [<ffffffffa027b7ea>] nvme_remove+0x9a/0x140 [nvme]
> [<ffffffff813503ef>] pci_device_remove+0x3f/0xc0
> [<ffffffff81449869>] ? __pm_runtime_idle+0x89/0x90
> [<ffffffff8143ed4f>] __device_release_driver+0xaf/0x140
> [<ffffffff8143eec8>] device_release_driver+0x28/0x40
> [<ffffffff8143db66>] unbind_store+0x96/0xb0
> [<ffffffff8143d027>] drv_attr_store+0x27/0x30
> [<ffffffff8122e279>] sysfs_kf_write+0x39/0x40
> [<ffffffff8122d9e4>] kernfs_fop_write+0xe4/0x160
> [<ffffffff811b15df>] __vfs_write+0x2f/0x100
> [<ffffffff81003640>] ? syscall_slow_exit_work+0x140/0x180
> [<ffffffff81161db9>] ? vm_mmap_pgoff+0xb9/0xe0
> [<ffffffff810af981>] ? percpu_down_read+0x11/0x60
> [<ffffffff811b2bce>] vfs_write+0xbe/0x190
> [<ffffffff811b2d81>] SyS_write+0x51/0xb0
> [<ffffffff815b8aee>] entry_SYSCALL_64_fastpath+0x12/0x71
> 
> Cc: stable@vger.kernel.org # 4.4.y
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
> Cc: linux-nvme@lists.infradead.org
> Fixes: d5537e988eec ("NVMe: Don't unmap controller registers on reset")
> Signed-off-by: Rashika Kheria <rashika@amazon.de>
> ---
>  drivers/nvme/host/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index c851bc5..f5d1579 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -3184,7 +3184,7 @@ static void nvme_probe_work(struct work_struct *work)
>  	nvme_disable_queue(dev, 0);
>  	nvme_dev_list_remove(dev);
>   unmap:
> -	nvme_dev_unmap(dev);
> +	nvme_pci_disable(dev);
>   out:
>  	if (!work_busy(&dev->reset_work))
>  		nvme_dead_ctrl(dev);
> -- 
> 2.10.2
> 
---end quoted text---


* [PATCH] NVMe: Call nvme_pci_disable on error path of nvme_probe_work
  2016-11-12 17:41 ` Christoph Hellwig
@ 2016-11-14  8:57   ` Rashika Kheria
  2016-11-14 13:21     ` Gabriel Krisman Bertazi
  2016-11-14 18:47     ` Keith Busch
  0 siblings, 2 replies; 7+ messages in thread
From: Rashika Kheria @ 2016-11-14  8:57 UTC (permalink / raw)


Hi everyone,

Could you please review the following patch? This solves a regression in 
stable 4.4.y tree.


On 11/12/16 18:41, Christoph Hellwig wrote:
> Bouncing to Keith and linux-nvme
>
> On Tue, Nov 01, 2016 at 04:27:56PM +0100, Rashika Kheria wrote:
>> Commit d5537e988eec ("NVMe: Don't unmap controller registers on reset"),
>> introduced a regression in which it did not replace nvme_dev_unmap()
>> with nvme_pci_disable() in the error path of nvme_probe_work().
>>
>> This led to the following NVMe driver crash on systems where the devices
>> did not initialise in the first try.
>>
>> BUG: unable to handle kernel paging request at ffffc90006da001c
>> IP: [<ffffffffa027b6bb>] nvme_dev_remove+0x5b/0xf0 [nvme]
>> RIP: e030:[<ffffffffa027b6bb>]  [<ffffffffa027b6bb>]
>> nvme_dev_remove+0x5b/0xf0 [nvme]
>> RSP: e02b:ffff8806659c3cb8  EFLAGS: 00010286
>> RAX: ffffc90006da0000 RBX: ffff88067cbc3000 RCX: 0000000000000006
>> RDX: 0000000000000007 RSI: 0000000000000007 RDI: ffff8806864eda40
>> RBP: ffff8806659c3cd8 R08: 0000000000000006 R09: 000000000000fffe
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88067e087000
>> R13: ffffffffa0281d20 R14: ffff88067e087098 R15: ffff8806799d8598
>> FS:  00007f880d5ba700(0000) GS:ffff8806864e0000(0000)
>> knlGS:0000000000000000
>> CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffffc90006da001c CR3: 0000000676a97000 CR4: 0000000000042660
>> Call Trace:
>> [<ffffffffa027b7ea>] nvme_remove+0x9a/0x140 [nvme]
>> [<ffffffff813503ef>] pci_device_remove+0x3f/0xc0
>> [<ffffffff81449869>] ? __pm_runtime_idle+0x89/0x90
>> [<ffffffff8143ed4f>] __device_release_driver+0xaf/0x140
>> [<ffffffff8143eec8>] device_release_driver+0x28/0x40
>> [<ffffffff8143db66>] unbind_store+0x96/0xb0
>> [<ffffffff8143d027>] drv_attr_store+0x27/0x30
>> [<ffffffff8122e279>] sysfs_kf_write+0x39/0x40
>> [<ffffffff8122d9e4>] kernfs_fop_write+0xe4/0x160
>> [<ffffffff811b15df>] __vfs_write+0x2f/0x100
>> [<ffffffff81003640>] ? syscall_slow_exit_work+0x140/0x180
>> [<ffffffff81161db9>] ? vm_mmap_pgoff+0xb9/0xe0
>> [<ffffffff810af981>] ? percpu_down_read+0x11/0x60
>> [<ffffffff811b2bce>] vfs_write+0xbe/0x190
>> [<ffffffff811b2d81>] SyS_write+0x51/0xb0
>> [<ffffffff815b8aee>] entry_SYSCALL_64_fastpath+0x12/0x71
>>
>> Cc: stable@vger.kernel.org # 4.4.y
>> Cc: Jens Axboe <axboe@fb.com>
>> Cc: Keith Busch <keith.busch@intel.com>
>> Cc: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
>> Cc: linux-nvme@lists.infradead.org
>> Fixes: d5537e988eec ("NVMe: Don't unmap controller registers on reset")
>> Signed-off-by: Rashika Kheria <rashika@amazon.de>
>> ---
>>   drivers/nvme/host/pci.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index c851bc5..f5d1579 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -3184,7 +3184,7 @@ static void nvme_probe_work(struct work_struct *work)
>>   	nvme_disable_queue(dev, 0);
>>   	nvme_dev_list_remove(dev);
>>    unmap:
>> -	nvme_dev_unmap(dev);
>> +	nvme_pci_disable(dev);
>>    out:
>>   	if (!work_busy(&dev->reset_work))
>>   		nvme_dead_ctrl(dev);
>> -- 
>> 2.10.2
>>
> ---end quoted text---

-- 
Regards,
Rashika

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B


* [PATCH] NVMe: Call nvme_pci_disable on error path of nvme_probe_work
  2016-11-14  8:57   ` Rashika Kheria
@ 2016-11-14 13:21     ` Gabriel Krisman Bertazi
  2016-11-14 14:02       ` Rashika Kheria
  2016-11-14 18:47     ` Keith Busch
  1 sibling, 1 reply; 7+ messages in thread
From: Gabriel Krisman Bertazi @ 2016-11-14 13:21 UTC (permalink / raw)


Rashika Kheria <rashika@amazon.com> writes:

> Hi everyone,
>
> Could you please review the following patch? This solves a regression in
> stable 4.4.y tree.
>
>
> On 11/12/16 18:41, Christoph Hellwig wrote:
>> Bouncing to Keith and linux-nvme
>>
>> On Tue, Nov 01, 2016 at 04:27:56PM +0100, Rashika Kheria wrote:
>>> Commit d5537e988eec ("NVMe: Don't unmap controller registers on reset"),
>>> introduced a regression in which it did not replace nvme_dev_unmap()
>>> with nvme_pci_disable() in the error path of nvme_probe_work().
>>>

Hmm, the original commit had the same issue, which I think was fixed
upstream by f58944e265d4 ("NVMe: Simplify device reset failure"), which
was included in 4.5-rc7.  Isn't the upstream commit a better candidate
for -stable?  It's a bit larger but the commit message says it may
prevent other issues too.

-- 
Gabriel Krisman Bertazi


* [PATCH] NVMe: Call nvme_pci_disable on error path of nvme_probe_work
  2016-11-14 13:21     ` Gabriel Krisman Bertazi
@ 2016-11-14 14:02       ` Rashika Kheria
  0 siblings, 0 replies; 7+ messages in thread
From: Rashika Kheria @ 2016-11-14 14:02 UTC (permalink / raw)



On 11/14/16 14:21, Gabriel Krisman Bertazi wrote:
> Rashika Kheria <rashika@amazon.com> writes:
>
>> Hi everyone,
>>
>> Could you please review the following patch? This solves a regression in
>> stable 4.4.y tree.
>>
>>
>> On 11/12/16 18:41, Christoph Hellwig wrote:
>>> Bouncing to Keith and linux-nvme
>>>
>>> On Tue, Nov 01, 2016 at 04:27:56PM +0100, Rashika Kheria wrote:
>>>> Commit d5537e988eec ("NVMe: Don't unmap controller registers on reset"),
>>>> introduced a regression in which it did not replace nvme_dev_unmap()
>>>> with nvme_pci_disable() in the error path of nvme_probe_work().
>>>>
> Hmm, the original commit had the same issue, which I think was fixed
> upstream by f58944e265d4 ("NVMe: Simplify device reset failure"), which
> was included in 4.5-rc7.  Isn't the upstream commit a better candidate
> for -stable?  It's a bit larger but the commit message says it may
> prevent other issues too.
>
I agree that the upstream commit does not have this issue. However, this 
patch does not apply cleanly on -stable 4.4 tree and might need 
ingestion of multiple other related patches. I am not sure if upstream 
is open to ingest patches other than bug fix in -stable branches.

-- 
Regards,
Rashika



* [PATCH] NVMe: Call nvme_pci_disable on error path of nvme_probe_work
  2016-11-14  8:57   ` Rashika Kheria
  2016-11-14 13:21     ` Gabriel Krisman Bertazi
@ 2016-11-14 18:47     ` Keith Busch
  2016-12-22 13:16       ` Fw: " Kheria, Rashika
  1 sibling, 1 reply; 7+ messages in thread
From: Keith Busch @ 2016-11-14 18:47 UTC (permalink / raw)


On Mon, Nov 14, 2016 at 09:57:27AM +0100, Rashika Kheria wrote:
> Hi everyone,
> 
> Could you please review the following patch? This solves a regression in
> stable 4.4.y tree.

I missed the "Don't unmap" back-port to 4.4.y. I'm not sure, but I think
we may have addressed that differently with something less risky if we
needed that behaviour on 4.4-stable. That's okay, though, this new patch
looks correct. The original was part of a series that fixes this in its
following commit, but it should have looked like this from the beginning.

Acked-by: Keith Busch <keith.busch@intel.com>

 
> > On Tue, Nov 01, 2016 at 04:27:56PM +0100, Rashika Kheria wrote:
> > > Commit d5537e988eec ("NVMe: Don't unmap controller registers on reset"),
> > > introduced a regression in which it did not replace nvme_dev_unmap()
> > > with nvme_pci_disable() in the error path of nvme_probe_work().
> > > 
> > > This led to the following NVMe driver crash on systems where the devices
> > > did not initialise in the first try.
> > > 
> > > BUG: unable to handle kernel paging request at ffffc90006da001c
> > > IP: [<ffffffffa027b6bb>] nvme_dev_remove+0x5b/0xf0 [nvme]
> > > RIP: e030:[<ffffffffa027b6bb>]  [<ffffffffa027b6bb>]
> > > nvme_dev_remove+0x5b/0xf0 [nvme]
> > > RSP: e02b:ffff8806659c3cb8  EFLAGS: 00010286
> > > RAX: ffffc90006da0000 RBX: ffff88067cbc3000 RCX: 0000000000000006
> > > RDX: 0000000000000007 RSI: 0000000000000007 RDI: ffff8806864eda40
> > > RBP: ffff8806659c3cd8 R08: 0000000000000006 R09: 000000000000fffe
> > > R10: 0000000000000000 R11: 0000000000000000 R12: ffff88067e087000
> > > R13: ffffffffa0281d20 R14: ffff88067e087098 R15: ffff8806799d8598
> > > FS:  00007f880d5ba700(0000) GS:ffff8806864e0000(0000)
> > > knlGS:0000000000000000
> > > CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: ffffc90006da001c CR3: 0000000676a97000 CR4: 0000000000042660
> > > Call Trace:
> > > [<ffffffffa027b7ea>] nvme_remove+0x9a/0x140 [nvme]
> > > [<ffffffff813503ef>] pci_device_remove+0x3f/0xc0
> > > [<ffffffff81449869>] ? __pm_runtime_idle+0x89/0x90
> > > [<ffffffff8143ed4f>] __device_release_driver+0xaf/0x140
> > > [<ffffffff8143eec8>] device_release_driver+0x28/0x40
> > > [<ffffffff8143db66>] unbind_store+0x96/0xb0
> > > [<ffffffff8143d027>] drv_attr_store+0x27/0x30
> > > [<ffffffff8122e279>] sysfs_kf_write+0x39/0x40
> > > [<ffffffff8122d9e4>] kernfs_fop_write+0xe4/0x160
> > > [<ffffffff811b15df>] __vfs_write+0x2f/0x100
> > > [<ffffffff81003640>] ? syscall_slow_exit_work+0x140/0x180
> > > [<ffffffff81161db9>] ? vm_mmap_pgoff+0xb9/0xe0
> > > [<ffffffff810af981>] ? percpu_down_read+0x11/0x60
> > > [<ffffffff811b2bce>] vfs_write+0xbe/0x190
> > > [<ffffffff811b2d81>] SyS_write+0x51/0xb0
> > > [<ffffffff815b8aee>] entry_SYSCALL_64_fastpath+0x12/0x71
> > > 
> > > Cc: stable@vger.kernel.org # 4.4.y
> > > Cc: Jens Axboe <axboe@fb.com>
> > > Cc: Keith Busch <keith.busch@intel.com>
> > > Cc: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
> > > Cc: linux-nvme@lists.infradead.org
> > > Fixes: d5537e988eec ("NVMe: Don't unmap controller registers on reset")
> > > Signed-off-by: Rashika Kheria <rashika@amazon.de>
> > > ---
> > >   drivers/nvme/host/pci.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > index c851bc5..f5d1579 100644
> > > --- a/drivers/nvme/host/pci.c
> > > +++ b/drivers/nvme/host/pci.c
> > > @@ -3184,7 +3184,7 @@ static void nvme_probe_work(struct work_struct *work)
> > >   	nvme_disable_queue(dev, 0);
> > >   	nvme_dev_list_remove(dev);
> > >    unmap:
> > > -	nvme_dev_unmap(dev);
> > > +	nvme_pci_disable(dev);
> > >    out:
> > >   	if (!work_busy(&dev->reset_work))
> > >   		nvme_dead_ctrl(dev);
> > > -- 
> > > 2.10.2
> > > 
> > ---end quoted text---


* Fw: [PATCH] NVMe: Call nvme_pci_disable on error path of nvme_probe_work
  2016-11-14 18:47     ` Keith Busch
@ 2016-12-22 13:16       ` Kheria, Rashika
  0 siblings, 0 replies; 7+ messages in thread
From: Kheria, Rashika @ 2016-12-22 13:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: stable


________________________________________
From: Keith Busch <keith.busch@intel.com>
Sent: Monday, November 14, 2016 7:47 PM
To: Kheria, Rashika
Cc: Christoph Hellwig; Kheria, Rashika; linux-nvme@lists.infradead.org
Subject: Re: [PATCH] NVMe: Call nvme_pci_disable on error path of nvme_probe_work

On Mon, Nov 14, 2016 at 09:57:27AM +0100, Rashika Kheria wrote:
> Hi everyone,
>
> Could you please review the following patch? This solves a regression in
> stable 4.4.y tree.

I missed the "Don't unmap" back-port to 4.4.y. I'm not sure, but I think
we may have addressed that differently with something less risky if we
needed that behaviour on 4.4-stable. That's okay, though, this new patch
looks correct. The original was part of a series that fixes this in its
following commit, but it should have looked like this from the beginning.

Acked-by: Keith Busch <keith.busch@intel.com>


> > On Tue, Nov 01, 2016 at 04:27:56PM +0100, Rashika Kheria wrote:
> > > Commit d5537e988eec ("NVMe: Don't unmap controller registers on reset"),
> > > introduced a regression in which it did not replace nvme_dev_unmap()
> > > with nvme_pci_disable() in the error path of nvme_probe_work().
> > >
> > > This led to the following NVMe driver crash on systems where the devices
> > > did not initialise in the first try.
> > >
> > > BUG: unable to handle kernel paging request at ffffc90006da001c
> > > IP: [<ffffffffa027b6bb>] nvme_dev_remove+0x5b/0xf0 [nvme]
> > > RIP: e030:[<ffffffffa027b6bb>]  [<ffffffffa027b6bb>]
> > > nvme_dev_remove+0x5b/0xf0 [nvme]
> > > RSP: e02b:ffff8806659c3cb8  EFLAGS: 00010286
> > > RAX: ffffc90006da0000 RBX: ffff88067cbc3000 RCX: 0000000000000006
> > > RDX: 0000000000000007 RSI: 0000000000000007 RDI: ffff8806864eda40
> > > RBP: ffff8806659c3cd8 R08: 0000000000000006 R09: 000000000000fffe
> > > R10: 0000000000000000 R11: 0000000000000000 R12: ffff88067e087000
> > > R13: ffffffffa0281d20 R14: ffff88067e087098 R15: ffff8806799d8598
> > > FS:  00007f880d5ba700(0000) GS:ffff8806864e0000(0000)
> > > knlGS:0000000000000000
> > > CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: ffffc90006da001c CR3: 0000000676a97000 CR4: 0000000000042660
> > > Call Trace:
> > > [<ffffffffa027b7ea>] nvme_remove+0x9a/0x140 [nvme]
> > > [<ffffffff813503ef>] pci_device_remove+0x3f/0xc0
> > > [<ffffffff81449869>] ? __pm_runtime_idle+0x89/0x90
> > > [<ffffffff8143ed4f>] __device_release_driver+0xaf/0x140
> > > [<ffffffff8143eec8>] device_release_driver+0x28/0x40
> > > [<ffffffff8143db66>] unbind_store+0x96/0xb0
> > > [<ffffffff8143d027>] drv_attr_store+0x27/0x30
> > > [<ffffffff8122e279>] sysfs_kf_write+0x39/0x40
> > > [<ffffffff8122d9e4>] kernfs_fop_write+0xe4/0x160
> > > [<ffffffff811b15df>] __vfs_write+0x2f/0x100
> > > [<ffffffff81003640>] ? syscall_slow_exit_work+0x140/0x180
> > > [<ffffffff81161db9>] ? vm_mmap_pgoff+0xb9/0xe0
> > > [<ffffffff810af981>] ? percpu_down_read+0x11/0x60
> > > [<ffffffff811b2bce>] vfs_write+0xbe/0x190
> > > [<ffffffff811b2d81>] SyS_write+0x51/0xb0
> > > [<ffffffff815b8aee>] entry_SYSCALL_64_fastpath+0x12/0x71
> > >
> > > Cc: stable@vger.kernel.org # 4.4.y
> > > Cc: Jens Axboe <axboe@fb.com>
> > > Cc: Keith Busch <keith.busch@intel.com>
> > > Cc: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
> > > Cc: linux-nvme@lists.infradead.org
> > > Fixes: d5537e988eec ("NVMe: Don't unmap controller registers on reset")
> > > Signed-off-by: Rashika Kheria <rashika@amazon.de>
> > > ---
> > >   drivers/nvme/host/pci.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > index c851bc5..f5d1579 100644
> > > --- a/drivers/nvme/host/pci.c
> > > +++ b/drivers/nvme/host/pci.c
> > > @@ -3184,7 +3184,7 @@ static void nvme_probe_work(struct work_struct *work)
> > >           nvme_disable_queue(dev, 0);
> > >           nvme_dev_list_remove(dev);
> > >    unmap:
> > > - nvme_dev_unmap(dev);
> > > + nvme_pci_disable(dev);
> > >    out:
> > >           if (!work_busy(&dev->reset_work))
> > >                   nvme_dead_ctrl(dev);
> > > --
> > > 2.10.2
> > >
> > ---end quoted text---

