linux-kernel.vger.kernel.org archive mirror
* [PATCH] nvme-pci: check req to prevent crash in nvme_handle_cqe()
@ 2020-08-31  6:53 Xianting Tian
  2020-08-31  9:40 ` kernel test robot
  0 siblings, 1 reply; 3+ messages in thread
From: Xianting Tian @ 2020-08-31  6:53 UTC
  To: kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel, Xianting Tian

We hit a crash when hot-inserting an NVMe device: blk_mq_tag_to_rq()
returned NULL (req == NULL), and the crash then happened in nvme_end_request():
	struct nvme_request *rq = nvme_req(req);
	rq->result = result;  <==crash here

The test environment is a server configured with two backplanes, each
supporting 8 NVMe devices. The crash happened when hot-inserting an NVMe
device into the second backplane. We measured the signal that the CPU sends
to acknowledge the NVMe interrupt: it was very weak by the time it reached
the second backplane, so the device could not recognize it as an ack
signal and therefore could not clear its interrupt flag.
After updating the related driver, the signal sent from the CPU to the
second backplane was good and the crash no longer occurred.

Since blk_mq_tag_to_rq() may return NULL, its return value should be
checked before use to prevent a crash.
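
The idea can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct tag_table`, `tag_to_rq()` and `handle_completion()` are hypothetical stand-ins, not the kernel's `struct blk_mq_tags`, `blk_mq_tag_to_rq()` or `nvme_handle_cqe()`. The lookup returns NULL for an out-of-range tag or an empty slot, and the caller drops the completion instead of dereferencing the result.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; not the kernel's real structures. */
struct request {
	int tag;
	int result;
};

struct tag_table {
	unsigned int nr_tags;
	struct request **rqs;	/* rqs[tag] is NULL when nothing is in flight */
};

/* Lookup that may return NULL, like the helper discussed above. */
static struct request *tag_to_rq(const struct tag_table *tags, unsigned int tag)
{
	if (tag < tags->nr_tags)
		return tags->rqs[tag];
	return NULL;
}

/* Returns 0 if the completion was delivered, -1 if it was dropped. */
static int handle_completion(const struct tag_table *tags, unsigned int tag,
			     int result)
{
	struct request *req = tag_to_rq(tags, tag);

	if (!req) {
		/* Warn and bail out rather than write through a NULL pointer. */
		fprintf(stderr, "req is null (tag:%u nr_tags:%u)\n",
			tag, tags->nr_tags);
		return -1;
	}
	req->result = result;
	return 0;
}
```

A caller would build a `struct tag_table` over an array of request pointers and pass each completed tag to `handle_completion()`; a completion for an unknown tag is then logged and ignored.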

	[ 1124.256246] nvme nvme5: pci function 0000:e1:00.0
	[ 1124.256323] nvme 0000:e1:00.0: enabling device (0000 -> 0002)
	[ 1125.720859] nvme nvme5: 96/0/0 default/read/poll queues
	[ 1125.732483]  nvme5n1: p1 p2 p3
	[ 1125.788049] BUG: unable to handle kernel NULL pointer dereference at 0000000000000130
	[ 1125.788054] PGD 0 P4D 0
	[ 1125.788057] Oops: 0002 [#1] SMP NOPTI
	[ 1125.788059] CPU: 50 PID: 0 Comm: swapper/50 Kdump: loaded Tainted: G   --------- -t - 4.18.0-147.el8.x86_64 #1
	[ 1125.788065] RIP: 0010:nvme_irq+0xe8/0x240 [nvme]
	[ 1125.788068] RSP: 0018:ffff916b8ec83ed0 EFLAGS: 00010813
	[ 1125.788069] RAX: 0000000000000000 RBX: ffff918ae9211b00 RCX: 0000000000000000
	[ 1125.788070] RDX: 000000000000400b RSI: 0000000000000000 RDI: 0000000000000000
	[ 1125.788071] RBP: ffff918ae8870000 R08: 0000000000000004 R09: ffff918ae8870000
	[ 1125.788072] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
	[ 1125.788073] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
	[ 1125.788075] FS:  0000000000000000(0000) GS:ffff916b8ec80000(0000) knlGS:0000000000000000
	[ 1125.788075] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[ 1125.788076] CR2: 0000000000000130 CR3: 0000001768f00000 CR4: 0000000000340ee0
	[ 1125.788077] Call Trace:
	[ 1125.788080]  <IRQ>
	[ 1125.788085]  __handle_irq_event_percpu+0x40/0x180
	[ 1125.788087]  handle_irq_event_percpu+0x30/0x80
	[ 1125.788089]  handle_irq_event+0x36/0x53
	[ 1125.788090]  handle_edge_irq+0x82/0x190
	[ 1125.788094]  handle_irq+0xbf/0x100
	[ 1125.788098]  do_IRQ+0x49/0xd0
	[ 1125.788100]  common_interrupt+0xf/0xf
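
The small, nonzero faulting address (CR2 = 0x130, with "Oops: 0002" indicating a write) fits a store through a pointer derived from a NULL request: nvme_req() returns the driver-private data placed directly after struct request, so with req == NULL the store lands at sizeof(struct request) plus the field offset. The sketch below only illustrates that arithmetic. The layouts `mock_request` and `mock_pdu` and their sizes are assumptions chosen to make the sum easy to follow; they are not taken from the 4.18 kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed, illustrative layouts; NOT the real kernel structures. */
struct mock_request {
	unsigned char opaque[0x128];	/* assumed size of the request header */
};

struct mock_pdu {
	uint64_t cmd;			/* stand-in for a pointer-sized field */
	uint64_t result;		/* the field being written */
};

/*
 * Address of pdu->result when the private data sits immediately after
 * the request header located at req_addr.
 */
static uintptr_t pdu_result_addr(uintptr_t req_addr)
{
	return req_addr + sizeof(struct mock_request) +
	       offsetof(struct mock_pdu, result);
}
```

With these assumed sizes, `pdu_result_addr(0)` is 0x130: a NULL request does not fault at address 0 but at a small offset from it.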

Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
---
 drivers/nvme/host/pci.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ba725ae47..32712a41c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -939,6 +939,7 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
 {
 	struct nvme_completion *cqe = &nvmeq->cqes[idx];
 	struct request *req;
+	struct blk_mq_tags *tags;
 
 	if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
 		dev_warn(nvmeq->dev->ctrl.device,
@@ -959,7 +960,15 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
 		return;
 	}
 
-	req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), cqe->command_id);
+	tags = nvme_queue_tagset(nvmeq);
+	req = blk_mq_tag_to_rq(tags, cqe->command_id);
+	if (unlikely(!req)) {
+		dev_warn(nvmeq->dev->ctrl.device,
+			"req is null(tag:%d nr_tags:%d) on queue %d\n"
+			cqe->command_id, tags->nr_tags, le16_to_cpu(cqe->sq_id);
+		return;
+	}
+
 	trace_nvme_sq(req, cqe->sq_head, nvmeq->sq_tail);
 	if (!nvme_end_request(req, cqe->status, cqe->result))
 		nvme_pci_complete_rq(req);
-- 
2.17.1



* Re: [PATCH] nvme-pci: check req to prevent crash in nvme_handle_cqe()
  2020-08-31  6:53 [PATCH] nvme-pci: check req to prevent crash in nvme_handle_cqe() Xianting Tian
@ 2020-08-31  9:40 ` kernel test robot
  2020-08-31 11:03   ` Tianxianting
  0 siblings, 1 reply; 3+ messages in thread
From: kernel test robot @ 2020-08-31  9:40 UTC
  To: Xianting Tian, kbusch, axboe, hch, sagi
  Cc: kbuild-all, linux-nvme, linux-kernel, Xianting Tian

[-- Attachment #1: Type: text/plain, Size: 6121 bytes --]

Hi Xianting,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v5.9-rc2]
[also build test ERROR on next-20200828]
[cannot apply to linus/master v5.9-rc3]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xianting-Tian/nvme-pci-check-req-to-prevent-crash-in-nvme_handle_cqe/20200831-155653
base:    d012a7190fc1fd72ed48911e77ca97ba4521bccd
config: parisc-randconfig-r004-20200831 (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/nvme/host/pci.c: In function 'nvme_handle_cqe':
>> drivers/nvme/host/pci.c:3244: error: unterminated argument list invoking macro "dev_warn"
    3244 | module_exit(nvme_exit);
         | 
>> drivers/nvme/host/pci.c:966:3: error: 'dev_warn' undeclared (first use in this function); did you mean '_dev_warn'?
     966 |   dev_warn(nvmeq->dev->ctrl.device,
         |   ^~~~~~~~
         |   _dev_warn
   drivers/nvme/host/pci.c:966:3: note: each undeclared identifier is reported only once for each function it appears in
>> drivers/nvme/host/pci.c:966:11: error: expected ';' at end of input
     966 |   dev_warn(nvmeq->dev->ctrl.device,
         |           ^
         |           ;
   ......
    3244 | module_exit(nvme_exit);
         |            
>> drivers/nvme/host/pci.c:966:3: error: expected declaration or statement at end of input
     966 |   dev_warn(nvmeq->dev->ctrl.device,
         |   ^~~~~~~~
>> drivers/nvme/host/pci.c:966:3: error: expected declaration or statement at end of input
   drivers/nvme/host/pci.c: At top level:
   drivers/nvme/host/pci.c:105:13: warning: 'nvme_dev_disable' declared 'static' but never defined [-Wunused-function]
     105 | static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
         |             ^~~~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:106:13: warning: '__nvme_disable_io_queues' declared 'static' but never defined [-Wunused-function]
     106 | static bool __nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode);
         |             ^~~~~~~~~~~~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:901:13: warning: 'nvme_pci_complete_rq' defined but not used [-Wunused-function]
     901 | static void nvme_pci_complete_rq(struct request *req)
         |             ^~~~~~~~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:853:21: warning: 'nvme_queue_rq' defined but not used [-Wunused-function]
     853 | static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
         |                     ^~~~~~~~~~~~~
   drivers/nvme/host/pci.c:484:13: warning: 'nvme_commit_rqs' defined but not used [-Wunused-function]
     484 | static void nvme_commit_rqs(struct blk_mq_hw_ctx *hctx)
         |             ^~~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:427:12: warning: 'nvme_pci_map_queues' defined but not used [-Wunused-function]
     427 | static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
         |            ^~~~~~~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:403:12: warning: 'nvme_init_request' defined but not used [-Wunused-function]
     403 | static int nvme_init_request(struct blk_mq_tag_set *set, struct request *req,
         |            ^~~~~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:392:12: warning: 'nvme_init_hctx' defined but not used [-Wunused-function]
     392 | static int nvme_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
         |            ^~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:379:12: warning: 'nvme_admin_init_hctx' defined but not used [-Wunused-function]
     379 | static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
         |            ^~~~~~~~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:371:15: warning: 'nvme_pci_iod_alloc_size' defined but not used [-Wunused-function]
     371 | static size_t nvme_pci_iod_alloc_size(void)
         |               ^~~~~~~~~~~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:294:13: warning: 'nvme_dbbuf_set' defined but not used [-Wunused-function]
     294 | static void nvme_dbbuf_set(struct nvme_dev *dev)
         |             ^~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:282:13: warning: 'nvme_dbbuf_init' defined but not used [-Wunused-function]
     282 | static void nvme_dbbuf_init(struct nvme_dev *dev,
         |             ^~~~~~~~~~~~~~~
   drivers/nvme/host/pci.c:241:12: warning: 'nvme_dbbuf_dma_alloc' defined but not used [-Wunused-function]
     241 | static int nvme_dbbuf_dma_alloc(struct nvme_dev *dev)
         |            ^~~~~~~~~~~~~~~~~~~~
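
The errors above are consistent with the dev_warn() call at line 966 never being closed: in the posted hunk the format string is not followed by a comma, and the argument list has no closing parenthesis, so the preprocessor reads to the end of the file looking for it. The sketch below shows the call with both fixed. It is a userspace illustration only: the `dev_warn` macro here is a hypothetical stand-in that formats into a buffer, and `warn_null_req()` is an invented helper, neither being the kernel's API.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the kernel's dev_warn(): formats into a
 * caller-supplied buffer so the resulting message can be inspected.
 */
#define dev_warn(buf, len, fmt, ...) snprintf((buf), (len), fmt, __VA_ARGS__)

/* Returns the number of characters the message needs, as snprintf() does. */
static int warn_null_req(char *buf, size_t len, int tag, int nr_tags, int qid)
{
	return dev_warn(buf, len,
			"req is null(tag:%d nr_tags:%d) on queue %d\n",	/* comma added */
			tag, nr_tags, qid);	/* closing parenthesis added */
}
```

Calling `warn_null_req(buf, sizeof(buf), 11, 1024, 3)` fills `buf` with the warning text for tag 11 on queue 3.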

# https://github.com/0day-ci/linux/commit/e3761e6c554adb55d40e176915df361ad9e272e1
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Xianting-Tian/nvme-pci-check-req-to-prevent-crash-in-nvme_handle_cqe/20200831-155653
git checkout e3761e6c554adb55d40e176915df361ad9e272e1
vim +/dev_warn +3244 drivers/nvme/host/pci.c

b60503ba432b16 drivers/block/nvme.c      Matthew Wilcox 2011-01-20  3239  
b60503ba432b16 drivers/block/nvme.c      Matthew Wilcox 2011-01-20  3240  MODULE_AUTHOR("Matthew Wilcox <willy@linux.intel.com>");
b60503ba432b16 drivers/block/nvme.c      Matthew Wilcox 2011-01-20  3241  MODULE_LICENSE("GPL");
c78b47136f7ade drivers/block/nvme-core.c Keith Busch    2014-11-21  3242  MODULE_VERSION("1.0");
b60503ba432b16 drivers/block/nvme.c      Matthew Wilcox 2011-01-20  3243  module_init(nvme_init);
b60503ba432b16 drivers/block/nvme.c      Matthew Wilcox 2011-01-20 @3244  module_exit(nvme_exit);

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31802 bytes --]


* RE: [PATCH] nvme-pci: check req to prevent crash in nvme_handle_cqe()
  2020-08-31  9:40 ` kernel test robot
@ 2020-08-31 11:03   ` Tianxianting
  0 siblings, 0 replies; 3+ messages in thread
From: Tianxianting @ 2020-08-31 11:03 UTC
  To: kernel test robot, kbusch, axboe, hch, sagi
  Cc: kbuild-all, linux-nvme, linux-kernel

I am very sorry, I used the wrong patch file :(
I will send it again. Please review, thanks.


