All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] scsi: libsas: only clear phy->in_shutdown after shutdown event done
@ 2019-05-14  2:42 ` Jason Yan
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Yan @ 2019-05-14  2:42 UTC (permalink / raw)
  To: martin.petersen, jejb
  Cc: linux-scsi, linux-kernel, hare, dan.j.williams, jthumshirn, hch,
	huangdaode, chenxiang66, miaoxie, john.garry, zhaohongjiang,
	Jason Yan, Ewan Milne, Tomas Henzl

When the event queue is full of phy up and down events and reached the
threshold, we will queue a shutdown-event, and set phy->in_shutdown so
that we will not queue a shutdown-event again. But before the
shutdown-event can be executed, every phy-down event will clear
phy->in_shutdown and a new shutdown-event will be queued. The queue will
be full of these shutdown-events.

Fix this by only clear phy->in_shutdown in sas_phye_shutdown(), that is
after the first shutdown-event has been executed.

Fixes: f12486e06ae8 ("scsi: libsas: shut down the PHY if events reached the threshold")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
CC: John Garry <john.garry@huawei.com>
CC: Johannes Thumshirn <jthumshirn@suse.de>
CC: Ewan Milne <emilne@redhat.com>
CC: Christoph Hellwig <hch@lst.de>
CC: Tomas Henzl <thenzl@redhat.com>
CC: Dan Williams <dan.j.williams@intel.com>
CC: Hannes Reinecke <hare@suse.com>
Reviewed-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/libsas/sas_phy.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/scsi/libsas/sas_phy.c b/drivers/scsi/libsas/sas_phy.c
index e030e1452136..b71f5ac6c7dc 100644
--- a/drivers/scsi/libsas/sas_phy.c
+++ b/drivers/scsi/libsas/sas_phy.c
@@ -35,7 +35,6 @@ static void sas_phye_loss_of_signal(struct work_struct *work)
 	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	phy->in_shutdown = 0;
 	phy->error = 0;
 	sas_deform_port(phy, 1);
 }
@@ -45,7 +44,6 @@ static void sas_phye_oob_done(struct work_struct *work)
 	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	phy->in_shutdown = 0;
 	phy->error = 0;
 }
 
@@ -126,6 +124,7 @@ static void sas_phye_shutdown(struct work_struct *work)
 				  ret);
 	} else
 		pr_notice("phy%d is not enabled, cannot shutdown\n", phy->id);
+	phy->in_shutdown = 0;
 }
 
 /* ---------- Phy class registration ---------- */
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 1/2] scsi: libsas: only clear phy->in_shutdown after shutdown event done
@ 2019-05-14  2:42 ` Jason Yan
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Yan @ 2019-05-14  2:42 UTC (permalink / raw)
  To: martin.petersen, jejb
  Cc: linux-scsi, linux-kernel, hare, dan.j.williams, jthumshirn, hch,
	huangdaode, chenxiang66, miaoxie, john.garry, zhaohongjiang,
	Jason Yan, Ewan Milne, Tomas Henzl

When the event queue is full of phy up and down events and reached the
threshold, we will queue a shutdown-event, and set phy->in_shutdown so
that we will not queue a shutdown-event again. But before the
shutdown-event can be executed, every phy-down event will clear
phy->in_shutdown and a new shutdown-event will be queued. The queue will
be full of these shutdown-events.

Fix this by only clear phy->in_shutdown in sas_phye_shutdown(), that is
after the first shutdown-event has been executed.

Fixes: f12486e06ae8 ("scsi: libsas: shut down the PHY if events reached the threshold")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
CC: John Garry <john.garry@huawei.com>
CC: Johannes Thumshirn <jthumshirn@suse.de>
CC: Ewan Milne <emilne@redhat.com>
CC: Christoph Hellwig <hch@lst.de>
CC: Tomas Henzl <thenzl@redhat.com>
CC: Dan Williams <dan.j.williams@intel.com>
CC: Hannes Reinecke <hare@suse.com>
Reviewed-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/libsas/sas_phy.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/scsi/libsas/sas_phy.c b/drivers/scsi/libsas/sas_phy.c
index e030e1452136..b71f5ac6c7dc 100644
--- a/drivers/scsi/libsas/sas_phy.c
+++ b/drivers/scsi/libsas/sas_phy.c
@@ -35,7 +35,6 @@ static void sas_phye_loss_of_signal(struct work_struct *work)
 	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	phy->in_shutdown = 0;
 	phy->error = 0;
 	sas_deform_port(phy, 1);
 }
@@ -45,7 +44,6 @@ static void sas_phye_oob_done(struct work_struct *work)
 	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	phy->in_shutdown = 0;
 	phy->error = 0;
 }
 
@@ -126,6 +124,7 @@ static void sas_phye_shutdown(struct work_struct *work)
 				  ret);
 	} else
 		pr_notice("phy%d is not enabled, cannot shutdown\n", phy->id);
+	phy->in_shutdown = 0;
 }
 
 /* ---------- Phy class registration ---------- */
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] scsi: libsas: delete sas port if expander discover failed
  2019-05-14  2:42 ` Jason Yan
@ 2019-05-14  2:42   ` Jason Yan
  -1 siblings, 0 replies; 12+ messages in thread
From: Jason Yan @ 2019-05-14  2:42 UTC (permalink / raw)
  To: martin.petersen, jejb
  Cc: linux-scsi, linux-kernel, hare, dan.j.williams, jthumshirn, hch,
	huangdaode, chenxiang66, miaoxie, john.garry, zhaohongjiang,
	Jason Yan

The sas_port(phy->port) allocated in sas_ex_discover_expander() will not
be deleted when the expander failed to discover. This will cause
resource leak and a further issue of kernel BUG like below:

[159785.843156]  port-2:17:29: trying to add phy phy-2:17:29 fails: it's
already part of another port
[159785.852144] ------------[ cut here  ]------------
[159785.856833] kernel BUG at drivers/scsi/scsi_transport_sas.c:1086!
[159785.863000] Internal error: Oops - BUG: 0 [#1] SMP
[159785.867866] CPU: 39 PID: 16993 Comm: kworker/u96:2 Tainted: G
W  OE     4.19.25-vhulk1901.1.0.h111.aarch64 #1
[159785.878458] Hardware name: Huawei Technologies Co., Ltd.
Hi1620EVBCS/Hi1620EVBCS, BIOS Hi1620 CS B070 1P TA 03/21/2019
[159785.889231] Workqueue: 0000:74:02.0_disco_q sas_discover_domain
[159785.895224] pstate: 40c00009 (nZcv daif +PAN +UAO)
[159785.900094] pc : sas_port_add_phy+0x188/0x1b8
[159785.904524] lr : sas_port_add_phy+0x188/0x1b8
[159785.908952] sp : ffff0001120e3b80
[159785.912341] x29: ffff0001120e3b80 x28: 0000000000000000
[159785.917727] x27: ffff802ade8f5400 x26: ffff0000681b7560
[159785.923111] x25: ffff802adf11a800 x24: ffff0000680e8000
[159785.928496] x23: ffff802ade8f5728 x22: ffff802ade8f5708
[159785.933880] x21: ffff802adea2db40 x20: ffff802ade8f5400
[159785.939264] x19: ffff802adea2d800 x18: 0000000000000010
[159785.944649] x17: 00000000821bf734 x16: ffff00006714faa0
[159785.950033] x15: ffff0000e8ab4ecf x14: 7261702079646165
[159785.955417] x13: 726c612073277469 x12: ffff00006887b830
[159785.960802] x11: ffff00006773eaa0 x10: 7968702079687020
[159785.966186] x9 : 0000000000002453 x8 : 726f702072656874
[159785.971570] x7 : 6f6e6120666f2074 x6 : ffff802bcfb21290
[159785.976955] x5 : ffff802bcfb21290 x4 : 0000000000000000
[159785.982339] x3 : ffff802bcfb298c8 x2 : 337752b234c2ab00
[159785.987723] x1 : 337752b234c2ab00 x0 : 0000000000000000
[159785.993108] Process kworker/u96:2 (pid: 16993, stack limit =
0x0000000072dae094)
[159786.000576] Call trace:
[159786.003097]  sas_port_add_phy+0x188/0x1b8
[159786.007179]  sas_ex_get_linkrate.isra.5+0x134/0x140
[159786.012130]  sas_ex_discover_expander+0x128/0x408
[159786.016906]  sas_ex_discover_dev+0x218/0x4c8
[159786.021249]  sas_ex_discover_devices+0x9c/0x1a8
[159786.025852]  sas_discover_root_expander+0x134/0x160
[159786.030802]  sas_discover_domain+0x1b8/0x1e8
[159786.035148]  process_one_work+0x1b4/0x3f8
[159786.039230]  worker_thread+0x54/0x470
[159786.042967]  kthread+0x134/0x138
[159786.046269]  ret_from_fork+0x10/0x18
[159786.049918] Code: 91322300 f0004402 91178042 97fe4c9b (d4210000)
[159786.056083] Modules linked in: hns3_enet_ut(OE) hclge(OE) hnae3(OE)
hisi_sas_test_hw(OE) hisi_sas_test_main(OE) serdes(OE)
[159786.067202] ---[ end trace 03622b9e2d99e196  ]---
[159786.071893] Kernel panic - not syncing: Fatal exception
[159786.077190] SMP: stopping secondary CPUs
[159786.081192] Kernel Offset: disabled
[159786.084753] CPU features: 0x2,a2a00a38

Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver")
Reported-by: Jian Luo <luojian5@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
CC: John Garry <john.garry@huawei.com>
---
 drivers/scsi/libsas/sas_expander.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 83f2fd70ce76..9f7e2457360e 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1019,6 +1019,8 @@ static struct domain_device *sas_ex_discover_expander(
 		list_del(&child->dev_list_node);
 		spin_unlock_irq(&parent->port->dev_list_lock);
 		sas_put_device(child);
+		sas_port_delete(phy->port);
+		phy->port = NULL;
 		return NULL;
 	}
 	list_add_tail(&child->siblings, &parent->ex_dev.children);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] scsi: libsas: delete sas port if expander discover failed
@ 2019-05-14  2:42   ` Jason Yan
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Yan @ 2019-05-14  2:42 UTC (permalink / raw)
  To: martin.petersen, jejb
  Cc: linux-scsi, linux-kernel, hare, dan.j.williams, jthumshirn, hch,
	huangdaode, chenxiang66, miaoxie, john.garry, zhaohongjiang,
	Jason Yan

The sas_port(phy->port) allocated in sas_ex_discover_expander() will not
be deleted when the expander failed to discover. This will cause
resource leak and a further issue of kernel BUG like below:

[159785.843156]  port-2:17:29: trying to add phy phy-2:17:29 fails: it's
already part of another port
[159785.852144] ------------[ cut here  ]------------
[159785.856833] kernel BUG at drivers/scsi/scsi_transport_sas.c:1086!
[159785.863000] Internal error: Oops - BUG: 0 [#1] SMP
[159785.867866] CPU: 39 PID: 16993 Comm: kworker/u96:2 Tainted: G
W  OE     4.19.25-vhulk1901.1.0.h111.aarch64 #1
[159785.878458] Hardware name: Huawei Technologies Co., Ltd.
Hi1620EVBCS/Hi1620EVBCS, BIOS Hi1620 CS B070 1P TA 03/21/2019
[159785.889231] Workqueue: 0000:74:02.0_disco_q sas_discover_domain
[159785.895224] pstate: 40c00009 (nZcv daif +PAN +UAO)
[159785.900094] pc : sas_port_add_phy+0x188/0x1b8
[159785.904524] lr : sas_port_add_phy+0x188/0x1b8
[159785.908952] sp : ffff0001120e3b80
[159785.912341] x29: ffff0001120e3b80 x28: 0000000000000000
[159785.917727] x27: ffff802ade8f5400 x26: ffff0000681b7560
[159785.923111] x25: ffff802adf11a800 x24: ffff0000680e8000
[159785.928496] x23: ffff802ade8f5728 x22: ffff802ade8f5708
[159785.933880] x21: ffff802adea2db40 x20: ffff802ade8f5400
[159785.939264] x19: ffff802adea2d800 x18: 0000000000000010
[159785.944649] x17: 00000000821bf734 x16: ffff00006714faa0
[159785.950033] x15: ffff0000e8ab4ecf x14: 7261702079646165
[159785.955417] x13: 726c612073277469 x12: ffff00006887b830
[159785.960802] x11: ffff00006773eaa0 x10: 7968702079687020
[159785.966186] x9 : 0000000000002453 x8 : 726f702072656874
[159785.971570] x7 : 6f6e6120666f2074 x6 : ffff802bcfb21290
[159785.976955] x5 : ffff802bcfb21290 x4 : 0000000000000000
[159785.982339] x3 : ffff802bcfb298c8 x2 : 337752b234c2ab00
[159785.987723] x1 : 337752b234c2ab00 x0 : 0000000000000000
[159785.993108] Process kworker/u96:2 (pid: 16993, stack limit =
0x0000000072dae094)
[159786.000576] Call trace:
[159786.003097]  sas_port_add_phy+0x188/0x1b8
[159786.007179]  sas_ex_get_linkrate.isra.5+0x134/0x140
[159786.012130]  sas_ex_discover_expander+0x128/0x408
[159786.016906]  sas_ex_discover_dev+0x218/0x4c8
[159786.021249]  sas_ex_discover_devices+0x9c/0x1a8
[159786.025852]  sas_discover_root_expander+0x134/0x160
[159786.030802]  sas_discover_domain+0x1b8/0x1e8
[159786.035148]  process_one_work+0x1b4/0x3f8
[159786.039230]  worker_thread+0x54/0x470
[159786.042967]  kthread+0x134/0x138
[159786.046269]  ret_from_fork+0x10/0x18
[159786.049918] Code: 91322300 f0004402 91178042 97fe4c9b (d4210000)
[159786.056083] Modules linked in: hns3_enet_ut(OE) hclge(OE) hnae3(OE)
hisi_sas_test_hw(OE) hisi_sas_test_main(OE) serdes(OE)
[159786.067202] ---[ end trace 03622b9e2d99e196  ]---
[159786.071893] Kernel panic - not syncing: Fatal exception
[159786.077190] SMP: stopping secondary CPUs
[159786.081192] Kernel Offset: disabled
[159786.084753] CPU features: 0x2,a2a00a38

Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver")
Reported-by: Jian Luo <luojian5@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
CC: John Garry <john.garry@huawei.com>
---
 drivers/scsi/libsas/sas_expander.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 83f2fd70ce76..9f7e2457360e 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1019,6 +1019,8 @@ static struct domain_device *sas_ex_discover_expander(
 		list_del(&child->dev_list_node);
 		spin_unlock_irq(&parent->port->dev_list_lock);
 		sas_put_device(child);
+		sas_port_delete(phy->port);
+		phy->port = NULL;
 		return NULL;
 	}
 	list_add_tail(&child->siblings, &parent->ex_dev.children);
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] scsi: libsas: delete sas port if expander discover failed
  2019-05-14  2:42   ` Jason Yan
@ 2019-05-14 10:45     ` John Garry
  -1 siblings, 0 replies; 12+ messages in thread
From: John Garry @ 2019-05-14 10:45 UTC (permalink / raw)
  To: Jason Yan, martin.petersen, jejb
  Cc: linux-scsi, linux-kernel, hare, dan.j.williams, jthumshirn, hch,
	huangdaode, chenxiang66, miaoxie, zhaohongjiang, Linuxarm

On 14/05/2019 03:42, Jason Yan wrote:
> The sas_port(phy->port) allocated in sas_ex_discover_expander() will not
> be deleted when the expander failed to discover. This will cause
> resource leak and a further issue of kernel BUG like below:
>
> [159785.843156]  port-2:17:29: trying to add phy phy-2:17:29 fails: it's
> already part of another port
> [159785.852144] ------------[ cut here  ]------------
> [159785.856833] kernel BUG at drivers/scsi/scsi_transport_sas.c:1086!
> [159785.863000] Internal error: Oops - BUG: 0 [#1] SMP
> [159785.867866] CPU: 39 PID: 16993 Comm: kworker/u96:2 Tainted: G
> W  OE     4.19.25-vhulk1901.1.0.h111.aarch64 #1
> [159785.878458] Hardware name: Huawei Technologies Co., Ltd.
> Hi1620EVBCS/Hi1620EVBCS, BIOS Hi1620 CS B070 1P TA 03/21/2019
> [159785.889231] Workqueue: 0000:74:02.0_disco_q sas_discover_domain
> [159785.895224] pstate: 40c00009 (nZcv daif +PAN +UAO)
> [159785.900094] pc : sas_port_add_phy+0x188/0x1b8
> [159785.904524] lr : sas_port_add_phy+0x188/0x1b8
> [159785.908952] sp : ffff0001120e3b80
> [159785.912341] x29: ffff0001120e3b80 x28: 0000000000000000
> [159785.917727] x27: ffff802ade8f5400 x26: ffff0000681b7560
> [159785.923111] x25: ffff802adf11a800 x24: ffff0000680e8000
> [159785.928496] x23: ffff802ade8f5728 x22: ffff802ade8f5708
> [159785.933880] x21: ffff802adea2db40 x20: ffff802ade8f5400
> [159785.939264] x19: ffff802adea2d800 x18: 0000000000000010
> [159785.944649] x17: 00000000821bf734 x16: ffff00006714faa0
> [159785.950033] x15: ffff0000e8ab4ecf x14: 7261702079646165
> [159785.955417] x13: 726c612073277469 x12: ffff00006887b830
> [159785.960802] x11: ffff00006773eaa0 x10: 7968702079687020
> [159785.966186] x9 : 0000000000002453 x8 : 726f702072656874
> [159785.971570] x7 : 6f6e6120666f2074 x6 : ffff802bcfb21290
> [159785.976955] x5 : ffff802bcfb21290 x4 : 0000000000000000
> [159785.982339] x3 : ffff802bcfb298c8 x2 : 337752b234c2ab00
> [159785.987723] x1 : 337752b234c2ab00 x0 : 0000000000000000
> [159785.993108] Process kworker/u96:2 (pid: 16993, stack limit =
> 0x0000000072dae094)
> [159786.000576] Call trace:
> [159786.003097]  sas_port_add_phy+0x188/0x1b8
> [159786.007179]  sas_ex_get_linkrate.isra.5+0x134/0x140
> [159786.012130]  sas_ex_discover_expander+0x128/0x408
> [159786.016906]  sas_ex_discover_dev+0x218/0x4c8
> [159786.021249]  sas_ex_discover_devices+0x9c/0x1a8
> [159786.025852]  sas_discover_root_expander+0x134/0x160
> [159786.030802]  sas_discover_domain+0x1b8/0x1e8
> [159786.035148]  process_one_work+0x1b4/0x3f8
> [159786.039230]  worker_thread+0x54/0x470
> [159786.042967]  kthread+0x134/0x138
> [159786.046269]  ret_from_fork+0x10/0x18
> [159786.049918] Code: 91322300 f0004402 91178042 97fe4c9b (d4210000)
> [159786.056083] Modules linked in: hns3_enet_ut(OE) hclge(OE) hnae3(OE)
> hisi_sas_test_hw(OE) hisi_sas_test_main(OE) serdes(OE)
> [159786.067202] ---[ end trace 03622b9e2d99e196  ]---
> [159786.071893] Kernel panic - not syncing: Fatal exception
> [159786.077190] SMP: stopping secondary CPUs
> [159786.081192] Kernel Offset: disabled
> [159786.084753] CPU features: 0x2,a2a00a38
>
> Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver")
> Reported-by: Jian Luo <luojian5@huawei.com>
> Signed-off-by: Jason Yan <yanaijie@huawei.com>
> CC: John Garry <john.garry@huawei.com>
> ---
>  drivers/scsi/libsas/sas_expander.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
> index 83f2fd70ce76..9f7e2457360e 100644
> --- a/drivers/scsi/libsas/sas_expander.c
> +++ b/drivers/scsi/libsas/sas_expander.c
> @@ -1019,6 +1019,8 @@ static struct domain_device *sas_ex_discover_expander(
>  		list_del(&child->dev_list_node);
>  		spin_unlock_irq(&parent->port->dev_list_lock);
>  		sas_put_device(child);
> +		sas_port_delete(phy->port);
> +		phy->port = NULL;

This looks ok.

However, I wonder if we miss something else, and I see this code earlier 
in sas_ex_discover_expander():

     kref_get(&parent->kref);
     child->parent = parent;
     child->port = port;
     child->iproto = phy->attached_iproto;
     child->tproto = phy->attached_tproto;

I assume the kref get is for the child referencing the parent. Do we 
need the kref put? I couldn't see it.

If yes, maybe it should it go in sas_free_device(), along with kfree() 
for expander device phys.

Thanks,
John

>  		return NULL;
>  	}
>  	list_add_tail(&child->siblings, &parent->ex_dev.children);
>




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] scsi: libsas: delete sas port if expander discover failed
@ 2019-05-14 10:45     ` John Garry
  0 siblings, 0 replies; 12+ messages in thread
From: John Garry @ 2019-05-14 10:45 UTC (permalink / raw)
  To: Jason Yan, martin.petersen, jejb
  Cc: linux-scsi, linux-kernel, hare, dan.j.williams, jthumshirn, hch,
	huangdaode, chenxiang66, miaoxie, zhaohongjiang, Linuxarm

On 14/05/2019 03:42, Jason Yan wrote:
> The sas_port(phy->port) allocated in sas_ex_discover_expander() will not
> be deleted when the expander failed to discover. This will cause
> resource leak and a further issue of kernel BUG like below:
>
> [159785.843156]  port-2:17:29: trying to add phy phy-2:17:29 fails: it's
> already part of another port
> [159785.852144] ------------[ cut here  ]------------
> [159785.856833] kernel BUG at drivers/scsi/scsi_transport_sas.c:1086!
> [159785.863000] Internal error: Oops - BUG: 0 [#1] SMP
> [159785.867866] CPU: 39 PID: 16993 Comm: kworker/u96:2 Tainted: G
> W  OE     4.19.25-vhulk1901.1.0.h111.aarch64 #1
> [159785.878458] Hardware name: Huawei Technologies Co., Ltd.
> Hi1620EVBCS/Hi1620EVBCS, BIOS Hi1620 CS B070 1P TA 03/21/2019
> [159785.889231] Workqueue: 0000:74:02.0_disco_q sas_discover_domain
> [159785.895224] pstate: 40c00009 (nZcv daif +PAN +UAO)
> [159785.900094] pc : sas_port_add_phy+0x188/0x1b8
> [159785.904524] lr : sas_port_add_phy+0x188/0x1b8
> [159785.908952] sp : ffff0001120e3b80
> [159785.912341] x29: ffff0001120e3b80 x28: 0000000000000000
> [159785.917727] x27: ffff802ade8f5400 x26: ffff0000681b7560
> [159785.923111] x25: ffff802adf11a800 x24: ffff0000680e8000
> [159785.928496] x23: ffff802ade8f5728 x22: ffff802ade8f5708
> [159785.933880] x21: ffff802adea2db40 x20: ffff802ade8f5400
> [159785.939264] x19: ffff802adea2d800 x18: 0000000000000010
> [159785.944649] x17: 00000000821bf734 x16: ffff00006714faa0
> [159785.950033] x15: ffff0000e8ab4ecf x14: 7261702079646165
> [159785.955417] x13: 726c612073277469 x12: ffff00006887b830
> [159785.960802] x11: ffff00006773eaa0 x10: 7968702079687020
> [159785.966186] x9 : 0000000000002453 x8 : 726f702072656874
> [159785.971570] x7 : 6f6e6120666f2074 x6 : ffff802bcfb21290
> [159785.976955] x5 : ffff802bcfb21290 x4 : 0000000000000000
> [159785.982339] x3 : ffff802bcfb298c8 x2 : 337752b234c2ab00
> [159785.987723] x1 : 337752b234c2ab00 x0 : 0000000000000000
> [159785.993108] Process kworker/u96:2 (pid: 16993, stack limit =
> 0x0000000072dae094)
> [159786.000576] Call trace:
> [159786.003097]  sas_port_add_phy+0x188/0x1b8
> [159786.007179]  sas_ex_get_linkrate.isra.5+0x134/0x140
> [159786.012130]  sas_ex_discover_expander+0x128/0x408
> [159786.016906]  sas_ex_discover_dev+0x218/0x4c8
> [159786.021249]  sas_ex_discover_devices+0x9c/0x1a8
> [159786.025852]  sas_discover_root_expander+0x134/0x160
> [159786.030802]  sas_discover_domain+0x1b8/0x1e8
> [159786.035148]  process_one_work+0x1b4/0x3f8
> [159786.039230]  worker_thread+0x54/0x470
> [159786.042967]  kthread+0x134/0x138
> [159786.046269]  ret_from_fork+0x10/0x18
> [159786.049918] Code: 91322300 f0004402 91178042 97fe4c9b (d4210000)
> [159786.056083] Modules linked in: hns3_enet_ut(OE) hclge(OE) hnae3(OE)
> hisi_sas_test_hw(OE) hisi_sas_test_main(OE) serdes(OE)
> [159786.067202] ---[ end trace 03622b9e2d99e196  ]---
> [159786.071893] Kernel panic - not syncing: Fatal exception
> [159786.077190] SMP: stopping secondary CPUs
> [159786.081192] Kernel Offset: disabled
> [159786.084753] CPU features: 0x2,a2a00a38
>
> Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver")
> Reported-by: Jian Luo <luojian5@huawei.com>
> Signed-off-by: Jason Yan <yanaijie@huawei.com>
> CC: John Garry <john.garry@huawei.com>
> ---
>  drivers/scsi/libsas/sas_expander.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
> index 83f2fd70ce76..9f7e2457360e 100644
> --- a/drivers/scsi/libsas/sas_expander.c
> +++ b/drivers/scsi/libsas/sas_expander.c
> @@ -1019,6 +1019,8 @@ static struct domain_device *sas_ex_discover_expander(
>  		list_del(&child->dev_list_node);
>  		spin_unlock_irq(&parent->port->dev_list_lock);
>  		sas_put_device(child);
> +		sas_port_delete(phy->port);
> +		phy->port = NULL;

This looks ok.

However, I wonder if we miss something else, and I see this code earlier 
in sas_ex_discover_expander():

     kref_get(&parent->kref);
     child->parent = parent;
     child->port = port;
     child->iproto = phy->attached_iproto;
     child->tproto = phy->attached_tproto;

I assume the kref get is for the child referencing the parent. Do we 
need the kref put? I couldn't see it.

If yes, maybe it should it go in sas_free_device(), along with kfree() 
for expander device phys.

Thanks,
John

>  		return NULL;
>  	}
>  	list_add_tail(&child->siblings, &parent->ex_dev.children);
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] scsi: libsas: delete sas port if expander discover failed
  2019-05-14 10:45     ` John Garry
@ 2019-05-14 11:37       ` Jason Yan
  -1 siblings, 0 replies; 12+ messages in thread
From: Jason Yan @ 2019-05-14 11:37 UTC (permalink / raw)
  To: John Garry, martin.petersen, jejb
  Cc: linux-scsi, linux-kernel, hare, dan.j.williams, jthumshirn, hch,
	huangdaode, chenxiang66, miaoxie, zhaohongjiang, Linuxarm



On 2019/5/14 18:45, John Garry wrote:
> On 14/05/2019 03:42, Jason Yan wrote:
>> The sas_port(phy->port) allocated in sas_ex_discover_expander() will not
>> be deleted when the expander failed to discover. This will cause
>> resource leak and a further issue of kernel BUG like below:
>>
>> [159785.843156]  port-2:17:29: trying to add phy phy-2:17:29 fails: it's
>> already part of another port
>> [159785.852144] ------------[ cut here  ]------------
>> [159785.856833] kernel BUG at drivers/scsi/scsi_transport_sas.c:1086!
>> [159785.863000] Internal error: Oops - BUG: 0 [#1] SMP
>> [159785.867866] CPU: 39 PID: 16993 Comm: kworker/u96:2 Tainted: G
>> W  OE     4.19.25-vhulk1901.1.0.h111.aarch64 #1
>> [159785.878458] Hardware name: Huawei Technologies Co., Ltd.
>> Hi1620EVBCS/Hi1620EVBCS, BIOS Hi1620 CS B070 1P TA 03/21/2019
>> [159785.889231] Workqueue: 0000:74:02.0_disco_q sas_discover_domain
>> [159785.895224] pstate: 40c00009 (nZcv daif +PAN +UAO)
>> [159785.900094] pc : sas_port_add_phy+0x188/0x1b8
>> [159785.904524] lr : sas_port_add_phy+0x188/0x1b8
>> [159785.908952] sp : ffff0001120e3b80
>> [159785.912341] x29: ffff0001120e3b80 x28: 0000000000000000
>> [159785.917727] x27: ffff802ade8f5400 x26: ffff0000681b7560
>> [159785.923111] x25: ffff802adf11a800 x24: ffff0000680e8000
>> [159785.928496] x23: ffff802ade8f5728 x22: ffff802ade8f5708
>> [159785.933880] x21: ffff802adea2db40 x20: ffff802ade8f5400
>> [159785.939264] x19: ffff802adea2d800 x18: 0000000000000010
>> [159785.944649] x17: 00000000821bf734 x16: ffff00006714faa0
>> [159785.950033] x15: ffff0000e8ab4ecf x14: 7261702079646165
>> [159785.955417] x13: 726c612073277469 x12: ffff00006887b830
>> [159785.960802] x11: ffff00006773eaa0 x10: 7968702079687020
>> [159785.966186] x9 : 0000000000002453 x8 : 726f702072656874
>> [159785.971570] x7 : 6f6e6120666f2074 x6 : ffff802bcfb21290
>> [159785.976955] x5 : ffff802bcfb21290 x4 : 0000000000000000
>> [159785.982339] x3 : ffff802bcfb298c8 x2 : 337752b234c2ab00
>> [159785.987723] x1 : 337752b234c2ab00 x0 : 0000000000000000
>> [159785.993108] Process kworker/u96:2 (pid: 16993, stack limit =
>> 0x0000000072dae094)
>> [159786.000576] Call trace:
>> [159786.003097]  sas_port_add_phy+0x188/0x1b8
>> [159786.007179]  sas_ex_get_linkrate.isra.5+0x134/0x140
>> [159786.012130]  sas_ex_discover_expander+0x128/0x408
>> [159786.016906]  sas_ex_discover_dev+0x218/0x4c8
>> [159786.021249]  sas_ex_discover_devices+0x9c/0x1a8
>> [159786.025852]  sas_discover_root_expander+0x134/0x160
>> [159786.030802]  sas_discover_domain+0x1b8/0x1e8
>> [159786.035148]  process_one_work+0x1b4/0x3f8
>> [159786.039230]  worker_thread+0x54/0x470
>> [159786.042967]  kthread+0x134/0x138
>> [159786.046269]  ret_from_fork+0x10/0x18
>> [159786.049918] Code: 91322300 f0004402 91178042 97fe4c9b (d4210000)
>> [159786.056083] Modules linked in: hns3_enet_ut(OE) hclge(OE) hnae3(OE)
>> hisi_sas_test_hw(OE) hisi_sas_test_main(OE) serdes(OE)
>> [159786.067202] ---[ end trace 03622b9e2d99e196  ]---
>> [159786.071893] Kernel panic - not syncing: Fatal exception
>> [159786.077190] SMP: stopping secondary CPUs
>> [159786.081192] Kernel Offset: disabled
>> [159786.084753] CPU features: 0x2,a2a00a38
>>
>> Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver")
>> Reported-by: Jian Luo <luojian5@huawei.com>
>> Signed-off-by: Jason Yan <yanaijie@huawei.com>
>> CC: John Garry <john.garry@huawei.com>
>> ---
>>  drivers/scsi/libsas/sas_expander.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/scsi/libsas/sas_expander.c 
>> b/drivers/scsi/libsas/sas_expander.c
>> index 83f2fd70ce76..9f7e2457360e 100644
>> --- a/drivers/scsi/libsas/sas_expander.c
>> +++ b/drivers/scsi/libsas/sas_expander.c
>> @@ -1019,6 +1019,8 @@ static struct domain_device 
>> *sas_ex_discover_expander(
>>          list_del(&child->dev_list_node);
>>          spin_unlock_irq(&parent->port->dev_list_lock);
>>          sas_put_device(child);
>> +        sas_port_delete(phy->port);
>> +        phy->port = NULL;
> 
> This looks ok.
> 
> However, I wonder if we miss something else, and I see this code earlier 
> in sas_ex_discover_expander():
> 
>      kref_get(&parent->kref);
>      child->parent = parent;
>      child->port = port;
>      child->iproto = phy->attached_iproto;
>      child->tproto = phy->attached_tproto;
> 
> I assume the kref get is for the child referencing the parent. Do we 
> need the kref put? I couldn't see it.
> 
> If yes, maybe it should it go in sas_free_device(), along with kfree() 
> for expander device phys.
> 

Yes, the parent's kref will put in sas_free_device(). So the call chain 
will be:

sas_put_device
   ->sas_free_device
     ->sas_put_device(dev->parent);
       ->kref_put

Thanks,
Jason

> Thanks,
> John
> 
>>          return NULL;
>>      }
>>      list_add_tail(&child->siblings, &parent->ex_dev.children);
>>
> 
> 
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] scsi: libsas: delete sas port if expander discover failed
@ 2019-05-14 11:37       ` Jason Yan
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Yan @ 2019-05-14 11:37 UTC (permalink / raw)
  To: John Garry, martin.petersen, jejb
  Cc: linux-scsi, linux-kernel, hare, dan.j.williams, jthumshirn, hch,
	huangdaode, chenxiang66, miaoxie, zhaohongjiang, Linuxarm



On 2019/5/14 18:45, John Garry wrote:
> On 14/05/2019 03:42, Jason Yan wrote:
>> The sas_port(phy->port) allocated in sas_ex_discover_expander() will not
>> be deleted when the expander failed to discover. This will cause
>> resource leak and a further issue of kernel BUG like below:
>>
>> [159785.843156]  port-2:17:29: trying to add phy phy-2:17:29 fails: it's
>> already part of another port
>> [159785.852144] ------------[ cut here  ]------------
>> [159785.856833] kernel BUG at drivers/scsi/scsi_transport_sas.c:1086!
>> [159785.863000] Internal error: Oops - BUG: 0 [#1] SMP
>> [159785.867866] CPU: 39 PID: 16993 Comm: kworker/u96:2 Tainted: G
>> W  OE     4.19.25-vhulk1901.1.0.h111.aarch64 #1
>> [159785.878458] Hardware name: Huawei Technologies Co., Ltd.
>> Hi1620EVBCS/Hi1620EVBCS, BIOS Hi1620 CS B070 1P TA 03/21/2019
>> [159785.889231] Workqueue: 0000:74:02.0_disco_q sas_discover_domain
>> [159785.895224] pstate: 40c00009 (nZcv daif +PAN +UAO)
>> [159785.900094] pc : sas_port_add_phy+0x188/0x1b8
>> [159785.904524] lr : sas_port_add_phy+0x188/0x1b8
>> [159785.908952] sp : ffff0001120e3b80
>> [159785.912341] x29: ffff0001120e3b80 x28: 0000000000000000
>> [159785.917727] x27: ffff802ade8f5400 x26: ffff0000681b7560
>> [159785.923111] x25: ffff802adf11a800 x24: ffff0000680e8000
>> [159785.928496] x23: ffff802ade8f5728 x22: ffff802ade8f5708
>> [159785.933880] x21: ffff802adea2db40 x20: ffff802ade8f5400
>> [159785.939264] x19: ffff802adea2d800 x18: 0000000000000010
>> [159785.944649] x17: 00000000821bf734 x16: ffff00006714faa0
>> [159785.950033] x15: ffff0000e8ab4ecf x14: 7261702079646165
>> [159785.955417] x13: 726c612073277469 x12: ffff00006887b830
>> [159785.960802] x11: ffff00006773eaa0 x10: 7968702079687020
>> [159785.966186] x9 : 0000000000002453 x8 : 726f702072656874
>> [159785.971570] x7 : 6f6e6120666f2074 x6 : ffff802bcfb21290
>> [159785.976955] x5 : ffff802bcfb21290 x4 : 0000000000000000
>> [159785.982339] x3 : ffff802bcfb298c8 x2 : 337752b234c2ab00
>> [159785.987723] x1 : 337752b234c2ab00 x0 : 0000000000000000
>> [159785.993108] Process kworker/u96:2 (pid: 16993, stack limit =
>> 0x0000000072dae094)
>> [159786.000576] Call trace:
>> [159786.003097]  sas_port_add_phy+0x188/0x1b8
>> [159786.007179]  sas_ex_get_linkrate.isra.5+0x134/0x140
>> [159786.012130]  sas_ex_discover_expander+0x128/0x408
>> [159786.016906]  sas_ex_discover_dev+0x218/0x4c8
>> [159786.021249]  sas_ex_discover_devices+0x9c/0x1a8
>> [159786.025852]  sas_discover_root_expander+0x134/0x160
>> [159786.030802]  sas_discover_domain+0x1b8/0x1e8
>> [159786.035148]  process_one_work+0x1b4/0x3f8
>> [159786.039230]  worker_thread+0x54/0x470
>> [159786.042967]  kthread+0x134/0x138
>> [159786.046269]  ret_from_fork+0x10/0x18
>> [159786.049918] Code: 91322300 f0004402 91178042 97fe4c9b (d4210000)
>> [159786.056083] Modules linked in: hns3_enet_ut(OE) hclge(OE) hnae3(OE)
>> hisi_sas_test_hw(OE) hisi_sas_test_main(OE) serdes(OE)
>> [159786.067202] ---[ end trace 03622b9e2d99e196  ]---
>> [159786.071893] Kernel panic - not syncing: Fatal exception
>> [159786.077190] SMP: stopping secondary CPUs
>> [159786.081192] Kernel Offset: disabled
>> [159786.084753] CPU features: 0x2,a2a00a38
>>
>> Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver")
>> Reported-by: Jian Luo <luojian5@huawei.com>
>> Signed-off-by: Jason Yan <yanaijie@huawei.com>
>> CC: John Garry <john.garry@huawei.com>
>> ---
>>  drivers/scsi/libsas/sas_expander.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/scsi/libsas/sas_expander.c 
>> b/drivers/scsi/libsas/sas_expander.c
>> index 83f2fd70ce76..9f7e2457360e 100644
>> --- a/drivers/scsi/libsas/sas_expander.c
>> +++ b/drivers/scsi/libsas/sas_expander.c
>> @@ -1019,6 +1019,8 @@ static struct domain_device 
>> *sas_ex_discover_expander(
>>          list_del(&child->dev_list_node);
>>          spin_unlock_irq(&parent->port->dev_list_lock);
>>          sas_put_device(child);
>> +        sas_port_delete(phy->port);
>> +        phy->port = NULL;
> 
> This looks ok.
> 
> However, I wonder if we miss something else, and I see this code earlier 
> in sas_ex_discover_expander():
> 
>      kref_get(&parent->kref);
>      child->parent = parent;
>      child->port = port;
>      child->iproto = phy->attached_iproto;
>      child->tproto = phy->attached_tproto;
> 
> I assume the kref get is for the child referencing the parent. Do we 
> need the kref put? I couldn't see it.
> 
> If yes, maybe it should it go in sas_free_device(), along with kfree() 
> for expander device phys.
> 

Yes, the parent's kref will put in sas_free_device(). So the call chain 
will be:

sas_put_device
   ->sas_free_device
     ->sas_put_device(dev->parent);
       ->kref_put

Thanks,
Jason

> Thanks,
> John
> 
>>          return NULL;
>>      }
>>      list_add_tail(&child->siblings, &parent->ex_dev.children);
>>
> 
> 
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] scsi: libsas: delete sas port if expander discover failed
  2019-05-14 11:37       ` Jason Yan
@ 2019-05-14 11:46         ` John Garry
  -1 siblings, 0 replies; 12+ messages in thread
From: John Garry @ 2019-05-14 11:46 UTC (permalink / raw)
  To: Jason Yan, martin.petersen, jejb
  Cc: linux-scsi, linux-kernel, hare, dan.j.williams, jthumshirn, hch,
	huangdaode, chenxiang66, miaoxie, zhaohongjiang, Linuxarm

On 14/05/2019 12:37, Jason Yan wrote:
>
>
> On 2019/5/14 18:45, John Garry wrote:
>> On 14/05/2019 03:42, Jason Yan wrote:
>>> The sas_port(phy->port) allocated in sas_ex_discover_expander() will not
>>> be deleted when the expander failed to discover. This will cause
>>> resource leak and a further issue of kernel BUG like below:
>>>
>>> [159785.843156]  port-2:17:29: trying to add phy phy-2:17:29 fails: it's
>>> already part of another port
>>> [159785.852144] ------------[ cut here  ]------------
>>> [159785.856833] kernel BUG at drivers/scsi/scsi_transport_sas.c:1086!
>>> [159785.863000] Internal error: Oops - BUG: 0 [#1] SMP
>>> [159785.867866] CPU: 39 PID: 16993 Comm: kworker/u96:2 Tainted: G
>>> W  OE     4.19.25-vhulk1901.1.0.h111.aarch64 #1
>>> [159785.878458] Hardware name: Huawei Technologies Co., Ltd.
>>> Hi1620EVBCS/Hi1620EVBCS, BIOS Hi1620 CS B070 1P TA 03/21/2019
>>> [159785.889231] Workqueue: 0000:74:02.0_disco_q sas_discover_domain
>>> [159785.895224] pstate: 40c00009 (nZcv daif +PAN +UAO)
>>> [159785.900094] pc : sas_port_add_phy+0x188/0x1b8
>>> [159785.904524] lr : sas_port_add_phy+0x188/0x1b8
>>> [159785.908952] sp : ffff0001120e3b80
>>> [159785.912341] x29: ffff0001120e3b80 x28: 0000000000000000
>>> [159785.917727] x27: ffff802ade8f5400 x26: ffff0000681b7560
>>> [159785.923111] x25: ffff802adf11a800 x24: ffff0000680e8000
>>> [159785.928496] x23: ffff802ade8f5728 x22: ffff802ade8f5708
>>> [159785.933880] x21: ffff802adea2db40 x20: ffff802ade8f5400
>>> [159785.939264] x19: ffff802adea2d800 x18: 0000000000000010
>>> [159785.944649] x17: 00000000821bf734 x16: ffff00006714faa0
>>> [159785.950033] x15: ffff0000e8ab4ecf x14: 7261702079646165
>>> [159785.955417] x13: 726c612073277469 x12: ffff00006887b830
>>> [159785.960802] x11: ffff00006773eaa0 x10: 7968702079687020
>>> [159785.966186] x9 : 0000000000002453 x8 : 726f702072656874
>>> [159785.971570] x7 : 6f6e6120666f2074 x6 : ffff802bcfb21290
>>> [159785.976955] x5 : ffff802bcfb21290 x4 : 0000000000000000
>>> [159785.982339] x3 : ffff802bcfb298c8 x2 : 337752b234c2ab00
>>> [159785.987723] x1 : 337752b234c2ab00 x0 : 0000000000000000
>>> [159785.993108] Process kworker/u96:2 (pid: 16993, stack limit =
>>> 0x0000000072dae094)
>>> [159786.000576] Call trace:
>>> [159786.003097]  sas_port_add_phy+0x188/0x1b8
>>> [159786.007179]  sas_ex_get_linkrate.isra.5+0x134/0x140
>>> [159786.012130]  sas_ex_discover_expander+0x128/0x408
>>> [159786.016906]  sas_ex_discover_dev+0x218/0x4c8
>>> [159786.021249]  sas_ex_discover_devices+0x9c/0x1a8
>>> [159786.025852]  sas_discover_root_expander+0x134/0x160
>>> [159786.030802]  sas_discover_domain+0x1b8/0x1e8
>>> [159786.035148]  process_one_work+0x1b4/0x3f8
>>> [159786.039230]  worker_thread+0x54/0x470
>>> [159786.042967]  kthread+0x134/0x138
>>> [159786.046269]  ret_from_fork+0x10/0x18
>>> [159786.049918] Code: 91322300 f0004402 91178042 97fe4c9b (d4210000)
>>> [159786.056083] Modules linked in: hns3_enet_ut(OE) hclge(OE) hnae3(OE)
>>> hisi_sas_test_hw(OE) hisi_sas_test_main(OE) serdes(OE)
>>> [159786.067202] ---[ end trace 03622b9e2d99e196  ]---
>>> [159786.071893] Kernel panic - not syncing: Fatal exception
>>> [159786.077190] SMP: stopping secondary CPUs
>>> [159786.081192] Kernel Offset: disabled
>>> [159786.084753] CPU features: 0x2,a2a00a38
>>>
>>> Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver")

Reviewed-by: John Garry <john.garry@huawei.com>

>>> Reported-by: Jian Luo <luojian5@huawei.com>
>>> Signed-off-by: Jason Yan <yanaijie@huawei.com>
>>> CC: John Garry <john.garry@huawei.com>
>>> ---
>>>  drivers/scsi/libsas/sas_expander.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/scsi/libsas/sas_expander.c
>>> b/drivers/scsi/libsas/sas_expander.c
>>> index 83f2fd70ce76..9f7e2457360e 100644
>>> --- a/drivers/scsi/libsas/sas_expander.c
>>> +++ b/drivers/scsi/libsas/sas_expander.c
>>> @@ -1019,6 +1019,8 @@ static struct domain_device
>>> *sas_ex_discover_expander(
>>>          list_del(&child->dev_list_node);
>>>          spin_unlock_irq(&parent->port->dev_list_lock);
>>>          sas_put_device(child);
>>> +        sas_port_delete(phy->port);
>>> +        phy->port = NULL;
>>
>> This looks ok.
>>
>> However, I wonder if we miss something else, and I see this code
>> earlier in sas_ex_discover_expander():
>>
>>      kref_get(&parent->kref);
>>      child->parent = parent;
>>      child->port = port;
>>      child->iproto = phy->attached_iproto;
>>      child->tproto = phy->attached_tproto;
>>
>> I assume the kref get is for the child referencing the parent. Do we
>> need the kref put? I couldn't see it.
>>
>> If yes, maybe it should it go in sas_free_device(), along with kfree()
>> for expander device phys.
>>
>
> Yes, the parent's kref will put in sas_free_device(). So the call chain
> will be:
>
> sas_put_device
>   ->sas_free_device
>     ->sas_put_device(dev->parent);
>       ->kref_put
>

Ah, yes.

> Thanks,
> Jason
>
>> Thanks,
>> John
>>
>>>          return NULL;
>>>      }
>>>      list_add_tail(&child->siblings, &parent->ex_dev.children);
>>>
>>
>>
>>
>>
>> .
>>
>
>
> .
>



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] scsi: libsas: delete sas port if expander discover failed
@ 2019-05-14 11:46         ` John Garry
  0 siblings, 0 replies; 12+ messages in thread
From: John Garry @ 2019-05-14 11:46 UTC (permalink / raw)
  To: Jason Yan, martin.petersen, jejb
  Cc: linux-scsi, linux-kernel, hare, dan.j.williams, jthumshirn, hch,
	huangdaode, chenxiang66, miaoxie, zhaohongjiang, Linuxarm

On 14/05/2019 12:37, Jason Yan wrote:
>
>
> On 2019/5/14 18:45, John Garry wrote:
>> On 14/05/2019 03:42, Jason Yan wrote:
>>> The sas_port(phy->port) allocated in sas_ex_discover_expander() will not
>>> be deleted when the expander failed to discover. This will cause
>>> resource leak and a further issue of kernel BUG like below:
>>>
>>> [159785.843156]  port-2:17:29: trying to add phy phy-2:17:29 fails: it's
>>> already part of another port
>>> [159785.852144] ------------[ cut here  ]------------
>>> [159785.856833] kernel BUG at drivers/scsi/scsi_transport_sas.c:1086!
>>> [159785.863000] Internal error: Oops - BUG: 0 [#1] SMP
>>> [159785.867866] CPU: 39 PID: 16993 Comm: kworker/u96:2 Tainted: G
>>> W  OE     4.19.25-vhulk1901.1.0.h111.aarch64 #1
>>> [159785.878458] Hardware name: Huawei Technologies Co., Ltd.
>>> Hi1620EVBCS/Hi1620EVBCS, BIOS Hi1620 CS B070 1P TA 03/21/2019
>>> [159785.889231] Workqueue: 0000:74:02.0_disco_q sas_discover_domain
>>> [159785.895224] pstate: 40c00009 (nZcv daif +PAN +UAO)
>>> [159785.900094] pc : sas_port_add_phy+0x188/0x1b8
>>> [159785.904524] lr : sas_port_add_phy+0x188/0x1b8
>>> [159785.908952] sp : ffff0001120e3b80
>>> [159785.912341] x29: ffff0001120e3b80 x28: 0000000000000000
>>> [159785.917727] x27: ffff802ade8f5400 x26: ffff0000681b7560
>>> [159785.923111] x25: ffff802adf11a800 x24: ffff0000680e8000
>>> [159785.928496] x23: ffff802ade8f5728 x22: ffff802ade8f5708
>>> [159785.933880] x21: ffff802adea2db40 x20: ffff802ade8f5400
>>> [159785.939264] x19: ffff802adea2d800 x18: 0000000000000010
>>> [159785.944649] x17: 00000000821bf734 x16: ffff00006714faa0
>>> [159785.950033] x15: ffff0000e8ab4ecf x14: 7261702079646165
>>> [159785.955417] x13: 726c612073277469 x12: ffff00006887b830
>>> [159785.960802] x11: ffff00006773eaa0 x10: 7968702079687020
>>> [159785.966186] x9 : 0000000000002453 x8 : 726f702072656874
>>> [159785.971570] x7 : 6f6e6120666f2074 x6 : ffff802bcfb21290
>>> [159785.976955] x5 : ffff802bcfb21290 x4 : 0000000000000000
>>> [159785.982339] x3 : ffff802bcfb298c8 x2 : 337752b234c2ab00
>>> [159785.987723] x1 : 337752b234c2ab00 x0 : 0000000000000000
>>> [159785.993108] Process kworker/u96:2 (pid: 16993, stack limit =
>>> 0x0000000072dae094)
>>> [159786.000576] Call trace:
>>> [159786.003097]  sas_port_add_phy+0x188/0x1b8
>>> [159786.007179]  sas_ex_get_linkrate.isra.5+0x134/0x140
>>> [159786.012130]  sas_ex_discover_expander+0x128/0x408
>>> [159786.016906]  sas_ex_discover_dev+0x218/0x4c8
>>> [159786.021249]  sas_ex_discover_devices+0x9c/0x1a8
>>> [159786.025852]  sas_discover_root_expander+0x134/0x160
>>> [159786.030802]  sas_discover_domain+0x1b8/0x1e8
>>> [159786.035148]  process_one_work+0x1b4/0x3f8
>>> [159786.039230]  worker_thread+0x54/0x470
>>> [159786.042967]  kthread+0x134/0x138
>>> [159786.046269]  ret_from_fork+0x10/0x18
>>> [159786.049918] Code: 91322300 f0004402 91178042 97fe4c9b (d4210000)
>>> [159786.056083] Modules linked in: hns3_enet_ut(OE) hclge(OE) hnae3(OE)
>>> hisi_sas_test_hw(OE) hisi_sas_test_main(OE) serdes(OE)
>>> [159786.067202] ---[ end trace 03622b9e2d99e196  ]---
>>> [159786.071893] Kernel panic - not syncing: Fatal exception
>>> [159786.077190] SMP: stopping secondary CPUs
>>> [159786.081192] Kernel Offset: disabled
>>> [159786.084753] CPU features: 0x2,a2a00a38
>>>
>>> Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver")

Reviewed-by: John Garry <john.garry@huawei.com>

>>> Reported-by: Jian Luo <luojian5@huawei.com>
>>> Signed-off-by: Jason Yan <yanaijie@huawei.com>
>>> CC: John Garry <john.garry@huawei.com>
>>> ---
>>>  drivers/scsi/libsas/sas_expander.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/scsi/libsas/sas_expander.c
>>> b/drivers/scsi/libsas/sas_expander.c
>>> index 83f2fd70ce76..9f7e2457360e 100644
>>> --- a/drivers/scsi/libsas/sas_expander.c
>>> +++ b/drivers/scsi/libsas/sas_expander.c
>>> @@ -1019,6 +1019,8 @@ static struct domain_device
>>> *sas_ex_discover_expander(
>>>          list_del(&child->dev_list_node);
>>>          spin_unlock_irq(&parent->port->dev_list_lock);
>>>          sas_put_device(child);
>>> +        sas_port_delete(phy->port);
>>> +        phy->port = NULL;
>>
>> This looks ok.
>>
>> However, I wonder if we miss something else, and I see this code
>> earlier in sas_ex_discover_expander():
>>
>>      kref_get(&parent->kref);
>>      child->parent = parent;
>>      child->port = port;
>>      child->iproto = phy->attached_iproto;
>>      child->tproto = phy->attached_tproto;
>>
>> I assume the kref get is for the child referencing the parent. Do we
>> need the kref put? I couldn't see it.
>>
>> If yes, maybe it should it go in sas_free_device(), along with kfree()
>> for expander device phys.
>>
>
> Yes, the parent's kref will put in sas_free_device(). So the call chain
> will be:
>
> sas_put_device
>   ->sas_free_device
>     ->sas_put_device(dev->parent);
>       ->kref_put
>

Ah, yes.

> Thanks,
> Jason
>
>> Thanks,
>> John
>>
>>>          return NULL;
>>>      }
>>>      list_add_tail(&child->siblings, &parent->ex_dev.children);
>>>
>>
>>
>>
>>
>> .
>>
>
>
> .
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] scsi: libsas: only clear phy->in_shutdown after shutdown event done
  2019-05-14  2:42 ` Jason Yan
@ 2019-05-30  2:09   ` Martin K. Petersen
  -1 siblings, 0 replies; 12+ messages in thread
From: Martin K. Petersen @ 2019-05-30  2:09 UTC (permalink / raw)
  To: Jason Yan
  Cc: martin.petersen, jejb, linux-scsi, linux-kernel, hare,
	dan.j.williams, jthumshirn, hch, huangdaode, chenxiang66,
	miaoxie, john.garry, zhaohongjiang, Ewan Milne, Tomas Henzl


Jason,

> When the event queue is full of phy up and down events and reached the
> threshold, we will queue a shutdown-event, and set phy->in_shutdown so
> that we will not queue a shutdown-event again. But before the
> shutdown-event can be executed, every phy-down event will clear
> phy->in_shutdown and a new shutdown-event will be queued. The queue will
> be full of these shutdown-events.
>
> Fix this by only clear phy->in_shutdown in sas_phye_shutdown(), that is
> after the first shutdown-event has been executed.

Applied 1+2 to 5.2/scsi-fixes. Thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] scsi: libsas: only clear phy->in_shutdown after shutdown event done
@ 2019-05-30  2:09   ` Martin K. Petersen
  0 siblings, 0 replies; 12+ messages in thread
From: Martin K. Petersen @ 2019-05-30  2:09 UTC (permalink / raw)
  To: Jason Yan
  Cc: martin.petersen, jejb, linux-scsi, linux-kernel, hare,
	dan.j.williams, jthumshirn, hch, huangdaode, chenxiang66,
	miaoxie, john.garry, zhaohongjiang, Ewan Milne, Tomas Henzl


Jason,

> When the event queue is full of phy up and down events and reached the
> threshold, we will queue a shutdown-event, and set phy->in_shutdown so
> that we will not queue a shutdown-event again. But before the
> shutdown-event can be executed, every phy-down event will clear
> phy->in_shutdown and a new shutdown-event will be queued. The queue will
> be full of these shutdown-events.
>
> Fix this by only clear phy->in_shutdown in sas_phye_shutdown(), that is
> after the first shutdown-event has been executed.

Applied 1+2 to 5.2/scsi-fixes. Thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-05-30  2:10 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-14  2:42 [PATCH 1/2] scsi: libsas: only clear phy->in_shutdown after shutdown event done Jason Yan
2019-05-14  2:42 ` Jason Yan
2019-05-14  2:42 ` [PATCH 2/2] scsi: libsas: delete sas port if expander discover failed Jason Yan
2019-05-14  2:42   ` Jason Yan
2019-05-14 10:45   ` John Garry
2019-05-14 10:45     ` John Garry
2019-05-14 11:37     ` Jason Yan
2019-05-14 11:37       ` Jason Yan
2019-05-14 11:46       ` John Garry
2019-05-14 11:46         ` John Garry
2019-05-30  2:09 ` [PATCH 1/2] scsi: libsas: only clear phy->in_shutdown after shutdown event done Martin K. Petersen
2019-05-30  2:09   ` Martin K. Petersen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.