* [PATCH net 0/3] s390/qeth: fixes 2021-09-21
From: Julian Wiedmann @ 2021-09-21 14:52 UTC
  To: David Miller, Jakub Kicinski
  Cc: linux-netdev, linux-s390, Heiko Carstens, Karsten Graul,
	Julian Wiedmann, Alexandra Winter

Hi Dave & Jakub,

please apply the following patch series for qeth to netdev's net tree.

This brings two fixes for deadlocks that can occur when a device is
removed while it has certain types of async work pending, and one
additional fix for a missing NULL check in an error path.

Thanks,
Julian

Alexandra Winter (2):
  s390/qeth: Fix deadlock in remove_discipline
  s390/qeth: fix deadlock during failing recovery

Julian Wiedmann (1):
  s390/qeth: fix NULL deref in qeth_clear_working_pool_list()

 arch/s390/include/asm/ccwgroup.h  |  2 +-
 drivers/s390/cio/ccwgroup.c       | 10 ++++++++--
 drivers/s390/net/qeth_core.h      |  1 -
 drivers/s390/net/qeth_core_main.c | 22 +++++++++-------------
 drivers/s390/net/qeth_l2_main.c   |  1 -
 drivers/s390/net/qeth_l3_main.c   |  1 -
 6 files changed, 18 insertions(+), 19 deletions(-)

-- 
2.25.1



* [PATCH net 1/3] s390/qeth: fix NULL deref in qeth_clear_working_pool_list()
From: Julian Wiedmann @ 2021-09-21 14:52 UTC
  To: David Miller, Jakub Kicinski
  Cc: linux-netdev, linux-s390, Heiko Carstens, Karsten Graul,
	Julian Wiedmann, Alexandra Winter, Stefan Raspl

When qeth_set_online() calls qeth_clear_working_pool_list() to roll
back after an error exit from qeth_hardsetup_card(), we are at risk of
accessing card->qdio.in_q before it was allocated by
qeth_alloc_qdio_queues() via qeth_mpc_initialize().

qeth_clear_working_pool_list() then dereferences NULL, and by writing
to queue->bufs[i].pool_entry it scribbles all over the CPU's lowcore,
resulting in a crash when those lowcore areas are next used (e.g. on
the next machine-check interrupt).

Such a scenario would typically happen when the device is first set
online and its queues aren't allocated yet. An early IO error or
certain misconfigurations (e.g. a mismatched transport mode or a bad
portno) then cause us to error out from qeth_hardsetup_card() with
card->qdio.in_q still being NULL.

Fix it by checking the pointer for NULL before accessing it.

Note that we also have (rare) paths inside qeth_mpc_initialize() where
a configuration change can cause us to free the existing queues,
expecting that subsequent code will allocate them again. If we then
error out before that re-allocation happens, the same bug occurs.
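
A condensed sketch of the failing path described above (illustrative
pseudocode, not verbatim mainline code; error handling elided):

	/* qeth_set_online(), simplified: */
	rc = qeth_hardsetup_card(card, &carrier_ok);
	if (rc) {
		/* Roll back. card->qdio.in_q can still be NULL here,
		 * because qeth_alloc_qdio_queues() is only reached via
		 * qeth_mpc_initialize() later during hardsetup:
		 */
		qeth_clear_working_pool_list(card);
		/* ... which writes queue->bufs[i].pool_entry through
		 * the NULL queue pointer, i.e. into the lowcore at
		 * absolute address 0.
		 */
	}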

Fixes: eff73e16ee11 ("s390/qeth: tolerate pre-filled RX buffer")
Reported-by: Stefan Raspl <raspl@linux.ibm.com>
Root-caused-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
---
 drivers/s390/net/qeth_core_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 41ca6273b750..3fba440a0731 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -202,6 +202,9 @@ static void qeth_clear_working_pool_list(struct qeth_card *card)
 				 &card->qdio.in_buf_pool.entry_list, list)
 		list_del(&pool_entry->list);
 
+	if (!queue)
+		return;
+
 	for (i = 0; i < ARRAY_SIZE(queue->bufs); i++)
 		queue->bufs[i].pool_entry = NULL;
 }
-- 
2.25.1



* [PATCH net 2/3] s390/qeth: Fix deadlock in remove_discipline
From: Julian Wiedmann @ 2021-09-21 14:52 UTC
  To: David Miller, Jakub Kicinski
  Cc: linux-netdev, linux-s390, Heiko Carstens, Karsten Graul,
	Julian Wiedmann, Alexandra Winter

From: Alexandra Winter <wintera@linux.ibm.com>

Problem: qeth_close_dev_handler is a worker that tries to acquire
card->discipline_mutex via drv->set_offline() in ccwgroup_set_offline().
Since commit b41b554c1ee7
("s390/qeth: fix locking for discipline setup / removal")
qeth_remove_discipline() is called under card->discipline_mutex, where
it cancels the work and waits for it to finish. Both sides then wait
on each other, and neither can make progress.

STOPLAN reception with reason code IPA_RC_VEPA_TO_VEB_TRANSITION is the
only situation that schedules close_dev_work. In that situation,
scheduling qeth recovery will also result in an offline interface,
because resetting the isolation mode fails as long as the external
switch is still set to VEB.
And since commit 0b9902c1fcc5 ("s390/qeth: fix deadlock during recovery")
qeth recovery no longer acquires card->discipline_mutex.

So we accept the longer path length of qeth_schedule_recovery() in
this error situation and re-use the existing function.

As a side benefit, this changes the hwtrap handling to behave as it
does during recovery, rather than as during a user-triggered
set_offline.
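
Schematically, the deadlock before this patch (a simplified sketch of
the two code paths described above, not verbatim code):

	/*
	 * Worker (qeth_close_dev_handler):
	 *   ccwgroup_set_offline(card->gdev)
	 *     -> gdrv->set_offline()
	 *       -> mutex_lock(&card->discipline_mutex)   ... blocks here
	 *
	 * Removal (qeth_remove_discipline(), mutex already held):
	 *   cancel_work_sync(&card->close_dev_work)
	 *       ... waits forever for the worker above, which in turn
	 *       waits for the mutex: deadlock.
	 */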

Fixes: b41b554c1ee7 ("s390/qeth: fix locking for discipline setup / removal")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Acked-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
---
 drivers/s390/net/qeth_core.h      |  1 -
 drivers/s390/net/qeth_core_main.c | 16 ++++------------
 drivers/s390/net/qeth_l2_main.c   |  1 -
 drivers/s390/net/qeth_l3_main.c   |  1 -
 4 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index 535a60b3946d..a5aa0bdc61d6 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -858,7 +858,6 @@ struct qeth_card {
 	struct napi_struct napi;
 	struct qeth_rx rx;
 	struct delayed_work buffer_reclaim_work;
-	struct work_struct close_dev_work;
 };
 
 static inline bool qeth_card_hw_is_reachable(struct qeth_card *card)
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 3fba440a0731..9f26706051e5 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -70,15 +70,6 @@ static void qeth_issue_next_read_cb(struct qeth_card *card,
 static int qeth_qdio_establish(struct qeth_card *);
 static void qeth_free_qdio_queues(struct qeth_card *card);
 
-static void qeth_close_dev_handler(struct work_struct *work)
-{
-	struct qeth_card *card;
-
-	card = container_of(work, struct qeth_card, close_dev_work);
-	QETH_CARD_TEXT(card, 2, "cldevhdl");
-	ccwgroup_set_offline(card->gdev);
-}
-
 static const char *qeth_get_cardname(struct qeth_card *card)
 {
 	if (IS_VM_NIC(card)) {
@@ -795,10 +786,12 @@ static struct qeth_ipa_cmd *qeth_check_ipa_data(struct qeth_card *card,
 	case IPA_CMD_STOPLAN:
 		if (cmd->hdr.return_code == IPA_RC_VEPA_TO_VEB_TRANSITION) {
 			dev_err(&card->gdev->dev,
-				"Interface %s is down because the adjacent port is no longer in reflective relay mode\n",
+				"Adjacent port of interface %s is no longer in reflective relay mode, trigger recovery\n",
 				netdev_name(card->dev));
-			schedule_work(&card->close_dev_work);
+			/* Set offline, then probably fail to set online: */
+			qeth_schedule_recovery(card);
 		} else {
+			/* stay online for subsequent STARTLAN */
 			dev_warn(&card->gdev->dev,
 				 "The link for interface %s on CHPID 0x%X failed\n",
 				 netdev_name(card->dev), card->info.chpid);
@@ -1540,7 +1533,6 @@ static void qeth_setup_card(struct qeth_card *card)
 	INIT_LIST_HEAD(&card->ipato.entries);
 	qeth_init_qdio_info(card);
 	INIT_DELAYED_WORK(&card->buffer_reclaim_work, qeth_buffer_reclaim_work);
-	INIT_WORK(&card->close_dev_work, qeth_close_dev_handler);
 	hash_init(card->rx_mode_addrs);
 	hash_init(card->local_addrs4);
 	hash_init(card->local_addrs6);
diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index 72e84ff9fea5..dc6c00768d91 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -2307,7 +2307,6 @@ static void qeth_l2_remove_device(struct ccwgroup_device *gdev)
 	if (gdev->state == CCWGROUP_ONLINE)
 		qeth_set_offline(card, card->discipline, false);
 
-	cancel_work_sync(&card->close_dev_work);
 	if (card->dev->reg_state == NETREG_REGISTERED) {
 		priv = netdev_priv(card->dev);
 		if (priv->brport_features & BR_LEARNING_SYNC) {
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 3a523e700a5a..6fd3e288f059 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -1969,7 +1969,6 @@ static void qeth_l3_remove_device(struct ccwgroup_device *cgdev)
 	if (cgdev->state == CCWGROUP_ONLINE)
 		qeth_set_offline(card, card->discipline, false);
 
-	cancel_work_sync(&card->close_dev_work);
 	if (card->dev->reg_state == NETREG_REGISTERED)
 		unregister_netdev(card->dev);
 
-- 
2.25.1



* [PATCH net 3/3] s390/qeth: fix deadlock during failing recovery
From: Julian Wiedmann @ 2021-09-21 14:52 UTC
  To: David Miller, Jakub Kicinski
  Cc: linux-netdev, linux-s390, Heiko Carstens, Karsten Graul,
	Julian Wiedmann, Alexandra Winter

From: Alexandra Winter <wintera@linux.ibm.com>

Commit 0b9902c1fcc5 ("s390/qeth: fix deadlock during recovery") removed
the taking of discipline_mutex inside qeth_do_reset(), fixing potential
deadlocks. One error path was missed, however: it still takes
discipline_mutex and thus retains the original deadlock potential.

Intermittent deadlocks were seen when a qeth channel path is configured
offline, causing a race between qeth_do_reset() and ccwgroup_remove().
Fix this by calling qeth_set_offline() directly in the qeth_do_reset()
error case, and then calling a new variant of ccwgroup_set_offline()
that does not take discipline_mutex.
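
For clarity, the two resulting call sites (both visible in the diffs
below):

	/* sysfs "online" store keeps the old behaviour and runs the
	 * driver's set_offline() callback:
	 */
	ret = ccwgroup_set_offline(gdev, true);

	/* qeth_do_reset() error path: qeth_set_offline() has already
	 * been called directly, so only transition the ccwgroup state
	 * and skip the driver callback (and with it discipline_mutex):
	 */
	qeth_set_offline(card, disc, true);
	ccwgroup_set_offline(card->gdev, false);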

Fixes: b41b554c1ee7 ("s390/qeth: fix locking for discipline setup / removal")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
---
 arch/s390/include/asm/ccwgroup.h  |  2 +-
 drivers/s390/cio/ccwgroup.c       | 10 ++++++++--
 drivers/s390/net/qeth_core_main.c |  3 ++-
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/s390/include/asm/ccwgroup.h b/arch/s390/include/asm/ccwgroup.h
index 36dbf5043fc0..aa995d91cd1d 100644
--- a/arch/s390/include/asm/ccwgroup.h
+++ b/arch/s390/include/asm/ccwgroup.h
@@ -55,7 +55,7 @@ int ccwgroup_create_dev(struct device *root, struct ccwgroup_driver *gdrv,
 			int num_devices, const char *buf);
 
 extern int ccwgroup_set_online(struct ccwgroup_device *gdev);
-extern int ccwgroup_set_offline(struct ccwgroup_device *gdev);
+int ccwgroup_set_offline(struct ccwgroup_device *gdev, bool call_gdrv);
 
 extern int ccwgroup_probe_ccwdev(struct ccw_device *cdev);
 extern void ccwgroup_remove_ccwdev(struct ccw_device *cdev);
diff --git a/drivers/s390/cio/ccwgroup.c b/drivers/s390/cio/ccwgroup.c
index 2ec741106cb6..f0538609dfe4 100644
--- a/drivers/s390/cio/ccwgroup.c
+++ b/drivers/s390/cio/ccwgroup.c
@@ -77,12 +77,13 @@ EXPORT_SYMBOL(ccwgroup_set_online);
 /**
  * ccwgroup_set_offline() - disable a ccwgroup device
  * @gdev: target ccwgroup device
+ * @call_gdrv: Call the registered gdrv set_offline function
  *
  * This function attempts to put the ccwgroup device into the offline state.
  * Returns:
  *  %0 on success and a negative error value on failure.
  */
-int ccwgroup_set_offline(struct ccwgroup_device *gdev)
+int ccwgroup_set_offline(struct ccwgroup_device *gdev, bool call_gdrv)
 {
 	struct ccwgroup_driver *gdrv = to_ccwgroupdrv(gdev->dev.driver);
 	int ret = -EINVAL;
@@ -91,11 +92,16 @@ int ccwgroup_set_offline(struct ccwgroup_device *gdev)
 		return -EAGAIN;
 	if (gdev->state == CCWGROUP_OFFLINE)
 		goto out;
+	if (!call_gdrv) {
+		ret = 0;
+		goto offline;
+	}
 	if (gdrv->set_offline)
 		ret = gdrv->set_offline(gdev);
 	if (ret)
 		goto out;
 
+offline:
 	gdev->state = CCWGROUP_OFFLINE;
 out:
 	atomic_set(&gdev->onoff, 0);
@@ -124,7 +130,7 @@ static ssize_t ccwgroup_online_store(struct device *dev,
 	if (value == 1)
 		ret = ccwgroup_set_online(gdev);
 	else if (value == 0)
-		ret = ccwgroup_set_offline(gdev);
+		ret = ccwgroup_set_offline(gdev, true);
 	else
 		ret = -EINVAL;
 out:
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 9f26706051e5..e9807d2996a9 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -5514,7 +5514,8 @@ static int qeth_do_reset(void *data)
 		dev_info(&card->gdev->dev,
 			 "Device successfully recovered!\n");
 	} else {
-		ccwgroup_set_offline(card->gdev);
+		qeth_set_offline(card, disc, true);
+		ccwgroup_set_offline(card->gdev, false);
 		dev_warn(&card->gdev->dev,
 			 "The qeth device driver failed to recover an error on the device\n");
 	}
-- 
2.25.1



* Re: [PATCH net 0/3] s390/qeth: fixes 2021-09-21
From: patchwork-bot+netdevbpf @ 2021-09-22  3:10 UTC
  To: Julian Wiedmann; +Cc: davem, kuba, netdev, linux-s390, hca, kgraul, wintera

Hello:

This series was applied to netdev/net.git (refs/heads/master):

On Tue, 21 Sep 2021 16:52:14 +0200 you wrote:
> Hi Dave & Jakub,
> 
> please apply the following patch series for qeth to netdev's net tree.
> 
> This brings two fixes for deadlocks that can occur when a device is
> removed while it has certain types of async work pending, and one
> additional fix for a missing NULL check in an error path.
> 
> [...]

Here is the summary with links:
  - [net,1/3] s390/qeth: fix NULL deref in qeth_clear_working_pool_list()
    https://git.kernel.org/netdev/net/c/248f064af222
  - [net,2/3] s390/qeth: Fix deadlock in remove_discipline
    https://git.kernel.org/netdev/net/c/ee909d0b1dac
  - [net,3/3] s390/qeth: fix deadlock during failing recovery
    https://git.kernel.org/netdev/net/c/d2b59bd4b06d

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


