All of lore.kernel.org
 help / color / mirror / Atom feed
From: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
To: dev@dpdk.org
Cc: shahafs@mellanox.com, yskoh@mellanox.com
Subject: [dpdk-dev] [PATCH] net/mlx5: fix event handler uninstall
Date: Sat, 25 May 2019 09:26:05 +0000	[thread overview]
Message-ID: <1558776365-28511-1-git-send-email-viacheslavo@mellanox.com> (raw)

When device is being closed and tries to unregister interrupt callback,
there is a chance the handler is still active (called in context of
eal_intr_thread_main thread). If so the rte_intr_callback_unregister
returns -EAGAIN and keeps the handler registered, causing crash when
underlaying resourse is gone away.

This race condition may happen if event handling in application takes
a long time. We should check the return code of unregistering routine
and try again to unregister the handler. The diagnostic messages are
shown once a second, while trying to unregister.

Fixes: 028b2a28c3cb ("net/mlx5: update event handler for multiport IB devices")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  2 +-
 drivers/net/mlx5/mlx5.h        |  2 ++
 drivers/net/mlx5/mlx5_ethdev.c | 79 +++++++++++++++++++++++++++++++++++++++---
 3 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 9f5ec97..2344cb4 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -313,7 +313,7 @@ struct mlx5_dev_spawn_data {
 	 **/
 	assert(!sh->intr_cnt);
 	if (sh->intr_cnt)
-		rte_intr_callback_unregister
+		mlx5_intr_callback_unregister
 			(&sh->intr_handle, mlx5_dev_interrupt_handler, sh);
 	pthread_mutex_destroy(&sh->intr_mutex);
 	if (sh->pd)
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3eaaafd..5b5b93d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -437,6 +437,8 @@ void mlx5_nl_check_switch_info(bool nun_vf_set,
 			       struct mlx5_switch_info *switch_info);
 void mlx5_translate_port_name(const char *port_name_in,
 			      struct mlx5_switch_info *port_info_out);
+void mlx5_intr_callback_unregister(const struct rte_intr_handle *handle,
+				   rte_intr_callback_fn cb_fn, void *cb_arg);
 
 /* mlx5_mac.c */
 
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index a8a7ece..f47297c 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1226,9 +1226,80 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	}
 }
 
+/*
+ * Unregister callback handler safely. The handler may be active
+ * while we are trying to unregister it, in this case code -EAGAIN
+ * is returned by rte_intr_callback_unregister(). This routine checks
+ * the return code and tries to unregister handler again.
+ *
+ * @param handle
+ *   interrupt handle
+ * @param cb_fn
+ *   pointer to callback routine
+ * @cb_arg
+ *   opaque callback parameter
+ */
+void
+mlx5_intr_callback_unregister(const struct rte_intr_handle *handle,
+			      rte_intr_callback_fn cb_fn, void *cb_arg)
+{
+	/*
+	 * Try to reduce timeout management overhead by not calling
+	 * the timer related routines on the first iteration. If the
+	 * unregistering succeeds on first call there will be no
+	 * timer calls at all.
+	 */
+	uint64_t twait = 0;
+	uint64_t start = 0;
+
+	do {
+		int ret;
+
+		ret = rte_intr_callback_unregister(handle, cb_fn, cb_arg);
+		if (ret >= 0)
+			return;
+		if (ret != -EAGAIN) {
+			DRV_LOG(INFO, "failed to unregister interrupt"
+				      " handler (error: %d)", ret);
+			assert(false);
+			return;
+		}
+		if (twait) {
+			struct timespec onems;
+
+			/* Wait one millisecond and try again. */
+			onems.tv_sec = 0;
+			onems.tv_nsec = NS_PER_S / MS_PER_S;
+			nanosleep(&onems, 0);
+			/* Check whether one second elapsed. */
+			if ((rte_get_timer_cycles() - start) <= twait)
+				continue;
+		} else {
+			/*
+			 * We get the amount of timer ticks for one second.
+			 * If this amount elapsed it means we spent one
+			 * second in waiting. This branch is executed once
+			 * on first iteration.
+			 */
+			twait = rte_get_timer_hz();
+			assert(twait);
+		}
+		/*
+		 * Timeout elapsed, show message (once a second) and retry.
+		 * We have no other acceptable option here, if we ignore
+		 * the unregistering return code the handler will not
+		 * be unregistered, fd will be closed and we may get the
+		 * crush. Hanging and messaging in the loop seems not to be
+		 * the worst choice.
+		 */
+		DRV_LOG(INFO, "Retrying to unregister interrupt handler");
+		start = rte_get_timer_cycles();
+	} while (true);
+}
+
 /**
  * Uninstall shared asynchronous device events handler.
- * This function is implemeted to support event sharing
+ * This function is implemented to support event sharing
  * between multiple ports of single IB device.
  *
  * @param dev
@@ -1254,7 +1325,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	sh->port[priv->ibv_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
 	if (!sh->intr_cnt || --sh->intr_cnt)
 		goto exit;
-	rte_intr_callback_unregister(&sh->intr_handle,
+	mlx5_intr_callback_unregister(&sh->intr_handle,
 				     mlx5_dev_interrupt_handler, sh);
 	sh->intr_handle.fd = 0;
 	sh->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
@@ -1263,8 +1334,8 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
- * Install shared asyncronous device events handler.
- * This function is implemeted to support event sharing
+ * Install shared asynchronous device events handler.
+ * This function is implemented to support event sharing
  * between multiple ports of single IB device.
  *
  * @param dev
-- 
1.8.3.1


             reply	other threads:[~2019-05-25  9:26 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-25  9:26 Viacheslav Ovsiienko [this message]
2019-05-26 19:16 ` [dpdk-dev] [PATCH] net/mlx5: fix event handler uninstall Shahaf Shuler
2019-05-27  4:58 ` [dpdk-dev] [PATCH v2] " Viacheslav Ovsiienko
2019-06-02  7:53   ` Shahaf Shuler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1558776365-28511-1-git-send-email-viacheslavo@mellanox.com \
    --to=viacheslavo@mellanox.com \
    --cc=dev@dpdk.org \
    --cc=shahafs@mellanox.com \
    --cc=yskoh@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.