From: James Simmons <jsimmons@infradead.org>
To: lustre-devel@lists.lustre.org
Subject: [lustre-devel] [PATCH 20/20] lnet: o2iblnd: 'Timed out tx' error message
Date: Sat, 13 Jun 2020 12:27:16 -0400
Message-ID: <1592065636-28333-21-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1592065636-28333-1-git-send-email-jsimmons@infradead.org>

From: Sonia Sharma <sharmaso@whamcloud.com>

Fix the error message in kiblnd_check_txs_locked()
to report the total RDMA time outstanding rather
than the number of seconds past the deadline.

This patch also adds tx_on_activeq to struct kib_tx
so that the time a tx spends on the internal queue and
on the active queue can be tracked and reported. This
helps in diagnosing tx timeouts.

WC-bug-id: https://jira.whamcloud.com/browse/LU-1742
Lustre-commit: 7308662efc02f ("LU-1742 o2iblnd: 'Timed out tx' error message")
Signed-off-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/33235
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Stephen Champion <stephen.champion@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.h    |  2 ++
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 26 ++++++++++++++++++++++----
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index f60a69d..dc09e5e 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -541,6 +541,8 @@ struct kib_tx {					/* transmit message */
 	bool			tx_gaps;
 	struct kib_fmr		tx_fmr;		/* FMR */
 	int			tx_dmadir;	/* dma direction */
+	/* time when tx added on ibc_active_txs */
+	ktime_t			tx_on_activeq;
 };
 
 struct kib_connvars {
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 40e196d..f421cdf 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -821,6 +821,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	 */
 	tx->tx_sending++;
 	list_add(&tx->tx_list, &conn->ibc_active_txs);
+	tx->tx_on_activeq = ktime_get();
 
 	/* I'm still holding ibc_lock! */
 	if (conn->ibc_state != IBLND_CONN_ESTABLISHED) {
@@ -3169,6 +3170,8 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid,
 static int
 kiblnd_check_txs_locked(struct kib_conn *conn, struct list_head *txs)
 {
+	bool active_txs = strcmp(kiblnd_queue2str(conn, txs),
+				 "active_txs") == 0;
 	struct kib_tx *tx;
 
 	list_for_each_entry(tx, txs, tx_list) {
@@ -3179,13 +3182,28 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid,
 			LASSERT(tx->tx_waiting || tx->tx_sending);
 		}
 
-		if (ktime_compare(ktime_get(), tx->tx_deadline) >= 0) {
-			CERROR("Timed out tx: %s, %lld seconds\n",
+		if (ktime_compare(ktime_get(), tx->tx_deadline) < 0)
+			continue;
+
+		if (!active_txs) {
+			CERROR("Timed out tx: %s, outstanding RDMA time: %lld sec\n",
 			       kiblnd_queue2str(conn, txs),
+			       *kiblnd_tunables.kib_timeout +
+			       (ktime_ms_delta(ktime_get(),
+					      tx->tx_deadline) / MSEC_PER_SEC));
+		} else {
+			CERROR("Timed out tx: %s, time in internal queue: %lld sec, time in active queue: %lld sec, outstanding RDMA time: %lld sec\n",
+			       kiblnd_queue2str(conn, txs),
+			       ktime_ms_delta(tx->tx_deadline,
+					      tx->tx_on_activeq) / MSEC_PER_SEC,
 			       ktime_ms_delta(ktime_get(),
-					      tx->tx_deadline) / MSEC_PER_SEC);
-			return 1;
+					      tx->tx_on_activeq) / MSEC_PER_SEC,
+			       *kiblnd_tunables.kib_timeout +
+			       (ktime_ms_delta(ktime_get(),
+					       tx->tx_deadline) / MSEC_PER_SEC));
 		}
+
+		return 1;
 	}
 
 	return 0;
-- 
1.8.3.1
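
For readers who want to check the numbers in the new message, here is a
minimal userspace sketch of the arithmetic. It is not the kernel code:
sketch_ktime_t, sketch_ms_delta(), sketch_outstanding_sec() and
sketch_active_queue_sec() are hypothetical stand-ins for ktime_t,
ktime_ms_delta() and the expressions passed to CERROR(). It also assumes
that tx_deadline was set to the queueing time plus the configured timeout,
which is what adding *kiblnd_tunables.kib_timeout back in implies.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000LL
#define MSEC_PER_SEC  1000LL

/* Stand-in for the kernel's ktime_t: nanoseconds in a signed 64-bit value. */
typedef int64_t sketch_ktime_t;

/* Mirrors ktime_ms_delta(later, earlier): the difference in whole milliseconds. */
static int64_t sketch_ms_delta(sketch_ktime_t later, sketch_ktime_t earlier)
{
	return (later - earlier) / NSEC_PER_MSEC;
}

/*
 * "outstanding RDMA time" as the patch prints it: the configured timeout
 * plus the whole seconds elapsed since the deadline. Assumes the deadline
 * was computed as (time the tx was queued) + timeout_sec.
 */
static int64_t sketch_outstanding_sec(int64_t timeout_sec,
				      sketch_ktime_t now,
				      sketch_ktime_t deadline)
{
	return timeout_sec + sketch_ms_delta(now, deadline) / MSEC_PER_SEC;
}

/* "time in active queue": whole seconds since the tx joined ibc_active_txs. */
static int64_t sketch_active_queue_sec(sketch_ktime_t now,
				       sketch_ktime_t on_activeq)
{
	return sketch_ms_delta(now, on_activeq) / MSEC_PER_SEC;
}
```

With a 50 second timeout, a deadline at t = 50 s and a check at t = 53.5 s,
the old message would have reported 3 seconds, while the outstanding time
computed above is 53 seconds. If the tx was placed on the active queue at
t = 10 s, the active queue time comes out as 43 seconds.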