All of lore.kernel.org
 help / color / mirror / Atom feed
From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
	Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Serguei Smirnov <ssmirnov@whamcloud.com>,
	Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 18/20] lnet: o2iblnd: fix deadline for tx on peer queue
Date: Fri, 14 Oct 2022 17:38:09 -0400	[thread overview]
Message-ID: <1665783491-13827-19-git-send-email-jsimmons@infradead.org> (raw)
In-Reply-To: <1665783491-13827-1-git-send-email-jsimmons@infradead.org>

From: Serguei Smirnov <ssmirnov@whamcloud.com>

In o2iblnd, deadline is checked for txs on peer queue,
but not set prior to adding the tx to the queue. This
may cause the tx to be dropped unnecessarily with
"Timed out tx for ..." warning.

Fix it by setting the tx_deadline when adding tx to peer queue.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16184
Lustre-commit: 4c89ee7d7b098c7f1 ("LU-16184 o2iblnd: fix deadline for tx on peer queue")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48640
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 919b83d5c6e2..6f040964121c 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1422,6 +1422,7 @@ kiblnd_launch_tx(struct lnet_ni *ni, struct kib_tx *tx, lnet_nid_t nid)
 	int rc;
 	int i;
 	struct lnet_ioctl_config_o2iblnd_tunables *tunables;
+	s64 timeout_ns;
 
 	/*
 	 * If I get here, I've committed to send, so I complete the tx with
@@ -1450,6 +1451,7 @@ kiblnd_launch_tx(struct lnet_ni *ni, struct kib_tx *tx, lnet_nid_t nid)
 		return;
 	}
 
+	timeout_ns = kiblnd_timeout() * NSEC_PER_SEC;
 	read_unlock(g_lock);
 	/* Re-try with a write lock */
 	write_lock(g_lock);
@@ -1459,9 +1461,12 @@ kiblnd_launch_tx(struct lnet_ni *ni, struct kib_tx *tx, lnet_nid_t nid)
 		if (list_empty(&peer_ni->ibp_conns)) {
 			/* found a peer_ni, but it's still connecting... */
 			LASSERT(kiblnd_peer_connecting(peer_ni));
-			if (tx)
+			if (tx) {
+				tx->tx_deadline = ktime_add_ns(ktime_get(),
+							       timeout_ns);
 				list_add_tail(&tx->tx_list,
 					      &peer_ni->ibp_tx_queue);
+			}
 			write_unlock_irqrestore(g_lock, flags);
 		} else {
 			conn = kiblnd_get_conn_locked(peer_ni);
@@ -1498,9 +1503,12 @@ kiblnd_launch_tx(struct lnet_ni *ni, struct kib_tx *tx, lnet_nid_t nid)
 		if (list_empty(&peer2->ibp_conns)) {
 			/* found a peer_ni, but it's still connecting... */
 			LASSERT(kiblnd_peer_connecting(peer2));
-			if (tx)
+			if (tx) {
+				tx->tx_deadline = ktime_add_ns(ktime_get(),
+							       timeout_ns);
 				list_add_tail(&tx->tx_list,
 					      &peer2->ibp_tx_queue);
+			}
 			write_unlock_irqrestore(g_lock, flags);
 		} else {
 			conn = kiblnd_get_conn_locked(peer2);
@@ -1525,8 +1533,10 @@ kiblnd_launch_tx(struct lnet_ni *ni, struct kib_tx *tx, lnet_nid_t nid)
 	/* always called with a ref on ni, which prevents ni being shutdown */
 	LASSERT(!((struct kib_net *)ni->ni_data)->ibn_shutdown);
 
-	if (tx)
+	if (tx) {
+		tx->tx_deadline = ktime_add_ns(ktime_get(), timeout_ns);
 		list_add_tail(&tx->tx_list, &peer_ni->ibp_tx_queue);
+	}
 
 	kiblnd_peer_addref(peer_ni);
 	hash_add(kiblnd_data.kib_peers, &peer_ni->ibp_list, nid);
-- 
2.27.0

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

  parent reply	other threads:[~2022-10-14 21:38 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-14 21:37 [lustre-devel] [PATCH 00/20] lustre: backport OpenSFS work as of Oct 14, 2022 James Simmons
2022-10-14 21:37 ` [lustre-devel] [PATCH 01/20] lustre: ptlrpc: protect rq_repmsg in ptlrpc_req_drop_rs() James Simmons
2022-10-14 21:37 ` [lustre-devel] [PATCH 02/20] lustre: obdclass: set OBD_MD_FLGROUP for ladvise RPC James Simmons
2022-10-14 21:37 ` [lustre-devel] [PATCH 03/20] lustre: obdclass: free inst_name correctly James Simmons
2022-10-14 21:37 ` [lustre-devel] [PATCH 04/20] lustre: osc: take ldlm lock when queue sync pages James Simmons
2022-10-14 21:37 ` [lustre-devel] [PATCH 05/20] lnet: track pinginfo size in bytes, not nis James Simmons
2022-10-14 21:37 ` [lustre-devel] [PATCH 06/20] lnet: add iface index to struct lnet_inetdev James Simmons
2022-10-14 21:37 ` [lustre-devel] [PATCH 07/20] lnet: ksocklnd: support IPv6 in ksocknal_ip2index() James Simmons
2022-10-14 21:37 ` [lustre-devel] [PATCH 08/20] lnet: only use PUBLIC IP6 addresses for connections James Simmons
2022-10-14 21:38 ` [lustre-devel] [PATCH 09/20] lustre: osc: Remove oap_magic James Simmons
2022-10-14 21:38 ` [lustre-devel] [PATCH 10/20] lustre: ptlrpc: add assert for ptlrpc_service_purge_all James Simmons
2022-10-14 21:38 ` [lustre-devel] [PATCH 11/20] lustre: ptlrpc: lower the message level in no resend case James Simmons
2022-10-14 21:38 ` [lustre-devel] [PATCH 12/20] lustre: obdclass: user netlink to collect devices information James Simmons
2022-10-14 21:38 ` [lustre-devel] [PATCH 13/20] lnet: use %pISc for formatting IP addresses James Simmons
2022-10-14 21:38 ` [lustre-devel] [PATCH 14/20] lustre: llog: correct llog FID and path output James Simmons
2022-10-14 21:38 ` [lustre-devel] [PATCH 15/20] lnet: o2iblnd: fix handling of RDMA_CM_EVENT_UNREACHABLE James Simmons
2022-10-14 21:38 ` [lustre-devel] [PATCH 16/20] lnet: socklnd: remove remnants of tcp bonding James Simmons
2022-10-14 21:38 ` [lustre-devel] [PATCH 17/20] lnet: Router test interop check and aarch fix James Simmons
2022-10-14 21:38 ` James Simmons [this message]
2022-10-14 21:38 ` [lustre-devel] [PATCH 19/20] lnet: o2iblnd: detect link state to set fatal error on ni James Simmons
2022-10-14 21:38 ` [lustre-devel] [PATCH 20/20] lnet: socklnd: limit retries on conns_per_peer mismatch James Simmons

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1665783491-13827-19-git-send-email-jsimmons@infradead.org \
    --to=jsimmons@infradead.org \
    --cc=adilger@whamcloud.com \
    --cc=green@whamcloud.com \
    --cc=lustre-devel@lists.lustre.org \
    --cc=neilb@suse.de \
    --cc=ssmirnov@whamcloud.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.