From: James Simmons <jsimmons@infradead.org>
To: lustre-devel@lists.lustre.org
Subject: [lustre-devel] [PATCH 20/22] lnet: lnd: gracefully handle unexpected events
Date: Tue, 2 Jun 2020 20:59:59 -0400
Message-ID: <1591146001-27171-21-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1591146001-27171-1-git-send-email-jsimmons@infradead.org>
From: Amir Shehata <ashehata@whamcloud.com>

When a tx completes, the kiblnd_tx_complete() callback is invoked,
and it asserts:

	LASSERT(tx->tx_sending > 0);

However, this assert is being triggered in some rare scenarios.
tx_sending can be 0 at this point for one of two reasons:

1. ib_post_send() failed but the OFED stack still delivers a
   tx complete event.
2. We receive two different events for the same tx.

Instead of asserting, ignore that tx_complete event and print
the tx pointer and its status.
WC-bug-id: https://jira.whamcloud.com/browse/LU-13553
Lustre-commit: 60f9f539e686f ("LU-13553 lnd: gracefully handle unexpected events")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38669
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 09a46d6..40e196d 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -969,7 +969,11 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	struct kib_conn *conn = tx->tx_conn;
 	int idle;

-	LASSERT(tx->tx_sending > 0);
+	if (tx->tx_sending <= 0) {
+		CERROR("Received an event on a freed tx: %p status %d\n",
+		       tx, tx->tx_status);
+		return;
+	}

 	if (failed) {
 		if (conn->ibc_state == IBLND_CONN_ESTABLISHED)
--
1.8.3.1
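
For readers who want to see the guard in isolation, here is a minimal
userspace sketch of the pattern the patch introduces. It is not the
o2iblnd code: struct mock_tx, mock_tx_complete() and mock_demo() are
hypothetical stand-ins, fprintf() replaces CERROR(), and the bookkeeping
after the guard is simplified. It only illustrates that a completion
arriving with no send outstanding is logged and dropped rather than
asserted on.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct kib_tx; only the fields the guard reads. */
struct mock_tx {
	int tx_sending;	/* number of sends still in flight */
	int tx_status;	/* last recorded completion status */
};

/*
 * Simplified completion handler (an assumption, not the real callback).
 * Returns 1 if the completion was processed, 0 if it was ignored because
 * no send was outstanding, which is the case that used to trip the LASSERT.
 */
int mock_tx_complete(struct mock_tx *tx, int status)
{
	if (tx->tx_sending <= 0) {
		fprintf(stderr, "unexpected completion on tx %p status %d\n",
			(void *)tx, tx->tx_status);
		return 0;
	}

	tx->tx_sending--;
	if (status != 0)
		tx->tx_status = status;
	return 1;
}

/* Drives the handler with one expected and one duplicate completion. */
void mock_demo(void)
{
	struct mock_tx tx = { .tx_sending = 1, .tx_status = 0 };

	assert(mock_tx_complete(&tx, 0) == 1);	/* expected completion */
	assert(mock_tx_complete(&tx, 0) == 0);	/* duplicate event is ignored */
	assert(tx.tx_sending == 0);		/* counter never goes negative */
}
```

Calling mock_demo() exercises the second scenario from the commit
message (two events for the same tx): the first event is processed, the
second is reported and dropped, and tx_sending stays at 0.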