From: James Simmons <jsimmons@infradead.org>
To: lustre-devel@lists.lustre.org
Subject: [lustre-devel] [PATCH 09/25] lustre: lnet: prevent assert on ln_state
Date: Tue, 25 Sep 2018 22:48:01 -0400
Message-ID: <1537930097-11624-10-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org>

From: Amir Shehata <ashehata@whamcloud.com>

lnet_peer_primary_nid() is called from lnet_parse() before
lnet_parse() takes the net lock, so ln_state is examined outside
that lock. This races with shutdown: the code expects the state
to be running, but it may already be stopping or shut down.

Fix the issue by renaming lnet_peer_primary_nid() to
lnet_peer_primary_nid_locked(), which no longer takes the net
lock itself, and calling it from lnet_parse() only after
lnet_net_lock() has been taken.
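The "_locked" convention this fix relies on can be modelled in user space. The sketch below is hypothetical pthread code, not LNet: demo_net_lock, demo_ln_state, demo_primary_nid_locked() and demo_parse() are invented stand-ins for the net lock, ln_state, lnet_peer_primary_nid_locked() and lnet_parse(). The helper assumes its caller already holds the lock, so the caller's state check and the helper's lookup see the same state and a concurrent shutdown cannot slip in between them.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for LNet types and state; NOT the LNet API. */
enum demo_state { DEMO_SHUTDOWN, DEMO_RUNNING, DEMO_STOPPING };
typedef uint64_t demo_nid_t;

struct demo_peer_entry {
	demo_nid_t nid;         /* NID this entry is keyed by */
	demo_nid_t primary_nid; /* primary NID of the peer owning this NID */
};

static pthread_mutex_t demo_net_lock = PTHREAD_MUTEX_INITIALIZER;
static enum demo_state demo_ln_state = DEMO_RUNNING; /* protected by demo_net_lock */
static const struct demo_peer_entry demo_peers[] = {
	{ .nid = 0x10, .primary_nid = 0x01 },
	{ .nid = 0x20, .primary_nid = 0x01 },
};

/* "_locked" convention: the caller must already hold demo_net_lock.
 * The helper neither takes nor drops the lock, so its view of
 * demo_ln_state matches the check its caller just made. */
static demo_nid_t demo_primary_nid_locked(demo_nid_t nid)
{
	/* models the kind of state assertion the subject line refers to */
	assert(demo_ln_state == DEMO_RUNNING);

	for (size_t i = 0; i < sizeof(demo_peers) / sizeof(demo_peers[0]); i++) {
		if (demo_peers[i].nid == nid)
			return demo_peers[i].primary_nid;
	}
	return nid; /* unknown peer: the NID is its own primary NID */
}

/* Caller modelled loosely on the patched lnet_parse(): lock first, check the
 * state, then call the _locked helper. The early return is an assumption of
 * this sketch, not a description of lnet_parse(). */
static int demo_parse(demo_nid_t src_nid, demo_nid_t *initiator)
{
	int rc = 0;

	pthread_mutex_lock(&demo_net_lock);
	if (demo_ln_state != DEMO_RUNNING)
		rc = -1; /* stopping or shut down: drop the message */
	else
		*initiator = demo_primary_nid_locked(src_nid);
	pthread_mutex_unlock(&demo_net_lock);
	return rc;
}

/* Shutdown changes the state under the same lock, so it cannot land between
 * demo_parse()'s state check and its lookup. */
static void demo_shutdown(void)
{
	pthread_mutex_lock(&demo_net_lock);
	demo_ln_state = DEMO_STOPPING;
	pthread_mutex_unlock(&demo_net_lock);
}
```

In this model the assertion in demo_primary_nid_locked() can only fire if some caller invokes it without first taking the lock and checking the state, which is the ordering problem the old unlocked call in lnet_parse() had.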

In lnet_create_reply_msg() we already have access to
msg_txpeer, so take the primary NID directly from it instead
of looking it up by NID.
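The direct dereference in lnet_create_reply_msg() walks the peer hierarchy peer NI -> peer net -> peer. The sketch below models that chain with hypothetical ex_* structs that copy only the field names used in the patch (msg_txpeer, lpni_peer_net, lpn_peer, lp_primary_nid); the structs, their other fields and ex_initiator_from_txpeer() are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ex_nid_t;

/* Hypothetical, trimmed-down mirrors of the peer hierarchy the patch walks:
 * peer NI -> peer net -> peer. Only the field names come from the patch. */
struct ex_peer {
	ex_nid_t lp_primary_nid;
};

struct ex_peer_net {
	struct ex_peer *lpn_peer;
};

struct ex_peer_ni {
	ex_nid_t lpni_nid;
	struct ex_peer_net *lpni_peer_net;
};

struct ex_msg {
	struct ex_peer_ni *msg_txpeer;
	ex_nid_t msg_initiator;
};

/* Read the primary NID straight from the peer NI recorded on the GET message
 * (msg_txpeer) instead of searching a peer table by NID. Assumes every pointer
 * in the chain is non-NULL, as the patched line does. */
static ex_nid_t ex_initiator_from_txpeer(const struct ex_msg *getmsg)
{
	const struct ex_peer_ni *lpni = getmsg->msg_txpeer;

	assert(lpni && lpni->lpni_peer_net && lpni->lpni_peer_net->lpn_peer);
	return lpni->lpni_peer_net->lpn_peer->lp_primary_nid;
}
```

Assuming msg_txpeer is the peer NI for peer_id.nid, this yields the same value the old lnet_peer_primary_nid() call produced, since that function ends with the same lpni_peer_net->lpn_peer->lp_primary_nid dereference, but it skips the table search and the reference-count round trip.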

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-9549
Reviewed-on: https://review.whamcloud.com/27262
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 drivers/staging/lustre/include/linux/lnet/lib-lnet.h | 2 +-
 drivers/staging/lustre/lnet/lnet/lib-move.c          | 7 +++----
 drivers/staging/lustre/lnet/lnet/peer.c              | 5 +----
 3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index f510b9e..6bfdc9b 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -653,7 +653,7 @@ struct lnet_peer_ni *lnet_get_next_peer_ni_locked(struct lnet_peer *peer,
 struct lnet_peer_ni *lnet_nid2peerni_ex(lnet_nid_t nid, int cpt);
 struct lnet_peer_ni *lnet_find_peer_ni_locked(lnet_nid_t nid);
 void lnet_peer_net_added(struct lnet_net *net);
-lnet_nid_t lnet_peer_primary_nid(lnet_nid_t nid);
+lnet_nid_t lnet_peer_primary_nid_locked(lnet_nid_t nid);
 void lnet_peer_tables_cleanup(struct lnet_net *net);
 void lnet_peer_uninit(void);
 int lnet_peer_tables_create(void);
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index d533b8e..2cf9c89 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -2338,8 +2338,6 @@
 		msg->msg_hdr.dest_pid	= dest_pid;
 		msg->msg_hdr.payload_length = payload_length;
 	}
-	/* Multi-Rail: Primary NID of source. */
-	msg->msg_initiator = lnet_peer_primary_nid(src_nid);
 
 	lnet_net_lock(cpt);
 	lpni = lnet_nid2peerni_locked(from_nid, cpt);
@@ -2357,6 +2355,8 @@
 	msg->msg_rxpeer = lpni;
 	msg->msg_rxni = ni;
 	lnet_ni_addref_locked(ni, cpt);
+	/* Multi-Rail: Primary NID of source. */
+	msg->msg_initiator = lnet_peer_primary_nid_locked(src_nid);
 
 	if (lnet_isrouter(msg->msg_rxpeer)) {
 		lnet_peer_set_alive(msg->msg_rxpeer);
@@ -2658,8 +2658,7 @@ struct lnet_msg *
 	       libcfs_nid2str(ni->ni_nid), libcfs_id2str(peer_id), getmd);
 
 	/* setup information for lnet_build_msg_event */
-	msg->msg_initiator = lnet_peer_primary_nid(peer_id.nid);
-	/* Cheaper: msg->msg_initiator = getmsg->msg_txpeer->lp_nid; */
+	msg->msg_initiator = getmsg->msg_txpeer->lpni_peer_net->lpn_peer->lp_primary_nid;
 	msg->msg_from = peer_id.nid;
 	msg->msg_type = LNET_MSG_GET; /* flag this msg as an "optimized" GET */
 	msg->msg_hdr.src_nid = peer_id.nid;
diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c
index 2fbf93a..ebb8435 100644
--- a/drivers/staging/lustre/lnet/lnet/peer.c
+++ b/drivers/staging/lustre/lnet/lnet/peer.c
@@ -593,19 +593,16 @@ struct lnet_peer_ni *
 }
 
 lnet_nid_t
-lnet_peer_primary_nid(lnet_nid_t nid)
+lnet_peer_primary_nid_locked(lnet_nid_t nid)
 {
 	struct lnet_peer_ni *lpni;
 	lnet_nid_t primary_nid = nid;
-	int cpt;
 
-	cpt = lnet_net_lock_current();
 	lpni = lnet_find_peer_ni_locked(nid);
 	if (lpni) {
 		primary_nid = lpni->lpni_peer_net->lpn_peer->lp_primary_nid;
 		lnet_peer_ni_decref_locked(lpni);
 	}
-	lnet_net_unlock(cpt);
 
 	return primary_nid;
 }
-- 
1.8.3.1
