* [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre
@ 2018-09-07  0:49 NeilBrown
  2018-09-07  0:49 ` [lustre-devel] [PATCH 11/34] lnet: pass tun to lnet_startup_lndni, instead of full conf NeilBrown
                   ` (34 more replies)
  0 siblings, 35 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

The following series implements the first patch in the
multi-rail series:
Commit: 8cbb8cd3e771 ("LU-7734 lnet: Multi-Rail local NI split")

I split that commit up into 40 individual commits which can be found
at
  https://github.com/neilbrown/lustre/commits/multirail
though you need to scroll down a bit, as that contains all the
multi-rail series.

I then ported most of these patches to my mainline tree.
Some that I haven't included are:
- lnet: Move lnet_msg_alloc/free down a bit.
    lnet_msg_alloc/free don't exist any more
- lnet: lib-types: change some tabs to spaces
- lnet - assorted whitespace changes.
- lnet: change ni_last_alive from time64_t to long
- lnet: add lnet_net_state
    net_state is never used.
- lnet: remove 'static' from lnet_get_net_config()

I've also made a couple of minor changes to individual patches not
strictly related to porting (the net_prio field is never used, so I
never added it - I should have made that a separate patch).

This series compiles, but doesn't work.  I get a NULL pointer
dereference, then an assertion failure.  If I fix those, it hangs.
The NULL pointer dereference and the failing assertion are gone with
later patches, so I hope the other problems are too.

Some of these patches have very poor descriptions, such as "I have no
idea what this does".  If someone would like to explain - or maybe say
"Oh, we really shouldn't have done that", I'd be very happy to
receive that, and update the description or patch accordingly.

These will all appear in my lustre-testing branch, but won't migrate
to 'lustre' until I, at least, have enough other patches that I can
get a successful test run.

Review and comments always welcome.

Thanks,
NeilBrown


---

Amir Shehata (1):
      Completely re-write lnet_parse_networks().

NeilBrown (33):
      struct lnet_ni - reformat comments.
      lnet: Create struct lnet_net
      lnet: struct lnet_ni: move ni_lnd to lnet_net
      lnet: embed lnd_tunables in lnet_ni
      lnet: begin separating "networks" from "network interfaces".
      lnet: store separate xmit/recv net-interface in each message.
      lnet: change lnet_peer to reference the net, rather than ni.
      lnet: add cpt to lnet_match_info.
      lnet: add list of cpts to lnet_net.
      lnet: add ni arg to lnet_cpt_of_nid()
      lnet: pass tun to lnet_startup_lndni, instead of full conf
      lnet: split lnet_startup_lndni
      lnet: reverse order of lnet_startup_lnd{net,ni}
      lnet: rename lnet_find_net_locked to lnet_find_rnet_locked
      lnet: extend zombie handling to nets and nis
      lnet: lnet_shutdown_lndnets - remove some cleanup code.
      lnet: move lnet_shutdown_lndnets down to after first use
      lnet: add ni_state
      lnet: simplify lnet_islocalnet()
      lnet: discard ni_cpt_list
      lnet: add net_ni_added
      lnet: don't take reference in lnet_XX2ni_locked()
      lnet: don't need lock to test ln_shutdown.
      lnet: don't take lock over lnet_net_unique()
      lnet: swap 'then' and 'else' branches in lnet_startup_lndnet
      lnet: only valid lnd_type when net_id is unique.
      lnet: make it possible to add a new interface to a network
      lnet: add checks to ensure network interface names are unique.
      lnet: track tunables in lnet_startup_lndnet()
      lnet: fix typo
      lnet: lnet_dyn_add_ni: fix ping_info count
      lnet: lnet_dyn_del_ni: fix ping_info count
      lnet: introduce use_tcp_bonding mod param


 .../staging/lustre/include/linux/lnet/lib-lnet.h   |   31 -
 .../staging/lustre/include/linux/lnet/lib-types.h  |  142 ++-
 .../lustre/include/uapi/linux/lnet/lnet-dlc.h      |   18 
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |   10 
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h    |    6 
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |   12 
 .../lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c  |   74 +-
 .../staging/lustre/lnet/klnds/socklnd/socklnd.c    |   25 -
 drivers/staging/lustre/lnet/lnet/acceptor.c        |    8 
 drivers/staging/lustre/lnet/lnet/api-ni.c          |  939 +++++++++++++-------
 drivers/staging/lustre/lnet/lnet/config.c          |  688 +++++++++++----
 drivers/staging/lustre/lnet/lnet/lib-move.c        |  132 ++-
 drivers/staging/lustre/lnet/lnet/lib-ptl.c         |    6 
 drivers/staging/lustre/lnet/lnet/lo.c              |    2 
 drivers/staging/lustre/lnet/lnet/net_fault.c       |    3 
 drivers/staging/lustre/lnet/lnet/peer.c            |   31 -
 drivers/staging/lustre/lnet/lnet/router.c          |   51 +
 drivers/staging/lustre/lnet/lnet/router_proc.c     |   24 -
 drivers/staging/lustre/lnet/selftest/brw_test.c    |    2 
 drivers/staging/lustre/lnet/selftest/framework.c   |    3 
 drivers/staging/lustre/lnet/selftest/selftest.h    |    2 
 21 files changed, 1507 insertions(+), 702 deletions(-)



* [lustre-devel] [PATCH 01/34] struct lnet_ni - reformat comments.
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (13 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 10/34] lnet: add ni arg to lnet_cpt_of_nid() NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-10 22:49   ` Doug Oucharek
  2018-09-10 23:17   ` James Simmons
  2018-09-07  0:49 ` [lustre-devel] [PATCH 07/34] lnet: change lnet_peer to reference the net, rather than ni NeilBrown
                   ` (19 subsequent siblings)
  34 siblings, 2 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

This is part of

8cbb8cd3e771e7f7e0f99cafc19fad32770dc015 LU-7734 lnet: Multi-Rail
local NI split
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |   38 +++++++++++++++-----
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 6d4106fd9039..078bc97a9ebf 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -263,18 +263,38 @@ struct lnet_ni {
 	int			  ni_peerrtrcredits;
 	/* seconds to consider peer dead */
 	int			  ni_peertimeout;
-	int			  ni_ncpts;	/* number of CPTs */
-	__u32			 *ni_cpts;	/* bond NI on some CPTs */
-	lnet_nid_t		  ni_nid;	/* interface's NID */
-	void			 *ni_data;	/* instance-specific data */
+	/* number of CPTs */
+	int			ni_ncpts;
+
+	/* bond NI on some CPTs */
+	__u32			*ni_cpts;
+
+	/* interface's NID */
+	lnet_nid_t		ni_nid;
+
+	/* instance-specific data */
+	void			*ni_data;
+
 	struct lnet_lnd		 *ni_lnd;	/* procedural interface */
-	struct lnet_tx_queue	**ni_tx_queues;	/* percpt TX queues */
-	int			**ni_refs;	/* percpt reference count */
-	time64_t		  ni_last_alive;/* when I was last alive */
-	struct lnet_ni_status	 *ni_status;	/* my health status */
+
+	/* percpt TX queues */
+	struct lnet_tx_queue	**ni_tx_queues;
+
+	/* percpt reference count */
+	int			**ni_refs;
+
+	/* when I was last alive */
+	time64_t		ni_last_alive;
+
+	/* my health status */
+	struct lnet_ni_status	*ni_status;
+
 	/* per NI LND tunables */
 	struct lnet_ioctl_config_lnd_tunables *ni_lnd_tunables;
-	/* equivalent interfaces to use */
+	/*
+	 * equivalent interfaces to use
+	 * This is an array because socklnd bonding can still be configured
+	 */
 	char			 *ni_interfaces[LNET_MAX_INTERFACES];
 	/* original net namespace */
 	struct net		 *ni_net_ns;


* [lustre-devel] [PATCH 02/34] lnet: Create struct lnet_net
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (8 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 15/34] lnet: extend zombie handling to nets and nis NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-10 22:56   ` Doug Oucharek
  2018-09-10 23:23   ` James Simmons
  2018-09-07  0:49 ` [lustre-devel] [PATCH 04/34] lnet: embed lnd_tunables in lnet_ni NeilBrown
                   ` (24 subsequent siblings)
  34 siblings, 2 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

This will contain some fields from lnet_ni, to be shared
between multiple NIs on the same network.

For now, only tunables are moved across, using
 struct lnet_ioctl_config_lnd_cmn_tunables
which is changed to use signed values so that -1 can be stored;
-1 means "no value".
If the tunables haven't been initialised, then net_tunables_set is
false.  Previously a NULL pointer had this meaning.
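
The "-1 means no value" convention described above can be sketched in
plain userspace C.  This is only an illustration under stated
assumptions, not the kernel code: the struct and the helpers
(cmn_tunables_sketch, tunables_sketch_init, tunables_sketch_fill) are
hypothetical names, and only the field names loosely echo
struct lnet_ioctl_config_lnd_cmn_tunables.

```c
#include <assert.h>
#include <stdint.h>

/* Sentinel meaning "the user supplied no value" (assumed name). */
#define TUNABLE_UNSET (-1)

/* Hypothetical userspace stand-in for the common tunables. */
struct cmn_tunables_sketch {
	int32_t peer_timeout;
	int32_t peer_tx_credits;
	int32_t peer_rtr_credits;
	int32_t max_tx_credits;
};

/* Mark every tunable as "no value", as done at allocation time. */
static void tunables_sketch_init(struct cmn_tunables_sketch *t)
{
	assert(t);
	t->peer_timeout = TUNABLE_UNSET;
	t->peer_tx_credits = TUNABLE_UNSET;
	t->peer_rtr_credits = TUNABLE_UNSET;
	t->max_tx_credits = TUNABLE_UNSET;
}

/* Fill in defaults only for fields still holding the sentinel. */
static void tunables_sketch_fill(struct cmn_tunables_sketch *t,
				 const struct cmn_tunables_sketch *def)
{
	assert(t && def);
	if (t->peer_timeout == TUNABLE_UNSET)
		t->peer_timeout = def->peer_timeout;
	if (t->peer_tx_credits == TUNABLE_UNSET)
		t->peer_tx_credits = def->peer_tx_credits;
	if (t->peer_rtr_credits == TUNABLE_UNSET)
		t->peer_rtr_credits = def->peer_rtr_credits;
	if (t->max_tx_credits == TUNABLE_UNSET)
		t->max_tx_credits = def->max_tx_credits;
}
```

With a signed sentinel, an explicit 0 stored between the init and fill
calls survives, whereas a "zero means unset" test such as the
`if (!ni->ni_peertimeout)` checks removed below would overwrite it.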

A 'struct lnet_net' is allocated as part of lnet_ni_alloc(), and freed
by lnet_ni_free().
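
The paired lifetime described above (the net is allocated together with
the ni and released together with it) might look roughly like this in
userspace C.  This is a sketch under assumptions, with calloc()/free()
standing in for kzalloc()/kfree(); the type and function names
(net_sketch, ni_sketch, ni_sketch_alloc, ni_sketch_free) are
hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-network state. */
struct net_sketch {
	bool tunables_set;
};

/* Hypothetical stand-in for the per-interface state. */
struct ni_sketch {
	struct net_sketch *net;	/* pointer to parent network */
};

/* Allocate both objects; on any failure release whatever was obtained. */
static struct ni_sketch *ni_sketch_alloc(void)
{
	struct ni_sketch *ni = calloc(1, sizeof(*ni));
	struct net_sketch *net = calloc(1, sizeof(*net));

	if (!ni || !net) {
		free(ni);	/* free(NULL) is a no-op */
		free(net);
		return NULL;
	}

	ni->net = net;
	return ni;
}

/* Release the parent network first, then the interface itself. */
static void ni_sketch_free(struct ni_sketch *ni)
{
	if (!ni)
		return;
	free(ni->net);
	free(ni);
}
```

Because both allocations are attempted before the single failure check,
the error path needs only one branch, which matches the shape of the
lnet_ni_alloc() change in the diff below.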

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |   25 ++++++--
 .../lustre/include/uapi/linux/lnet/lnet-dlc.h      |    8 +--
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    2 -
 .../lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c  |   61 +++++++++++---------
 .../staging/lustre/lnet/klnds/socklnd/socklnd.c    |   19 ++++--
 drivers/staging/lustre/lnet/lnet/api-ni.c          |   45 +++++++++------
 drivers/staging/lustre/lnet/lnet/config.c          |   24 ++++++--
 drivers/staging/lustre/lnet/lnet/lib-move.c        |    5 +-
 drivers/staging/lustre/lnet/lnet/peer.c            |    9 ++-
 drivers/staging/lustre/lnet/lnet/router.c          |    8 ++-
 drivers/staging/lustre/lnet/lnet/router_proc.c     |    6 +-
 11 files changed, 129 insertions(+), 83 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 078bc97a9ebf..ead8a4e1125a 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -43,6 +43,7 @@
 
 #include <uapi/linux/lnet/lnet-types.h>
 #include <uapi/linux/lnet/lnetctl.h>
+#include <uapi/linux/lnet/lnet-dlc.h>
 
 /* Max payload size */
 #define LNET_MAX_PAYLOAD      CONFIG_LNET_MAX_PAYLOAD
@@ -252,17 +253,22 @@ struct lnet_tx_queue {
 	struct list_head	tq_delayed;	/* delayed TXs */
 };
 
+struct lnet_net {
+	/* network tunables */
+	struct lnet_ioctl_config_lnd_cmn_tunables net_tunables;
+
+	/*
+	 * boolean to indicate that the tunables have been set and
+	 * shouldn't be reset
+	 */
+	bool			  net_tunables_set;
+};
+
 struct lnet_ni {
 	spinlock_t		  ni_lock;
 	struct list_head	  ni_list;	/* chain on ln_nis */
 	struct list_head	  ni_cptlist;	/* chain on ln_nis_cpt */
-	int			  ni_maxtxcredits; /* # tx credits  */
-	/* # per-peer send credits */
-	int			  ni_peertxcredits;
-	/* # per-peer router buffer credits */
-	int			  ni_peerrtrcredits;
-	/* seconds to consider peer dead */
-	int			  ni_peertimeout;
+
 	/* number of CPTs */
 	int			ni_ncpts;
 
@@ -286,6 +292,9 @@ struct lnet_ni {
 	/* when I was last alive */
 	time64_t		ni_last_alive;
 
+	/* pointer to parent network */
+	struct lnet_net		*ni_net;
+
 	/* my health status */
 	struct lnet_ni_status	*ni_status;
 
@@ -397,7 +406,7 @@ struct lnet_peer_table {
  * lnet_ni::ni_peertimeout has been set to a positive value
  */
 #define lnet_peer_aliveness_enabled(lp) (the_lnet.ln_routing && \
-					 (lp)->lp_ni->ni_peertimeout > 0)
+					 (lp)->lp_ni->ni_net->net_tunables.lct_peer_timeout > 0)
 
 struct lnet_route {
 	struct list_head	 lr_list;	/* chain on net */
diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
index c1619f411d81..a8eb3b8f9fd7 100644
--- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
+++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
@@ -39,10 +39,10 @@
 
 struct lnet_ioctl_config_lnd_cmn_tunables {
 	__u32 lct_version;
-	__u32 lct_peer_timeout;
-	__u32 lct_peer_tx_credits;
-	__u32 lct_peer_rtr_credits;
-	__u32 lct_max_tx_credits;
+	__s32 lct_peer_timeout;
+	__s32 lct_peer_tx_credits;
+	__s32 lct_peer_rtr_credits;
+	__s32 lct_max_tx_credits;
 };
 
 struct lnet_ioctl_config_o2iblnd_tunables {
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
index f496e6fcc416..0d17e22c4401 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
@@ -337,7 +337,7 @@ int kiblnd_create_peer(struct lnet_ni *ni, struct kib_peer **peerp,
 	peer->ibp_error = 0;
 	peer->ibp_last_alive = 0;
 	peer->ibp_max_frags = kiblnd_cfg_rdma_frags(peer->ibp_ni);
-	peer->ibp_queue_depth = ni->ni_peertxcredits;
+	peer->ibp_queue_depth = ni->ni_net->net_tunables.lct_peer_tx_credits;
 	atomic_set(&peer->ibp_refcount, 1);  /* 1 ref for caller */
 
 	INIT_LIST_HEAD(&peer->ibp_list);     /* not in the peer table yet */
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
index 39d07926d603..a1aca4dda38f 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
@@ -171,7 +171,7 @@ int kiblnd_msg_queue_size(int version, struct lnet_ni *ni)
 	if (version == IBLND_MSG_VERSION_1)
 		return IBLND_MSG_QUEUE_SIZE_V1;
 	else if (ni)
-		return ni->ni_peertxcredits;
+		return ni->ni_net->net_tunables.lct_peer_tx_credits;
 	else
 		return peer_credits;
 }
@@ -179,6 +179,7 @@ int kiblnd_msg_queue_size(int version, struct lnet_ni *ni)
 int kiblnd_tunables_setup(struct lnet_ni *ni)
 {
 	struct lnet_ioctl_config_o2iblnd_tunables *tunables;
+	struct lnet_ioctl_config_lnd_cmn_tunables *net_tunables;
 
 	/*
 	 * if there was no tunables specified, setup the tunables to be
@@ -204,35 +205,39 @@ int kiblnd_tunables_setup(struct lnet_ni *ni)
 		return -EINVAL;
 	}
 
-	if (!ni->ni_peertimeout)
-		ni->ni_peertimeout = peer_timeout;
+	net_tunables = &ni->ni_net->net_tunables;
 
-	if (!ni->ni_maxtxcredits)
-		ni->ni_maxtxcredits = credits;
+	if (net_tunables->lct_peer_timeout == -1)
+		net_tunables->lct_peer_timeout = peer_timeout;
 
-	if (!ni->ni_peertxcredits)
-		ni->ni_peertxcredits = peer_credits;
+	if (net_tunables->lct_max_tx_credits == -1)
+		net_tunables->lct_max_tx_credits = credits;
 
-	if (!ni->ni_peerrtrcredits)
-		ni->ni_peerrtrcredits = peer_buffer_credits;
+	if (net_tunables->lct_peer_tx_credits == -1)
+		net_tunables->lct_peer_tx_credits = peer_credits;
 
-	if (ni->ni_peertxcredits < IBLND_CREDITS_DEFAULT)
-		ni->ni_peertxcredits = IBLND_CREDITS_DEFAULT;
+	if (net_tunables->lct_peer_rtr_credits == -1)
+		net_tunables->lct_peer_rtr_credits = peer_buffer_credits;
 
-	if (ni->ni_peertxcredits > IBLND_CREDITS_MAX)
-		ni->ni_peertxcredits = IBLND_CREDITS_MAX;
+	if (net_tunables->lct_peer_tx_credits < IBLND_CREDITS_DEFAULT)
+		net_tunables->lct_peer_tx_credits = IBLND_CREDITS_DEFAULT;
 
-	if (ni->ni_peertxcredits > credits)
-		ni->ni_peertxcredits = credits;
+	if (net_tunables->lct_peer_tx_credits > IBLND_CREDITS_MAX)
+		net_tunables->lct_peer_tx_credits = IBLND_CREDITS_MAX;
+
+	if (net_tunables->lct_peer_tx_credits >
+	    net_tunables->lct_max_tx_credits)
+		net_tunables->lct_peer_tx_credits =
+			net_tunables->lct_max_tx_credits;
 
 	if (!tunables->lnd_peercredits_hiw)
 		tunables->lnd_peercredits_hiw = peer_credits_hiw;
 
-	if (tunables->lnd_peercredits_hiw < ni->ni_peertxcredits / 2)
-		tunables->lnd_peercredits_hiw = ni->ni_peertxcredits / 2;
+	if (tunables->lnd_peercredits_hiw < net_tunables->lct_peer_tx_credits / 2)
+		tunables->lnd_peercredits_hiw = net_tunables->lct_peer_tx_credits / 2;
 
-	if (tunables->lnd_peercredits_hiw >= ni->ni_peertxcredits)
-		tunables->lnd_peercredits_hiw = ni->ni_peertxcredits - 1;
+	if (tunables->lnd_peercredits_hiw >= net_tunables->lct_peer_tx_credits)
+		tunables->lnd_peercredits_hiw = net_tunables->lct_peer_tx_credits - 1;
 
 	if (tunables->lnd_map_on_demand <= 0 ||
 	    tunables->lnd_map_on_demand > IBLND_MAX_RDMA_FRAGS) {
@@ -252,21 +257,23 @@ int kiblnd_tunables_setup(struct lnet_ni *ni)
 		if (tunables->lnd_map_on_demand > 0 &&
 		    tunables->lnd_map_on_demand <= IBLND_MAX_RDMA_FRAGS / 8) {
 			tunables->lnd_concurrent_sends =
-						ni->ni_peertxcredits * 2;
+					net_tunables->lct_peer_tx_credits * 2;
 		} else {
-			tunables->lnd_concurrent_sends = ni->ni_peertxcredits;
+			tunables->lnd_concurrent_sends =
+				net_tunables->lct_peer_tx_credits;
 		}
 	}
 
-	if (tunables->lnd_concurrent_sends > ni->ni_peertxcredits * 2)
-		tunables->lnd_concurrent_sends = ni->ni_peertxcredits * 2;
+	if (tunables->lnd_concurrent_sends > net_tunables->lct_peer_tx_credits * 2)
+		tunables->lnd_concurrent_sends = net_tunables->lct_peer_tx_credits * 2;
 
-	if (tunables->lnd_concurrent_sends < ni->ni_peertxcredits / 2)
-		tunables->lnd_concurrent_sends = ni->ni_peertxcredits / 2;
+	if (tunables->lnd_concurrent_sends < net_tunables->lct_peer_tx_credits / 2)
+		tunables->lnd_concurrent_sends = net_tunables->lct_peer_tx_credits / 2;
 
-	if (tunables->lnd_concurrent_sends < ni->ni_peertxcredits) {
+	if (tunables->lnd_concurrent_sends < net_tunables->lct_peer_tx_credits) {
 		CWARN("Concurrent sends %d is lower than message queue size: %d, performance may drop slightly.\n",
-		      tunables->lnd_concurrent_sends, ni->ni_peertxcredits);
+		      tunables->lnd_concurrent_sends,
+		      net_tunables->lct_peer_tx_credits);
 	}
 
 	if (!tunables->lnd_fmr_pool_size)
diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
index 4dde158451ea..4ad885f10235 100644
--- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
+++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
@@ -2739,12 +2739,19 @@ ksocknal_startup(struct lnet_ni *ni)
 		goto fail_0;
 
 	spin_lock_init(&net->ksnn_lock);
-	net->ksnn_incarnation = ktime_get_real_ns();
-	ni->ni_data = net;
-	ni->ni_peertimeout    = *ksocknal_tunables.ksnd_peertimeout;
-	ni->ni_maxtxcredits   = *ksocknal_tunables.ksnd_credits;
-	ni->ni_peertxcredits  = *ksocknal_tunables.ksnd_peertxcredits;
-	ni->ni_peerrtrcredits = *ksocknal_tunables.ksnd_peerrtrcredits;
+	net->ksnn_incarnation = ktime_get_real_ns();
+	ni->ni_data = net;
+	if (!ni->ni_net->net_tunables_set) {
+		ni->ni_net->net_tunables.lct_peer_timeout =
+			*ksocknal_tunables.ksnd_peertimeout;
+		ni->ni_net->net_tunables.lct_max_tx_credits =
+			*ksocknal_tunables.ksnd_credits;
+		ni->ni_net->net_tunables.lct_peer_tx_credits =
+			*ksocknal_tunables.ksnd_peertxcredits;
+		ni->ni_net->net_tunables.lct_peer_rtr_credits =
+			*ksocknal_tunables.ksnd_peerrtrcredits;
+		ni->ni_net->net_tunables_set = true;
+	}
 
 	net->ksnn_ninterfaces = 0;
 	if (!ni->ni_interfaces[0]) {
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index f9fcce2a5643..cd4189fa7acb 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1036,11 +1036,11 @@ lnet_ni_tq_credits(struct lnet_ni *ni)
 	LASSERT(ni->ni_ncpts >= 1);
 
 	if (ni->ni_ncpts == 1)
-		return ni->ni_maxtxcredits;
+		return ni->ni_net->net_tunables.lct_max_tx_credits;
 
-	credits = ni->ni_maxtxcredits / ni->ni_ncpts;
-	credits = max(credits, 8 * ni->ni_peertxcredits);
-	credits = min(credits, ni->ni_maxtxcredits);
+	credits = ni->ni_net->net_tunables.lct_max_tx_credits / ni->ni_ncpts;
+	credits = max(credits, 8 * ni->ni_net->net_tunables.lct_peer_tx_credits);
+	credits = min(credits, ni->ni_net->net_tunables.lct_max_tx_credits);
 
 	return credits;
 }
@@ -1271,16 +1271,16 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
 	 */
 	if (conf) {
 		if (conf->cfg_config_u.cfg_net.net_peer_rtr_credits >= 0)
-			ni->ni_peerrtrcredits =
+			ni->ni_net->net_tunables.lct_peer_rtr_credits =
 				conf->cfg_config_u.cfg_net.net_peer_rtr_credits;
 		if (conf->cfg_config_u.cfg_net.net_peer_timeout >= 0)
-			ni->ni_peertimeout =
+			ni->ni_net->net_tunables.lct_peer_timeout =
 				conf->cfg_config_u.cfg_net.net_peer_timeout;
 		if (conf->cfg_config_u.cfg_net.net_peer_tx_credits != -1)
-			ni->ni_peertxcredits =
+			ni->ni_net->net_tunables.lct_peer_tx_credits =
 				conf->cfg_config_u.cfg_net.net_peer_tx_credits;
 		if (conf->cfg_config_u.cfg_net.net_max_tx_credits >= 0)
-			ni->ni_maxtxcredits =
+			ni->ni_net->net_tunables.lct_max_tx_credits =
 				conf->cfg_config_u.cfg_net.net_max_tx_credits;
 	}
 
@@ -1297,8 +1297,6 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
 		goto failed0;
 	}
 
-	LASSERT(ni->ni_peertimeout <= 0 || lnd->lnd_query);
-
 	lnet_net_lock(LNET_LOCK_EX);
 	/* refcount for ln_nis */
 	lnet_ni_addref_locked(ni, 0);
@@ -1314,13 +1312,18 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
 		lnet_ni_addref(ni);
 		LASSERT(!the_lnet.ln_loni);
 		the_lnet.ln_loni = ni;
+		ni->ni_net->net_tunables.lct_peer_tx_credits = 0;
+		ni->ni_net->net_tunables.lct_peer_rtr_credits = 0;
+		ni->ni_net->net_tunables.lct_max_tx_credits = 0;
+		ni->ni_net->net_tunables.lct_peer_timeout = 0;
 		return 0;
 	}
 
-	if (!ni->ni_peertxcredits || !ni->ni_maxtxcredits) {
+	if (!ni->ni_net->net_tunables.lct_peer_tx_credits ||
+	    !ni->ni_net->net_tunables.lct_max_tx_credits) {
 		LCONSOLE_ERROR_MSG(0x107, "LNI %s has no %scredits\n",
 				   libcfs_lnd2str(lnd->lnd_type),
-				   !ni->ni_peertxcredits ?
+				   !ni->ni_net->net_tunables.lct_peer_tx_credits ?
 				   "" : "per-peer ");
 		/*
 		 * shutdown the NI since if we get here then it must've already
@@ -1343,9 +1346,11 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
 	add_device_randomness(&seed, sizeof(seed));
 
 	CDEBUG(D_LNI, "Added LNI %s [%d/%d/%d/%d]\n",
-	       libcfs_nid2str(ni->ni_nid), ni->ni_peertxcredits,
+	       libcfs_nid2str(ni->ni_nid),
+		ni->ni_net->net_tunables.lct_peer_tx_credits,
 	       lnet_ni_tq_credits(ni) * LNET_CPT_NUMBER,
-	       ni->ni_peerrtrcredits, ni->ni_peertimeout);
+	       ni->ni_net->net_tunables.lct_peer_rtr_credits,
+		ni->ni_net->net_tunables.lct_peer_timeout);
 
 	return 0;
 failed0:
@@ -1667,10 +1672,14 @@ lnet_fill_ni_info(struct lnet_ni *ni, struct lnet_ioctl_config_data *config)
 	}
 
 	config->cfg_nid = ni->ni_nid;
-	config->cfg_config_u.cfg_net.net_peer_timeout = ni->ni_peertimeout;
-	config->cfg_config_u.cfg_net.net_max_tx_credits = ni->ni_maxtxcredits;
-	config->cfg_config_u.cfg_net.net_peer_tx_credits = ni->ni_peertxcredits;
-	config->cfg_config_u.cfg_net.net_peer_rtr_credits = ni->ni_peerrtrcredits;
+	config->cfg_config_u.cfg_net.net_peer_timeout =
+		ni->ni_net->net_tunables.lct_peer_timeout;
+	config->cfg_config_u.cfg_net.net_max_tx_credits =
+		ni->ni_net->net_tunables.lct_max_tx_credits;
+	config->cfg_config_u.cfg_net.net_peer_tx_credits =
+		ni->ni_net->net_tunables.lct_peer_tx_credits;
+	config->cfg_config_u.cfg_net.net_peer_rtr_credits =
+		ni->ni_net->net_tunables.lct_peer_rtr_credits;
 
 	net_config->ni_status = ni->ni_status->ns_status;
 
diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index 091c4f714e84..86a53854e427 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -114,29 +114,38 @@ lnet_ni_free(struct lnet_ni *ni)
 	if (ni->ni_net_ns)
 		put_net(ni->ni_net_ns);
 
+	kvfree(ni->ni_net);
 	kfree(ni);
 }
 
 struct lnet_ni *
-lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist)
+lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
 {
 	struct lnet_tx_queue *tq;
 	struct lnet_ni *ni;
 	int rc;
 	int i;
+	struct lnet_net		*net;
 
-	if (!lnet_net_unique(net, nilist)) {
+	if (!lnet_net_unique(net_id, nilist)) {
 		LCONSOLE_ERROR_MSG(0x111, "Duplicate network specified: %s\n",
-				   libcfs_net2str(net));
+				   libcfs_net2str(net_id));
 		return NULL;
 	}
 
 	ni = kzalloc(sizeof(*ni), GFP_NOFS);
-	if (!ni) {
+	net = kzalloc(sizeof(*net), GFP_NOFS);
+	if (!ni || !net) {
+		kfree(ni); kfree(net);
 		CERROR("Out of memory creating network %s\n",
-		       libcfs_net2str(net));
+		       libcfs_net2str(net_id));
 		return NULL;
 	}
+	/* initialize global parameters to undefined */
+	net->net_tunables.lct_peer_timeout = -1;
+	net->net_tunables.lct_max_tx_credits = -1;
+	net->net_tunables.lct_peer_tx_credits = -1;
+	net->net_tunables.lct_peer_rtr_credits = -1;
 
 	spin_lock_init(&ni->ni_lock);
 	INIT_LIST_HEAD(&ni->ni_cptlist);
@@ -160,7 +169,7 @@ lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist)
 		rc = cfs_expr_list_values(el, LNET_CPT_NUMBER, &ni->ni_cpts);
 		if (rc <= 0) {
 			CERROR("Failed to set CPTs for NI %s: %d\n",
-			       libcfs_net2str(net), rc);
+			       libcfs_net2str(net_id), rc);
 			goto failed;
 		}
 
@@ -173,8 +182,9 @@ lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist)
 		ni->ni_ncpts = rc;
 	}
 
+	ni->ni_net = net;
 	/* LND will fill in the address part of the NID */
-	ni->ni_nid = LNET_MKNID(net, 0);
+	ni->ni_nid = LNET_MKNID(net_id, 0);
 
 	/* Store net namespace in which current ni is being created */
 	if (current->nsproxy->net_ns)
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index edcafac055ed..f186e6a16d34 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -524,7 +524,8 @@ lnet_peer_is_alive(struct lnet_peer *lp, unsigned long now)
 	    lp->lp_timestamp >= lp->lp_last_alive)
 		return 0;
 
-	deadline = lp->lp_last_alive + lp->lp_ni->ni_peertimeout;
+	deadline = lp->lp_last_alive +
+		lp->lp_ni->ni_net->net_tunables.lct_peer_timeout;
 	alive = deadline > now;
 
 	/* Update obsolete lp_alive except for routers assumed to be dead
@@ -569,7 +570,7 @@ lnet_peer_alive_locked(struct lnet_peer *lp)
 				      libcfs_nid2str(lp->lp_nid),
 				      now, next_query,
 				      lnet_queryinterval,
-				      lp->lp_ni->ni_peertimeout);
+				      lp->lp_ni->ni_net->net_tunables.lct_peer_timeout);
 			return 0;
 		}
 	}
diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c
index d9452c322e4d..b76ac3e051d9 100644
--- a/drivers/staging/lustre/lnet/lnet/peer.c
+++ b/drivers/staging/lustre/lnet/lnet/peer.c
@@ -342,8 +342,8 @@ lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt)
 		goto out;
 	}
 
-	lp->lp_txcredits = lp->lp_ni->ni_peertxcredits;
-	lp->lp_mintxcredits = lp->lp_ni->ni_peertxcredits;
+	lp->lp_txcredits = lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
+	lp->lp_mintxcredits = lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
 	lp->lp_rtrcredits = lnet_peer_buffer_credits(lp->lp_ni);
 	lp->lp_minrtrcredits = lnet_peer_buffer_credits(lp->lp_ni);
 
@@ -383,7 +383,7 @@ lnet_debug_peer(lnet_nid_t nid)
 
 	CDEBUG(D_WARNING, "%-24s %4d %5s %5d %5d %5d %5d %5d %ld\n",
 	       libcfs_nid2str(lp->lp_nid), lp->lp_refcount,
-	       aliveness, lp->lp_ni->ni_peertxcredits,
+	       aliveness, lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits,
 	       lp->lp_rtrcredits, lp->lp_minrtrcredits,
 	       lp->lp_txcredits, lp->lp_mintxcredits, lp->lp_txqnob);
 
@@ -438,7 +438,8 @@ lnet_get_peer_info(__u32 peer_index, __u64 *nid,
 
 			*nid = lp->lp_nid;
 			*refcount = lp->lp_refcount;
-			*ni_peer_tx_credits = lp->lp_ni->ni_peertxcredits;
+			*ni_peer_tx_credits =
+				lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
 			*peer_tx_credits = lp->lp_txcredits;
 			*peer_rtr_credits = lp->lp_rtrcredits;
 			*peer_min_rtr_credits = lp->lp_mintxcredits;
diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index 02241fbc9eaa..7d61c5d71426 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -57,9 +57,11 @@ MODULE_PARM_DESC(auto_down, "Automatically mark peers down on comms error");
 int
 lnet_peer_buffer_credits(struct lnet_ni *ni)
 {
+	struct lnet_net *net = ni->ni_net;
+
 	/* NI option overrides LNet default */
-	if (ni->ni_peerrtrcredits > 0)
-		return ni->ni_peerrtrcredits;
+	if (net->net_tunables.lct_peer_rtr_credits > 0)
+		return net->net_tunables.lct_peer_rtr_credits;
 	if (peer_buffer_credits > 0)
 		return peer_buffer_credits;
 
@@ -67,7 +69,7 @@ lnet_peer_buffer_credits(struct lnet_ni *ni)
 	 * As an approximation, allow this peer the same number of router
 	 * buffers as it is allowed outstanding sends
 	 */
-	return ni->ni_peertxcredits;
+	return net->net_tunables.lct_peer_tx_credits;
 }
 
 /* forward ref's */
diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
index 31f4982f7f17..19cea7076057 100644
--- a/drivers/staging/lustre/lnet/lnet/router_proc.c
+++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
@@ -489,7 +489,7 @@ static int proc_lnet_peers(struct ctl_table *table, int write,
 			int nrefs = peer->lp_refcount;
 			time64_t lastalive = -1;
 			char *aliveness = "NA";
-			int maxcr = peer->lp_ni->ni_peertxcredits;
+			int maxcr = peer->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
 			int txcr = peer->lp_txcredits;
 			int mintxcr = peer->lp_mintxcredits;
 			int rtrcr = peer->lp_rtrcredits;
@@ -704,8 +704,8 @@ static int proc_lnet_nis(struct ctl_table *table, int write,
 					      "%-24s %6s %5lld %4d %4d %4d %5d %5d %5d\n",
 					      libcfs_nid2str(ni->ni_nid), stat,
 					      last_alive, *ni->ni_refs[i],
-					      ni->ni_peertxcredits,
-					      ni->ni_peerrtrcredits,
+					      ni->ni_net->net_tunables.lct_peer_tx_credits,
+					      ni->ni_net->net_tunables.lct_peer_rtr_credits,
 					      tq->tq_credits_max,
 					      tq->tq_credits,
 					      tq->tq_credits_min);


* [lustre-devel] [PATCH 03/34] lnet: struct lnet_ni: move ni_lnd to lnet_net
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (6 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 06/34] lnet: store separate xmit/recv net-interface in each message NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-10 23:04   ` Doug Oucharek
                     ` (2 more replies)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 15/34] lnet: extend zombie handling to nets and nis NeilBrown
                   ` (26 subsequent siblings)
  34 siblings, 3 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

Also make some other minor changes to the structures.

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |   13 ++++++++-----
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    2 +-
 .../staging/lustre/lnet/klnds/socklnd/socklnd.c    |    2 +-
 drivers/staging/lustre/lnet/lnet/acceptor.c        |    4 ++--
 drivers/staging/lustre/lnet/lnet/api-ni.c          |   16 ++++++++--------
 drivers/staging/lustre/lnet/lnet/lib-move.c        |   16 ++++++++--------
 drivers/staging/lustre/lnet/lnet/lo.c              |    2 +-
 drivers/staging/lustre/lnet/lnet/router.c          |   10 +++++-----
 drivers/staging/lustre/lnet/lnet/router_proc.c     |    2 +-
 9 files changed, 35 insertions(+), 32 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index ead8a4e1125a..e170eb07a5bf 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -262,12 +262,17 @@ struct lnet_net {
 	 * shouldn't be reset
 	 */
 	bool			  net_tunables_set;
+	/* procedural interface */
+	struct lnet_lnd		*net_lnd;
 };
 
 struct lnet_ni {
-	spinlock_t		  ni_lock;
-	struct list_head	  ni_list;	/* chain on ln_nis */
-	struct list_head	  ni_cptlist;	/* chain on ln_nis_cpt */
+	/* chain on ln_nis */
+	struct list_head	  ni_list;
+	/* chain on ln_nis_cpt */
+	struct list_head	ni_cptlist;
+
+	spinlock_t		ni_lock;
 
 	/* number of CPTs */
 	int			ni_ncpts;
@@ -281,8 +286,6 @@ struct lnet_ni {
 	/* instance-specific data */
 	void			*ni_data;
 
-	struct lnet_lnd		 *ni_lnd;	/* procedural interface */
-
 	/* percpt TX queues */
 	struct lnet_tx_queue	**ni_tx_queues;
 
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
index 0d17e22c4401..5e1592b398c1 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
@@ -2830,7 +2830,7 @@ static int kiblnd_startup(struct lnet_ni *ni)
 	int rc;
 	int newdev;
 
-	LASSERT(ni->ni_lnd == &the_o2iblnd);
+	LASSERT(ni->ni_net->net_lnd == &the_o2iblnd);
 
 	if (kiblnd_data.kib_init == IBLND_INIT_NOTHING) {
 		rc = kiblnd_base_startup();
diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
index 4ad885f10235..2036a0ae5917 100644
--- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
+++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
@@ -2726,7 +2726,7 @@ ksocknal_startup(struct lnet_ni *ni)
 	int rc;
 	int i;
 
-	LASSERT(ni->ni_lnd == &the_ksocklnd);
+	LASSERT(ni->ni_net->net_lnd == &the_ksocklnd);
 
 	if (ksocknal_data.ksnd_init == SOCKNAL_INIT_NOTHING) {
 		rc = ksocknal_base_startup();
diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
index 3ae3ca1311a1..f8c921f0221c 100644
--- a/drivers/staging/lustre/lnet/lnet/acceptor.c
+++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
@@ -306,7 +306,7 @@ lnet_accept(struct socket *sock, __u32 magic)
 		return -EPERM;
 	}
 
-	if (!ni->ni_lnd->lnd_accept) {
+	if (!ni->ni_net->net_lnd->lnd_accept) {
 		/* This catches a request for the loopback LND */
 		lnet_ni_decref(ni);
 		LCONSOLE_ERROR_MSG(0x121, "Refusing connection from %pI4h for %s: NI doesn not accept IP connections\n",
@@ -317,7 +317,7 @@ lnet_accept(struct socket *sock, __u32 magic)
 	CDEBUG(D_NET, "Accept %s from %pI4h\n",
 	       libcfs_nid2str(cr.acr_nid), &peer_ip);
 
-	rc = ni->ni_lnd->lnd_accept(ni, sock);
+	rc = ni->ni_net->net_lnd->lnd_accept(ni, sock);
 
 	lnet_ni_decref(ni);
 	return rc;
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index cd4189fa7acb..0896e75bc3d7 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -799,7 +799,7 @@ lnet_count_acceptor_nis(void)
 
 	cpt = lnet_net_lock_current();
 	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
-		if (ni->ni_lnd->lnd_accept)
+		if (ni->ni_net->net_lnd->lnd_accept)
 			count++;
 	}
 
@@ -1098,13 +1098,13 @@ lnet_clear_zombies_nis_locked(void)
 			continue;
 		}
 
-		ni->ni_lnd->lnd_refcount--;
+		ni->ni_net->net_lnd->lnd_refcount--;
 		lnet_net_unlock(LNET_LOCK_EX);
 
-		islo = ni->ni_lnd->lnd_type == LOLND;
+		islo = ni->ni_net->net_lnd->lnd_type == LOLND;
 
 		LASSERT(!in_interrupt());
-		ni->ni_lnd->lnd_shutdown(ni);
+		ni->ni_net->net_lnd->lnd_shutdown(ni);
 
 		/*
 		 * can't deref lnd anymore now; it might have unregistered
@@ -1248,7 +1248,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
 	lnd->lnd_refcount++;
 	lnet_net_unlock(LNET_LOCK_EX);
 
-	ni->ni_lnd = lnd;
+	ni->ni_net->net_lnd = lnd;
 
 	if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf))
 		lnd_tunables = (struct lnet_ioctl_config_lnd_tunables *)conf->cfg_bulk;
@@ -1794,7 +1794,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
 	if (rc)
 		goto failed1;
 
-	if (ni->ni_lnd->lnd_accept) {
+	if (ni->ni_net->net_lnd->lnd_accept) {
 		rc = lnet_acceptor_start();
 		if (rc < 0) {
 			/* shutdown the ni that we just started */
@@ -2074,10 +2074,10 @@ LNetCtl(unsigned int cmd, void *arg)
 		if (!ni)
 			return -EINVAL;
 
-		if (!ni->ni_lnd->lnd_ctl)
+		if (!ni->ni_net->net_lnd->lnd_ctl)
 			rc = -EINVAL;
 		else
-			rc = ni->ni_lnd->lnd_ctl(ni, cmd, arg);
+			rc = ni->ni_net->net_lnd->lnd_ctl(ni, cmd, arg);
 
 		lnet_ni_decref(ni);
 		return rc;
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index f186e6a16d34..1bf12af87a20 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -406,7 +406,7 @@ lnet_ni_recv(struct lnet_ni *ni, void *private, struct lnet_msg *msg,
 		iov_iter_bvec(&to, ITER_BVEC | READ, kiov, niov, mlen + offset);
 		iov_iter_advance(&to, offset);
 	}
-	rc = ni->ni_lnd->lnd_recv(ni, private, msg, delayed, &to, rlen);
+	rc = ni->ni_net->net_lnd->lnd_recv(ni, private, msg, delayed, &to, rlen);
 	if (rc < 0)
 		lnet_finalize(ni, msg, rc);
 }
@@ -461,7 +461,7 @@ lnet_ni_send(struct lnet_ni *ni, struct lnet_msg *msg)
 	LASSERT(LNET_NETTYP(LNET_NIDNET(ni->ni_nid)) == LOLND ||
 		(msg->msg_txcredit && msg->msg_peertxcredit));
 
-	rc = ni->ni_lnd->lnd_send(ni, priv, msg);
+	rc = ni->ni_net->net_lnd->lnd_send(ni, priv, msg);
 	if (rc < 0)
 		lnet_finalize(ni, msg, rc);
 }
@@ -474,10 +474,10 @@ lnet_ni_eager_recv(struct lnet_ni *ni, struct lnet_msg *msg)
 	LASSERT(!msg->msg_sending);
 	LASSERT(msg->msg_receiving);
 	LASSERT(!msg->msg_rx_ready_delay);
-	LASSERT(ni->ni_lnd->lnd_eager_recv);
+	LASSERT(ni->ni_net->net_lnd->lnd_eager_recv);
 
 	msg->msg_rx_ready_delay = 1;
-	rc = ni->ni_lnd->lnd_eager_recv(ni, msg->msg_private, msg,
+	rc = ni->ni_net->net_lnd->lnd_eager_recv(ni, msg->msg_private, msg,
 					&msg->msg_private);
 	if (rc) {
 		CERROR("recv from %s / send to %s aborted: eager_recv failed %d\n",
@@ -496,10 +496,10 @@ lnet_ni_query_locked(struct lnet_ni *ni, struct lnet_peer *lp)
 	time64_t last_alive = 0;
 
 	LASSERT(lnet_peer_aliveness_enabled(lp));
-	LASSERT(ni->ni_lnd->lnd_query);
+	LASSERT(ni->ni_net->net_lnd->lnd_query);
 
 	lnet_net_unlock(lp->lp_cpt);
-	ni->ni_lnd->lnd_query(ni, lp->lp_nid, &last_alive);
+	ni->ni_net->net_lnd->lnd_query(ni, lp->lp_nid, &last_alive);
 	lnet_net_lock(lp->lp_cpt);
 
 	lp->lp_last_query = ktime_get_seconds();
@@ -1287,7 +1287,7 @@ lnet_parse_put(struct lnet_ni *ni, struct lnet_msg *msg)
 	info.mi_roffset	= hdr->msg.put.offset;
 	info.mi_mbits	= hdr->msg.put.match_bits;
 
-	msg->msg_rx_ready_delay = !ni->ni_lnd->lnd_eager_recv;
+	msg->msg_rx_ready_delay = !ni->ni_net->net_lnd->lnd_eager_recv;
 	ready_delay = msg->msg_rx_ready_delay;
 
  again:
@@ -1518,7 +1518,7 @@ lnet_parse_forward_locked(struct lnet_ni *ni, struct lnet_msg *msg)
 
 	if (msg->msg_rxpeer->lp_rtrcredits <= 0 ||
 	    lnet_msg2bufpool(msg)->rbp_credits <= 0) {
-		if (!ni->ni_lnd->lnd_eager_recv) {
+		if (!ni->ni_net->net_lnd->lnd_eager_recv) {
 			msg->msg_rx_ready_delay = 1;
 		} else {
 			lnet_net_unlock(msg->msg_rx_cpt);
diff --git a/drivers/staging/lustre/lnet/lnet/lo.c b/drivers/staging/lustre/lnet/lnet/lo.c
index eb14146bd879..8167980c2323 100644
--- a/drivers/staging/lustre/lnet/lnet/lo.c
+++ b/drivers/staging/lustre/lnet/lnet/lo.c
@@ -83,7 +83,7 @@ lolnd_shutdown(struct lnet_ni *ni)
 static int
 lolnd_startup(struct lnet_ni *ni)
 {
-	LASSERT(ni->ni_lnd == &the_lolnd);
+	LASSERT(ni->ni_net->net_lnd == &the_lolnd);
 	LASSERT(!lolnd_instanced);
 	lolnd_instanced = 1;
 
diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index 7d61c5d71426..0c0ec0b27982 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -154,14 +154,14 @@ lnet_ni_notify_locked(struct lnet_ni *ni, struct lnet_peer *lp)
 		lp->lp_notifylnd = 0;
 		lp->lp_notify    = 0;
 
-		if (notifylnd && ni->ni_lnd->lnd_notify) {
+		if (notifylnd && ni->ni_net->net_lnd->lnd_notify) {
 			lnet_net_unlock(lp->lp_cpt);
 
 			/*
 			 * A new notification could happen now; I'll handle it
 			 * when control returns to me
 			 */
-			ni->ni_lnd->lnd_notify(ni, lp->lp_nid, alive);
+			ni->ni_net->net_lnd->lnd_notify(ni, lp->lp_nid, alive);
 
 			lnet_net_lock(lp->lp_cpt);
 		}
@@ -380,8 +380,8 @@ lnet_add_route(__u32 net, __u32 hops, lnet_nid_t gateway,
 		lnet_net_unlock(LNET_LOCK_EX);
 
 		/* XXX Assume alive */
-		if (ni->ni_lnd->lnd_notify)
-			ni->ni_lnd->lnd_notify(ni, gateway, 1);
+		if (ni->ni_net->net_lnd->lnd_notify)
+			ni->ni_net->net_lnd->lnd_notify(ni, gateway, 1);
 
 		lnet_net_lock(LNET_LOCK_EX);
 	}
@@ -818,7 +818,7 @@ lnet_update_ni_status_locked(void)
 
 	now = ktime_get_real_seconds();
 	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
-		if (ni->ni_lnd->lnd_type == LOLND)
+		if (ni->ni_net->net_lnd->lnd_type == LOLND)
 			continue;
 
 		if (now < ni->ni_last_alive + timeout)
diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
index 19cea7076057..f3ccd6a2b70e 100644
--- a/drivers/staging/lustre/lnet/lnet/router_proc.c
+++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
@@ -674,7 +674,7 @@ static int proc_lnet_nis(struct ctl_table *table, int write,
 				last_alive = now - ni->ni_last_alive;
 
 			/* @lo forever alive */
-			if (ni->ni_lnd->lnd_type == LOLND)
+			if (ni->ni_net->net_lnd->lnd_type == LOLND)
 				last_alive = 0;
 
 			lnet_ni_lock(ni);


* [lustre-devel] [PATCH 04/34] lnet: embed lnd_tunables in lnet_ni
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (9 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 02/34] lnet: Create struct lnet_net NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-10 23:08   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 05/34] lnet: begin separating "networks" from "network interfaces" NeilBrown
                   ` (23 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

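Embed the LND tunables struct in lnet_ni instead of holding a pointer
to a separately allocated copy.  A new ni_lnd_tunables_set flag records
whether the tunables were set explicitly, and the kzalloc()/kfree()
of the old copy goes away.
Also other related changes.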
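
A minimal user-space sketch of the embed-plus-flag pattern follows.  It
is an illustration under assumptions, not the patch itself: the struct
and function names (struct ni, ni_set_tunables(), ni_tunables_setup())
and the default values are made up for the example.  With the struct
embedded, "was anything configured?" can no longer be answered by a
NULL check, so a separate boolean carries that information.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative tunables; field names and defaults are assumptions. */
struct o2ib_tunables {
	int map_on_demand;
	int concurrent_sends;
};

struct lnd_tunables {
	union {
		struct o2ib_tunables lnd_o2ib;
	} lnd_tun_u;
};

struct ni {
	/* embedded by value: no allocation, no free, no -ENOMEM path */
	struct lnd_tunables ni_lnd_tunables;
	/* replaces the old "pointer is non-NULL" test */
	bool ni_lnd_tunables_set;
};

static const struct o2ib_tunables default_tunables = {
	.map_on_demand = 0,
	.concurrent_sends = 8,
};

/* Copy caller-supplied tunables, if any, and remember that we did. */
static void ni_set_tunables(struct ni *ni, const struct lnd_tunables *tun)
{
	assert(ni);
	if (!tun)
		return;
	memcpy(&ni->ni_lnd_tunables, tun, sizeof(*tun));
	ni->ni_lnd_tunables_set = true;
}

/* Fall back to the defaults only when nothing was set explicitly. */
static struct o2ib_tunables *ni_tunables_setup(struct ni *ni)
{
	assert(ni);
	if (!ni->ni_lnd_tunables_set)
		memcpy(&ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib,
		       &default_tunables, sizeof(default_tunables));
	return &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;
}
```

The trade-off is a slightly larger struct ni in exchange for dropping
an allocation and its error handling.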

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |    6 ++++
 .../lustre/include/uapi/linux/lnet/lnet-dlc.h      |   10 +++++--
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    2 +
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h    |    6 ++--
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |    8 +++---
 .../lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c  |   13 +++-------
 drivers/staging/lustre/lnet/lnet/api-ni.c          |   27 +++++++++-----------
 drivers/staging/lustre/lnet/lnet/config.c          |    2 -
 8 files changed, 36 insertions(+), 38 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index e170eb07a5bf..c5e3363de727 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -302,7 +302,11 @@ struct lnet_ni {
 	struct lnet_ni_status	*ni_status;
 
 	/* per NI LND tunables */
-	struct lnet_ioctl_config_lnd_tunables *ni_lnd_tunables;
+	struct lnet_lnd_tunables ni_lnd_tunables;
+
+	/* lnd tunables set explicitly */
+	bool ni_lnd_tunables_set;
+
 	/*
 	 * equivalent interfaces to use
 	 * This is an array because socklnd bonding can still be configured
diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
index a8eb3b8f9fd7..ac29f9d24d5d 100644
--- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
+++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
@@ -57,11 +57,15 @@ struct lnet_ioctl_config_o2iblnd_tunables {
 	__u16 pad;
 };
 
+struct lnet_lnd_tunables {
+	union {
+		struct lnet_ioctl_config_o2iblnd_tunables lnd_o2ib;
+	} lnd_tun_u;
+};
+
 struct lnet_ioctl_config_lnd_tunables {
 	struct lnet_ioctl_config_lnd_cmn_tunables lt_cmn;
-	union {
-		struct lnet_ioctl_config_o2iblnd_tunables lt_o2ib;
-	} lt_tun_u;
+	struct lnet_lnd_tunables lt_tun;
 };
 
 struct lnet_ioctl_net_config {
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
index 5e1592b398c1..ade566d20c69 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
@@ -2122,7 +2122,7 @@ static int kiblnd_net_init_pools(struct kib_net *net, struct lnet_ni *ni,
 	int rc;
 	int i;
 
-	tunables = &ni->ni_lnd_tunables->lt_tun_u.lt_o2ib;
+	tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;
 
 	if (tunables->lnd_fmr_pool_size < *kiblnd_tunables.kib_ntx / 4) {
 		CERROR("Can't set fmr pool size (%d) < ntx / 4(%d)\n",
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
index 42dc15cef194..522eb150d9a6 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
@@ -608,7 +608,7 @@ kiblnd_cfg_rdma_frags(struct lnet_ni *ni)
 	struct lnet_ioctl_config_o2iblnd_tunables *tunables;
 	int mod;
 
-	tunables = &ni->ni_lnd_tunables->lt_tun_u.lt_o2ib;
+	tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;
 	mod = tunables->lnd_map_on_demand;
 	return mod ? mod : IBLND_MAX_RDMA_FRAGS >> IBLND_FRAG_SHIFT;
 }
@@ -627,7 +627,7 @@ kiblnd_concurrent_sends(int version, struct lnet_ni *ni)
 	struct lnet_ioctl_config_o2iblnd_tunables *tunables;
 	int concurrent_sends;
 
-	tunables = &ni->ni_lnd_tunables->lt_tun_u.lt_o2ib;
+	tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;
 	concurrent_sends = tunables->lnd_concurrent_sends;
 
 	if (version == IBLND_MSG_VERSION_1) {
@@ -777,7 +777,7 @@ kiblnd_need_noop(struct kib_conn *conn)
 	struct lnet_ni *ni = conn->ibc_peer->ibp_ni;
 
 	LASSERT(conn->ibc_state >= IBLND_CONN_ESTABLISHED);
-	tunables = &ni->ni_lnd_tunables->lt_tun_u.lt_o2ib;
+	tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;
 
 	if (conn->ibc_outstanding_credits <
 	    IBLND_CREDITS_HIGHWATER(tunables, conn->ibc_version) &&
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index a8d2b4911dab..c266940cb2ae 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1452,7 +1452,7 @@ kiblnd_launch_tx(struct lnet_ni *ni, struct kib_tx *tx, lnet_nid_t nid)
 
 	/* Brand new peer */
 	LASSERT(!peer->ibp_connecting);
-	tunables = &peer->ibp_ni->ni_lnd_tunables->lt_tun_u.lt_o2ib;
+	tunables = &peer->ibp_ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;
 	peer->ibp_connecting = tunables->lnd_conns_per_peer;
 
 	/* always called with a ref on ni, which prevents ni being shutdown */
@@ -2592,14 +2592,14 @@ kiblnd_check_reconnect(struct kib_conn *conn, int version,
 		break;
 
 	case IBLND_REJECT_RDMA_FRAGS: {
-		struct lnet_ioctl_config_lnd_tunables *tunables;
+		struct lnet_ioctl_config_o2iblnd_tunables *tunables;
 
 		if (!cp) {
 			reason = "can't negotiate max frags";
 			goto out;
 		}
-		tunables = peer->ibp_ni->ni_lnd_tunables;
-		if (!tunables->lt_tun_u.lt_o2ib.lnd_map_on_demand) {
+		tunables = &peer->ibp_ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;
+		if (!tunables->lnd_map_on_demand) {
 			reason = "map_on_demand must be enabled";
 			goto out;
 		}
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
index a1aca4dda38f..5117594f38fb 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
@@ -185,16 +185,11 @@ int kiblnd_tunables_setup(struct lnet_ni *ni)
 	 * if there was no tunables specified, setup the tunables to be
 	 * defaulted
 	 */
-	if (!ni->ni_lnd_tunables) {
-		ni->ni_lnd_tunables = kzalloc(sizeof(*ni->ni_lnd_tunables),
-					      GFP_NOFS);
-		if (!ni->ni_lnd_tunables)
-			return -ENOMEM;
-
-		memcpy(&ni->ni_lnd_tunables->lt_tun_u.lt_o2ib,
+	if (!ni->ni_lnd_tunables_set)
+		memcpy(&ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib,
 		       &default_tunables, sizeof(*tunables));
-	}
-	tunables = &ni->ni_lnd_tunables->lt_tun_u.lt_o2ib;
+
+	tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;
 
 	/* Current API version */
 	tunables->lnd_version = 0;
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 0896e75bc3d7..c944fbb155c8 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1198,6 +1198,7 @@ static int
 lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
 {
 	struct lnet_ioctl_config_lnd_tunables *lnd_tunables = NULL;
+	struct lnet_lnd_tunables *tun = NULL;
 	int rc = -EINVAL;
 	int lnd_type;
 	struct lnet_lnd *lnd;
@@ -1250,19 +1251,15 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
 
 	ni->ni_net->net_lnd = lnd;
 
-	if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf))
+	if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf)) {
 		lnd_tunables = (struct lnet_ioctl_config_lnd_tunables *)conf->cfg_bulk;
+		tun = &lnd_tunables->lt_tun;
+	}
 
-	if (lnd_tunables) {
-		ni->ni_lnd_tunables = kzalloc(sizeof(*ni->ni_lnd_tunables),
-					      GFP_NOFS);
-		if (!ni->ni_lnd_tunables) {
-			mutex_unlock(&the_lnet.ln_lnd_mutex);
-			rc = -ENOMEM;
-			goto failed0;
-		}
-		memcpy(ni->ni_lnd_tunables, lnd_tunables,
-		       sizeof(*ni->ni_lnd_tunables));
+	if (tun) {
+		memcpy(&ni->ni_lnd_tunables, tun,
+		       sizeof(*tun));
+		ni->ni_lnd_tunables_set = true;
 	}
 
 	/*
@@ -1702,15 +1699,15 @@ lnet_fill_ni_info(struct lnet_ni *ni, struct lnet_ioctl_config_data *config)
 		tunable_size = config->cfg_hdr.ioc_len - min_size;
 
 	/* Don't copy to much data to user space */
-	min_size = min(tunable_size, sizeof(*ni->ni_lnd_tunables));
+	min_size = min(tunable_size, sizeof(ni->ni_lnd_tunables));
 	lnd_cfg = (struct lnet_ioctl_config_lnd_tunables *)net_config->cfg_bulk;
 
-	if (ni->ni_lnd_tunables && lnd_cfg && min_size) {
-		memcpy(lnd_cfg, ni->ni_lnd_tunables, min_size);
+	if (lnd_cfg && min_size) {
+		memcpy(&lnd_cfg->lt_tun, &ni->ni_lnd_tunables, min_size);
 		config->cfg_config_u.cfg_net.net_interface_count = 1;
 
 		/* Tell user land that kernel side has less data */
-		if (tunable_size > sizeof(*ni->ni_lnd_tunables)) {
+		if (tunable_size > sizeof(ni->ni_lnd_tunables)) {
 			min_size = tunable_size - sizeof(ni->ni_lnd_tunables);
 			config->cfg_hdr.ioc_len -= min_size;
 		}
diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index 86a53854e427..5646feeb433e 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -105,8 +105,6 @@ lnet_ni_free(struct lnet_ni *ni)
 	if (ni->ni_cpts)
 		cfs_expr_list_values_free(ni->ni_cpts, ni->ni_ncpts);
 
-	kfree(ni->ni_lnd_tunables);
-
 	for (i = 0; i < LNET_MAX_INTERFACES && ni->ni_interfaces[i]; i++)
 		kfree(ni->ni_interfaces[i]);
 


* [lustre-devel] [PATCH 05/34] lnet: begin separating "networks" from "network interfaces".
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (10 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 04/34] lnet: embed lnd_tunables in lnet_ni NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-10 23:18   ` Doug Oucharek
  2018-09-10 23:27   ` James Simmons
  2018-09-07  0:49 ` [lustre-devel] [PATCH 13/34] lnet: reverse order of lnet_startup_lnd{net, ni} NeilBrown
                   ` (22 subsequent siblings)
  34 siblings, 2 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

We already have "struct lnet_net" separate from "struct lnet_ni",
but they are currently allocated together and freed together and
it is assumed that they are 1-to-1.

This patch starts breaking that assumption.  We have separate
lnet_net_alloc() and lnet_net_free() to alloc/free the new lnet_net,
though they are currently called only when lnet_ni_alloc/free are
called.

The netid is now stored in the lnet_net and fetched directly from
there, rather than extracting it from the net-interface-id ni_nid.

The linkage between these two structures is now richer: an lnet_net
can link to a list of lnet_ni.  struct lnet now has a list of lnet_net,
so to find all the lnet_ni, we need to walk a list of lists.
This need to walk a list-of-lists occurs in several places, and new
helpers like lnet_get_ni_idx_locked() and lnet_get_next_ni_locked() are
introduced.

Previously a list_head was passed to lnet_ni_alloc() for the new
lnet_ni to be attached to.
Now a list is passed to lnet_net_alloc() for the net to be attached
to, and a lnet_net is passed to lnet_ni_alloc() for the ni to attach
to.
lnet_ni_alloc() also receives an interface name, but this is currently
unused.
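
The list-of-lists walk described above can be sketched in user space as
follows.  This is a simplified illustration, not the kernel code: it
uses plain next pointers instead of struct list_head, takes no locks,
and all names (struct net, struct ni, mk_net_id(), count_nis(),
find_ni_by_nid(), get_ni_by_idx()) are hypothetical.  The net ID
layout, (net_type << 16) | net_num, is the one given in the new
net_id comment.

```c
#include <assert.h>
#include <stddef.h>

/* One network interface, chained on its owning net. */
struct ni {
	unsigned long long nid;
	struct ni *next;
};

/* One network, chained on the global list and owning a list of NIs. */
struct net {
	unsigned int net_id;
	struct ni *nis;
	struct net *next;
};

/* Compose a net ID as (net_type << 16) | net_num. */
static unsigned int mk_net_id(unsigned int net_type, unsigned int net_num)
{
	return (net_type << 16) | net_num;
}

/* Count every NI by walking each net, then each NI on that net. */
static int count_nis(const struct net *nets)
{
	int count = 0;

	for (const struct net *net = nets; net; net = net->next)
		for (const struct ni *ni = net->nis; ni; ni = ni->next)
			count++;
	return count;
}

/* Find an NI by NID; returns NULL when no net carries it. */
static struct ni *find_ni_by_nid(struct net *nets, unsigned long long nid)
{
	for (struct net *net = nets; net; net = net->next)
		for (struct ni *ni = net->nis; ni; ni = ni->next)
			if (ni->nid == nid)
				return ni;
	return NULL;
}

/* Return the idx-th NI in walk order, or NULL when idx is out of range. */
static struct ni *get_ni_by_idx(struct net *nets, int idx)
{
	assert(idx >= 0);
	for (struct net *net = nets; net; net = net->next)
		for (struct ni *ni = net->nis; ni; ni = ni->next)
			if (idx-- == 0)
				return ni;
	return NULL;
}
```

Every caller that used to iterate a flat list of NIs needs this nested
shape, which is presumably why the patch collects it into helpers.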

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |   15 +
 .../staging/lustre/include/linux/lnet/lib-types.h  |   23 +-
 drivers/staging/lustre/lnet/lnet/acceptor.c        |    2 
 drivers/staging/lustre/lnet/lnet/api-ni.c          |  255 ++++++++++++++------
 drivers/staging/lustre/lnet/lnet/config.c          |  135 +++++++----
 drivers/staging/lustre/lnet/lnet/lib-move.c        |    6 
 drivers/staging/lustre/lnet/lnet/router.c          |   15 -
 drivers/staging/lustre/lnet/lnet/router_proc.c     |   16 -
 8 files changed, 308 insertions(+), 159 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 0fecf0d32c58..4440b87299c4 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -369,8 +369,14 @@ lnet_ni_decref(struct lnet_ni *ni)
 }
 
 void lnet_ni_free(struct lnet_ni *ni);
+void lnet_net_free(struct lnet_net *net);
+
+struct lnet_net *
+lnet_net_alloc(__u32 net_type, struct list_head *netlist);
+
 struct lnet_ni *
-lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist);
+lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el,
+	      char *iface);
 
 static inline int
 lnet_nid2peerhash(lnet_nid_t nid)
@@ -412,6 +418,9 @@ void lnet_destroy_routes(void);
 int lnet_get_route(int idx, __u32 *net, __u32 *hops,
 		   lnet_nid_t *gateway, __u32 *alive, __u32 *priority);
 int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg);
+struct lnet_ni *lnet_get_next_ni_locked(struct lnet_net *mynet,
+					struct lnet_ni *prev);
+struct lnet_ni *lnet_get_ni_idx_locked(int idx);
 
 void lnet_router_debugfs_init(void);
 void lnet_router_debugfs_fini(void);
@@ -584,7 +593,7 @@ int lnet_connect(struct socket **sockp, lnet_nid_t peer_nid,
 		 __u32 local_ip, __u32 peer_ip, int peer_port);
 void lnet_connect_console_error(int rc, lnet_nid_t peer_nid,
 				__u32 peer_ip, int port);
-int lnet_count_acceptor_nis(void);
+int lnet_count_acceptor_nets(void);
 int lnet_acceptor_timeout(void);
 int lnet_acceptor_port(void);
 
@@ -618,7 +627,7 @@ void lnet_swap_pinginfo(struct lnet_ping_info *info);
 int lnet_parse_ip2nets(char **networksp, char *ip2nets);
 int lnet_parse_routes(char *route_str, int *im_a_router);
 int lnet_parse_networks(struct list_head *nilist, char *networks);
-int lnet_net_unique(__u32 net, struct list_head *nilist);
+bool lnet_net_unique(__u32 net, struct list_head *nilist);
 
 int lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt);
 struct lnet_peer *lnet_find_peer_locked(struct lnet_peer_table *ptable,
diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index c5e3363de727..5f0d4703bf86 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -254,6 +254,15 @@ struct lnet_tx_queue {
 };
 
 struct lnet_net {
+	/* chain on the ln_nets */
+	struct list_head	net_list;
+
+	/* net ID, which is composed of
+	 * (net_type << 16) | net_num.
+	 * net_type can be one of the enumerated types defined in
+	 * lnet/include/lnet/nidstr.h */
+	__u32			net_id;
+
 	/* network tunables */
 	struct lnet_ioctl_config_lnd_cmn_tunables net_tunables;
 
@@ -264,11 +273,13 @@ struct lnet_net {
 	bool			  net_tunables_set;
 	/* procedural interface */
 	struct lnet_lnd		*net_lnd;
+	/* list of NIs on this net */
+	struct list_head	net_ni_list;
 };
 
 struct lnet_ni {
-	/* chain on ln_nis */
-	struct list_head	  ni_list;
+	/* chain on the lnet_net structure */
+	struct list_head	  ni_netlist;
 	/* chain on ln_nis_cpt */
 	struct list_head	ni_cptlist;
 
@@ -626,14 +637,16 @@ struct lnet {
 	/* failure simulation */
 	struct list_head		  ln_test_peers;
 	struct list_head		  ln_drop_rules;
-	struct list_head		  ln_delay_rules;
+	struct list_head		ln_delay_rules;
 
-	struct list_head		  ln_nis;	/* LND instances */
+	/* LND instances */
+	struct list_head		ln_nets;
 	/* NIs bond on specific CPT(s) */
 	struct list_head		  ln_nis_cpt;
 	/* dying LND instances */
 	struct list_head		  ln_nis_zombie;
-	struct lnet_ni			 *ln_loni;	/* the loopback NI */
+	/* the loopback NI */
+	struct lnet_ni			*ln_loni;
 
 	/* remote networks with routes to them */
 	struct list_head		 *ln_remote_nets_hash;
diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
index f8c921f0221c..88b90c1fdbaf 100644
--- a/drivers/staging/lustre/lnet/lnet/acceptor.c
+++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
@@ -454,7 +454,7 @@ lnet_acceptor_start(void)
 	if (rc <= 0)
 		return rc;
 
-	if (!lnet_count_acceptor_nis())  /* not required */
+	if (lnet_count_acceptor_nets() == 0)  /* not required */
 		return 0;
 
 	task = kthread_run(lnet_acceptor, (void *)(uintptr_t)secure,
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index c944fbb155c8..05687278334a 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -537,7 +537,7 @@ lnet_prepare(lnet_pid_t requested_pid)
 	the_lnet.ln_pid = requested_pid;
 
 	INIT_LIST_HEAD(&the_lnet.ln_test_peers);
-	INIT_LIST_HEAD(&the_lnet.ln_nis);
+	INIT_LIST_HEAD(&the_lnet.ln_nets);
 	INIT_LIST_HEAD(&the_lnet.ln_nis_cpt);
 	INIT_LIST_HEAD(&the_lnet.ln_nis_zombie);
 	INIT_LIST_HEAD(&the_lnet.ln_routers);
@@ -616,7 +616,7 @@ lnet_unprepare(void)
 
 	LASSERT(!the_lnet.ln_refcount);
 	LASSERT(list_empty(&the_lnet.ln_test_peers));
-	LASSERT(list_empty(&the_lnet.ln_nis));
+	LASSERT(list_empty(&the_lnet.ln_nets));
 	LASSERT(list_empty(&the_lnet.ln_nis_cpt));
 	LASSERT(list_empty(&the_lnet.ln_nis_zombie));
 
@@ -648,14 +648,17 @@ lnet_unprepare(void)
 }
 
 struct lnet_ni  *
-lnet_net2ni_locked(__u32 net, int cpt)
+lnet_net2ni_locked(__u32 net_id, int cpt)
 {
-	struct lnet_ni *ni;
+	struct lnet_ni   *ni;
+	struct lnet_net  *net;
 
 	LASSERT(cpt != LNET_LOCK_EX);
 
-	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
-		if (LNET_NIDNET(ni->ni_nid) == net) {
+	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+		if (net->net_id == net_id) {
+			ni = list_entry(net->net_ni_list.next, struct lnet_ni,
+					ni_netlist);
 			lnet_ni_addref_locked(ni, cpt);
 			return ni;
 		}
@@ -760,14 +763,17 @@ lnet_islocalnet(__u32 net)
 struct lnet_ni  *
 lnet_nid2ni_locked(lnet_nid_t nid, int cpt)
 {
-	struct lnet_ni *ni;
+	struct lnet_net  *net;
+	struct lnet_ni	 *ni;
 
 	LASSERT(cpt != LNET_LOCK_EX);
 
-	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
-		if (ni->ni_nid == nid) {
-			lnet_ni_addref_locked(ni, cpt);
-			return ni;
+	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+			if (ni->ni_nid == nid) {
+				lnet_ni_addref_locked(ni, cpt);
+				return ni;
+			}
 		}
 	}
 
@@ -790,16 +796,18 @@ lnet_islocalnid(lnet_nid_t nid)
 }
 
 int
-lnet_count_acceptor_nis(void)
+lnet_count_acceptor_nets(void)
 {
 	/* Return the # of NIs that need the acceptor. */
-	int count = 0;
-	struct lnet_ni *ni;
-	int cpt;
+	int		 count = 0;
+	struct lnet_net  *net;
+	int		 cpt;
 
 	cpt = lnet_net_lock_current();
-	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
-		if (ni->ni_net->net_lnd->lnd_accept)
+	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+		/* all socklnd type networks should have the acceptor
+		 * thread started */
+		if (net->net_lnd->lnd_accept)
 			count++;
 	}
 
@@ -832,13 +840,16 @@ lnet_ping_info_create(int num_ni)
 static inline int
 lnet_get_ni_count(void)
 {
-	struct lnet_ni *ni;
-	int count = 0;
+	struct lnet_ni	*ni;
+	struct lnet_net *net;
+	int		count = 0;
 
 	lnet_net_lock(0);
 
-	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list)
-		count++;
+	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+		list_for_each_entry(ni, &net->net_ni_list, ni_netlist)
+			count++;
+	}
 
 	lnet_net_unlock(0);
 
@@ -854,14 +865,17 @@ lnet_ping_info_free(struct lnet_ping_info *pinfo)
 static void
 lnet_ping_info_destroy(void)
 {
+	struct lnet_net *net;
 	struct lnet_ni *ni;
 
 	lnet_net_lock(LNET_LOCK_EX);
 
-	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
-		lnet_ni_lock(ni);
-		ni->ni_status = NULL;
-		lnet_ni_unlock(ni);
+	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+			lnet_ni_lock(ni);
+			ni->ni_status = NULL;
+			lnet_ni_unlock(ni);
+		}
 	}
 
 	lnet_ping_info_free(the_lnet.ln_ping_info);
@@ -963,24 +977,28 @@ lnet_ping_md_unlink(struct lnet_ping_info *pinfo,
 static void
 lnet_ping_info_install_locked(struct lnet_ping_info *ping_info)
 {
+	int i = 0;
 	struct lnet_ni_status *ns;
 	struct lnet_ni *ni;
-	int i = 0;
+	struct lnet_net *net;
 
-	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
-		LASSERT(i < ping_info->pi_nnis);
+	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+			LASSERT(i < ping_info->pi_nnis);
 
-		ns = &ping_info->pi_ni[i];
+			ns = &ping_info->pi_ni[i];
 
-		ns->ns_nid = ni->ni_nid;
+			ns->ns_nid = ni->ni_nid;
 
-		lnet_ni_lock(ni);
-		ns->ns_status = (ni->ni_status) ?
-				 ni->ni_status->ns_status : LNET_NI_STATUS_UP;
-		ni->ni_status = ns;
-		lnet_ni_unlock(ni);
+			lnet_ni_lock(ni);
+			ns->ns_status = ni->ni_status ?
+					ni->ni_status->ns_status :
+						LNET_NI_STATUS_UP;
+			ni->ni_status = ns;
+			lnet_ni_unlock(ni);
 
-		i++;
+			i++;
+		}
 	}
 }
 
@@ -1054,9 +1072,9 @@ lnet_ni_unlink_locked(struct lnet_ni *ni)
 	}
 
 	/* move it to zombie list and nobody can find it anymore */
-	LASSERT(!list_empty(&ni->ni_list));
-	list_move(&ni->ni_list, &the_lnet.ln_nis_zombie);
-	lnet_ni_decref_locked(ni, 0);	/* drop ln_nis' ref */
+	LASSERT(!list_empty(&ni->ni_netlist));
+	list_move(&ni->ni_netlist, &the_lnet.ln_nis_zombie);
+	lnet_ni_decref_locked(ni, 0);
 }
 
 static void
@@ -1076,17 +1094,17 @@ lnet_clear_zombies_nis_locked(void)
 		int j;
 
 		ni = list_entry(the_lnet.ln_nis_zombie.next,
-				struct lnet_ni, ni_list);
-		list_del_init(&ni->ni_list);
+				struct lnet_ni, ni_netlist);
+		list_del_init(&ni->ni_netlist);
 		cfs_percpt_for_each(ref, j, ni->ni_refs) {
 			if (!*ref)
 				continue;
 			/* still busy, add it back to zombie list */
-			list_add(&ni->ni_list, &the_lnet.ln_nis_zombie);
+			list_add(&ni->ni_netlist, &the_lnet.ln_nis_zombie);
 			break;
 		}
 
-		if (!list_empty(&ni->ni_list)) {
+		if (!list_empty(&ni->ni_netlist)) {
 			lnet_net_unlock(LNET_LOCK_EX);
 			++i;
 			if ((i & (-i)) == i) {
@@ -1126,6 +1144,7 @@ lnet_shutdown_lndnis(void)
 {
 	struct lnet_ni *ni;
 	int i;
+	struct lnet_net *net;
 
 	/* NB called holding the global mutex */
 
@@ -1138,10 +1157,14 @@ lnet_shutdown_lndnis(void)
 	the_lnet.ln_shutdown = 1;	/* flag shutdown */
 
 	/* Unlink NIs from the global table */
-	while (!list_empty(&the_lnet.ln_nis)) {
-		ni = list_entry(the_lnet.ln_nis.next,
-				struct lnet_ni, ni_list);
-		lnet_ni_unlink_locked(ni);
+	while (!list_empty(&the_lnet.ln_nets)) {
+		net = list_entry(the_lnet.ln_nets.next,
+				 struct lnet_net, net_list);
+		while (!list_empty(&net->net_ni_list)) {
+			ni = list_entry(net->net_ni_list.next,
+					struct lnet_ni, ni_netlist);
+			lnet_ni_unlink_locked(ni);
+		}
 	}
 
 	/* Drop the cached loopback NI. */
@@ -1212,7 +1235,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
 
 	/* Make sure this new NI is unique. */
 	lnet_net_lock(LNET_LOCK_EX);
-	rc = lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nis);
+	rc = lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nets);
 	lnet_net_unlock(LNET_LOCK_EX);
 	if (!rc) {
 		if (lnd_type == LOLND) {
@@ -1297,7 +1320,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
 	lnet_net_lock(LNET_LOCK_EX);
 	/* refcount for ln_nis */
 	lnet_ni_addref_locked(ni, 0);
-	list_add_tail(&ni->ni_list, &the_lnet.ln_nis);
+	list_add_tail(&ni->ni_net->net_list, &the_lnet.ln_nets);
 	if (ni->ni_cpts) {
 		lnet_ni_addref_locked(ni, 0);
 		list_add_tail(&ni->ni_cptlist, &the_lnet.ln_nis_cpt);
@@ -1363,8 +1386,8 @@ lnet_startup_lndnis(struct list_head *nilist)
 	int ni_count = 0;
 
 	while (!list_empty(nilist)) {
-		ni = list_entry(nilist->next, struct lnet_ni, ni_list);
-		list_del(&ni->ni_list);
+		ni = list_entry(nilist->next, struct lnet_ni, ni_netlist);
+		list_del(&ni->ni_netlist);
 		rc = lnet_startup_lndni(ni, NULL);
 
 		if (rc < 0)
@@ -1486,6 +1509,7 @@ LNetNIInit(lnet_pid_t requested_pid)
 	struct lnet_ping_info *pinfo;
 	struct lnet_handle_md md_handle;
 	struct list_head net_head;
+	struct lnet_net		*net;
 
 	INIT_LIST_HEAD(&net_head);
 
@@ -1505,8 +1529,15 @@ LNetNIInit(lnet_pid_t requested_pid)
 		return rc;
 	}
 
-	/* Add in the loopback network */
-	if (!lnet_ni_alloc(LNET_MKNET(LOLND, 0), NULL, &net_head)) {
+	/* create a network for Loopback network */
+	net = lnet_net_alloc(LNET_MKNET(LOLND, 0), &net_head);
+	if (net == NULL) {
+		rc = -ENOMEM;
+		goto err_empty_list;
+	}
+
+	/* Add in the loopback NI */
+	if (lnet_ni_alloc(net, NULL, NULL) == NULL) {
 		rc = -ENOMEM;
 		goto err_empty_list;
 	}
@@ -1584,11 +1615,11 @@ LNetNIInit(lnet_pid_t requested_pid)
 	LASSERT(rc < 0);
 	mutex_unlock(&the_lnet.ln_api_mutex);
 	while (!list_empty(&net_head)) {
-		struct lnet_ni *ni;
+		struct lnet_net *net;
 
-		ni = list_entry(net_head.next, struct lnet_ni, ni_list);
-		list_del_init(&ni->ni_list);
-		lnet_ni_free(ni);
+		net = list_entry(net_head.next, struct lnet_net, net_list);
+		list_del_init(&net->net_list);
+		lnet_net_free(net);
 	}
 	return rc;
 }
@@ -1714,25 +1745,83 @@ lnet_fill_ni_info(struct lnet_ni *ni, struct lnet_ioctl_config_data *config)
 	}
 }
 
+struct lnet_ni *
+lnet_get_ni_idx_locked(int idx)
+{
+	struct lnet_ni		*ni;
+	struct lnet_net		*net;
+
+	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+			if (idx-- == 0)
+				return ni;
+		}
+	}
+
+	return NULL;
+}
+
+struct lnet_ni *
+lnet_get_next_ni_locked(struct lnet_net *mynet, struct lnet_ni *prev)
+{
+	struct lnet_ni		*ni;
+	struct lnet_net		*net = mynet;
+
+	if (prev == NULL) {
+		if (net == NULL)
+			net = list_entry(the_lnet.ln_nets.next, struct lnet_net,
+					net_list);
+		ni = list_entry(net->net_ni_list.next, struct lnet_ni,
+				ni_netlist);
+
+		return ni;
+	}
+
+	if (prev->ni_netlist.next == &prev->ni_net->net_ni_list) {
+		/* if you reached the end of the ni list and the net is
+		 * specified, then there are no more nis in that net */
+		if (net != NULL)
+			return NULL;
+
+		/* we reached the end of this net ni list. move to the
+		 * next net */
+		if (prev->ni_net->net_list.next == &the_lnet.ln_nets)
+			/* no more nets and no more NIs. */
+			return NULL;
+
+		/* get the next net */
+		net = list_entry(prev->ni_net->net_list.next, struct lnet_net,
+				 net_list);
+		/* get the ni on it */
+		ni = list_entry(net->net_ni_list.next, struct lnet_ni,
+				ni_netlist);
+
+		return ni;
+	}
+
+	/* there are more nis left */
+	ni = list_entry(prev->ni_netlist.next, struct lnet_ni, ni_netlist);
+
+	return ni;
+}
+
 static int
 lnet_get_net_config(struct lnet_ioctl_config_data *config)
 {
 	struct lnet_ni *ni;
+	int cpt;
 	int idx = config->cfg_count;
-	int cpt, i = 0;
 	int rc = -ENOENT;
 
 	cpt = lnet_net_lock_current();
 
-	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
-		if (i++ != idx)
-			continue;
+	ni = lnet_get_ni_idx_locked(idx);
 
+	if (ni != NULL) {
+		rc = 0;
 		lnet_ni_lock(ni);
 		lnet_fill_ni_info(ni, config);
 		lnet_ni_unlock(ni);
-		rc = 0;
-		break;
 	}
 
 	lnet_net_unlock(cpt);
@@ -1745,6 +1834,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
 	char *nets = conf->cfg_config_u.cfg_net.net_intf;
 	struct lnet_ping_info *pinfo;
 	struct lnet_handle_md md_handle;
+	struct lnet_net		*net;
 	struct lnet_ni *ni;
 	struct list_head net_head;
 	struct lnet_remotenet *rnet;
@@ -1752,7 +1842,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
 
 	INIT_LIST_HEAD(&net_head);
 
-	/* Create a ni structure for the network string */
+	/* Create a net/ni structures for the network string */
 	rc = lnet_parse_networks(&net_head, nets);
 	if (rc <= 0)
 		return !rc ? -EINVAL : rc;
@@ -1760,14 +1850,14 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
 	mutex_lock(&the_lnet.ln_api_mutex);
 
 	if (rc > 1) {
-		rc = -EINVAL; /* only add one interface per call */
+		rc = -EINVAL; /* only add one network per call */
 		goto failed0;
 	}
 
-	ni = list_entry(net_head.next, struct lnet_ni, ni_list);
+	net = list_entry(net_head.next, struct lnet_net, net_list);
 
 	lnet_net_lock(LNET_LOCK_EX);
-	rnet = lnet_find_net_locked(LNET_NIDNET(ni->ni_nid));
+	rnet = lnet_find_net_locked(net->net_id);
 	lnet_net_unlock(LNET_LOCK_EX);
 	/*
 	 * make sure that the net added doesn't invalidate the current
@@ -1785,8 +1875,8 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
 	if (rc)
 		goto failed0;
 
-	list_del_init(&ni->ni_list);
-
+	list_del_init(&net->net_list);
+	ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
 	rc = lnet_startup_lndni(ni, conf);
 	if (rc)
 		goto failed1;
@@ -1812,9 +1902,9 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
 failed0:
 	mutex_unlock(&the_lnet.ln_api_mutex);
 	while (!list_empty(&net_head)) {
-		ni = list_entry(net_head.next, struct lnet_ni, ni_list);
-		list_del_init(&ni->ni_list);
-		lnet_ni_free(ni);
+		net = list_entry(net_head.next, struct lnet_net, net_list);
+		list_del_init(&net->net_list);
+		lnet_net_free(net);
 	}
 	return rc;
 }
@@ -1849,7 +1939,7 @@ lnet_dyn_del_ni(__u32 net)
 
 	lnet_shutdown_lndni(ni);
 
-	if (!lnet_count_acceptor_nis())
+	if (!lnet_count_acceptor_nets())
 		lnet_acceptor_stop();
 
 	lnet_ping_target_update(pinfo, md_handle);
@@ -2103,7 +2193,8 @@ EXPORT_SYMBOL(LNetDebugPeer);
 int
 LNetGetId(unsigned int index, struct lnet_process_id *id)
 {
-	struct lnet_ni *ni;
+	struct lnet_ni	 *ni;
+	struct lnet_net  *net;
 	int cpt;
 	int rc = -ENOENT;
 
@@ -2111,14 +2202,16 @@ LNetGetId(unsigned int index, struct lnet_process_id *id)
 
 	cpt = lnet_net_lock_current();
 
-	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
-		if (index--)
-			continue;
+	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+			if (index-- != 0)
+				continue;
 
-		id->nid = ni->ni_nid;
-		id->pid = the_lnet.ln_pid;
-		rc = 0;
-		break;
+			id->nid = ni->ni_nid;
+			id->pid = the_lnet.ln_pid;
+			rc = 0;
+			break;
+		}
 	}
 
 	lnet_net_unlock(cpt);
diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index 5646feeb433e..e83bdbec11e3 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -78,17 +78,17 @@ lnet_issep(char c)
 	}
 }
 
-int
-lnet_net_unique(__u32 net, struct list_head *nilist)
+bool
+lnet_net_unique(__u32 net, struct list_head *netlist)
 {
-	struct lnet_ni *ni;
+	struct lnet_net	 *net_l;
 
-	list_for_each_entry(ni, nilist, ni_list) {
-		if (LNET_NIDNET(ni->ni_nid) == net)
-			return 0;
+	list_for_each_entry(net_l, netlist, net_list) {
+		if (net_l->net_id == net)
+			return false;
 	}
 
-	return 1;
+	return true;
 }
 
 void
@@ -112,41 +112,78 @@ lnet_ni_free(struct lnet_ni *ni)
 	if (ni->ni_net_ns)
 		put_net(ni->ni_net_ns);
 
-	kvfree(ni->ni_net);
 	kfree(ni);
 }
 
-struct lnet_ni *
-lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
+void
+lnet_net_free(struct lnet_net *net)
 {
-	struct lnet_tx_queue *tq;
+	struct list_head *tmp, *tmp2;
 	struct lnet_ni *ni;
-	int rc;
-	int i;
+
+	/* delete any nis which have been started. */
+	list_for_each_safe(tmp, tmp2, &net->net_ni_list) {
+		ni = list_entry(tmp, struct lnet_ni, ni_netlist);
+		list_del_init(&ni->ni_netlist);
+		lnet_ni_free(ni);
+	}
+
+	kfree(net);
+}
+
+struct lnet_net *
+lnet_net_alloc(__u32 net_id, struct list_head *net_list)
+{
 	struct lnet_net		*net;
 
-	if (!lnet_net_unique(net_id, nilist)) {
-		LCONSOLE_ERROR_MSG(0x111, "Duplicate network specified: %s\n",
-				   libcfs_net2str(net_id));
+	if (!lnet_net_unique(net_id, net_list)) {
+		CERROR("Duplicate net %s. Ignore\n",
+		       libcfs_net2str(net_id));
 		return NULL;
 	}
 
-	ni = kzalloc(sizeof(*ni), GFP_NOFS);
 	net = kzalloc(sizeof(*net), GFP_NOFS);
-	if (!ni || !net) {
-		kfree(ni); kfree(net);
+	if (!net) {
 		CERROR("Out of memory creating network %s\n",
 		       libcfs_net2str(net_id));
 		return NULL;
 	}
+
+	INIT_LIST_HEAD(&net->net_list);
+	INIT_LIST_HEAD(&net->net_ni_list);
+
+	net->net_id = net_id;
+
 	/* initialize global paramters to undefiend */
 	net->net_tunables.lct_peer_timeout = -1;
 	net->net_tunables.lct_max_tx_credits = -1;
 	net->net_tunables.lct_peer_tx_credits = -1;
 	net->net_tunables.lct_peer_rtr_credits = -1;
 
+	list_add_tail(&net->net_list, net_list);
+
+	return net;
+}
+
+struct lnet_ni *
+lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
+{
+	struct lnet_tx_queue	*tq;
+	struct lnet_ni		*ni;
+	int			rc;
+	int			i;
+
+	ni = kzalloc(sizeof(*ni), GFP_KERNEL);
+	if (ni == NULL) {
+		CERROR("Out of memory creating network interface %s%s\n",
+		       libcfs_net2str(net->net_id),
+		       (iface != NULL) ? iface : "");
+		return NULL;
+	}
+
 	spin_lock_init(&ni->ni_lock);
 	INIT_LIST_HEAD(&ni->ni_cptlist);
+	INIT_LIST_HEAD(&ni->ni_netlist);
 	ni->ni_refs = cfs_percpt_alloc(lnet_cpt_table(),
 				       sizeof(*ni->ni_refs[0]));
 	if (!ni->ni_refs)
@@ -166,8 +203,9 @@ lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
 	} else {
 		rc = cfs_expr_list_values(el, LNET_CPT_NUMBER, &ni->ni_cpts);
 		if (rc <= 0) {
-			CERROR("Failed to set CPTs for NI %s: %d\n",
-			       libcfs_net2str(net_id), rc);
+			CERROR("Failed to set CPTs for NI %s(%s): %d\n",
+			       libcfs_net2str(net->net_id),
+			       (iface != NULL) ? iface : "", rc);
 			goto failed;
 		}
 
@@ -182,7 +220,7 @@ lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
 
 	ni->ni_net = net;
 	/* LND will fill in the address part of the NID */
-	ni->ni_nid = LNET_MKNID(net_id, 0);
+	ni->ni_nid = LNET_MKNID(net->net_id, 0);
 
 	/* Store net namespace in which current ni is being created */
 	if (current->nsproxy->net_ns)
@@ -191,22 +229,24 @@ lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
 		ni->ni_net_ns = NULL;
 
 	ni->ni_last_alive = ktime_get_real_seconds();
-	list_add_tail(&ni->ni_list, nilist);
+	list_add_tail(&ni->ni_netlist, &net->net_ni_list);
+
 	return ni;
- failed:
+failed:
 	lnet_ni_free(ni);
 	return NULL;
 }
 
 int
-lnet_parse_networks(struct list_head *nilist, char *networks)
+lnet_parse_networks(struct list_head *netlist, char *networks)
 {
 	struct cfs_expr_list *el = NULL;
 	char *tokens;
 	char *str;
 	char *tmp;
-	struct lnet_ni *ni;
-	__u32 net;
+	struct lnet_net *net;
+	struct lnet_ni *ni = NULL;
+	__u32 net_id;
 	int nnets = 0;
 	struct list_head *temp_node;
 
@@ -275,18 +315,21 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
 
 			if (comma)
 				*comma++ = 0;
-			net = libcfs_str2net(strim(str));
+			net_id = libcfs_str2net(strim(str));
 
-			if (net == LNET_NIDNET(LNET_NID_ANY)) {
+			if (net_id == LNET_NIDNET(LNET_NID_ANY)) {
 				LCONSOLE_ERROR_MSG(0x113,
 						   "Unrecognised network type\n");
 				tmp = str;
 				goto failed_syntax;
 			}
 
-			if (LNET_NETTYP(net) != LOLND && /* LO is implicit */
-			    !lnet_ni_alloc(net, el, nilist))
-				goto failed;
+			if (LNET_NETTYP(net_id) != LOLND) { /* LO is implicit */
+				net = lnet_net_alloc(net_id, netlist);
+				if (!net ||
+				    !lnet_ni_alloc(net, el, NULL))
+					goto failed;
+			}
 
 			if (el) {
 				cfs_expr_list_free(el);
@@ -298,14 +341,21 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
 		}
 
 		*bracket = 0;
-		net = libcfs_str2net(strim(str));
-		if (net == LNET_NIDNET(LNET_NID_ANY)) {
+		net_id = libcfs_str2net(strim(str));
+		if (net_id == LNET_NIDNET(LNET_NID_ANY)) {
 			tmp = str;
 			goto failed_syntax;
 		}
 
-		ni = lnet_ni_alloc(net, el, nilist);
-		if (!ni)
+		/* always allocate a net, since we will eventually add an
+		 * interface to it, or we will fail, in which case we'll
+		 * just delete it */
+		net = lnet_net_alloc(net_id, netlist);
+		if (IS_ERR_OR_NULL(net))
+			goto failed;
+
+		ni = lnet_ni_alloc(net, el, NULL);
+		if (IS_ERR_OR_NULL(ni))
 			goto failed;
 
 		if (el) {
@@ -337,7 +387,7 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
 			if (niface == LNET_MAX_INTERFACES) {
 				LCONSOLE_ERROR_MSG(0x115,
 						   "Too many interfaces for net %s\n",
-						   libcfs_net2str(net));
+						   libcfs_net2str(net_id));
 				goto failed;
 			}
 
@@ -378,7 +428,7 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
 		}
 	}
 
-	list_for_each(temp_node, nilist)
+	list_for_each(temp_node, netlist)
 		nnets++;
 
 	kfree(tokens);
@@ -387,11 +437,12 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
  failed_syntax:
 	lnet_syntax("networks", networks, (int)(tmp - tokens), strlen(tmp));
  failed:
-	while (!list_empty(nilist)) {
-		ni = list_entry(nilist->next, struct lnet_ni, ni_list);
+	/* free the net list and all the nis on each net */
+	while (!list_empty(netlist)) {
+		net = list_entry(netlist->next, struct lnet_net, net_list);
 
-		list_del(&ni->ni_list);
-		lnet_ni_free(ni);
+		list_del_init(&net->net_list);
+		lnet_net_free(net);
 	}
 
 	if (el)
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index 1bf12af87a20..1c874025fa74 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -2289,7 +2289,7 @@ EXPORT_SYMBOL(LNetGet);
 int
 LNetDist(lnet_nid_t dstnid, lnet_nid_t *srcnidp, __u32 *orderp)
 {
-	struct lnet_ni *ni;
+	struct lnet_ni *ni = NULL;
 	struct lnet_remotenet *rnet;
 	__u32 dstnet = LNET_NIDNET(dstnid);
 	int hops;
@@ -2307,9 +2307,9 @@ LNetDist(lnet_nid_t dstnid, lnet_nid_t *srcnidp, __u32 *orderp)
 
 	cpt = lnet_net_lock_current();
 
-	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
+	while ((ni = lnet_get_next_ni_locked(NULL, ni))) {
 		if (ni->ni_nid == dstnid) {
-			if (srcnidp)
+			if (srcnidp != NULL)
 				*srcnidp = dstnid;
 			if (orderp) {
 				if (LNET_NETTYP(LNET_NIDNET(dstnid)) == LOLND)
diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index 0c0ec0b27982..135dfe793b0b 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -245,13 +245,10 @@ static void lnet_shuffle_seed(void)
 	if (seeded)
 		return;
 
-	/*
-	 * Nodes with small feet have little entropy
-	 * the NID for this node gives the most entropy in the low bits
-	 */
-	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
+	/* Nodes with small feet have little entropy
+	 * the NID for this node gives the most entropy in the low bits */
+	while ((ni = lnet_get_next_ni_locked(NULL, ni))) {
 		__u32 lnd_type, seed;
-
 		lnd_type = LNET_NETTYP(LNET_NIDNET(ni->ni_nid));
 		if (lnd_type != LOLND) {
 			seed = (LNET_NIDADDR(ni->ni_nid) | lnd_type);
@@ -807,8 +804,8 @@ lnet_router_ni_update_locked(struct lnet_peer *gw, __u32 net)
 static void
 lnet_update_ni_status_locked(void)
 {
-	struct lnet_ni *ni;
-	time64_t now;
+	struct lnet_ni *ni = NULL;
+	time64_t	now;
 	time64_t timeout;
 
 	LASSERT(the_lnet.ln_routing);
@@ -817,7 +814,7 @@ lnet_update_ni_status_locked(void)
 		  max(live_router_check_interval, dead_router_check_interval);
 
 	now = ktime_get_real_seconds();
-	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
+	while ((ni = lnet_get_next_ni_locked(NULL, ni))) {
 		if (ni->ni_net->net_lnd->lnd_type == LOLND)
 			continue;
 
diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
index f3ccd6a2b70e..2a366e9a8627 100644
--- a/drivers/staging/lustre/lnet/lnet/router_proc.c
+++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
@@ -641,26 +641,12 @@ static int proc_lnet_nis(struct ctl_table *table, int write,
 			      "rtr", "max", "tx", "min");
 		LASSERT(tmpstr + tmpsiz - s > 0);
 	} else {
-		struct list_head *n;
 		struct lnet_ni *ni = NULL;
 		int skip = *ppos - 1;
 
 		lnet_net_lock(0);
 
-		n = the_lnet.ln_nis.next;
-
-		while (n != &the_lnet.ln_nis) {
-			struct lnet_ni *a_ni;
-
-			a_ni = list_entry(n, struct lnet_ni, ni_list);
-			if (!skip) {
-				ni = a_ni;
-				break;
-			}
-
-			skip--;
-			n = n->next;
-		}
+		ni = lnet_get_ni_idx_locked(skip);
 
 		if (ni) {
 			struct lnet_tx_queue *tq;
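
For readers following the new iteration helpers, here is a rough userspace
sketch of the two-level walk (nets, then the NIs on each net) that
lnet_get_next_ni_locked() and lnet_get_ni_idx_locked() perform.  The names
struct net, struct ni, first_ni_from(), next_ni() and ni_by_index() are
made up for illustration, plain next pointers stand in for struct
list_head, and no locking is shown.  Skipping nets that have no NIs is a
choice of this sketch, not something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct lnet_net / struct lnet_ni.
 * The real code uses struct list_head and runs under lnet_net_lock. */
struct ni;

struct net {
	struct net *next;	/* next net, NULL at the end of the list */
	struct ni *first_ni;	/* first NI on this net, NULL if none */
};

struct ni {
	struct ni *next;	/* next NI on the same net, NULL at the end */
	struct net *net;	/* owning net, playing the role of ni_net */
	int id;
};

/* First NI on 'net' or on any later net; nets without NIs are skipped. */
static struct ni *first_ni_from(struct net *net)
{
	for (; net != NULL; net = net->next)
		if (net->first_ni != NULL)
			return net->first_ni;
	return NULL;
}

/*
 * Cursor-style iterator in the spirit of lnet_get_next_ni_locked():
 *  - prev == NULL starts the walk (on 'mynet' if given, else at 'all_nets');
 *  - mynet != NULL restricts the walk to that single net.
 */
static struct ni *next_ni(struct net *all_nets, struct net *mynet,
			  struct ni *prev)
{
	if (prev == NULL)
		return mynet != NULL ? mynet->first_ni
				     : first_ni_from(all_nets);
	if (prev->next != NULL)
		return prev->next;	/* more NIs left on this net */
	if (mynet != NULL)
		return NULL;		/* caller asked for one net only */
	return first_ni_from(prev->net->next);
}

/* Index lookup in the spirit of lnet_get_ni_idx_locked(). */
static struct ni *ni_by_index(struct net *all_nets, int idx)
{
	struct ni *ni = NULL;

	while ((ni = next_ni(all_nets, NULL, ni)) != NULL)
		if (idx-- == 0)
			return ni;
	return NULL;
}
```

Callers start with a NULL cursor and feed each result back in, as in
"while ((ni = next_ni(nets, NULL, ni)) != NULL)", which is the same shape
as the loops the patch adds to router.c and lib-move.c.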


* [lustre-devel] [PATCH 06/34] lnet: store separate xmit/recv net-interface in each message.
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (5 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 09/34] lnet: add list of cpts to lnet_net NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-10 23:24   ` Doug Oucharek
                     ` (2 more replies)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 03/34] lnet: struct lnet_ni: move ni_lnd to lnet_net NeilBrown
                   ` (27 subsequent siblings)
  34 siblings, 3 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

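Currently we store the net-interface in the peer, but the
peer should identify just the network, not the particular interface.
To help track which actual interface is used for each message, store
the transmit and receive interfaces explicitly in struct lnet_msg as
msg_txni and msg_rxni.  The message takes a reference on each NI it
records and drops it when the corresponding credits are returned.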
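
A minimal userspace sketch of the ownership rule, assuming a single plain
counter per NI (the real lnet_ni keeps per-CPT counters and is only
touched under lnet_net_lock).  All of the names here (struct ni,
struct msg, ni_addref(), ni_decref(), msg_set_txni(), msg_set_rxni(),
msg_put_txni(), msg_put_rxni()) are invented for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct lnet_ni and struct lnet_msg. */
struct ni {
	int refs;		/* simplified: one counter, not per-CPT */
};

struct msg {
	struct ni *txni;	/* NI the message is sent over */
	struct ni *rxni;	/* NI the message was received on */
};

static void ni_addref(struct ni *ni)
{
	ni->refs++;
}

static void ni_decref(struct ni *ni)
{
	assert(ni->refs > 0);
	ni->refs--;
}

/* Send path: record the NI in the message and pin it. */
static void msg_set_txni(struct msg *msg, struct ni *ni)
{
	assert(msg->txni == NULL);
	msg->txni = ni;
	ni_addref(ni);
}

/* Receive path: same rule for the NI the message arrived on. */
static void msg_set_rxni(struct msg *msg, struct ni *ni)
{
	assert(msg->rxni == NULL);
	msg->rxni = ni;
	ni_addref(ni);
}

/* Completion: clear the pointer first, then drop the reference, so
 * calling this twice on the same message is harmless. */
static void msg_put_txni(struct msg *msg)
{
	struct ni *ni = msg->txni;

	if (ni != NULL) {
		msg->txni = NULL;
		ni_decref(ni);
	}
}

static void msg_put_rxni(struct msg *msg)
{
	struct ni *ni = msg->rxni;

	if (ni != NULL) {
		msg->rxni = NULL;
		ni_decref(ni);
	}
}
```

The point of the pattern is that a message keeps its NIs alive for as long
as it points at them, independently of whatever the peer references.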

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

and includes commit 63c3e5129873 ("LU-7734 lnet: Fix lnet_msg_free()")

Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |    3 +++
 drivers/staging/lustre/lnet/lnet/lib-move.c        |   21 ++++++++++++++++++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 5f0d4703bf86..16a493529a46 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -98,6 +98,9 @@ struct lnet_msg {
 
 	void			*msg_private;
 	struct lnet_libmd	*msg_md;
+	/* the NI the message was sent or received over */
+	struct lnet_ni       *msg_txni;
+	struct lnet_ni       *msg_rxni;
 
 	unsigned int		 msg_len;
 	unsigned int		 msg_wanted;
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index 1c874025fa74..b2a52ddcefcb 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -782,6 +782,7 @@ lnet_return_tx_credits_locked(struct lnet_msg *msg)
 {
 	struct lnet_peer *txpeer = msg->msg_txpeer;
 	struct lnet_msg *msg2;
+	struct lnet_ni	*txni = msg->msg_txni;
 
 	if (msg->msg_txcredit) {
 		struct lnet_ni *ni = txpeer->lp_ni;
@@ -829,6 +830,11 @@ lnet_return_tx_credits_locked(struct lnet_msg *msg)
 		}
 	}
 
+	if (txni != NULL) {
+		msg->msg_txni = NULL;
+		lnet_ni_decref_locked(txni, msg->msg_tx_cpt);
+	}
+
 	if (txpeer) {
 		msg->msg_txpeer = NULL;
 		lnet_peer_decref_locked(txpeer);
@@ -876,6 +882,7 @@ void
 lnet_return_rx_credits_locked(struct lnet_msg *msg)
 {
 	struct lnet_peer *rxpeer = msg->msg_rxpeer;
+	struct lnet_ni	*rxni = msg->msg_rxni;
 	struct lnet_msg *msg2;
 
 	if (msg->msg_rtrcredit) {
@@ -951,6 +958,10 @@ lnet_return_rx_credits_locked(struct lnet_msg *msg)
 			(void)lnet_post_routed_recv_locked(msg2, 1);
 		}
 	}
+	if (rxni != NULL) {
+		msg->msg_rxni = NULL;
+		lnet_ni_decref_locked(rxni, msg->msg_rx_cpt);
+	}
 	if (rxpeer) {
 		msg->msg_rxpeer = NULL;
 		lnet_peer_decref_locked(rxpeer);
@@ -1218,9 +1229,12 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
 
 	LASSERT(!msg->msg_peertxcredit);
 	LASSERT(!msg->msg_txcredit);
-	LASSERT(!msg->msg_txpeer);
+	LASSERT(msg->msg_txpeer == NULL);
 
-	msg->msg_txpeer = lp;		   /* msg takes my ref on lp */
+	msg->msg_txpeer = lp;                   /* msg takes my ref on lp */
+	/* set the NI for this message */
+	msg->msg_txni = src_ni;
+	lnet_ni_addref_locked(msg->msg_txni, cpt);
 
 	rc = lnet_post_send_locked(msg, 0);
 	lnet_net_unlock(cpt);
@@ -1818,6 +1832,8 @@ lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid,
 			return 0;
 		goto drop;
 	}
+	msg->msg_rxni = ni;
+	lnet_ni_addref_locked(ni, cpt);
 
 	if (lnet_isrouter(msg->msg_rxpeer)) {
 		lnet_peer_set_alive(msg->msg_rxpeer);
@@ -1934,6 +1950,7 @@ lnet_recv_delayed_msg_list(struct list_head *head)
 		LASSERT(msg->msg_rx_delayed);
 		LASSERT(msg->msg_md);
 		LASSERT(msg->msg_rxpeer);
+		LASSERT(msg->msg_rxni);
 		LASSERT(msg->msg_hdr.type == LNET_MSG_PUT);
 
 		CDEBUG(D_NET, "Resuming delayed PUT from %s portal %d match %llu offset %d length %d.\n",


* [lustre-devel] [PATCH 07/34] lnet: change lnet_peer to reference the net, rather than ni.
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (14 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 01/34] struct lnet_ni - reformat comments NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-10 23:17   ` James Simmons
  2018-09-07  0:49 ` [lustre-devel] [PATCH 08/34] lnet: add cpt to lnet_match_info NeilBrown
                   ` (18 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

As a net will soon have multiple NIs, a peer should identify
just the net.
In the various places where we need the NI, we now use msg_rxni or
msg_txni from the message.
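
To illustrate the consequence for callers that used to read lp_ni, here is
a rough userspace sketch of choosing a source NI once the peer only knows
its net.  The struct net, struct ni, struct peer and pick_src_ni() names
are hypothetical, and returning NULL on a mismatch is a choice of the
sketch; the patch itself asserts with LASSERT() instead:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for lnet_net, lnet_ni and lnet_peer. */
struct ni;

struct net {
	struct ni *first_ni;	/* first NI on this net, NULL if none */
};

struct ni {
	struct net *net;	/* owning net, playing the role of ni_net */
};

struct peer {
	struct net *net;	/* replaces the old per-peer NI pointer */
};

/*
 * Choose the NI to send from:
 *  - no NI requested: fall back to the first NI on the peer's net;
 *  - NI requested: accept it only if it sits on the peer's net.
 * Returns NULL when no suitable NI exists.
 */
static struct ni *pick_src_ni(const struct peer *lp, struct ni *src_ni)
{
	assert(lp != NULL && lp->net != NULL);

	if (src_ni == NULL)
		return lp->net->first_ni;

	return src_ni->net == lp->net ? src_ni : NULL;
}
```

Comparing nets rather than NIs is what lets several NIs on one net all be
valid sources for the same peer.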

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |    3 +
 .../staging/lustre/include/linux/lnet/lib-types.h  |    5 +-
 drivers/staging/lustre/lnet/lnet/api-ni.c          |   13 +++++
 drivers/staging/lustre/lnet/lnet/lib-move.c        |   49 +++++++++++---------
 drivers/staging/lustre/lnet/lnet/lib-ptl.c         |    2 -
 drivers/staging/lustre/lnet/lnet/net_fault.c       |    3 +
 drivers/staging/lustre/lnet/lnet/peer.c            |   26 ++++-------
 drivers/staging/lustre/lnet/lnet/router.c          |   14 +++---
 drivers/staging/lustre/lnet/lnet/router_proc.c     |    2 -
 9 files changed, 67 insertions(+), 50 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 4440b87299c4..34509e52bac7 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -435,6 +435,7 @@ int lnet_dyn_add_ni(lnet_pid_t requested_pid,
 		    struct lnet_ioctl_config_data *conf);
 int lnet_dyn_del_ni(__u32 net);
 int lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason);
+struct lnet_net *lnet_get_net_locked(__u32 net_id);
 
 int lnet_islocalnid(lnet_nid_t nid);
 int lnet_islocalnet(__u32 net);
@@ -617,7 +618,7 @@ int lnet_sock_connect(struct socket **sockp, int *fatal,
 void libcfs_sock_release(struct socket *sock);
 
 int lnet_peers_start_down(void);
-int lnet_peer_buffer_credits(struct lnet_ni *ni);
+int lnet_peer_buffer_credits(struct lnet_net *net);
 
 int lnet_router_checker_start(void);
 void lnet_router_checker_stop(void);
diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 16a493529a46..255c6c4bbb89 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -396,7 +396,8 @@ struct lnet_peer {
 	time64_t		 lp_last_query;	/* when lp_ni was queried
 						 * last time
 						 */
-	struct lnet_ni		*lp_ni;		/* interface peer is on */
+	/* network peer is on */
+	struct lnet_net		*lp_net;
 	lnet_nid_t		 lp_nid;	/* peer's NID */
 	int			 lp_refcount;	/* # refs */
 	int			 lp_cpt;	/* CPT this peer attached on */
@@ -427,7 +428,7 @@ struct lnet_peer_table {
  * lnet_ni::ni_peertimeout has been set to a positive value
  */
 #define lnet_peer_aliveness_enabled(lp) (the_lnet.ln_routing && \
-					 (lp)->lp_ni->ni_net->net_tunables.lct_peer_timeout > 0)
+					 (lp)->lp_net->net_tunables.lct_peer_timeout > 0)
 
 struct lnet_route {
 	struct list_head	 lr_list;	/* chain on net */
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 05687278334a..c21aef32cdde 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -680,6 +680,19 @@ lnet_net2ni(__u32 net)
 }
 EXPORT_SYMBOL(lnet_net2ni);
 
+struct lnet_net *
+lnet_get_net_locked(__u32 net_id)
+{
+	struct lnet_net	 *net;
+
+	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+		if (net->net_id == net_id)
+			return net;
+	}
+
+	return NULL;
+}
+
 static unsigned int
 lnet_nid_cpt_hash(lnet_nid_t nid, unsigned int number)
 {
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index b2a52ddcefcb..b8b15f56a275 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -525,7 +525,7 @@ lnet_peer_is_alive(struct lnet_peer *lp, unsigned long now)
 		return 0;
 
 	deadline = lp->lp_last_alive +
-		lp->lp_ni->ni_net->net_tunables.lct_peer_timeout;
+		lp->lp_net->net_tunables.lct_peer_timeout;
 	alive = deadline > now;
 
 	/* Update obsolete lp_alive except for routers assumed to be dead
@@ -544,7 +544,7 @@ lnet_peer_is_alive(struct lnet_peer *lp, unsigned long now)
  *     may drop the lnet_net_lock
  */
 static int
-lnet_peer_alive_locked(struct lnet_peer *lp)
+lnet_peer_alive_locked(struct lnet_ni *ni, struct lnet_peer *lp)
 {
 	time64_t now = ktime_get_seconds();
 
@@ -570,13 +570,13 @@ lnet_peer_alive_locked(struct lnet_peer *lp)
 				      libcfs_nid2str(lp->lp_nid),
 				      now, next_query,
 				      lnet_queryinterval,
-				      lp->lp_ni->ni_net->net_tunables.lct_peer_timeout);
+				      lp->lp_net->net_tunables.lct_peer_timeout);
 			return 0;
 		}
 	}
 
 	/* query NI for latest aliveness news */
-	lnet_ni_query_locked(lp->lp_ni, lp);
+	lnet_ni_query_locked(ni, lp);
 
 	if (lnet_peer_is_alive(lp, now))
 		return 1;
@@ -600,7 +600,7 @@ static int
 lnet_post_send_locked(struct lnet_msg *msg, int do_send)
 {
 	struct lnet_peer *lp = msg->msg_txpeer;
-	struct lnet_ni *ni = lp->lp_ni;
+	struct lnet_ni *ni = msg->msg_txni;
 	int cpt = msg->msg_tx_cpt;
 	struct lnet_tx_queue *tq = ni->ni_tx_queues[cpt];
 
@@ -611,7 +611,7 @@ lnet_post_send_locked(struct lnet_msg *msg, int do_send)
 
 	/* NB 'lp' is always the next hop */
 	if (!(msg->msg_target.pid & LNET_PID_USERFLAG) &&
-	    !lnet_peer_alive_locked(lp)) {
+	    !lnet_peer_alive_locked(ni, lp)) {
 		the_lnet.ln_counters[cpt]->drop_count++;
 		the_lnet.ln_counters[cpt]->drop_length += msg->msg_len;
 		lnet_net_unlock(cpt);
@@ -770,7 +770,7 @@ lnet_post_routed_recv_locked(struct lnet_msg *msg, int do_recv)
 		int cpt = msg->msg_rx_cpt;
 
 		lnet_net_unlock(cpt);
-		lnet_ni_recv(lp->lp_ni, msg->msg_private, msg, 1,
+		lnet_ni_recv(msg->msg_rxni, msg->msg_private, msg, 1,
 			     0, msg->msg_len, msg->msg_len);
 		lnet_net_lock(cpt);
 	}
@@ -785,7 +785,7 @@ lnet_return_tx_credits_locked(struct lnet_msg *msg)
 	struct lnet_ni	*txni = msg->msg_txni;
 
 	if (msg->msg_txcredit) {
-		struct lnet_ni *ni = txpeer->lp_ni;
+		struct lnet_ni *ni = msg->msg_txni;
 		struct lnet_tx_queue *tq = ni->ni_tx_queues[msg->msg_tx_cpt];
 
 		/* give back NI txcredits */
@@ -800,7 +800,7 @@ lnet_return_tx_credits_locked(struct lnet_msg *msg)
 					  struct lnet_msg, msg_list);
 			list_del(&msg2->msg_list);
 
-			LASSERT(msg2->msg_txpeer->lp_ni == ni);
+			LASSERT(msg2->msg_txni == ni);
 			LASSERT(msg2->msg_tx_delayed);
 
 			(void)lnet_post_send_locked(msg2, 1);
@@ -869,7 +869,7 @@ lnet_drop_routed_msgs_locked(struct list_head *list, int cpt)
 
 	while(!list_empty(&drop)) {
 		msg = list_first_entry(&drop, struct lnet_msg, msg_list);
-		lnet_ni_recv(msg->msg_rxpeer->lp_ni, msg->msg_private, NULL,
+		lnet_ni_recv(msg->msg_rxni, msg->msg_private, NULL,
 			     0, 0, 0, msg->msg_hdr.payload_length);
 		list_del_init(&msg->msg_list);
 		lnet_finalize(NULL, msg, -ECANCELED);
@@ -1007,7 +1007,7 @@ lnet_compare_routes(struct lnet_route *r1, struct lnet_route *r2)
 }
 
 static struct lnet_peer *
-lnet_find_route_locked(struct lnet_ni *ni, lnet_nid_t target,
+lnet_find_route_locked(struct lnet_net *net, lnet_nid_t target,
 		       lnet_nid_t rtr_nid)
 {
 	struct lnet_remotenet *rnet;
@@ -1035,7 +1035,7 @@ lnet_find_route_locked(struct lnet_ni *ni, lnet_nid_t target,
 		if (!lnet_is_route_alive(route))
 			continue;
 
-		if (ni && lp->lp_ni != ni)
+		if (net && lp->lp_net != net)
 			continue;
 
 		if (lp->lp_nid == rtr_nid) /* it's pre-determined router */
@@ -1164,10 +1164,12 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
 			/* ENOMEM or shutting down */
 			return rc;
 		}
-		LASSERT(lp->lp_ni == src_ni);
+		LASSERT(lp->lp_net == src_ni->ni_net);
 	} else {
 		/* sending to a remote network */
-		lp = lnet_find_route_locked(src_ni, dst_nid, rtr_nid);
+		lp = lnet_find_route_locked(src_ni != NULL ?
+					    src_ni->ni_net : NULL,
+					    dst_nid, rtr_nid);
 		if (!lp) {
 			if (src_ni)
 				lnet_ni_decref_locked(src_ni, cpt);
@@ -1203,10 +1205,11 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
 		       lnet_msgtyp2str(msg->msg_type), msg->msg_len);
 
 		if (!src_ni) {
-			src_ni = lp->lp_ni;
+			src_ni = lnet_get_next_ni_locked(lp->lp_net, NULL);
+			LASSERT(src_ni != NULL);
 			src_nid = src_ni->ni_nid;
 		} else {
-			LASSERT(src_ni == lp->lp_ni);
+			LASSERT(src_ni->ni_net == lp->lp_net);
 			lnet_ni_decref_locked(src_ni, cpt);
 		}
 
@@ -1918,7 +1921,7 @@ lnet_drop_delayed_msg_list(struct list_head *head, char *reason)
 		 * called lnet_drop_message(), so I just hang onto msg as well
 		 * until that's done
 		 */
-		lnet_drop_message(msg->msg_rxpeer->lp_ni,
+		lnet_drop_message(msg->msg_rxni,
 				  msg->msg_rxpeer->lp_cpt,
 				  msg->msg_private, msg->msg_len);
 		/*
@@ -1926,7 +1929,7 @@ lnet_drop_delayed_msg_list(struct list_head *head, char *reason)
 		 * but we still should give error code so lnet_msg_decommit()
 		 * can skip counters operations and other checks.
 		 */
-		lnet_finalize(msg->msg_rxpeer->lp_ni, msg, -ENOENT);
+		lnet_finalize(msg->msg_rxni, msg, -ENOENT);
 	}
 }
 
@@ -1959,7 +1962,7 @@ lnet_recv_delayed_msg_list(struct list_head *head)
 		       msg->msg_hdr.msg.put.offset,
 		       msg->msg_hdr.payload_length);
 
-		lnet_recv_put(msg->msg_rxpeer->lp_ni, msg);
+		lnet_recv_put(msg->msg_rxni, msg);
 	}
 }
 
@@ -2384,8 +2387,12 @@ LNetDist(lnet_nid_t dstnid, lnet_nid_t *srcnidp, __u32 *orderp)
 
 			LASSERT(shortest);
 			hops = shortest_hops;
-			if (srcnidp)
-				*srcnidp = shortest->lr_gateway->lp_ni->ni_nid;
+			if (srcnidp) {
+				ni = lnet_get_next_ni_locked(
+					shortest->lr_gateway->lp_net,
+					NULL);
+				*srcnidp = ni->ni_nid;
+			}
 			if (orderp)
 				*orderp = order;
 			lnet_net_unlock(cpt);
diff --git a/drivers/staging/lustre/lnet/lnet/lib-ptl.c b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
index fc47379c5938..4c5737083422 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-ptl.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
@@ -946,7 +946,7 @@ lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason)
 		/* grab all messages which are on the NI passed in */
 		list_for_each_entry_safe(msg, tmp, &ptl->ptl_msg_delayed,
 					 msg_list) {
-			if (msg->msg_rxpeer->lp_ni == ni)
+			if (msg->msg_txni == ni || msg->msg_rxni == ni)
 				list_move(&msg->msg_list, &zombies);
 		}
 	} else {
diff --git a/drivers/staging/lustre/lnet/lnet/net_fault.c b/drivers/staging/lustre/lnet/lnet/net_fault.c
index 41d6131ee15a..6c53ae1811e5 100644
--- a/drivers/staging/lustre/lnet/lnet/net_fault.c
+++ b/drivers/staging/lustre/lnet/lnet/net_fault.c
@@ -601,8 +601,9 @@ delayed_msg_process(struct list_head *msg_list, bool drop)
 
 		msg = list_entry(msg_list->next, struct lnet_msg, msg_list);
 		LASSERT(msg->msg_rxpeer);
+		LASSERT(msg->msg_rxni != NULL);
 
-		ni = msg->msg_rxpeer->lp_ni;
+		ni = msg->msg_rxni;
 		cpt = msg->msg_rx_cpt;
 
 		list_del_init(&msg->msg_list);
diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c
index b76ac3e051d9..ed29124ebded 100644
--- a/drivers/staging/lustre/lnet/lnet/peer.c
+++ b/drivers/staging/lustre/lnet/lnet/peer.c
@@ -112,7 +112,7 @@ lnet_peer_table_cleanup_locked(struct lnet_ni *ni,
 	for (i = 0; i < LNET_PEER_HASH_SIZE; i++) {
 		list_for_each_entry_safe(lp, tmp, &ptable->pt_hash[i],
 					 lp_hashlist) {
-			if (ni && ni != lp->lp_ni)
+			if (ni && ni->ni_net != lp->lp_net)
 				continue;
 			list_del_init(&lp->lp_hashlist);
 			/* Lose hash table's ref */
@@ -154,7 +154,7 @@ lnet_peer_table_del_rtrs_locked(struct lnet_ni *ni,
 	for (i = 0; i < LNET_PEER_HASH_SIZE; i++) {
 		list_for_each_entry_safe(lp, tmp, &ptable->pt_hash[i],
 					 lp_hashlist) {
-			if (ni != lp->lp_ni)
+			if (ni->ni_net != lp->lp_net)
 				continue;
 
 			if (!lp->lp_rtr_refcount)
@@ -230,8 +230,7 @@ lnet_destroy_peer_locked(struct lnet_peer *lp)
 	LASSERT(ptable->pt_number > 0);
 	ptable->pt_number--;
 
-	lnet_ni_decref_locked(lp->lp_ni, lp->lp_cpt);
-	lp->lp_ni = NULL;
+	lp->lp_net = NULL;
 
 	list_add(&lp->lp_hashlist, &ptable->pt_deathrow);
 	LASSERT(ptable->pt_zombies > 0);
@@ -336,16 +335,11 @@ lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt)
 		goto out;
 	}
 
-	lp->lp_ni = lnet_net2ni_locked(LNET_NIDNET(nid), cpt2);
-	if (!lp->lp_ni) {
-		rc = -EHOSTUNREACH;
-		goto out;
-	}
-
-	lp->lp_txcredits = lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
-	lp->lp_mintxcredits = lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
-	lp->lp_rtrcredits = lnet_peer_buffer_credits(lp->lp_ni);
-	lp->lp_minrtrcredits = lnet_peer_buffer_credits(lp->lp_ni);
+	lp->lp_net = lnet_get_net_locked(LNET_NIDNET(lp->lp_nid));
+	lp->lp_txcredits =
+		lp->lp_mintxcredits = lp->lp_net->net_tunables.lct_peer_tx_credits;
+	lp->lp_rtrcredits =
+		lp->lp_minrtrcredits = lnet_peer_buffer_credits(lp->lp_net);
 
 	list_add_tail(&lp->lp_hashlist,
 		      &ptable->pt_hash[lnet_nid2peerhash(nid)]);
@@ -383,7 +377,7 @@ lnet_debug_peer(lnet_nid_t nid)
 
 	CDEBUG(D_WARNING, "%-24s %4d %5s %5d %5d %5d %5d %5d %ld\n",
 	       libcfs_nid2str(lp->lp_nid), lp->lp_refcount,
-	       aliveness, lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits,
+	       aliveness, lp->lp_net->net_tunables.lct_peer_tx_credits,
 	       lp->lp_rtrcredits, lp->lp_minrtrcredits,
 	       lp->lp_txcredits, lp->lp_mintxcredits, lp->lp_txqnob);
 
@@ -439,7 +433,7 @@ lnet_get_peer_info(__u32 peer_index, __u64 *nid,
 			*nid = lp->lp_nid;
 			*refcount = lp->lp_refcount;
 			*ni_peer_tx_credits =
-				lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
+				lp->lp_net->net_tunables.lct_peer_tx_credits;
 			*peer_tx_credits = lp->lp_txcredits;
 			*peer_rtr_credits = lp->lp_rtrcredits;
 			*peer_min_rtr_credits = lp->lp_mintxcredits;
diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index 135dfe793b0b..72b8ca2b0fc6 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -55,10 +55,8 @@ module_param(auto_down, int, 0444);
 MODULE_PARM_DESC(auto_down, "Automatically mark peers down on comms error");
 
 int
-lnet_peer_buffer_credits(struct lnet_ni *ni)
+lnet_peer_buffer_credits(struct lnet_net *net)
 {
-	struct lnet_net *net = ni->ni_net;
-
 	/* NI option overrides LNet default */
 	if (net->net_tunables.lct_peer_rtr_credits > 0)
 		return net->net_tunables.lct_peer_rtr_credits;
@@ -373,7 +371,7 @@ lnet_add_route(__u32 net, __u32 hops, lnet_nid_t gateway,
 		lnet_peer_addref_locked(route->lr_gateway); /* +1 for notify */
 		lnet_add_route_to_rnet(rnet2, route);
 
-		ni = route->lr_gateway->lp_ni;
+		ni = lnet_get_next_ni_locked(route->lr_gateway->lp_net, NULL);
 		lnet_net_unlock(LNET_LOCK_EX);
 
 		/* XXX Assume alive */
@@ -428,8 +426,8 @@ lnet_check_routes(void)
 					continue;
 				}
 
-				if (route->lr_gateway->lp_ni ==
-				    route2->lr_gateway->lp_ni)
+				if (route->lr_gateway->lp_net ==
+				    route2->lr_gateway->lp_net)
 					continue;
 
 				nid1 = route->lr_gateway->lp_nid;
@@ -952,6 +950,7 @@ lnet_ping_router_locked(struct lnet_peer *rtr)
 	struct lnet_rc_data *rcd = NULL;
 	time64_t now = ktime_get_seconds();
 	time64_t secs;
+	struct lnet_ni  *ni;
 
 	lnet_peer_addref_locked(rtr);
 
@@ -960,7 +959,8 @@ lnet_ping_router_locked(struct lnet_peer *rtr)
 		lnet_notify_locked(rtr, 1, 0, now);
 
 	/* Run any outstanding notifications */
-	lnet_ni_notify_locked(rtr->lp_ni, rtr);
+	ni = lnet_get_next_ni_locked(rtr->lp_net, NULL);
+	lnet_ni_notify_locked(ni, rtr);
 
 	if (!lnet_isrouter(rtr) ||
 	    the_lnet.ln_rc_state != LNET_RC_STATE_RUNNING) {
diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
index 2a366e9a8627..52714b898aac 100644
--- a/drivers/staging/lustre/lnet/lnet/router_proc.c
+++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
@@ -489,7 +489,7 @@ static int proc_lnet_peers(struct ctl_table *table, int write,
 			int nrefs = peer->lp_refcount;
 			time64_t lastalive = -1;
 			char *aliveness = "NA";
-			int maxcr = peer->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
+			int maxcr = peer->lp_net->net_tunables.lct_peer_tx_credits;
 			int txcr = peer->lp_txcredits;
 			int mintxcr = peer->lp_mintxcredits;
 			int rtrcr = peer->lp_rtrcredits;

* [lustre-devel] [PATCH 08/34] lnet: add cpt to lnet_match_info.
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (15 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 07/34] lnet: change lnet_peer to reference the net, rather than ni NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-10 23:25   ` Doug Oucharek
                     ` (2 more replies)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 17/34] lnet: move lnet_shutdown_lndnets down to after first use NeilBrown
                   ` (17 subsequent siblings)
  34 siblings, 3 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

This seems to be a more direct way to get the cpt
needed in lnet_mt_of_match().

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
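Not part of the patch: a minimal userspace sketch of the idea, assuming
hypothetical names (demo_match_info, demo_mt_cpt, DEMO_NCPTS) that do not
exist in LNet. The receive path records the CPT once in the match info,
and the match-table selection reads it back instead of re-hashing the
source NID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NCPTS 4	/* hypothetical stand-in for LNET_CPT_NUMBER */

/* Hypothetical, trimmed-down analogue of struct lnet_match_info. */
struct demo_match_info {
	uint64_t	mbits;
	unsigned int	cpt;	/* CPT recorded when the message was parsed */
	unsigned int	portal;
};

/*
 * Pick the CPT whose match table should be searched.  The rotor always
 * advances; when routed traffic is hashed, the recorded CPT is used
 * directly, otherwise the rotor value is spread round-robin.
 */
static unsigned int demo_mt_cpt(const struct demo_match_info *info,
				unsigned int *rotor, bool hash_routed)
{
	unsigned int r;

	assert(info != NULL && rotor != NULL);
	r = (*rotor)++;

	if (hash_routed)
		return info->cpt;

	return r % DEMO_NCPTS;
}
```

The sketch keeps only the two branches touched by this patch; the real
lnet_mt_of_match() has further cases that are omitted here.
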
 .../staging/lustre/include/linux/lnet/lib-types.h  |    1 +
 drivers/staging/lustre/lnet/lnet/lib-move.c        |    1 +
 drivers/staging/lustre/lnet/lnet/lib-ptl.c         |    2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 255c6c4bbb89..2d2c066a11ba 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -511,6 +511,7 @@ enum lnet_match_flags {
 struct lnet_match_info {
 	__u64			mi_mbits;
 	struct lnet_process_id	mi_id;
+	unsigned int		mi_cpt;
 	unsigned int		mi_opc;
 	unsigned int		mi_portal;
 	unsigned int		mi_rlength;
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index b8b15f56a275..b6e81a693fc3 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -1303,6 +1303,7 @@ lnet_parse_put(struct lnet_ni *ni, struct lnet_msg *msg)
 	info.mi_rlength	= hdr->payload_length;
 	info.mi_roffset	= hdr->msg.put.offset;
 	info.mi_mbits	= hdr->msg.put.match_bits;
+	info.mi_cpt	= msg->msg_rxpeer->lp_cpt;
 
 	msg->msg_rx_ready_delay = !ni->ni_net->net_lnd->lnd_eager_recv;
 	ready_delay = msg->msg_rx_ready_delay;
diff --git a/drivers/staging/lustre/lnet/lnet/lib-ptl.c b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
index 4c5737083422..90ce51801726 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-ptl.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
@@ -292,7 +292,7 @@ lnet_mt_of_match(struct lnet_match_info *info, struct lnet_msg *msg)
 
 	rotor = ptl->ptl_rotor++; /* get round-robin factor */
 	if (portal_rotor == LNET_PTL_ROTOR_HASH_RT && routed)
-		cpt = lnet_cpt_of_nid(msg->msg_hdr.src_nid);
+		cpt = info->mi_cpt;
 	else
 		cpt = rotor % LNET_CPT_NUMBER;
 

* [lustre-devel] [PATCH 09/34] lnet: add list of cpts to lnet_net.
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (4 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 14/34] lnet: rename lnet_find_net_locked to lnet_find_rnet_locked NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-10 23:28   ` Doug Oucharek
  2018-09-11  1:02   ` James Simmons
  2018-09-07  0:49 ` [lustre-devel] [PATCH 06/34] lnet: store separate xmit/recv net-interface in each message NeilBrown
                   ` (28 subsequent siblings)
  34 siblings, 2 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

struct lnet_net now has a list of cpts, which is the union
of the cpts for each lnet_ni.

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
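Not part of the patch: a minimal userspace sketch of the "union of cpts"
semantics described in the commit message, assuming hypothetical names
(demo_net, demo_contains, demo_net_append_cpts) and plain malloc()/free()
in place of the kernel allocators. It is only meant to show the intended
merge behaviour, not the code in config.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the two CPT fields added to struct lnet_net. */
struct demo_net {
	uint32_t ncpts;		/* number of entries in cpts */
	uint32_t *cpts;		/* union of the CPTs of all NIs, or NULL */
};

/* Return true when value is already present in array[0..size). */
static bool demo_contains(const uint32_t *array, uint32_t size, uint32_t value)
{
	uint32_t i;

	for (i = 0; i < size; i++) {
		if (array[i] == value)
			return true;
	}
	return false;
}

/*
 * Merge cpts[0..ncpts) into net->cpts, skipping values already present.
 * Returns 0 on success, -1 if memory could not be allocated.
 */
static int demo_net_append_cpts(struct demo_net *net,
				const uint32_t *cpts, uint32_t ncpts)
{
	uint32_t *merged;
	uint32_t i, total;

	assert(net != NULL);
	if (ncpts == 0)
		return 0;

	/* Worst case: every incoming CPT is new. */
	merged = malloc((net->ncpts + ncpts) * sizeof(*merged));
	if (!merged)
		return -1;

	total = net->ncpts;
	if (total > 0)
		memcpy(merged, net->cpts, total * sizeof(*merged));

	for (i = 0; i < ncpts; i++) {
		if (!demo_contains(merged, total, cpts[i]))
			merged[total++] = cpts[i];
	}

	free(net->cpts);
	net->cpts = merged;
	net->ncpts = total;
	return 0;
}
```

Appending {0, 2} and then {2, 3} to an empty demo_net leaves it holding
{0, 2, 3}. Unlike the patch, the sketch does not special-case "all CPTs".
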
 .../staging/lustre/include/linux/lnet/lib-types.h  |    6 +
 drivers/staging/lustre/lnet/lnet/config.c          |  164 ++++++++++++++++++++
 2 files changed, 170 insertions(+)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 2d2c066a11ba..22957d142cc0 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -266,6 +266,12 @@ struct lnet_net {
 	 * lnet/include/lnet/nidstr.h */
 	__u32			net_id;
 
+	/* total number of CPTs in the array */
+	__u32			net_ncpts;
+
+	/* cumulative CPTs of all NIs in this net */
+	__u32			*net_cpts;
+
 	/* network tunables */
 	struct lnet_ioctl_config_lnd_cmn_tunables net_tunables;
 
diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index e83bdbec11e3..380a3fb1caba 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -91,11 +91,169 @@ lnet_net_unique(__u32 net, struct list_head *netlist)
 	return true;
 }
 
+static bool
+in_array(__u32 *array, __u32 size, __u32 value)
+{
+	int i;
+
+	for (i = 0; i < size; i++) {
+		if (array[i] == value)
+			return true;
+	}
+
+	return false;
+}
+
+static int
+lnet_net_append_cpts(__u32 *cpts, __u32 ncpts, struct lnet_net *net)
+{
+	__u32 *added_cpts = NULL;
+	int i, j = 0, rc = 0;
+
+	/*
+	 * no need to go further since a subset of the NIs already exist on
+	 * all CPTs
+	 */
+	if (net->net_ncpts == LNET_CPT_NUMBER)
+		return 0;
+
+	if (cpts == NULL) {
+		/* there is an NI which will exist on all CPTs */
+		if (net->net_cpts != NULL)
+			kvfree(net->net_cpts);
+		net->net_cpts = NULL;
+		net->net_ncpts = LNET_CPT_NUMBER;
+		return 0;
+	}
+
+	if (net->net_cpts == NULL) {
+		net->net_cpts = kmalloc_array(ncpts, sizeof(*net->net_cpts),
+					      GFP_KERNEL);
+		if (net->net_cpts == NULL)
+			return -ENOMEM;
+		memcpy(net->net_cpts, cpts, ncpts * sizeof(*net->net_cpts));
+		return 0;
+	}
+
+	added_cpts = kmalloc_array(LNET_CPT_NUMBER, sizeof(*added_cpts),
+				   GFP_KERNEL);
+	if (added_cpts == NULL)
+		return -ENOMEM;
+
+	for (i = 0; i < ncpts; i++) {
+		if (!in_array(net->net_cpts, net->net_ncpts, cpts[i])) {
+			added_cpts[j] = cpts[i];
+			j++;
+		}
+	}
+
+	/* append the new cpts if any to the list of cpts in the net */
+	if (j > 0) {
+		__u32 *array = NULL, *loc;
+		__u32 total_entries = j + net->net_ncpts;
+
+		array = kmalloc_array(total_entries, sizeof(*net->net_cpts),
+				      GFP_KERNEL);
+		if (array == NULL) {
+			rc = -ENOMEM;
+			goto failed;
+		}
+
+		memcpy(array, net->net_cpts,
+		       net->net_ncpts * sizeof(*net->net_cpts));
+		loc = array + net->net_ncpts;
+		memcpy(loc, added_cpts, j * sizeof(*net->net_cpts));
+
+		kfree(net->net_cpts);
+		net->net_ncpts = total_entries;
+		net->net_cpts = array;
+	}
+
+failed:
+	kfree(added_cpts);
+
+	return rc;
+}
+
+static void
+lnet_net_remove_cpts(__u32 *cpts, __u32 ncpts, struct lnet_net *net)
+{
+	struct lnet_ni *ni;
+	int rc;
+
+	/*
+	 * Operation Assumption:
+	 *	This function is called after an NI has been removed from
+	 *	its parent net.
+	 *
+	 * if we're removing an NI which exists on all CPTs then
+	 * we have to check if any of the other NIs on this net also
+	 * exists on all CPTs. If none, then we need to build our Net CPT
+	 * list based on the remaining NIs.
+	 *
+	 * If the NI being removed exists on a subset of the CPTs then we
+	 * also rebuild the Net CPT list based on the remaining NIs, which
+	 * should result in the expected Net CPT list.
+	 */
+
+	/*
+	 * sometimes this function can be called due to some failure
+	 * creating an NI, before any of the cpts are allocated, so check
+	 * for that case and don't do anything
+	 */
+	if (ncpts == 0)
+		return;
+
+	if (ncpts == LNET_CPT_NUMBER) {
+		/*
+		 * first iteration through the NI list in the net to see
+		 * if any of the NIs exist on all the CPTs. If one is
+		 * found then our job is done.
+		 */
+		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+			if (ni->ni_ncpts == LNET_CPT_NUMBER)
+				return;
+		}
+	}
+
+	/*
+	 * Rebuild the Net CPT list again, thereby including only the
+	 * CPTs which the remaining NIs are associated with.
+	 */
+	if (net->net_cpts != NULL) {
+		kfree(net->net_cpts);
+		net->net_cpts = NULL;
+	}
+
+	list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+		rc = lnet_net_append_cpts(ni->ni_cpts, ni->ni_ncpts,
+					  net);
+		if (rc != 0) {
+			CERROR("Out of Memory\n");
+			/*
+			 * do our best to keep on going. Delete
+			 * the net cpts and set it to NULL. This
+			 * way we can keep on going but less
+			 * efficiently, since memory accesses might be
+			 * across CPT lines.
+			 */
+			if (net->net_cpts != NULL) {
+				kfree(net->net_cpts);
+				net->net_cpts = NULL;
+				net->net_ncpts = LNET_CPT_NUMBER;
+			}
+			return;
+		}
+	}
+}
+
 void
 lnet_ni_free(struct lnet_ni *ni)
 {
 	int i;
 
+	lnet_net_remove_cpts(ni->ni_cpts, ni->ni_ncpts, ni->ni_net);
+
 	if (ni->ni_refs)
 		cfs_percpt_free(ni->ni_refs);
 
@@ -128,6 +286,9 @@ lnet_net_free(struct lnet_net *net)
 		lnet_ni_free(ni);
 	}
 
+	if (net->net_cpts != NULL)
+		kfree(net->net_cpts);
+
 	kfree(net);
 }
 
@@ -229,6 +390,9 @@ lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
 		ni->ni_net_ns = NULL;
 
 	ni->ni_last_alive = ktime_get_real_seconds();
+	rc = lnet_net_append_cpts(ni->ni_cpts, ni->ni_ncpts, net);
+	if (rc != 0)
+		goto failed;
 	list_add_tail(&ni->ni_netlist, &net->net_ni_list);
 
 	return ni;

* [lustre-devel] [PATCH 10/34] lnet: add ni arg to lnet_cpt_of_nid()
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (12 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 13/34] lnet: reverse order of lnet_startup_lnd{net, ni} NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-10 23:32   ` Doug Oucharek
  2018-09-11  1:03   ` James Simmons
  2018-09-07  0:49 ` [lustre-devel] [PATCH 01/34] struct lnet_ni - reformat comments NeilBrown
                   ` (20 subsequent siblings)
  34 siblings, 2 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

When choosing a cpt to use for a given network (identified by nid),
the choice might depend on a particular interface which has
already been identified - different interfaces can have different
sets of cpts.

So add an 'ni' arg to lnet_cpt_of_nid(). If given, choose a cpt
from the cpts of that interface. If not given, choose one from
the set of all cpts associated with any interface on the network.

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
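Not part of the patch: a minimal userspace sketch of the selection rule
described in the commit message, assuming hypothetical names
(demo_cpt_set, demo_nid_hash, demo_cpt_of_nid, DEMO_CPT_NUMBER) and a
trivial modulo hash in place of lnet_nid_cpt_hash(). The net is passed
in explicitly here, whereas the patch looks it up from the nid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CPT_NUMBER 4	/* hypothetical stand-in for LNET_CPT_NUMBER */

/* A set of CPTs; cpts == NULL means "associated with all CPTs". */
struct demo_cpt_set {
	uint32_t	ncpts;
	const uint32_t	*cpts;
};

/* Hypothetical hash: the real lnet_nid_cpt_hash() is more elaborate. */
static unsigned int demo_nid_hash(uint64_t nid, unsigned int number)
{
	assert(number > 0);
	return (unsigned int)(nid % number);
}

/* Hash into the explicit list if there is one, else across all CPTs. */
static int demo_pick_from(uint64_t nid, const struct demo_cpt_set *set)
{
	if (set != NULL && set->cpts != NULL)
		return (int)set->cpts[demo_nid_hash(nid, set->ncpts)];

	return (int)demo_nid_hash(nid, DEMO_CPT_NUMBER);
}

/*
 * If an interface is given, choose from its CPTs; otherwise choose from
 * the CPTs of the whole net.  Either argument may be NULL.
 */
static int demo_cpt_of_nid(uint64_t nid, const struct demo_cpt_set *ni,
			   const struct demo_cpt_set *net)
{
	if (ni != NULL)
		return demo_pick_from(nid, ni);

	return demo_pick_from(nid, net);
}
```

With an interface bound to CPTs {1, 3}, nid 5 hashes to index 1 and so
lands on CPT 3. An interface without an explicit list falls back to a
hash across all CPTs, even if the net has a list of its own.
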
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |    4 +-
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    4 +-
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |    2 -
 .../staging/lustre/lnet/klnds/socklnd/socklnd.c    |    4 +-
 drivers/staging/lustre/lnet/lnet/api-ni.c          |   41 ++++++++++++--------
 drivers/staging/lustre/lnet/lnet/lib-move.c        |   12 +++---
 drivers/staging/lustre/lnet/lnet/lib-ptl.c         |    2 -
 drivers/staging/lustre/lnet/lnet/peer.c            |    4 +-
 drivers/staging/lustre/lnet/lnet/router.c          |    4 +-
 drivers/staging/lustre/lnet/selftest/brw_test.c    |    2 -
 drivers/staging/lustre/lnet/selftest/framework.c   |    3 +
 drivers/staging/lustre/lnet/selftest/selftest.h    |    2 -
 12 files changed, 48 insertions(+), 36 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 34509e52bac7..e32dbb854d80 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -395,8 +395,8 @@ lnet_net2rnethash(__u32 net)
 extern struct lnet_lnd the_lolnd;
 extern int avoid_asym_router_failure;
 
-int lnet_cpt_of_nid_locked(lnet_nid_t nid);
-int lnet_cpt_of_nid(lnet_nid_t nid);
+int lnet_cpt_of_nid_locked(lnet_nid_t nid, struct lnet_ni *ni);
+int lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni);
 struct lnet_ni *lnet_nid2ni_locked(lnet_nid_t nid, int cpt);
 struct lnet_ni *lnet_net2ni_locked(__u32 net, int cpt);
 struct lnet_ni *lnet_net2ni(__u32 net);
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
index ade566d20c69..958ac9a99045 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
@@ -320,7 +320,7 @@ int kiblnd_create_peer(struct lnet_ni *ni, struct kib_peer **peerp,
 {
 	struct kib_peer *peer;
 	struct kib_net *net = ni->ni_data;
-	int cpt = lnet_cpt_of_nid(nid);
+	int cpt = lnet_cpt_of_nid(nid, ni);
 	unsigned long flags;
 
 	LASSERT(net);
@@ -643,7 +643,7 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer *peer, struct rdma_cm_id *cm
 
 	dev = net->ibn_dev;
 
-	cpt = lnet_cpt_of_nid(peer->ibp_nid);
+	cpt = lnet_cpt_of_nid(peer->ibp_nid, peer->ibp_ni);
 	sched = kiblnd_data.kib_scheds[cpt];
 
 	LASSERT(sched->ibs_nthreads > 0);
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index c266940cb2ae..e64c14914924 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -119,7 +119,7 @@ kiblnd_get_idle_tx(struct lnet_ni *ni, lnet_nid_t target)
 	struct kib_tx *tx;
 	struct kib_tx_poolset *tps;
 
-	tps = net->ibn_tx_ps[lnet_cpt_of_nid(target)];
+	tps = net->ibn_tx_ps[lnet_cpt_of_nid(target, ni)];
 	node = kiblnd_pool_alloc_node(&tps->tps_poolset);
 	if (!node)
 		return NULL;
diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
index 2036a0ae5917..ba68bcee90bc 100644
--- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
+++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
@@ -101,7 +101,7 @@ static int
 ksocknal_create_peer(struct ksock_peer **peerp, struct lnet_ni *ni,
 		     struct lnet_process_id id)
 {
-	int cpt = lnet_cpt_of_nid(id.nid);
+	int cpt = lnet_cpt_of_nid(id.nid, ni);
 	struct ksock_net *net = ni->ni_data;
 	struct ksock_peer *peer;
 
@@ -1099,7 +1099,7 @@ ksocknal_create_conn(struct lnet_ni *ni, struct ksock_route *route,
 	LASSERT(conn->ksnc_proto);
 	LASSERT(peerid.nid != LNET_NID_ANY);
 
-	cpt = lnet_cpt_of_nid(peerid.nid);
+	cpt = lnet_cpt_of_nid(peerid.nid, ni);
 
 	if (active) {
 		ksocknal_peer_addref(peer);
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index c21aef32cdde..6e0b8310574d 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -713,31 +713,41 @@ lnet_nid_cpt_hash(lnet_nid_t nid, unsigned int number)
 }
 
 int
-lnet_cpt_of_nid_locked(lnet_nid_t nid)
+lnet_cpt_of_nid_locked(lnet_nid_t nid, struct lnet_ni *ni)
 {
-	struct lnet_ni *ni;
+	struct lnet_net *net;
 
 	/* must called with hold of lnet_net_lock */
 	if (LNET_CPT_NUMBER == 1)
 		return 0; /* the only one */
 
-	/* take lnet_net_lock(any) would be OK */
-	if (!list_empty(&the_lnet.ln_nis_cpt)) {
-		list_for_each_entry(ni, &the_lnet.ln_nis_cpt, ni_cptlist) {
-			if (LNET_NIDNET(ni->ni_nid) != LNET_NIDNET(nid))
-				continue;
+	/*
+	 * If NI is provided then use the CPT identified in the NI cpt
+	 * list if one exists. If one doesn't exist, then that NI is
+	 * associated with all CPTs and it follows that the net it belongs
+	 * to is implicitly associated with all CPTs, so just hash the nid
+	 * and return that.
+	 */
+	if (ni != NULL) {
+		if (ni->ni_cpts != NULL)
+			return ni->ni_cpts[lnet_nid_cpt_hash(nid,
+							     ni->ni_ncpts)];
+		else
+			return lnet_nid_cpt_hash(nid, LNET_CPT_NUMBER);
+	}
 
-			LASSERT(ni->ni_cpts);
-			return ni->ni_cpts[lnet_nid_cpt_hash
-					   (nid, ni->ni_ncpts)];
-		}
+	/* no NI provided so look at the net */
+	net = lnet_get_net_locked(LNET_NIDNET(nid));
+
+	if (net != NULL && net->net_cpts) {
+		return net->net_cpts[lnet_nid_cpt_hash(nid, net->net_ncpts)];
 	}
 
 	return lnet_nid_cpt_hash(nid, LNET_CPT_NUMBER);
 }
 
 int
-lnet_cpt_of_nid(lnet_nid_t nid)
+lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni)
 {
 	int cpt;
 	int cpt2;
@@ -745,11 +755,10 @@ lnet_cpt_of_nid(lnet_nid_t nid)
 	if (LNET_CPT_NUMBER == 1)
 		return 0; /* the only one */
 
-	if (list_empty(&the_lnet.ln_nis_cpt))
-		return lnet_nid_cpt_hash(nid, LNET_CPT_NUMBER);
-
 	cpt = lnet_net_lock_current();
-	cpt2 = lnet_cpt_of_nid_locked(nid);
+
+	cpt2 = lnet_cpt_of_nid_locked(nid, ni);
+
 	lnet_net_unlock(cpt);
 
 	return cpt2;
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index b6e81a693fc3..02cd1a5a466f 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -1095,7 +1095,9 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
 	msg->msg_sending = 1;
 
 	LASSERT(!msg->msg_tx_committed);
-	cpt = lnet_cpt_of_nid(rtr_nid == LNET_NID_ANY ? dst_nid : rtr_nid);
+	local_ni = lnet_net2ni(LNET_NIDNET(dst_nid));
+	cpt = lnet_cpt_of_nid(rtr_nid == LNET_NID_ANY ? dst_nid : rtr_nid,
+			      local_ni);
  again:
 	lnet_net_lock(cpt);
 
@@ -1188,7 +1190,7 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
 		 * was changed when we release the lock
 		 */
 		if (rtr_nid != lp->lp_nid) {
-			cpt2 = lnet_cpt_of_nid_locked(lp->lp_nid);
+			cpt2 = lp->lp_cpt;
 			if (cpt2 != cpt) {
 				if (src_ni)
 					lnet_ni_decref_locked(src_ni, cpt);
@@ -1677,7 +1679,7 @@ lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid,
 	payload_length = le32_to_cpu(hdr->payload_length);
 
 	for_me = (ni->ni_nid == dest_nid);
-	cpt = lnet_cpt_of_nid(from_nid);
+	cpt = lnet_cpt_of_nid(from_nid, ni);
 
 	switch (type) {
 	case LNET_MSG_ACK:
@@ -2149,7 +2151,7 @@ lnet_create_reply_msg(struct lnet_ni *ni, struct lnet_msg *getmsg)
 	lnet_msg_attach_md(msg, getmd, getmd->md_offset, getmd->md_length);
 	lnet_res_unlock(cpt);
 
-	cpt = lnet_cpt_of_nid(peer_id.nid);
+	cpt = lnet_cpt_of_nid(peer_id.nid, ni);
 
 	lnet_net_lock(cpt);
 	lnet_msg_commit(msg, cpt);
@@ -2160,7 +2162,7 @@ lnet_create_reply_msg(struct lnet_ni *ni, struct lnet_msg *getmsg)
 	return msg;
 
  drop:
-	cpt = lnet_cpt_of_nid(peer_id.nid);
+	cpt = lnet_cpt_of_nid(peer_id.nid, ni);
 
 	lnet_net_lock(cpt);
 	the_lnet.ln_counters[cpt]->drop_count++;
diff --git a/drivers/staging/lustre/lnet/lnet/lib-ptl.c b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
index 90ce51801726..c8d8162cc706 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-ptl.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
@@ -220,7 +220,7 @@ lnet_match2mt(struct lnet_portal *ptl, struct lnet_process_id id, __u64 mbits)
 
 	/* if it's a unique portal, return match-table hashed by NID */
 	return lnet_ptl_is_unique(ptl) ?
-	       ptl->ptl_mtables[lnet_cpt_of_nid(id.nid)] : NULL;
+	       ptl->ptl_mtables[lnet_cpt_of_nid(id.nid, NULL)] : NULL;
 }
 
 struct lnet_match_table *
diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c
index ed29124ebded..808ce25f1f00 100644
--- a/drivers/staging/lustre/lnet/lnet/peer.c
+++ b/drivers/staging/lustre/lnet/lnet/peer.c
@@ -270,7 +270,7 @@ lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt)
 		return -ESHUTDOWN;
 
 	/* cpt can be LNET_LOCK_EX if it's called from router functions */
-	cpt2 = cpt != LNET_LOCK_EX ? cpt : lnet_cpt_of_nid_locked(nid);
+	cpt2 = cpt != LNET_LOCK_EX ? cpt : lnet_cpt_of_nid_locked(nid, NULL);
 
 	ptable = the_lnet.ln_peer_tables[cpt2];
 	lp = lnet_find_peer_locked(ptable, nid);
@@ -362,7 +362,7 @@ lnet_debug_peer(lnet_nid_t nid)
 	int rc;
 	int cpt;
 
-	cpt = lnet_cpt_of_nid(nid);
+	cpt = lnet_cpt_of_nid(nid, NULL);
 	lnet_net_lock(cpt);
 
 	rc = lnet_nid2peer_locked(&lp, nid, cpt);
diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index 72b8ca2b0fc6..5493d13de6d9 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -1207,7 +1207,7 @@ lnet_router_checker(void *arg)
 		version = the_lnet.ln_routers_version;
 
 		list_for_each_entry(rtr, &the_lnet.ln_routers, lp_rtr_list) {
-			cpt2 = lnet_cpt_of_nid_locked(rtr->lp_nid);
+			cpt2 = rtr->lp_cpt;
 			if (cpt != cpt2) {
 				lnet_net_unlock(cpt);
 				cpt = cpt2;
@@ -1693,7 +1693,7 @@ lnet_notify(struct lnet_ni *ni, lnet_nid_t nid, int alive, time64_t when)
 {
 	struct lnet_peer *lp = NULL;
 	time64_t now = ktime_get_seconds();
-	int cpt = lnet_cpt_of_nid(nid);
+	int cpt = lnet_cpt_of_nid(nid, ni);
 
 	LASSERT(!in_interrupt());
 
diff --git a/drivers/staging/lustre/lnet/selftest/brw_test.c b/drivers/staging/lustre/lnet/selftest/brw_test.c
index f1ee219bc8f3..e372ff3044c8 100644
--- a/drivers/staging/lustre/lnet/selftest/brw_test.c
+++ b/drivers/staging/lustre/lnet/selftest/brw_test.c
@@ -124,7 +124,7 @@ brw_client_init(struct sfw_test_instance *tsi)
 		return -EINVAL;
 
 	list_for_each_entry(tsu, &tsi->tsi_units, tsu_list) {
-		bulk = srpc_alloc_bulk(lnet_cpt_of_nid(tsu->tsu_dest.nid),
+		bulk = srpc_alloc_bulk(lnet_cpt_of_nid(tsu->tsu_dest.nid, NULL),
 				       off, npg, len, opc == LST_BRW_READ);
 		if (!bulk) {
 			brw_client_fini(tsi);
diff --git a/drivers/staging/lustre/lnet/selftest/framework.c b/drivers/staging/lustre/lnet/selftest/framework.c
index 944a2a6598fa..a82efc394659 100644
--- a/drivers/staging/lustre/lnet/selftest/framework.c
+++ b/drivers/staging/lustre/lnet/selftest/framework.c
@@ -1013,7 +1013,8 @@ sfw_run_batch(struct sfw_batch *tsb)
 			tsu->tsu_loop = tsi->tsi_loop;
 			wi = &tsu->tsu_worker;
 			swi_init_workitem(wi, sfw_run_test,
-					  lst_test_wq[lnet_cpt_of_nid(tsu->tsu_dest.nid)]);
+					  lst_test_wq[lnet_cpt_of_nid(tsu->tsu_dest.nid,
+							  NULL)]);
 			swi_schedule_workitem(wi);
 		}
 	}
diff --git a/drivers/staging/lustre/lnet/selftest/selftest.h b/drivers/staging/lustre/lnet/selftest/selftest.h
index 9dbb0a51d430..edf783af90e8 100644
--- a/drivers/staging/lustre/lnet/selftest/selftest.h
+++ b/drivers/staging/lustre/lnet/selftest/selftest.h
@@ -527,7 +527,7 @@ srpc_init_client_rpc(struct srpc_client_rpc *rpc, struct lnet_process_id peer,
 
 	INIT_LIST_HEAD(&rpc->crpc_list);
 	swi_init_workitem(&rpc->crpc_wi, srpc_send_rpc,
-			  lst_test_wq[lnet_cpt_of_nid(peer.nid)]);
+			  lst_test_wq[lnet_cpt_of_nid(peer.nid, NULL)]);
 	spin_lock_init(&rpc->crpc_lock);
 	atomic_set(&rpc->crpc_refcount, 1); /* 1 ref for caller */
 

* [lustre-devel] [PATCH 11/34] lnet: pass tun to lnet_startup_lndni, instead of full conf
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-11 18:31   ` Amir Shehata
  2018-09-12  3:30   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 12/34] lnet: split lnet_startup_lndni NeilBrown
                   ` (33 subsequent siblings)
  34 siblings, 2 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

I don't understand parts of this change.
In particular, the removal of
       /* If given some LND tunable parameters, parse those now to
        * override the values in the NI structure. */

isn't clear to me.

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |   41 ++++++++---------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 6e0b8310574d..53ecfd700db3 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1240,10 +1240,8 @@ lnet_shutdown_lndni(struct lnet_ni *ni)
 }
 
 static int
-lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
+lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
 {
-	struct lnet_ioctl_config_lnd_tunables *lnd_tunables = NULL;
-	struct lnet_lnd_tunables *tun = NULL;
 	int rc = -EINVAL;
 	int lnd_type;
 	struct lnet_lnd *lnd;
@@ -1296,36 +1294,12 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
 
 	ni->ni_net->net_lnd = lnd;
 
-	if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf)) {
-		lnd_tunables = (struct lnet_ioctl_config_lnd_tunables *)conf->cfg_bulk;
-		tun = &lnd_tunables->lt_tun;
-	}
-
 	if (tun) {
 		memcpy(&ni->ni_lnd_tunables, tun,
 		       sizeof(*tun));
 		ni->ni_lnd_tunables_set = true;
 	}
 
-	/*
-	 * If given some LND tunable parameters, parse those now to
-	 * override the values in the NI structure.
-	 */
-	if (conf) {
-		if (conf->cfg_config_u.cfg_net.net_peer_rtr_credits >= 0)
-			ni->ni_net->net_tunables.lct_peer_rtr_credits =
-				conf->cfg_config_u.cfg_net.net_peer_rtr_credits;
-		if (conf->cfg_config_u.cfg_net.net_peer_timeout >= 0)
-			ni->ni_net->net_tunables.lct_peer_timeout =
-				conf->cfg_config_u.cfg_net.net_peer_timeout;
-		if (conf->cfg_config_u.cfg_net.net_peer_tx_credits != -1)
-			ni->ni_net->net_tunables.lct_peer_tx_credits =
-				conf->cfg_config_u.cfg_net.net_peer_tx_credits;
-		if (conf->cfg_config_u.cfg_net.net_max_tx_credits >= 0)
-			ni->ni_net->net_tunables.lct_max_tx_credits =
-				conf->cfg_config_u.cfg_net.net_max_tx_credits;
-	}
-
 	rc = lnd->lnd_startup(ni);
 
 	mutex_unlock(&the_lnet.ln_lnd_mutex);
@@ -1861,9 +1835,13 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
 	struct list_head net_head;
 	struct lnet_remotenet *rnet;
 	int rc;
+	struct lnet_ioctl_config_lnd_tunables *lnd_tunables = NULL;
 
 	INIT_LIST_HEAD(&net_head);
 
+	if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf))
+		lnd_tunables = (struct lnet_ioctl_config_lnd_tunables *)conf->cfg_bulk;
+
 	/* Create a net/ni structures for the network string */
 	rc = lnet_parse_networks(&net_head, nets);
 	if (rc <= 0)
@@ -1898,9 +1876,14 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
 		goto failed0;
 
 	list_del_init(&net->net_list);
+	if (lnd_tunables)
+		memcpy(&net->net_tunables,
+		       &lnd_tunables->lt_cmn, sizeof(lnd_tunables->lt_cmn));
+
 	ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
-	rc = lnet_startup_lndni(ni, conf);
-	if (rc)
+	rc = lnet_startup_lndni(ni, (lnd_tunables ?
+				     &lnd_tunables->lt_tun : NULL));
+	if (rc < 0)
 		goto failed1;
 
 	if (ni->ni_net->net_lnd->lnd_accept) {

* [lustre-devel] [PATCH 12/34] lnet: split lnet_startup_lndni
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
  2018-09-07  0:49 ` [lustre-devel] [PATCH 11/34] lnet: pass tun to lnet_startup_lndni, instead of full conf NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  3:39   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 16/34] lnet: lnet_shutdown_lndnets - remove some cleanup code NeilBrown
                   ` (32 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

Split into
  lnet_startup_lndnet
which starts all nis in a net, and
  lnet_startup_lndni
which starts an individual ni.

lnet_startup_lndni() returns 0 on success, or a negative error code.
lnet_startup_lndnis() returned the count of interfaces started.

The new lnet_startup_lndnet() returns the count of started interfaces.

This requires adding lnet_shutdown_lndnet() to handle errors
in lnet_dyn_add_ni(), which now uses the new lnet_startup_lndnet().

We now drop the ln_lnd_mutex near the end of lnet_startup_lndnet(),
and re-claim it for each lnet_startup_lndni().
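
The return-value convention described above can be sketched as follows.
This is a user-space illustration under stated assumptions, not the patch
itself: the rdemo_* names and the two flags in struct rdemo_net are
hypothetical, locking is left out, and the cleanup that the real list
walker performs on failure is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a network awaiting startup. */
struct rdemo_net {
	int duplicate_loopback;	/* loopback net that is already registered */
	int ni_fails;		/* force the interface startup to fail */
};

/* Per-interface startup: 0 on success, negative errno on failure. */
int rdemo_startup_ni(const struct rdemo_net *net)
{
	assert(net);
	return net->ni_fails ? -EIO : 0;
}

/* Per-net startup: number of interfaces started, or negative errno. */
int rdemo_startup_net(const struct rdemo_net *net)
{
	int rc;

	if (net->duplicate_loopback)
		return 0;	/* nothing new was started, but not an error */

	rc = rdemo_startup_ni(net);
	if (rc < 0)
		return rc;
	return 1;
}

/* Whole-list startup: sum the per-net counts, stop at the first error. */
int rdemo_startup_nets(const struct rdemo_net *nets, size_t n)
{
	int ni_count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int rc = rdemo_startup_net(&nets[i]);

		if (rc < 0)
			return rc;
		ni_count += rc;
	}
	return ni_count;
}
```

Because a per-net result of 0 is a legitimate success, callers have to
test for rc < 0 rather than for a non-zero rc.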

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |  142 +++++++++++++++++++++++------
 1 file changed, 111 insertions(+), 31 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 53ecfd700db3..8afddf11b5e2 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1239,32 +1239,61 @@ lnet_shutdown_lndni(struct lnet_ni *ni)
 	lnet_net_unlock(LNET_LOCK_EX);
 }
 
+static void
+lnet_shutdown_lndnet(struct lnet_net *net)
+{
+	struct lnet_ni *ni;
+
+	lnet_net_lock(LNET_LOCK_EX);
+
+	list_del_init(&net->net_list);
+
+	while (!list_empty(&net->net_ni_list)) {
+		ni = list_entry(net->net_ni_list.next,
+				struct lnet_ni, ni_netlist);
+		lnet_net_unlock(LNET_LOCK_EX);
+		lnet_shutdown_lndni(ni);
+		lnet_net_lock(LNET_LOCK_EX);
+	}
+
+	/*
+	 * decrement ref count on lnd only when the entire network goes
+	 * away
+	 */
+	net->net_lnd->lnd_refcount--;
+
+	lnet_net_unlock(LNET_LOCK_EX);
+
+	lnet_net_free(net);
+}
+
 static int
-lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
+lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun);
+
+static int
+lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 {
-	int rc = -EINVAL;
-	int lnd_type;
-	struct lnet_lnd *lnd;
-	struct lnet_tx_queue *tq;
-	int i;
-	u32 seed;
+	struct lnet_ni		*ni;
+	__u32			lnd_type;
+	struct lnet_lnd		*lnd;
+	int rc;
 
-	lnd_type = LNET_NETTYP(LNET_NIDNET(ni->ni_nid));
+	lnd_type = LNET_NETTYP(net->net_id);
 
 	LASSERT(libcfs_isknown_lnd(lnd_type));
 
 	/* Make sure this new NI is unique. */
 	lnet_net_lock(LNET_LOCK_EX);
-	rc = lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nets);
+	rc = lnet_net_unique(net->net_id, &the_lnet.ln_nets);
 	lnet_net_unlock(LNET_LOCK_EX);
 	if (!rc) {
 		if (lnd_type == LOLND) {
-			lnet_ni_free(ni);
+			lnet_net_free(net);
 			return 0;
 		}
 
 		CERROR("Net %s is not unique\n",
-		       libcfs_net2str(LNET_NIDNET(ni->ni_nid)));
+		       libcfs_net2str(net->net_id));
 		rc = -EEXIST;
 		goto failed0;
 	}
@@ -1291,8 +1320,32 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
 	lnet_net_lock(LNET_LOCK_EX);
 	lnd->lnd_refcount++;
 	lnet_net_unlock(LNET_LOCK_EX);
+	net->net_lnd = lnd;
+	mutex_unlock(&the_lnet.ln_lnd_mutex);
 
-	ni->ni_net->net_lnd = lnd;
+	ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
+
+	rc = lnet_startup_lndni(ni, tun);
+	if (rc < 0)
+		return rc;
+	return 1;
+
+failed0:
+	lnet_net_free(net);
+
+	return rc;
+}
+
+static int
+lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
+{
+	int			rc = -EINVAL;
+	struct lnet_tx_queue	*tq;
+	int			i;
+	struct lnet_net		*net = ni->ni_net;
+	u32			seed;
+
+	mutex_lock(&the_lnet.ln_lnd_mutex);
 
 	if (tun) {
 		memcpy(&ni->ni_lnd_tunables, tun,
@@ -1300,15 +1353,15 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
 		ni->ni_lnd_tunables_set = true;
 	}
 
-	rc = lnd->lnd_startup(ni);
+	rc = net->net_lnd->lnd_startup(ni);
 
 	mutex_unlock(&the_lnet.ln_lnd_mutex);
 
 	if (rc) {
 		LCONSOLE_ERROR_MSG(0x105, "Error %d starting up LNI %s\n",
-				   rc, libcfs_lnd2str(lnd->lnd_type));
+				   rc, libcfs_lnd2str(net->net_lnd->lnd_type));
 		lnet_net_lock(LNET_LOCK_EX);
-		lnd->lnd_refcount--;
+		net->net_lnd->lnd_refcount--;
 		lnet_net_unlock(LNET_LOCK_EX);
 		goto failed0;
 	}
@@ -1324,7 +1377,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
 
 	lnet_net_unlock(LNET_LOCK_EX);
 
-	if (lnd->lnd_type == LOLND) {
+	if (net->net_lnd->lnd_type == LOLND) {
 		lnet_ni_addref(ni);
 		LASSERT(!the_lnet.ln_loni);
 		the_lnet.ln_loni = ni;
@@ -1338,7 +1391,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
 	if (!ni->ni_net->net_tunables.lct_peer_tx_credits ||
 	    !ni->ni_net->net_tunables.lct_max_tx_credits) {
 		LCONSOLE_ERROR_MSG(0x107, "LNI %s has no %scredits\n",
-				   libcfs_lnd2str(lnd->lnd_type),
+				   libcfs_lnd2str(net->net_lnd->lnd_type),
 				   !ni->ni_net->net_tunables.lct_peer_tx_credits ?
 				   "" : "per-peer ");
 		/*
@@ -1375,21 +1428,22 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
 }
 
 static int
-lnet_startup_lndnis(struct list_head *nilist)
+lnet_startup_lndnets(struct list_head *netlist)
 {
-	struct lnet_ni *ni;
+	struct lnet_net *net;
 	int rc;
 	int ni_count = 0;
 
-	while (!list_empty(nilist)) {
-		ni = list_entry(nilist->next, struct lnet_ni, ni_netlist);
-		list_del(&ni->ni_netlist);
-		rc = lnet_startup_lndni(ni, NULL);
+	while (!list_empty(netlist)) {
+		net = list_entry(netlist->next, struct lnet_net, net_list);
+		list_del_init(&net->net_list);
+
+		rc = lnet_startup_lndnet(net, NULL);
 
 		if (rc < 0)
 			goto failed;
 
-		ni_count++;
+		ni_count += rc;
 	}
 
 	return ni_count;
@@ -1552,7 +1606,7 @@ LNetNIInit(lnet_pid_t requested_pid)
 			goto err_empty_list;
 	}
 
-	ni_count = lnet_startup_lndnis(&net_head);
+	ni_count = lnet_startup_lndnets(&net_head);
 	if (ni_count < 0) {
 		rc = ni_count;
 		goto err_empty_list;
@@ -1831,10 +1885,11 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
 	struct lnet_ping_info *pinfo;
 	struct lnet_handle_md md_handle;
 	struct lnet_net		*net;
-	struct lnet_ni *ni;
 	struct list_head net_head;
 	struct lnet_remotenet *rnet;
 	int rc;
+	int			num_acceptor_nets;
+	__u32			net_type;
 	struct lnet_ioctl_config_lnd_tunables *lnd_tunables = NULL;
 
 	INIT_LIST_HEAD(&net_head);
@@ -1876,22 +1931,47 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
 		goto failed0;
 
 	list_del_init(&net->net_list);
+
 	if (lnd_tunables)
 		memcpy(&net->net_tunables,
 		       &lnd_tunables->lt_cmn, sizeof(lnd_tunables->lt_cmn));
 
-	ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
-	rc = lnet_startup_lndni(ni, (lnd_tunables ?
+	/*
+	 * before starting this network get a count of the current TCP
+	 * networks which require the acceptor thread running. If that
+	 * count is == 0 before we start up this network, then we'd want to
+	 * start up the acceptor thread after starting up this network
+	 */
+	num_acceptor_nets = lnet_count_acceptor_nets();
+
+	/*
+	 * lnet_startup_lndnet() can deallocate 'net' even if it returns
+	 * success, because we ended up adding interfaces to an existing
+	 * network. So grab the net_type now
+	 */
+	net_type = LNET_NETTYP(net->net_id);
+
+	rc = lnet_startup_lndnet(net, (lnd_tunables ?
 				     &lnd_tunables->lt_tun : NULL));
 	if (rc < 0)
 		goto failed1;
 
-	if (ni->ni_net->net_lnd->lnd_accept) {
+	/*
+	 * Start the acceptor thread if this is the first network
+	 * being added that requires the thread.
+	 */
+	if (net_type == SOCKLND && num_acceptor_nets == 0) {
 		rc = lnet_acceptor_start();
 		if (rc < 0) {
-			/* shutdown the ni that we just started */
+			/* shutdown the net that we just started */
 			CERROR("Failed to start up acceptor thread\n");
-			lnet_shutdown_lndni(ni);
+			/*
+			 * Note that if we needed to start the acceptor
+			 * thread, then 'net' must have been the first TCP
+			 * network, therefore was unique, and therefore
+			 * wasn't deallocated by lnet_startup_lndnet()
+			 */
+			lnet_shutdown_lndnet(net);
 			goto failed1;
 		}
 	}

* [lustre-devel] [PATCH 13/34] lnet: reverse order of lnet_startup_lnd{net, ni}
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (11 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 05/34] lnet: begin separating "networks" from "network interfaces" NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  3:39   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 10/34] lnet: add ni arg to lnet_cpt_of_nid() NeilBrown
                   ` (21 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

Change the order - no other change.

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |  135 ++++++++++++++---------------
 1 file changed, 66 insertions(+), 69 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 8afddf11b5e2..09ea7e506128 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1267,75 +1267,6 @@ lnet_shutdown_lndnet(struct lnet_net *net)
 	lnet_net_free(net);
 }
 
-static int
-lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun);
-
-static int
-lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
-{
-	struct lnet_ni		*ni;
-	__u32			lnd_type;
-	struct lnet_lnd		*lnd;
-	int rc;
-
-	lnd_type = LNET_NETTYP(net->net_id);
-
-	LASSERT(libcfs_isknown_lnd(lnd_type));
-
-	/* Make sure this new NI is unique. */
-	lnet_net_lock(LNET_LOCK_EX);
-	rc = lnet_net_unique(net->net_id, &the_lnet.ln_nets);
-	lnet_net_unlock(LNET_LOCK_EX);
-	if (!rc) {
-		if (lnd_type == LOLND) {
-			lnet_net_free(net);
-			return 0;
-		}
-
-		CERROR("Net %s is not unique\n",
-		       libcfs_net2str(net->net_id));
-		rc = -EEXIST;
-		goto failed0;
-	}
-
-	mutex_lock(&the_lnet.ln_lnd_mutex);
-	lnd = lnet_find_lnd_by_type(lnd_type);
-
-	if (!lnd) {
-		mutex_unlock(&the_lnet.ln_lnd_mutex);
-		rc = request_module("%s", libcfs_lnd2modname(lnd_type));
-		mutex_lock(&the_lnet.ln_lnd_mutex);
-
-		lnd = lnet_find_lnd_by_type(lnd_type);
-		if (!lnd) {
-			mutex_unlock(&the_lnet.ln_lnd_mutex);
-			CERROR("Can't load LND %s, module %s, rc=%d\n",
-			       libcfs_lnd2str(lnd_type),
-			       libcfs_lnd2modname(lnd_type), rc);
-			rc = -EINVAL;
-			goto failed0;
-		}
-	}
-
-	lnet_net_lock(LNET_LOCK_EX);
-	lnd->lnd_refcount++;
-	lnet_net_unlock(LNET_LOCK_EX);
-	net->net_lnd = lnd;
-	mutex_unlock(&the_lnet.ln_lnd_mutex);
-
-	ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
-
-	rc = lnet_startup_lndni(ni, tun);
-	if (rc < 0)
-		return rc;
-	return 1;
-
-failed0:
-	lnet_net_free(net);
-
-	return rc;
-}
-
 static int
 lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
 {
@@ -1427,6 +1358,72 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
 	return rc;
 }
 
+static int
+lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
+{
+	struct lnet_ni		*ni;
+	__u32			lnd_type;
+	struct lnet_lnd		*lnd;
+	int			rc;
+
+	lnd_type = LNET_NETTYP(net->net_id);
+
+	LASSERT(libcfs_isknown_lnd(lnd_type));
+
+	/* Make sure this new NI is unique. */
+	lnet_net_lock(LNET_LOCK_EX);
+	rc = lnet_net_unique(net->net_id, &the_lnet.ln_nets);
+	lnet_net_unlock(LNET_LOCK_EX);
+	if (!rc) {
+		if (lnd_type == LOLND) {
+			lnet_net_free(net);
+			return 0;
+		}
+
+		CERROR("Net %s is not unique\n",
+		       libcfs_net2str(net->net_id));
+		rc = -EEXIST;
+		goto failed0;
+	}
+
+	mutex_lock(&the_lnet.ln_lnd_mutex);
+	lnd = lnet_find_lnd_by_type(lnd_type);
+
+	if (!lnd) {
+		mutex_unlock(&the_lnet.ln_lnd_mutex);
+		rc = request_module("%s", libcfs_lnd2modname(lnd_type));
+		mutex_lock(&the_lnet.ln_lnd_mutex);
+
+		lnd = lnet_find_lnd_by_type(lnd_type);
+		if (!lnd) {
+			mutex_unlock(&the_lnet.ln_lnd_mutex);
+			CERROR("Can't load LND %s, module %s, rc=%d\n",
+			       libcfs_lnd2str(lnd_type),
+			       libcfs_lnd2modname(lnd_type), rc);
+			rc = -EINVAL;
+			goto failed0;
+		}
+	}
+
+	lnet_net_lock(LNET_LOCK_EX);
+	lnd->lnd_refcount++;
+	lnet_net_unlock(LNET_LOCK_EX);
+	net->net_lnd = lnd;
+	mutex_unlock(&the_lnet.ln_lnd_mutex);
+
+	ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
+
+	rc = lnet_startup_lndni(ni, tun);
+	if (rc < 0)
+		return rc;
+	return 1;
+
+failed0:
+	lnet_net_free(net);
+
+	return rc;
+}
+
 static int
 lnet_startup_lndnets(struct list_head *netlist)
 {

* [lustre-devel] [PATCH 14/34] lnet: rename lnet_find_net_locked to lnet_find_rnet_locked
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (3 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 18/34] lnet: add ni_state NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  3:40   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 09/34] lnet: add list of cpts to lnet_net NeilBrown
                   ` (29 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |    2 +-
 drivers/staging/lustre/lnet/lnet/api-ni.c          |    2 +-
 drivers/staging/lustre/lnet/lnet/lib-move.c        |    2 +-
 drivers/staging/lustre/lnet/lnet/router.c          |    4 ++--
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index e32dbb854d80..faa3f19dd844 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -430,7 +430,7 @@ int lnet_rtrpools_adjust(int tiny, int small, int large);
 int lnet_rtrpools_enable(void);
 void lnet_rtrpools_disable(void);
 void lnet_rtrpools_free(int keep_pools);
-struct lnet_remotenet *lnet_find_net_locked(__u32 net);
+struct lnet_remotenet *lnet_find_rnet_locked(__u32 net);
 int lnet_dyn_add_ni(lnet_pid_t requested_pid,
 		    struct lnet_ioctl_config_data *conf);
 int lnet_dyn_del_ni(__u32 net);
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 09ea7e506128..c3c568e63342 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1909,7 +1909,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
 	net = list_entry(net_head.next, struct lnet_net, net_list);
 
 	lnet_net_lock(LNET_LOCK_EX);
-	rnet = lnet_find_net_locked(net->net_id);
+	rnet = lnet_find_rnet_locked(net->net_id);
 	lnet_net_unlock(LNET_LOCK_EX);
 	/*
 	 * make sure that the net added doesn't invalidate the current
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index 02cd1a5a466f..00a89221c9b3 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -1022,7 +1022,7 @@ lnet_find_route_locked(struct lnet_net *net, lnet_nid_t target,
 	 * If @rtr_nid is not LNET_NID_ANY, return the gateway with
 	 * rtr_nid nid, otherwise find the best gateway I can use
 	 */
-	rnet = lnet_find_net_locked(LNET_NIDNET(target));
+	rnet = lnet_find_rnet_locked(LNET_NIDNET(target));
 	if (!rnet)
 		return NULL;
 
diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index 5493d13de6d9..1fce991fcb0e 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -220,7 +220,7 @@ lnet_rtr_decref_locked(struct lnet_peer *lp)
 }
 
 struct lnet_remotenet *
-lnet_find_net_locked(__u32 net)
+lnet_find_rnet_locked(__u32 net)
 {
 	struct lnet_remotenet *rnet;
 	struct list_head *rn_list;
@@ -347,7 +347,7 @@ lnet_add_route(__u32 net, __u32 hops, lnet_nid_t gateway,
 
 	LASSERT(!the_lnet.ln_shutdown);
 
-	rnet2 = lnet_find_net_locked(net);
+	rnet2 = lnet_find_rnet_locked(net);
 	if (!rnet2) {
 		/* new network */
 		list_add_tail(&rnet->lrn_list, lnet_net2rnethash(net));

* [lustre-devel] [PATCH 15/34] lnet: extend zombie handling to nets and nis
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (7 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 03/34] lnet: struct lnet_ni: move ni_lnd to lnet_net NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  3:53   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 02/34] lnet: Create struct lnet_net NeilBrown
                   ` (25 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

A zombie lnet_ni is now attached to its lnet_net rather than to the
global the_lnet.  Zombie lnet_nets are attached to the_lnet.

For some reason, we don't drop the refcount on the lnd before shutting
it down now.
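
The per-net zombie list can be sketched in user space as below.  This is
an illustration under assumptions, not the kernel code: the zdemo_* names
are hypothetical, a singly linked list replaces struct list_head, a single
integer replaces the per-CPT reference counts, and the reaper makes one
pass instead of sleeping and retrying until every reference is dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for lnet_ni and lnet_net. */
struct zdemo_ni {
	struct zdemo_ni *next;
	int refs;		/* outstanding references */
	int shut_down;		/* set once the interface has been shut down */
};

struct zdemo_net {
	struct zdemo_ni *ni_list;	/* live interfaces */
	struct zdemo_ni *ni_zombie;	/* dying interfaces of this net only */
};

/* Move the first live NI onto its own net's zombie list. */
struct zdemo_ni *zdemo_unlink_first(struct zdemo_net *net)
{
	struct zdemo_ni *ni = net->ni_list;

	if (!ni)
		return NULL;
	net->ni_list = ni->next;
	ni->next = net->ni_zombie;
	net->ni_zombie = ni;
	return ni;
}

/*
 * One reaping pass: shut down zombies with no references left and put
 * busy ones back on the zombie list.  Returns how many are still busy.
 */
int zdemo_reap_zombies(struct zdemo_net *net)
{
	struct zdemo_ni *ni = net->ni_zombie;
	struct zdemo_ni *busy = NULL;
	int nbusy = 0;

	net->ni_zombie = NULL;
	while (ni) {
		struct zdemo_ni *next = ni->next;

		if (ni->refs > 0) {
			ni->next = busy;	/* still busy, keep it queued */
			busy = ni;
			nbusy++;
		} else {
			ni->next = NULL;
			ni->shut_down = 1;
		}
		ni = next;
	}
	net->ni_zombie = busy;
	return nbusy;
}

/* A net may only be released once its zombie list has drained. */
void zdemo_net_release(struct zdemo_net *net)
{
	assert(!net->ni_zombie);
	net->ni_list = NULL;
}
```

Keeping the zombie list inside the net means one net can be torn down
without touching the dying interfaces of any other net.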

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |    9 ++-
 drivers/staging/lustre/lnet/lnet/api-ni.c          |   65 ++++++++++----------
 drivers/staging/lustre/lnet/lnet/config.c          |    3 +
 3 files changed, 42 insertions(+), 35 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 22957d142cc0..1d372672e2de 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -284,6 +284,9 @@ struct lnet_net {
 	struct lnet_lnd		*net_lnd;
 	/* list of NIs on this net */
 	struct list_head	net_ni_list;
+
+	/* dying LND instances */
+	struct list_head	net_ni_zombie;
 };
 
 struct lnet_ni {
@@ -653,11 +656,11 @@ struct lnet {
 	/* LND instances */
 	struct list_head		ln_nets;
 	/* NIs bond on specific CPT(s) */
-	struct list_head		  ln_nis_cpt;
-	/* dying LND instances */
-	struct list_head		  ln_nis_zombie;
+	struct list_head		ln_nis_cpt;
 	/* the loopback NI */
 	struct lnet_ni			*ln_loni;
+	/* network zombie list */
+	struct list_head		ln_net_zombie;
 
 	/* remote networks with routes to them */
 	struct list_head		 *ln_remote_nets_hash;
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index c3c568e63342..18d111cb826b 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -539,7 +539,6 @@ lnet_prepare(lnet_pid_t requested_pid)
 	INIT_LIST_HEAD(&the_lnet.ln_test_peers);
 	INIT_LIST_HEAD(&the_lnet.ln_nets);
 	INIT_LIST_HEAD(&the_lnet.ln_nis_cpt);
-	INIT_LIST_HEAD(&the_lnet.ln_nis_zombie);
 	INIT_LIST_HEAD(&the_lnet.ln_routers);
 	INIT_LIST_HEAD(&the_lnet.ln_drop_rules);
 	INIT_LIST_HEAD(&the_lnet.ln_delay_rules);
@@ -618,7 +617,6 @@ lnet_unprepare(void)
 	LASSERT(list_empty(&the_lnet.ln_test_peers));
 	LASSERT(list_empty(&the_lnet.ln_nets));
 	LASSERT(list_empty(&the_lnet.ln_nis_cpt));
-	LASSERT(list_empty(&the_lnet.ln_nis_zombie));
 
 	lnet_portals_destroy();
 
@@ -1095,34 +1093,35 @@ lnet_ni_unlink_locked(struct lnet_ni *ni)
 
 	/* move it to zombie list and nobody can find it anymore */
 	LASSERT(!list_empty(&ni->ni_netlist));
-	list_move(&ni->ni_netlist, &the_lnet.ln_nis_zombie);
+	list_move(&ni->ni_netlist, &ni->ni_net->net_ni_zombie);
 	lnet_ni_decref_locked(ni, 0);
 }
 
 static void
-lnet_clear_zombies_nis_locked(void)
+lnet_clear_zombies_nis_locked(struct lnet_net *net)
 {
 	int i;
 	int islo;
 	struct lnet_ni *ni;
+	struct list_head *zombie_list = &net->net_ni_zombie;
 
 	/*
-	 * Now wait for the NI's I just nuked to show up on ln_zombie_nis
-	 * and shut them down in guaranteed thread context
+	 * Now wait for the NIs I just nuked to show up on the zombie
+	 * list and shut them down in guaranteed thread context
 	 */
 	i = 2;
-	while (!list_empty(&the_lnet.ln_nis_zombie)) {
+	while (!list_empty(zombie_list)) {
 		int *ref;
 		int j;
 
-		ni = list_entry(the_lnet.ln_nis_zombie.next,
+		ni = list_entry(zombie_list->next,
 				struct lnet_ni, ni_netlist);
 		list_del_init(&ni->ni_netlist);
 		cfs_percpt_for_each(ref, j, ni->ni_refs) {
 			if (!*ref)
 				continue;
 			/* still busy, add it back to zombie list */
-			list_add(&ni->ni_netlist, &the_lnet.ln_nis_zombie);
+			list_add(&ni->ni_netlist, zombie_list);
 			break;
 		}
 
@@ -1138,18 +1137,13 @@ lnet_clear_zombies_nis_locked(void)
 			continue;
 		}
 
-		ni->ni_net->net_lnd->lnd_refcount--;
 		lnet_net_unlock(LNET_LOCK_EX);
 
 		islo = ni->ni_net->net_lnd->lnd_type == LOLND;
 
 		LASSERT(!in_interrupt());
-		ni->ni_net->net_lnd->lnd_shutdown(ni);
+		net->net_lnd->lnd_shutdown(ni);
 
-		/*
-		 * can't deref lnd anymore now; it might have unregistered
-		 * itself...
-		 */
 		if (!islo)
 			CDEBUG(D_LNI, "Removed LNI %s\n",
 			       libcfs_nid2str(ni->ni_nid));
@@ -1162,9 +1156,11 @@ lnet_clear_zombies_nis_locked(void)
 }
 
 static void
-lnet_shutdown_lndnis(void)
+lnet_shutdown_lndnet(struct lnet_net *net);
+
+static void
+lnet_shutdown_lndnets(void)
 {
-	struct lnet_ni *ni;
 	int i;
 	struct lnet_net *net;
 
@@ -1173,30 +1169,35 @@ lnet_shutdown_lndnis(void)
 	/* All quiet on the API front */
 	LASSERT(!the_lnet.ln_shutdown);
 	LASSERT(!the_lnet.ln_refcount);
-	LASSERT(list_empty(&the_lnet.ln_nis_zombie));
 
 	lnet_net_lock(LNET_LOCK_EX);
 	the_lnet.ln_shutdown = 1;	/* flag shutdown */
 
-	/* Unlink NIs from the global table */
 	while (!list_empty(&the_lnet.ln_nets)) {
+		/*
+		 * move the nets to the zombie list to avoid them being
+		 * picked up for new work. LONET is also included in the
+		 * Nets that will be moved to the zombie list
+		 */
 		net = list_entry(the_lnet.ln_nets.next,
 				 struct lnet_net, net_list);
-		while (!list_empty(&net->net_ni_list)) {
-			ni = list_entry(net->net_ni_list.next,
-					struct lnet_ni, ni_netlist);
-			lnet_ni_unlink_locked(ni);
-		}
+		list_move(&net->net_list, &the_lnet.ln_net_zombie);
 	}
 
-	/* Drop the cached loopback NI. */
+	/* Drop the cached loopback Net. */
 	if (the_lnet.ln_loni) {
 		lnet_ni_decref_locked(the_lnet.ln_loni, 0);
 		the_lnet.ln_loni = NULL;
 	}
-
 	lnet_net_unlock(LNET_LOCK_EX);
 
+	/* iterate through the net zombie list and delete each net */
+	while (!list_empty(&the_lnet.ln_net_zombie)) {
+		net = list_entry(the_lnet.ln_net_zombie.next,
+				 struct lnet_net, net_list);
+		lnet_shutdown_lndnet(net);
+	}
+
 	/*
 	 * Clear lazy portals and drop delayed messages which hold refs
 	 * on their lnet_msg::msg_rxpeer
@@ -1211,8 +1212,6 @@ lnet_shutdown_lndnis(void)
 	lnet_peer_tables_cleanup(NULL);
 
 	lnet_net_lock(LNET_LOCK_EX);
-
-	lnet_clear_zombies_nis_locked();
 	the_lnet.ln_shutdown = 0;
 	lnet_net_unlock(LNET_LOCK_EX);
 }
@@ -1222,6 +1221,7 @@ static void
 lnet_shutdown_lndni(struct lnet_ni *ni)
 {
 	int i;
+	struct lnet_net *net = ni->ni_net;
 
 	lnet_net_lock(LNET_LOCK_EX);
 	lnet_ni_unlink_locked(ni);
@@ -1235,7 +1235,7 @@ lnet_shutdown_lndni(struct lnet_ni *ni)
 	lnet_peer_tables_cleanup(ni);
 
 	lnet_net_lock(LNET_LOCK_EX);
-	lnet_clear_zombies_nis_locked();
+	lnet_clear_zombies_nis_locked(net);
 	lnet_net_unlock(LNET_LOCK_EX);
 }
 
@@ -1445,7 +1445,7 @@ lnet_startup_lndnets(struct list_head *netlist)
 
 	return ni_count;
 failed:
-	lnet_shutdown_lndnis();
+	lnet_shutdown_lndnets();
 
 	return rc;
 }
@@ -1492,6 +1492,7 @@ int lnet_lib_init(void)
 	the_lnet.ln_refcount = 0;
 	LNetInvalidateEQHandle(&the_lnet.ln_rc_eqh);
 	INIT_LIST_HEAD(&the_lnet.ln_lnds);
+	INIT_LIST_HEAD(&the_lnet.ln_net_zombie);
 	INIT_LIST_HEAD(&the_lnet.ln_rcd_zombie);
 	INIT_LIST_HEAD(&the_lnet.ln_rcd_deathrow);
 
@@ -1656,7 +1657,7 @@ LNetNIInit(lnet_pid_t requested_pid)
 	if (!the_lnet.ln_nis_from_mod_params)
 		lnet_destroy_routes();
 err_shutdown_lndnis:
-	lnet_shutdown_lndnis();
+	lnet_shutdown_lndnets();
 err_empty_list:
 	lnet_unprepare();
 	LASSERT(rc < 0);
@@ -1703,7 +1704,7 @@ LNetNIFini(void)
 
 		lnet_acceptor_stop();
 		lnet_destroy_routes();
-		lnet_shutdown_lndnis();
+		lnet_shutdown_lndnets();
 		lnet_unprepare();
 	}
 
diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index 380a3fb1caba..2588d67fea1b 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -279,6 +279,8 @@ lnet_net_free(struct lnet_net *net)
 	struct list_head *tmp, *tmp2;
 	struct lnet_ni *ni;
 
+	LASSERT(list_empty(&net->net_ni_zombie));
+
 	/* delete any nis which have been started. */
 	list_for_each_safe(tmp, tmp2, &net->net_ni_list) {
 		ni = list_entry(tmp, struct lnet_ni, ni_netlist);
@@ -312,6 +314,7 @@ lnet_net_alloc(__u32 net_id, struct list_head *net_list)
 
 	INIT_LIST_HEAD(&net->net_list);
 	INIT_LIST_HEAD(&net->net_ni_list);
+	INIT_LIST_HEAD(&net->net_ni_zombie);
 
 	net->net_id = net_id;
 

* [lustre-devel] [PATCH 16/34] lnet: lnet_shutdown_lndnets - remove some cleanup code.
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
  2018-09-07  0:49 ` [lustre-devel] [PATCH 11/34] lnet: pass tun to lnet_startup_lndni, instead of full conf NeilBrown
  2018-09-07  0:49 ` [lustre-devel] [PATCH 12/34] lnet: split lnet_startup_lndni NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-07  0:49 ` [lustre-devel] [PATCH 18/34] lnet: add ni_state NeilBrown
                   ` (31 subsequent siblings)
  34 siblings, 0 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

I don't know what this did, or why it is being removed.

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |   14 --------------
 1 file changed, 14 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 18d111cb826b..2529a11c6c59 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1161,7 +1161,6 @@ lnet_shutdown_lndnet(struct lnet_net *net);
 static void
 lnet_shutdown_lndnets(void)
 {
-	int i;
 	struct lnet_net *net;
 
 	/* NB called holding the global mutex */
@@ -1198,19 +1197,6 @@ lnet_shutdown_lndnets(void)
 		lnet_shutdown_lndnet(net);
 	}
 
-	/*
-	 * Clear lazy portals and drop delayed messages which hold refs
-	 * on their lnet_msg::msg_rxpeer
-	 */
-	for (i = 0; i < the_lnet.ln_nportals; i++)
-		LNetClearLazyPortal(i);
-
-	/*
-	 * Clear the peer table and wait for all peers to go (they hold refs on
-	 * their NIs)
-	 */
-	lnet_peer_tables_cleanup(NULL);
-
 	lnet_net_lock(LNET_LOCK_EX);
 	the_lnet.ln_shutdown = 0;
 	lnet_net_unlock(LNET_LOCK_EX);

* [lustre-devel] [PATCH 17/34] lnet: move lnet_shutdown_lndnets down to after first use
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (16 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 08/34] lnet: add cpt to lnet_match_info NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  3:55   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 20/34] lnet: discard ni_cpt_list NeilBrown
                   ` (16 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |   91 ++++++++++++++---------------
 1 file changed, 44 insertions(+), 47 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 2529a11c6c59..46c5ca71bc07 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1155,53 +1155,6 @@ lnet_clear_zombies_nis_locked(struct lnet_net *net)
 	}
 }
 
-static void
-lnet_shutdown_lndnet(struct lnet_net *net);
-
-static void
-lnet_shutdown_lndnets(void)
-{
-	struct lnet_net *net;
-
-	/* NB called holding the global mutex */
-
-	/* All quiet on the API front */
-	LASSERT(!the_lnet.ln_shutdown);
-	LASSERT(!the_lnet.ln_refcount);
-
-	lnet_net_lock(LNET_LOCK_EX);
-	the_lnet.ln_shutdown = 1;	/* flag shutdown */
-
-	while (!list_empty(&the_lnet.ln_nets)) {
-		/*
-		 * move the nets to the zombie list to avoid them being
-		 * picked up for new work. LONET is also included in the
-		 * Nets that will be moved to the zombie list
-		 */
-		net = list_entry(the_lnet.ln_nets.next,
-				 struct lnet_net, net_list);
-		list_move(&net->net_list, &the_lnet.ln_net_zombie);
-	}
-
-	/* Drop the cached loopback Net. */
-	if (the_lnet.ln_loni) {
-		lnet_ni_decref_locked(the_lnet.ln_loni, 0);
-		the_lnet.ln_loni = NULL;
-	}
-	lnet_net_unlock(LNET_LOCK_EX);
-
-	/* iterate through the net zombie list and delete each net */
-	while (!list_empty(&the_lnet.ln_net_zombie)) {
-		net = list_entry(the_lnet.ln_net_zombie.next,
-				 struct lnet_net, net_list);
-		lnet_shutdown_lndnet(net);
-	}
-
-	lnet_net_lock(LNET_LOCK_EX);
-	the_lnet.ln_shutdown = 0;
-	lnet_net_unlock(LNET_LOCK_EX);
-}
-
 /* shutdown down the NI and release refcount */
 static void
 lnet_shutdown_lndni(struct lnet_ni *ni)
@@ -1253,6 +1206,50 @@ lnet_shutdown_lndnet(struct lnet_net *net)
 	lnet_net_free(net);
 }
 
+static void
+lnet_shutdown_lndnets(void)
+{
+	struct lnet_net *net;
+
+	/* NB called holding the global mutex */
+
+	/* All quiet on the API front */
+	LASSERT(!the_lnet.ln_shutdown);
+	LASSERT(!the_lnet.ln_refcount);
+
+	lnet_net_lock(LNET_LOCK_EX);
+	the_lnet.ln_shutdown = 1;	/* flag shutdown */
+
+	while (!list_empty(&the_lnet.ln_nets)) {
+		/*
+		 * move the nets to the zombie list to avoid them being
+		 * picked up for new work. LONET is also included in the
+		 * Nets that will be moved to the zombie list
+		 */
+		net = list_entry(the_lnet.ln_nets.next,
+				 struct lnet_net, net_list);
+		list_move(&net->net_list, &the_lnet.ln_net_zombie);
+	}
+
+	/* Drop the cached loopback Net. */
+	if (the_lnet.ln_loni) {
+		lnet_ni_decref_locked(the_lnet.ln_loni, 0);
+		the_lnet.ln_loni = NULL;
+	}
+	lnet_net_unlock(LNET_LOCK_EX);
+
+	/* iterate through the net zombie list and delete each net */
+	while (!list_empty(&the_lnet.ln_net_zombie)) {
+		net = list_entry(the_lnet.ln_net_zombie.next,
+				 struct lnet_net, net_list);
+		lnet_shutdown_lndnet(net);
+	}
+
+	lnet_net_lock(LNET_LOCK_EX);
+	the_lnet.ln_shutdown = 0;
+	lnet_net_unlock(LNET_LOCK_EX);
+}
+
 static int
 lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
 {


* [lustre-devel] [PATCH 18/34] lnet: add ni_state
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (2 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 16/34] lnet: lnet_shutdown_lndnets - remove some cleanup code NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  3:59   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 14/34] lnet: rename lnet_find_net_locked to lnet_find_rnet_locked NeilBrown
                   ` (30 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

This is barely used.

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |    1 +
 .../staging/lustre/include/linux/lnet/lib-types.h  |   16 ++++++++++++++++
 drivers/staging/lustre/lnet/lnet/api-ni.c          |   16 ++++++++++++++++
 drivers/staging/lustre/lnet/lnet/config.c          |    1 +
 4 files changed, 34 insertions(+)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index faa3f19dd844..54a93235834c 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -400,6 +400,7 @@ int lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni);
 struct lnet_ni *lnet_nid2ni_locked(lnet_nid_t nid, int cpt);
 struct lnet_ni *lnet_net2ni_locked(__u32 net, int cpt);
 struct lnet_ni *lnet_net2ni(__u32 net);
+bool lnet_is_ni_healthy_locked(struct lnet_ni *ni);
 
 extern int portal_rotor;
 
diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 1d372672e2de..6c34ecf22021 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -256,6 +256,19 @@ struct lnet_tx_queue {
 	struct list_head	tq_delayed;	/* delayed TXs */
 };
 
+enum lnet_ni_state {
+	/* set when NI block is allocated */
+	LNET_NI_STATE_INIT = 0,
+	/* set when NI is started successfully */
+	LNET_NI_STATE_ACTIVE,
+	/* set when LND notifies NI failed */
+	LNET_NI_STATE_FAILED,
+	/* set when LND notifies NI degraded */
+	LNET_NI_STATE_DEGRADED,
+	/* set when shutting down NI */
+	LNET_NI_STATE_DELETING
+};
+
 struct lnet_net {
 	/* chain on the ln_nets */
 	struct list_head	net_list;
@@ -324,6 +337,9 @@ struct lnet_ni {
 	/* my health status */
 	struct lnet_ni_status	*ni_status;
 
+	/* NI FSM */
+	enum lnet_ni_state	ni_state;
+
 	/* per NI LND tunables */
 	struct lnet_lnd_tunables ni_lnd_tunables;
 
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 46c5ca71bc07..618fdf8141f0 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -780,6 +780,16 @@ lnet_islocalnet(__u32 net)
 	return !!ni;
 }
 
+bool
+lnet_is_ni_healthy_locked(struct lnet_ni *ni)
+{
+	if (ni->ni_state == LNET_NI_STATE_ACTIVE ||
+	    ni->ni_state == LNET_NI_STATE_DEGRADED)
+		return true;
+
+	return false;
+}
+
 struct lnet_ni  *
 lnet_nid2ni_locked(lnet_nid_t nid, int cpt)
 {
@@ -1117,6 +1127,9 @@ lnet_clear_zombies_nis_locked(struct lnet_net *net)
 		ni = list_entry(zombie_list->next,
 				struct lnet_ni, ni_netlist);
 		list_del_init(&ni->ni_netlist);
+		/* the ni should be in deleting state. If it's not it's
+		 * a bug */
+		LASSERT(ni->ni_state == LNET_NI_STATE_DELETING);
 		cfs_percpt_for_each(ref, j, ni->ni_refs) {
 			if (!*ref)
 				continue;
@@ -1163,6 +1176,7 @@ lnet_shutdown_lndni(struct lnet_ni *ni)
 	struct lnet_net *net = ni->ni_net;
 
 	lnet_net_lock(LNET_LOCK_EX);
+	ni->ni_state = LNET_NI_STATE_DELETING;
 	lnet_ni_unlink_locked(ni);
 	lnet_net_unlock(LNET_LOCK_EX);
 
@@ -1291,6 +1305,8 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
 
 	lnet_net_unlock(LNET_LOCK_EX);
 
+	ni->ni_state = LNET_NI_STATE_ACTIVE;
+
 	if (net->net_lnd->lnd_type == LOLND) {
 		lnet_ni_addref(ni);
 		LASSERT(!the_lnet.ln_loni);
diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index 2588d67fea1b..081812e19b13 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -393,6 +393,7 @@ lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
 		ni->ni_net_ns = NULL;
 
 	ni->ni_last_alive = ktime_get_real_seconds();
+	ni->ni_state = LNET_NI_STATE_INIT;
 	rc = lnet_net_append_cpts(ni->ni_cpts, ni->ni_ncpts, net);
 	if (rc != 0)
 		goto failed;


* [lustre-devel] [PATCH 19/34] lnet: simplify lnet_islocalnet()
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (29 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 31/34] lnet: lnet_dyn_add_ni: fix ping_info count NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  4:02   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 25/34] lnet: swap 'then' and 'else' branches in lnet_startup_lndnet NeilBrown
                   ` (3 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

Having lnet_get_net_locked() makes this (a little) simpler.

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |   14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 618fdf8141f0..546d5101360f 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -764,20 +764,16 @@ lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni)
 EXPORT_SYMBOL(lnet_cpt_of_nid);
 
 int
-lnet_islocalnet(__u32 net)
+lnet_islocalnet(__u32 net_id)
 {
-	struct lnet_ni *ni;
-	int cpt;
+	struct lnet_net *net;
+	int		cpt;
 
 	cpt = lnet_net_lock_current();
-
-	ni = lnet_net2ni_locked(net, cpt);
-	if (ni)
-		lnet_ni_decref_locked(ni, cpt);
-
+	net = lnet_get_net_locked(net_id);
 	lnet_net_unlock(cpt);
 
-	return !!ni;
+	return !!net;
 }
 
 bool


* [lustre-devel] [PATCH 20/34] lnet: discard ni_cpt_list
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (17 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 17/34] lnet: move lnet_shutdown_lndnets down to after first use NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  4:07   ` Doug Oucharek
  2018-09-12 16:29   ` Amir Shehata
  2018-09-07  0:49 ` [lustre-devel] [PATCH 34/34] lnet: introduce use_tcp_bonding mod param NeilBrown
                   ` (15 subsequent siblings)
  34 siblings, 2 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

This isn't used any more.
The new comment is odd - there is no net_ni_cpt!
The ni_cptlist linkage is no longer used - should it go too?

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |    4 +---
 drivers/staging/lustre/lnet/lnet/api-ni.c          |    7 -------
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 6c34ecf22021..dc15fa75a9d2 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -305,7 +305,7 @@ struct lnet_net {
 struct lnet_ni {
 	/* chain on the lnet_net structure */
 	struct list_head	  ni_netlist;
-	/* chain on ln_nis_cpt */
+	/* chain on net_ni_cpt */
 	struct list_head	ni_cptlist;
 
 	spinlock_t		ni_lock;
@@ -671,8 +671,6 @@ struct lnet {
 
 	/* LND instances */
 	struct list_head		ln_nets;
-	/* NIs bond on specific CPT(s) */
-	struct list_head		ln_nis_cpt;
 	/* the loopback NI */
 	struct lnet_ni			*ln_loni;
 	/* network zombie list */
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 546d5101360f..960f235df5e7 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -538,7 +538,6 @@ lnet_prepare(lnet_pid_t requested_pid)
 
 	INIT_LIST_HEAD(&the_lnet.ln_test_peers);
 	INIT_LIST_HEAD(&the_lnet.ln_nets);
-	INIT_LIST_HEAD(&the_lnet.ln_nis_cpt);
 	INIT_LIST_HEAD(&the_lnet.ln_routers);
 	INIT_LIST_HEAD(&the_lnet.ln_drop_rules);
 	INIT_LIST_HEAD(&the_lnet.ln_delay_rules);
@@ -616,7 +615,6 @@ lnet_unprepare(void)
 	LASSERT(!the_lnet.ln_refcount);
 	LASSERT(list_empty(&the_lnet.ln_test_peers));
 	LASSERT(list_empty(&the_lnet.ln_nets));
-	LASSERT(list_empty(&the_lnet.ln_nis_cpt));
 
 	lnet_portals_destroy();
 
@@ -1294,11 +1292,6 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
 	/* refcount for ln_nis */
 	lnet_ni_addref_locked(ni, 0);
 	list_add_tail(&ni->ni_net->net_list, &the_lnet.ln_nets);
-	if (ni->ni_cpts) {
-		lnet_ni_addref_locked(ni, 0);
-		list_add_tail(&ni->ni_cptlist, &the_lnet.ln_nis_cpt);
-	}
-
 	lnet_net_unlock(LNET_LOCK_EX);
 
 	ni->ni_state = LNET_NI_STATE_ACTIVE;


* [lustre-devel] [PATCH 21/34] lnet: add net_ni_added
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (32 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 24/34] lnet: don't take lock over lnet_net_unique() NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  4:15   ` Doug Oucharek
  2018-09-10 23:10 ` [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre James Simmons
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

When we allocate an ni, it is now added to the new net_ni_added
list of unstarted interfaces.
lnet_startup_lndnet() now starts all those added interfaces.
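
As a rough illustration of the pattern described above (not the LNet
code: every demo_* name is made up, and plain singly-linked lists stand
in for list_head), the startup loop drains a pending list, starts each
entry, and publishes the started entries only once all of them have
succeeded. On failure the entries started so far are stopped again.
Unlike the real code, this sketch does not preserve list order, and it
simply leaves the failed entry detached.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct lnet_ni / struct lnet_net. */
struct demo_ni {
	int		 id;
	int		 started;
	struct demo_ni	*next;
};

struct demo_net {
	struct demo_ni	*added;		/* allocated, not started yet */
	struct demo_ni	*running;	/* started successfully */
};

/* Hypothetical start hook: a negative id simulates a startup failure. */
static int demo_start_ni(struct demo_ni *ni)
{
	if (ni->id < 0)
		return -1;
	ni->started = 1;
	return 0;
}

static void demo_stop_ni(struct demo_ni *ni)
{
	ni->started = 0;
}

/* Returns the number of interfaces started, or -1 on failure. */
static int demo_startup_net(struct demo_net *net)
{
	struct demo_ni *local = NULL;	/* started during this call */
	struct demo_ni *ni;
	int count = 0;

	while ((ni = net->added) != NULL) {
		/* detach from the pending list before starting */
		net->added = ni->next;
		ni->next = NULL;

		if (demo_start_ni(ni) < 0)
			goto failed;

		ni->next = local;
		local = ni;
		count++;
	}

	/* everything started: publish on the running list */
	while ((ni = local) != NULL) {
		local = ni->next;
		ni->next = net->running;
		net->running = ni;
	}
	return count;

failed:
	/* undo only the interfaces started by this call */
	while ((ni = local) != NULL) {
		local = ni->next;
		ni->next = NULL;
		demo_stop_ni(ni);
	}
	return -1;
}
```

Collecting the started entries on a private list first means nothing
becomes visible on the running list until the whole net has come up.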

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../staging/lustre/include/linux/lnet/lib-types.h  |    3 ++
 drivers/staging/lustre/lnet/lnet/api-ni.c          |   39 +++++++++++++++++---
 drivers/staging/lustre/lnet/lnet/config.c          |   13 ++++++-
 3 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index dc15fa75a9d2..1faa247a93b8 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -298,6 +298,9 @@ struct lnet_net {
 	/* list of NIs on this net */
 	struct list_head	net_ni_list;
 
+	/* list of NIs being added, but not started yet */
+	struct list_head	net_ni_added;
+
 	/* dying LND instances */
 	struct list_head	net_ni_zombie;
 };
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 960f235df5e7..ce3dd0f32e12 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1350,12 +1350,15 @@ static int
 lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 {
 	struct lnet_ni		*ni;
+	struct list_head	local_ni_list;
+	int			rc;
+	int			ni_count = 0;
 	__u32			lnd_type;
 	struct lnet_lnd		*lnd;
-	int			rc;
 
 	lnd_type = LNET_NETTYP(net->net_id);
 
+	INIT_LIST_HEAD(&local_ni_list);
 	LASSERT(libcfs_isknown_lnd(lnd_type));
 
 	/* Make sure this new NI is unique. */
@@ -1399,12 +1402,36 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 	net->net_lnd = lnd;
 	mutex_unlock(&the_lnet.ln_lnd_mutex);
 
-	ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
+	while (!list_empty(&net->net_ni_added)) {
+		ni = list_entry(net->net_ni_added.next, struct lnet_ni,
+				ni_netlist);
+		list_del_init(&ni->ni_netlist);
 
-	rc = lnet_startup_lndni(ni, tun);
-	if (rc < 0)
-		return rc;
-	return 1;
+		rc = lnet_startup_lndni(ni, tun);
+
+		if (rc < 0)
+			goto failed1;
+
+		list_add_tail(&ni->ni_netlist, &local_ni_list);
+
+		ni_count++;
+	}
+	lnet_net_lock(LNET_LOCK_EX);
+	list_splice_tail(&local_ni_list, &net->net_ni_list);
+	lnet_net_unlock(LNET_LOCK_EX);
+	return ni_count;
+
+failed1:
+	/*
+	 * shutdown the new NIs that are being started up
+	 * free the NET being started
+	 */
+	while (!list_empty(&local_ni_list)) {
+		ni = list_entry(local_ni_list.next, struct lnet_ni,
+				ni_netlist);
+
+		lnet_shutdown_lndni(ni);
+	}
 
 failed0:
 	lnet_net_free(net);
diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index 081812e19b13..f886dcfc6d6e 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -281,6 +281,16 @@ lnet_net_free(struct lnet_net *net)
 
 	LASSERT(list_empty(&net->net_ni_zombie));
 
+	/*
+	 * delete any nis that haven't been added yet. This could happen
+	 * if there is a failure on net startup
+	 */
+	list_for_each_safe(tmp, tmp2, &net->net_ni_added) {
+		ni = list_entry(tmp, struct lnet_ni, ni_netlist);
+		list_del_init(&ni->ni_netlist);
+		lnet_ni_free(ni);
+	}
+
 	/* delete any nis which have been started. */
 	list_for_each_safe(tmp, tmp2, &net->net_ni_list) {
 		ni = list_entry(tmp, struct lnet_ni, ni_netlist);
@@ -314,6 +324,7 @@ lnet_net_alloc(__u32 net_id, struct list_head *net_list)
 
 	INIT_LIST_HEAD(&net->net_list);
 	INIT_LIST_HEAD(&net->net_ni_list);
+	INIT_LIST_HEAD(&net->net_ni_added);
 	INIT_LIST_HEAD(&net->net_ni_zombie);
 
 	net->net_id = net_id;
@@ -397,7 +408,7 @@ lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
 	rc = lnet_net_append_cpts(ni->ni_cpts, ni->ni_ncpts, net);
 	if (rc != 0)
 		goto failed;
-	list_add_tail(&ni->ni_netlist, &net->net_ni_list);
+	list_add_tail(&ni->ni_netlist, &net->net_ni_added);
 
 	return ni;
 failed:


* [lustre-devel] [PATCH 22/34] lnet: don't take reference in lnet_XX2ni_locked()
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (24 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 32/34] lnet: lnet_dyn_del_ni: fix ping_info count NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  4:18   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 29/34] lnet: track tunables in lnet_startup_lndnet() NeilBrown
                   ` (8 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

lnet_net2ni_locked() and lnet_nid2ni_locked() no longer take
a reference - as the lock is held, a ref isn't always needed.

Instead, introduce lnet_nid2ni_addref() which does take the reference
(but doesn't need the lock).
Various places which called lnet_net2ni_locked() or
lnet_nid2ni_locked() no longer need to drop the ref afterwards.
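
To make the split concrete, here is a minimal user-space sketch of the
same idea. It is an assumption-laden illustration, not the LNet
implementation: the demo_* names are invented, a pthread mutex stands in
for lnet_net_lock(), and a plain int stands in for the per-CPT refcount.
The _locked lookup returns a borrowed pointer that is only valid while
the lock is held; the _addref variant takes the lock itself and returns
with a reference held.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lnet_ni. */
struct demo_ni {
	unsigned long	 nid;
	int		 refcount;
	struct demo_ni	*next;
};

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_ni *demo_nis;

/* Caller holds demo_lock; no reference is taken on the result. */
static struct demo_ni *demo_nid2ni_locked(unsigned long nid)
{
	struct demo_ni *ni;

	for (ni = demo_nis; ni; ni = ni->next)
		if (ni->nid == nid)
			return ni;
	return NULL;
}

/* Takes the lock itself and returns with a reference held. */
static struct demo_ni *demo_nid2ni_addref(unsigned long nid)
{
	struct demo_ni *ni;

	pthread_mutex_lock(&demo_lock);
	ni = demo_nid2ni_locked(nid);
	if (ni)
		ni->refcount++;
	pthread_mutex_unlock(&demo_lock);

	return ni;
}

/* Pure existence test: no reference is needed, so none is dropped. */
static int demo_islocalnid(unsigned long nid)
{
	int found;

	pthread_mutex_lock(&demo_lock);
	found = demo_nid2ni_locked(nid) != NULL;
	pthread_mutex_unlock(&demo_lock);

	return found;
}
```

A caller that only tests the result under the lock, as
demo_islocalnid() does, never touches the refcount; a caller that keeps
the pointer after unlocking uses the _addref variant and drops the
reference when done.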

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |    1 +
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |    2 +
 drivers/staging/lustre/lnet/lnet/acceptor.c        |    2 +
 drivers/staging/lustre/lnet/lnet/api-ni.c          |   27 +++++++++++++-------
 drivers/staging/lustre/lnet/lnet/lib-move.c        |   17 +------------
 5 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 54a93235834c..6401d9a37b23 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -398,6 +398,7 @@ extern int avoid_asym_router_failure;
 int lnet_cpt_of_nid_locked(lnet_nid_t nid, struct lnet_ni *ni);
 int lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni);
 struct lnet_ni *lnet_nid2ni_locked(lnet_nid_t nid, int cpt);
+struct lnet_ni *lnet_nid2ni_addref(lnet_nid_t nid);
 struct lnet_ni *lnet_net2ni_locked(__u32 net, int cpt);
 struct lnet_ni *lnet_net2ni(__u32 net);
 bool lnet_is_ni_healthy_locked(struct lnet_ni *ni);
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index e64c14914924..af8f863b6a68 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -2294,7 +2294,7 @@ kiblnd_passive_connect(struct rdma_cm_id *cmid, void *priv, int priv_nob)
 	}
 
 	nid = reqmsg->ibm_srcnid;
-	ni = lnet_net2ni(LNET_NIDNET(reqmsg->ibm_dstnid));
+	ni = lnet_nid2ni_addref(reqmsg->ibm_dstnid);
 
 	if (ni) {
 		net = (struct kib_net *)ni->ni_data;
diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
index 88b90c1fdbaf..25205f686801 100644
--- a/drivers/staging/lustre/lnet/lnet/acceptor.c
+++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
@@ -296,7 +296,7 @@ lnet_accept(struct socket *sock, __u32 magic)
 	if (flip)
 		__swab64s(&cr.acr_nid);
 
-	ni = lnet_net2ni(LNET_NIDNET(cr.acr_nid));
+	ni = lnet_nid2ni_addref(cr.acr_nid);
 	if (!ni ||	       /* no matching net */
 	    ni->ni_nid != cr.acr_nid) { /* right NET, wrong NID! */
 		if (ni)
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index ce3dd0f32e12..42e775e2a669 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -655,7 +655,6 @@ lnet_net2ni_locked(__u32 net_id, int cpt)
 		if (net->net_id == net_id) {
 			ni = list_entry(net->net_ni_list.next, struct lnet_ni,
 					ni_netlist);
-			lnet_ni_addref_locked(ni, cpt);
 			return ni;
 		}
 	}
@@ -794,16 +793,29 @@ lnet_nid2ni_locked(lnet_nid_t nid, int cpt)
 
 	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
 		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
-			if (ni->ni_nid == nid) {
-				lnet_ni_addref_locked(ni, cpt);
+			if (ni->ni_nid == nid)
 				return ni;
-			}
 		}
 	}
 
 	return NULL;
 }
 
+struct lnet_ni *
+lnet_nid2ni_addref(lnet_nid_t nid)
+{
+	struct lnet_ni *ni;
+
+	lnet_net_lock(0);
+	ni = lnet_nid2ni_locked(nid, 0);
+	if (ni)
+		lnet_ni_addref_locked(ni, 0);
+	lnet_net_unlock(0);
+
+	return ni;
+}
+EXPORT_SYMBOL(lnet_nid2ni_addref);
+
 int
 lnet_islocalnid(lnet_nid_t nid)
 {
@@ -812,8 +824,6 @@ lnet_islocalnid(lnet_nid_t nid)
 
 	cpt = lnet_net_lock_current();
 	ni = lnet_nid2ni_locked(nid, cpt);
-	if (ni)
-		lnet_ni_decref_locked(ni, cpt);
 	lnet_net_unlock(cpt);
 
 	return !!ni;
@@ -1412,6 +1422,7 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 		if (rc < 0)
 			goto failed1;
 
+		lnet_ni_addref(ni);
 		list_add_tail(&ni->ni_netlist, &local_ni_list);
 
 		ni_count++;
@@ -2032,9 +2043,6 @@ lnet_dyn_del_ni(__u32 net)
 		goto failed;
 	}
 
-	/* decrement the reference counter taken by lnet_net2ni() */
-	lnet_ni_decref_locked(ni, 0);
-
 	lnet_shutdown_lndni(ni);
 
 	if (!lnet_count_acceptor_nets())
@@ -2264,7 +2272,6 @@ LNetCtl(unsigned int cmd, void *arg)
 		else
 			rc = ni->ni_net->net_lnd->lnd_ctl(ni, cmd, arg);
 
-		lnet_ni_decref(ni);
 		return rc;
 	}
 	/* not reached */
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index 00a89221c9b3..60f34c4b85d3 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -1127,11 +1127,7 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
 		if (!src_ni) {
 			src_ni = local_ni;
 			src_nid = src_ni->ni_nid;
-		} else if (src_ni == local_ni) {
-			lnet_ni_decref_locked(local_ni, cpt);
-		} else {
-			lnet_ni_decref_locked(local_ni, cpt);
-			lnet_ni_decref_locked(src_ni, cpt);
+		} else if (src_ni != local_ni) {
 			lnet_net_unlock(cpt);
 			LCONSOLE_WARN("No route to %s via from %s\n",
 				      libcfs_nid2str(dst_nid),
@@ -1149,16 +1145,10 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
 			/* No send credit hassles with LOLND */
 			lnet_net_unlock(cpt);
 			lnet_ni_send(src_ni, msg);
-
-			lnet_net_lock(cpt);
-			lnet_ni_decref_locked(src_ni, cpt);
-			lnet_net_unlock(cpt);
 			return 0;
 		}
 
 		rc = lnet_nid2peer_locked(&lp, dst_nid, cpt);
-		/* lp has ref on src_ni; lose mine */
-		lnet_ni_decref_locked(src_ni, cpt);
 		if (rc) {
 			lnet_net_unlock(cpt);
 			LCONSOLE_WARN("Error %d finding peer %s\n", rc,
@@ -1173,8 +1163,6 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
 					    src_ni->ni_net : NULL,
 					    dst_nid, rtr_nid);
 		if (!lp) {
-			if (src_ni)
-				lnet_ni_decref_locked(src_ni, cpt);
 			lnet_net_unlock(cpt);
 
 			LCONSOLE_WARN("No route to %s via %s (all routers down)\n",
@@ -1192,8 +1180,6 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
 		if (rtr_nid != lp->lp_nid) {
 			cpt2 = lp->lp_cpt;
 			if (cpt2 != cpt) {
-				if (src_ni)
-					lnet_ni_decref_locked(src_ni, cpt);
 				lnet_net_unlock(cpt);
 
 				rtr_nid = lp->lp_nid;
@@ -1212,7 +1198,6 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
 			src_nid = src_ni->ni_nid;
 		} else {
 			LASSERT(src_ni->ni_net == lp->lp_net);
-			lnet_ni_decref_locked(src_ni, cpt);
 		}
 
 		lnet_peer_addref_locked(lp);


* [lustre-devel] [PATCH 23/34] lnet: don't need lock to test ln_shutdown.
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (26 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 29/34] lnet: track tunables in lnet_startup_lndnet() NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  4:27   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 26/34] lnet: only valid lnd_type when net_id is unique NeilBrown
                   ` (6 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

lnet_send() returns -ESHUTDOWN if ln_shutdown
is already set.
The lock is always taken to set ln_shutdown, but apparently
we don't need to hold the lock for this test.
I guess if it is set immediately after the test, and before
we take the lock, then ... can anything bad happen?

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/lnet/lnet/lib-move.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index 60f34c4b85d3..46e593fbb44f 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -1099,12 +1099,9 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
 	cpt = lnet_cpt_of_nid(rtr_nid == LNET_NID_ANY ? dst_nid : rtr_nid,
 			      local_ni);
  again:
-	lnet_net_lock(cpt);
-
-	if (the_lnet.ln_shutdown) {
-		lnet_net_unlock(cpt);
+	if (the_lnet.ln_shutdown)
 		return -ESHUTDOWN;
-	}
+	lnet_net_lock(cpt);
 
 	if (src_nid == LNET_NID_ANY) {
 		src_ni = NULL;


* [lustre-devel] [PATCH 24/34] lnet: don't take lock over lnet_net_unique()
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (31 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 25/34] lnet: swap 'then' and 'else' branches in lnet_startup_lndnet NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  4:29   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 21/34] lnet: add net_ni_added NeilBrown
  2018-09-10 23:10 ` [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre James Simmons
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

Holding ln_api_mutex is enough to keep the list
stable.

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 42e775e2a669..2b5c25a1dc7c 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1372,9 +1372,7 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 	LASSERT(libcfs_isknown_lnd(lnd_type));
 
 	/* Make sure this new NI is unique. */
-	lnet_net_lock(LNET_LOCK_EX);
 	rc = lnet_net_unique(net->net_id, &the_lnet.ln_nets);
-	lnet_net_unlock(LNET_LOCK_EX);
 	if (!rc) {
 		if (lnd_type == LOLND) {
 			lnet_net_free(net);


* [lustre-devel] [PATCH 25/34] lnet: swap 'then' and 'else' branches in lnet_startup_lndnet
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (30 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 19/34] lnet: simplify lnet_islocalnet() NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  4:32   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 24/34] lnet: don't take lock over lnet_net_unique() NeilBrown
                   ` (2 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

This swap makes the diff for the next patch more readable.
We also stop storing the return value from lnet_net_unique()
as it is never used.

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |   55 +++++++++++++++--------------
 1 file changed, 28 insertions(+), 27 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 2b5c25a1dc7c..ab4d093c04da 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1372,8 +1372,34 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 	LASSERT(libcfs_isknown_lnd(lnd_type));
 
 	/* Make sure this new NI is unique. */
-	rc = lnet_net_unique(net->net_id, &the_lnet.ln_nets);
-	if (!rc) {
+	if (lnet_net_unique(net->net_id, &the_lnet.ln_nets)) {
+		mutex_lock(&the_lnet.ln_lnd_mutex);
+		lnd = lnet_find_lnd_by_type(lnd_type);
+
+		if (lnd == NULL) {
+			mutex_unlock(&the_lnet.ln_lnd_mutex);
+			rc = request_module("%s", libcfs_lnd2modname(lnd_type));
+			mutex_lock(&the_lnet.ln_lnd_mutex);
+
+			lnd = lnet_find_lnd_by_type(lnd_type);
+			if (lnd == NULL) {
+				mutex_unlock(&the_lnet.ln_lnd_mutex);
+				CERROR("Can't load LND %s, module %s, rc=%d\n",
+				       libcfs_lnd2str(lnd_type),
+				       libcfs_lnd2modname(lnd_type), rc);
+				rc = -EINVAL;
+				goto failed0;
+			}
+		}
+
+		lnet_net_lock(LNET_LOCK_EX);
+		lnd->lnd_refcount++;
+		lnet_net_unlock(LNET_LOCK_EX);
+
+		net->net_lnd = lnd;
+
+		mutex_unlock(&the_lnet.ln_lnd_mutex);
+	} else {
 		if (lnd_type == LOLND) {
 			lnet_net_free(net);
 			return 0;
@@ -1385,31 +1411,6 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 		goto failed0;
 	}
 
-	mutex_lock(&the_lnet.ln_lnd_mutex);
-	lnd = lnet_find_lnd_by_type(lnd_type);
-
-	if (!lnd) {
-		mutex_unlock(&the_lnet.ln_lnd_mutex);
-		rc = request_module("%s", libcfs_lnd2modname(lnd_type));
-		mutex_lock(&the_lnet.ln_lnd_mutex);
-
-		lnd = lnet_find_lnd_by_type(lnd_type);
-		if (!lnd) {
-			mutex_unlock(&the_lnet.ln_lnd_mutex);
-			CERROR("Can't load LND %s, module %s, rc=%d\n",
-			       libcfs_lnd2str(lnd_type),
-			       libcfs_lnd2modname(lnd_type), rc);
-			rc = -EINVAL;
-			goto failed0;
-		}
-	}
-
-	lnet_net_lock(LNET_LOCK_EX);
-	lnd->lnd_refcount++;
-	lnet_net_unlock(LNET_LOCK_EX);
-	net->net_lnd = lnd;
-	mutex_unlock(&the_lnet.ln_lnd_mutex);
-
 	while (!list_empty(&net->net_ni_added)) {
 		ni = list_entry(net->net_ni_added.next, struct lnet_ni,
 				ni_netlist);

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 26/34] lnet: only valid lnd_type when net_id is unique.
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (27 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 23/34] lnet: don't need lock to test ln_shutdown NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  4:34   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 31/34] lnet: lnet_dyn_add_ni: fix ping_info count NeilBrown
                   ` (5 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

If it isn't unique, we won't add it, so no need to validate.

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index ab4d093c04da..0dfd3004f735 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1366,13 +1366,14 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 	__u32			lnd_type;
 	struct lnet_lnd		*lnd;
 
-	lnd_type = LNET_NETTYP(net->net_id);
-
 	INIT_LIST_HEAD(&local_ni_list);
-	LASSERT(libcfs_isknown_lnd(lnd_type));
 
 	/* Make sure this new NI is unique. */
 	if (lnet_net_unique(net->net_id, &the_lnet.ln_nets)) {
+		lnd_type = LNET_NETTYP(net->net_id);
+
+		LASSERT(libcfs_isknown_lnd(lnd_type));
+
 		mutex_lock(&the_lnet.ln_lnd_mutex);
 		lnd = lnet_find_lnd_by_type(lnd_type);
 

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 27/34] lnet: make it possible to add a new interface to a network
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (20 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 28/34] lnet: add checks to ensure network interface names are unique NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  4:38   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 30/34] lnet: fix typo NeilBrown
                   ` (12 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

lnet_startup_lndnet() is enhanced to cope if the net already
exists.

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
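The lookup change in this patch can be sketched outside the kernel as
follows. This is a minimal illustration, not the LNet code: struct
demo_net, demo_net_unique() and demo_pick_target() are hypothetical
stand-ins, and a hand-rolled singly linked list replaces list_head.
It shows the uniqueness test handing back the existing entry, so the
caller can attach new NIs to it instead of failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct lnet_net: a singly linked list node. */
struct demo_net {
	uint32_t net_id;
	struct demo_net *next;
};

/*
 * Return true if net_id is not on the list. Otherwise return false and,
 * when 'found' is non-NULL, store the matching entry in *found.
 */
bool demo_net_unique(uint32_t net_id, struct demo_net *head,
		     struct demo_net **found)
{
	struct demo_net *n;

	for (n = head; n != NULL; n = n->next) {
		if (n->net_id == net_id) {
			if (found != NULL)
				*found = n;
			return false;
		}
	}
	return true;
}

/*
 * Pick the network that newly started NIs should be attached to:
 * the existing one if net_id is already known, else the candidate.
 */
struct demo_net *demo_pick_target(struct demo_net *candidate,
				  struct demo_net *head)
{
	struct demo_net *target = NULL;

	assert(candidate != NULL);
	if (demo_net_unique(candidate->net_id, head, &target))
		target = candidate;
	return target;
}
```

Callers that only need the yes/no answer pass NULL for the last
argument, which is what lnet_net_alloc() does in the config.c hunk.
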
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |    3 +
 drivers/staging/lustre/lnet/lnet/api-ni.c          |   69 +++++++++++++++-----
 drivers/staging/lustre/lnet/lnet/config.c          |   12 ++-
 3 files changed, 61 insertions(+), 23 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 6401d9a37b23..905213fc16c7 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -630,7 +630,8 @@ void lnet_swap_pinginfo(struct lnet_ping_info *info);
 int lnet_parse_ip2nets(char **networksp, char *ip2nets);
 int lnet_parse_routes(char *route_str, int *im_a_router);
 int lnet_parse_networks(struct list_head *nilist, char *networks);
-bool lnet_net_unique(__u32 net, struct list_head *nilist);
+bool lnet_net_unique(__u32 net_id, struct list_head *nilist,
+		     struct lnet_net **net);
 
 int lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt);
 struct lnet_peer *lnet_find_peer_locked(struct lnet_peer_table *ptable,
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 0dfd3004f735..042ab0d9e318 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1298,14 +1298,9 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
 		goto failed0;
 	}
 
-	lnet_net_lock(LNET_LOCK_EX);
-	/* refcount for ln_nis */
-	lnet_ni_addref_locked(ni, 0);
-	list_add_tail(&ni->ni_net->net_list, &the_lnet.ln_nets);
-	lnet_net_unlock(LNET_LOCK_EX);
-
 	ni->ni_state = LNET_NI_STATE_ACTIVE;
 
+	/* We keep a reference on the loopback net through the loopback NI */
 	if (net->net_lnd->lnd_type == LOLND) {
 		lnet_ni_addref(ni);
 		LASSERT(!the_lnet.ln_loni);
@@ -1360,6 +1355,7 @@ static int
 lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 {
 	struct lnet_ni		*ni;
+	struct lnet_net		*net_l = NULL;
 	struct list_head	local_ni_list;
 	int			rc;
 	int			ni_count = 0;
@@ -1368,8 +1364,14 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 
 	INIT_LIST_HEAD(&local_ni_list);
 
-	/* Make sure this new NI is unique. */
-	if (lnet_net_unique(net->net_id, &the_lnet.ln_nets)) {
+	/*
+	 * make sure that this net is unique. If it isn't then
+	 * we are adding interfaces to an already existing network, and
+	 * 'net' is just a convenient way to pass in the list.
+	 * if it is unique we need to find the LND and load it if
+	 * necessary.
+	 */
+	if (lnet_net_unique(net->net_id, &the_lnet.ln_nets, &net_l)) {
 		lnd_type = LNET_NETTYP(net->net_id);
 
 		LASSERT(libcfs_isknown_lnd(lnd_type));
@@ -1400,23 +1402,41 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 		net->net_lnd = lnd;
 
 		mutex_unlock(&the_lnet.ln_lnd_mutex);
-	} else {
-		if (lnd_type == LOLND) {
-			lnet_net_free(net);
-			return 0;
-		}
 
-		CERROR("Net %s is not unique\n",
-		       libcfs_net2str(net->net_id));
-		rc = -EEXIST;
-		goto failed0;
+		net_l = net;
 	}
 
+	/*
+	 * net_l: if the network being added is unique then net_l
+	 *        will point to that network
+	 *        if the network being added is not unique then
+	 *        net_l points to the existing network.
+	 *
+	 * When we enter the loop below, we'll pick NIs off the
+	 * network being added and start them up, then add them to
+	 * a local ni list. Once we've successfully started all
+	 * the NIs then we join the local NI list (of started up
+	 * networks) with the net_l->net_ni_list, which should
+	 * point to the correct network to add the new ni list to
+	 *
+	 * If any of the new NIs fail to start up, then we want to
+	 * iterate through the local ni list, which should include
+	 * any NIs which were successfully started up, and shut
+	 * them down.
+	 *
+	 * After that we want to delete the network being added,
+	 * to avoid a memory leak.
+	 */
+
 	while (!list_empty(&net->net_ni_added)) {
 		ni = list_entry(net->net_ni_added.next, struct lnet_ni,
 				ni_netlist);
 		list_del_init(&ni->ni_netlist);
 
+		/* adjust the pointer the parent network, just in case it
+		 * the net is a duplicate */
+		ni->ni_net = net_l;
+
 		rc = lnet_startup_lndni(ni, tun);
 
 		if (rc < 0)
@@ -1427,9 +1447,22 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 
 		ni_count++;
 	}
+
 	lnet_net_lock(LNET_LOCK_EX);
-	list_splice_tail(&local_ni_list, &net->net_ni_list);
+	list_splice_tail(&local_ni_list, &net_l->net_ni_list);
 	lnet_net_unlock(LNET_LOCK_EX);
+
+	/* if the network is not unique then we don't want to keep
+	 * it around after we're done. Free it. Otherwise add that
+	 * net to the global the_lnet.ln_nets */
+	if (net_l != net && net_l != NULL) {
+		lnet_net_free(net);
+	} else {
+		lnet_net_lock(LNET_LOCK_EX);
+		list_add_tail(&net->net_list, &the_lnet.ln_nets);
+		lnet_net_unlock(LNET_LOCK_EX);
+	}
+
 	return ni_count;
 
 failed1:
diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index f886dcfc6d6e..fcae50676422 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -79,13 +79,17 @@ lnet_issep(char c)
 }
 
 bool
-lnet_net_unique(__u32 net, struct list_head *netlist)
+lnet_net_unique(__u32 net_id, struct list_head *netlist,
+		struct lnet_net **net)
 {
-	struct lnet_net	 *net_l;
+	struct lnet_net  *net_l;
 
 	list_for_each_entry(net_l, netlist, net_list) {
-		if (net_l->net_id == net)
+		if (net_l->net_id == net_id) {
+			if (net != NULL)
+				*net = net_l;
 			return false;
+		}
 	}
 
 	return true;
@@ -309,7 +313,7 @@ lnet_net_alloc(__u32 net_id, struct list_head *net_list)
 {
 	struct lnet_net		*net;
 
-	if (!lnet_net_unique(net_id, net_list)) {
+	if (!lnet_net_unique(net_id, net_list, NULL)) {
 		CERROR("Duplicate net %s. Ignore\n",
 		       libcfs_net2str(net_id));
 		return NULL;

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 28/34] lnet: add checks to ensure network interface names are unique.
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (19 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 34/34] lnet: introduce use_tcp_bonding mod param NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  4:39   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 27/34] lnet: make it possible to add a new interface to a network NeilBrown
                   ` (13 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
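The comparison used by lnet_ni_unique_net() is a strncmp() bounded by
the length of the new name, so it behaves as a prefix test rather than
an exact match. The sketch below isolates that comparison so the
effect is easy to see; demo_iface_conflicts() and demo_iface_unique()
are hypothetical helpers over a plain array, not LNet functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the comparison in lnet_ni_unique_net():
 * true when 'existing' and 'iface' agree over the first strlen(iface)
 * bytes. A NULL 'existing' slot never conflicts.
 */
bool demo_iface_conflicts(const char *existing, const char *iface)
{
	assert(iface != NULL);
	return existing != NULL &&
	       strncmp(existing, iface, strlen(iface)) == 0;
}

/* Check 'iface' against an array of already configured names. */
bool demo_iface_unique(const char *const *names, size_t count,
		       const char *iface)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (demo_iface_conflicts(names[i], iface))
			return false;
	return true;
}
```

With "eth10" already configured, a request for "eth1" is reported as
a duplicate, while "eth10" requested after "eth1" is accepted. Whether
that asymmetry is intended is not stated in the patch.
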
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |    1 +
 drivers/staging/lustre/lnet/lnet/api-ni.c          |    8 ++++++
 drivers/staging/lustre/lnet/lnet/config.c          |   25 ++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 905213fc16c7..ef551b571935 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -632,6 +632,7 @@ int lnet_parse_routes(char *route_str, int *im_a_router);
 int lnet_parse_networks(struct list_head *nilist, char *networks);
 bool lnet_net_unique(__u32 net_id, struct list_head *nilist,
 		     struct lnet_net **net);
+bool lnet_ni_unique_net(struct list_head *nilist, char *iface);
 
 int lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt);
 struct lnet_peer *lnet_find_peer_locked(struct lnet_peer_table *ptable,
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 042ab0d9e318..3f6f5ead8a03 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1433,6 +1433,14 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 				ni_netlist);
 		list_del_init(&ni->ni_netlist);
 
+		/* make sure that the NI we're about to start
+		 * up is actually unique. If it's not, fail. */
+		if (!lnet_ni_unique_net(&net_l->net_ni_list,
+					ni->ni_interfaces[0])) {
+			rc = -EINVAL;
+			goto failed1;
+		}
+
 		/* adjust the pointer the parent network, just in case it
 		 * the net is a duplicate */
 		ni->ni_net = net_l;
diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index fcae50676422..11d6dbc80507 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -95,6 +95,25 @@ lnet_net_unique(__u32 net_id, struct list_head *netlist,
 	return true;
 }
 
+/* check that the NI is unique within the list of NIs already added to
+ * a network */
+bool
+lnet_ni_unique_net(struct list_head *nilist, char *iface)
+{
+	struct list_head *tmp;
+	struct lnet_ni *ni;
+
+	list_for_each(tmp, nilist) {
+		ni = list_entry(tmp, struct lnet_ni, ni_netlist);
+
+		if (ni->ni_interfaces[0] != NULL &&
+		    strncmp(ni->ni_interfaces[0], iface, strlen(iface)) == 0)
+			return false;
+	}
+
+	return true;
+}
+
 static bool
 in_array(__u32 *array, __u32 size, __u32 value)
 {
@@ -352,6 +371,12 @@ lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
 	int			rc;
 	int			i;
 
+	if (iface != NULL)
+		/* make sure that this NI is unique in the net it's
+		 * being added to */
+		if (!lnet_ni_unique_net(&net->net_ni_added, iface))
+			return NULL;
+
 	ni = kzalloc(sizeof(*ni), GFP_KERNEL);
 	if (ni == NULL) {
 		CERROR("Out of memory creating network interface %s%s\n",

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 29/34] lnet: track tunables in lnet_startup_lndnet()
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (25 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 22/34] lnet: don't take reference in lnet_XX2ni_locked() NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  4:47   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 23/34] lnet: don't need lock to test ln_shutdown NeilBrown
                   ` (7 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

Not really sure what this is yet.

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
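Reading the hunks, the pattern appears to be: remember three net-level
tunables before the NIs are started, then put back any value that was
set explicitly once start-up has overwritten them. The sketch below
assumes that -1 means "not set by the user", which is how the restore
tests read. The struct, both functions and the default values are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the net-level tunables named in the patch. */
struct demo_tunables {
	int peer_timeout;
	int max_tx_credits;
	int peer_rtr_credits;
};

/* Stand-in for an LND start-up routine that fills in its own defaults. */
void demo_lnd_startup(struct demo_tunables *t)
{
	t->peer_timeout = 180;		/* made-up default */
	t->max_tx_credits = 256;	/* made-up default */
	t->peer_rtr_credits = 0;	/* made-up default */
}

/*
 * Snapshot the caller's values, let start-up overwrite them, then put
 * back every value the caller had set explicitly (anything but -1).
 */
void demo_startup_keep_tunables(struct demo_tunables *t)
{
	struct demo_tunables saved;

	assert(t != NULL);
	saved = *t;
	demo_lnd_startup(t);

	if (saved.peer_timeout != -1)
		t->peer_timeout = saved.peer_timeout;
	if (saved.max_tx_credits != -1)
		t->max_tx_credits = saved.max_tx_credits;
	if (saved.peer_rtr_credits != -1)
		t->peer_rtr_credits = saved.peer_rtr_credits;
}
```

Starting from { 50, -1, 8 }, the explicit 50 and 8 survive and only
max_tx_credits takes the start-up default.
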
 drivers/staging/lustre/lnet/lnet/api-ni.c |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 3f6f5ead8a03..f4efb48c4cf3 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1361,6 +1361,12 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 	int			ni_count = 0;
 	__u32			lnd_type;
 	struct lnet_lnd		*lnd;
+	int			peer_timeout =
+		net->net_tunables.lct_peer_timeout;
+	int			maxtxcredits =
+		net->net_tunables.lct_max_tx_credits;
+	int			peerrtrcredits =
+		net->net_tunables.lct_peer_rtr_credits;
 
 	INIT_LIST_HEAD(&local_ni_list);
 
@@ -1447,6 +1453,9 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 
 		rc = lnet_startup_lndni(ni, tun);
 
+		LASSERT(ni->ni_net->net_tunables.lct_peer_timeout <= 0 ||
+			ni->ni_net->net_lnd->lnd_query != NULL);
+
 		if (rc < 0)
 			goto failed1;
 
@@ -1464,8 +1473,23 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 	 * it around after we're done. Free it. Otherwise add that
 	 * net to the global the_lnet.ln_nets */
 	if (net_l != net && net_l != NULL) {
+		/*
+		 * TODO - note. currently the tunables can not be updated
+		 * once added
+		 */
 		lnet_net_free(net);
 	} else {
+		/*
+		 * restore tunables after they have been overwritten by the
+		 * lnd
+		 */
+		if (peer_timeout != -1)
+			net->net_tunables.lct_peer_timeout = peer_timeout;
+		if (maxtxcredits != -1)
+			net->net_tunables.lct_max_tx_credits = maxtxcredits;
+		if (peerrtrcredits != -1)
+			net->net_tunables.lct_peer_rtr_credits = peerrtrcredits;
+
 		lnet_net_lock(LNET_LOCK_EX);
 		list_add_tail(&net->net_list, &the_lnet.ln_nets);
 		lnet_net_unlock(LNET_LOCK_EX);

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 30/34] lnet: fix typo
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (21 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 27/34] lnet: make it possible to add a new interface to a network NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  4:47   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 33/34] Completely re-write lnet_parse_networks() NeilBrown
                   ` (11 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

to -> too

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/lnet/lnet/api-ni.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index f4efb48c4cf3..cf0ffb8ac84b 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1868,7 +1868,7 @@ lnet_fill_ni_info(struct lnet_ni *ni, struct lnet_ioctl_config_data *config)
 	if (config->cfg_hdr.ioc_len > min_size)
 		tunable_size = config->cfg_hdr.ioc_len - min_size;
 
-	/* Don't copy to much data to user space */
+	/* Don't copy too much data to user space */
 	min_size = min(tunable_size, sizeof(ni->ni_lnd_tunables));
 	lnd_cfg = (struct lnet_ioctl_config_lnd_tunables *)net_config->cfg_bulk;
 

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 31/34] lnet: lnet_dyn_add_ni: fix ping_info count
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (28 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 26/34] lnet: only valid lnd_type when net_id is unique NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  4:48   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 19/34] lnet: simplify lnet_islocalnet() NeilBrown
                   ` (4 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

Use the correct count of interfaces when calling
   lnet_ping_info_setup()
in lnet_dyn_add_ni()

Signed-off-by: NeilBrown <neilb@suse.com>
---
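The slot arithmetic can be sketched as below. The ping info holds one
slot per NI, so adding a net with several NIs needs the current total
plus the number of NIs on the new net, not the current total plus one.
struct demo_ni and both functions are hypothetical, and a singly
linked list stands in for the kernel list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lnet_ni on a net's NI list. */
struct demo_ni {
	struct demo_ni *next;
};

/* Count the NIs on one net, like lnet_get_net_ni_count_locked(). */
int demo_count_nis(const struct demo_ni *head)
{
	int count = 0;

	for (; head != NULL; head = head->next)
		count++;
	return count;
}

/* Slots needed in the ping info once every NI of a new net is added. */
int demo_ping_slots_after_add(int current_ni_count,
			      const struct demo_ni *new_nis)
{
	assert(current_ni_count >= 0);
	return current_ni_count + demo_count_nis(new_nis);
}
```

For a net with a single NI the result matches the old
"1 + lnet_get_ni_count()" expression.
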
 drivers/staging/lustre/lnet/lnet/api-ni.c |   27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index cf0ffb8ac84b..2ce0a7212dc2 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -871,6 +871,18 @@ lnet_ping_info_create(int num_ni)
 	return ping_info;
 }
 
+static inline int
+lnet_get_net_ni_count_locked(struct lnet_net *net)
+{
+	struct lnet_ni	*ni;
+	int		count = 0;
+
+	list_for_each_entry(ni, &net->net_ni_list, ni_netlist)
+		count++;
+
+	return count;
+}
+
 static inline int
 lnet_get_ni_count(void)
 {
@@ -1977,6 +1989,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
 	struct list_head net_head;
 	struct lnet_remotenet *rnet;
 	int rc;
+	int			net_ni_count;
 	int			num_acceptor_nets;
 	__u32			net_type;
 	struct lnet_ioctl_config_lnd_tunables *lnd_tunables = NULL;
@@ -2014,7 +2027,19 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
 		goto failed0;
 	}
 
-	rc = lnet_ping_info_setup(&pinfo, &md_handle, 1 + lnet_get_ni_count(),
+	/*
+	 * make sure you calculate the correct number of slots in the ping
+	 * info. Since the ping info is a flattened list of all the NIs,
+	 * we should allocate enough slots to accommodate the number of NIs
+	 * which will be added.
+	 *
+	 * We can use lnet_get_net_ni_count_locked() since the net is not
+	 * on a public list yet, so locking is not a problem
+	 */
+	net_ni_count = lnet_get_net_ni_count_locked(net);
+
+	rc = lnet_ping_info_setup(&pinfo, &md_handle,
+				  net_ni_count + lnet_get_ni_count(),
 				  false);
 	if (rc)
 		goto failed0;

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 32/34] lnet: lnet_dyn_del_ni: fix ping_info count
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (23 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 33/34] Completely re-write lnet_parse_networks() NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  4:49   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 22/34] lnet: don't take reference in lnet_XX2ni_locked() NeilBrown
                   ` (9 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

- use correct interface count for lnet_ping_info_setup().
- also rename 'net' to 'net_id' so the name 'net' is free
  to identify the lnet_net.

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
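A sketch of the count-under-lock step, with the lock released on every
path. In the hunk as posted, the "net == NULL" branch jumps to "out"
while lnet_net_lock(0) still appears to be held, so the sketch routes
both outcomes through a single unlock label. struct demo_lock is a
hypothetical depth counter standing in for the LNet lock, and
struct demo_net carries a precomputed NI count.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical lock: a depth counter so unbalanced use is detectable. */
struct demo_lock {
	int depth;
};

/* Hypothetical stand-in for struct lnet_net with its NI count. */
struct demo_net {
	int ni_count;
};

void demo_lock_acquire(struct demo_lock *l)
{
	l->depth++;
}

void demo_lock_release(struct demo_lock *l)
{
	assert(l->depth > 0);
	l->depth--;
}

/*
 * Compute the ping info size after removing 'net'. Returns 0 and sets
 * *slots on success, or -EINVAL if the net was not found (NULL).
 * The lock is dropped on both paths.
 */
int demo_slots_after_del(struct demo_lock *lock, const struct demo_net *net,
			 int total_ni_count, int *slots)
{
	int rc = 0;

	demo_lock_acquire(lock);
	if (net == NULL) {
		rc = -EINVAL;
		goto unlock;
	}
	*slots = total_ni_count - net->ni_count;
unlock:
	demo_lock_release(lock);
	return rc;
}
```
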
 drivers/staging/lustre/lnet/lnet/api-ni.c |   35 +++++++++++++++++------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 2ce0a7212dc2..ff5149da2d79 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -2109,40 +2109,45 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
 }
 
 int
-lnet_dyn_del_ni(__u32 net)
+lnet_dyn_del_ni(__u32 net_id)
 {
-	struct lnet_ni *ni;
+	struct lnet_net *net;
 	struct lnet_ping_info *pinfo;
 	struct lnet_handle_md md_handle;
 	int rc;
+	int		  net_ni_count;
 
 	/* don't allow userspace to shutdown the LOLND */
-	if (LNET_NETTYP(net) == LOLND)
+	if (LNET_NETTYP(net_id) == LOLND)
 		return -EINVAL;
 
 	mutex_lock(&the_lnet.ln_api_mutex);
+
+	lnet_net_lock(0);
+
+	net = lnet_get_net_locked(net_id);
+	if (net == NULL) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+	net_ni_count = lnet_get_net_ni_count_locked(net);
+
+	lnet_net_unlock(0);
+
 	/* create and link a new ping info, before removing the old one */
 	rc = lnet_ping_info_setup(&pinfo, &md_handle,
-				  lnet_get_ni_count() - 1, false);
+				  lnet_get_ni_count() - net_ni_count, false);
 	if (rc)
 		goto out;
 
-	ni = lnet_net2ni(net);
-	if (!ni) {
-		rc = -EINVAL;
-		goto failed;
-	}
-
-	lnet_shutdown_lndni(ni);
+	lnet_shutdown_lndnet(net);
 
 	if (!lnet_count_acceptor_nets())
 		lnet_acceptor_stop();
 
 	lnet_ping_target_update(pinfo, md_handle);
-	goto out;
-failed:
-	lnet_ping_md_unlink(pinfo, &md_handle);
-	lnet_ping_info_free(pinfo);
+
 out:
 	mutex_unlock(&the_lnet.ln_api_mutex);
 

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 33/34] Completely re-write lnet_parse_networks().
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (22 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 30/34] lnet: fix typo NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  4:54   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 32/34] lnet: lnet_dyn_del_ni: fix ping_info count NeilBrown
                   ` (10 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

From: Amir Shehata <amir.shehata@intel.com>

Was:

LU-7734 lnet: Multi-Rail local NI split

This patch allows the configuration of multiple NIs under one Net.
It is now possible to have multiple NIDs on the same network:
   Ex: <ip1>@tcp, <ip2>@tcp.
This can be configured using the following syntax:
   Ex: tcp(eth0, eth1)

The data structures for the example above can be visualized
as follows

               NET(tcp)
                |
        -----------------
        |               |
      NI(eth0)        NI(eth1)

For more details refer to the Multi-Rail Requirements and HLD
documents

Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Change-Id: Id7c73b9b811a3082b61e53b9e9f95743188cbd51
Reviewed-on: http://review.whamcloud.com/18274
Tested-by: Jenkins
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Olaf Weber <olaf@sgi.com>
---
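To make the "tcp(eth0, eth1)" syntax concrete, here is a deliberately
simplified splitter for one entry of the networks string. It is not
the parser from this patch: it ignores the "[...]" CPT expressions,
handles a single net rather than a comma-separated list, and all names
(demo_net_spec, demo_trim, demo_parse_net, DEMO_MAX_IFACES) are
hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define DEMO_MAX_IFACES 4

struct demo_net_spec {
	char *name;
	char *ifaces[DEMO_MAX_IFACES];
	int niface;
};

/* Trim leading and trailing white space in place; returns the new start. */
static char *demo_trim(char *s)
{
	char *end;

	while (isspace((unsigned char)*s))
		s++;
	end = s + strlen(s);
	while (end > s && isspace((unsigned char)end[-1]))
		*--end = '\0';
	return s;
}

/*
 * Split one "<name>" or "<name>(<iface>[,<iface>...])" entry in place.
 * 'buf' is modified; the pointers in 'spec' refer into it.
 * Returns 0 on success, -1 on a syntax error.
 */
int demo_parse_net(char *buf, struct demo_net_spec *spec)
{
	char *open = strchr(buf, '(');
	char *close;
	char *tok;

	assert(spec != NULL);
	memset(spec, 0, sizeof(*spec));
	if (open == NULL) {
		spec->name = demo_trim(buf);
		return *spec->name ? 0 : -1;
	}

	close = strchr(open, ')');
	if (close == NULL || *demo_trim(close + 1) != '\0')
		return -1;
	*open = '\0';
	*close = '\0';
	spec->name = demo_trim(buf);
	if (*spec->name == '\0')
		return -1;

	/* strtok() skips empty fields; blank-only fields are rejected. */
	for (tok = strtok(open + 1, ","); tok != NULL;
	     tok = strtok(NULL, ",")) {
		tok = demo_trim(tok);
		if (*tok == '\0' || spec->niface == DEMO_MAX_IFACES)
			return -1;
		spec->ifaces[spec->niface++] = tok;
	}
	return spec->niface > 0 ? 0 : -1;
}
```

Parsing "tcp(eth0, eth1)" yields the name "tcp" and two interface
names, which corresponds to one NET with two NIs in the diagram above.
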
 drivers/staging/lustre/lnet/lnet/config.c |  341 ++++++++++++++++++-----------
 1 file changed, 217 insertions(+), 124 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index 11d6dbc80507..0571fa6a7249 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -48,8 +48,11 @@ static int lnet_tbnob;			/* track text buf allocation */
 #define LNET_MAX_TEXTBUF_NOB     (64 << 10)	/* bound allocation */
 #define LNET_SINGLE_TEXTBUF_NOB  (4 << 10)
 
+#define SPACESTR " \t\v\r\n"
+#define DELIMITERS ":()[]"
+
 static void
-lnet_syntax(char *name, char *str, int offset, int width)
+lnet_syntax(const char *name, const char *str, int offset, int width)
 {
 	static char dots[LNET_SINGLE_TEXTBUF_NOB];
 	static char dashes[LNET_SINGLE_TEXTBUF_NOB];
@@ -363,6 +366,42 @@ lnet_net_alloc(__u32 net_id, struct list_head *net_list)
 	return net;
 }
 
+static int
+lnet_ni_add_interface(struct lnet_ni *ni, char *iface)
+{
+	int niface = 0;
+
+	if (ni == NULL)
+		return -ENOMEM;
+
+	/* Allocate a separate piece of memory and copy
+	 * into it the string, so we don't have
+	 * a dependency on the tokens string.  This way we
+	 * can free the tokens at the end of the function.
+	 * The newly allocated ni_interfaces[] can be
+	 * freed when freeing the NI */
+	while (niface < LNET_MAX_INTERFACES &&
+	       ni->ni_interfaces[niface] != NULL)
+		niface++;
+
+	if (niface >= LNET_MAX_INTERFACES) {
+		LCONSOLE_ERROR_MSG(0x115, "Too many interfaces "
+				   "for net %s\n",
+				   libcfs_net2str(LNET_NIDNET(ni->ni_nid)));
+		return -EINVAL;
+	}
+
+	ni->ni_interfaces[niface] = kstrdup(iface, GFP_KERNEL);
+
+	if (ni->ni_interfaces[niface] == NULL) {
+		CERROR("Can't allocate net interface name\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+/* allocate and add to the provided network */
 struct lnet_ni *
 lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
 {
@@ -439,24 +478,33 @@ lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
 		goto failed;
 	list_add_tail(&ni->ni_netlist, &net->net_ni_added);
 
+	/* if an interface name is provided then make sure to add in that
+	 * interface name in NI */
+	if (iface != NULL)
+		if (lnet_ni_add_interface(ni, iface) != 0)
+			goto failed;
+
 	return ni;
 failed:
 	lnet_ni_free(ni);
 	return NULL;
 }
 
+/*
+ * Parse the networks string and create the matching set of NIs on the
+ * nilist.
+ */
 int
 lnet_parse_networks(struct list_head *netlist, char *networks)
 {
-	struct cfs_expr_list *el = NULL;
+	struct cfs_expr_list *net_el = NULL;
+	struct cfs_expr_list *ni_el = NULL;
 	char *tokens;
 	char *str;
-	char *tmp;
 	struct lnet_net *net;
 	struct lnet_ni *ni = NULL;
 	__u32 net_id;
 	int nnets = 0;
-	struct list_head *temp_node;
 
 	if (!networks) {
 		CERROR("networks string is undefined\n");
@@ -476,84 +524,108 @@ lnet_parse_networks(struct list_head *netlist, char *networks)
 		return -ENOMEM;
 	}
 
-	tmp = tokens;
 	str = tokens;
 
-	while (str && *str) {
-		char *comma = strchr(str, ',');
-		char *bracket = strchr(str, '(');
-		char *square = strchr(str, '[');
-		char *iface;
-		int niface;
+	/*
+	 * Main parser loop.
+	 *
+	 * NB we don't check interface conflicts here; it's the LNDs
+	 * responsibility (if it cares at all)
+	 */
+	do {
+		char *nistr;
+		char *elstr;
+		char *name;
 		int rc;
 
 		/*
-		 * NB we don't check interface conflicts here; it's the LNDs
-		 * responsibility (if it cares at all)
+		 * Parse a network string into its components.
+		 *
+		 * <name>{"("...")"}{"["<el>"]"}
 		 */
-		if (square && (!comma || square < comma)) {
-			/*
-			 * i.e: o2ib0(ib0)[1,2], number between square
-			 * brackets are CPTs this NI needs to be bond
-			 */
-			if (bracket && bracket > square) {
-				tmp = square;
+
+		/* Network name (mandatory)
+		 */
+		while (isspace(*str))
+			*str++ = '\0';
+		if (!*str)
+			break;
+		name = str;
+		str += strcspn(str, SPACESTR ":()[],");
+		while (isspace(*str))
+			*str++ = '\0';
+
+		/* Interface list (optional) */
+		if (*str == '(') {
+			*str++ = '\0';
+			nistr = str;
+			str += strcspn(str, ")");
+			if (*str != ')') {
+				str = nistr;
 				goto failed_syntax;
 			}
+			do {
+				*str++ = '\0';
+			} while (isspace(*str));
+		} else {
+			nistr = NULL;
+		}
 
-			tmp = strchr(square, ']');
-			if (!tmp) {
-				tmp = square;
+		/* CPT expression (optional) */
+		if (*str == '[') {
+			elstr = str;
+			str += strcspn(str, "]");
+			if (*str != ']') {
+				str = elstr;
 				goto failed_syntax;
 			}
-
-			rc = cfs_expr_list_parse(square, tmp - square + 1,
-						 0, LNET_CPT_NUMBER - 1, &el);
+			rc = cfs_expr_list_parse(elstr, str - elstr + 1,
+						0, LNET_CPT_NUMBER - 1,
+						&net_el);
 			if (rc) {
-				tmp = square;
+				str = elstr;
 				goto failed_syntax;
 			}
-
-			while (square <= tmp)
-				*square++ = ' ';
+			*elstr = '\0';
+			do {
+				*str++ = '\0';
+			} while (isspace(*str));
 		}
 
-		if (!bracket || (comma && comma < bracket)) {
-			/* no interface list specified */
+		/* Bad delimiters */
+		if (*str && (strchr(DELIMITERS, *str) != NULL))
+			goto failed_syntax;
 
-			if (comma)
-				*comma++ = 0;
-			net_id = libcfs_str2net(strim(str));
+		/* go to the next net if it exists */
+		str += strcspn(str, ",");
+		if (*str == ',')
+			*str++ = '\0';
 
-			if (net_id == LNET_NIDNET(LNET_NID_ANY)) {
-				LCONSOLE_ERROR_MSG(0x113,
-						   "Unrecognised network type\n");
-				tmp = str;
-				goto failed_syntax;
-			}
-
-			if (LNET_NETTYP(net_id) != LOLND) { /* LO is implicit */
-				net = lnet_net_alloc(net_id, netlist);
-				if (!net ||
-				    !lnet_ni_alloc(net, el, NULL))
-					goto failed;
-			}
+		/*
+		 * At this point the name is properly terminated.
+		 */
+		net_id = libcfs_str2net(name);
+		if (net_id == LNET_NIDNET(LNET_NID_ANY)) {
+			LCONSOLE_ERROR_MSG(0x113,
+					"Unrecognised network type\n");
+			str = name;
+			goto failed_syntax;
+		}
 
-			if (el) {
-				cfs_expr_list_free(el);
-				el = NULL;
+		if (LNET_NETTYP(net_id) == LOLND) {
+			/* Loopback is implicit, and there can be only one. */
+			if (net_el) {
+				cfs_expr_list_free(net_el);
+				net_el = NULL;
 			}
-
-			str = comma;
+			/* Should we error out instead? */
 			continue;
 		}
 
-		*bracket = 0;
-		net_id = libcfs_str2net(strim(str));
-		if (net_id == LNET_NIDNET(LNET_NID_ANY)) {
-			tmp = str;
-			goto failed_syntax;
-		}
+		/*
+		 * All network parameters are now known.
+		 */
+		nnets++;
 
 		/* always allocate a net, since we will eventually add an
 		 * interface to it, or we will fail, in which case we'll
@@ -562,88 +634,107 @@ lnet_parse_networks(struct list_head *netlist, char *networks)
 		if (IS_ERR_OR_NULL(net))
 			goto failed;
 
-		ni = lnet_ni_alloc(net, el, NULL);
-		if (IS_ERR_OR_NULL(ni))
-			goto failed;
-
-		if (el) {
-			cfs_expr_list_free(el);
-			el = NULL;
-		}
-
-		niface = 0;
-		iface = bracket + 1;
+		if (!nistr) {
+			/*
+			 * No interface list was specified, allocate a
+			 * ni using the defaults.
+			 */
+			ni = lnet_ni_alloc(net, net_el, NULL);
+			if (IS_ERR_OR_NULL(ni))
+				goto failed;
 
-		bracket = strchr(iface, ')');
-		if (!bracket) {
-			tmp = iface;
-			goto failed_syntax;
+			if (net_el) {
+				cfs_expr_list_free(net_el);
+				net_el = NULL;
+			}
+			continue;
 		}
 
-		*bracket = 0;
 		do {
-			comma = strchr(iface, ',');
-			if (comma)
-				*comma++ = 0;
-
-			iface = strim(iface);
-			if (!*iface) {
-				tmp = iface;
-				goto failed_syntax;
+			elstr = NULL;
+
+			/* Interface name (mandatory) */
+			while (isspace(*nistr))
+				*nistr++ = '\0';
+			name = nistr;
+			nistr += strcspn(nistr, SPACESTR "[],");
+			while (isspace(*nistr))
+				*nistr++ = '\0';
+
+			/* CPT expression (optional) */
+			if (*nistr == '[') {
+				elstr = nistr;
+				nistr += strcspn(nistr, "]");
+				if (*nistr != ']') {
+					str = elstr;
+					goto failed_syntax;
+				}
+				rc = cfs_expr_list_parse(elstr,
+							nistr - elstr + 1,
+							0, LNET_CPT_NUMBER - 1,
+							&ni_el);
+				if (rc != 0) {
+					str = elstr;
+					goto failed_syntax;
+				}
+				*elstr = '\0';
+				do {
+					*nistr++ = '\0';
+				} while (isspace(*nistr));
+			} else {
+				ni_el = net_el;
 			}
 
-			if (niface == LNET_MAX_INTERFACES) {
-				LCONSOLE_ERROR_MSG(0x115,
-						   "Too many interfaces for net %s\n",
-						   libcfs_net2str(net_id));
-				goto failed;
+			/*
+			 * End of single interface specification,
+			 * advance to the start of the next one, if
+			 * any.
+			 */
+			if (*nistr == ',') {
+				do {
+					*nistr++ = '\0';
+				} while (isspace(*nistr));
+				if (!*nistr) {
+					str = nistr;
+					goto failed_syntax;
+				}
+			} else if (*nistr) {
+				str = nistr;
+				goto failed_syntax;
 			}
 
 			/*
-			 * Allocate a separate piece of memory and copy
-			 * into it the string, so we don't have
-			 * a depencency on the tokens string.  This way we
-			 * can free the tokens at the end of the function.
-			 * The newly allocated ni_interfaces[] can be
-			 * freed when freeing the NI
+			 * At this point the name
+			 is properly terminated.
 			 */
-			ni->ni_interfaces[niface] = kstrdup(iface, GFP_KERNEL);
-			if (!ni->ni_interfaces[niface]) {
-				CERROR("Can't allocate net interface name\n");
-				goto failed;
-			}
-			niface++;
-			iface = comma;
-		} while (iface);
-
-		str = bracket + 1;
-		comma = strchr(bracket + 1, ',');
-		if (comma) {
-			*comma = 0;
-			str = strim(str);
-			if (*str) {
-				tmp = str;
+			if (!*name) {
+				str = name;
 				goto failed_syntax;
 			}
-			str = comma + 1;
-			continue;
-		}
 
-		str = strim(str);
-		if (*str) {
-			tmp = str;
-			goto failed_syntax;
-		}
-	}
+			ni = lnet_ni_alloc(net, ni_el, name);
+			if (IS_ERR_OR_NULL(ni))
+				goto failed;
 
-	list_for_each(temp_node, netlist)
-		nnets++;
+			if (ni_el) {
+				if (ni_el != net_el) {
+					cfs_expr_list_free(ni_el);
+					ni_el = NULL;
+				}
+			}
+		} while (*nistr);
+
+		if (net_el) {
+			cfs_expr_list_free(net_el);
+			net_el = NULL;
+		}
+	} while (*str);
 
 	kfree(tokens);
 	return nnets;
 
  failed_syntax:
-	lnet_syntax("networks", networks, (int)(tmp - tokens), strlen(tmp));
+	lnet_syntax("networks", networks, (int)(str - tokens), strlen(str));
  failed:
 	/* free the net list and all the nis on each net */
 	while (!list_empty(netlist)) {
@@ -653,8 +744,10 @@ lnet_parse_networks(struct list_head *netlist, char *networks)
 		lnet_net_free(net);
 	}
 
-	if (el)
-		cfs_expr_list_free(el);
+	if (ni_el && ni_el != net_el)
+		cfs_expr_list_free(ni_el);
+	if (net_el)
+		cfs_expr_list_free(net_el);
 
 	kfree(tokens);
 

^ permalink raw reply related	[flat|nested] 98+ messages in thread
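
The interface-list loop in the patch above tokenizes the string in place: it overwrites whitespace and delimiters with NUL bytes, so that each interface name, and its optional bracketed CPT expression, becomes a separate C string inside the same buffer. The following is a minimal user-space sketch of that technique, not the kernel code. `struct iface_spec` and `split_ifaces()` are hypothetical names, and the bracketed expression is kept as raw text instead of being passed to cfs_expr_list_parse().

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result record: one entry per "name[expr]" item. */
struct iface_spec {
	const char *name;	/* interface name, NUL-terminated in place */
	const char *cpt_expr;	/* text between '[' and ']', or NULL */
};

/*
 * Split "eth0[0,1], eth1" in place.  Returns the number of entries
 * stored in out[], or -1 on a syntax error or when more than max
 * entries are present.  The buffer s is modified.
 */
static int split_ifaces(char *s, struct iface_spec *out, int max)
{
	int n = 0;

	do {
		char *name;
		char *expr = NULL;

		/* Interface name (mandatory). */
		while (isspace((unsigned char)*s))
			*s++ = '\0';
		name = s;
		s += strcspn(s, " \t\n[],");
		while (isspace((unsigned char)*s))
			*s++ = '\0';

		/* Bracketed expression (optional). */
		if (*s == '[') {
			char *open = s;

			s += strcspn(s, "]");
			if (*s != ']')
				return -1;	/* unterminated '[' */
			*open = '\0';		/* terminates the name */
			expr = open + 1;
			do {
				*s++ = '\0';	/* drops ']' and spaces */
			} while (isspace((unsigned char)*s));
		}

		/* Advance to the next item, if any. */
		if (*s == ',') {
			do {
				*s++ = '\0';
			} while (isspace((unsigned char)*s));
			if (!*s)
				return -1;	/* trailing comma */
		} else if (*s) {
			return -1;		/* unexpected character */
		}

		if (!*name || n == max)
			return -1;
		out[n].name = name;
		out[n].cpt_expr = expr;
		n++;
	} while (*s);

	return n;
}
```

Under these assumptions, calling split_ifaces() on a writable buffer holding "eth0[0,1], eth1" yields two entries, the first with the expression text "0,1" and the second with no expression. The name is only validated after the delimiters have been consumed, which mirrors the order of checks in the patch.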

* [lustre-devel] [PATCH 34/34] lnet: introduce use_tcp_bonding mod param
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (18 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 20/34] lnet: discard ni_cpt_list NeilBrown
@ 2018-09-07  0:49 ` NeilBrown
  2018-09-12  4:54   ` Doug Oucharek
  2018-09-07  0:49 ` [lustre-devel] [PATCH 28/34] lnet: add checks to ensure network interface names are unique NeilBrown
                   ` (14 subsequent siblings)
  34 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-07  0:49 UTC (permalink / raw)
  To: lustre-devel

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
 .../staging/lustre/include/linux/lnet/lib-lnet.h   |    3 +
 drivers/staging/lustre/lnet/lnet/api-ni.c          |   22 ++++++++-
 drivers/staging/lustre/lnet/lnet/config.c          |   50 ++++++++++++++++----
 3 files changed, 61 insertions(+), 14 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index ef551b571935..5ee770cd7a5f 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -629,7 +629,8 @@ void lnet_swap_pinginfo(struct lnet_ping_info *info);
 
 int lnet_parse_ip2nets(char **networksp, char *ip2nets);
 int lnet_parse_routes(char *route_str, int *im_a_router);
-int lnet_parse_networks(struct list_head *nilist, char *networks);
+int lnet_parse_networks(struct list_head *nilist, char *networks,
+			bool use_tcp_bonding);
 bool lnet_net_unique(__u32 net_id, struct list_head *nilist,
 		     struct lnet_net **net);
 bool lnet_ni_unique_net(struct list_head *nilist, char *iface);
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index ff5149da2d79..8ff386992c99 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -59,6 +59,11 @@ static int rnet_htable_size = LNET_REMOTE_NETS_HASH_DEFAULT;
 module_param(rnet_htable_size, int, 0444);
 MODULE_PARM_DESC(rnet_htable_size, "size of remote network hash table");
 
+static int use_tcp_bonding = false;
+module_param(use_tcp_bonding, int, 0444);
+MODULE_PARM_DESC(use_tcp_bonding,
+		 "Set to 1 to use socklnd bonding. 0 to use Multi-Rail");
+
 static int lnet_ping(struct lnet_process_id id, signed long timeout,
 		     struct lnet_process_id __user *ids, int n_ids);
 
@@ -1446,6 +1451,18 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
 	 * to avoid a memory leak.
 	 */
 
+	/*
+	 * When a network uses TCP bonding then all its interfaces
+	 * must be specified when the network is first defined: the
+	 * TCP bonding code doesn't allow for interfaces to be added
+	 * or removed.
+	 */
+	if (net_l != net && net_l != NULL && use_tcp_bonding &&
+	    LNET_NETTYP(net_l->net_id) == SOCKLND) {
+		rc = -EINVAL;
+		goto failed0;
+	}
+
 	while (!list_empty(&net->net_ni_added)) {
 		ni = list_entry(net->net_ni_added.next, struct lnet_ni,
 				ni_netlist);
@@ -1702,7 +1719,8 @@ LNetNIInit(lnet_pid_t requested_pid)
 	 * routes if it has been loaded
 	 */
 	if (!the_lnet.ln_nis_from_mod_params) {
-		rc = lnet_parse_networks(&net_head, lnet_get_networks());
+		rc = lnet_parse_networks(&net_head, lnet_get_networks(),
+					 use_tcp_bonding);
 		if (rc < 0)
 			goto err_empty_list;
 	}
@@ -2000,7 +2018,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
 		lnd_tunables = (struct lnet_ioctl_config_lnd_tunables *)conf->cfg_bulk;
 
 	/* Create a net/ni structures for the network string */
-	rc = lnet_parse_networks(&net_head, nets);
+	rc = lnet_parse_networks(&net_head, nets, use_tcp_bonding);
 	if (rc <= 0)
 		return !rc ? -EINVAL : rc;
 
diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index 0571fa6a7249..abfc5d8dc219 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -117,6 +117,21 @@ lnet_ni_unique_net(struct list_head *nilist, char *iface)
 	return true;
 }
 
+/* check that the NI is unique to the interfaces within the same NI.
+ * This is only a consideration if use_tcp_bonding is set */
+static bool
+lnet_ni_unique_ni(char *iface_list[LNET_MAX_INTERFACES], char *iface)
+{
+	int i;
+	for (i = 0; i < LNET_MAX_INTERFACES; i++) {
+		if (iface_list[i] != NULL &&
+		    strncmp(iface_list[i], iface, strlen(iface)) == 0)
+			return false;
+	}
+
+	return true;
+}
+
 static bool
 in_array(__u32 *array, __u32 size, __u32 value)
 {
@@ -374,6 +389,9 @@ lnet_ni_add_interface(struct lnet_ni *ni, char *iface)
 	if (ni == NULL)
 		return -ENOMEM;
 
+	if (!lnet_ni_unique_ni(ni->ni_interfaces, iface))
+		return -EINVAL;
+
 	/* Allocate a separate piece of memory and copy
 	 * into it the string, so we don't have
 	 * a depencency on the tokens string.  This way we
@@ -495,7 +513,8 @@ lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
  * nilist.
  */
 int
-lnet_parse_networks(struct list_head *netlist, char *networks)
+lnet_parse_networks(struct list_head *netlist, char *networks,
+		    bool use_tcp_bonding)
 {
 	struct cfs_expr_list *net_el = NULL;
 	struct cfs_expr_list *ni_el = NULL;
@@ -634,7 +653,8 @@ lnet_parse_networks(struct list_head *netlist, char *networks)
 		if (IS_ERR_OR_NULL(net))
 			goto failed;
 
-		if (!nistr) {
+		if (!nistr ||
+		    (use_tcp_bonding && LNET_NETTYP(net_id) == SOCKLND)) {
 			/*
 			 * No interface list was specified, allocate a
 			 * ni using the defaults.
@@ -643,11 +663,13 @@ lnet_parse_networks(struct list_head *netlist, char *networks)
 			if (IS_ERR_OR_NULL(ni))
 				goto failed;
 
-			if (net_el) {
-				cfs_expr_list_free(net_el);
-				net_el = NULL;
+			if (!nistr) {
+				if (net_el) {
+					cfs_expr_list_free(net_el);
+					net_el = NULL;
+				}
+				continue;
 			}
-			continue;
 		}
 
 		do {
@@ -704,17 +726,23 @@ lnet_parse_networks(struct list_head *netlist, char *networks)
 			}
 
 			/*
-			 * At this point the name
-			 is properly terminated.
+			 * At this point the name is properly terminated.
 			 */
 			if (!*name) {
 				str = name;
 				goto failed_syntax;
 			}
 
-			ni = lnet_ni_alloc(net, ni_el, name);
-			if (IS_ERR_OR_NULL(ni))
-				goto failed;
+			if (use_tcp_bonding &&
+			    LNET_NETTYP(net->net_id) == SOCKLND) {
+				rc = lnet_ni_add_interface(ni, name);
+				if (rc != 0)
+					goto failed;
+			} else {
+				ni = lnet_ni_alloc(net, ni_el, name);
+				if (IS_ERR_OR_NULL(ni))
+					goto failed;
+			}
 
 			if (ni_el) {
 				if (ni_el != net_el) {

^ permalink raw reply related	[flat|nested] 98+ messages in thread
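
The patch above adds lnet_ni_unique_ni(), which refuses to add an interface name that is already present in the NI's fixed-size interface array. A minimal user-space sketch of that kind of check follows. It is not the patch's code: it compares with strcmp() for an exact match, whereas the patch uses strncmp() bounded by strlen(iface). `MAX_IFACES`, `iface_is_unique()` and `iface_add()` are hypothetical names, and the array stores the caller's pointer instead of a kstrdup() copy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_IFACES 16	/* hypothetical; stands in for LNET_MAX_INTERFACES */

/* Return true if iface does not already appear in list[]. */
static bool iface_is_unique(char *const list[MAX_IFACES], const char *iface)
{
	size_t i;

	for (i = 0; i < MAX_IFACES; i++) {
		if (list[i] && strcmp(list[i], iface) == 0)
			return false;
	}
	return true;
}

/*
 * Store iface in the first free slot.  Returns 0 on success and -1
 * when the name is a duplicate or the array is full.
 */
static int iface_add(char *list[MAX_IFACES], char *iface)
{
	size_t i;

	if (!iface_is_unique(list, iface))
		return -1;

	for (i = 0; i < MAX_IFACES; i++) {
		if (!list[i]) {
			list[i] = iface;
			return 0;
		}
	}
	return -1;
}
```

With an exact comparison, "eth1" and "eth10" are treated as different interfaces regardless of the order in which they are added. Whether the prefix comparison in the patch is intended is a question for the patch author; this sketch only shows the exact-match variant.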

* [lustre-devel] [PATCH 01/34] struct lnet_ni - reformat comments.
  2018-09-07  0:49 ` [lustre-devel] [PATCH 01/34] struct lnet_ni - reformat comments NeilBrown
@ 2018-09-10 22:49   ` Doug Oucharek
  2018-09-10 23:17   ` James Simmons
  1 sibling, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-10 22:49 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On Sep 6, 2018, at 5:49 PM, NeilBrown <neilb@suse.com> wrote:

This is part of

8cbb8cd3e771e7f7e0f99cafc19fad32770dc015 LU-7734 lnet: Multi-Rail
local NI split
---
.../staging/lustre/include/linux/lnet/lib-types.h  |   38 +++++++++++++++-----
1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 6d4106fd9039..078bc97a9ebf 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -263,18 +263,38 @@ struct lnet_ni {
int  ni_peerrtrcredits;
/* seconds to consider peer dead */
int  ni_peertimeout;
- int  ni_ncpts; /* number of CPTs */
- __u32 *ni_cpts; /* bond NI on some CPTs */
- lnet_nid_t  ni_nid; /* interface's NID */
- void *ni_data; /* instance-specific data */
+ /* number of CPTs */
+ int ni_ncpts;
+
+ /* bond NI on some CPTs */
+ __u32 *ni_cpts;
+
+ /* interface's NID */
+ lnet_nid_t ni_nid;
+
+ /* instance-specific data */
+ void *ni_data;
+
struct lnet_lnd *ni_lnd; /* procedural interface */
- struct lnet_tx_queue **ni_tx_queues; /* percpt TX queues */
- int **ni_refs; /* percpt reference count */
- time64_t  ni_last_alive;/* when I was last alive */
- struct lnet_ni_status *ni_status; /* my health status */
+
+ /* percpt TX queues */
+ struct lnet_tx_queue **ni_tx_queues;
+
+ /* percpt reference count */
+ int **ni_refs;
+
+ /* when I was last alive */
+ time64_t ni_last_alive;
+
+ /* my health status */
+ struct lnet_ni_status *ni_status;
+
/* per NI LND tunables */
struct lnet_ioctl_config_lnd_tunables *ni_lnd_tunables;
- /* equivalent interfaces to use */
+ /*
+ * equivalent interfaces to use
+ * This is an array because socklnd bonding can still be configured
+ */
char *ni_interfaces[LNET_MAX_INTERFACES];
/* original net namespace */
struct net *ni_net_ns;




^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 02/34] lnet: Create struct lnet_net
  2018-09-07  0:49 ` [lustre-devel] [PATCH 02/34] lnet: Create struct lnet_net NeilBrown
@ 2018-09-10 22:56   ` Doug Oucharek
  2018-09-10 23:23   ` James Simmons
  1 sibling, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-10 22:56 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On Sep 6, 2018, at 5:49 PM, NeilBrown <neilb@suse.com> wrote:

This will contain some fields from lnet_ni, to be shared
between multiple ni on the one network.

For now, only tunables are moved across, using
struct lnet_ioctl_config_lnd_cmn_tunables,
which is changed to use signed values so that -1 can be stored;
-1 means "no value".
If the tunables haven't been initialised, then net_tunables_set is
false.  Previously a NULL pointer had this meaning.

A 'struct lnet_net' is allocated as part of lnet_ni_alloc(), and freed
by lnet_ni_free().
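
The "-1 means no value" convention described above can be illustrated with a small user-space sketch. This is an assumption-laden illustration, not the kernel code: `struct cmn_tunables`, `struct net_cfg`, `net_cfg_init()` and `net_cfg_apply_defaults()` are hypothetical names that loosely mirror lnet_ioctl_config_lnd_cmn_tunables and struct lnet_net.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the common tunables; signed so -1 fits. */
struct cmn_tunables {
	int32_t peer_timeout;
	int32_t peer_tx_credits;
	int32_t peer_rtr_credits;
	int32_t max_tx_credits;
};

/* Hypothetical stand-in for the per-network structure. */
struct net_cfg {
	struct cmn_tunables tun;
	bool tunables_set;	/* true once defaults have been applied */
};

/* Mark every tunable as "no value". */
static void net_cfg_init(struct net_cfg *net)
{
	net->tun.peer_timeout = -1;
	net->tun.peer_tx_credits = -1;
	net->tun.peer_rtr_credits = -1;
	net->tun.max_tx_credits = -1;
	net->tunables_set = false;
}

/* Fill in defaults only for fields that are still unset (-1). */
static void net_cfg_apply_defaults(struct net_cfg *net,
				   const struct cmn_tunables *def)
{
	if (net->tun.peer_timeout == -1)
		net->tun.peer_timeout = def->peer_timeout;
	if (net->tun.peer_tx_credits == -1)
		net->tun.peer_tx_credits = def->peer_tx_credits;
	if (net->tun.peer_rtr_credits == -1)
		net->tun.peer_rtr_credits = def->peer_rtr_credits;
	if (net->tun.max_tx_credits == -1)
		net->tun.max_tx_credits = def->max_tx_credits;
	net->tunables_set = true;
}
```

The point of the signed sentinel is that an explicit 0 stays distinguishable from "not configured": a field set to 0 before net_cfg_apply_defaults() runs keeps its value, while a field left at -1 picks up the default.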

This is part of
   8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
      LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
.../staging/lustre/include/linux/lnet/lib-types.h  |   25 ++++++--
.../lustre/include/uapi/linux/lnet/lnet-dlc.h      |    8 +--
.../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    2 -
.../lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c  |   61 +++++++++++---------
.../staging/lustre/lnet/klnds/socklnd/socklnd.c    |   19 ++++--
drivers/staging/lustre/lnet/lnet/api-ni.c          |   45 +++++++++------
drivers/staging/lustre/lnet/lnet/config.c          |   24 ++++++--
drivers/staging/lustre/lnet/lnet/lib-move.c        |    5 +-
drivers/staging/lustre/lnet/lnet/peer.c            |    9 ++-
drivers/staging/lustre/lnet/lnet/router.c          |    8 ++-
drivers/staging/lustre/lnet/lnet/router_proc.c     |    6 +-
11 files changed, 129 insertions(+), 83 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 078bc97a9ebf..ead8a4e1125a 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -43,6 +43,7 @@

#include <uapi/linux/lnet/lnet-types.h>
#include <uapi/linux/lnet/lnetctl.h>
+#include <uapi/linux/lnet/lnet-dlc.h>

/* Max payload size */
#define LNET_MAX_PAYLOAD      CONFIG_LNET_MAX_PAYLOAD
@@ -252,17 +253,22 @@ struct lnet_tx_queue {
struct list_head tq_delayed; /* delayed TXs */
};

+struct lnet_net {
+ /* network tunables */
+ struct lnet_ioctl_config_lnd_cmn_tunables net_tunables;
+
+ /*
+ * boolean to indicate that the tunables have been set and
+ * shouldn't be reset
+ */
+ bool  net_tunables_set;
+};
+
struct lnet_ni {
spinlock_t  ni_lock;
struct list_head  ni_list; /* chain on ln_nis */
struct list_head  ni_cptlist; /* chain on ln_nis_cpt */
- int  ni_maxtxcredits; /* # tx credits  */
- /* # per-peer send credits */
- int  ni_peertxcredits;
- /* # per-peer router buffer credits */
- int  ni_peerrtrcredits;
- /* seconds to consider peer dead */
- int  ni_peertimeout;
+
/* number of CPTs */
int ni_ncpts;

@@ -286,6 +292,9 @@ struct lnet_ni {
/* when I was last alive */
time64_t ni_last_alive;

+ /* pointer to parent network */
+ struct lnet_net *ni_net;
+
/* my health status */
struct lnet_ni_status *ni_status;

@@ -397,7 +406,7 @@ struct lnet_peer_table {
 * lnet_ni::ni_peertimeout has been set to a positive value
 */
#define lnet_peer_aliveness_enabled(lp) (the_lnet.ln_routing && \
- (lp)->lp_ni->ni_peertimeout > 0)
+ (lp)->lp_ni->ni_net->net_tunables.lct_peer_timeout > 0)

struct lnet_route {
struct list_head lr_list; /* chain on net */
diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
index c1619f411d81..a8eb3b8f9fd7 100644
--- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
+++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
@@ -39,10 +39,10 @@

struct lnet_ioctl_config_lnd_cmn_tunables {
__u32 lct_version;
- __u32 lct_peer_timeout;
- __u32 lct_peer_tx_credits;
- __u32 lct_peer_rtr_credits;
- __u32 lct_max_tx_credits;
+ __s32 lct_peer_timeout;
+ __s32 lct_peer_tx_credits;
+ __s32 lct_peer_rtr_credits;
+ __s32 lct_max_tx_credits;
};

struct lnet_ioctl_config_o2iblnd_tunables {
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
index f496e6fcc416..0d17e22c4401 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
@@ -337,7 +337,7 @@ int kiblnd_create_peer(struct lnet_ni *ni, struct kib_peer **peerp,
peer->ibp_error = 0;
peer->ibp_last_alive = 0;
peer->ibp_max_frags = kiblnd_cfg_rdma_frags(peer->ibp_ni);
- peer->ibp_queue_depth = ni->ni_peertxcredits;
+ peer->ibp_queue_depth = ni->ni_net->net_tunables.lct_peer_tx_credits;
atomic_set(&peer->ibp_refcount, 1);  /* 1 ref for caller */

INIT_LIST_HEAD(&peer->ibp_list);     /* not in the peer table yet */
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
index 39d07926d603..a1aca4dda38f 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
@@ -171,7 +171,7 @@ int kiblnd_msg_queue_size(int version, struct lnet_ni *ni)
if (version == IBLND_MSG_VERSION_1)
return IBLND_MSG_QUEUE_SIZE_V1;
else if (ni)
- return ni->ni_peertxcredits;
+ return ni->ni_net->net_tunables.lct_peer_tx_credits;
else
return peer_credits;
}
@@ -179,6 +179,7 @@ int kiblnd_msg_queue_size(int version, struct lnet_ni *ni)
int kiblnd_tunables_setup(struct lnet_ni *ni)
{
struct lnet_ioctl_config_o2iblnd_tunables *tunables;
+ struct lnet_ioctl_config_lnd_cmn_tunables *net_tunables;

/*
* if there was no tunables specified, setup the tunables to be
@@ -204,35 +205,39 @@ int kiblnd_tunables_setup(struct lnet_ni *ni)
return -EINVAL;
}

- if (!ni->ni_peertimeout)
- ni->ni_peertimeout = peer_timeout;
+ net_tunables = &ni->ni_net->net_tunables;

- if (!ni->ni_maxtxcredits)
- ni->ni_maxtxcredits = credits;
+ if (net_tunables->lct_peer_timeout == -1)
+ net_tunables->lct_peer_timeout = peer_timeout;

- if (!ni->ni_peertxcredits)
- ni->ni_peertxcredits = peer_credits;
+ if (net_tunables->lct_max_tx_credits == -1)
+ net_tunables->lct_max_tx_credits = credits;

- if (!ni->ni_peerrtrcredits)
- ni->ni_peerrtrcredits = peer_buffer_credits;
+ if (net_tunables->lct_peer_tx_credits == -1)
+ net_tunables->lct_peer_tx_credits = peer_credits;

- if (ni->ni_peertxcredits < IBLND_CREDITS_DEFAULT)
- ni->ni_peertxcredits = IBLND_CREDITS_DEFAULT;
+ if (net_tunables->lct_peer_rtr_credits == -1)
+ net_tunables->lct_peer_rtr_credits = peer_buffer_credits;

- if (ni->ni_peertxcredits > IBLND_CREDITS_MAX)
- ni->ni_peertxcredits = IBLND_CREDITS_MAX;
+ if (net_tunables->lct_peer_tx_credits < IBLND_CREDITS_DEFAULT)
+ net_tunables->lct_peer_tx_credits = IBLND_CREDITS_DEFAULT;

- if (ni->ni_peertxcredits > credits)
- ni->ni_peertxcredits = credits;
+ if (net_tunables->lct_peer_tx_credits > IBLND_CREDITS_MAX)
+ net_tunables->lct_peer_tx_credits = IBLND_CREDITS_MAX;
+
+ if (net_tunables->lct_peer_tx_credits >
+    net_tunables->lct_max_tx_credits)
+ net_tunables->lct_peer_tx_credits =
+ net_tunables->lct_max_tx_credits;

if (!tunables->lnd_peercredits_hiw)
tunables->lnd_peercredits_hiw = peer_credits_hiw;

- if (tunables->lnd_peercredits_hiw < ni->ni_peertxcredits / 2)
- tunables->lnd_peercredits_hiw = ni->ni_peertxcredits / 2;
+ if (tunables->lnd_peercredits_hiw < net_tunables->lct_peer_tx_credits / 2)
+ tunables->lnd_peercredits_hiw = net_tunables->lct_peer_tx_credits / 2;

- if (tunables->lnd_peercredits_hiw >= ni->ni_peertxcredits)
- tunables->lnd_peercredits_hiw = ni->ni_peertxcredits - 1;
+ if (tunables->lnd_peercredits_hiw >= net_tunables->lct_peer_tx_credits)
+ tunables->lnd_peercredits_hiw = net_tunables->lct_peer_tx_credits - 1;

if (tunables->lnd_map_on_demand <= 0 ||
   tunables->lnd_map_on_demand > IBLND_MAX_RDMA_FRAGS) {
@@ -252,21 +257,23 @@ int kiblnd_tunables_setup(struct lnet_ni *ni)
if (tunables->lnd_map_on_demand > 0 &&
   tunables->lnd_map_on_demand <= IBLND_MAX_RDMA_FRAGS / 8) {
tunables->lnd_concurrent_sends =
- ni->ni_peertxcredits * 2;
+ net_tunables->lct_peer_tx_credits * 2;
} else {
- tunables->lnd_concurrent_sends = ni->ni_peertxcredits;
+ tunables->lnd_concurrent_sends =
+ net_tunables->lct_peer_tx_credits;
}
}

- if (tunables->lnd_concurrent_sends > ni->ni_peertxcredits * 2)
- tunables->lnd_concurrent_sends = ni->ni_peertxcredits * 2;
+ if (tunables->lnd_concurrent_sends > net_tunables->lct_peer_tx_credits * 2)
+ tunables->lnd_concurrent_sends = net_tunables->lct_peer_tx_credits * 2;

- if (tunables->lnd_concurrent_sends < ni->ni_peertxcredits / 2)
- tunables->lnd_concurrent_sends = ni->ni_peertxcredits / 2;
+ if (tunables->lnd_concurrent_sends < net_tunables->lct_peer_tx_credits / 2)
+ tunables->lnd_concurrent_sends = net_tunables->lct_peer_tx_credits / 2;

- if (tunables->lnd_concurrent_sends < ni->ni_peertxcredits) {
+ if (tunables->lnd_concurrent_sends < net_tunables->lct_peer_tx_credits) {
CWARN("Concurrent sends %d is lower than message queue size: %d, performance may drop slightly.\n",
-      tunables->lnd_concurrent_sends, ni->ni_peertxcredits);
+      tunables->lnd_concurrent_sends,
+      net_tunables->lct_peer_tx_credits);
}

if (!tunables->lnd_fmr_pool_size)
diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
index 4dde158451ea..4ad885f10235 100644
--- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
+++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
@@ -2739,12 +2739,19 @@ ksocknal_startup(struct lnet_ni *ni)
goto fail_0;

spin_lock_init(&net->ksnn_lock);
- net->ksnn_incarnation = ktime_get_real_ns();
- ni->ni_data = net;
- ni->ni_peertimeout    = *ksocknal_tunables.ksnd_peertimeout;
- ni->ni_maxtxcredits   = *ksocknal_tunables.ksnd_credits;
- ni->ni_peertxcredits  = *ksocknal_tunables.ksnd_peertxcredits;
- ni->ni_peerrtrcredits = *ksocknal_tunables.ksnd_peerrtrcredits;
+        net->ksnn_incarnation = ktime_get_real_ns();
+        ni->ni_data = net;
+ if (!ni->ni_net->net_tunables_set) {
+ ni->ni_net->net_tunables.lct_peer_timeout =
+ *ksocknal_tunables.ksnd_peertimeout;
+ ni->ni_net->net_tunables.lct_max_tx_credits =
+ *ksocknal_tunables.ksnd_credits;
+ ni->ni_net->net_tunables.lct_peer_tx_credits =
+ *ksocknal_tunables.ksnd_peertxcredits;
+ ni->ni_net->net_tunables.lct_peer_rtr_credits =
+ *ksocknal_tunables.ksnd_peerrtrcredits;
+ ni->ni_net->net_tunables_set = true;
+ }

net->ksnn_ninterfaces = 0;
if (!ni->ni_interfaces[0]) {
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index f9fcce2a5643..cd4189fa7acb 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1036,11 +1036,11 @@ lnet_ni_tq_credits(struct lnet_ni *ni)
LASSERT(ni->ni_ncpts >= 1);

if (ni->ni_ncpts == 1)
- return ni->ni_maxtxcredits;
+ return ni->ni_net->net_tunables.lct_max_tx_credits;

- credits = ni->ni_maxtxcredits / ni->ni_ncpts;
- credits = max(credits, 8 * ni->ni_peertxcredits);
- credits = min(credits, ni->ni_maxtxcredits);
+ credits = ni->ni_net->net_tunables.lct_max_tx_credits / ni->ni_ncpts;
+ credits = max(credits, 8 * ni->ni_net->net_tunables.lct_peer_tx_credits);
+ credits = min(credits, ni->ni_net->net_tunables.lct_max_tx_credits);

return credits;
}
@@ -1271,16 +1271,16 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
*/
if (conf) {
if (conf->cfg_config_u.cfg_net.net_peer_rtr_credits >= 0)
- ni->ni_peerrtrcredits =
+ ni->ni_net->net_tunables.lct_peer_rtr_credits =
conf->cfg_config_u.cfg_net.net_peer_rtr_credits;
if (conf->cfg_config_u.cfg_net.net_peer_timeout >= 0)
- ni->ni_peertimeout =
+ ni->ni_net->net_tunables.lct_peer_timeout =
conf->cfg_config_u.cfg_net.net_peer_timeout;
if (conf->cfg_config_u.cfg_net.net_peer_tx_credits != -1)
- ni->ni_peertxcredits =
+ ni->ni_net->net_tunables.lct_peer_tx_credits =
conf->cfg_config_u.cfg_net.net_peer_tx_credits;
if (conf->cfg_config_u.cfg_net.net_max_tx_credits >= 0)
- ni->ni_maxtxcredits =
+ ni->ni_net->net_tunables.lct_max_tx_credits =
conf->cfg_config_u.cfg_net.net_max_tx_credits;
}

@@ -1297,8 +1297,6 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
goto failed0;
}

- LASSERT(ni->ni_peertimeout <= 0 || lnd->lnd_query);
-
lnet_net_lock(LNET_LOCK_EX);
/* refcount for ln_nis */
lnet_ni_addref_locked(ni, 0);
@@ -1314,13 +1312,18 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
lnet_ni_addref(ni);
LASSERT(!the_lnet.ln_loni);
the_lnet.ln_loni = ni;
+ ni->ni_net->net_tunables.lct_peer_tx_credits = 0;
+ ni->ni_net->net_tunables.lct_peer_rtr_credits = 0;
+ ni->ni_net->net_tunables.lct_max_tx_credits = 0;
+ ni->ni_net->net_tunables.lct_peer_timeout = 0;
return 0;
}

- if (!ni->ni_peertxcredits || !ni->ni_maxtxcredits) {
+ if (!ni->ni_net->net_tunables.lct_peer_tx_credits ||
+    !ni->ni_net->net_tunables.lct_max_tx_credits) {
LCONSOLE_ERROR_MSG(0x107, "LNI %s has no %scredits\n",
  libcfs_lnd2str(lnd->lnd_type),
-   !ni->ni_peertxcredits ?
+   !ni->ni_net->net_tunables.lct_peer_tx_credits ?
  "" : "per-peer ");
/*
* shutdown the NI since if we get here then it must've already
@@ -1343,9 +1346,11 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
add_device_randomness(&seed, sizeof(seed));

CDEBUG(D_LNI, "Added LNI %s [%d/%d/%d/%d]\n",
-       libcfs_nid2str(ni->ni_nid), ni->ni_peertxcredits,
+       libcfs_nid2str(ni->ni_nid),
+ ni->ni_net->net_tunables.lct_peer_tx_credits,
      lnet_ni_tq_credits(ni) * LNET_CPT_NUMBER,
-       ni->ni_peerrtrcredits, ni->ni_peertimeout);
+       ni->ni_net->net_tunables.lct_peer_rtr_credits,
+ ni->ni_net->net_tunables.lct_peer_timeout);

return 0;
failed0:
@@ -1667,10 +1672,14 @@ lnet_fill_ni_info(struct lnet_ni *ni, struct lnet_ioctl_config_data *config)
}

config->cfg_nid = ni->ni_nid;
- config->cfg_config_u.cfg_net.net_peer_timeout = ni->ni_peertimeout;
- config->cfg_config_u.cfg_net.net_max_tx_credits = ni->ni_maxtxcredits;
- config->cfg_config_u.cfg_net.net_peer_tx_credits = ni->ni_peertxcredits;
- config->cfg_config_u.cfg_net.net_peer_rtr_credits = ni->ni_peerrtrcredits;
+ config->cfg_config_u.cfg_net.net_peer_timeout =
+ ni->ni_net->net_tunables.lct_peer_timeout;
+ config->cfg_config_u.cfg_net.net_max_tx_credits =
+ ni->ni_net->net_tunables.lct_max_tx_credits;
+ config->cfg_config_u.cfg_net.net_peer_tx_credits =
+ ni->ni_net->net_tunables.lct_peer_tx_credits;
+ config->cfg_config_u.cfg_net.net_peer_rtr_credits =
+ ni->ni_net->net_tunables.lct_peer_rtr_credits;

net_config->ni_status = ni->ni_status->ns_status;

diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index 091c4f714e84..86a53854e427 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -114,29 +114,38 @@ lnet_ni_free(struct lnet_ni *ni)
if (ni->ni_net_ns)
put_net(ni->ni_net_ns);

+ kvfree(ni->ni_net);
kfree(ni);
}

struct lnet_ni *
-lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist)
+lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
{
struct lnet_tx_queue *tq;
struct lnet_ni *ni;
int rc;
int i;
+ struct lnet_net *net;

- if (!lnet_net_unique(net, nilist)) {
+ if (!lnet_net_unique(net_id, nilist)) {
LCONSOLE_ERROR_MSG(0x111, "Duplicate network specified: %s\n",
-   libcfs_net2str(net));
+   libcfs_net2str(net_id));
return NULL;
}

ni = kzalloc(sizeof(*ni), GFP_NOFS);
- if (!ni) {
+ net = kzalloc(sizeof(*net), GFP_NOFS);
+ if (!ni || !net) {
+ kfree(ni); kfree(net);
CERROR("Out of memory creating network %s\n",
-       libcfs_net2str(net));
+       libcfs_net2str(net_id));
return NULL;
}
+ /* initialize global parameters to undefined */
+ net->net_tunables.lct_peer_timeout = -1;
+ net->net_tunables.lct_max_tx_credits = -1;
+ net->net_tunables.lct_peer_tx_credits = -1;
+ net->net_tunables.lct_peer_rtr_credits = -1;

spin_lock_init(&ni->ni_lock);
INIT_LIST_HEAD(&ni->ni_cptlist);
@@ -160,7 +169,7 @@ lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist)
rc = cfs_expr_list_values(el, LNET_CPT_NUMBER, &ni->ni_cpts);
if (rc <= 0) {
CERROR("Failed to set CPTs for NI %s: %d\n",
-       libcfs_net2str(net), rc);
+       libcfs_net2str(net_id), rc);
goto failed;
}

@@ -173,8 +182,9 @@ lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist)
ni->ni_ncpts = rc;
}

+ ni->ni_net = net;
/* LND will fill in the address part of the NID */
- ni->ni_nid = LNET_MKNID(net, 0);
+ ni->ni_nid = LNET_MKNID(net_id, 0);

/* Store net namespace in which current ni is being created */
if (current->nsproxy->net_ns)
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index edcafac055ed..f186e6a16d34 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -524,7 +524,8 @@ lnet_peer_is_alive(struct lnet_peer *lp, unsigned long now)
   lp->lp_timestamp >= lp->lp_last_alive)
return 0;

- deadline = lp->lp_last_alive + lp->lp_ni->ni_peertimeout;
+ deadline = lp->lp_last_alive +
+ lp->lp_ni->ni_net->net_tunables.lct_peer_timeout;
alive = deadline > now;

/* Update obsolete lp_alive except for routers assumed to be dead
@@ -569,7 +570,7 @@ lnet_peer_alive_locked(struct lnet_peer *lp)
     libcfs_nid2str(lp->lp_nid),
     now, next_query,
     lnet_queryinterval,
-      lp->lp_ni->ni_peertimeout);
+      lp->lp_ni->ni_net->net_tunables.lct_peer_timeout);
return 0;
}
}
diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c
index d9452c322e4d..b76ac3e051d9 100644
--- a/drivers/staging/lustre/lnet/lnet/peer.c
+++ b/drivers/staging/lustre/lnet/lnet/peer.c
@@ -342,8 +342,8 @@ lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt)
goto out;
}

- lp->lp_txcredits = lp->lp_ni->ni_peertxcredits;
- lp->lp_mintxcredits = lp->lp_ni->ni_peertxcredits;
+ lp->lp_txcredits = lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
+ lp->lp_mintxcredits = lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
lp->lp_rtrcredits = lnet_peer_buffer_credits(lp->lp_ni);
lp->lp_minrtrcredits = lnet_peer_buffer_credits(lp->lp_ni);

@@ -383,7 +383,7 @@ lnet_debug_peer(lnet_nid_t nid)

CDEBUG(D_WARNING, "%-24s %4d %5s %5d %5d %5d %5d %5d %ld\n",
      libcfs_nid2str(lp->lp_nid), lp->lp_refcount,
-       aliveness, lp->lp_ni->ni_peertxcredits,
+       aliveness, lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits,
      lp->lp_rtrcredits, lp->lp_minrtrcredits,
      lp->lp_txcredits, lp->lp_mintxcredits, lp->lp_txqnob);

@@ -438,7 +438,8 @@ lnet_get_peer_info(__u32 peer_index, __u64 *nid,

*nid = lp->lp_nid;
*refcount = lp->lp_refcount;
- *ni_peer_tx_credits = lp->lp_ni->ni_peertxcredits;
+ *ni_peer_tx_credits =
+ lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
*peer_tx_credits = lp->lp_txcredits;
*peer_rtr_credits = lp->lp_rtrcredits;
*peer_min_rtr_credits = lp->lp_mintxcredits;
diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index 02241fbc9eaa..7d61c5d71426 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -57,9 +57,11 @@ MODULE_PARM_DESC(auto_down, "Automatically mark peers down on comms error");
int
lnet_peer_buffer_credits(struct lnet_ni *ni)
{
+ struct lnet_net *net = ni->ni_net;
+
/* NI option overrides LNet default */
- if (ni->ni_peerrtrcredits > 0)
- return ni->ni_peerrtrcredits;
+ if (net->net_tunables.lct_peer_rtr_credits > 0)
+ return net->net_tunables.lct_peer_rtr_credits;
if (peer_buffer_credits > 0)
return peer_buffer_credits;

@@ -67,7 +69,7 @@ lnet_peer_buffer_credits(struct lnet_ni *ni)
* As an approximation, allow this peer the same number of router
* buffers as it is allowed outstanding sends
*/
- return ni->ni_peertxcredits;
+ return net->net_tunables.lct_peer_tx_credits;
}

/* forward ref's */
diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
index 31f4982f7f17..19cea7076057 100644
--- a/drivers/staging/lustre/lnet/lnet/router_proc.c
+++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
@@ -489,7 +489,7 @@ static int proc_lnet_peers(struct ctl_table *table, int write,
int nrefs = peer->lp_refcount;
time64_t lastalive = -1;
char *aliveness = "NA";
- int maxcr = peer->lp_ni->ni_peertxcredits;
+ int maxcr = peer->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
int txcr = peer->lp_txcredits;
int mintxcr = peer->lp_mintxcredits;
int rtrcr = peer->lp_rtrcredits;
@@ -704,8 +704,8 @@ static int proc_lnet_nis(struct ctl_table *table, int write,
     "%-24s %6s %5lld %4d %4d %4d %5d %5d %5d\n",
     libcfs_nid2str(ni->ni_nid), stat,
     last_alive, *ni->ni_refs[i],
-      ni->ni_peertxcredits,
-      ni->ni_peerrtrcredits,
+      ni->ni_net->net_tunables.lct_peer_tx_credits,
+      ni->ni_net->net_tunables.lct_peer_rtr_credits,
     tq->tq_credits_max,
     tq->tq_credits,
     tq->tq_credits_min);
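
The two lookups that the hunks above redirect through ni->ni_net can be
sketched in plain C as follows. This is only an illustration under
stated assumptions: the structures are cut-down stand-ins, the name
cmn_tunables_sketch and both function names are hypothetical, and
peer_buffer_credits stands in for the module parameter of the same name.

```c
#include <assert.h>

/* Hypothetical, cut-down stand-in for the common tunables. */
struct cmn_tunables_sketch {
	int lct_peer_timeout;
	int lct_peer_tx_credits;
	int lct_peer_rtr_credits;
};

/* Cut-down stand-ins for the kernel structures. */
struct lnet_net {
	struct cmn_tunables_sketch net_tunables;
};

struct lnet_ni {
	struct lnet_net *ni_net;
};

/* Stand-in for the LNet module parameter of the same name. */
static int peer_buffer_credits;

/*
 * Same precedence as the patched lnet_peer_buffer_credits(): the per-net
 * tunable wins, then the LNet-wide default, then the peer tx credits.
 */
int peer_buffer_credits_sketch(const struct lnet_ni *ni)
{
	const struct lnet_net *net = ni->ni_net;

	if (net->net_tunables.lct_peer_rtr_credits > 0)
		return net->net_tunables.lct_peer_rtr_credits;
	if (peer_buffer_credits > 0)
		return peer_buffer_credits;
	return net->net_tunables.lct_peer_tx_credits;
}

/*
 * Deadline test from lnet_peer_is_alive(): the peer counts as alive
 * while last_alive plus the per-net peer timeout is still in the future.
 */
int peer_deadline_alive_sketch(const struct lnet_ni *ni,
			       unsigned long last_alive, unsigned long now)
{
	unsigned long deadline;

	deadline = last_alive + ni->ni_net->net_tunables.lct_peer_timeout;
	return deadline > now;
}
```

Every NI that points at the same net reads the same values, which seems
to be the point of moving these fields from the NI to the net.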




^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 03/34] lnet: struct lnet_ni: move ni_lnd to lnet_net
  2018-09-07  0:49 ` [lustre-devel] [PATCH 03/34] lnet: struct lnet_ni: move ni_lnd to lnet_net NeilBrown
@ 2018-09-10 23:04   ` Doug Oucharek
  2018-09-10 23:19     ` James Simmons
  2018-09-10 23:19     ` James Simmons
  2018-09-10 23:24   ` James Simmons
  2018-09-10 23:25   ` James Simmons
  2 siblings, 2 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-10 23:04 UTC (permalink / raw)
  To: lustre-devel

Is the gnilnd module being built upstream?  Just as there were changes to o2iblnd.c and socklnd.c for this change, there should be a corresponding change to gnilnd.c.

Doug

> On Sep 6, 2018, at 5:49 PM, NeilBrown <neilb@suse.com> wrote:
> 
> Also make some other minor changes to the structures.
> 
> This is part of
>    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>       LU-7734 lnet: Multi-Rail local NI split
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> .../staging/lustre/include/linux/lnet/lib-types.h  |   13 ++++++++-----
> .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    2 +-
> .../staging/lustre/lnet/klnds/socklnd/socklnd.c    |    2 +-
> drivers/staging/lustre/lnet/lnet/acceptor.c        |    4 ++--
> drivers/staging/lustre/lnet/lnet/api-ni.c          |   16 ++++++++--------
> drivers/staging/lustre/lnet/lnet/lib-move.c        |   16 ++++++++--------
> drivers/staging/lustre/lnet/lnet/lo.c              |    2 +-
> drivers/staging/lustre/lnet/lnet/router.c          |   10 +++++-----
> drivers/staging/lustre/lnet/lnet/router_proc.c     |    2 +-
> 9 files changed, 35 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> index ead8a4e1125a..e170eb07a5bf 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> @@ -262,12 +262,17 @@ struct lnet_net {
> 	 * shouldn't be reset
> 	 */
> 	bool			  net_tunables_set;
> +	/* procedural interface */
> +	struct lnet_lnd		*net_lnd;
> };
> 
> struct lnet_ni {
> -	spinlock_t		  ni_lock;
> -	struct list_head	  ni_list;	/* chain on ln_nis */
> -	struct list_head	  ni_cptlist;	/* chain on ln_nis_cpt */
> +	/* chain on ln_nis */
> +	struct list_head	  ni_list;
> +	/* chain on ln_nis_cpt */
> +	struct list_head	ni_cptlist;
> +
> +	spinlock_t		ni_lock;
> 
> 	/* number of CPTs */
> 	int			ni_ncpts;
> @@ -281,8 +286,6 @@ struct lnet_ni {
> 	/* instance-specific data */
> 	void			*ni_data;
> 
> -	struct lnet_lnd		 *ni_lnd;	/* procedural interface */
> -
> 	/* percpt TX queues */
> 	struct lnet_tx_queue	**ni_tx_queues;
> 
> diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> index 0d17e22c4401..5e1592b398c1 100644
> --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> @@ -2830,7 +2830,7 @@ static int kiblnd_startup(struct lnet_ni *ni)
> 	int rc;
> 	int newdev;
> 
> -	LASSERT(ni->ni_lnd == &the_o2iblnd);
> +	LASSERT(ni->ni_net->net_lnd == &the_o2iblnd);
> 
> 	if (kiblnd_data.kib_init == IBLND_INIT_NOTHING) {
> 		rc = kiblnd_base_startup();
> diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> index 4ad885f10235..2036a0ae5917 100644
> --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> @@ -2726,7 +2726,7 @@ ksocknal_startup(struct lnet_ni *ni)
> 	int rc;
> 	int i;
> 
> -	LASSERT(ni->ni_lnd == &the_ksocklnd);
> +	LASSERT(ni->ni_net->net_lnd == &the_ksocklnd);
> 
> 	if (ksocknal_data.ksnd_init == SOCKNAL_INIT_NOTHING) {
> 		rc = ksocknal_base_startup();
> diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
> index 3ae3ca1311a1..f8c921f0221c 100644
> --- a/drivers/staging/lustre/lnet/lnet/acceptor.c
> +++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
> @@ -306,7 +306,7 @@ lnet_accept(struct socket *sock, __u32 magic)
> 		return -EPERM;
> 	}
> 
> -	if (!ni->ni_lnd->lnd_accept) {
> +	if (!ni->ni_net->net_lnd->lnd_accept) {
> 		/* This catches a request for the loopback LND */
> 		lnet_ni_decref(ni);
> 		LCONSOLE_ERROR_MSG(0x121, "Refusing connection from %pI4h for %s: NI doesn not accept IP connections\n",
> @@ -317,7 +317,7 @@ lnet_accept(struct socket *sock, __u32 magic)
> 	CDEBUG(D_NET, "Accept %s from %pI4h\n",
> 	       libcfs_nid2str(cr.acr_nid), &peer_ip);
> 
> -	rc = ni->ni_lnd->lnd_accept(ni, sock);
> +	rc = ni->ni_net->net_lnd->lnd_accept(ni, sock);
> 
> 	lnet_ni_decref(ni);
> 	return rc;
> diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
> index cd4189fa7acb..0896e75bc3d7 100644
> --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
> +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
> @@ -799,7 +799,7 @@ lnet_count_acceptor_nis(void)
> 
> 	cpt = lnet_net_lock_current();
> 	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> -		if (ni->ni_lnd->lnd_accept)
> +		if (ni->ni_net->net_lnd->lnd_accept)
> 			count++;
> 	}
> 
> @@ -1098,13 +1098,13 @@ lnet_clear_zombies_nis_locked(void)
> 			continue;
> 		}
> 
> -		ni->ni_lnd->lnd_refcount--;
> +		ni->ni_net->net_lnd->lnd_refcount--;
> 		lnet_net_unlock(LNET_LOCK_EX);
> 
> -		islo = ni->ni_lnd->lnd_type == LOLND;
> +		islo = ni->ni_net->net_lnd->lnd_type == LOLND;
> 
> 		LASSERT(!in_interrupt());
> -		ni->ni_lnd->lnd_shutdown(ni);
> +		ni->ni_net->net_lnd->lnd_shutdown(ni);
> 
> 		/*
> 		 * can't deref lnd anymore now; it might have unregistered
> @@ -1248,7 +1248,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
> 	lnd->lnd_refcount++;
> 	lnet_net_unlock(LNET_LOCK_EX);
> 
> -	ni->ni_lnd = lnd;
> +	ni->ni_net->net_lnd = lnd;
> 
> 	if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf))
> 		lnd_tunables = (struct lnet_ioctl_config_lnd_tunables *)conf->cfg_bulk;
> @@ -1794,7 +1794,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
> 	if (rc)
> 		goto failed1;
> 
> -	if (ni->ni_lnd->lnd_accept) {
> +	if (ni->ni_net->net_lnd->lnd_accept) {
> 		rc = lnet_acceptor_start();
> 		if (rc < 0) {
> 			/* shutdown the ni that we just started */
> @@ -2074,10 +2074,10 @@ LNetCtl(unsigned int cmd, void *arg)
> 		if (!ni)
> 			return -EINVAL;
> 
> -		if (!ni->ni_lnd->lnd_ctl)
> +		if (!ni->ni_net->net_lnd->lnd_ctl)
> 			rc = -EINVAL;
> 		else
> -			rc = ni->ni_lnd->lnd_ctl(ni, cmd, arg);
> +			rc = ni->ni_net->net_lnd->lnd_ctl(ni, cmd, arg);
> 
> 		lnet_ni_decref(ni);
> 		return rc;
> diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
> index f186e6a16d34..1bf12af87a20 100644
> --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
> +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
> @@ -406,7 +406,7 @@ lnet_ni_recv(struct lnet_ni *ni, void *private, struct lnet_msg *msg,
> 		iov_iter_bvec(&to, ITER_BVEC | READ, kiov, niov, mlen + offset);
> 		iov_iter_advance(&to, offset);
> 	}
> -	rc = ni->ni_lnd->lnd_recv(ni, private, msg, delayed, &to, rlen);
> +	rc = ni->ni_net->net_lnd->lnd_recv(ni, private, msg, delayed, &to, rlen);
> 	if (rc < 0)
> 		lnet_finalize(ni, msg, rc);
> }
> @@ -461,7 +461,7 @@ lnet_ni_send(struct lnet_ni *ni, struct lnet_msg *msg)
> 	LASSERT(LNET_NETTYP(LNET_NIDNET(ni->ni_nid)) == LOLND ||
> 		(msg->msg_txcredit && msg->msg_peertxcredit));
> 
> -	rc = ni->ni_lnd->lnd_send(ni, priv, msg);
> +	rc = ni->ni_net->net_lnd->lnd_send(ni, priv, msg);
> 	if (rc < 0)
> 		lnet_finalize(ni, msg, rc);
> }
> @@ -474,10 +474,10 @@ lnet_ni_eager_recv(struct lnet_ni *ni, struct lnet_msg *msg)
> 	LASSERT(!msg->msg_sending);
> 	LASSERT(msg->msg_receiving);
> 	LASSERT(!msg->msg_rx_ready_delay);
> -	LASSERT(ni->ni_lnd->lnd_eager_recv);
> +	LASSERT(ni->ni_net->net_lnd->lnd_eager_recv);
> 
> 	msg->msg_rx_ready_delay = 1;
> -	rc = ni->ni_lnd->lnd_eager_recv(ni, msg->msg_private, msg,
> +	rc = ni->ni_net->net_lnd->lnd_eager_recv(ni, msg->msg_private, msg,
> 					&msg->msg_private);
> 	if (rc) {
> 		CERROR("recv from %s / send to %s aborted: eager_recv failed %d\n",
> @@ -496,10 +496,10 @@ lnet_ni_query_locked(struct lnet_ni *ni, struct lnet_peer *lp)
> 	time64_t last_alive = 0;
> 
> 	LASSERT(lnet_peer_aliveness_enabled(lp));
> -	LASSERT(ni->ni_lnd->lnd_query);
> +	LASSERT(ni->ni_net->net_lnd->lnd_query);
> 
> 	lnet_net_unlock(lp->lp_cpt);
> -	ni->ni_lnd->lnd_query(ni, lp->lp_nid, &last_alive);
> +	ni->ni_net->net_lnd->lnd_query(ni, lp->lp_nid, &last_alive);
> 	lnet_net_lock(lp->lp_cpt);
> 
> 	lp->lp_last_query = ktime_get_seconds();
> @@ -1287,7 +1287,7 @@ lnet_parse_put(struct lnet_ni *ni, struct lnet_msg *msg)
> 	info.mi_roffset	= hdr->msg.put.offset;
> 	info.mi_mbits	= hdr->msg.put.match_bits;
> 
> -	msg->msg_rx_ready_delay = !ni->ni_lnd->lnd_eager_recv;
> +	msg->msg_rx_ready_delay = !ni->ni_net->net_lnd->lnd_eager_recv;
> 	ready_delay = msg->msg_rx_ready_delay;
> 
>  again:
> @@ -1518,7 +1518,7 @@ lnet_parse_forward_locked(struct lnet_ni *ni, struct lnet_msg *msg)
> 
> 	if (msg->msg_rxpeer->lp_rtrcredits <= 0 ||
> 	    lnet_msg2bufpool(msg)->rbp_credits <= 0) {
> -		if (!ni->ni_lnd->lnd_eager_recv) {
> +		if (!ni->ni_net->net_lnd->lnd_eager_recv) {
> 			msg->msg_rx_ready_delay = 1;
> 		} else {
> 			lnet_net_unlock(msg->msg_rx_cpt);
> diff --git a/drivers/staging/lustre/lnet/lnet/lo.c b/drivers/staging/lustre/lnet/lnet/lo.c
> index eb14146bd879..8167980c2323 100644
> --- a/drivers/staging/lustre/lnet/lnet/lo.c
> +++ b/drivers/staging/lustre/lnet/lnet/lo.c
> @@ -83,7 +83,7 @@ lolnd_shutdown(struct lnet_ni *ni)
> static int
> lolnd_startup(struct lnet_ni *ni)
> {
> -	LASSERT(ni->ni_lnd == &the_lolnd);
> +	LASSERT(ni->ni_net->net_lnd == &the_lolnd);
> 	LASSERT(!lolnd_instanced);
> 	lolnd_instanced = 1;
> 
> diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
> index 7d61c5d71426..0c0ec0b27982 100644
> --- a/drivers/staging/lustre/lnet/lnet/router.c
> +++ b/drivers/staging/lustre/lnet/lnet/router.c
> @@ -154,14 +154,14 @@ lnet_ni_notify_locked(struct lnet_ni *ni, struct lnet_peer *lp)
> 		lp->lp_notifylnd = 0;
> 		lp->lp_notify    = 0;
> 
> -		if (notifylnd && ni->ni_lnd->lnd_notify) {
> +		if (notifylnd && ni->ni_net->net_lnd->lnd_notify) {
> 			lnet_net_unlock(lp->lp_cpt);
> 
> 			/*
> 			 * A new notification could happen now; I'll handle it
> 			 * when control returns to me
> 			 */
> -			ni->ni_lnd->lnd_notify(ni, lp->lp_nid, alive);
> +			ni->ni_net->net_lnd->lnd_notify(ni, lp->lp_nid, alive);
> 
> 			lnet_net_lock(lp->lp_cpt);
> 		}
> @@ -380,8 +380,8 @@ lnet_add_route(__u32 net, __u32 hops, lnet_nid_t gateway,
> 		lnet_net_unlock(LNET_LOCK_EX);
> 
> 		/* XXX Assume alive */
> -		if (ni->ni_lnd->lnd_notify)
> -			ni->ni_lnd->lnd_notify(ni, gateway, 1);
> +		if (ni->ni_net->net_lnd->lnd_notify)
> +			ni->ni_net->net_lnd->lnd_notify(ni, gateway, 1);
> 
> 		lnet_net_lock(LNET_LOCK_EX);
> 	}
> @@ -818,7 +818,7 @@ lnet_update_ni_status_locked(void)
> 
> 	now = ktime_get_real_seconds();
> 	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> -		if (ni->ni_lnd->lnd_type == LOLND)
> +		if (ni->ni_net->net_lnd->lnd_type == LOLND)
> 			continue;
> 
> 		if (now < ni->ni_last_alive + timeout)
> diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
> index 19cea7076057..f3ccd6a2b70e 100644
> --- a/drivers/staging/lustre/lnet/lnet/router_proc.c
> +++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
> @@ -674,7 +674,7 @@ static int proc_lnet_nis(struct ctl_table *table, int write,
> 				last_alive = now - ni->ni_last_alive;
> 
> 			/* @lo forever alive */
> -			if (ni->ni_lnd->lnd_type == LOLND)
> +			if (ni->ni_net->net_lnd->lnd_type == LOLND)
> 				last_alive = 0;
> 
> 			lnet_ni_lock(ni);
> 
> 
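
For readers following along, the indirection this patch introduces can
be sketched outside the kernel like this. The structures are cut-down
stand-ins, and ni_to_lnd(), ni_accepts() and dummy_accept() are
hypothetical names used only for illustration; the patch itself spells
out ni->ni_net->net_lnd at every call site.

```c
#include <assert.h>
#include <stddef.h>

struct lnet_ni;

/* Cut-down stand-in for the LND operations table. */
struct lnet_lnd {
	int lnd_type;
	int (*lnd_accept)(struct lnet_ni *ni);
};

struct lnet_net {
	/* procedural interface, now held once per net */
	struct lnet_lnd *net_lnd;
};

struct lnet_ni {
	struct lnet_net *ni_net;
};

/* Hypothetical helper: reach the LND through the owning net. */
struct lnet_lnd *ni_to_lnd(const struct lnet_ni *ni)
{
	return ni->ni_net->net_lnd;
}

/* Mirrors the check in lnet_count_acceptor_nis(). */
int ni_accepts(const struct lnet_ni *ni)
{
	return ni_to_lnd(ni)->lnd_accept != NULL;
}

/* Placeholder accept callback so the sketch has something to point at. */
int dummy_accept(struct lnet_ni *ni)
{
	(void)ni;
	return 0;
}
```

Any LND whose startup routine asserts on ni->ni_lnd would presumably
need the same one-line change that o2iblnd.c, socklnd.c and lo.c get
here.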

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 04/34] lnet: embed lnd_tunables in lnet_ni
  2018-09-07  0:49 ` [lustre-devel] [PATCH 04/34] lnet: embed lnd_tunables in lnet_ni NeilBrown
@ 2018-09-10 23:08   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-10 23:08 UTC (permalink / raw)
  To: lustre-devel

James Simmons: Should these tunable changes be made to gnilnd in a separate patch set?

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On Sep 6, 2018, at 5:49 PM, NeilBrown <neilb@suse.com> wrote:

Instead of a pointer, embed the data struct.
Also other related changes.

This is part of
   8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
      LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
.../staging/lustre/include/linux/lnet/lib-types.h  |    6 ++++
.../lustre/include/uapi/linux/lnet/lnet-dlc.h      |   10 +++++--
.../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    2 +
.../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h    |    6 ++--
.../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |    8 +++---
.../lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c  |   13 +++-------
drivers/staging/lustre/lnet/lnet/api-ni.c          |   27 +++++++++-----------
drivers/staging/lustre/lnet/lnet/config.c          |    2 -
8 files changed, 36 insertions(+), 38 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index e170eb07a5bf..c5e3363de727 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -302,7 +302,11 @@ struct lnet_ni {
struct lnet_ni_status *ni_status;

/* per NI LND tunables */
- struct lnet_ioctl_config_lnd_tunables *ni_lnd_tunables;
+ struct lnet_lnd_tunables ni_lnd_tunables;
+
+ /* lnd tunables set explicitly */
+ bool ni_lnd_tunables_set;
+
/*
* equivalent interfaces to use
* This is an array because socklnd bonding can still be configured
diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
index a8eb3b8f9fd7..ac29f9d24d5d 100644
--- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
+++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
@@ -57,11 +57,15 @@ struct lnet_ioctl_config_o2iblnd_tunables {
__u16 pad;
};

+struct lnet_lnd_tunables {
+ union {
+ struct lnet_ioctl_config_o2iblnd_tunables lnd_o2ib;
+ } lnd_tun_u;
+};
+
struct lnet_ioctl_config_lnd_tunables {
struct lnet_ioctl_config_lnd_cmn_tunables lt_cmn;
- union {
- struct lnet_ioctl_config_o2iblnd_tunables lt_o2ib;
- } lt_tun_u;
+ struct lnet_lnd_tunables lt_tun;
};

struct lnet_ioctl_net_config {
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
index 5e1592b398c1..ade566d20c69 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
@@ -2122,7 +2122,7 @@ static int kiblnd_net_init_pools(struct kib_net *net, struct lnet_ni *ni,
int rc;
int i;

- tunables = &ni->ni_lnd_tunables->lt_tun_u.lt_o2ib;
+ tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;

if (tunables->lnd_fmr_pool_size < *kiblnd_tunables.kib_ntx / 4) {
CERROR("Can't set fmr pool size (%d) < ntx / 4(%d)\n",
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
index 42dc15cef194..522eb150d9a6 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
@@ -608,7 +608,7 @@ kiblnd_cfg_rdma_frags(struct lnet_ni *ni)
struct lnet_ioctl_config_o2iblnd_tunables *tunables;
int mod;

- tunables = &ni->ni_lnd_tunables->lt_tun_u.lt_o2ib;
+ tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;
mod = tunables->lnd_map_on_demand;
return mod ? mod : IBLND_MAX_RDMA_FRAGS >> IBLND_FRAG_SHIFT;
}
@@ -627,7 +627,7 @@ kiblnd_concurrent_sends(int version, struct lnet_ni *ni)
struct lnet_ioctl_config_o2iblnd_tunables *tunables;
int concurrent_sends;

- tunables = &ni->ni_lnd_tunables->lt_tun_u.lt_o2ib;
+ tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;
concurrent_sends = tunables->lnd_concurrent_sends;

if (version == IBLND_MSG_VERSION_1) {
@@ -777,7 +777,7 @@ kiblnd_need_noop(struct kib_conn *conn)
struct lnet_ni *ni = conn->ibc_peer->ibp_ni;

LASSERT(conn->ibc_state >= IBLND_CONN_ESTABLISHED);
- tunables = &ni->ni_lnd_tunables->lt_tun_u.lt_o2ib;
+ tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;

if (conn->ibc_outstanding_credits <
   IBLND_CREDITS_HIGHWATER(tunables, conn->ibc_version) &&
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index a8d2b4911dab..c266940cb2ae 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1452,7 +1452,7 @@ kiblnd_launch_tx(struct lnet_ni *ni, struct kib_tx *tx, lnet_nid_t nid)

/* Brand new peer */
LASSERT(!peer->ibp_connecting);
- tunables = &peer->ibp_ni->ni_lnd_tunables->lt_tun_u.lt_o2ib;
+ tunables = &peer->ibp_ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;
peer->ibp_connecting = tunables->lnd_conns_per_peer;

/* always called with a ref on ni, which prevents ni being shutdown */
@@ -2592,14 +2592,14 @@ kiblnd_check_reconnect(struct kib_conn *conn, int version,
break;

case IBLND_REJECT_RDMA_FRAGS: {
- struct lnet_ioctl_config_lnd_tunables *tunables;
+ struct lnet_ioctl_config_o2iblnd_tunables *tunables;

if (!cp) {
reason = "can't negotiate max frags";
goto out;
}
- tunables = peer->ibp_ni->ni_lnd_tunables;
- if (!tunables->lt_tun_u.lt_o2ib.lnd_map_on_demand) {
+ tunables = &peer->ibp_ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;
+ if (!tunables->lnd_map_on_demand) {
reason = "map_on_demand must be enabled";
goto out;
}
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
index a1aca4dda38f..5117594f38fb 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
@@ -185,16 +185,11 @@ int kiblnd_tunables_setup(struct lnet_ni *ni)
* if there was no tunables specified, setup the tunables to be
* defaulted
*/
- if (!ni->ni_lnd_tunables) {
- ni->ni_lnd_tunables = kzalloc(sizeof(*ni->ni_lnd_tunables),
-      GFP_NOFS);
- if (!ni->ni_lnd_tunables)
- return -ENOMEM;
-
- memcpy(&ni->ni_lnd_tunables->lt_tun_u.lt_o2ib,
+ if (!ni->ni_lnd_tunables_set)
+ memcpy(&ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib,
      &default_tunables, sizeof(*tunables));
- }
- tunables = &ni->ni_lnd_tunables->lt_tun_u.lt_o2ib;
+
+ tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib;

/* Current API version */
tunables->lnd_version = 0;
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 0896e75bc3d7..c944fbb155c8 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1198,6 +1198,7 @@ static int
lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
{
struct lnet_ioctl_config_lnd_tunables *lnd_tunables = NULL;
+ struct lnet_lnd_tunables *tun = NULL;
int rc = -EINVAL;
int lnd_type;
struct lnet_lnd *lnd;
@@ -1250,19 +1251,15 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)

ni->ni_net->net_lnd = lnd;

- if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf))
+ if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf)) {
lnd_tunables = (struct lnet_ioctl_config_lnd_tunables *)conf->cfg_bulk;
+ tun = &lnd_tunables->lt_tun;
+ }

- if (lnd_tunables) {
- ni->ni_lnd_tunables = kzalloc(sizeof(*ni->ni_lnd_tunables),
-      GFP_NOFS);
- if (!ni->ni_lnd_tunables) {
- mutex_unlock(&the_lnet.ln_lnd_mutex);
- rc = -ENOMEM;
- goto failed0;
- }
- memcpy(ni->ni_lnd_tunables, lnd_tunables,
-       sizeof(*ni->ni_lnd_tunables));
+ if (tun) {
+ memcpy(&ni->ni_lnd_tunables, tun,
+       sizeof(*tun));
+ ni->ni_lnd_tunables_set = true;
}

/*
@@ -1702,15 +1699,15 @@ lnet_fill_ni_info(struct lnet_ni *ni, struct lnet_ioctl_config_data *config)
tunable_size = config->cfg_hdr.ioc_len - min_size;

/* Don't copy to much data to user space */
- min_size = min(tunable_size, sizeof(*ni->ni_lnd_tunables));
+ min_size = min(tunable_size, sizeof(ni->ni_lnd_tunables));
lnd_cfg = (struct lnet_ioctl_config_lnd_tunables *)net_config->cfg_bulk;

- if (ni->ni_lnd_tunables && lnd_cfg && min_size) {
- memcpy(lnd_cfg, ni->ni_lnd_tunables, min_size);
+ if (lnd_cfg && min_size) {
+ memcpy(&lnd_cfg->lt_tun, &ni->ni_lnd_tunables, min_size);
config->cfg_config_u.cfg_net.net_interface_count = 1;

/* Tell user land that kernel side has less data */
- if (tunable_size > sizeof(*ni->ni_lnd_tunables)) {
+ if (tunable_size > sizeof(ni->ni_lnd_tunables)) {
min_size = tunable_size - sizeof(ni->ni_lnd_tunables);
config->cfg_hdr.ioc_len -= min_size;
}
diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index 86a53854e427..5646feeb433e 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -105,8 +105,6 @@ lnet_ni_free(struct lnet_ni *ni)
if (ni->ni_cpts)
cfs_expr_list_values_free(ni->ni_cpts, ni->ni_ncpts);

- kfree(ni->ni_lnd_tunables);
-
for (i = 0; i < LNET_MAX_INTERFACES && ni->ni_interfaces[i]; i++)
kfree(ni->ni_interfaces[i]);





^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre
  2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
                   ` (33 preceding siblings ...)
  2018-09-07  0:49 ` [lustre-devel] [PATCH 21/34] lnet: add net_ni_added NeilBrown
@ 2018-09-10 23:10 ` James Simmons
  2018-09-24  6:58   ` NeilBrown
  34 siblings, 1 reply; 98+ messages in thread
From: James Simmons @ 2018-09-10 23:10 UTC (permalink / raw)
  To: lustre-devel


> The following series implements the first patch in the
> multi-rail series:
> Commit: 8cbb8cd3e771 ("LU-7734 lnet: Multi-Rail local NI split")
> 
> I split that commit up into 40 individual commits which can be found
> at
>   https://github.com/neilbrown/lustre/commits/multirail
> though you need to scroll down a bit, as that contains all the
> multi-rail series.
> 
> I then ported most of these patches to my mainline tree.
> Some that I haven't included are:
> - lnet: Move lnet_msg_alloc/free down a bit.
>     lnet_msg_alloc/free don't exist any more
> - lnet: lib-types: change some tabs to spaces
> - lnet - assorted whitespace changes.
> - lnet: change ni_last_alive from time64_t to long
> - lnet: add lnet_net_state
>     net_state is never used.
> - lnet: remove 'static' from lnet_get_net_config()
> 
> I've also made a couple of minor changes to individual patches not
> strictly related to porting (the net_prio field is never used, so I
> never added it - I should have made that a separate patch).
>
> This series compiles, but doesn't work.  I get a NULL pointer
> reference, then an assertion failure.  If I fix those, it hangs.
> The NULL pointer ref and the failing assertion are gone with
> later patches, so I hope the other problems are too.
> 

I tried it and compared it against what landed in the OpenSFS branch.
I saw the failures in my testing and found the mistake in the 7th patch.

> Some of these patches have very poor descriptions, such as "I have no
> idea what this does".  If someone would like to explain - or maybe say
> "Oh, we really shouldn't have done that", I'd be very happy to
> receive that, and update the description or patch accordingly.

When I ran checkpatch, it really disliked:

This is part of
    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
       LU-7734 lnet: Multi-Rail local NI split

I don't recommend landing the above in the commit message, because a
person outside of Lustre will not know where to look for that git
commit. Instead, I recommend replacing it with:

------------------------------------------------------------------
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18274
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Signed-off-by: NeilBrown <neilb@suse.com>

This gives the reviewer a URL for the JIRA ticket, which usually
contains details that are not in the commit message, as well as the
Gerrit URL for the original patch. That way, if a bug is found later,
the port can be compared against the original patch.

The policy for the Lustre project is to preserve authorship for patches
when porting to other branches, upstream or LTS.

> These will all appear in my lustre-testing branch, but won't migrate
> to 'lustre' until I, at least, have enough other patches that I can
> get a successful test run.
> 
> Review and comments always welcome.
> 
> Thanks,
> NeilBrown
> 
> 
> ---
> 
> Amir Shehata (1):
>       Completely re-write lnet_parse_networks().
> 
> NeilBrown (33):
>       struct lnet_ni - reformat comments.
>       lnet: Create struct lnet_net
>       lnet: struct lnet_ni: move ni_lnd to lnet_net
>       lnet: embed lnd_tunables in lnet_ni
>       lnet: begin separating "networks" from "network interfaces".
>       lnet: store separate xmit/recv net-interface in each message.
>       lnet: change lnet_peer to reference the net, rather than ni.
>       lnet: add cpt to lnet_match_info.
>       lnet: add list of cpts to lnet_net.
>       lnet: add ni arg to lnet_cpt_of_nid()
>       lnet: pass tun to lnet_startup_lndni, instead of full conf
>       lnet: split lnet_startup_lndni
>       lnet: reverse order of lnet_startup_lnd{net,ni}
>       lnet: rename lnet_find_net_locked to lnet_find_rnet_locked
>       lnet: extend zombie handling to nets and nis
>       lnet: lnet_shutdown_lndnets - remove some cleanup code.
>       lnet: move lnet_shutdown_lndnets down to after first use
>       lnet: add ni_state
>       lnet: simplify lnet_islocalnet()
>       lnet: discard ni_cpt_list
>       lnet: add net_ni_added
>       lnet: don't take reference in lnet_XX2ni_locked()
>       lnet: don't need lock to test ln_shutdown.
>       lnet: don't take lock over lnet_net_unique()
>       lnet: swap 'then' and 'else' branches in lnet_startup_lndnet
>       lnet: only valid lnd_type when net_id is unique.
>       lnet: make it possible to add a new interface to a network
>       lnet: add checks to ensure network interface names are unique.
>       lnet: track tunables in lnet_startup_lndnet()
>       lnet: fix typo
>       lnet: lnet_dyn_add_ni: fix ping_info count
>       lnet: lnet_dyn_del_ni: fix ping_info count
>       lnet: introduce use_tcp_bonding mod param
> 
> 
>  .../staging/lustre/include/linux/lnet/lib-lnet.h   |   31 -
>  .../staging/lustre/include/linux/lnet/lib-types.h  |  142 ++-
>  .../lustre/include/uapi/linux/lnet/lnet-dlc.h      |   18 
>  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |   10 
>  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h    |    6 
>  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |   12 
>  .../lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c  |   74 +-
>  .../staging/lustre/lnet/klnds/socklnd/socklnd.c    |   25 -
>  drivers/staging/lustre/lnet/lnet/acceptor.c        |    8 
>  drivers/staging/lustre/lnet/lnet/api-ni.c          |  939 +++++++++++++-------
>  drivers/staging/lustre/lnet/lnet/config.c          |  688 +++++++++++----
>  drivers/staging/lustre/lnet/lnet/lib-move.c        |  132 ++-
>  drivers/staging/lustre/lnet/lnet/lib-ptl.c         |    6 
>  drivers/staging/lustre/lnet/lnet/lo.c              |    2 
>  drivers/staging/lustre/lnet/lnet/net_fault.c       |    3 
>  drivers/staging/lustre/lnet/lnet/peer.c            |   31 -
>  drivers/staging/lustre/lnet/lnet/router.c          |   51 +
>  drivers/staging/lustre/lnet/lnet/router_proc.c     |   24 -
>  drivers/staging/lustre/lnet/selftest/brw_test.c    |    2 
>  drivers/staging/lustre/lnet/selftest/framework.c   |    3 
>  drivers/staging/lustre/lnet/selftest/selftest.h    |    2 
>  21 files changed, 1507 insertions(+), 702 deletions(-)
> 
> --
> Signature
> 
> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 07/34] lnet: change lnet_peer to reference the net, rather than ni.
  2018-09-07  0:49 ` [lustre-devel] [PATCH 07/34] lnet: change lnet_peer to reference the net, rather than ni NeilBrown
@ 2018-09-10 23:17   ` James Simmons
  2018-09-12  2:56     ` NeilBrown
  0 siblings, 1 reply; 98+ messages in thread
From: James Simmons @ 2018-09-10 23:17 UTC (permalink / raw)
  To: lustre-devel


> As a net will soon have multiple ni, a peer should identify
> just the net.
> Various places that we need the ni, we now use rxni or txni from
> the message
> 
> This is part of
>     8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>        LU-7734 lnet: Multi-Rail local NI split
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  .../staging/lustre/include/linux/lnet/lib-lnet.h   |    3 +
>  .../staging/lustre/include/linux/lnet/lib-types.h  |    5 +-
>  drivers/staging/lustre/lnet/lnet/api-ni.c          |   13 +++++
>  drivers/staging/lustre/lnet/lnet/lib-move.c        |   49 +++++++++++---------
>  drivers/staging/lustre/lnet/lnet/lib-ptl.c         |    2 -
>  drivers/staging/lustre/lnet/lnet/net_fault.c       |    3 +
>  drivers/staging/lustre/lnet/lnet/peer.c            |   26 ++++-------
>  drivers/staging/lustre/lnet/lnet/router.c          |   14 +++---
>  drivers/staging/lustre/lnet/lnet/router_proc.c     |    2 -
>  9 files changed, 67 insertions(+), 50 deletions(-)
> 
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
> index 4440b87299c4..34509e52bac7 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
> @@ -435,6 +435,7 @@ int lnet_dyn_add_ni(lnet_pid_t requested_pid,
>  		    struct lnet_ioctl_config_data *conf);
>  int lnet_dyn_del_ni(__u32 net);
>  int lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason);
> +struct lnet_net *lnet_get_net_locked(__u32 net_id);

Greg disliked the use of __u32 and friends in internal Lustre kernel
code. I recommend that any new code that uses __uXX be changed to use
the proper kernel uXX types.
  
>  int lnet_islocalnid(lnet_nid_t nid);
>  int lnet_islocalnet(__u32 net);
> @@ -617,7 +618,7 @@ int lnet_sock_connect(struct socket **sockp, int *fatal,
>  void libcfs_sock_release(struct socket *sock);
>  
>  int lnet_peers_start_down(void);
> -int lnet_peer_buffer_credits(struct lnet_ni *ni);
> +int lnet_peer_buffer_credits(struct lnet_net *net);
>  
>  int lnet_router_checker_start(void);
>  void lnet_router_checker_stop(void);
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> index 16a493529a46..255c6c4bbb89 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> @@ -396,7 +396,8 @@ struct lnet_peer {
>  	time64_t		 lp_last_query;	/* when lp_ni was queried
>  						 * last time
>  						 */
> -	struct lnet_ni		*lp_ni;		/* interface peer is on */
> +	/* network peer is on */
> +	struct lnet_net		*lp_net;
>  	lnet_nid_t		 lp_nid;	/* peer's NID */
>  	int			 lp_refcount;	/* # refs */
>  	int			 lp_cpt;	/* CPT this peer attached on */
> @@ -427,7 +428,7 @@ struct lnet_peer_table {
>   * lnet_ni::ni_peertimeout has been set to a positive value
>   */
>  #define lnet_peer_aliveness_enabled(lp) (the_lnet.ln_routing && \
> -					 (lp)->lp_ni->ni_net->net_tunables.lct_peer_timeout > 0)
> +					 (lp)->lp_net->net_tunables.lct_peer_timeout > 0)
>  
>  struct lnet_route {
>  	struct list_head	 lr_list;	/* chain on net */
> diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
> index 05687278334a..c21aef32cdde 100644
> --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
> +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
> @@ -680,6 +680,19 @@ lnet_net2ni(__u32 net)
>  }
>  EXPORT_SYMBOL(lnet_net2ni);
>  
> +struct lnet_net *
> +lnet_get_net_locked(__u32 net_id)
> +{
> +	struct lnet_net	 *net;
> +
> +	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> +		if (net->net_id == net_id)
> +			return net;
> +	}
> +
> +	return NULL;
> +}
> +
>  static unsigned int
>  lnet_nid_cpt_hash(lnet_nid_t nid, unsigned int number)
>  {
> diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
> index b2a52ddcefcb..b8b15f56a275 100644
> --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
> +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
> @@ -525,7 +525,7 @@ lnet_peer_is_alive(struct lnet_peer *lp, unsigned long now)
>  		return 0;
>  
>  	deadline = lp->lp_last_alive +
> -		lp->lp_ni->ni_net->net_tunables.lct_peer_timeout;
> +		lp->lp_net->net_tunables.lct_peer_timeout;
>  	alive = deadline > now;
>  
>  	/* Update obsolete lp_alive except for routers assumed to be dead
> @@ -544,7 +544,7 @@ lnet_peer_is_alive(struct lnet_peer *lp, unsigned long now)
>   *     may drop the lnet_net_lock
>   */
>  static int
> -lnet_peer_alive_locked(struct lnet_peer *lp)
> +lnet_peer_alive_locked(struct lnet_ni *ni, struct lnet_peer *lp)
>  {
>  	time64_t now = ktime_get_seconds();
>  
> @@ -570,13 +570,13 @@ lnet_peer_alive_locked(struct lnet_peer *lp)
>  				      libcfs_nid2str(lp->lp_nid),
>  				      now, next_query,
>  				      lnet_queryinterval,
> -				      lp->lp_ni->ni_net->net_tunables.lct_peer_timeout);
> +				      lp->lp_net->net_tunables.lct_peer_timeout);
>  			return 0;
>  		}
>  	}
>  
>  	/* query NI for latest aliveness news */
> -	lnet_ni_query_locked(lp->lp_ni, lp);
> +	lnet_ni_query_locked(ni, lp);
>  
>  	if (lnet_peer_is_alive(lp, now))
>  		return 1;
> @@ -600,7 +600,7 @@ static int
>  lnet_post_send_locked(struct lnet_msg *msg, int do_send)
>  {
>  	struct lnet_peer *lp = msg->msg_txpeer;
> -	struct lnet_ni *ni = lp->lp_ni;
> +	struct lnet_ni *ni = msg->msg_txni;
>  	int cpt = msg->msg_tx_cpt;
>  	struct lnet_tx_queue *tq = ni->ni_tx_queues[cpt];
>  
> @@ -611,7 +611,7 @@ lnet_post_send_locked(struct lnet_msg *msg, int do_send)
>  
>  	/* NB 'lp' is always the next hop */
>  	if (!(msg->msg_target.pid & LNET_PID_USERFLAG) &&
> -	    !lnet_peer_alive_locked(lp)) {
> +	    !lnet_peer_alive_locked(ni, lp)) {
>  		the_lnet.ln_counters[cpt]->drop_count++;
>  		the_lnet.ln_counters[cpt]->drop_length += msg->msg_len;
>  		lnet_net_unlock(cpt);
> @@ -770,7 +770,7 @@ lnet_post_routed_recv_locked(struct lnet_msg *msg, int do_recv)
>  		int cpt = msg->msg_rx_cpt;
>  
>  		lnet_net_unlock(cpt);
> -		lnet_ni_recv(lp->lp_ni, msg->msg_private, msg, 1,
> +		lnet_ni_recv(msg->msg_rxni, msg->msg_private, msg, 1,
>  			     0, msg->msg_len, msg->msg_len);
>  		lnet_net_lock(cpt);
>  	}
> @@ -785,7 +785,7 @@ lnet_return_tx_credits_locked(struct lnet_msg *msg)
>  	struct lnet_ni	*txni = msg->msg_txni;
>  
>  	if (msg->msg_txcredit) {
> -		struct lnet_ni *ni = txpeer->lp_ni;
> +		struct lnet_ni *ni = msg->msg_txni;
>  		struct lnet_tx_queue *tq = ni->ni_tx_queues[msg->msg_tx_cpt];
>  
>  		/* give back NI txcredits */
> @@ -800,7 +800,7 @@ lnet_return_tx_credits_locked(struct lnet_msg *msg)
>  					  struct lnet_msg, msg_list);
>  			list_del(&msg2->msg_list);
>  
> -			LASSERT(msg2->msg_txpeer->lp_ni == ni);
> +			LASSERT(msg2->msg_txni == ni);
>  			LASSERT(msg2->msg_tx_delayed);
>  
>  			(void)lnet_post_send_locked(msg2, 1);
> @@ -869,7 +869,7 @@ lnet_drop_routed_msgs_locked(struct list_head *list, int cpt)
>  
>  	while(!list_empty(&drop)) {
>  		msg = list_first_entry(&drop, struct lnet_msg, msg_list);
> -		lnet_ni_recv(msg->msg_rxpeer->lp_ni, msg->msg_private, NULL,
> +		lnet_ni_recv(msg->msg_rxni, msg->msg_private, NULL,
>  			     0, 0, 0, msg->msg_hdr.payload_length);
>  		list_del_init(&msg->msg_list);
>  		lnet_finalize(NULL, msg, -ECANCELED);
> @@ -1007,7 +1007,7 @@ lnet_compare_routes(struct lnet_route *r1, struct lnet_route *r2)
>  }
>  
>  static struct lnet_peer *
> -lnet_find_route_locked(struct lnet_ni *ni, lnet_nid_t target,
> +lnet_find_route_locked(struct lnet_net *net, lnet_nid_t target,
>  		       lnet_nid_t rtr_nid)
>  {
>  	struct lnet_remotenet *rnet;
> @@ -1035,7 +1035,7 @@ lnet_find_route_locked(struct lnet_ni *ni, lnet_nid_t target,
>  		if (!lnet_is_route_alive(route))
>  			continue;
>  
> -		if (ni && lp->lp_ni != ni)
> +		if (net && lp->lp_net != net)
>  			continue;
>  
>  		if (lp->lp_nid == rtr_nid) /* it's pre-determined router */
> @@ -1164,10 +1164,12 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
>  			/* ENOMEM or shutting down */
>  			return rc;
>  		}
> -		LASSERT(lp->lp_ni == src_ni);
> +		LASSERT(lp->lp_net == src_ni->ni_net);
>  	} else {
>  		/* sending to a remote network */
> -		lp = lnet_find_route_locked(src_ni, dst_nid, rtr_nid);
> +		lp = lnet_find_route_locked(src_ni != NULL ?
> +					    src_ni->ni_net : NULL,
> +					    dst_nid, rtr_nid);
>  		if (!lp) {
>  			if (src_ni)
>  				lnet_ni_decref_locked(src_ni, cpt);
> @@ -1203,10 +1205,11 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
>  		       lnet_msgtyp2str(msg->msg_type), msg->msg_len);
>  
>  		if (!src_ni) {
> -			src_ni = lp->lp_ni;
> +			src_ni = lnet_get_next_ni_locked(lp->lp_net, NULL);
> +			LASSERT(src_ni != NULL);

Checkpatch will not like the above.
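
The review does not say which construct checkpatch would flag. One
plausible candidate (an assumption, not stated above) is the explicit
comparison in LASSERT(src_ni != NULL), since checkpatch in strict mode
usually suggests dropping "!= NULL" from pointer tests. A minimal
userspace sketch of the two equivalent forms, using hypothetical names
that are not part of LNet:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lnet_ni; only one field is needed. */
struct demo_ni {
	int nid;
};

/* Form that checkpatch tends to flag: explicit comparison with NULL. */
static inline int demo_nid_verbose(const struct demo_ni *ni, int dflt)
{
	if (ni != NULL)
		return ni->nid;
	return dflt;
}

/* Form usually preferred in kernel code: test the pointer directly. */
static inline int demo_nid_terse(const struct demo_ni *ni, int dflt)
{
	if (ni)
		return ni->nid;
	return dflt;
}
```

Both functions behave identically; only the spelling of the pointer
test differs, so the change would be purely cosmetic.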

>  			src_nid = src_ni->ni_nid;
>  		} else {
> -			LASSERT(src_ni == lp->lp_ni);
> +			LASSERT(src_ni->ni_net == lp->lp_net);
>  			lnet_ni_decref_locked(src_ni, cpt);
>  		}
>  
> @@ -1918,7 +1921,7 @@ lnet_drop_delayed_msg_list(struct list_head *head, char *reason)
>  		 * called lnet_drop_message(), so I just hang onto msg as well
>  		 * until that's done
>  		 */
> -		lnet_drop_message(msg->msg_rxpeer->lp_ni,
> +		lnet_drop_message(msg->msg_rxni,
>  				  msg->msg_rxpeer->lp_cpt,
>  				  msg->msg_private, msg->msg_len);
>  		/*
> @@ -1926,7 +1929,7 @@ lnet_drop_delayed_msg_list(struct list_head *head, char *reason)
>  		 * but we still should give error code so lnet_msg_decommit()
>  		 * can skip counters operations and other checks.
>  		 */
> -		lnet_finalize(msg->msg_rxpeer->lp_ni, msg, -ENOENT);
> +		lnet_finalize(msg->msg_rxni, msg, -ENOENT);
>  	}
>  }
>  
> @@ -1959,7 +1962,7 @@ lnet_recv_delayed_msg_list(struct list_head *head)
>  		       msg->msg_hdr.msg.put.offset,
>  		       msg->msg_hdr.payload_length);
>  
> -		lnet_recv_put(msg->msg_rxpeer->lp_ni, msg);
> +		lnet_recv_put(msg->msg_rxni, msg);
>  	}
>  }
>  
> @@ -2384,8 +2387,12 @@ LNetDist(lnet_nid_t dstnid, lnet_nid_t *srcnidp, __u32 *orderp)
>  
>  			LASSERT(shortest);
>  			hops = shortest_hops;
> -			if (srcnidp)
> -				*srcnidp = shortest->lr_gateway->lp_ni->ni_nid;
> +			if (srcnidp) {
> +				ni = lnet_get_next_ni_locked(
> +					shortest->lr_gateway->lp_net,
> +					NULL);
> +				*srcnidp = ni->ni_nid;
> +			}
>  			if (orderp)
>  				*orderp = order;
>  			lnet_net_unlock(cpt);
> diff --git a/drivers/staging/lustre/lnet/lnet/lib-ptl.c b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
> index fc47379c5938..4c5737083422 100644
> --- a/drivers/staging/lustre/lnet/lnet/lib-ptl.c
> +++ b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
> @@ -946,7 +946,7 @@ lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason)
>  		/* grab all messages which are on the NI passed in */
>  		list_for_each_entry_safe(msg, tmp, &ptl->ptl_msg_delayed,
>  					 msg_list) {
> -			if (msg->msg_rxpeer->lp_ni == ni)
> +			if (msg->msg_txni == ni || msg->msg_rxni == ni)
>  				list_move(&msg->msg_list, &zombies);
>  		}
>  	} else {
> diff --git a/drivers/staging/lustre/lnet/lnet/net_fault.c b/drivers/staging/lustre/lnet/lnet/net_fault.c
> index 41d6131ee15a..6c53ae1811e5 100644
> --- a/drivers/staging/lustre/lnet/lnet/net_fault.c
> +++ b/drivers/staging/lustre/lnet/lnet/net_fault.c
> @@ -601,8 +601,9 @@ delayed_msg_process(struct list_head *msg_list, bool drop)
>  
>  		msg = list_entry(msg_list->next, struct lnet_msg, msg_list);
>  		LASSERT(msg->msg_rxpeer);
> +		LASSERT(msg->msg_rxni != NULL);
>  
> -		ni = msg->msg_rxpeer->lp_ni;
> +		ni = msg->msg_rxni;
>  		cpt = msg->msg_rx_cpt;
>  
>  		list_del_init(&msg->msg_list);
> diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c
> index b76ac3e051d9..ed29124ebded 100644
> --- a/drivers/staging/lustre/lnet/lnet/peer.c
> +++ b/drivers/staging/lustre/lnet/lnet/peer.c
> @@ -112,7 +112,7 @@ lnet_peer_table_cleanup_locked(struct lnet_ni *ni,
>  	for (i = 0; i < LNET_PEER_HASH_SIZE; i++) {
>  		list_for_each_entry_safe(lp, tmp, &ptable->pt_hash[i],
>  					 lp_hashlist) {
> -			if (ni && ni != lp->lp_ni)
> +			if (ni && ni->ni_net != lp->lp_net)
>  				continue;
>  			list_del_init(&lp->lp_hashlist);
>  			/* Lose hash table's ref */
> @@ -154,7 +154,7 @@ lnet_peer_table_del_rtrs_locked(struct lnet_ni *ni,
>  	for (i = 0; i < LNET_PEER_HASH_SIZE; i++) {
>  		list_for_each_entry_safe(lp, tmp, &ptable->pt_hash[i],
>  					 lp_hashlist) {
> -			if (ni != lp->lp_ni)
> +			if (ni->ni_net != lp->lp_net)
>  				continue;
>  
>  			if (!lp->lp_rtr_refcount)
> @@ -230,8 +230,7 @@ lnet_destroy_peer_locked(struct lnet_peer *lp)
>  	LASSERT(ptable->pt_number > 0);
>  	ptable->pt_number--;
>  
> -	lnet_ni_decref_locked(lp->lp_ni, lp->lp_cpt);
> -	lp->lp_ni = NULL;
> +	lp->lp_net = NULL;
>  
>  	list_add(&lp->lp_hashlist, &ptable->pt_deathrow);
>  	LASSERT(ptable->pt_zombies > 0);
> @@ -336,16 +335,11 @@ lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt)
>  		goto out;
>  	}
>  
> -	lp->lp_ni = lnet_net2ni_locked(LNET_NIDNET(nid), cpt2);
> -	if (!lp->lp_ni) {
> -		rc = -EHOSTUNREACH;
> -		goto out;
> -	}
> -
> -	lp->lp_txcredits = lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
> -	lp->lp_mintxcredits = lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
> -	lp->lp_rtrcredits = lnet_peer_buffer_credits(lp->lp_ni);
> -	lp->lp_minrtrcredits = lnet_peer_buffer_credits(lp->lp_ni);
> +	lp->lp_net = lnet_get_net_locked(LNET_NIDNET(!lp->lp_nid));

This is the single error in your port that broke stuff. The correct code 
is:

lp->lp_net = lnet_get_net_locked(LNET_NIDNET(lp->lp_nid));
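
To see why the stray "!" matters, here is a small userspace sketch. It
assumes the usual LNet layout, in which a NID carries the net ID in its
upper 32 bits and the net ID is (net_type << 16) | net_num; the helper
names are hypothetical stand-ins, not the real LNET_* macros.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t demo_nid_t;	/* stand-in for lnet_nid_t */

/* Net ID: type in the high 16 bits, number in the low 16 bits. */
static inline uint32_t demo_mknet(uint32_t type, uint32_t num)
{
	return (type << 16) | num;
}

/* NID: net ID in the high 32 bits, address in the low 32 bits. */
static inline demo_nid_t demo_mknid(uint32_t net, uint32_t addr)
{
	return ((demo_nid_t)net << 32) | addr;
}

/* Extract the net ID from a NID, in the spirit of LNET_NIDNET(). */
static inline uint32_t demo_nidnet(demo_nid_t nid)
{
	return (uint32_t)(nid >> 32);
}

/*
 * The lookup key as written in the port: logical NOT turns the NID
 * into 0 or 1 before the shift, so the result is always 0.
 */
static inline uint32_t demo_buggy_key(demo_nid_t nid)
{
	return demo_nidnet(!nid);
}
```

With the "!" in place every peer is looked up under net ID 0. If no
configured net has that ID, the lookup would return NULL and the
following lp->lp_net->net_tunables access would dereference it, which
would be consistent with the NULL pointer reference mentioned in the
cover letter.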


> +	lp->lp_txcredits =
> +		lp->lp_mintxcredits = lp->lp_net->net_tunables.lct_peer_tx_credits;
> +	lp->lp_rtrcredits =
> +		lp->lp_minrtrcredits = lnet_peer_buffer_credits(lp->lp_net);
>  
>  	list_add_tail(&lp->lp_hashlist,
>  		      &ptable->pt_hash[lnet_nid2peerhash(nid)]);
> @@ -383,7 +377,7 @@ lnet_debug_peer(lnet_nid_t nid)
>  
>  	CDEBUG(D_WARNING, "%-24s %4d %5s %5d %5d %5d %5d %5d %ld\n",
>  	       libcfs_nid2str(lp->lp_nid), lp->lp_refcount,
> -	       aliveness, lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits,
> +	       aliveness, lp->lp_net->net_tunables.lct_peer_tx_credits,
>  	       lp->lp_rtrcredits, lp->lp_minrtrcredits,
>  	       lp->lp_txcredits, lp->lp_mintxcredits, lp->lp_txqnob);
>  
> @@ -439,7 +433,7 @@ lnet_get_peer_info(__u32 peer_index, __u64 *nid,
>  			*nid = lp->lp_nid;
>  			*refcount = lp->lp_refcount;
>  			*ni_peer_tx_credits =
> -				lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
> +				lp->lp_net->net_tunables.lct_peer_tx_credits;
>  			*peer_tx_credits = lp->lp_txcredits;
>  			*peer_rtr_credits = lp->lp_rtrcredits;
>  			*peer_min_rtr_credits = lp->lp_mintxcredits;
> diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
> index 135dfe793b0b..72b8ca2b0fc6 100644
> --- a/drivers/staging/lustre/lnet/lnet/router.c
> +++ b/drivers/staging/lustre/lnet/lnet/router.c
> @@ -55,10 +55,8 @@ module_param(auto_down, int, 0444);
>  MODULE_PARM_DESC(auto_down, "Automatically mark peers down on comms error");
>  
>  int
> -lnet_peer_buffer_credits(struct lnet_ni *ni)
> +lnet_peer_buffer_credits(struct lnet_net *net)
>  {
> -	struct lnet_net *net = ni->ni_net;
> -
>  	/* NI option overrides LNet default */
>  	if (net->net_tunables.lct_peer_rtr_credits > 0)
>  		return net->net_tunables.lct_peer_rtr_credits;
> @@ -373,7 +371,7 @@ lnet_add_route(__u32 net, __u32 hops, lnet_nid_t gateway,
>  		lnet_peer_addref_locked(route->lr_gateway); /* +1 for notify */
>  		lnet_add_route_to_rnet(rnet2, route);
>  
> -		ni = route->lr_gateway->lp_ni;
> +		ni = lnet_get_next_ni_locked(route->lr_gateway->lp_net, NULL);
>  		lnet_net_unlock(LNET_LOCK_EX);
>  
>  		/* XXX Assume alive */
> @@ -428,8 +426,8 @@ lnet_check_routes(void)
>  					continue;
>  				}
>  
> -				if (route->lr_gateway->lp_ni ==
> -				    route2->lr_gateway->lp_ni)
> +				if (route->lr_gateway->lp_net ==
> +				    route2->lr_gateway->lp_net)
>  					continue;
>  
>  				nid1 = route->lr_gateway->lp_nid;
> @@ -952,6 +950,7 @@ lnet_ping_router_locked(struct lnet_peer *rtr)
>  	struct lnet_rc_data *rcd = NULL;
>  	time64_t now = ktime_get_seconds();
>  	time64_t secs;
> +	struct lnet_ni  *ni;

Another gripe from Greg was the spacing in declared variables. As I port
patches, the new code removes that spacing. Newer Lustre code no longer
does this kind of spacing. Well, most of it :-)

>  
>  	lnet_peer_addref_locked(rtr);
>  
> @@ -960,7 +959,8 @@ lnet_ping_router_locked(struct lnet_peer *rtr)
>  		lnet_notify_locked(rtr, 1, 0, now);
>  
>  	/* Run any outstanding notifications */
> -	lnet_ni_notify_locked(rtr->lp_ni, rtr);
> +	ni = lnet_get_next_ni_locked(rtr->lp_net, NULL);
> +	lnet_ni_notify_locked(ni, rtr);
>  
>  	if (!lnet_isrouter(rtr) ||
>  	    the_lnet.ln_rc_state != LNET_RC_STATE_RUNNING) {
> diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
> index 2a366e9a8627..52714b898aac 100644
> --- a/drivers/staging/lustre/lnet/lnet/router_proc.c
> +++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
> @@ -489,7 +489,7 @@ static int proc_lnet_peers(struct ctl_table *table, int write,
>  			int nrefs = peer->lp_refcount;
>  			time64_t lastalive = -1;
>  			char *aliveness = "NA";
> -			int maxcr = peer->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
> +			int maxcr = peer->lp_net->net_tunables.lct_peer_tx_credits;
>  			int txcr = peer->lp_txcredits;
>  			int mintxcr = peer->lp_mintxcredits;
>  			int rtrcr = peer->lp_rtrcredits;
> 
> 
> 


* [lustre-devel] [PATCH 01/34] struct lnet_ni - reformat comments.
  2018-09-07  0:49 ` [lustre-devel] [PATCH 01/34] struct lnet_ni - reformat comments NeilBrown
  2018-09-10 22:49   ` Doug Oucharek
@ 2018-09-10 23:17   ` James Simmons
  2018-09-12  2:44     ` NeilBrown
  1 sibling, 1 reply; 98+ messages in thread
From: James Simmons @ 2018-09-10 23:17 UTC (permalink / raw)
  To: lustre-devel


> This is part of
> 
> 8cbb8cd3e771e7f7e0f99cafc19fad32770dc015 LU-7734 lnet: Multi-Rail
> local NI split

A better commit message would be:

Rework the comments in lib-types.h to limit the checkpatch
chatter about lines being over 80 characters.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18274
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Signed-off-by: NeilBrown <neilb@suse.com>

> ---
>  .../staging/lustre/include/linux/lnet/lib-types.h  |   38 +++++++++++++++-----
>  1 file changed, 29 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> index 6d4106fd9039..078bc97a9ebf 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> @@ -263,18 +263,38 @@ struct lnet_ni {
>  	int			  ni_peerrtrcredits;
>  	/* seconds to consider peer dead */
>  	int			  ni_peertimeout;
> -	int			  ni_ncpts;	/* number of CPTs */
> -	__u32			 *ni_cpts;	/* bond NI on some CPTs */
> -	lnet_nid_t		  ni_nid;	/* interface's NID */
> -	void			 *ni_data;	/* instance-specific data */
> +	/* number of CPTs */
> +	int			ni_ncpts;
> +
> +	/* bond NI on some CPTs */
> +	__u32			*ni_cpts;
> +
> +	/* interface's NID */
> +	lnet_nid_t		ni_nid;
> +
> +	/* instance-specific data */
> +	void			*ni_data;
> +
>  	struct lnet_lnd		 *ni_lnd;	/* procedural interface */
> -	struct lnet_tx_queue	**ni_tx_queues;	/* percpt TX queues */
> -	int			**ni_refs;	/* percpt reference count */
> -	time64_t		  ni_last_alive;/* when I was last alive */
> -	struct lnet_ni_status	 *ni_status;	/* my health status */
> +
> +	/* percpt TX queues */
> +	struct lnet_tx_queue	**ni_tx_queues;
> +
> +	/* percpt reference count */
> +	int			**ni_refs;
> +
> +	/* when I was last alive */
> +	time64_t		ni_last_alive;
> +
> +	/* my health status */
> +	struct lnet_ni_status	*ni_status;
> +
>  	/* per NI LND tunables */
>  	struct lnet_ioctl_config_lnd_tunables *ni_lnd_tunables;
> -	/* equivalent interfaces to use */
> +	/*
> +	 * equivalent interfaces to use
> +	 * This is an array because socklnd bonding can still be configured
> +	 */
>  	char			 *ni_interfaces[LNET_MAX_INTERFACES];
>  	/* original net namespace */
>  	struct net		 *ni_net_ns;
> 
> 
> 


* [lustre-devel] [PATCH 05/34] lnet: begin separating "networks" from "network interfaces".
  2018-09-07  0:49 ` [lustre-devel] [PATCH 05/34] lnet: begin separating "networks" from "network interfaces" NeilBrown
@ 2018-09-10 23:18   ` Doug Oucharek
  2018-09-12  2:48     ` NeilBrown
  2018-09-10 23:27   ` James Simmons
  1 sibling, 1 reply; 98+ messages in thread
From: Doug Oucharek @ 2018-09-10 23:18 UTC (permalink / raw)
  To: lustre-devel

This patch is fine and can land, with just one request: please keep style improvements, such as how comments look, in separate patches from functional changes.  Keeping them separate makes review much easier, because style patches take a different reviewer mindset than functional changes.  The original MR patches mixed the two, and that made them hard to review too.

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On Sep 6, 2018, at 5:49 PM, NeilBrown <neilb@suse.com> wrote:

We already have "struct lnet_net" separate from "struct lnet_ni",
but they are currently allocated together and freed together and
it is assumed that they are 1-to-1.

This patch starts breaking that assumption.  We have separate
lnet_net_alloc() and lnet_net_free() to alloc/free the new lnet_net,
though they are currently called only when lnet_ni_alloc/free are
called.

The netid is now stored in the lnet_net and fetched directly from
there, rather than extracting it from the net-interface-id ni_nid.

The linkage between these two structures is now richer: an lnet_net
can link to a list of lnet_ni, and the_lnet now has a list of lnet_net,
so to find all the lnet_ni we need to walk a list of lists.
This need to walk a list of lists occurs in several places, and new
helpers like lnet_get_ni_idx_locked() and lnet_get_next_ni_locked() are
introduced.

Previously a list_head was passed to lnet_ni_alloc() for the new
lnet_ni to be attached to.
Now a list is passed to lnet_net_alloc() for the net to be attached
to, and a lnet_net is passed to lnet_ni_alloc() for the ni to attach
to.
lnet_ni_alloc() also receives an interface name, but this is currently
unused.
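
The list-of-lists layout and the new allocation order described above
can be sketched in plain userspace C. This is only an illustration
under stated assumptions: the list helpers are a cut-down imitation of
the kernel's list_head, and every demo_* name is hypothetical rather
than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, modelled on list_head. */
struct demo_list {
	struct demo_list *next, *prev;
};

static inline void demo_list_init(struct demo_list *h)
{
	h->next = h;
	h->prev = h;
}

static inline void demo_list_add_tail(struct demo_list *n,
				      struct demo_list *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

#define demo_list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_net {
	struct demo_list net_list;	/* chain on the global list of nets */
	struct demo_list net_ni_list;	/* NIs that belong to this net */
	unsigned int net_id;
};

struct demo_ni {
	struct demo_list ni_netlist;	/* chain on the owning net */
	struct demo_net *ni_net;
};

/* A net is attached to the list of nets that is passed in ... */
static inline struct demo_net *demo_net_alloc(unsigned int id,
					      struct demo_list *nets)
{
	struct demo_net *net = calloc(1, sizeof(*net));

	if (!net)
		return NULL;
	net->net_id = id;
	demo_list_init(&net->net_ni_list);
	demo_list_add_tail(&net->net_list, nets);
	return net;
}

/* ... and an NI is attached to the net that is passed in. */
static inline struct demo_ni *demo_ni_alloc(struct demo_net *net)
{
	struct demo_ni *ni = calloc(1, sizeof(*ni));

	if (!ni)
		return NULL;
	ni->ni_net = net;
	demo_list_add_tail(&ni->ni_netlist, &net->net_ni_list);
	return ni;
}

/* Count every NI: outer loop over nets, inner loop over each net's NIs. */
static inline int demo_count_nis(struct demo_list *nets)
{
	struct demo_list *n, *i;
	int count = 0;

	for (n = nets->next; n != nets; n = n->next) {
		struct demo_net *net =
			demo_list_entry(n, struct demo_net, net_list);

		for (i = net->net_ni_list.next; i != &net->net_ni_list;
		     i = i->next)
			count++;
	}
	return count;
}
```

The nested loop in demo_count_nis() has the same shape as the converted
loops in the patch, where each former walk of ln_nis becomes a walk of
ln_nets with an inner walk of net_ni_list.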

This is part of
   8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
      LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
.../staging/lustre/include/linux/lnet/lib-lnet.h   |   15 +
.../staging/lustre/include/linux/lnet/lib-types.h  |   23 +-
drivers/staging/lustre/lnet/lnet/acceptor.c        |    2
drivers/staging/lustre/lnet/lnet/api-ni.c          |  255 ++++++++++++++------
drivers/staging/lustre/lnet/lnet/config.c          |  135 +++++++----
drivers/staging/lustre/lnet/lnet/lib-move.c        |    6
drivers/staging/lustre/lnet/lnet/router.c          |   15 -
drivers/staging/lustre/lnet/lnet/router_proc.c     |   16 -
8 files changed, 308 insertions(+), 159 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 0fecf0d32c58..4440b87299c4 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -369,8 +369,14 @@ lnet_ni_decref(struct lnet_ni *ni)
}

void lnet_ni_free(struct lnet_ni *ni);
+void lnet_net_free(struct lnet_net *net);
+
+struct lnet_net *
+lnet_net_alloc(__u32 net_type, struct list_head *netlist);
+
struct lnet_ni *
-lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist);
+lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el,
+      char *iface);

static inline int
lnet_nid2peerhash(lnet_nid_t nid)
@@ -412,6 +418,9 @@ void lnet_destroy_routes(void);
int lnet_get_route(int idx, __u32 *net, __u32 *hops,
  lnet_nid_t *gateway, __u32 *alive, __u32 *priority);
int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg);
+struct lnet_ni *lnet_get_next_ni_locked(struct lnet_net *mynet,
+ struct lnet_ni *prev);
+struct lnet_ni *lnet_get_ni_idx_locked(int idx);

void lnet_router_debugfs_init(void);
void lnet_router_debugfs_fini(void);
@@ -584,7 +593,7 @@ int lnet_connect(struct socket **sockp, lnet_nid_t peer_nid,
__u32 local_ip, __u32 peer_ip, int peer_port);
void lnet_connect_console_error(int rc, lnet_nid_t peer_nid,
__u32 peer_ip, int port);
-int lnet_count_acceptor_nis(void);
+int lnet_count_acceptor_nets(void);
int lnet_acceptor_timeout(void);
int lnet_acceptor_port(void);

@@ -618,7 +627,7 @@ void lnet_swap_pinginfo(struct lnet_ping_info *info);
int lnet_parse_ip2nets(char **networksp, char *ip2nets);
int lnet_parse_routes(char *route_str, int *im_a_router);
int lnet_parse_networks(struct list_head *nilist, char *networks);
-int lnet_net_unique(__u32 net, struct list_head *nilist);
+bool lnet_net_unique(__u32 net, struct list_head *nilist);

int lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt);
struct lnet_peer *lnet_find_peer_locked(struct lnet_peer_table *ptable,
diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index c5e3363de727..5f0d4703bf86 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -254,6 +254,15 @@ struct lnet_tx_queue {
};

struct lnet_net {
+ /* chain on the ln_nets */
+ struct list_head net_list;
+
+ /* net ID, which is compoed of
+ * (net_type << 16) | net_num.
+ * net_type can be one of the enumarated types defined in
+ * lnet/include/lnet/nidstr.h */
+ __u32 net_id;
+
/* network tunables */
struct lnet_ioctl_config_lnd_cmn_tunables net_tunables;

@@ -264,11 +273,13 @@ struct lnet_net {
bool  net_tunables_set;
/* procedural interface */
struct lnet_lnd *net_lnd;
+ /* list of NIs on this net */
+ struct list_head net_ni_list;
};

struct lnet_ni {
- /* chain on ln_nis */
- struct list_head  ni_list;
+ /* chain on the lnet_net structure */
+ struct list_head  ni_netlist;
/* chain on ln_nis_cpt */
struct list_head ni_cptlist;

@@ -626,14 +637,16 @@ struct lnet {
/* failure simulation */
struct list_head  ln_test_peers;
struct list_head  ln_drop_rules;
- struct list_head  ln_delay_rules;
+ struct list_head ln_delay_rules;

- struct list_head  ln_nis; /* LND instances */
+ /* LND instances */
+ struct list_head ln_nets;
/* NIs bond on specific CPT(s) */
struct list_head  ln_nis_cpt;
/* dying LND instances */
struct list_head  ln_nis_zombie;
- struct lnet_ni *ln_loni; /* the loopback NI */
+ /* the loopback NI */
+ struct lnet_ni *ln_loni;

/* remote networks with routes to them */
struct list_head *ln_remote_nets_hash;
diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
index f8c921f0221c..88b90c1fdbaf 100644
--- a/drivers/staging/lustre/lnet/lnet/acceptor.c
+++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
@@ -454,7 +454,7 @@ lnet_acceptor_start(void)
if (rc <= 0)
return rc;

- if (!lnet_count_acceptor_nis())  /* not required */
+ if (lnet_count_acceptor_nets() == 0)  /* not required */
return 0;

task = kthread_run(lnet_acceptor, (void *)(uintptr_t)secure,
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index c944fbb155c8..05687278334a 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -537,7 +537,7 @@ lnet_prepare(lnet_pid_t requested_pid)
the_lnet.ln_pid = requested_pid;

INIT_LIST_HEAD(&the_lnet.ln_test_peers);
- INIT_LIST_HEAD(&the_lnet.ln_nis);
+ INIT_LIST_HEAD(&the_lnet.ln_nets);
INIT_LIST_HEAD(&the_lnet.ln_nis_cpt);
INIT_LIST_HEAD(&the_lnet.ln_nis_zombie);
INIT_LIST_HEAD(&the_lnet.ln_routers);
@@ -616,7 +616,7 @@ lnet_unprepare(void)

LASSERT(!the_lnet.ln_refcount);
LASSERT(list_empty(&the_lnet.ln_test_peers));
- LASSERT(list_empty(&the_lnet.ln_nis));
+ LASSERT(list_empty(&the_lnet.ln_nets));
LASSERT(list_empty(&the_lnet.ln_nis_cpt));
LASSERT(list_empty(&the_lnet.ln_nis_zombie));

@@ -648,14 +648,17 @@ lnet_unprepare(void)
}

struct lnet_ni  *
-lnet_net2ni_locked(__u32 net, int cpt)
+lnet_net2ni_locked(__u32 net_id, int cpt)
{
- struct lnet_ni *ni;
+ struct lnet_ni   *ni;
+ struct lnet_net  *net;

LASSERT(cpt != LNET_LOCK_EX);

- list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
- if (LNET_NIDNET(ni->ni_nid) == net) {
+ list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+ if (net->net_id == net_id) {
+ ni = list_entry(net->net_ni_list.next, struct lnet_ni,
+ ni_netlist);
lnet_ni_addref_locked(ni, cpt);
return ni;
}
@@ -760,14 +763,17 @@ lnet_islocalnet(__u32 net)
struct lnet_ni  *
lnet_nid2ni_locked(lnet_nid_t nid, int cpt)
{
- struct lnet_ni *ni;
+ struct lnet_net  *net;
+ struct lnet_ni *ni;

LASSERT(cpt != LNET_LOCK_EX);

- list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
- if (ni->ni_nid == nid) {
- lnet_ni_addref_locked(ni, cpt);
- return ni;
+ list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+ list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+ if (ni->ni_nid == nid) {
+ lnet_ni_addref_locked(ni, cpt);
+ return ni;
+ }
}
}

@@ -790,16 +796,18 @@ lnet_islocalnid(lnet_nid_t nid)
}

int
-lnet_count_acceptor_nis(void)
+lnet_count_acceptor_nets(void)
{
/* Return the # of NIs that need the acceptor. */
- int count = 0;
- struct lnet_ni *ni;
- int cpt;
+ int count = 0;
+ struct lnet_net  *net;
+ int cpt;

cpt = lnet_net_lock_current();
- list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
- if (ni->ni_net->net_lnd->lnd_accept)
+ list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+ /* all socklnd type networks should have the acceptor
+ * thread started */
+ if (net->net_lnd->lnd_accept)
count++;
}

@@ -832,13 +840,16 @@ lnet_ping_info_create(int num_ni)
static inline int
lnet_get_ni_count(void)
{
- struct lnet_ni *ni;
- int count = 0;
+ struct lnet_ni *ni;
+ struct lnet_net *net;
+ int count = 0;

lnet_net_lock(0);

- list_for_each_entry(ni, &the_lnet.ln_nis, ni_list)
- count++;
+ list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+ list_for_each_entry(ni, &net->net_ni_list, ni_netlist)
+ count++;
+ }

lnet_net_unlock(0);

@@ -854,14 +865,17 @@ lnet_ping_info_free(struct lnet_ping_info *pinfo)
static void
lnet_ping_info_destroy(void)
{
+ struct lnet_net *net;
struct lnet_ni *ni;

lnet_net_lock(LNET_LOCK_EX);

- list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
- lnet_ni_lock(ni);
- ni->ni_status = NULL;
- lnet_ni_unlock(ni);
+ list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+ list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+ lnet_ni_lock(ni);
+ ni->ni_status = NULL;
+ lnet_ni_unlock(ni);
+ }
}

lnet_ping_info_free(the_lnet.ln_ping_info);
@@ -963,24 +977,28 @@ lnet_ping_md_unlink(struct lnet_ping_info *pinfo,
static void
lnet_ping_info_install_locked(struct lnet_ping_info *ping_info)
{
+ int i = 0;
struct lnet_ni_status *ns;
struct lnet_ni *ni;
- int i = 0;
+ struct lnet_net *net;

- list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
- LASSERT(i < ping_info->pi_nnis);
+ list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+ list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+ LASSERT(i < ping_info->pi_nnis);

- ns = &ping_info->pi_ni[i];
+ ns = &ping_info->pi_ni[i];

- ns->ns_nid = ni->ni_nid;
+ ns->ns_nid = ni->ni_nid;

- lnet_ni_lock(ni);
- ns->ns_status = (ni->ni_status) ?
- ni->ni_status->ns_status : LNET_NI_STATUS_UP;
- ni->ni_status = ns;
- lnet_ni_unlock(ni);
+ lnet_ni_lock(ni);
+ ns->ns_status = ni->ni_status ?
+ ni->ni_status->ns_status :
+ LNET_NI_STATUS_UP;
+ ni->ni_status = ns;
+ lnet_ni_unlock(ni);

- i++;
+ i++;
+ }
}
}

@@ -1054,9 +1072,9 @@ lnet_ni_unlink_locked(struct lnet_ni *ni)
}

/* move it to zombie list and nobody can find it anymore */
- LASSERT(!list_empty(&ni->ni_list));
- list_move(&ni->ni_list, &the_lnet.ln_nis_zombie);
- lnet_ni_decref_locked(ni, 0); /* drop ln_nis' ref */
+ LASSERT(!list_empty(&ni->ni_netlist));
+ list_move(&ni->ni_netlist, &the_lnet.ln_nis_zombie);
+ lnet_ni_decref_locked(ni, 0);
}

static void
@@ -1076,17 +1094,17 @@ lnet_clear_zombies_nis_locked(void)
int j;

ni = list_entry(the_lnet.ln_nis_zombie.next,
- struct lnet_ni, ni_list);
- list_del_init(&ni->ni_list);
+ struct lnet_ni, ni_netlist);
+ list_del_init(&ni->ni_netlist);
cfs_percpt_for_each(ref, j, ni->ni_refs) {
if (!*ref)
continue;
/* still busy, add it back to zombie list */
- list_add(&ni->ni_list, &the_lnet.ln_nis_zombie);
+ list_add(&ni->ni_netlist, &the_lnet.ln_nis_zombie);
break;
}

- if (!list_empty(&ni->ni_list)) {
+ if (!list_empty(&ni->ni_netlist)) {
lnet_net_unlock(LNET_LOCK_EX);
++i;
if ((i & (-i)) == i) {
@@ -1126,6 +1144,7 @@ lnet_shutdown_lndnis(void)
{
struct lnet_ni *ni;
int i;
+ struct lnet_net *net;

/* NB called holding the global mutex */

@@ -1138,10 +1157,14 @@ lnet_shutdown_lndnis(void)
the_lnet.ln_shutdown = 1; /* flag shutdown */

/* Unlink NIs from the global table */
- while (!list_empty(&the_lnet.ln_nis)) {
- ni = list_entry(the_lnet.ln_nis.next,
- struct lnet_ni, ni_list);
- lnet_ni_unlink_locked(ni);
+ while (!list_empty(&the_lnet.ln_nets)) {
+ net = list_entry(the_lnet.ln_nets.next,
+ struct lnet_net, net_list);
+ while (!list_empty(&net->net_ni_list)) {
+ ni = list_entry(net->net_ni_list.next,
+ struct lnet_ni, ni_netlist);
+ lnet_ni_unlink_locked(ni);
+ }
}

/* Drop the cached loopback NI. */
@@ -1212,7 +1235,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)

/* Make sure this new NI is unique. */
lnet_net_lock(LNET_LOCK_EX);
- rc = lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nis);
+ rc = lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nets);
lnet_net_unlock(LNET_LOCK_EX);
if (!rc) {
if (lnd_type == LOLND) {
@@ -1297,7 +1320,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
lnet_net_lock(LNET_LOCK_EX);
/* refcount for ln_nis */
lnet_ni_addref_locked(ni, 0);
- list_add_tail(&ni->ni_list, &the_lnet.ln_nis);
+ list_add_tail(&ni->ni_net->net_list, &the_lnet.ln_nets);
if (ni->ni_cpts) {
lnet_ni_addref_locked(ni, 0);
list_add_tail(&ni->ni_cptlist, &the_lnet.ln_nis_cpt);
@@ -1363,8 +1386,8 @@ lnet_startup_lndnis(struct list_head *nilist)
int ni_count = 0;

while (!list_empty(nilist)) {
- ni = list_entry(nilist->next, struct lnet_ni, ni_list);
- list_del(&ni->ni_list);
+ ni = list_entry(nilist->next, struct lnet_ni, ni_netlist);
+ list_del(&ni->ni_netlist);
rc = lnet_startup_lndni(ni, NULL);

if (rc < 0)
@@ -1486,6 +1509,7 @@ LNetNIInit(lnet_pid_t requested_pid)
struct lnet_ping_info *pinfo;
struct lnet_handle_md md_handle;
struct list_head net_head;
+ struct lnet_net *net;

INIT_LIST_HEAD(&net_head);

@@ -1505,8 +1529,15 @@ LNetNIInit(lnet_pid_t requested_pid)
return rc;
}

- /* Add in the loopback network */
- if (!lnet_ni_alloc(LNET_MKNET(LOLND, 0), NULL, &net_head)) {
+ /* create a network for Loopback network */
+ net = lnet_net_alloc(LNET_MKNET(LOLND, 0), &net_head);
+ if (net == NULL) {
+ rc = -ENOMEM;
+ goto err_empty_list;
+ }
+
+ /* Add in the loopback NI */
+ if (lnet_ni_alloc(net, NULL, NULL) == NULL) {
rc = -ENOMEM;
goto err_empty_list;
}
@@ -1584,11 +1615,11 @@ LNetNIInit(lnet_pid_t requested_pid)
LASSERT(rc < 0);
mutex_unlock(&the_lnet.ln_api_mutex);
while (!list_empty(&net_head)) {
- struct lnet_ni *ni;
+ struct lnet_net *net;

- ni = list_entry(net_head.next, struct lnet_ni, ni_list);
- list_del_init(&ni->ni_list);
- lnet_ni_free(ni);
+ net = list_entry(net_head.next, struct lnet_net, net_list);
+ list_del_init(&net->net_list);
+ lnet_net_free(net);
}
return rc;
}
@@ -1714,25 +1745,83 @@ lnet_fill_ni_info(struct lnet_ni *ni, struct lnet_ioctl_config_data *config)
}
}

+struct lnet_ni *
+lnet_get_ni_idx_locked(int idx)
+{
+ struct lnet_ni *ni;
+ struct lnet_net *net;
+
+ list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+ list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+ if (idx-- == 0)
+ return ni;
+ }
+ }
+
+ return NULL;
+}
+
+struct lnet_ni *
+lnet_get_next_ni_locked(struct lnet_net *mynet, struct lnet_ni *prev)
+{
+ struct lnet_ni *ni;
+ struct lnet_net *net = mynet;
+
+ if (prev == NULL) {
+ if (net == NULL)
+ net = list_entry(the_lnet.ln_nets.next, struct lnet_net,
+ net_list);
+ ni = list_entry(net->net_ni_list.next, struct lnet_ni,
+ ni_netlist);
+
+ return ni;
+ }
+
+ if (prev->ni_netlist.next == &prev->ni_net->net_ni_list) {
+ /* if you reached the end of the ni list and the net is
+ * specified, then there are no more nis in that net */
+ if (net != NULL)
+ return NULL;
+
+ /* we reached the end of this net ni list. move to the
+ * next net */
+ if (prev->ni_net->net_list.next == &the_lnet.ln_nets)
+ /* no more nets and no more NIs. */
+ return NULL;
+
+ /* get the next net */
+ net = list_entry(prev->ni_net->net_list.next, struct lnet_net,
+ net_list);
+ /* get the ni on it */
+ ni = list_entry(net->net_ni_list.next, struct lnet_ni,
+ ni_netlist);
+
+ return ni;
+ }
+
+ /* there are more nis left */
+ ni = list_entry(prev->ni_netlist.next, struct lnet_ni, ni_netlist);
+
+ return ni;
+}
+
static int
lnet_get_net_config(struct lnet_ioctl_config_data *config)
{
struct lnet_ni *ni;
+ int cpt;
int idx = config->cfg_count;
- int cpt, i = 0;
int rc = -ENOENT;

cpt = lnet_net_lock_current();

- list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
- if (i++ != idx)
- continue;
+ ni = lnet_get_ni_idx_locked(idx);

+ if (ni != NULL) {
+ rc = 0;
lnet_ni_lock(ni);
lnet_fill_ni_info(ni, config);
lnet_ni_unlock(ni);
- rc = 0;
- break;
}

lnet_net_unlock(cpt);
@@ -1745,6 +1834,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
char *nets = conf->cfg_config_u.cfg_net.net_intf;
struct lnet_ping_info *pinfo;
struct lnet_handle_md md_handle;
+ struct lnet_net *net;
struct lnet_ni *ni;
struct list_head net_head;
struct lnet_remotenet *rnet;
@@ -1752,7 +1842,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)

INIT_LIST_HEAD(&net_head);

- /* Create a ni structure for the network string */
+ /* Create a net/ni structures for the network string */
rc = lnet_parse_networks(&net_head, nets);
if (rc <= 0)
return !rc ? -EINVAL : rc;
@@ -1760,14 +1850,14 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
mutex_lock(&the_lnet.ln_api_mutex);

if (rc > 1) {
- rc = -EINVAL; /* only add one interface per call */
+ rc = -EINVAL; /* only add one network per call */
goto failed0;
}

- ni = list_entry(net_head.next, struct lnet_ni, ni_list);
+ net = list_entry(net_head.next, struct lnet_net, net_list);

lnet_net_lock(LNET_LOCK_EX);
- rnet = lnet_find_net_locked(LNET_NIDNET(ni->ni_nid));
+ rnet = lnet_find_net_locked(net->net_id);
lnet_net_unlock(LNET_LOCK_EX);
/*
* make sure that the net added doesn't invalidate the current
@@ -1785,8 +1875,8 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
if (rc)
goto failed0;

- list_del_init(&ni->ni_list);
-
+ list_del_init(&net->net_list);
+ ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
rc = lnet_startup_lndni(ni, conf);
if (rc)
goto failed1;
@@ -1812,9 +1902,9 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
failed0:
mutex_unlock(&the_lnet.ln_api_mutex);
while (!list_empty(&net_head)) {
- ni = list_entry(net_head.next, struct lnet_ni, ni_list);
- list_del_init(&ni->ni_list);
- lnet_ni_free(ni);
+ net = list_entry(net_head.next, struct lnet_net, net_list);
+ list_del_init(&net->net_list);
+ lnet_net_free(net);
}
return rc;
}
@@ -1849,7 +1939,7 @@ lnet_dyn_del_ni(__u32 net)

lnet_shutdown_lndni(ni);

- if (!lnet_count_acceptor_nis())
+ if (!lnet_count_acceptor_nets())
lnet_acceptor_stop();

lnet_ping_target_update(pinfo, md_handle);
@@ -2103,7 +2193,8 @@ EXPORT_SYMBOL(LNetDebugPeer);
int
LNetGetId(unsigned int index, struct lnet_process_id *id)
{
- struct lnet_ni *ni;
+ struct lnet_ni *ni;
+ struct lnet_net  *net;
int cpt;
int rc = -ENOENT;

@@ -2111,14 +2202,16 @@ LNetGetId(unsigned int index, struct lnet_process_id *id)

cpt = lnet_net_lock_current();

- list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
- if (index--)
- continue;
+ list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+ list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+ if (index-- != 0)
+ continue;

- id->nid = ni->ni_nid;
- id->pid = the_lnet.ln_pid;
- rc = 0;
- break;
+ id->nid = ni->ni_nid;
+ id->pid = the_lnet.ln_pid;
+ rc = 0;
+ break;
+ }
}

lnet_net_unlock(cpt);
diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index 5646feeb433e..e83bdbec11e3 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -78,17 +78,17 @@ lnet_issep(char c)
}
}

-int
-lnet_net_unique(__u32 net, struct list_head *nilist)
+bool
+lnet_net_unique(__u32 net, struct list_head *netlist)
{
- struct lnet_ni *ni;
+ struct lnet_net *net_l;

- list_for_each_entry(ni, nilist, ni_list) {
- if (LNET_NIDNET(ni->ni_nid) == net)
- return 0;
+ list_for_each_entry(net_l, netlist, net_list) {
+ if (net_l->net_id == net)
+ return false;
}

- return 1;
+ return true;
}

void
@@ -112,41 +112,78 @@ lnet_ni_free(struct lnet_ni *ni)
if (ni->ni_net_ns)
put_net(ni->ni_net_ns);

- kvfree(ni->ni_net);
kfree(ni);
}

-struct lnet_ni *
-lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
+void
+lnet_net_free(struct lnet_net *net)
{
- struct lnet_tx_queue *tq;
+ struct list_head *tmp, *tmp2;
struct lnet_ni *ni;
- int rc;
- int i;
+
+ /* delete any nis which have been started. */
+ list_for_each_safe(tmp, tmp2, &net->net_ni_list) {
+ ni = list_entry(tmp, struct lnet_ni, ni_netlist);
+ list_del_init(&ni->ni_netlist);
+ lnet_ni_free(ni);
+ }
+
+ kfree(net);
+}
+
+struct lnet_net *
+lnet_net_alloc(__u32 net_id, struct list_head *net_list)
+{
struct lnet_net *net;

- if (!lnet_net_unique(net_id, nilist)) {
- LCONSOLE_ERROR_MSG(0x111, "Duplicate network specified: %s\n",
-   libcfs_net2str(net_id));
+ if (!lnet_net_unique(net_id, net_list)) {
+ CERROR("Duplicate net %s. Ignore\n",
+       libcfs_net2str(net_id));
return NULL;
}

- ni = kzalloc(sizeof(*ni), GFP_NOFS);
net = kzalloc(sizeof(*net), GFP_NOFS);
- if (!ni || !net) {
- kfree(ni); kfree(net);
+ if (!net) {
CERROR("Out of memory creating network %s\n",
      libcfs_net2str(net_id));
return NULL;
}
+
+ INIT_LIST_HEAD(&net->net_list);
+ INIT_LIST_HEAD(&net->net_ni_list);
+
+ net->net_id = net_id;
+
/* initialize global paramters to undefiend */
net->net_tunables.lct_peer_timeout = -1;
net->net_tunables.lct_max_tx_credits = -1;
net->net_tunables.lct_peer_tx_credits = -1;
net->net_tunables.lct_peer_rtr_credits = -1;

+ list_add_tail(&net->net_list, net_list);
+
+ return net;
+}
+
+struct lnet_ni *
+lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
+{
+ struct lnet_tx_queue *tq;
+ struct lnet_ni *ni;
+ int rc;
+ int i;
+
+ ni = kzalloc(sizeof(*ni), GFP_KERNEL);
+ if (ni == NULL) {
+ CERROR("Out of memory creating network interface %s%s\n",
+       libcfs_net2str(net->net_id),
+       (iface != NULL) ? iface : "");
+ return NULL;
+ }
+
spin_lock_init(&ni->ni_lock);
INIT_LIST_HEAD(&ni->ni_cptlist);
+ INIT_LIST_HEAD(&ni->ni_netlist);
ni->ni_refs = cfs_percpt_alloc(lnet_cpt_table(),
      sizeof(*ni->ni_refs[0]));
if (!ni->ni_refs)
@@ -166,8 +203,9 @@ lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
} else {
rc = cfs_expr_list_values(el, LNET_CPT_NUMBER, &ni->ni_cpts);
if (rc <= 0) {
- CERROR("Failed to set CPTs for NI %s: %d\n",
-       libcfs_net2str(net_id), rc);
+ CERROR("Failed to set CPTs for NI %s(%s): %d\n",
+       libcfs_net2str(net->net_id),
+       (iface != NULL) ? iface : "", rc);
goto failed;
}

@@ -182,7 +220,7 @@ lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)

ni->ni_net = net;
/* LND will fill in the address part of the NID */
- ni->ni_nid = LNET_MKNID(net_id, 0);
+ ni->ni_nid = LNET_MKNID(net->net_id, 0);

/* Store net namespace in which current ni is being created */
if (current->nsproxy->net_ns)
@@ -191,22 +229,24 @@ lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
ni->ni_net_ns = NULL;

ni->ni_last_alive = ktime_get_real_seconds();
- list_add_tail(&ni->ni_list, nilist);
+ list_add_tail(&ni->ni_netlist, &net->net_ni_list);
+
return ni;
- failed:
+failed:
lnet_ni_free(ni);
return NULL;
}

int
-lnet_parse_networks(struct list_head *nilist, char *networks)
+lnet_parse_networks(struct list_head *netlist, char *networks)
{
struct cfs_expr_list *el = NULL;
char *tokens;
char *str;
char *tmp;
- struct lnet_ni *ni;
- __u32 net;
+ struct lnet_net *net;
+ struct lnet_ni *ni = NULL;
+ __u32 net_id;
int nnets = 0;
struct list_head *temp_node;

@@ -275,18 +315,21 @@ lnet_parse_networks(struct list_head *nilist, char *networks)

if (comma)
*comma++ = 0;
- net = libcfs_str2net(strim(str));
+ net_id = libcfs_str2net(strim(str));

- if (net == LNET_NIDNET(LNET_NID_ANY)) {
+ if (net_id == LNET_NIDNET(LNET_NID_ANY)) {
LCONSOLE_ERROR_MSG(0x113,
  "Unrecognised network type\n");
tmp = str;
goto failed_syntax;
}

- if (LNET_NETTYP(net) != LOLND && /* LO is implicit */
-    !lnet_ni_alloc(net, el, nilist))
- goto failed;
+ if (LNET_NETTYP(net_id) != LOLND) { /* LO is implicit */
+ net = lnet_net_alloc(net_id, netlist);
+ if (!net ||
+    !lnet_ni_alloc(net, el, NULL))
+ goto failed;
+ }

if (el) {
cfs_expr_list_free(el);
@@ -298,14 +341,21 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
}

*bracket = 0;
- net = libcfs_str2net(strim(str));
- if (net == LNET_NIDNET(LNET_NID_ANY)) {
+ net_id = libcfs_str2net(strim(str));
+ if (net_id == LNET_NIDNET(LNET_NID_ANY)) {
tmp = str;
goto failed_syntax;
}

- ni = lnet_ni_alloc(net, el, nilist);
- if (!ni)
+ /* always allocate a net, since we will eventually add an
+ * interface to it, or we will fail, in which case we'll
+ * just delete it */
+ net = lnet_net_alloc(net_id, netlist);
+ if (IS_ERR_OR_NULL(net))
+ goto failed;
+
+ ni = lnet_ni_alloc(net, el, NULL);
+ if (IS_ERR_OR_NULL(ni))
goto failed;

if (el) {
@@ -337,7 +387,7 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
if (niface == LNET_MAX_INTERFACES) {
LCONSOLE_ERROR_MSG(0x115,
  "Too many interfaces for net %s\n",
-   libcfs_net2str(net));
+   libcfs_net2str(net_id));
goto failed;
}

@@ -378,7 +428,7 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
}
}

- list_for_each(temp_node, nilist)
+ list_for_each(temp_node, netlist)
nnets++;

kfree(tokens);
@@ -387,11 +437,12 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
 failed_syntax:
lnet_syntax("networks", networks, (int)(tmp - tokens), strlen(tmp));
 failed:
- while (!list_empty(nilist)) {
- ni = list_entry(nilist->next, struct lnet_ni, ni_list);
+ /* free the net list and all the nis on each net */
+ while (!list_empty(netlist)) {
+ net = list_entry(netlist->next, struct lnet_net, net_list);

- list_del(&ni->ni_list);
- lnet_ni_free(ni);
+ list_del_init(&net->net_list);
+ lnet_net_free(net);
}

if (el)
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index 1bf12af87a20..1c874025fa74 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -2289,7 +2289,7 @@ EXPORT_SYMBOL(LNetGet);
int
LNetDist(lnet_nid_t dstnid, lnet_nid_t *srcnidp, __u32 *orderp)
{
- struct lnet_ni *ni;
+ struct lnet_ni *ni = NULL;
struct lnet_remotenet *rnet;
__u32 dstnet = LNET_NIDNET(dstnid);
int hops;
@@ -2307,9 +2307,9 @@ LNetDist(lnet_nid_t dstnid, lnet_nid_t *srcnidp, __u32 *orderp)

cpt = lnet_net_lock_current();

- list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
+ while ((ni = lnet_get_next_ni_locked(NULL, ni))) {
if (ni->ni_nid == dstnid) {
- if (srcnidp)
+ if (srcnidp != NULL)
*srcnidp = dstnid;
if (orderp) {
if (LNET_NETTYP(LNET_NIDNET(dstnid)) == LOLND)
diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index 0c0ec0b27982..135dfe793b0b 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -245,13 +245,10 @@ static void lnet_shuffle_seed(void)
if (seeded)
return;

- /*
- * Nodes with small feet have little entropy
- * the NID for this node gives the most entropy in the low bits
- */
- list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
+ /* Nodes with small feet have little entropy
+ * the NID for this node gives the most entropy in the low bits */
+ while ((ni = lnet_get_next_ni_locked(NULL, ni))) {
__u32 lnd_type, seed;
-
lnd_type = LNET_NETTYP(LNET_NIDNET(ni->ni_nid));
if (lnd_type != LOLND) {
seed = (LNET_NIDADDR(ni->ni_nid) | lnd_type);
@@ -807,8 +804,8 @@ lnet_router_ni_update_locked(struct lnet_peer *gw, __u32 net)
static void
lnet_update_ni_status_locked(void)
{
- struct lnet_ni *ni;
- time64_t now;
+ struct lnet_ni *ni = NULL;
+ time64_t now;
time64_t timeout;

LASSERT(the_lnet.ln_routing);
@@ -817,7 +814,7 @@ lnet_update_ni_status_locked(void)
 max(live_router_check_interval, dead_router_check_interval);

now = ktime_get_real_seconds();
- list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
+ while ((ni = lnet_get_next_ni_locked(NULL, ni))) {
if (ni->ni_net->net_lnd->lnd_type == LOLND)
continue;

diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
index f3ccd6a2b70e..2a366e9a8627 100644
--- a/drivers/staging/lustre/lnet/lnet/router_proc.c
+++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
@@ -641,26 +641,12 @@ static int proc_lnet_nis(struct ctl_table *table, int write,
     "rtr", "max", "tx", "min");
LASSERT(tmpstr + tmpsiz - s > 0);
} else {
- struct list_head *n;
struct lnet_ni *ni = NULL;
int skip = *ppos - 1;

lnet_net_lock(0);

- n = the_lnet.ln_nis.next;
-
- while (n != &the_lnet.ln_nis) {
- struct lnet_ni *a_ni;
-
- a_ni = list_entry(n, struct lnet_ni, ni_list);
- if (!skip) {
- ni = a_ni;
- break;
- }
-
- skip--;
- n = n->next;
- }
+ ni = lnet_get_ni_idx_locked(skip);

if (ni) {
struct lnet_tx_queue *tq;
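
For anyone following the net/NI split, the two-level walk that this patch introduces (every net on ln_nets, then every NI on that net's net_ni_list) can be sketched in user space roughly as below. This is only an illustration: the demo_* structs and functions are hypothetical stand-ins rather than the kernel types, plain "next" pointers replace struct list_head, and all locking is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lnet_ni (not the kernel type). */
struct demo_ni {
	int		 nid;	/* plays the role of ni_nid */
	struct demo_ni	*next;	/* plays the role of ni_netlist */
};

/* Hypothetical stand-in for struct lnet_net (not the kernel type). */
struct demo_net {
	int		 net_id;	/* plays the role of net_id */
	struct demo_ni	*nis;		/* plays the role of net_ni_list */
	struct demo_net	*next;		/* plays the role of net_list */
};

/* Return the idx-th NI counted across all nets, or NULL if out of range.
 * Mirrors the shape of lnet_get_ni_idx_locked() in the patch. */
static struct demo_ni *
demo_get_ni_idx(struct demo_net *nets, int idx)
{
	struct demo_net *net;
	struct demo_ni *ni;

	for (net = nets; net; net = net->next) {
		for (ni = net->nis; ni; ni = ni->next) {
			if (idx-- == 0)
				return ni;
		}
	}

	return NULL;
}

/* Count every NI on every net, as lnet_get_ni_count() does after the patch. */
static int
demo_count_nis(struct demo_net *nets)
{
	struct demo_net *net;
	struct demo_ni *ni;
	int count = 0;

	for (net = nets; net; net = net->next)
		for (ni = net->nis; ni; ni = ni->next)
			count++;

	return count;
}

/* Small usage example: two nets, the first with one NI, the second with two. */
static void
demo_selfcheck(void)
{
	struct demo_ni c = { 21, NULL };
	struct demo_ni b = { 20, &c };
	struct demo_ni a = { 10, NULL };
	struct demo_net n2 = { 2, &b, NULL };
	struct demo_net n1 = { 1, &a, &n2 };

	assert(demo_count_nis(&n1) == 3);
	assert(demo_get_ni_idx(&n1, 0) == &a);
	assert(demo_get_ni_idx(&n1, 1) == &b);
	assert(demo_get_ni_idx(&n1, 2) == &c);
	assert(demo_get_ni_idx(&n1, 3) == NULL);
}
```

The index is global across nets, so callers that previously walked the flat ln_nis list (the proc handler and lnet_get_net_config() in this patch) keep the same numbering as long as nets and NIs are appended in order.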




^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 03/34] lnet: struct lnet_ni: move ni_lnd to lnet_net
  2018-09-10 23:04   ` Doug Oucharek
@ 2018-09-10 23:19     ` James Simmons
  2018-09-10 23:19       ` Doug Oucharek
  2018-09-10 23:19     ` James Simmons
  1 sibling, 1 reply; 98+ messages in thread
From: James Simmons @ 2018-09-10 23:19 UTC (permalink / raw)
  To: lustre-devel


> Is the gnilnd module being built upstream?  Just as there were changes to o2iblnd.c and socklnd.c for this change, there should be a corresponding change to gnilnd.c.

No gnilnd in the Linux kernel :-(


> Doug
> 
> > On Sep 6, 2018, at 5:49 PM, NeilBrown <neilb@suse.com> wrote:
> > 
> > Also make some other minor changes to the structures.
> > 
> > This is part of
> >    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
> >       LU-7734 lnet: Multi-Rail local NI split
> > 
> > Signed-off-by: NeilBrown <neilb@suse.com>
> > ---
> > .../staging/lustre/include/linux/lnet/lib-types.h  |   13 ++++++++-----
> > .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    2 +-
> > .../staging/lustre/lnet/klnds/socklnd/socklnd.c    |    2 +-
> > drivers/staging/lustre/lnet/lnet/acceptor.c        |    4 ++--
> > drivers/staging/lustre/lnet/lnet/api-ni.c          |   16 ++++++++--------
> > drivers/staging/lustre/lnet/lnet/lib-move.c        |   16 ++++++++--------
> > drivers/staging/lustre/lnet/lnet/lo.c              |    2 +-
> > drivers/staging/lustre/lnet/lnet/router.c          |   10 +++++-----
> > drivers/staging/lustre/lnet/lnet/router_proc.c     |    2 +-
> > 9 files changed, 35 insertions(+), 32 deletions(-)
> > 
> > diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> > index ead8a4e1125a..e170eb07a5bf 100644
> > --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> > +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> > @@ -262,12 +262,17 @@ struct lnet_net {
> > 	 * shouldn't be reset
> > 	 */
> > 	bool			  net_tunables_set;
> > +	/* procedural interface */
> > +	struct lnet_lnd		*net_lnd;
> > };
> > 
> > struct lnet_ni {
> > -	spinlock_t		  ni_lock;
> > -	struct list_head	  ni_list;	/* chain on ln_nis */
> > -	struct list_head	  ni_cptlist;	/* chain on ln_nis_cpt */
> > +	/* chain on ln_nis */
> > +	struct list_head	  ni_list;
> > +	/* chain on ln_nis_cpt */
> > +	struct list_head	ni_cptlist;
> > +
> > +	spinlock_t		ni_lock;
> > 
> > 	/* number of CPTs */
> > 	int			ni_ncpts;
> > @@ -281,8 +286,6 @@ struct lnet_ni {
> > 	/* instance-specific data */
> > 	void			*ni_data;
> > 
> > -	struct lnet_lnd		 *ni_lnd;	/* procedural interface */
> > -
> > 	/* percpt TX queues */
> > 	struct lnet_tx_queue	**ni_tx_queues;
> > 
> > diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> > index 0d17e22c4401..5e1592b398c1 100644
> > --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> > +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> > @@ -2830,7 +2830,7 @@ static int kiblnd_startup(struct lnet_ni *ni)
> > 	int rc;
> > 	int newdev;
> > 
> > -	LASSERT(ni->ni_lnd == &the_o2iblnd);
> > +	LASSERT(ni->ni_net->net_lnd == &the_o2iblnd);
> > 
> > 	if (kiblnd_data.kib_init == IBLND_INIT_NOTHING) {
> > 		rc = kiblnd_base_startup();
> > diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> > index 4ad885f10235..2036a0ae5917 100644
> > --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> > +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> > @@ -2726,7 +2726,7 @@ ksocknal_startup(struct lnet_ni *ni)
> > 	int rc;
> > 	int i;
> > 
> > -	LASSERT(ni->ni_lnd == &the_ksocklnd);
> > +	LASSERT(ni->ni_net->net_lnd == &the_ksocklnd);
> > 
> > 	if (ksocknal_data.ksnd_init == SOCKNAL_INIT_NOTHING) {
> > 		rc = ksocknal_base_startup();
> > diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
> > index 3ae3ca1311a1..f8c921f0221c 100644
> > --- a/drivers/staging/lustre/lnet/lnet/acceptor.c
> > +++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
> > @@ -306,7 +306,7 @@ lnet_accept(struct socket *sock, __u32 magic)
> > 		return -EPERM;
> > 	}
> > 
> > -	if (!ni->ni_lnd->lnd_accept) {
> > +	if (!ni->ni_net->net_lnd->lnd_accept) {
> > 		/* This catches a request for the loopback LND */
> > 		lnet_ni_decref(ni);
> > 		LCONSOLE_ERROR_MSG(0x121, "Refusing connection from %pI4h for %s: NI doesn not accept IP connections\n",
> > @@ -317,7 +317,7 @@ lnet_accept(struct socket *sock, __u32 magic)
> > 	CDEBUG(D_NET, "Accept %s from %pI4h\n",
> > 	       libcfs_nid2str(cr.acr_nid), &peer_ip);
> > 
> > -	rc = ni->ni_lnd->lnd_accept(ni, sock);
> > +	rc = ni->ni_net->net_lnd->lnd_accept(ni, sock);
> > 
> > 	lnet_ni_decref(ni);
> > 	return rc;
> > diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
> > index cd4189fa7acb..0896e75bc3d7 100644
> > --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
> > +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
> > @@ -799,7 +799,7 @@ lnet_count_acceptor_nis(void)
> > 
> > 	cpt = lnet_net_lock_current();
> > 	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> > -		if (ni->ni_lnd->lnd_accept)
> > +		if (ni->ni_net->net_lnd->lnd_accept)
> > 			count++;
> > 	}
> > 
> > @@ -1098,13 +1098,13 @@ lnet_clear_zombies_nis_locked(void)
> > 			continue;
> > 		}
> > 
> > -		ni->ni_lnd->lnd_refcount--;
> > +		ni->ni_net->net_lnd->lnd_refcount--;
> > 		lnet_net_unlock(LNET_LOCK_EX);
> > 
> > -		islo = ni->ni_lnd->lnd_type == LOLND;
> > +		islo = ni->ni_net->net_lnd->lnd_type == LOLND;
> > 
> > 		LASSERT(!in_interrupt());
> > -		ni->ni_lnd->lnd_shutdown(ni);
> > +		ni->ni_net->net_lnd->lnd_shutdown(ni);
> > 
> > 		/*
> > 		 * can't deref lnd anymore now; it might have unregistered
> > @@ -1248,7 +1248,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
> > 	lnd->lnd_refcount++;
> > 	lnet_net_unlock(LNET_LOCK_EX);
> > 
> > -	ni->ni_lnd = lnd;
> > +	ni->ni_net->net_lnd = lnd;
> > 
> > 	if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf))
> > 		lnd_tunables = (struct lnet_ioctl_config_lnd_tunables *)conf->cfg_bulk;
> > @@ -1794,7 +1794,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
> > 	if (rc)
> > 		goto failed1;
> > 
> > -	if (ni->ni_lnd->lnd_accept) {
> > +	if (ni->ni_net->net_lnd->lnd_accept) {
> > 		rc = lnet_acceptor_start();
> > 		if (rc < 0) {
> > 			/* shutdown the ni that we just started */
> > @@ -2074,10 +2074,10 @@ LNetCtl(unsigned int cmd, void *arg)
> > 		if (!ni)
> > 			return -EINVAL;
> > 
> > -		if (!ni->ni_lnd->lnd_ctl)
> > +		if (!ni->ni_net->net_lnd->lnd_ctl)
> > 			rc = -EINVAL;
> > 		else
> > -			rc = ni->ni_lnd->lnd_ctl(ni, cmd, arg);
> > +			rc = ni->ni_net->net_lnd->lnd_ctl(ni, cmd, arg);
> > 
> > 		lnet_ni_decref(ni);
> > 		return rc;
> > diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
> > index f186e6a16d34..1bf12af87a20 100644
> > --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
> > +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
> > @@ -406,7 +406,7 @@ lnet_ni_recv(struct lnet_ni *ni, void *private, struct lnet_msg *msg,
> > 		iov_iter_bvec(&to, ITER_BVEC | READ, kiov, niov, mlen + offset);
> > 		iov_iter_advance(&to, offset);
> > 	}
> > -	rc = ni->ni_lnd->lnd_recv(ni, private, msg, delayed, &to, rlen);
> > +	rc = ni->ni_net->net_lnd->lnd_recv(ni, private, msg, delayed, &to, rlen);
> > 	if (rc < 0)
> > 		lnet_finalize(ni, msg, rc);
> > }
> > @@ -461,7 +461,7 @@ lnet_ni_send(struct lnet_ni *ni, struct lnet_msg *msg)
> > 	LASSERT(LNET_NETTYP(LNET_NIDNET(ni->ni_nid)) == LOLND ||
> > 		(msg->msg_txcredit && msg->msg_peertxcredit));
> > 
> > -	rc = ni->ni_lnd->lnd_send(ni, priv, msg);
> > +	rc = ni->ni_net->net_lnd->lnd_send(ni, priv, msg);
> > 	if (rc < 0)
> > 		lnet_finalize(ni, msg, rc);
> > }
> > @@ -474,10 +474,10 @@ lnet_ni_eager_recv(struct lnet_ni *ni, struct lnet_msg *msg)
> > 	LASSERT(!msg->msg_sending);
> > 	LASSERT(msg->msg_receiving);
> > 	LASSERT(!msg->msg_rx_ready_delay);
> > -	LASSERT(ni->ni_lnd->lnd_eager_recv);
> > +	LASSERT(ni->ni_net->net_lnd->lnd_eager_recv);
> > 
> > 	msg->msg_rx_ready_delay = 1;
> > -	rc = ni->ni_lnd->lnd_eager_recv(ni, msg->msg_private, msg,
> > +	rc = ni->ni_net->net_lnd->lnd_eager_recv(ni, msg->msg_private, msg,
> > 					&msg->msg_private);
> > 	if (rc) {
> > 		CERROR("recv from %s / send to %s aborted: eager_recv failed %d\n",
> > @@ -496,10 +496,10 @@ lnet_ni_query_locked(struct lnet_ni *ni, struct lnet_peer *lp)
> > 	time64_t last_alive = 0;
> > 
> > 	LASSERT(lnet_peer_aliveness_enabled(lp));
> > -	LASSERT(ni->ni_lnd->lnd_query);
> > +	LASSERT(ni->ni_net->net_lnd->lnd_query);
> > 
> > 	lnet_net_unlock(lp->lp_cpt);
> > -	ni->ni_lnd->lnd_query(ni, lp->lp_nid, &last_alive);
> > +	ni->ni_net->net_lnd->lnd_query(ni, lp->lp_nid, &last_alive);
> > 	lnet_net_lock(lp->lp_cpt);
> > 
> > 	lp->lp_last_query = ktime_get_seconds();
> > @@ -1287,7 +1287,7 @@ lnet_parse_put(struct lnet_ni *ni, struct lnet_msg *msg)
> > 	info.mi_roffset	= hdr->msg.put.offset;
> > 	info.mi_mbits	= hdr->msg.put.match_bits;
> > 
> > -	msg->msg_rx_ready_delay = !ni->ni_lnd->lnd_eager_recv;
> > +	msg->msg_rx_ready_delay = !ni->ni_net->net_lnd->lnd_eager_recv;
> > 	ready_delay = msg->msg_rx_ready_delay;
> > 
> >  again:
> > @@ -1518,7 +1518,7 @@ lnet_parse_forward_locked(struct lnet_ni *ni, struct lnet_msg *msg)
> > 
> > 	if (msg->msg_rxpeer->lp_rtrcredits <= 0 ||
> > 	    lnet_msg2bufpool(msg)->rbp_credits <= 0) {
> > -		if (!ni->ni_lnd->lnd_eager_recv) {
> > +		if (!ni->ni_net->net_lnd->lnd_eager_recv) {
> > 			msg->msg_rx_ready_delay = 1;
> > 		} else {
> > 			lnet_net_unlock(msg->msg_rx_cpt);
> > diff --git a/drivers/staging/lustre/lnet/lnet/lo.c b/drivers/staging/lustre/lnet/lnet/lo.c
> > index eb14146bd879..8167980c2323 100644
> > --- a/drivers/staging/lustre/lnet/lnet/lo.c
> > +++ b/drivers/staging/lustre/lnet/lnet/lo.c
> > @@ -83,7 +83,7 @@ lolnd_shutdown(struct lnet_ni *ni)
> > static int
> > lolnd_startup(struct lnet_ni *ni)
> > {
> > -	LASSERT(ni->ni_lnd == &the_lolnd);
> > +	LASSERT(ni->ni_net->net_lnd == &the_lolnd);
> > 	LASSERT(!lolnd_instanced);
> > 	lolnd_instanced = 1;
> > 
> > diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
> > index 7d61c5d71426..0c0ec0b27982 100644
> > --- a/drivers/staging/lustre/lnet/lnet/router.c
> > +++ b/drivers/staging/lustre/lnet/lnet/router.c
> > @@ -154,14 +154,14 @@ lnet_ni_notify_locked(struct lnet_ni *ni, struct lnet_peer *lp)
> > 		lp->lp_notifylnd = 0;
> > 		lp->lp_notify    = 0;
> > 
> > -		if (notifylnd && ni->ni_lnd->lnd_notify) {
> > +		if (notifylnd && ni->ni_net->net_lnd->lnd_notify) {
> > 			lnet_net_unlock(lp->lp_cpt);
> > 
> > 			/*
> > 			 * A new notification could happen now; I'll handle it
> > 			 * when control returns to me
> > 			 */
> > -			ni->ni_lnd->lnd_notify(ni, lp->lp_nid, alive);
> > +			ni->ni_net->net_lnd->lnd_notify(ni, lp->lp_nid, alive);
> > 
> > 			lnet_net_lock(lp->lp_cpt);
> > 		}
> > @@ -380,8 +380,8 @@ lnet_add_route(__u32 net, __u32 hops, lnet_nid_t gateway,
> > 		lnet_net_unlock(LNET_LOCK_EX);
> > 
> > 		/* XXX Assume alive */
> > -		if (ni->ni_lnd->lnd_notify)
> > -			ni->ni_lnd->lnd_notify(ni, gateway, 1);
> > +		if (ni->ni_net->net_lnd->lnd_notify)
> > +			ni->ni_net->net_lnd->lnd_notify(ni, gateway, 1);
> > 
> > 		lnet_net_lock(LNET_LOCK_EX);
> > 	}
> > @@ -818,7 +818,7 @@ lnet_update_ni_status_locked(void)
> > 
> > 	now = ktime_get_real_seconds();
> > 	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> > -		if (ni->ni_lnd->lnd_type == LOLND)
> > +		if (ni->ni_net->net_lnd->lnd_type == LOLND)
> > 			continue;
> > 
> > 		if (now < ni->ni_last_alive + timeout)
> > diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
> > index 19cea7076057..f3ccd6a2b70e 100644
> > --- a/drivers/staging/lustre/lnet/lnet/router_proc.c
> > +++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
> > @@ -674,7 +674,7 @@ static int proc_lnet_nis(struct ctl_table *table, int write,
> > 				last_alive = now - ni->ni_last_alive;
> > 
> > 			/* @lo forever alive */
> > -			if (ni->ni_lnd->lnd_type == LOLND)
> > +			if (ni->ni_net->net_lnd->lnd_type == LOLND)
> > 				last_alive = 0;
> > 
> > 			lnet_ni_lock(ni);
> > 
> > 
> 
> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 03/34] lnet: struct lnet_ni: move ni_lnd to lnet_net
  2018-09-10 23:04   ` Doug Oucharek
  2018-09-10 23:19     ` James Simmons
@ 2018-09-10 23:19     ` James Simmons
  1 sibling, 0 replies; 98+ messages in thread
From: James Simmons @ 2018-09-10 23:19 UTC (permalink / raw)
  To: lustre-devel


> Is the gnilnd module being built upstream?  Just as there were changes to o2iblnd.c and socklnd.c for this change, there should be a corresponding change to gnilnd.c.

No gnilnd in the Linux kernel :-(


> Doug
> 
> > On Sep 6, 2018, at 5:49 PM, NeilBrown <neilb@suse.com> wrote:
> > 
> > Also make some other minor changes to the structures.
> > 
> > This is part of
> >    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
> >       LU-7734 lnet: Multi-Rail local NI split
> > 
> > Signed-off-by: NeilBrown <neilb@suse.com>
> > ---
> > .../staging/lustre/include/linux/lnet/lib-types.h  |   13 ++++++++-----
> > .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    2 +-
> > .../staging/lustre/lnet/klnds/socklnd/socklnd.c    |    2 +-
> > drivers/staging/lustre/lnet/lnet/acceptor.c        |    4 ++--
> > drivers/staging/lustre/lnet/lnet/api-ni.c          |   16 ++++++++--------
> > drivers/staging/lustre/lnet/lnet/lib-move.c        |   16 ++++++++--------
> > drivers/staging/lustre/lnet/lnet/lo.c              |    2 +-
> > drivers/staging/lustre/lnet/lnet/router.c          |   10 +++++-----
> > drivers/staging/lustre/lnet/lnet/router_proc.c     |    2 +-
> > 9 files changed, 35 insertions(+), 32 deletions(-)
> > 
> > diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> > index ead8a4e1125a..e170eb07a5bf 100644
> > --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> > +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> > @@ -262,12 +262,17 @@ struct lnet_net {
> > 	 * shouldn't be reset
> > 	 */
> > 	bool			  net_tunables_set;
> > +	/* procedural interface */
> > +	struct lnet_lnd		*net_lnd;
> > };
> > 
> > struct lnet_ni {
> > -	spinlock_t		  ni_lock;
> > -	struct list_head	  ni_list;	/* chain on ln_nis */
> > -	struct list_head	  ni_cptlist;	/* chain on ln_nis_cpt */
> > +	/* chain on ln_nis */
> > +	struct list_head	  ni_list;
> > +	/* chain on ln_nis_cpt */
> > +	struct list_head	ni_cptlist;
> > +
> > +	spinlock_t		ni_lock;
> > 
> > 	/* number of CPTs */
> > 	int			ni_ncpts;
> > @@ -281,8 +286,6 @@ struct lnet_ni {
> > 	/* instance-specific data */
> > 	void			*ni_data;
> > 
> > -	struct lnet_lnd		 *ni_lnd;	/* procedural interface */
> > -
> > 	/* percpt TX queues */
> > 	struct lnet_tx_queue	**ni_tx_queues;
> > 
> > diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> > index 0d17e22c4401..5e1592b398c1 100644
> > --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> > +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> > @@ -2830,7 +2830,7 @@ static int kiblnd_startup(struct lnet_ni *ni)
> > 	int rc;
> > 	int newdev;
> > 
> > -	LASSERT(ni->ni_lnd == &the_o2iblnd);
> > +	LASSERT(ni->ni_net->net_lnd == &the_o2iblnd);
> > 
> > 	if (kiblnd_data.kib_init == IBLND_INIT_NOTHING) {
> > 		rc = kiblnd_base_startup();
> > diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> > index 4ad885f10235..2036a0ae5917 100644
> > --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> > +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> > @@ -2726,7 +2726,7 @@ ksocknal_startup(struct lnet_ni *ni)
> > 	int rc;
> > 	int i;
> > 
> > -	LASSERT(ni->ni_lnd == &the_ksocklnd);
> > +	LASSERT(ni->ni_net->net_lnd == &the_ksocklnd);
> > 
> > 	if (ksocknal_data.ksnd_init == SOCKNAL_INIT_NOTHING) {
> > 		rc = ksocknal_base_startup();
> > diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
> > index 3ae3ca1311a1..f8c921f0221c 100644
> > --- a/drivers/staging/lustre/lnet/lnet/acceptor.c
> > +++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
> > @@ -306,7 +306,7 @@ lnet_accept(struct socket *sock, __u32 magic)
> > 		return -EPERM;
> > 	}
> > 
> > -	if (!ni->ni_lnd->lnd_accept) {
> > +	if (!ni->ni_net->net_lnd->lnd_accept) {
> > 		/* This catches a request for the loopback LND */
> > 		lnet_ni_decref(ni);
> > 		LCONSOLE_ERROR_MSG(0x121, "Refusing connection from %pI4h for %s: NI doesn not accept IP connections\n",
> > @@ -317,7 +317,7 @@ lnet_accept(struct socket *sock, __u32 magic)
> > 	CDEBUG(D_NET, "Accept %s from %pI4h\n",
> > 	       libcfs_nid2str(cr.acr_nid), &peer_ip);
> > 
> > -	rc = ni->ni_lnd->lnd_accept(ni, sock);
> > +	rc = ni->ni_net->net_lnd->lnd_accept(ni, sock);
> > 
> > 	lnet_ni_decref(ni);
> > 	return rc;
> > diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
> > index cd4189fa7acb..0896e75bc3d7 100644
> > --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
> > +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
> > @@ -799,7 +799,7 @@ lnet_count_acceptor_nis(void)
> > 
> > 	cpt = lnet_net_lock_current();
> > 	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> > -		if (ni->ni_lnd->lnd_accept)
> > +		if (ni->ni_net->net_lnd->lnd_accept)
> > 			count++;
> > 	}
> > 
> > @@ -1098,13 +1098,13 @@ lnet_clear_zombies_nis_locked(void)
> > 			continue;
> > 		}
> > 
> > -		ni->ni_lnd->lnd_refcount--;
> > +		ni->ni_net->net_lnd->lnd_refcount--;
> > 		lnet_net_unlock(LNET_LOCK_EX);
> > 
> > -		islo = ni->ni_lnd->lnd_type == LOLND;
> > +		islo = ni->ni_net->net_lnd->lnd_type == LOLND;
> > 
> > 		LASSERT(!in_interrupt());
> > -		ni->ni_lnd->lnd_shutdown(ni);
> > +		ni->ni_net->net_lnd->lnd_shutdown(ni);
> > 
> > 		/*
> > 		 * can't deref lnd anymore now; it might have unregistered
> > @@ -1248,7 +1248,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
> > 	lnd->lnd_refcount++;
> > 	lnet_net_unlock(LNET_LOCK_EX);
> > 
> > -	ni->ni_lnd = lnd;
> > +	ni->ni_net->net_lnd = lnd;
> > 
> > 	if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf))
> > 		lnd_tunables = (struct lnet_ioctl_config_lnd_tunables *)conf->cfg_bulk;
> > @@ -1794,7 +1794,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
> > 	if (rc)
> > 		goto failed1;
> > 
> > -	if (ni->ni_lnd->lnd_accept) {
> > +	if (ni->ni_net->net_lnd->lnd_accept) {
> > 		rc = lnet_acceptor_start();
> > 		if (rc < 0) {
> > 			/* shutdown the ni that we just started */
> > @@ -2074,10 +2074,10 @@ LNetCtl(unsigned int cmd, void *arg)
> > 		if (!ni)
> > 			return -EINVAL;
> > 
> > -		if (!ni->ni_lnd->lnd_ctl)
> > +		if (!ni->ni_net->net_lnd->lnd_ctl)
> > 			rc = -EINVAL;
> > 		else
> > -			rc = ni->ni_lnd->lnd_ctl(ni, cmd, arg);
> > +			rc = ni->ni_net->net_lnd->lnd_ctl(ni, cmd, arg);
> > 
> > 		lnet_ni_decref(ni);
> > 		return rc;
> > diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
> > index f186e6a16d34..1bf12af87a20 100644
> > --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
> > +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
> > @@ -406,7 +406,7 @@ lnet_ni_recv(struct lnet_ni *ni, void *private, struct lnet_msg *msg,
> > 		iov_iter_bvec(&to, ITER_BVEC | READ, kiov, niov, mlen + offset);
> > 		iov_iter_advance(&to, offset);
> > 	}
> > -	rc = ni->ni_lnd->lnd_recv(ni, private, msg, delayed, &to, rlen);
> > +	rc = ni->ni_net->net_lnd->lnd_recv(ni, private, msg, delayed, &to, rlen);
> > 	if (rc < 0)
> > 		lnet_finalize(ni, msg, rc);
> > }
> > @@ -461,7 +461,7 @@ lnet_ni_send(struct lnet_ni *ni, struct lnet_msg *msg)
> > 	LASSERT(LNET_NETTYP(LNET_NIDNET(ni->ni_nid)) == LOLND ||
> > 		(msg->msg_txcredit && msg->msg_peertxcredit));
> > 
> > -	rc = ni->ni_lnd->lnd_send(ni, priv, msg);
> > +	rc = ni->ni_net->net_lnd->lnd_send(ni, priv, msg);
> > 	if (rc < 0)
> > 		lnet_finalize(ni, msg, rc);
> > }
> > @@ -474,10 +474,10 @@ lnet_ni_eager_recv(struct lnet_ni *ni, struct lnet_msg *msg)
> > 	LASSERT(!msg->msg_sending);
> > 	LASSERT(msg->msg_receiving);
> > 	LASSERT(!msg->msg_rx_ready_delay);
> > -	LASSERT(ni->ni_lnd->lnd_eager_recv);
> > +	LASSERT(ni->ni_net->net_lnd->lnd_eager_recv);
> > 
> > 	msg->msg_rx_ready_delay = 1;
> > -	rc = ni->ni_lnd->lnd_eager_recv(ni, msg->msg_private, msg,
> > +	rc = ni->ni_net->net_lnd->lnd_eager_recv(ni, msg->msg_private, msg,
> > 					&msg->msg_private);
> > 	if (rc) {
> > 		CERROR("recv from %s / send to %s aborted: eager_recv failed %d\n",
> > @@ -496,10 +496,10 @@ lnet_ni_query_locked(struct lnet_ni *ni, struct lnet_peer *lp)
> > 	time64_t last_alive = 0;
> > 
> > 	LASSERT(lnet_peer_aliveness_enabled(lp));
> > -	LASSERT(ni->ni_lnd->lnd_query);
> > +	LASSERT(ni->ni_net->net_lnd->lnd_query);
> > 
> > 	lnet_net_unlock(lp->lp_cpt);
> > -	ni->ni_lnd->lnd_query(ni, lp->lp_nid, &last_alive);
> > +	ni->ni_net->net_lnd->lnd_query(ni, lp->lp_nid, &last_alive);
> > 	lnet_net_lock(lp->lp_cpt);
> > 
> > 	lp->lp_last_query = ktime_get_seconds();
> > @@ -1287,7 +1287,7 @@ lnet_parse_put(struct lnet_ni *ni, struct lnet_msg *msg)
> > 	info.mi_roffset	= hdr->msg.put.offset;
> > 	info.mi_mbits	= hdr->msg.put.match_bits;
> > 
> > -	msg->msg_rx_ready_delay = !ni->ni_lnd->lnd_eager_recv;
> > +	msg->msg_rx_ready_delay = !ni->ni_net->net_lnd->lnd_eager_recv;
> > 	ready_delay = msg->msg_rx_ready_delay;
> > 
> >  again:
> > @@ -1518,7 +1518,7 @@ lnet_parse_forward_locked(struct lnet_ni *ni, struct lnet_msg *msg)
> > 
> > 	if (msg->msg_rxpeer->lp_rtrcredits <= 0 ||
> > 	    lnet_msg2bufpool(msg)->rbp_credits <= 0) {
> > -		if (!ni->ni_lnd->lnd_eager_recv) {
> > +		if (!ni->ni_net->net_lnd->lnd_eager_recv) {
> > 			msg->msg_rx_ready_delay = 1;
> > 		} else {
> > 			lnet_net_unlock(msg->msg_rx_cpt);
> > diff --git a/drivers/staging/lustre/lnet/lnet/lo.c b/drivers/staging/lustre/lnet/lnet/lo.c
> > index eb14146bd879..8167980c2323 100644
> > --- a/drivers/staging/lustre/lnet/lnet/lo.c
> > +++ b/drivers/staging/lustre/lnet/lnet/lo.c
> > @@ -83,7 +83,7 @@ lolnd_shutdown(struct lnet_ni *ni)
> > static int
> > lolnd_startup(struct lnet_ni *ni)
> > {
> > -	LASSERT(ni->ni_lnd == &the_lolnd);
> > +	LASSERT(ni->ni_net->net_lnd == &the_lolnd);
> > 	LASSERT(!lolnd_instanced);
> > 	lolnd_instanced = 1;
> > 
> > diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
> > index 7d61c5d71426..0c0ec0b27982 100644
> > --- a/drivers/staging/lustre/lnet/lnet/router.c
> > +++ b/drivers/staging/lustre/lnet/lnet/router.c
> > @@ -154,14 +154,14 @@ lnet_ni_notify_locked(struct lnet_ni *ni, struct lnet_peer *lp)
> > 		lp->lp_notifylnd = 0;
> > 		lp->lp_notify    = 0;
> > 
> > -		if (notifylnd && ni->ni_lnd->lnd_notify) {
> > +		if (notifylnd && ni->ni_net->net_lnd->lnd_notify) {
> > 			lnet_net_unlock(lp->lp_cpt);
> > 
> > 			/*
> > 			 * A new notification could happen now; I'll handle it
> > 			 * when control returns to me
> > 			 */
> > -			ni->ni_lnd->lnd_notify(ni, lp->lp_nid, alive);
> > +			ni->ni_net->net_lnd->lnd_notify(ni, lp->lp_nid, alive);
> > 
> > 			lnet_net_lock(lp->lp_cpt);
> > 		}
> > @@ -380,8 +380,8 @@ lnet_add_route(__u32 net, __u32 hops, lnet_nid_t gateway,
> > 		lnet_net_unlock(LNET_LOCK_EX);
> > 
> > 		/* XXX Assume alive */
> > -		if (ni->ni_lnd->lnd_notify)
> > -			ni->ni_lnd->lnd_notify(ni, gateway, 1);
> > +		if (ni->ni_net->net_lnd->lnd_notify)
> > +			ni->ni_net->net_lnd->lnd_notify(ni, gateway, 1);
> > 
> > 		lnet_net_lock(LNET_LOCK_EX);
> > 	}
> > @@ -818,7 +818,7 @@ lnet_update_ni_status_locked(void)
> > 
> > 	now = ktime_get_real_seconds();
> > 	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> > -		if (ni->ni_lnd->lnd_type == LOLND)
> > +		if (ni->ni_net->net_lnd->lnd_type == LOLND)
> > 			continue;
> > 
> > 		if (now < ni->ni_last_alive + timeout)
> > diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
> > index 19cea7076057..f3ccd6a2b70e 100644
> > --- a/drivers/staging/lustre/lnet/lnet/router_proc.c
> > +++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
> > @@ -674,7 +674,7 @@ static int proc_lnet_nis(struct ctl_table *table, int write,
> > 				last_alive = now - ni->ni_last_alive;
> > 
> > 			/* @lo forever alive */
> > -			if (ni->ni_lnd->lnd_type == LOLND)
> > +			if (ni->ni_net->net_lnd->lnd_type == LOLND)
> > 				last_alive = 0;
> > 
> > 			lnet_ni_lock(ni);
> > 
> > 
> 
> 


* [lustre-devel] [PATCH 03/34] lnet: struct lnet_ni: move ni_lnd to lnet_net
  2018-09-10 23:19     ` James Simmons
@ 2018-09-10 23:19       ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-10 23:19 UTC (permalink / raw)
  To: lustre-devel

Ok.  I will ignore gnilnd in my review.

Doug

> On Sep 10, 2018, at 4:19 PM, James Simmons <jsimmons@infradead.org> wrote:
> 
> 
>> Is the gnilnd module being built upstream?  Just as there were changes to o2iblnd.c and socklnd.c for this change, there should be a corresponding change to gnilnd.c.
> 
> No gnilnd in the Linux kernel :-(
> 
> 


* [lustre-devel] [PATCH 02/34] lnet: Create struct lnet_net
  2018-09-07  0:49 ` [lustre-devel] [PATCH 02/34] lnet: Create struct lnet_net NeilBrown
  2018-09-10 22:56   ` Doug Oucharek
@ 2018-09-10 23:23   ` James Simmons
  1 sibling, 0 replies; 98+ messages in thread
From: James Simmons @ 2018-09-10 23:23 UTC (permalink / raw)
  To: lustre-devel


> This will contain some fields from lnet_ni, to be shared
> between multiple ni on the one network.
> 
> For now, only tunables are moved across, using
>  struct lnet_ioctl_config_lnd_cmn_tunables
> which is changed to use signed values so -1 can be stored.
> -1 means "no value"
> If the tunables haven't been initialised, then net_tunables_set is
> false.  Previously a NULL pointer had this meaning.
> 
> A 'struct lnet_net' is allocated as part of lnet_ni_alloc(), and freed
> by lnet_ni_free().
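
To illustrate the "-1 means no value" convention described above, here is
a minimal sketch.  It is not the kernel code: the struct and function
names are invented for the example, and the fields are cut-down versions
of the lct_* members.  The point is that an LND fills in its module
parameter defaults only where the tunable is still -1, so an explicit 0
from the user is kept, which the old "if (!ni->ni_peertimeout)" test
could not distinguish.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cut-down stand-in for struct lnet_ioctl_config_lnd_cmn_tunables. */
struct sk_cmn_tunables {
	int32_t peer_timeout;
	int32_t peer_tx_credits;
	int32_t peer_rtr_credits;
	int32_t max_tx_credits;
};

/* Cut-down stand-in for struct lnet_net. */
struct sk_tun_net {
	struct sk_cmn_tunables tunables;
	bool tunables_set;	/* mirrors net_tunables_set */
};

/* Hypothetical helper: mark every tunable as "no value". */
static void sk_tun_net_init(struct sk_tun_net *net)
{
	net->tunables.peer_timeout = -1;
	net->tunables.peer_tx_credits = -1;
	net->tunables.peer_rtr_credits = -1;
	net->tunables.max_tx_credits = -1;
	net->tunables_set = false;
}

/* Use the default only when the tunable still holds the -1 sentinel. */
static void sk_tunable_default(int32_t *val, int32_t dflt)
{
	if (*val == -1)
		*val = dflt;
}

/* Hypothetical LND-side setup: fill every unset tunable from defaults. */
static void sk_apply_defaults(struct sk_tun_net *net,
			      const struct sk_cmn_tunables *dflt)
{
	sk_tunable_default(&net->tunables.peer_timeout, dflt->peer_timeout);
	sk_tunable_default(&net->tunables.peer_tx_credits,
			   dflt->peer_tx_credits);
	sk_tunable_default(&net->tunables.peer_rtr_credits,
			   dflt->peer_rtr_credits);
	sk_tunable_default(&net->tunables.max_tx_credits,
			   dflt->max_tx_credits);
}
```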

Acked-by: James Simmons <jsimmons@infradead.org>

The part below needs fixing, based on the response to the cover letter.

> This is part of
>     8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>        LU-7734 lnet: Multi-Rail local NI split
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  .../staging/lustre/include/linux/lnet/lib-types.h  |   25 ++++++--
>  .../lustre/include/uapi/linux/lnet/lnet-dlc.h      |    8 +--
>  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    2 -
>  .../lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c  |   61 +++++++++++---------
>  .../staging/lustre/lnet/klnds/socklnd/socklnd.c    |   19 ++++--
>  drivers/staging/lustre/lnet/lnet/api-ni.c          |   45 +++++++++------
>  drivers/staging/lustre/lnet/lnet/config.c          |   24 ++++++--
>  drivers/staging/lustre/lnet/lnet/lib-move.c        |    5 +-
>  drivers/staging/lustre/lnet/lnet/peer.c            |    9 ++-
>  drivers/staging/lustre/lnet/lnet/router.c          |    8 ++-
>  drivers/staging/lustre/lnet/lnet/router_proc.c     |    6 +-
>  11 files changed, 129 insertions(+), 83 deletions(-)
> 
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> index 078bc97a9ebf..ead8a4e1125a 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> @@ -43,6 +43,7 @@
>  
>  #include <uapi/linux/lnet/lnet-types.h>
>  #include <uapi/linux/lnet/lnetctl.h>
> +#include <uapi/linux/lnet/lnet-dlc.h>
>  
>  /* Max payload size */
>  #define LNET_MAX_PAYLOAD      CONFIG_LNET_MAX_PAYLOAD
> @@ -252,17 +253,22 @@ struct lnet_tx_queue {
>  	struct list_head	tq_delayed;	/* delayed TXs */
>  };
>  
> +struct lnet_net {
> +	/* network tunables */
> +	struct lnet_ioctl_config_lnd_cmn_tunables net_tunables;
> +
> +	/*
> +	 * boolean to indicate that the tunables have been set and
> +	 * shouldn't be reset
> +	 */
> +	bool			  net_tunables_set;
> +};
> +
>  struct lnet_ni {
>  	spinlock_t		  ni_lock;
>  	struct list_head	  ni_list;	/* chain on ln_nis */
>  	struct list_head	  ni_cptlist;	/* chain on ln_nis_cpt */
> -	int			  ni_maxtxcredits; /* # tx credits  */
> -	/* # per-peer send credits */
> -	int			  ni_peertxcredits;
> -	/* # per-peer router buffer credits */
> -	int			  ni_peerrtrcredits;
> -	/* seconds to consider peer dead */
> -	int			  ni_peertimeout;
> +
>  	/* number of CPTs */
>  	int			ni_ncpts;
>  
> @@ -286,6 +292,9 @@ struct lnet_ni {
>  	/* when I was last alive */
>  	time64_t		ni_last_alive;
>  
> +	/* pointer to parent network */
> +	struct lnet_net		*ni_net;
> +
>  	/* my health status */
>  	struct lnet_ni_status	*ni_status;
>  
> @@ -397,7 +406,7 @@ struct lnet_peer_table {
>   * lnet_ni::ni_peertimeout has been set to a positive value
>   */
>  #define lnet_peer_aliveness_enabled(lp) (the_lnet.ln_routing && \
> -					 (lp)->lp_ni->ni_peertimeout > 0)
> +					 (lp)->lp_ni->ni_net->net_tunables.lct_peer_timeout > 0)
>  
>  struct lnet_route {
>  	struct list_head	 lr_list;	/* chain on net */
> diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
> index c1619f411d81..a8eb3b8f9fd7 100644
> --- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
> +++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
> @@ -39,10 +39,10 @@
>  
>  struct lnet_ioctl_config_lnd_cmn_tunables {
>  	__u32 lct_version;
> -	__u32 lct_peer_timeout;
> -	__u32 lct_peer_tx_credits;
> -	__u32 lct_peer_rtr_credits;
> -	__u32 lct_max_tx_credits;
> +	__s32 lct_peer_timeout;
> +	__s32 lct_peer_tx_credits;
> +	__s32 lct_peer_rtr_credits;
> +	__s32 lct_max_tx_credits;
>  };
>  
>  struct lnet_ioctl_config_o2iblnd_tunables {
> diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> index f496e6fcc416..0d17e22c4401 100644
> --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> @@ -337,7 +337,7 @@ int kiblnd_create_peer(struct lnet_ni *ni, struct kib_peer **peerp,
>  	peer->ibp_error = 0;
>  	peer->ibp_last_alive = 0;
>  	peer->ibp_max_frags = kiblnd_cfg_rdma_frags(peer->ibp_ni);
> -	peer->ibp_queue_depth = ni->ni_peertxcredits;
> +	peer->ibp_queue_depth = ni->ni_net->net_tunables.lct_peer_tx_credits;
>  	atomic_set(&peer->ibp_refcount, 1);  /* 1 ref for caller */
>  
>  	INIT_LIST_HEAD(&peer->ibp_list);     /* not in the peer table yet */
> diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
> index 39d07926d603..a1aca4dda38f 100644
> --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
> +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c
> @@ -171,7 +171,7 @@ int kiblnd_msg_queue_size(int version, struct lnet_ni *ni)
>  	if (version == IBLND_MSG_VERSION_1)
>  		return IBLND_MSG_QUEUE_SIZE_V1;
>  	else if (ni)
> -		return ni->ni_peertxcredits;
> +		return ni->ni_net->net_tunables.lct_peer_tx_credits;
>  	else
>  		return peer_credits;
>  }
> @@ -179,6 +179,7 @@ int kiblnd_msg_queue_size(int version, struct lnet_ni *ni)
>  int kiblnd_tunables_setup(struct lnet_ni *ni)
>  {
>  	struct lnet_ioctl_config_o2iblnd_tunables *tunables;
> +	struct lnet_ioctl_config_lnd_cmn_tunables *net_tunables;
>  
>  	/*
>  	 * if there was no tunables specified, setup the tunables to be
> @@ -204,35 +205,39 @@ int kiblnd_tunables_setup(struct lnet_ni *ni)
>  		return -EINVAL;
>  	}
>  
> -	if (!ni->ni_peertimeout)
> -		ni->ni_peertimeout = peer_timeout;
> +	net_tunables = &ni->ni_net->net_tunables;
>  
> -	if (!ni->ni_maxtxcredits)
> -		ni->ni_maxtxcredits = credits;
> +	if (net_tunables->lct_peer_timeout == -1)
> +		net_tunables->lct_peer_timeout = peer_timeout;
>  
> -	if (!ni->ni_peertxcredits)
> -		ni->ni_peertxcredits = peer_credits;
> +	if (net_tunables->lct_max_tx_credits == -1)
> +		net_tunables->lct_max_tx_credits = credits;
>  
> -	if (!ni->ni_peerrtrcredits)
> -		ni->ni_peerrtrcredits = peer_buffer_credits;
> +	if (net_tunables->lct_peer_tx_credits == -1)
> +		net_tunables->lct_peer_tx_credits = peer_credits;
>  
> -	if (ni->ni_peertxcredits < IBLND_CREDITS_DEFAULT)
> -		ni->ni_peertxcredits = IBLND_CREDITS_DEFAULT;
> +	if (net_tunables->lct_peer_rtr_credits == -1)
> +		net_tunables->lct_peer_rtr_credits = peer_buffer_credits;
>  
> -	if (ni->ni_peertxcredits > IBLND_CREDITS_MAX)
> -		ni->ni_peertxcredits = IBLND_CREDITS_MAX;
> +	if (net_tunables->lct_peer_tx_credits < IBLND_CREDITS_DEFAULT)
> +		net_tunables->lct_peer_tx_credits = IBLND_CREDITS_DEFAULT;
>  
> -	if (ni->ni_peertxcredits > credits)
> -		ni->ni_peertxcredits = credits;
> +	if (net_tunables->lct_peer_tx_credits > IBLND_CREDITS_MAX)
> +		net_tunables->lct_peer_tx_credits = IBLND_CREDITS_MAX;
> +
> +	if (net_tunables->lct_peer_tx_credits >
> +	    net_tunables->lct_max_tx_credits)
> +		net_tunables->lct_peer_tx_credits =
> +			net_tunables->lct_max_tx_credits;
>  
>  	if (!tunables->lnd_peercredits_hiw)
>  		tunables->lnd_peercredits_hiw = peer_credits_hiw;
>  
> -	if (tunables->lnd_peercredits_hiw < ni->ni_peertxcredits / 2)
> -		tunables->lnd_peercredits_hiw = ni->ni_peertxcredits / 2;
> +	if (tunables->lnd_peercredits_hiw < net_tunables->lct_peer_tx_credits / 2)
> +		tunables->lnd_peercredits_hiw = net_tunables->lct_peer_tx_credits / 2;
>  
> -	if (tunables->lnd_peercredits_hiw >= ni->ni_peertxcredits)
> -		tunables->lnd_peercredits_hiw = ni->ni_peertxcredits - 1;
> +	if (tunables->lnd_peercredits_hiw >= net_tunables->lct_peer_tx_credits)
> +		tunables->lnd_peercredits_hiw = net_tunables->lct_peer_tx_credits - 1;
>  
>  	if (tunables->lnd_map_on_demand <= 0 ||
>  	    tunables->lnd_map_on_demand > IBLND_MAX_RDMA_FRAGS) {
> @@ -252,21 +257,23 @@ int kiblnd_tunables_setup(struct lnet_ni *ni)
>  		if (tunables->lnd_map_on_demand > 0 &&
>  		    tunables->lnd_map_on_demand <= IBLND_MAX_RDMA_FRAGS / 8) {
>  			tunables->lnd_concurrent_sends =
> -						ni->ni_peertxcredits * 2;
> +					net_tunables->lct_peer_tx_credits * 2;
>  		} else {
> -			tunables->lnd_concurrent_sends = ni->ni_peertxcredits;
> +			tunables->lnd_concurrent_sends =
> +				net_tunables->lct_peer_tx_credits;
>  		}
>  	}
>  
> -	if (tunables->lnd_concurrent_sends > ni->ni_peertxcredits * 2)
> -		tunables->lnd_concurrent_sends = ni->ni_peertxcredits * 2;
> +	if (tunables->lnd_concurrent_sends > net_tunables->lct_peer_tx_credits * 2)
> +		tunables->lnd_concurrent_sends = net_tunables->lct_peer_tx_credits * 2;
>  
> -	if (tunables->lnd_concurrent_sends < ni->ni_peertxcredits / 2)
> -		tunables->lnd_concurrent_sends = ni->ni_peertxcredits / 2;
> +	if (tunables->lnd_concurrent_sends < net_tunables->lct_peer_tx_credits / 2)
> +		tunables->lnd_concurrent_sends = net_tunables->lct_peer_tx_credits / 2;
>  
> -	if (tunables->lnd_concurrent_sends < ni->ni_peertxcredits) {
> +	if (tunables->lnd_concurrent_sends < net_tunables->lct_peer_tx_credits) {
>  		CWARN("Concurrent sends %d is lower than message queue size: %d, performance may drop slightly.\n",
> -		      tunables->lnd_concurrent_sends, ni->ni_peertxcredits);
> +		      tunables->lnd_concurrent_sends,
> +		      net_tunables->lct_peer_tx_credits);
>  	}
>  
>  	if (!tunables->lnd_fmr_pool_size)
> diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> index 4dde158451ea..4ad885f10235 100644
> --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> @@ -2739,12 +2739,19 @@ ksocknal_startup(struct lnet_ni *ni)
>  		goto fail_0;
>  
>  	spin_lock_init(&net->ksnn_lock);
> -	net->ksnn_incarnation = ktime_get_real_ns();
> -	ni->ni_data = net;
> -	ni->ni_peertimeout    = *ksocknal_tunables.ksnd_peertimeout;
> -	ni->ni_maxtxcredits   = *ksocknal_tunables.ksnd_credits;
> -	ni->ni_peertxcredits  = *ksocknal_tunables.ksnd_peertxcredits;
> -	ni->ni_peerrtrcredits = *ksocknal_tunables.ksnd_peerrtrcredits;
> +        net->ksnn_incarnation = ktime_get_real_ns();
> +        ni->ni_data = net;
> +	if (!ni->ni_net->net_tunables_set) {
> +		ni->ni_net->net_tunables.lct_peer_timeout =
> +			*ksocknal_tunables.ksnd_peertimeout;
> +		ni->ni_net->net_tunables.lct_max_tx_credits =
> +			*ksocknal_tunables.ksnd_credits;
> +		ni->ni_net->net_tunables.lct_peer_tx_credits =
> +			*ksocknal_tunables.ksnd_peertxcredits;
> +		ni->ni_net->net_tunables.lct_peer_rtr_credits =
> +			*ksocknal_tunables.ksnd_peerrtrcredits;
> +		ni->ni_net->net_tunables_set = true;
> +	}
>  
>  	net->ksnn_ninterfaces = 0;
>  	if (!ni->ni_interfaces[0]) {
> diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
> index f9fcce2a5643..cd4189fa7acb 100644
> --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
> +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
> @@ -1036,11 +1036,11 @@ lnet_ni_tq_credits(struct lnet_ni *ni)
>  	LASSERT(ni->ni_ncpts >= 1);
>  
>  	if (ni->ni_ncpts == 1)
> -		return ni->ni_maxtxcredits;
> +		return ni->ni_net->net_tunables.lct_max_tx_credits;
>  
> -	credits = ni->ni_maxtxcredits / ni->ni_ncpts;
> -	credits = max(credits, 8 * ni->ni_peertxcredits);
> -	credits = min(credits, ni->ni_maxtxcredits);
> +	credits = ni->ni_net->net_tunables.lct_max_tx_credits / ni->ni_ncpts;
> +	credits = max(credits, 8 * ni->ni_net->net_tunables.lct_peer_tx_credits);
> +	credits = min(credits, ni->ni_net->net_tunables.lct_max_tx_credits);
>  
>  	return credits;
>  }
> @@ -1271,16 +1271,16 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
>  	 */
>  	if (conf) {
>  		if (conf->cfg_config_u.cfg_net.net_peer_rtr_credits >= 0)
> -			ni->ni_peerrtrcredits =
> +			ni->ni_net->net_tunables.lct_peer_rtr_credits =
>  				conf->cfg_config_u.cfg_net.net_peer_rtr_credits;
>  		if (conf->cfg_config_u.cfg_net.net_peer_timeout >= 0)
> -			ni->ni_peertimeout =
> +			ni->ni_net->net_tunables.lct_peer_timeout =
>  				conf->cfg_config_u.cfg_net.net_peer_timeout;
>  		if (conf->cfg_config_u.cfg_net.net_peer_tx_credits != -1)
> -			ni->ni_peertxcredits =
> +			ni->ni_net->net_tunables.lct_peer_tx_credits =
>  				conf->cfg_config_u.cfg_net.net_peer_tx_credits;
>  		if (conf->cfg_config_u.cfg_net.net_max_tx_credits >= 0)
> -			ni->ni_maxtxcredits =
> +			ni->ni_net->net_tunables.lct_max_tx_credits =
>  				conf->cfg_config_u.cfg_net.net_max_tx_credits;
>  	}
>  
> @@ -1297,8 +1297,6 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
>  		goto failed0;
>  	}
>  
> -	LASSERT(ni->ni_peertimeout <= 0 || lnd->lnd_query);
> -
>  	lnet_net_lock(LNET_LOCK_EX);
>  	/* refcount for ln_nis */
>  	lnet_ni_addref_locked(ni, 0);
> @@ -1314,13 +1312,18 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
>  		lnet_ni_addref(ni);
>  		LASSERT(!the_lnet.ln_loni);
>  		the_lnet.ln_loni = ni;
> +		ni->ni_net->net_tunables.lct_peer_tx_credits = 0;
> +		ni->ni_net->net_tunables.lct_peer_rtr_credits = 0;
> +		ni->ni_net->net_tunables.lct_max_tx_credits = 0;
> +		ni->ni_net->net_tunables.lct_peer_timeout = 0;
>  		return 0;
>  	}
>  
> -	if (!ni->ni_peertxcredits || !ni->ni_maxtxcredits) {
> +	if (!ni->ni_net->net_tunables.lct_peer_tx_credits ||
> +	    !ni->ni_net->net_tunables.lct_max_tx_credits) {
>  		LCONSOLE_ERROR_MSG(0x107, "LNI %s has no %scredits\n",
>  				   libcfs_lnd2str(lnd->lnd_type),
> -				   !ni->ni_peertxcredits ?
> +				   !ni->ni_net->net_tunables.lct_peer_tx_credits ?
>  				   "" : "per-peer ");
>  		/*
>  		 * shutdown the NI since if we get here then it must've already
> @@ -1343,9 +1346,11 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
>  	add_device_randomness(&seed, sizeof(seed));
>  
>  	CDEBUG(D_LNI, "Added LNI %s [%d/%d/%d/%d]\n",
> -	       libcfs_nid2str(ni->ni_nid), ni->ni_peertxcredits,
> +	       libcfs_nid2str(ni->ni_nid),
> +		ni->ni_net->net_tunables.lct_peer_tx_credits,
>  	       lnet_ni_tq_credits(ni) * LNET_CPT_NUMBER,
> -	       ni->ni_peerrtrcredits, ni->ni_peertimeout);
> +	       ni->ni_net->net_tunables.lct_peer_rtr_credits,
> +		ni->ni_net->net_tunables.lct_peer_timeout);
>  
>  	return 0;
>  failed0:
> @@ -1667,10 +1672,14 @@ lnet_fill_ni_info(struct lnet_ni *ni, struct lnet_ioctl_config_data *config)
>  	}
>  
>  	config->cfg_nid = ni->ni_nid;
> -	config->cfg_config_u.cfg_net.net_peer_timeout = ni->ni_peertimeout;
> -	config->cfg_config_u.cfg_net.net_max_tx_credits = ni->ni_maxtxcredits;
> -	config->cfg_config_u.cfg_net.net_peer_tx_credits = ni->ni_peertxcredits;
> -	config->cfg_config_u.cfg_net.net_peer_rtr_credits = ni->ni_peerrtrcredits;
> +	config->cfg_config_u.cfg_net.net_peer_timeout =
> +		ni->ni_net->net_tunables.lct_peer_timeout;
> +	config->cfg_config_u.cfg_net.net_max_tx_credits =
> +		ni->ni_net->net_tunables.lct_max_tx_credits;
> +	config->cfg_config_u.cfg_net.net_peer_tx_credits =
> +		ni->ni_net->net_tunables.lct_peer_tx_credits;
> +	config->cfg_config_u.cfg_net.net_peer_rtr_credits =
> +		ni->ni_net->net_tunables.lct_peer_rtr_credits;
>  
>  	net_config->ni_status = ni->ni_status->ns_status;
>  
> diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
> index 091c4f714e84..86a53854e427 100644
> --- a/drivers/staging/lustre/lnet/lnet/config.c
> +++ b/drivers/staging/lustre/lnet/lnet/config.c
> @@ -114,29 +114,38 @@ lnet_ni_free(struct lnet_ni *ni)
>  	if (ni->ni_net_ns)
>  		put_net(ni->ni_net_ns);
>  
> +	kvfree(ni->ni_net);
>  	kfree(ni);
>  }
>  
>  struct lnet_ni *
> -lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist)
> +lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
>  {
>  	struct lnet_tx_queue *tq;
>  	struct lnet_ni *ni;
>  	int rc;
>  	int i;
> +	struct lnet_net		*net;
>  
> -	if (!lnet_net_unique(net, nilist)) {
> +	if (!lnet_net_unique(net_id, nilist)) {
>  		LCONSOLE_ERROR_MSG(0x111, "Duplicate network specified: %s\n",
> -				   libcfs_net2str(net));
> +				   libcfs_net2str(net_id));
>  		return NULL;
>  	}
>  
>  	ni = kzalloc(sizeof(*ni), GFP_NOFS);
> -	if (!ni) {
> +	net = kzalloc(sizeof(*net), GFP_NOFS);
> +	if (!ni || !net) {
> +		kfree(ni); kfree(net);
>  		CERROR("Out of memory creating network %s\n",
> -		       libcfs_net2str(net));
> +		       libcfs_net2str(net_id));
>  		return NULL;
>  	}
> +	/* initialize global parameters to undefined */
> +	net->net_tunables.lct_peer_timeout = -1;
> +	net->net_tunables.lct_max_tx_credits = -1;
> +	net->net_tunables.lct_peer_tx_credits = -1;
> +	net->net_tunables.lct_peer_rtr_credits = -1;
>  
>  	spin_lock_init(&ni->ni_lock);
>  	INIT_LIST_HEAD(&ni->ni_cptlist);
> @@ -160,7 +169,7 @@ lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist)
>  		rc = cfs_expr_list_values(el, LNET_CPT_NUMBER, &ni->ni_cpts);
>  		if (rc <= 0) {
>  			CERROR("Failed to set CPTs for NI %s: %d\n",
> -			       libcfs_net2str(net), rc);
> +			       libcfs_net2str(net_id), rc);
>  			goto failed;
>  		}
>  
> @@ -173,8 +182,9 @@ lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist)
>  		ni->ni_ncpts = rc;
>  	}
>  
> +	ni->ni_net = net;
>  	/* LND will fill in the address part of the NID */
> -	ni->ni_nid = LNET_MKNID(net, 0);
> +	ni->ni_nid = LNET_MKNID(net_id, 0);
>  
>  	/* Store net namespace in which current ni is being created */
>  	if (current->nsproxy->net_ns)
> diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
> index edcafac055ed..f186e6a16d34 100644
> --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
> +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
> @@ -524,7 +524,8 @@ lnet_peer_is_alive(struct lnet_peer *lp, unsigned long now)
>  	    lp->lp_timestamp >= lp->lp_last_alive)
>  		return 0;
>  
> -	deadline = lp->lp_last_alive + lp->lp_ni->ni_peertimeout;
> +	deadline = lp->lp_last_alive +
> +		lp->lp_ni->ni_net->net_tunables.lct_peer_timeout;
>  	alive = deadline > now;
>  
>  	/* Update obsolete lp_alive except for routers assumed to be dead
> @@ -569,7 +570,7 @@ lnet_peer_alive_locked(struct lnet_peer *lp)
>  				      libcfs_nid2str(lp->lp_nid),
>  				      now, next_query,
>  				      lnet_queryinterval,
> -				      lp->lp_ni->ni_peertimeout);
> +				      lp->lp_ni->ni_net->net_tunables.lct_peer_timeout);
>  			return 0;
>  		}
>  	}
> diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c
> index d9452c322e4d..b76ac3e051d9 100644
> --- a/drivers/staging/lustre/lnet/lnet/peer.c
> +++ b/drivers/staging/lustre/lnet/lnet/peer.c
> @@ -342,8 +342,8 @@ lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt)
>  		goto out;
>  	}
>  
> -	lp->lp_txcredits = lp->lp_ni->ni_peertxcredits;
> -	lp->lp_mintxcredits = lp->lp_ni->ni_peertxcredits;
> +	lp->lp_txcredits = lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
> +	lp->lp_mintxcredits = lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
>  	lp->lp_rtrcredits = lnet_peer_buffer_credits(lp->lp_ni);
>  	lp->lp_minrtrcredits = lnet_peer_buffer_credits(lp->lp_ni);
>  
> @@ -383,7 +383,7 @@ lnet_debug_peer(lnet_nid_t nid)
>  
>  	CDEBUG(D_WARNING, "%-24s %4d %5s %5d %5d %5d %5d %5d %ld\n",
>  	       libcfs_nid2str(lp->lp_nid), lp->lp_refcount,
> -	       aliveness, lp->lp_ni->ni_peertxcredits,
> +	       aliveness, lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits,
>  	       lp->lp_rtrcredits, lp->lp_minrtrcredits,
>  	       lp->lp_txcredits, lp->lp_mintxcredits, lp->lp_txqnob);
>  
> @@ -438,7 +438,8 @@ lnet_get_peer_info(__u32 peer_index, __u64 *nid,
>  
>  			*nid = lp->lp_nid;
>  			*refcount = lp->lp_refcount;
> -			*ni_peer_tx_credits = lp->lp_ni->ni_peertxcredits;
> +			*ni_peer_tx_credits =
> +				lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
>  			*peer_tx_credits = lp->lp_txcredits;
>  			*peer_rtr_credits = lp->lp_rtrcredits;
>  			*peer_min_rtr_credits = lp->lp_mintxcredits;
> diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
> index 02241fbc9eaa..7d61c5d71426 100644
> --- a/drivers/staging/lustre/lnet/lnet/router.c
> +++ b/drivers/staging/lustre/lnet/lnet/router.c
> @@ -57,9 +57,11 @@ MODULE_PARM_DESC(auto_down, "Automatically mark peers down on comms error");
>  int
>  lnet_peer_buffer_credits(struct lnet_ni *ni)
>  {
> +	struct lnet_net *net = ni->ni_net;
> +
>  	/* NI option overrides LNet default */
> -	if (ni->ni_peerrtrcredits > 0)
> -		return ni->ni_peerrtrcredits;
> +	if (net->net_tunables.lct_peer_rtr_credits > 0)
> +		return net->net_tunables.lct_peer_rtr_credits;
>  	if (peer_buffer_credits > 0)
>  		return peer_buffer_credits;
>  
> @@ -67,7 +69,7 @@ lnet_peer_buffer_credits(struct lnet_ni *ni)
>  	 * As an approximation, allow this peer the same number of router
>  	 * buffers as it is allowed outstanding sends
>  	 */
> -	return ni->ni_peertxcredits;
> +	return net->net_tunables.lct_peer_tx_credits;
>  }
>  
>  /* forward ref's */
> diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
> index 31f4982f7f17..19cea7076057 100644
> --- a/drivers/staging/lustre/lnet/lnet/router_proc.c
> +++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
> @@ -489,7 +489,7 @@ static int proc_lnet_peers(struct ctl_table *table, int write,
>  			int nrefs = peer->lp_refcount;
>  			time64_t lastalive = -1;
>  			char *aliveness = "NA";
> -			int maxcr = peer->lp_ni->ni_peertxcredits;
> +			int maxcr = peer->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
>  			int txcr = peer->lp_txcredits;
>  			int mintxcr = peer->lp_mintxcredits;
>  			int rtrcr = peer->lp_rtrcredits;
> @@ -704,8 +704,8 @@ static int proc_lnet_nis(struct ctl_table *table, int write,
>  					      "%-24s %6s %5lld %4d %4d %4d %5d %5d %5d\n",
>  					      libcfs_nid2str(ni->ni_nid), stat,
>  					      last_alive, *ni->ni_refs[i],
> -					      ni->ni_peertxcredits,
> -					      ni->ni_peerrtrcredits,
> +					      ni->ni_net->net_tunables.lct_peer_tx_credits,
> +					      ni->ni_net->net_tunables.lct_peer_rtr_credits,
>  					      tq->tq_credits_max,
>  					      tq->tq_credits,
>  					      tq->tq_credits_min);
> 
> 
> 
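
For readers following the tunables change above: the old code treated zero as "unset" (`if (!ni->ni_peertxcredits)`), while the new code initialises every common tunable to -1 in lnet_ni_alloc() and tests for -1, so an explicit zero is no longer confused with "not configured". A minimal standalone sketch of that default-then-clamp pattern follows. All `demo_`/`DEMO_` names and the numeric bounds are hypothetical stand-ins chosen for illustration, not the real IBLND_* values or LNet structures.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_UNSET		(-1)	/* "not configured" sentinel, mirroring the -1 used in the patch */
#define DEMO_CREDITS_MIN	8	/* illustrative lower bound, not the real IBLND value */
#define DEMO_CREDITS_MAX	255	/* illustrative upper bound, not the real IBLND value */

/* Hypothetical, reduced stand-in for the per-net common tunables. */
struct demo_tunables {
	int peer_timeout;
	int max_tx_credits;
	int peer_tx_credits;
	int peer_rtr_credits;
};

/* Mark every tunable as undefined, as lnet_ni_alloc() does after the patch. */
static void demo_tunables_init(struct demo_tunables *t)
{
	assert(t != NULL);
	t->peer_timeout = DEMO_UNSET;
	t->max_tx_credits = DEMO_UNSET;
	t->peer_tx_credits = DEMO_UNSET;
	t->peer_rtr_credits = DEMO_UNSET;
}

/*
 * Fill in anything still undefined from the driver defaults, then clamp
 * the per-peer credits: first to the fixed range, then to the per-net
 * maximum.  The order matches the o2iblnd hunk quoted above.
 */
static void demo_tunables_setup(struct demo_tunables *t,
				const struct demo_tunables *defaults)
{
	assert(t != NULL && defaults != NULL);

	if (t->peer_timeout == DEMO_UNSET)
		t->peer_timeout = defaults->peer_timeout;
	if (t->max_tx_credits == DEMO_UNSET)
		t->max_tx_credits = defaults->max_tx_credits;
	if (t->peer_tx_credits == DEMO_UNSET)
		t->peer_tx_credits = defaults->peer_tx_credits;
	if (t->peer_rtr_credits == DEMO_UNSET)
		t->peer_rtr_credits = defaults->peer_rtr_credits;

	if (t->peer_tx_credits < DEMO_CREDITS_MIN)
		t->peer_tx_credits = DEMO_CREDITS_MIN;
	if (t->peer_tx_credits > DEMO_CREDITS_MAX)
		t->peer_tx_credits = DEMO_CREDITS_MAX;
	if (t->peer_tx_credits > t->max_tx_credits)
		t->peer_tx_credits = t->max_tx_credits;
}
```

Note that in this sketch, as in the quoted hunk, the final clamp compares against the per-net max_tx_credits that was just resolved, not against the module-wide `credits` parameter the old code used.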

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 06/34] lnet: store separate xmit/recv net-interface in each message.
  2018-09-07  0:49 ` [lustre-devel] [PATCH 06/34] lnet: store separate xmit/recv net-interface in each message NeilBrown
@ 2018-09-10 23:24   ` Doug Oucharek
  2018-09-10 23:29   ` James Simmons
  2018-09-10 23:36   ` James Simmons
  2 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-10 23:24 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On Sep 6, 2018, at 5:49 PM, NeilBrown <neilb@suse.com> wrote:

Currently we store the net-interface in the peer, but the
peer should identify just the network, not the particular interface.
To help track which actual interface is used for each
message, store them explicitly.

This is part of
   8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
      LU-7734 lnet: Multi-Rail local NI split

and includes commit 63c3e5129873 ("LU-7734 lnet: Fix lnet_msg_free()")

Signed-off-by: NeilBrown <neilb@suse.com>
---
.../staging/lustre/include/linux/lnet/lib-types.h  |    3 +++
drivers/staging/lustre/lnet/lnet/lib-move.c        |   21 ++++++++++++++++++--
2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 5f0d4703bf86..16a493529a46 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -98,6 +98,9 @@ struct lnet_msg {

void *msg_private;
struct lnet_libmd *msg_md;
+ /* the NI the message was sent or received over */
+ struct lnet_ni       *msg_txni;
+ struct lnet_ni       *msg_rxni;

unsigned int msg_len;
unsigned int msg_wanted;
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index 1c874025fa74..b2a52ddcefcb 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -782,6 +782,7 @@ lnet_return_tx_credits_locked(struct lnet_msg *msg)
{
struct lnet_peer *txpeer = msg->msg_txpeer;
struct lnet_msg *msg2;
+ struct lnet_ni *txni = msg->msg_txni;

if (msg->msg_txcredit) {
struct lnet_ni *ni = txpeer->lp_ni;
@@ -829,6 +830,11 @@ lnet_return_tx_credits_locked(struct lnet_msg *msg)
}
}

+ if (txni != NULL) {
+ msg->msg_txni = NULL;
+ lnet_ni_decref_locked(txni, msg->msg_tx_cpt);
+ }
+
if (txpeer) {
msg->msg_txpeer = NULL;
lnet_peer_decref_locked(txpeer);
@@ -876,6 +882,7 @@ void
lnet_return_rx_credits_locked(struct lnet_msg *msg)
{
struct lnet_peer *rxpeer = msg->msg_rxpeer;
+ struct lnet_ni *rxni = msg->msg_rxni;
struct lnet_msg *msg2;

if (msg->msg_rtrcredit) {
@@ -951,6 +958,10 @@ lnet_return_rx_credits_locked(struct lnet_msg *msg)
(void)lnet_post_routed_recv_locked(msg2, 1);
}
}
+ if (rxni != NULL) {
+ msg->msg_rxni = NULL;
+ lnet_ni_decref_locked(rxni, msg->msg_rx_cpt);
+ }
if (rxpeer) {
msg->msg_rxpeer = NULL;
lnet_peer_decref_locked(rxpeer);
@@ -1218,9 +1229,12 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)

LASSERT(!msg->msg_peertxcredit);
LASSERT(!msg->msg_txcredit);
- LASSERT(!msg->msg_txpeer);
+ LASSERT(msg->msg_txpeer == NULL);

- msg->msg_txpeer = lp;   /* msg takes my ref on lp */
+ msg->msg_txpeer = lp;                   /* msg takes my ref on lp */
+ /* set the NI for this message */
+ msg->msg_txni = src_ni;
+ lnet_ni_addref_locked(msg->msg_txni, cpt);

rc = lnet_post_send_locked(msg, 0);
lnet_net_unlock(cpt);
@@ -1818,6 +1832,8 @@ lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid,
return 0;
goto drop;
}
+ msg->msg_rxni = ni;
+ lnet_ni_addref_locked(ni, cpt);

if (lnet_isrouter(msg->msg_rxpeer)) {
lnet_peer_set_alive(msg->msg_rxpeer);
@@ -1934,6 +1950,7 @@ lnet_recv_delayed_msg_list(struct list_head *head)
LASSERT(msg->msg_rx_delayed);
LASSERT(msg->msg_md);
LASSERT(msg->msg_rxpeer);
+ LASSERT(msg->msg_rxni);
LASSERT(msg->msg_hdr.type == LNET_MSG_PUT);

CDEBUG(D_NET, "Resuming delayed PUT from %s portal %d match %llu offset %d length %d.\n",
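
The ownership rule the patch introduces can be summarised as: a message takes its own reference on the NI it is sent or received over, and drops that reference exactly once when its credits are returned. The following is only a simplified sketch of that pattern; the `demo_` types and helpers are hypothetical and ignore the per-CPT reference counters and locking that lnet_ni_addref_locked() and lnet_ni_decref_locked() deal with in the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct lnet_ni. */
struct demo_ni {
	int refs;	/* number of holders of this interface */
};

/* Hypothetical, simplified stand-in for struct lnet_msg. */
struct demo_msg {
	struct demo_ni *txni;	/* interface the message is sent over */
	struct demo_ni *rxni;	/* interface the message was received on */
};

static void demo_ni_addref(struct demo_ni *ni)
{
	ni->refs++;
}

static void demo_ni_decref(struct demo_ni *ni)
{
	assert(ni->refs > 0);
	ni->refs--;
}

/* Send path: record the transmit NI and take a reference for the message. */
static void demo_msg_set_txni(struct demo_msg *msg, struct demo_ni *ni)
{
	assert(msg->txni == NULL);
	msg->txni = ni;
	demo_ni_addref(ni);
}

/*
 * Completion path: clear the pointer before dropping the reference, so
 * calling this a second time on the same message is a harmless no-op.
 */
static void demo_msg_release_txni(struct demo_msg *msg)
{
	struct demo_ni *ni = msg->txni;

	if (ni != NULL) {
		msg->txni = NULL;
		demo_ni_decref(ni);
	}
}
```

The receive side in the patch follows the same shape with msg_rxni and msg_rx_cpt.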




^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 03/34] lnet: struct lnet_ni: move ni_lnd to lnet_net
  2018-09-07  0:49 ` [lustre-devel] [PATCH 03/34] lnet: struct lnet_ni: move ni_lnd to lnet_net NeilBrown
  2018-09-10 23:04   ` Doug Oucharek
@ 2018-09-10 23:24   ` James Simmons
  2018-09-10 23:25   ` James Simmons
  2 siblings, 0 replies; 98+ messages in thread
From: James Simmons @ 2018-09-10 23:24 UTC (permalink / raw)
  To: lustre-devel


> Also make some other minor changes to the structures.
>

Acked-by: James Simmons <jsimmons@infradead.org>

The below needs fixing based on response to cover letter.
 
> This is part of
>     8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>        LU-7734 lnet: Multi-Rail local NI split
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  .../staging/lustre/include/linux/lnet/lib-types.h  |   13 ++++++++-----
>  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    2 +-
>  .../staging/lustre/lnet/klnds/socklnd/socklnd.c    |    2 +-
>  drivers/staging/lustre/lnet/lnet/acceptor.c        |    4 ++--
>  drivers/staging/lustre/lnet/lnet/api-ni.c          |   16 ++++++++--------
>  drivers/staging/lustre/lnet/lnet/lib-move.c        |   16 ++++++++--------
>  drivers/staging/lustre/lnet/lnet/lo.c              |    2 +-
>  drivers/staging/lustre/lnet/lnet/router.c          |   10 +++++-----
>  drivers/staging/lustre/lnet/lnet/router_proc.c     |    2 +-
>  9 files changed, 35 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> index ead8a4e1125a..e170eb07a5bf 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> @@ -262,12 +262,17 @@ struct lnet_net {
>  	 * shouldn't be reset
>  	 */
>  	bool			  net_tunables_set;
> +	/* procedural interface */
> +	struct lnet_lnd		*net_lnd;
>  };
>  
>  struct lnet_ni {
> -	spinlock_t		  ni_lock;
> -	struct list_head	  ni_list;	/* chain on ln_nis */
> -	struct list_head	  ni_cptlist;	/* chain on ln_nis_cpt */
> +	/* chain on ln_nis */
> +	struct list_head	  ni_list;
> +	/* chain on ln_nis_cpt */
> +	struct list_head	ni_cptlist;
> +
> +	spinlock_t		ni_lock;
>  
>  	/* number of CPTs */
>  	int			ni_ncpts;
> @@ -281,8 +286,6 @@ struct lnet_ni {
>  	/* instance-specific data */
>  	void			*ni_data;
>  
> -	struct lnet_lnd		 *ni_lnd;	/* procedural interface */
> -
>  	/* percpt TX queues */
>  	struct lnet_tx_queue	**ni_tx_queues;
>  
> diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> index 0d17e22c4401..5e1592b398c1 100644
> --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> @@ -2830,7 +2830,7 @@ static int kiblnd_startup(struct lnet_ni *ni)
>  	int rc;
>  	int newdev;
>  
> -	LASSERT(ni->ni_lnd == &the_o2iblnd);
> +	LASSERT(ni->ni_net->net_lnd == &the_o2iblnd);
>  
>  	if (kiblnd_data.kib_init == IBLND_INIT_NOTHING) {
>  		rc = kiblnd_base_startup();
> diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> index 4ad885f10235..2036a0ae5917 100644
> --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> @@ -2726,7 +2726,7 @@ ksocknal_startup(struct lnet_ni *ni)
>  	int rc;
>  	int i;
>  
> -	LASSERT(ni->ni_lnd == &the_ksocklnd);
> +	LASSERT(ni->ni_net->net_lnd == &the_ksocklnd);
>  
>  	if (ksocknal_data.ksnd_init == SOCKNAL_INIT_NOTHING) {
>  		rc = ksocknal_base_startup();
> diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
> index 3ae3ca1311a1..f8c921f0221c 100644
> --- a/drivers/staging/lustre/lnet/lnet/acceptor.c
> +++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
> @@ -306,7 +306,7 @@ lnet_accept(struct socket *sock, __u32 magic)
>  		return -EPERM;
>  	}
>  
> -	if (!ni->ni_lnd->lnd_accept) {
> +	if (!ni->ni_net->net_lnd->lnd_accept) {
>  		/* This catches a request for the loopback LND */
>  		lnet_ni_decref(ni);
>  		LCONSOLE_ERROR_MSG(0x121, "Refusing connection from %pI4h for %s: NI doesn not accept IP connections\n",
> @@ -317,7 +317,7 @@ lnet_accept(struct socket *sock, __u32 magic)
>  	CDEBUG(D_NET, "Accept %s from %pI4h\n",
>  	       libcfs_nid2str(cr.acr_nid), &peer_ip);
>  
> -	rc = ni->ni_lnd->lnd_accept(ni, sock);
> +	rc = ni->ni_net->net_lnd->lnd_accept(ni, sock);
>  
>  	lnet_ni_decref(ni);
>  	return rc;
> diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
> index cd4189fa7acb..0896e75bc3d7 100644
> --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
> +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
> @@ -799,7 +799,7 @@ lnet_count_acceptor_nis(void)
>  
>  	cpt = lnet_net_lock_current();
>  	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> -		if (ni->ni_lnd->lnd_accept)
> +		if (ni->ni_net->net_lnd->lnd_accept)
>  			count++;
>  	}
>  
> @@ -1098,13 +1098,13 @@ lnet_clear_zombies_nis_locked(void)
>  			continue;
>  		}
>  
> -		ni->ni_lnd->lnd_refcount--;
> +		ni->ni_net->net_lnd->lnd_refcount--;
>  		lnet_net_unlock(LNET_LOCK_EX);
>  
> -		islo = ni->ni_lnd->lnd_type == LOLND;
> +		islo = ni->ni_net->net_lnd->lnd_type == LOLND;
>  
>  		LASSERT(!in_interrupt());
> -		ni->ni_lnd->lnd_shutdown(ni);
> +		ni->ni_net->net_lnd->lnd_shutdown(ni);
>  
>  		/*
>  		 * can't deref lnd anymore now; it might have unregistered
> @@ -1248,7 +1248,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
>  	lnd->lnd_refcount++;
>  	lnet_net_unlock(LNET_LOCK_EX);
>  
> -	ni->ni_lnd = lnd;
> +	ni->ni_net->net_lnd = lnd;
>  
>  	if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf))
>  		lnd_tunables = (struct lnet_ioctl_config_lnd_tunables *)conf->cfg_bulk;
> @@ -1794,7 +1794,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
>  	if (rc)
>  		goto failed1;
>  
> -	if (ni->ni_lnd->lnd_accept) {
> +	if (ni->ni_net->net_lnd->lnd_accept) {
>  		rc = lnet_acceptor_start();
>  		if (rc < 0) {
>  			/* shutdown the ni that we just started */
> @@ -2074,10 +2074,10 @@ LNetCtl(unsigned int cmd, void *arg)
>  		if (!ni)
>  			return -EINVAL;
>  
> -		if (!ni->ni_lnd->lnd_ctl)
> +		if (!ni->ni_net->net_lnd->lnd_ctl)
>  			rc = -EINVAL;
>  		else
> -			rc = ni->ni_lnd->lnd_ctl(ni, cmd, arg);
> +			rc = ni->ni_net->net_lnd->lnd_ctl(ni, cmd, arg);
>  
>  		lnet_ni_decref(ni);
>  		return rc;
> diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
> index f186e6a16d34..1bf12af87a20 100644
> --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
> +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
> @@ -406,7 +406,7 @@ lnet_ni_recv(struct lnet_ni *ni, void *private, struct lnet_msg *msg,
>  		iov_iter_bvec(&to, ITER_BVEC | READ, kiov, niov, mlen + offset);
>  		iov_iter_advance(&to, offset);
>  	}
> -	rc = ni->ni_lnd->lnd_recv(ni, private, msg, delayed, &to, rlen);
> +	rc = ni->ni_net->net_lnd->lnd_recv(ni, private, msg, delayed, &to, rlen);
>  	if (rc < 0)
>  		lnet_finalize(ni, msg, rc);
>  }
> @@ -461,7 +461,7 @@ lnet_ni_send(struct lnet_ni *ni, struct lnet_msg *msg)
>  	LASSERT(LNET_NETTYP(LNET_NIDNET(ni->ni_nid)) == LOLND ||
>  		(msg->msg_txcredit && msg->msg_peertxcredit));
>  
> -	rc = ni->ni_lnd->lnd_send(ni, priv, msg);
> +	rc = ni->ni_net->net_lnd->lnd_send(ni, priv, msg);
>  	if (rc < 0)
>  		lnet_finalize(ni, msg, rc);
>  }
> @@ -474,10 +474,10 @@ lnet_ni_eager_recv(struct lnet_ni *ni, struct lnet_msg *msg)
>  	LASSERT(!msg->msg_sending);
>  	LASSERT(msg->msg_receiving);
>  	LASSERT(!msg->msg_rx_ready_delay);
> -	LASSERT(ni->ni_lnd->lnd_eager_recv);
> +	LASSERT(ni->ni_net->net_lnd->lnd_eager_recv);
>  
>  	msg->msg_rx_ready_delay = 1;
> -	rc = ni->ni_lnd->lnd_eager_recv(ni, msg->msg_private, msg,
> +	rc = ni->ni_net->net_lnd->lnd_eager_recv(ni, msg->msg_private, msg,
>  					&msg->msg_private);
>  	if (rc) {
>  		CERROR("recv from %s / send to %s aborted: eager_recv failed %d\n",
> @@ -496,10 +496,10 @@ lnet_ni_query_locked(struct lnet_ni *ni, struct lnet_peer *lp)
>  	time64_t last_alive = 0;
>  
>  	LASSERT(lnet_peer_aliveness_enabled(lp));
> -	LASSERT(ni->ni_lnd->lnd_query);
> +	LASSERT(ni->ni_net->net_lnd->lnd_query);
>  
>  	lnet_net_unlock(lp->lp_cpt);
> -	ni->ni_lnd->lnd_query(ni, lp->lp_nid, &last_alive);
> +	ni->ni_net->net_lnd->lnd_query(ni, lp->lp_nid, &last_alive);
>  	lnet_net_lock(lp->lp_cpt);
>  
>  	lp->lp_last_query = ktime_get_seconds();
> @@ -1287,7 +1287,7 @@ lnet_parse_put(struct lnet_ni *ni, struct lnet_msg *msg)
>  	info.mi_roffset	= hdr->msg.put.offset;
>  	info.mi_mbits	= hdr->msg.put.match_bits;
>  
> -	msg->msg_rx_ready_delay = !ni->ni_lnd->lnd_eager_recv;
> +	msg->msg_rx_ready_delay = !ni->ni_net->net_lnd->lnd_eager_recv;
>  	ready_delay = msg->msg_rx_ready_delay;
>  
>   again:
> @@ -1518,7 +1518,7 @@ lnet_parse_forward_locked(struct lnet_ni *ni, struct lnet_msg *msg)
>  
>  	if (msg->msg_rxpeer->lp_rtrcredits <= 0 ||
>  	    lnet_msg2bufpool(msg)->rbp_credits <= 0) {
> -		if (!ni->ni_lnd->lnd_eager_recv) {
> +		if (!ni->ni_net->net_lnd->lnd_eager_recv) {
>  			msg->msg_rx_ready_delay = 1;
>  		} else {
>  			lnet_net_unlock(msg->msg_rx_cpt);
> diff --git a/drivers/staging/lustre/lnet/lnet/lo.c b/drivers/staging/lustre/lnet/lnet/lo.c
> index eb14146bd879..8167980c2323 100644
> --- a/drivers/staging/lustre/lnet/lnet/lo.c
> +++ b/drivers/staging/lustre/lnet/lnet/lo.c
> @@ -83,7 +83,7 @@ lolnd_shutdown(struct lnet_ni *ni)
>  static int
>  lolnd_startup(struct lnet_ni *ni)
>  {
> -	LASSERT(ni->ni_lnd == &the_lolnd);
> +	LASSERT(ni->ni_net->net_lnd == &the_lolnd);
>  	LASSERT(!lolnd_instanced);
>  	lolnd_instanced = 1;
>  
> diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
> index 7d61c5d71426..0c0ec0b27982 100644
> --- a/drivers/staging/lustre/lnet/lnet/router.c
> +++ b/drivers/staging/lustre/lnet/lnet/router.c
> @@ -154,14 +154,14 @@ lnet_ni_notify_locked(struct lnet_ni *ni, struct lnet_peer *lp)
>  		lp->lp_notifylnd = 0;
>  		lp->lp_notify    = 0;
>  
> -		if (notifylnd && ni->ni_lnd->lnd_notify) {
> +		if (notifylnd && ni->ni_net->net_lnd->lnd_notify) {
>  			lnet_net_unlock(lp->lp_cpt);
>  
>  			/*
>  			 * A new notification could happen now; I'll handle it
>  			 * when control returns to me
>  			 */
> -			ni->ni_lnd->lnd_notify(ni, lp->lp_nid, alive);
> +			ni->ni_net->net_lnd->lnd_notify(ni, lp->lp_nid, alive);
>  
>  			lnet_net_lock(lp->lp_cpt);
>  		}
> @@ -380,8 +380,8 @@ lnet_add_route(__u32 net, __u32 hops, lnet_nid_t gateway,
>  		lnet_net_unlock(LNET_LOCK_EX);
>  
>  		/* XXX Assume alive */
> -		if (ni->ni_lnd->lnd_notify)
> -			ni->ni_lnd->lnd_notify(ni, gateway, 1);
> +		if (ni->ni_net->net_lnd->lnd_notify)
> +			ni->ni_net->net_lnd->lnd_notify(ni, gateway, 1);
>  
>  		lnet_net_lock(LNET_LOCK_EX);
>  	}
> @@ -818,7 +818,7 @@ lnet_update_ni_status_locked(void)
>  
>  	now = ktime_get_real_seconds();
>  	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> -		if (ni->ni_lnd->lnd_type == LOLND)
> +		if (ni->ni_net->net_lnd->lnd_type == LOLND)
>  			continue;
>  
>  		if (now < ni->ni_last_alive + timeout)
> diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
> index 19cea7076057..f3ccd6a2b70e 100644
> --- a/drivers/staging/lustre/lnet/lnet/router_proc.c
> +++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
> @@ -674,7 +674,7 @@ static int proc_lnet_nis(struct ctl_table *table, int write,
>  				last_alive = now - ni->ni_last_alive;
>  
>  			/* @lo forever alive */
> -			if (ni->ni_lnd->lnd_type == LOLND)
> +			if (ni->ni_net->net_lnd->lnd_type == LOLND)
>  				last_alive = 0;
>  
>  			lnet_ni_lock(ni);
> 
> 
> 
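
As an illustration of what the mechanical `ni->ni_lnd` to `ni->ni_net->net_lnd` conversion above buys: once the driver callback table hangs off the net, every NI on that net reaches the same table through one extra pointer hop. A rough standalone sketch, assuming hypothetical `demo_` types in place of struct lnet_lnd, struct lnet_net and struct lnet_ni, and using -1 as a placeholder for whatever error code the real caller returns:

```c
#include <assert.h>
#include <stddef.h>

struct demo_ni;

/* Hypothetical stand-in for struct lnet_lnd: a table of driver callbacks. */
struct demo_lnd {
	int type;
	int (*accept)(struct demo_ni *ni);	/* optional, may be NULL */
};

/* Hypothetical stand-in for struct lnet_net. */
struct demo_net {
	struct demo_lnd *net_lnd;	/* procedural interface, shared by every NI on this net */
};

/* Hypothetical stand-in for struct lnet_ni. */
struct demo_ni {
	struct demo_net *ni_net;
};

/* Example driver callback used only for this sketch. */
static int demo_sock_accept(struct demo_ni *ni)
{
	(void)ni;
	return 0;
}

/*
 * Dispatch through the net rather than the NI.  Returns -1 when the
 * driver has no accept callback, which is how the quoted lnet_accept()
 * hunk recognises a request aimed at the loopback LND.
 */
static int demo_ni_accept(struct demo_ni *ni)
{
	struct demo_lnd *lnd;

	assert(ni != NULL && ni->ni_net != NULL);
	lnd = ni->ni_net->net_lnd;
	if (lnd == NULL || lnd->accept == NULL)
		return -1;
	return lnd->accept(ni);
}
```

One consequence visible in the sketch is that changing `net_lnd` affects every NI attached to that net at once, which is the point of moving the field.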

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 03/34] lnet: struct lnet_ni: move ni_lnd to lnet_net
  2018-09-07  0:49 ` [lustre-devel] [PATCH 03/34] lnet: struct lnet_ni: move ni_lnd to lnet_net NeilBrown
  2018-09-10 23:04   ` Doug Oucharek
  2018-09-10 23:24   ` James Simmons
@ 2018-09-10 23:25   ` James Simmons
  2 siblings, 0 replies; 98+ messages in thread
From: James Simmons @ 2018-09-10 23:25 UTC (permalink / raw)
  To: lustre-devel


> Also make some other minor changes to the structures.
>

Reviewed-by: James Simmons <jsimmons@infradead.org>

The below needs fixing based on response to cover letter.
 
> This is part of
>     8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>        LU-7734 lnet: Multi-Rail local NI split
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  .../staging/lustre/include/linux/lnet/lib-types.h  |   13 ++++++++-----
>  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    2 +-
>  .../staging/lustre/lnet/klnds/socklnd/socklnd.c    |    2 +-
>  drivers/staging/lustre/lnet/lnet/acceptor.c        |    4 ++--
>  drivers/staging/lustre/lnet/lnet/api-ni.c          |   16 ++++++++--------
>  drivers/staging/lustre/lnet/lnet/lib-move.c        |   16 ++++++++--------
>  drivers/staging/lustre/lnet/lnet/lo.c              |    2 +-
>  drivers/staging/lustre/lnet/lnet/router.c          |   10 +++++-----
>  drivers/staging/lustre/lnet/lnet/router_proc.c     |    2 +-
>  9 files changed, 35 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> index ead8a4e1125a..e170eb07a5bf 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> @@ -262,12 +262,17 @@ struct lnet_net {
>  	 * shouldn't be reset
>  	 */
>  	bool			  net_tunables_set;
> +	/* procedural interface */
> +	struct lnet_lnd		*net_lnd;
>  };
>  
>  struct lnet_ni {
> -	spinlock_t		  ni_lock;
> -	struct list_head	  ni_list;	/* chain on ln_nis */
> -	struct list_head	  ni_cptlist;	/* chain on ln_nis_cpt */
> +	/* chain on ln_nis */
> +	struct list_head	  ni_list;
> +	/* chain on ln_nis_cpt */
> +	struct list_head	ni_cptlist;
> +
> +	spinlock_t		ni_lock;
>  
>  	/* number of CPTs */
>  	int			ni_ncpts;
> @@ -281,8 +286,6 @@ struct lnet_ni {
>  	/* instance-specific data */
>  	void			*ni_data;
>  
> -	struct lnet_lnd		 *ni_lnd;	/* procedural interface */
> -
>  	/* percpt TX queues */
>  	struct lnet_tx_queue	**ni_tx_queues;
>  
> diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> index 0d17e22c4401..5e1592b398c1 100644
> --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> @@ -2830,7 +2830,7 @@ static int kiblnd_startup(struct lnet_ni *ni)
>  	int rc;
>  	int newdev;
>  
> -	LASSERT(ni->ni_lnd == &the_o2iblnd);
> +	LASSERT(ni->ni_net->net_lnd == &the_o2iblnd);
>  
>  	if (kiblnd_data.kib_init == IBLND_INIT_NOTHING) {
>  		rc = kiblnd_base_startup();
> diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> index 4ad885f10235..2036a0ae5917 100644
> --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> @@ -2726,7 +2726,7 @@ ksocknal_startup(struct lnet_ni *ni)
>  	int rc;
>  	int i;
>  
> -	LASSERT(ni->ni_lnd == &the_ksocklnd);
> +	LASSERT(ni->ni_net->net_lnd == &the_ksocklnd);
>  
>  	if (ksocknal_data.ksnd_init == SOCKNAL_INIT_NOTHING) {
>  		rc = ksocknal_base_startup();
> diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
> index 3ae3ca1311a1..f8c921f0221c 100644
> --- a/drivers/staging/lustre/lnet/lnet/acceptor.c
> +++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
> @@ -306,7 +306,7 @@ lnet_accept(struct socket *sock, __u32 magic)
>  		return -EPERM;
>  	}
>  
> -	if (!ni->ni_lnd->lnd_accept) {
> +	if (!ni->ni_net->net_lnd->lnd_accept) {
>  		/* This catches a request for the loopback LND */
>  		lnet_ni_decref(ni);
>  		LCONSOLE_ERROR_MSG(0x121, "Refusing connection from %pI4h for %s: NI doesn not accept IP connections\n",
> @@ -317,7 +317,7 @@ lnet_accept(struct socket *sock, __u32 magic)
>  	CDEBUG(D_NET, "Accept %s from %pI4h\n",
>  	       libcfs_nid2str(cr.acr_nid), &peer_ip);
>  
> -	rc = ni->ni_lnd->lnd_accept(ni, sock);
> +	rc = ni->ni_net->net_lnd->lnd_accept(ni, sock);
>  
>  	lnet_ni_decref(ni);
>  	return rc;
> diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
> index cd4189fa7acb..0896e75bc3d7 100644
> --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
> +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
> @@ -799,7 +799,7 @@ lnet_count_acceptor_nis(void)
>  
>  	cpt = lnet_net_lock_current();
>  	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> -		if (ni->ni_lnd->lnd_accept)
> +		if (ni->ni_net->net_lnd->lnd_accept)
>  			count++;
>  	}
>  
> @@ -1098,13 +1098,13 @@ lnet_clear_zombies_nis_locked(void)
>  			continue;
>  		}
>  
> -		ni->ni_lnd->lnd_refcount--;
> +		ni->ni_net->net_lnd->lnd_refcount--;
>  		lnet_net_unlock(LNET_LOCK_EX);
>  
> -		islo = ni->ni_lnd->lnd_type == LOLND;
> +		islo = ni->ni_net->net_lnd->lnd_type == LOLND;
>  
>  		LASSERT(!in_interrupt());
> -		ni->ni_lnd->lnd_shutdown(ni);
> +		ni->ni_net->net_lnd->lnd_shutdown(ni);
>  
>  		/*
>  		 * can't deref lnd anymore now; it might have unregistered
> @@ -1248,7 +1248,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
>  	lnd->lnd_refcount++;
>  	lnet_net_unlock(LNET_LOCK_EX);
>  
> -	ni->ni_lnd = lnd;
> +	ni->ni_net->net_lnd = lnd;
>  
>  	if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf))
>  		lnd_tunables = (struct lnet_ioctl_config_lnd_tunables *)conf->cfg_bulk;
> @@ -1794,7 +1794,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
>  	if (rc)
>  		goto failed1;
>  
> -	if (ni->ni_lnd->lnd_accept) {
> +	if (ni->ni_net->net_lnd->lnd_accept) {
>  		rc = lnet_acceptor_start();
>  		if (rc < 0) {
>  			/* shutdown the ni that we just started */
> @@ -2074,10 +2074,10 @@ LNetCtl(unsigned int cmd, void *arg)
>  		if (!ni)
>  			return -EINVAL;
>  
> -		if (!ni->ni_lnd->lnd_ctl)
> +		if (!ni->ni_net->net_lnd->lnd_ctl)
>  			rc = -EINVAL;
>  		else
> -			rc = ni->ni_lnd->lnd_ctl(ni, cmd, arg);
> +			rc = ni->ni_net->net_lnd->lnd_ctl(ni, cmd, arg);
>  
>  		lnet_ni_decref(ni);
>  		return rc;
> diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
> index f186e6a16d34..1bf12af87a20 100644
> --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
> +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
> @@ -406,7 +406,7 @@ lnet_ni_recv(struct lnet_ni *ni, void *private, struct lnet_msg *msg,
>  		iov_iter_bvec(&to, ITER_BVEC | READ, kiov, niov, mlen + offset);
>  		iov_iter_advance(&to, offset);
>  	}
> -	rc = ni->ni_lnd->lnd_recv(ni, private, msg, delayed, &to, rlen);
> +	rc = ni->ni_net->net_lnd->lnd_recv(ni, private, msg, delayed, &to, rlen);
>  	if (rc < 0)
>  		lnet_finalize(ni, msg, rc);
>  }
> @@ -461,7 +461,7 @@ lnet_ni_send(struct lnet_ni *ni, struct lnet_msg *msg)
>  	LASSERT(LNET_NETTYP(LNET_NIDNET(ni->ni_nid)) == LOLND ||
>  		(msg->msg_txcredit && msg->msg_peertxcredit));
>  
> -	rc = ni->ni_lnd->lnd_send(ni, priv, msg);
> +	rc = ni->ni_net->net_lnd->lnd_send(ni, priv, msg);
>  	if (rc < 0)
>  		lnet_finalize(ni, msg, rc);
>  }
> @@ -474,10 +474,10 @@ lnet_ni_eager_recv(struct lnet_ni *ni, struct lnet_msg *msg)
>  	LASSERT(!msg->msg_sending);
>  	LASSERT(msg->msg_receiving);
>  	LASSERT(!msg->msg_rx_ready_delay);
> -	LASSERT(ni->ni_lnd->lnd_eager_recv);
> +	LASSERT(ni->ni_net->net_lnd->lnd_eager_recv);
>  
>  	msg->msg_rx_ready_delay = 1;
> -	rc = ni->ni_lnd->lnd_eager_recv(ni, msg->msg_private, msg,
> +	rc = ni->ni_net->net_lnd->lnd_eager_recv(ni, msg->msg_private, msg,
>  					&msg->msg_private);
>  	if (rc) {
>  		CERROR("recv from %s / send to %s aborted: eager_recv failed %d\n",
> @@ -496,10 +496,10 @@ lnet_ni_query_locked(struct lnet_ni *ni, struct lnet_peer *lp)
>  	time64_t last_alive = 0;
>  
>  	LASSERT(lnet_peer_aliveness_enabled(lp));
> -	LASSERT(ni->ni_lnd->lnd_query);
> +	LASSERT(ni->ni_net->net_lnd->lnd_query);
>  
>  	lnet_net_unlock(lp->lp_cpt);
> -	ni->ni_lnd->lnd_query(ni, lp->lp_nid, &last_alive);
> +	ni->ni_net->net_lnd->lnd_query(ni, lp->lp_nid, &last_alive);
>  	lnet_net_lock(lp->lp_cpt);
>  
>  	lp->lp_last_query = ktime_get_seconds();
> @@ -1287,7 +1287,7 @@ lnet_parse_put(struct lnet_ni *ni, struct lnet_msg *msg)
>  	info.mi_roffset	= hdr->msg.put.offset;
>  	info.mi_mbits	= hdr->msg.put.match_bits;
>  
> -	msg->msg_rx_ready_delay = !ni->ni_lnd->lnd_eager_recv;
> +	msg->msg_rx_ready_delay = !ni->ni_net->net_lnd->lnd_eager_recv;
>  	ready_delay = msg->msg_rx_ready_delay;
>  
>   again:
> @@ -1518,7 +1518,7 @@ lnet_parse_forward_locked(struct lnet_ni *ni, struct lnet_msg *msg)
>  
>  	if (msg->msg_rxpeer->lp_rtrcredits <= 0 ||
>  	    lnet_msg2bufpool(msg)->rbp_credits <= 0) {
> -		if (!ni->ni_lnd->lnd_eager_recv) {
> +		if (!ni->ni_net->net_lnd->lnd_eager_recv) {
>  			msg->msg_rx_ready_delay = 1;
>  		} else {
>  			lnet_net_unlock(msg->msg_rx_cpt);
> diff --git a/drivers/staging/lustre/lnet/lnet/lo.c b/drivers/staging/lustre/lnet/lnet/lo.c
> index eb14146bd879..8167980c2323 100644
> --- a/drivers/staging/lustre/lnet/lnet/lo.c
> +++ b/drivers/staging/lustre/lnet/lnet/lo.c
> @@ -83,7 +83,7 @@ lolnd_shutdown(struct lnet_ni *ni)
>  static int
>  lolnd_startup(struct lnet_ni *ni)
>  {
> -	LASSERT(ni->ni_lnd == &the_lolnd);
> +	LASSERT(ni->ni_net->net_lnd == &the_lolnd);
>  	LASSERT(!lolnd_instanced);
>  	lolnd_instanced = 1;
>  
> diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
> index 7d61c5d71426..0c0ec0b27982 100644
> --- a/drivers/staging/lustre/lnet/lnet/router.c
> +++ b/drivers/staging/lustre/lnet/lnet/router.c
> @@ -154,14 +154,14 @@ lnet_ni_notify_locked(struct lnet_ni *ni, struct lnet_peer *lp)
>  		lp->lp_notifylnd = 0;
>  		lp->lp_notify    = 0;
>  
> -		if (notifylnd && ni->ni_lnd->lnd_notify) {
> +		if (notifylnd && ni->ni_net->net_lnd->lnd_notify) {
>  			lnet_net_unlock(lp->lp_cpt);
>  
>  			/*
>  			 * A new notification could happen now; I'll handle it
>  			 * when control returns to me
>  			 */
> -			ni->ni_lnd->lnd_notify(ni, lp->lp_nid, alive);
> +			ni->ni_net->net_lnd->lnd_notify(ni, lp->lp_nid, alive);
>  
>  			lnet_net_lock(lp->lp_cpt);
>  		}
> @@ -380,8 +380,8 @@ lnet_add_route(__u32 net, __u32 hops, lnet_nid_t gateway,
>  		lnet_net_unlock(LNET_LOCK_EX);
>  
>  		/* XXX Assume alive */
> -		if (ni->ni_lnd->lnd_notify)
> -			ni->ni_lnd->lnd_notify(ni, gateway, 1);
> +		if (ni->ni_net->net_lnd->lnd_notify)
> +			ni->ni_net->net_lnd->lnd_notify(ni, gateway, 1);
>  
>  		lnet_net_lock(LNET_LOCK_EX);
>  	}
> @@ -818,7 +818,7 @@ lnet_update_ni_status_locked(void)
>  
>  	now = ktime_get_real_seconds();
>  	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> -		if (ni->ni_lnd->lnd_type == LOLND)
> +		if (ni->ni_net->net_lnd->lnd_type == LOLND)
>  			continue;
>  
>  		if (now < ni->ni_last_alive + timeout)
> diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
> index 19cea7076057..f3ccd6a2b70e 100644
> --- a/drivers/staging/lustre/lnet/lnet/router_proc.c
> +++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
> @@ -674,7 +674,7 @@ static int proc_lnet_nis(struct ctl_table *table, int write,
>  				last_alive = now - ni->ni_last_alive;
>  
>  			/* @lo forever alive */
> -			if (ni->ni_lnd->lnd_type == LOLND)
> +			if (ni->ni_net->net_lnd->lnd_type == LOLND)
>  				last_alive = 0;
>  
>  			lnet_ni_lock(ni);
> 
> 
> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 08/34] lnet: add cpt to lnet_match_info.
  2018-09-07  0:49 ` [lustre-devel] [PATCH 08/34] lnet: add cpt to lnet_match_info NeilBrown
@ 2018-09-10 23:25   ` Doug Oucharek
  2018-09-11  1:01   ` James Simmons
  2018-09-11  1:01   ` [lustre-devel] BRe: " James Simmons
  2 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-10 23:25 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On Sep 6, 2018, at 5:49 PM, NeilBrown <neilb@suse.com> wrote:

This seems to be a more direct way to get the cpt
needed in lnet_mt_of_match().

This is part of
   8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
      LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
.../staging/lustre/include/linux/lnet/lib-types.h  |    1 +
drivers/staging/lustre/lnet/lnet/lib-move.c        |    1 +
drivers/staging/lustre/lnet/lnet/lib-ptl.c         |    2 +-
3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 255c6c4bbb89..2d2c066a11ba 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -511,6 +511,7 @@ enum lnet_match_flags {
struct lnet_match_info {
__u64 mi_mbits;
struct lnet_process_id mi_id;
+ unsigned int mi_cpt;
unsigned int mi_opc;
unsigned int mi_portal;
unsigned int mi_rlength;
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index b8b15f56a275..b6e81a693fc3 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -1303,6 +1303,7 @@ lnet_parse_put(struct lnet_ni *ni, struct lnet_msg *msg)
info.mi_rlength = hdr->payload_length;
info.mi_roffset = hdr->msg.put.offset;
info.mi_mbits = hdr->msg.put.match_bits;
+ info.mi_cpt = msg->msg_rxpeer->lp_cpt;

msg->msg_rx_ready_delay = !ni->ni_net->net_lnd->lnd_eager_recv;
ready_delay = msg->msg_rx_ready_delay;
diff --git a/drivers/staging/lustre/lnet/lnet/lib-ptl.c b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
index 4c5737083422..90ce51801726 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-ptl.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
@@ -292,7 +292,7 @@ lnet_mt_of_match(struct lnet_match_info *info, struct lnet_msg *msg)

rotor = ptl->ptl_rotor++; /* get round-robin factor */
if (portal_rotor == LNET_PTL_ROTOR_HASH_RT && routed)
- cpt = lnet_cpt_of_nid(msg->msg_hdr.src_nid);
+ cpt = info->mi_cpt;
else
cpt = rotor % LNET_CPT_NUMBER;





^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 05/34] lnet: begin separating "networks" from "network interfaces".
  2018-09-07  0:49 ` [lustre-devel] [PATCH 05/34] lnet: begin separating "networks" from "network interfaces" NeilBrown
  2018-09-10 23:18   ` Doug Oucharek
@ 2018-09-10 23:27   ` James Simmons
  1 sibling, 0 replies; 98+ messages in thread
From: James Simmons @ 2018-09-10 23:27 UTC (permalink / raw)
  To: lustre-devel



On Fri, 7 Sep 2018, NeilBrown wrote:

> We already have "struct lnet_net" separate from "struct lnet_ni",
> but they are currently allocated together and freed together and
> it is assumed that they are 1-to-1.
> 
> This patch starts breaking that assumption.  We have separate
> lnet_net_alloc() and lnet_net_free() to alloc/free the new lnet_net,
> though they are currently called only when lnet_ni_alloc/free are
> called.
> 
> The netid is now stored in the lnet_net and fetched directly from
> there, rather than extracting it from the net-interface-id ni_nid.
> 
> The linkage between these two structures is now richer: lnet_net
> can link to a list of lnet_ni.  the_lnet now has a list of lnet_net,
> so to find all the lnet_ni, we need to walk a list of lists.
> This need to walk a list-of-lists occurs in several places, and new
> helpers like lnet_get_ni_idx_locked() and lnet_get_next_ni_locked()
> are introduced.
> 
> Previously a list_head was passed to lnet_ni_alloc() for the new
> lnet_ni to be attached to.
> Now a list is passed to lnet_net_alloc() for the net to be attached
> to, and a lnet_net is passed to lnet_ni_alloc() for the ni to attach
> to.
> lnet_ni_alloc() also receives an interface name, but this is currently
> unused.
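
For readers following along, the list-of-lists walk described above can
be sketched in plain userspace C.  This is only an illustration, not the
patch's code: the demo_* names and the trimmed-down structures are
hypothetical, the list helpers are minimal stand-ins for the kernel's
<linux/list.h>, and all locking is omitted.  demo_get_ni_idx() is meant
to mirror the shape of lnet_get_ni_idx_locked().

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical, trimmed-down analogues of lnet_net and lnet_ni. */
struct demo_net {
	struct list_head net_list;	/* chain on the global list of nets */
	struct list_head net_ni_list;	/* NIs that belong to this net */
	unsigned int net_id;
};

struct demo_ni {
	struct list_head ni_netlist;	/* chain on demo_net.net_ni_list */
	struct demo_net *ni_net;	/* back pointer to the owning net */
	unsigned long long ni_nid;
};

/* Set up a net and attach it to the list of nets. */
static void demo_net_init(struct demo_net *net, unsigned int id,
			  struct list_head *nets)
{
	net->net_id = id;
	list_init(&net->net_ni_list);
	list_add_tail(&net->net_list, nets);
}

/* Set up an NI and attach it to its net, not to a global NI list. */
static void demo_ni_init(struct demo_ni *ni, struct demo_net *net,
			 unsigned long long nid)
{
	assert(net != NULL);
	ni->ni_net = net;
	ni->ni_nid = nid;
	list_add_tail(&ni->ni_netlist, &net->net_ni_list);
}

/* Return the idx-th NI across all nets, or NULL if idx is out of range. */
static struct demo_ni *demo_get_ni_idx(struct list_head *nets, int idx)
{
	struct list_head *n, *i;

	for (n = nets->next; n != nets; n = n->next) {
		struct demo_net *net =
			container_of(n, struct demo_net, net_list);

		for (i = net->net_ni_list.next; i != &net->net_ni_list;
		     i = i->next) {
			if (idx-- == 0)
				return container_of(i, struct demo_ni,
						    ni_netlist);
		}
	}
	return NULL;
}

/* Count every NI by walking each net's NI list in turn. */
static int demo_count_nis(struct list_head *nets)
{
	struct list_head *n, *i;
	int count = 0;

	for (n = nets->next; n != nets; n = n->next) {
		struct demo_net *net =
			container_of(n, struct demo_net, net_list);

		for (i = net->net_ni_list.next; i != &net->net_ni_list;
		     i = i->next)
			count++;
	}
	return count;
}
```

A net with an empty NI list is simply skipped by the inner loop, so the
walk does not depend on every net having at least one NI.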

Reviewed-by: James Simmons <jsimmons@infradead.org>

The below needs fixing based on response to cover letter.
 
> This is part of
>     8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>        LU-7734 lnet: Multi-Rail local NI split
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  .../staging/lustre/include/linux/lnet/lib-lnet.h   |   15 +
>  .../staging/lustre/include/linux/lnet/lib-types.h  |   23 +-
>  drivers/staging/lustre/lnet/lnet/acceptor.c        |    2 
>  drivers/staging/lustre/lnet/lnet/api-ni.c          |  255 ++++++++++++++------
>  drivers/staging/lustre/lnet/lnet/config.c          |  135 +++++++----
>  drivers/staging/lustre/lnet/lnet/lib-move.c        |    6 
>  drivers/staging/lustre/lnet/lnet/router.c          |   15 -
>  drivers/staging/lustre/lnet/lnet/router_proc.c     |   16 -
>  8 files changed, 308 insertions(+), 159 deletions(-)
> 
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
> index 0fecf0d32c58..4440b87299c4 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
> @@ -369,8 +369,14 @@ lnet_ni_decref(struct lnet_ni *ni)
>  }
>  
>  void lnet_ni_free(struct lnet_ni *ni);
> +void lnet_net_free(struct lnet_net *net);
> +
> +struct lnet_net *
> +lnet_net_alloc(__u32 net_type, struct list_head *netlist);
> +
>  struct lnet_ni *
> -lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist);
> +lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el,
> +	      char *iface);
>  
>  static inline int
>  lnet_nid2peerhash(lnet_nid_t nid)
> @@ -412,6 +418,9 @@ void lnet_destroy_routes(void);
>  int lnet_get_route(int idx, __u32 *net, __u32 *hops,
>  		   lnet_nid_t *gateway, __u32 *alive, __u32 *priority);
>  int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg);
> +struct lnet_ni *lnet_get_next_ni_locked(struct lnet_net *mynet,
> +					struct lnet_ni *prev);
> +struct lnet_ni *lnet_get_ni_idx_locked(int idx);
>  
>  void lnet_router_debugfs_init(void);
>  void lnet_router_debugfs_fini(void);
> @@ -584,7 +593,7 @@ int lnet_connect(struct socket **sockp, lnet_nid_t peer_nid,
>  		 __u32 local_ip, __u32 peer_ip, int peer_port);
>  void lnet_connect_console_error(int rc, lnet_nid_t peer_nid,
>  				__u32 peer_ip, int port);
> -int lnet_count_acceptor_nis(void);
> +int lnet_count_acceptor_nets(void);
>  int lnet_acceptor_timeout(void);
>  int lnet_acceptor_port(void);
>  
> @@ -618,7 +627,7 @@ void lnet_swap_pinginfo(struct lnet_ping_info *info);
>  int lnet_parse_ip2nets(char **networksp, char *ip2nets);
>  int lnet_parse_routes(char *route_str, int *im_a_router);
>  int lnet_parse_networks(struct list_head *nilist, char *networks);
> -int lnet_net_unique(__u32 net, struct list_head *nilist);
> +bool lnet_net_unique(__u32 net, struct list_head *nilist);
>  
>  int lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt);
>  struct lnet_peer *lnet_find_peer_locked(struct lnet_peer_table *ptable,
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> index c5e3363de727..5f0d4703bf86 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> @@ -254,6 +254,15 @@ struct lnet_tx_queue {
>  };
>  
>  struct lnet_net {
> +	/* chain on the ln_nets */
> +	struct list_head	net_list;
> +
> +	/* net ID, which is compoed of
> +	 * (net_type << 16) | net_num.
> +	 * net_type can be one of the enumarated types defined in
> +	 * lnet/include/lnet/nidstr.h */
> +	__u32			net_id;
> +
>  	/* network tunables */
>  	struct lnet_ioctl_config_lnd_cmn_tunables net_tunables;
>  
> @@ -264,11 +273,13 @@ struct lnet_net {
>  	bool			  net_tunables_set;
>  	/* procedural interface */
>  	struct lnet_lnd		*net_lnd;
> +	/* list of NIs on this net */
> +	struct list_head	net_ni_list;
>  };
>  
>  struct lnet_ni {
> -	/* chain on ln_nis */
> -	struct list_head	  ni_list;
> +	/* chain on the lnet_net structure */
> +	struct list_head	  ni_netlist;
>  	/* chain on ln_nis_cpt */
>  	struct list_head	ni_cptlist;
>  
> @@ -626,14 +637,16 @@ struct lnet {
>  	/* failure simulation */
>  	struct list_head		  ln_test_peers;
>  	struct list_head		  ln_drop_rules;
> -	struct list_head		  ln_delay_rules;
> +	struct list_head		ln_delay_rules;
>  
> -	struct list_head		  ln_nis;	/* LND instances */
> +	/* LND instances */
> +	struct list_head		ln_nets;
>  	/* NIs bond on specific CPT(s) */
>  	struct list_head		  ln_nis_cpt;
>  	/* dying LND instances */
>  	struct list_head		  ln_nis_zombie;
> -	struct lnet_ni			 *ln_loni;	/* the loopback NI */
> +	/* the loopback NI */
> +	struct lnet_ni			*ln_loni;
>  
>  	/* remote networks with routes to them */
>  	struct list_head		 *ln_remote_nets_hash;
> diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
> index f8c921f0221c..88b90c1fdbaf 100644
> --- a/drivers/staging/lustre/lnet/lnet/acceptor.c
> +++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
> @@ -454,7 +454,7 @@ lnet_acceptor_start(void)
>  	if (rc <= 0)
>  		return rc;
>  
> -	if (!lnet_count_acceptor_nis())  /* not required */
> +	if (lnet_count_acceptor_nets() == 0)  /* not required */
>  		return 0;
>  
>  	task = kthread_run(lnet_acceptor, (void *)(uintptr_t)secure,
> diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
> index c944fbb155c8..05687278334a 100644
> --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
> +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
> @@ -537,7 +537,7 @@ lnet_prepare(lnet_pid_t requested_pid)
>  	the_lnet.ln_pid = requested_pid;
>  
>  	INIT_LIST_HEAD(&the_lnet.ln_test_peers);
> -	INIT_LIST_HEAD(&the_lnet.ln_nis);
> +	INIT_LIST_HEAD(&the_lnet.ln_nets);
>  	INIT_LIST_HEAD(&the_lnet.ln_nis_cpt);
>  	INIT_LIST_HEAD(&the_lnet.ln_nis_zombie);
>  	INIT_LIST_HEAD(&the_lnet.ln_routers);
> @@ -616,7 +616,7 @@ lnet_unprepare(void)
>  
>  	LASSERT(!the_lnet.ln_refcount);
>  	LASSERT(list_empty(&the_lnet.ln_test_peers));
> -	LASSERT(list_empty(&the_lnet.ln_nis));
> +	LASSERT(list_empty(&the_lnet.ln_nets));
>  	LASSERT(list_empty(&the_lnet.ln_nis_cpt));
>  	LASSERT(list_empty(&the_lnet.ln_nis_zombie));
>  
> @@ -648,14 +648,17 @@ lnet_unprepare(void)
>  }
>  
>  struct lnet_ni  *
> -lnet_net2ni_locked(__u32 net, int cpt)
> +lnet_net2ni_locked(__u32 net_id, int cpt)
>  {
> -	struct lnet_ni *ni;
> +	struct lnet_ni   *ni;
> +	struct lnet_net  *net;
>  
>  	LASSERT(cpt != LNET_LOCK_EX);
>  
> -	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> -		if (LNET_NIDNET(ni->ni_nid) == net) {
> +	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> +		if (net->net_id == net_id) {
> +			ni = list_entry(net->net_ni_list.next, struct lnet_ni,
> +					ni_netlist);
>  			lnet_ni_addref_locked(ni, cpt);
>  			return ni;
>  		}
> @@ -760,14 +763,17 @@ lnet_islocalnet(__u32 net)
>  struct lnet_ni  *
>  lnet_nid2ni_locked(lnet_nid_t nid, int cpt)
>  {
> -	struct lnet_ni *ni;
> +	struct lnet_net  *net;
> +	struct lnet_ni	 *ni;
>  
>  	LASSERT(cpt != LNET_LOCK_EX);
>  
> -	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> -		if (ni->ni_nid == nid) {
> -			lnet_ni_addref_locked(ni, cpt);
> -			return ni;
> +	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> +		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
> +			if (ni->ni_nid == nid) {
> +				lnet_ni_addref_locked(ni, cpt);
> +				return ni;
> +			}
>  		}
>  	}
>  
> @@ -790,16 +796,18 @@ lnet_islocalnid(lnet_nid_t nid)
>  }
>  
>  int
> -lnet_count_acceptor_nis(void)
> +lnet_count_acceptor_nets(void)
>  {
>  	/* Return the # of NIs that need the acceptor. */
> -	int count = 0;
> -	struct lnet_ni *ni;
> -	int cpt;
> +	int		 count = 0;
> +	struct lnet_net  *net;
> +	int		 cpt;
>  
>  	cpt = lnet_net_lock_current();
> -	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> -		if (ni->ni_net->net_lnd->lnd_accept)
> +	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> +		/* all socklnd type networks should have the acceptor
> +		 * thread started */
> +		if (net->net_lnd->lnd_accept)
>  			count++;
>  	}
>  
> @@ -832,13 +840,16 @@ lnet_ping_info_create(int num_ni)
>  static inline int
>  lnet_get_ni_count(void)
>  {
> -	struct lnet_ni *ni;
> -	int count = 0;
> +	struct lnet_ni	*ni;
> +	struct lnet_net *net;
> +	int		count = 0;
>  
>  	lnet_net_lock(0);
>  
> -	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list)
> -		count++;
> +	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> +		list_for_each_entry(ni, &net->net_ni_list, ni_netlist)
> +			count++;
> +	}
>  
>  	lnet_net_unlock(0);
>  
> @@ -854,14 +865,17 @@ lnet_ping_info_free(struct lnet_ping_info *pinfo)
>  static void
>  lnet_ping_info_destroy(void)
>  {
> +	struct lnet_net *net;
>  	struct lnet_ni *ni;
>  
>  	lnet_net_lock(LNET_LOCK_EX);
>  
> -	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> -		lnet_ni_lock(ni);
> -		ni->ni_status = NULL;
> -		lnet_ni_unlock(ni);
> +	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> +		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
> +			lnet_ni_lock(ni);
> +			ni->ni_status = NULL;
> +			lnet_ni_unlock(ni);
> +		}
>  	}
>  
>  	lnet_ping_info_free(the_lnet.ln_ping_info);
> @@ -963,24 +977,28 @@ lnet_ping_md_unlink(struct lnet_ping_info *pinfo,
>  static void
>  lnet_ping_info_install_locked(struct lnet_ping_info *ping_info)
>  {
> +	int i = 0;
>  	struct lnet_ni_status *ns;
>  	struct lnet_ni *ni;
> -	int i = 0;
> +	struct lnet_net *net;
>  
> -	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> -		LASSERT(i < ping_info->pi_nnis);
> +	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> +		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
> +			LASSERT(i < ping_info->pi_nnis);
>  
> -		ns = &ping_info->pi_ni[i];
> +			ns = &ping_info->pi_ni[i];
>  
> -		ns->ns_nid = ni->ni_nid;
> +			ns->ns_nid = ni->ni_nid;
>  
> -		lnet_ni_lock(ni);
> -		ns->ns_status = (ni->ni_status) ?
> -				 ni->ni_status->ns_status : LNET_NI_STATUS_UP;
> -		ni->ni_status = ns;
> -		lnet_ni_unlock(ni);
> +			lnet_ni_lock(ni);
> +			ns->ns_status = ni->ni_status ?
> +					ni->ni_status->ns_status :
> +						LNET_NI_STATUS_UP;
> +			ni->ni_status = ns;
> +			lnet_ni_unlock(ni);
>  
> -		i++;
> +			i++;
> +		}
>  	}
>  }
>  
> @@ -1054,9 +1072,9 @@ lnet_ni_unlink_locked(struct lnet_ni *ni)
>  	}
>  
>  	/* move it to zombie list and nobody can find it anymore */
> -	LASSERT(!list_empty(&ni->ni_list));
> -	list_move(&ni->ni_list, &the_lnet.ln_nis_zombie);
> -	lnet_ni_decref_locked(ni, 0);	/* drop ln_nis' ref */
> +	LASSERT(!list_empty(&ni->ni_netlist));
> +	list_move(&ni->ni_netlist, &the_lnet.ln_nis_zombie);
> +	lnet_ni_decref_locked(ni, 0);
>  }
>  
>  static void
> @@ -1076,17 +1094,17 @@ lnet_clear_zombies_nis_locked(void)
>  		int j;
>  
>  		ni = list_entry(the_lnet.ln_nis_zombie.next,
> -				struct lnet_ni, ni_list);
> -		list_del_init(&ni->ni_list);
> +				struct lnet_ni, ni_netlist);
> +		list_del_init(&ni->ni_netlist);
>  		cfs_percpt_for_each(ref, j, ni->ni_refs) {
>  			if (!*ref)
>  				continue;
>  			/* still busy, add it back to zombie list */
> -			list_add(&ni->ni_list, &the_lnet.ln_nis_zombie);
> +			list_add(&ni->ni_netlist, &the_lnet.ln_nis_zombie);
>  			break;
>  		}
>  
> -		if (!list_empty(&ni->ni_list)) {
> +		if (!list_empty(&ni->ni_netlist)) {
>  			lnet_net_unlock(LNET_LOCK_EX);
>  			++i;
>  			if ((i & (-i)) == i) {
> @@ -1126,6 +1144,7 @@ lnet_shutdown_lndnis(void)
>  {
>  	struct lnet_ni *ni;
>  	int i;
> +	struct lnet_net *net;
>  
>  	/* NB called holding the global mutex */
>  
> @@ -1138,10 +1157,14 @@ lnet_shutdown_lndnis(void)
>  	the_lnet.ln_shutdown = 1;	/* flag shutdown */
>  
>  	/* Unlink NIs from the global table */
> -	while (!list_empty(&the_lnet.ln_nis)) {
> -		ni = list_entry(the_lnet.ln_nis.next,
> -				struct lnet_ni, ni_list);
> -		lnet_ni_unlink_locked(ni);
> +	while (!list_empty(&the_lnet.ln_nets)) {
> +		net = list_entry(the_lnet.ln_nets.next,
> +				 struct lnet_net, net_list);
> +		while (!list_empty(&net->net_ni_list)) {
> +			ni = list_entry(net->net_ni_list.next,
> +					struct lnet_ni, ni_netlist);
> +			lnet_ni_unlink_locked(ni);
> +		}
>  	}
>  
>  	/* Drop the cached loopback NI. */
> @@ -1212,7 +1235,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
>  
>  	/* Make sure this new NI is unique. */
>  	lnet_net_lock(LNET_LOCK_EX);
> -	rc = lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nis);
> +	rc = lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nets);
>  	lnet_net_unlock(LNET_LOCK_EX);
>  	if (!rc) {
>  		if (lnd_type == LOLND) {
> @@ -1297,7 +1320,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
>  	lnet_net_lock(LNET_LOCK_EX);
>  	/* refcount for ln_nis */
>  	lnet_ni_addref_locked(ni, 0);
> -	list_add_tail(&ni->ni_list, &the_lnet.ln_nis);
> +	list_add_tail(&ni->ni_net->net_list, &the_lnet.ln_nets);
>  	if (ni->ni_cpts) {
>  		lnet_ni_addref_locked(ni, 0);
>  		list_add_tail(&ni->ni_cptlist, &the_lnet.ln_nis_cpt);
> @@ -1363,8 +1386,8 @@ lnet_startup_lndnis(struct list_head *nilist)
>  	int ni_count = 0;
>  
>  	while (!list_empty(nilist)) {
> -		ni = list_entry(nilist->next, struct lnet_ni, ni_list);
> -		list_del(&ni->ni_list);
> +		ni = list_entry(nilist->next, struct lnet_ni, ni_netlist);
> +		list_del(&ni->ni_netlist);
>  		rc = lnet_startup_lndni(ni, NULL);
>  
>  		if (rc < 0)
> @@ -1486,6 +1509,7 @@ LNetNIInit(lnet_pid_t requested_pid)
>  	struct lnet_ping_info *pinfo;
>  	struct lnet_handle_md md_handle;
>  	struct list_head net_head;
> +	struct lnet_net		*net;
>  
>  	INIT_LIST_HEAD(&net_head);
>  
> @@ -1505,8 +1529,15 @@ LNetNIInit(lnet_pid_t requested_pid)
>  		return rc;
>  	}
>  
> -	/* Add in the loopback network */
> -	if (!lnet_ni_alloc(LNET_MKNET(LOLND, 0), NULL, &net_head)) {
> +	/* create a network for Loopback network */
> +	net = lnet_net_alloc(LNET_MKNET(LOLND, 0), &net_head);
> +	if (net == NULL) {
> +		rc = -ENOMEM;
> +		goto err_empty_list;
> +	}
> +
> +	/* Add in the loopback NI */
> +	if (lnet_ni_alloc(net, NULL, NULL) == NULL) {
>  		rc = -ENOMEM;
>  		goto err_empty_list;
>  	}
> @@ -1584,11 +1615,11 @@ LNetNIInit(lnet_pid_t requested_pid)
>  	LASSERT(rc < 0);
>  	mutex_unlock(&the_lnet.ln_api_mutex);
>  	while (!list_empty(&net_head)) {
> -		struct lnet_ni *ni;
> +		struct lnet_net *net;
>  
> -		ni = list_entry(net_head.next, struct lnet_ni, ni_list);
> -		list_del_init(&ni->ni_list);
> -		lnet_ni_free(ni);
> +		net = list_entry(net_head.next, struct lnet_net, net_list);
> +		list_del_init(&net->net_list);
> +		lnet_net_free(net);
>  	}
>  	return rc;
>  }
> @@ -1714,25 +1745,83 @@ lnet_fill_ni_info(struct lnet_ni *ni, struct lnet_ioctl_config_data *config)
>  	}
>  }
>  
> +struct lnet_ni *
> +lnet_get_ni_idx_locked(int idx)
> +{
> +	struct lnet_ni		*ni;
> +	struct lnet_net		*net;
> +
> +	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> +		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
> +			if (idx-- == 0)
> +				return ni;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +struct lnet_ni *
> +lnet_get_next_ni_locked(struct lnet_net *mynet, struct lnet_ni *prev)
> +{
> +	struct lnet_ni		*ni;
> +	struct lnet_net		*net = mynet;
> +
> +	if (prev == NULL) {
> +		if (net == NULL)
> +			net = list_entry(the_lnet.ln_nets.next, struct lnet_net,
> +					net_list);
> +		ni = list_entry(net->net_ni_list.next, struct lnet_ni,
> +				ni_netlist);
> +
> +		return ni;
> +	}
> +
> +	if (prev->ni_netlist.next == &prev->ni_net->net_ni_list) {
> +		/* if you reached the end of the ni list and the net is
> +		 * specified, then there are no more nis in that net */
> +		if (net != NULL)
> +			return NULL;
> +
> +		/* we reached the end of this net ni list. move to the
> +		 * next net */
> +		if (prev->ni_net->net_list.next == &the_lnet.ln_nets)
> +			/* no more nets and no more NIs. */
> +			return NULL;
> +
> +		/* get the next net */
> +		net = list_entry(prev->ni_net->net_list.next, struct lnet_net,
> +				 net_list);
> +		/* get the ni on it */
> +		ni = list_entry(net->net_ni_list.next, struct lnet_ni,
> +				ni_netlist);
> +
> +		return ni;
> +	}
> +
> +	/* there are more nis left */
> +	ni = list_entry(prev->ni_netlist.next, struct lnet_ni, ni_netlist);
> +
> +	return ni;
> +}
> +
>  static int
>  lnet_get_net_config(struct lnet_ioctl_config_data *config)
>  {
>  	struct lnet_ni *ni;
> +	int cpt;
>  	int idx = config->cfg_count;
> -	int cpt, i = 0;
>  	int rc = -ENOENT;
>  
>  	cpt = lnet_net_lock_current();
>  
> -	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> -		if (i++ != idx)
> -			continue;
> +	ni = lnet_get_ni_idx_locked(idx);
>  
> +	if (ni != NULL) {
> +		rc = 0;
>  		lnet_ni_lock(ni);
>  		lnet_fill_ni_info(ni, config);
>  		lnet_ni_unlock(ni);
> -		rc = 0;
> -		break;
>  	}
>  
>  	lnet_net_unlock(cpt);
> @@ -1745,6 +1834,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
>  	char *nets = conf->cfg_config_u.cfg_net.net_intf;
>  	struct lnet_ping_info *pinfo;
>  	struct lnet_handle_md md_handle;
> +	struct lnet_net		*net;
>  	struct lnet_ni *ni;
>  	struct list_head net_head;
>  	struct lnet_remotenet *rnet;
> @@ -1752,7 +1842,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
>  
>  	INIT_LIST_HEAD(&net_head);
>  
> -	/* Create a ni structure for the network string */
> +	/* Create a net/ni structures for the network string */
>  	rc = lnet_parse_networks(&net_head, nets);
>  	if (rc <= 0)
>  		return !rc ? -EINVAL : rc;
> @@ -1760,14 +1850,14 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
>  	mutex_lock(&the_lnet.ln_api_mutex);
>  
>  	if (rc > 1) {
> -		rc = -EINVAL; /* only add one interface per call */
> +		rc = -EINVAL; /* only add one network per call */
>  		goto failed0;
>  	}
>  
> -	ni = list_entry(net_head.next, struct lnet_ni, ni_list);
> +	net = list_entry(net_head.next, struct lnet_net, net_list);
>  
>  	lnet_net_lock(LNET_LOCK_EX);
> -	rnet = lnet_find_net_locked(LNET_NIDNET(ni->ni_nid));
> +	rnet = lnet_find_net_locked(net->net_id);
>  	lnet_net_unlock(LNET_LOCK_EX);
>  	/*
>  	 * make sure that the net added doesn't invalidate the current
> @@ -1785,8 +1875,8 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
>  	if (rc)
>  		goto failed0;
>  
> -	list_del_init(&ni->ni_list);
> -
> +	list_del_init(&net->net_list);
> +	ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
>  	rc = lnet_startup_lndni(ni, conf);
>  	if (rc)
>  		goto failed1;
> @@ -1812,9 +1902,9 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
>  failed0:
>  	mutex_unlock(&the_lnet.ln_api_mutex);
>  	while (!list_empty(&net_head)) {
> -		ni = list_entry(net_head.next, struct lnet_ni, ni_list);
> -		list_del_init(&ni->ni_list);
> -		lnet_ni_free(ni);
> +		net = list_entry(net_head.next, struct lnet_net, net_list);
> +		list_del_init(&net->net_list);
> +		lnet_net_free(net);
>  	}
>  	return rc;
>  }
> @@ -1849,7 +1939,7 @@ lnet_dyn_del_ni(__u32 net)
>  
>  	lnet_shutdown_lndni(ni);
>  
> -	if (!lnet_count_acceptor_nis())
> +	if (!lnet_count_acceptor_nets())
>  		lnet_acceptor_stop();
>  
>  	lnet_ping_target_update(pinfo, md_handle);
> @@ -2103,7 +2193,8 @@ EXPORT_SYMBOL(LNetDebugPeer);
>  int
>  LNetGetId(unsigned int index, struct lnet_process_id *id)
>  {
> -	struct lnet_ni *ni;
> +	struct lnet_ni	 *ni;
> +	struct lnet_net  *net;
>  	int cpt;
>  	int rc = -ENOENT;
>  
> @@ -2111,14 +2202,16 @@ LNetGetId(unsigned int index, struct lnet_process_id *id)
>  
>  	cpt = lnet_net_lock_current();
>  
> -	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> -		if (index--)
> -			continue;
> +	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> +		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
> +			if (index-- != 0)
> +				continue;
>  
> -		id->nid = ni->ni_nid;
> -		id->pid = the_lnet.ln_pid;
> -		rc = 0;
> -		break;
> +			id->nid = ni->ni_nid;
> +			id->pid = the_lnet.ln_pid;
> +			rc = 0;
> +			break;
> +		}
>  	}
>  
>  	lnet_net_unlock(cpt);
> diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
> index 5646feeb433e..e83bdbec11e3 100644
> --- a/drivers/staging/lustre/lnet/lnet/config.c
> +++ b/drivers/staging/lustre/lnet/lnet/config.c
> @@ -78,17 +78,17 @@ lnet_issep(char c)
>  	}
>  }
>  
> -int
> -lnet_net_unique(__u32 net, struct list_head *nilist)
> +bool
> +lnet_net_unique(__u32 net, struct list_head *netlist)
>  {
> -	struct lnet_ni *ni;
> +	struct lnet_net	 *net_l;
>  
> -	list_for_each_entry(ni, nilist, ni_list) {
> -		if (LNET_NIDNET(ni->ni_nid) == net)
> -			return 0;
> +	list_for_each_entry(net_l, netlist, net_list) {
> +		if (net_l->net_id == net)
> +			return false;
>  	}
>  
> -	return 1;
> +	return true;
>  }
>  
>  void
> @@ -112,41 +112,78 @@ lnet_ni_free(struct lnet_ni *ni)
>  	if (ni->ni_net_ns)
>  		put_net(ni->ni_net_ns);
>  
> -	kvfree(ni->ni_net);
>  	kfree(ni);
>  }
>  
> -struct lnet_ni *
> -lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
> +void
> +lnet_net_free(struct lnet_net *net)
>  {
> -	struct lnet_tx_queue *tq;
> +	struct list_head *tmp, *tmp2;
>  	struct lnet_ni *ni;
> -	int rc;
> -	int i;
> +
> +	/* delete any nis which have been started. */
> +	list_for_each_safe(tmp, tmp2, &net->net_ni_list) {
> +		ni = list_entry(tmp, struct lnet_ni, ni_netlist);
> +		list_del_init(&ni->ni_netlist);
> +		lnet_ni_free(ni);
> +	}
> +
> +	kfree(net);
> +}
> +
> +struct lnet_net *
> +lnet_net_alloc(__u32 net_id, struct list_head *net_list)
> +{
>  	struct lnet_net		*net;
>  
> -	if (!lnet_net_unique(net_id, nilist)) {
> -		LCONSOLE_ERROR_MSG(0x111, "Duplicate network specified: %s\n",
> -				   libcfs_net2str(net_id));
> +	if (!lnet_net_unique(net_id, net_list)) {
> +		CERROR("Duplicate net %s. Ignore\n",
> +		       libcfs_net2str(net_id));
>  		return NULL;
>  	}
>  
> -	ni = kzalloc(sizeof(*ni), GFP_NOFS);
>  	net = kzalloc(sizeof(*net), GFP_NOFS);
> -	if (!ni || !net) {
> -		kfree(ni); kfree(net);
> +	if (!net) {
>  		CERROR("Out of memory creating network %s\n",
>  		       libcfs_net2str(net_id));
>  		return NULL;
>  	}
> +
> +	INIT_LIST_HEAD(&net->net_list);
> +	INIT_LIST_HEAD(&net->net_ni_list);
> +
> +	net->net_id = net_id;
> +
>  	/* initialize global paramters to undefiend */
>  	net->net_tunables.lct_peer_timeout = -1;
>  	net->net_tunables.lct_max_tx_credits = -1;
>  	net->net_tunables.lct_peer_tx_credits = -1;
>  	net->net_tunables.lct_peer_rtr_credits = -1;
>  
> +	list_add_tail(&net->net_list, net_list);
> +
> +	return net;
> +}
> +
> +struct lnet_ni *
> +lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
> +{
> +	struct lnet_tx_queue	*tq;
> +	struct lnet_ni		*ni;
> +	int			rc;
> +	int			i;
> +
> +	ni = kzalloc(sizeof(*ni), GFP_KERNEL);
> +	if (ni == NULL) {
> +		CERROR("Out of memory creating network interface %s%s\n",
> +		       libcfs_net2str(net->net_id),
> +		       (iface != NULL) ? iface : "");
> +		return NULL;
> +	}
> +
>  	spin_lock_init(&ni->ni_lock);
>  	INIT_LIST_HEAD(&ni->ni_cptlist);
> +	INIT_LIST_HEAD(&ni->ni_netlist);
>  	ni->ni_refs = cfs_percpt_alloc(lnet_cpt_table(),
>  				       sizeof(*ni->ni_refs[0]));
>  	if (!ni->ni_refs)
> @@ -166,8 +203,9 @@ lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
>  	} else {
>  		rc = cfs_expr_list_values(el, LNET_CPT_NUMBER, &ni->ni_cpts);
>  		if (rc <= 0) {
> -			CERROR("Failed to set CPTs for NI %s: %d\n",
> -			       libcfs_net2str(net_id), rc);
> +			CERROR("Failed to set CPTs for NI %s(%s): %d\n",
> +			       libcfs_net2str(net->net_id),
> +			       (iface != NULL) ? iface : "", rc);
>  			goto failed;
>  		}
>  
> @@ -182,7 +220,7 @@ lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
>  
>  	ni->ni_net = net;
>  	/* LND will fill in the address part of the NID */
> -	ni->ni_nid = LNET_MKNID(net_id, 0);
> +	ni->ni_nid = LNET_MKNID(net->net_id, 0);
>  
>  	/* Store net namespace in which current ni is being created */
>  	if (current->nsproxy->net_ns)
> @@ -191,22 +229,24 @@ lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
>  		ni->ni_net_ns = NULL;
>  
>  	ni->ni_last_alive = ktime_get_real_seconds();
> -	list_add_tail(&ni->ni_list, nilist);
> +	list_add_tail(&ni->ni_netlist, &net->net_ni_list);
> +
>  	return ni;
> - failed:
> +failed:
>  	lnet_ni_free(ni);
>  	return NULL;
>  }
>  
>  int
> -lnet_parse_networks(struct list_head *nilist, char *networks)
> +lnet_parse_networks(struct list_head *netlist, char *networks)
>  {
>  	struct cfs_expr_list *el = NULL;
>  	char *tokens;
>  	char *str;
>  	char *tmp;
> -	struct lnet_ni *ni;
> -	__u32 net;
> +	struct lnet_net *net;
> +	struct lnet_ni *ni = NULL;
> +	__u32 net_id;
>  	int nnets = 0;
>  	struct list_head *temp_node;
>  
> @@ -275,18 +315,21 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
>  
>  			if (comma)
>  				*comma++ = 0;
> -			net = libcfs_str2net(strim(str));
> +			net_id = libcfs_str2net(strim(str));
>  
> -			if (net == LNET_NIDNET(LNET_NID_ANY)) {
> +			if (net_id == LNET_NIDNET(LNET_NID_ANY)) {
>  				LCONSOLE_ERROR_MSG(0x113,
>  						   "Unrecognised network type\n");
>  				tmp = str;
>  				goto failed_syntax;
>  			}
>  
> -			if (LNET_NETTYP(net) != LOLND && /* LO is implicit */
> -			    !lnet_ni_alloc(net, el, nilist))
> -				goto failed;
> +			if (LNET_NETTYP(net_id) != LOLND) { /* LO is implicit */
> +				net = lnet_net_alloc(net_id, netlist);
> +				if (!net ||
> +				    !lnet_ni_alloc(net, el, NULL))
> +					goto failed;
> +			}
>  
>  			if (el) {
>  				cfs_expr_list_free(el);
> @@ -298,14 +341,21 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
>  		}
>  
>  		*bracket = 0;
> -		net = libcfs_str2net(strim(str));
> -		if (net == LNET_NIDNET(LNET_NID_ANY)) {
> +		net_id = libcfs_str2net(strim(str));
> +		if (net_id == LNET_NIDNET(LNET_NID_ANY)) {
>  			tmp = str;
>  			goto failed_syntax;
>  		}
>  
> -		ni = lnet_ni_alloc(net, el, nilist);
> -		if (!ni)
> +		/* always allocate a net, since we will eventually add an
> +		 * interface to it, or we will fail, in which case we'll
> +		 * just delete it */
> +		net = lnet_net_alloc(net_id, netlist);
> +		if (IS_ERR_OR_NULL(net))
> +			goto failed;
> +
> +		ni = lnet_ni_alloc(net, el, NULL);
> +		if (IS_ERR_OR_NULL(ni))
>  			goto failed;
>  
>  		if (el) {
> @@ -337,7 +387,7 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
>  			if (niface == LNET_MAX_INTERFACES) {
>  				LCONSOLE_ERROR_MSG(0x115,
>  						   "Too many interfaces for net %s\n",
> -						   libcfs_net2str(net));
> +						   libcfs_net2str(net_id));
>  				goto failed;
>  			}
>  
> @@ -378,7 +428,7 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
>  		}
>  	}
>  
> -	list_for_each(temp_node, nilist)
> +	list_for_each(temp_node, netlist)
>  		nnets++;
>  
>  	kfree(tokens);
> @@ -387,11 +437,12 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
>   failed_syntax:
>  	lnet_syntax("networks", networks, (int)(tmp - tokens), strlen(tmp));
>   failed:
> -	while (!list_empty(nilist)) {
> -		ni = list_entry(nilist->next, struct lnet_ni, ni_list);
> +	/* free the net list and all the nis on each net */
> +	while (!list_empty(netlist)) {
> +		net = list_entry(netlist->next, struct lnet_net, net_list);
>  
> -		list_del(&ni->ni_list);
> -		lnet_ni_free(ni);
> +		list_del_init(&net->net_list);
> +		lnet_net_free(net);
>  	}
>  
>  	if (el)
> diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
> index 1bf12af87a20..1c874025fa74 100644
> --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
> +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
> @@ -2289,7 +2289,7 @@ EXPORT_SYMBOL(LNetGet);
>  int
>  LNetDist(lnet_nid_t dstnid, lnet_nid_t *srcnidp, __u32 *orderp)
>  {
> -	struct lnet_ni *ni;
> +	struct lnet_ni *ni = NULL;
>  	struct lnet_remotenet *rnet;
>  	__u32 dstnet = LNET_NIDNET(dstnid);
>  	int hops;
> @@ -2307,9 +2307,9 @@ LNetDist(lnet_nid_t dstnid, lnet_nid_t *srcnidp, __u32 *orderp)
>  
>  	cpt = lnet_net_lock_current();
>  
> -	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> +	while ((ni = lnet_get_next_ni_locked(NULL, ni))) {
>  		if (ni->ni_nid == dstnid) {
> -			if (srcnidp)
> +			if (srcnidp != NULL)
>  				*srcnidp = dstnid;
>  			if (orderp) {
>  				if (LNET_NETTYP(LNET_NIDNET(dstnid)) == LOLND)
> diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
> index 0c0ec0b27982..135dfe793b0b 100644
> --- a/drivers/staging/lustre/lnet/lnet/router.c
> +++ b/drivers/staging/lustre/lnet/lnet/router.c
> @@ -245,13 +245,10 @@ static void lnet_shuffle_seed(void)
>  	if (seeded)
>  		return;
>  
> -	/*
> -	 * Nodes with small feet have little entropy
> -	 * the NID for this node gives the most entropy in the low bits
> -	 */
> -	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> +	/* Nodes with small feet have little entropy
> +	 * the NID for this node gives the most entropy in the low bits */
> +	while ((ni = lnet_get_next_ni_locked(NULL, ni))) {
>  		__u32 lnd_type, seed;
> -
>  		lnd_type = LNET_NETTYP(LNET_NIDNET(ni->ni_nid));
>  		if (lnd_type != LOLND) {
>  			seed = (LNET_NIDADDR(ni->ni_nid) | lnd_type);
> @@ -807,8 +804,8 @@ lnet_router_ni_update_locked(struct lnet_peer *gw, __u32 net)
>  static void
>  lnet_update_ni_status_locked(void)
>  {
> -	struct lnet_ni *ni;
> -	time64_t now;
> +	struct lnet_ni *ni = NULL;
> +	time64_t	now;
>  	time64_t timeout;
>  
>  	LASSERT(the_lnet.ln_routing);
> @@ -817,7 +814,7 @@ lnet_update_ni_status_locked(void)
>  		  max(live_router_check_interval, dead_router_check_interval);
>  
>  	now = ktime_get_real_seconds();
> -	list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> +	while ((ni = lnet_get_next_ni_locked(NULL, ni))) {
>  		if (ni->ni_net->net_lnd->lnd_type == LOLND)
>  			continue;
>  
> diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
> index f3ccd6a2b70e..2a366e9a8627 100644
> --- a/drivers/staging/lustre/lnet/lnet/router_proc.c
> +++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
> @@ -641,26 +641,12 @@ static int proc_lnet_nis(struct ctl_table *table, int write,
>  			      "rtr", "max", "tx", "min");
>  		LASSERT(tmpstr + tmpsiz - s > 0);
>  	} else {
> -		struct list_head *n;
>  		struct lnet_ni *ni = NULL;
>  		int skip = *ppos - 1;
>  
>  		lnet_net_lock(0);
>  
> -		n = the_lnet.ln_nis.next;
> -
> -		while (n != &the_lnet.ln_nis) {
> -			struct lnet_ni *a_ni;
> -
> -			a_ni = list_entry(n, struct lnet_ni, ni_list);
> -			if (!skip) {
> -				ni = a_ni;
> -				break;
> -			}
> -
> -			skip--;
> -			n = n->next;
> -		}
> +		ni = lnet_get_ni_idx_locked(skip);
>  
>  		if (ni) {
>  			struct lnet_tx_queue *tq;
> 
> 
> 
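
For readers following the iterator above, here is a minimal userspace sketch of the same idea: treat the per-net NI lists as one flat sequence, optionally restricted to a single net. The names (sketch_net, sketch_ni, sketch_next_ni) are hypothetical and use plain singly linked pointers rather than the kernel list API. The sketch also skips empty nets, which is an assumption of mine and not something the quoted helper checks for.

```c
#include <stddef.h>

struct sketch_net;

struct sketch_ni {
	int id;
	struct sketch_ni *next;		/* next NI on the same net, or NULL */
	struct sketch_net *net;		/* owning net */
};

struct sketch_net {
	struct sketch_ni *nis;		/* first NI on this net, or NULL */
	struct sketch_net *next;	/* next net, or NULL */
};

/* Return the first NI found on net or on any later net; skip empty nets. */
static struct sketch_ni *first_ni_from(struct sketch_net *net)
{
	for (; net != NULL; net = net->next) {
		if (net->nis != NULL)
			return net->nis;
	}
	return NULL;
}

/*
 * Return the NI that follows prev, or NULL when the walk is finished.
 * prev == NULL starts the walk.  If only_net is non-NULL the walk never
 * leaves that net; otherwise it continues through all_nets in order.
 */
static struct sketch_ni *
sketch_next_ni(struct sketch_net *all_nets, struct sketch_net *only_net,
	       struct sketch_ni *prev)
{
	if (prev == NULL) {
		if (only_net != NULL)
			return only_net->nis;
		return first_ni_from(all_nets);
	}

	/* more NIs left on the current net */
	if (prev->next != NULL)
		return prev->next;

	/* end of this net's NI list: stop if the caller named a net */
	if (only_net != NULL)
		return NULL;

	/* otherwise move on to the next net that has an NI */
	return first_ni_from(prev->net->next);
}
```

A caller would loop with "while ((ni = sketch_next_ni(nets, NULL, ni)) != NULL)", which is the same shape as the converted loops in LNetDist() and lnet_update_ni_status_locked().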

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 09/34] lnet: add list of cpts to lnet_net.
  2018-09-07  0:49 ` [lustre-devel] [PATCH 09/34] lnet: add list of cpts to lnet_net NeilBrown
@ 2018-09-10 23:28   ` Doug Oucharek
  2018-09-12  2:16     ` NeilBrown
  2018-09-11  1:02   ` James Simmons
  1 sibling, 1 reply; 98+ messages in thread
From: Doug Oucharek @ 2018-09-10 23:28 UTC (permalink / raw)
  To: lustre-devel

I agree with a comment from James Simmons: __u32 should only be used when the variable is being shared with user space.  We need to start converting all uses of __uXX in LNet to just uXX.  Perhaps that should be a set of future patches once all of MR/DD has landed?

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On Sep 6, 2018, at 5:49 PM, NeilBrown <neilb@suse.com> wrote:

struct lnet_net now has a list of cpts, which is the union
of the cpts for each lnet_ni.
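
As an illustration of the "union of the cpts" bookkeeping, here is a minimal userspace sketch. It assumes a hypothetical struct cpt_set in place of the net_cpts/net_ncpts pair, uses malloc/free instead of the kernel allocators, and leaves out the "NI exists on all CPTs" special case. The membership helper returns true when the value is present, and all sizes are computed in elements times sizeof(element).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-net CPT bookkeeping. */
struct cpt_set {
	uint32_t *cpts;		/* NULL means no explicit list yet */
	uint32_t ncpts;		/* number of entries in cpts */
};

/* Return true when value is present in array[0..size). */
static bool cpt_in_array(const uint32_t *array, uint32_t size, uint32_t value)
{
	uint32_t i;

	for (i = 0; i < size; i++) {
		if (array[i] == value)
			return true;
	}
	return false;
}

/*
 * Merge cpts[0..ncpts) into set, skipping values already present.
 * Returns 0 on success, -1 if the allocation fails (set is unchanged).
 */
static int cpt_set_union(struct cpt_set *set, const uint32_t *cpts,
			 uint32_t ncpts)
{
	uint32_t *merged;
	uint32_t n = set->ncpts;
	uint32_t i;

	if (ncpts == 0)
		return 0;

	/* worst case: every incoming CPT is new */
	merged = malloc((n + ncpts) * sizeof(*merged));
	if (merged == NULL)
		return -1;

	if (n > 0)
		memcpy(merged, set->cpts, n * sizeof(*merged));

	for (i = 0; i < ncpts; i++) {
		if (!cpt_in_array(merged, n, cpts[i]))
			merged[n++] = cpts[i];
	}

	free(set->cpts);
	set->cpts = merged;
	set->ncpts = n;
	return 0;
}
```

Calling cpt_set_union() once per NI, with that NI's CPT array, leaves the set holding each CPT exactly once.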

This is part of
   8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
      LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
.../staging/lustre/include/linux/lnet/lib-types.h  |    6 +
drivers/staging/lustre/lnet/lnet/config.c          |  164 ++++++++++++++++++++
2 files changed, 170 insertions(+)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 2d2c066a11ba..22957d142cc0 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -266,6 +266,12 @@ struct lnet_net {
* lnet/include/lnet/nidstr.h */
__u32 net_id;

+ /* total number of CPTs in the array */
+ __u32 net_ncpts;
+
+ /* cumulative CPTs of all NIs in this net */
+ __u32 *net_cpts;
+
/* network tunables */
struct lnet_ioctl_config_lnd_cmn_tunables net_tunables;

diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index e83bdbec11e3..380a3fb1caba 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -91,11 +91,169 @@ lnet_net_unique(__u32 net, struct list_head *netlist)
return true;
}

+static bool
+in_array(__u32 *array, __u32 size, __u32 value)
+{
+ int i;
+
+ for (i = 0; i < size; i++) {
+ if (array[i] == value)
+ return false;
+ }
+
+ return true;
+}
+
+static int
+lnet_net_append_cpts(__u32 *cpts, __u32 ncpts, struct lnet_net *net)
+{
+ __u32 *added_cpts = NULL;
+ int i, j = 0, rc = 0;
+
+ /*
+ * no need to go further since a subset of the NIs already exist on
+ * all CPTs
+ */
+ if (net->net_ncpts == LNET_CPT_NUMBER)
+ return 0;
+
+ if (cpts == NULL) {
+ /* there is an NI which will exist on all CPTs */
+ if (net->net_cpts != NULL)
+ kvfree(net->net_cpts);
+ net->net_cpts = NULL;
+ net->net_ncpts = LNET_CPT_NUMBER;
+ return 0;
+ }
+
+ if (net->net_cpts == NULL) {
+ net->net_cpts = kmalloc_array(ncpts, sizeof(net->net_cpts),
+      GFP_KERNEL);
+ if (net->net_cpts == NULL)
+ return -ENOMEM;
+ memcpy(net->net_cpts, cpts, ncpts);
+ return 0;
+ }
+
+ added_cpts = kmalloc_array(LNET_CPT_NUMBER, sizeof(*added_cpts),
+   GFP_KERNEL);
+ if (added_cpts == NULL)
+ return -ENOMEM;
+
+ for (i = 0; i < ncpts; i++) {
+ if (!in_array(net->net_cpts, net->net_ncpts, cpts[i])) {
+ added_cpts[j] = cpts[i];
+ j++;
+ }
+ }
+
+ /* append the new cpts if any to the list of cpts in the net */
+ if (j > 0) {
+ __u32 *array = NULL, *loc;
+ __u32 total_entries = j + net->net_ncpts;
+
+ array = kmalloc_array(total_entries, sizeof(*net->net_cpts),
+      GFP_KERNEL);
+ if (array == NULL) {
+ rc = -ENOMEM;
+ goto failed;
+ }
+
+ memcpy(array, net->net_cpts,
+       net->net_ncpts * sizeof(*net->net_cpts));
+ loc = array + net->net_ncpts;
+ memcpy(loc, added_cpts, j * sizeof(*net->net_cpts));
+
+ kfree(net->net_cpts);
+ net->net_ncpts = total_entries;
+ net->net_cpts = array;
+ }
+
+failed:
+ kfree(added_cpts);
+
+ return rc;
+}
+
+static void
+lnet_net_remove_cpts(__u32 *cpts, __u32 ncpts, struct lnet_net *net)
+{
+ struct lnet_ni *ni;
+ int rc;
+
+ /*
+ * Operation Assumption:
+ * This function is called after an NI has been removed from
+ * its parent net.
+ *
+ * if we're removing an NI which exists on all CPTs then
+ * we have to check if any of the other NIs on this net also
+ * exists on all CPTs. If none, then we need to build our Net CPT
+ * list based on the remaining NIs.
+ *
+ * If the NI being removed exists on a subset of the CPTs then we
+ * also rebuild the Net CPT list based on the remaining NIs, which
+ * should result in the expected Net CPT list.
+ */
+
+ /*
+ * sometimes this function can be called due to some failure
+ * creating an NI, before any of the cpts are allocated, so check
+ * for that case and don't do anything
+ */
+ if (ncpts == 0)
+ return;
+
+ if (ncpts == LNET_CPT_NUMBER) {
+ /*
+ * first iteration through the NI list in the net to see
+ * if any of the NIs exist on all the CPTs. If one is
+ * found then our job is done.
+ */
+ list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+ if (ni->ni_ncpts == LNET_CPT_NUMBER)
+ return;
+ }
+ }
+
+ /*
+ * Rebuild the Net CPT list again, thereby including only the
+ * CPTs which the remaining NIs are associated with.
+ */
+ if (net->net_cpts != NULL) {
+ kfree(net->net_cpts);
+ net->net_cpts = NULL;
+ }
+
+ list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+ rc = lnet_net_append_cpts(ni->ni_cpts, ni->ni_ncpts,
+  net);
+ if (rc != 0) {
+ CERROR("Out of Memory\n");
+ /*
+ * do our best to keep on going. Delete
+ * the net cpts and set it to NULL. This
+ * way we can keep on going but less
+ * efficiently, since memory accesses might be
+ * across CPT lines.
+ */
+ if (net->net_cpts != NULL) {
+ kfree(net->net_cpts);
+ net->net_cpts = NULL;
+ net->net_ncpts = LNET_CPT_NUMBER;
+ }
+ return;
+ }
+ }
+}
+
void
lnet_ni_free(struct lnet_ni *ni)
{
int i;

+ lnet_net_remove_cpts(ni->ni_cpts, ni->ni_ncpts, ni->ni_net);
+
if (ni->ni_refs)
cfs_percpt_free(ni->ni_refs);

@@ -128,6 +286,9 @@ lnet_net_free(struct lnet_net *net)
lnet_ni_free(ni);
}

+ if (net->net_cpts != NULL)
+ kfree(net->net_cpts);
+
kfree(net);
}

@@ -229,6 +390,9 @@ lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
ni->ni_net_ns = NULL;

ni->ni_last_alive = ktime_get_real_seconds();
+ rc = lnet_net_append_cpts(ni->ni_cpts, ni->ni_ncpts, net);
+ if (rc != 0)
+ goto failed;
list_add_tail(&ni->ni_netlist, &net->net_ni_list);

return ni;




^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 06/34] lnet: store separate xmit/recv net-interface in each message.
  2018-09-07  0:49 ` [lustre-devel] [PATCH 06/34] lnet: store separate xmit/recv net-interface in each message NeilBrown
  2018-09-10 23:24   ` Doug Oucharek
@ 2018-09-10 23:29   ` James Simmons
  2018-09-10 23:36   ` James Simmons
  2 siblings, 0 replies; 98+ messages in thread
From: James Simmons @ 2018-09-10 23:29 UTC (permalink / raw)
  To: lustre-devel


> Currently we store the net-interface in the peer, but the
> peer should identify just the network, not the particular interface.
> To help track which actual interface is used for each
> message, store them explicitly.
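
To make the ownership rule concrete, here is a small userspace sketch of the pattern: the message takes a reference on the interface when it is attached and drops it exactly once when credits are returned. The types and function names are hypothetical, and a plain int counter stands in for the per-CPT refcounts that the real code updates under lnet_net_lock.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct lnet_ni and struct lnet_msg. */
struct sketch_ni {
	int refs;
};

struct sketch_msg {
	struct sketch_ni *txni;		/* NI the message is sent over */
	struct sketch_ni *rxni;		/* NI the message was received on */
};

/* Attach the sending NI; the message now holds one reference on it. */
static void sketch_msg_set_txni(struct sketch_msg *msg, struct sketch_ni *ni)
{
	msg->txni = ni;
	ni->refs++;
}

/* Attach the receiving NI; the message now holds one reference on it. */
static void sketch_msg_set_rxni(struct sketch_msg *msg, struct sketch_ni *ni)
{
	msg->rxni = ni;
	ni->refs++;
}

/* Drop the tx reference, if any.  Clearing the pointer makes this idempotent. */
static void sketch_msg_return_tx(struct sketch_msg *msg)
{
	struct sketch_ni *ni = msg->txni;

	if (ni != NULL) {
		msg->txni = NULL;
		ni->refs--;
	}
}

/* Drop the rx reference, if any. */
static void sketch_msg_return_rx(struct sketch_msg *msg)
{
	struct sketch_ni *ni = msg->rxni;

	if (ni != NULL) {
		msg->rxni = NULL;
		ni->refs--;
	}
}
```

Clearing the pointer before dropping the reference means a second call on the same message is a no-op, so the counter cannot go negative.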

Reviewed-by: James Simmons <jsimmons@infradead.org>

The below needs fixing based on response to cover letter. With a combo
patch the following works well.

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18274
Reviewed-on: http://review.whamcloud.com/20729
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Signed-off-by: NeilBrown <neilb@suse.com>
 
> This is part of
>     8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>        LU-7734 lnet: Multi-Rail local NI split
> 
> and includes commit 63c3e5129873 ("LU-7734 lnet: Fix lnet_msg_free()")
>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  .../staging/lustre/include/linux/lnet/lib-types.h  |    3 +++
>  drivers/staging/lustre/lnet/lnet/lib-move.c        |   21 ++++++++++++++++++--
>  2 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> index 5f0d4703bf86..16a493529a46 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> @@ -98,6 +98,9 @@ struct lnet_msg {
>  
>  	void			*msg_private;
>  	struct lnet_libmd	*msg_md;
> +	/* the NI the message was sent or received over */
> +	struct lnet_ni       *msg_txni;
> +	struct lnet_ni       *msg_rxni;
>  
>  	unsigned int		 msg_len;
>  	unsigned int		 msg_wanted;
> diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
> index 1c874025fa74..b2a52ddcefcb 100644
> --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
> +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
> @@ -782,6 +782,7 @@ lnet_return_tx_credits_locked(struct lnet_msg *msg)
>  {
>  	struct lnet_peer *txpeer = msg->msg_txpeer;
>  	struct lnet_msg *msg2;
> +	struct lnet_ni	*txni = msg->msg_txni;
>  
>  	if (msg->msg_txcredit) {
>  		struct lnet_ni *ni = txpeer->lp_ni;
> @@ -829,6 +830,11 @@ lnet_return_tx_credits_locked(struct lnet_msg *msg)
>  		}
>  	}
>  
> +	if (txni != NULL) {
> +		msg->msg_txni = NULL;
> +		lnet_ni_decref_locked(txni, msg->msg_tx_cpt);
> +	}
> +
>  	if (txpeer) {
>  		msg->msg_txpeer = NULL;
>  		lnet_peer_decref_locked(txpeer);
> @@ -876,6 +882,7 @@ void
>  lnet_return_rx_credits_locked(struct lnet_msg *msg)
>  {
>  	struct lnet_peer *rxpeer = msg->msg_rxpeer;
> +	struct lnet_ni	*rxni = msg->msg_rxni;
>  	struct lnet_msg *msg2;
>  
>  	if (msg->msg_rtrcredit) {
> @@ -951,6 +958,10 @@ lnet_return_rx_credits_locked(struct lnet_msg *msg)
>  			(void)lnet_post_routed_recv_locked(msg2, 1);
>  		}
>  	}
> +	if (rxni != NULL) {
> +		msg->msg_rxni = NULL;
> +		lnet_ni_decref_locked(rxni, msg->msg_rx_cpt);
> +	}
>  	if (rxpeer) {
>  		msg->msg_rxpeer = NULL;
>  		lnet_peer_decref_locked(rxpeer);
> @@ -1218,9 +1229,12 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
>  
>  	LASSERT(!msg->msg_peertxcredit);
>  	LASSERT(!msg->msg_txcredit);
> -	LASSERT(!msg->msg_txpeer);
> +	LASSERT(msg->msg_txpeer == NULL);
>  
> -	msg->msg_txpeer = lp;		   /* msg takes my ref on lp */
> +	msg->msg_txpeer = lp;                   /* msg takes my ref on lp */
> +	/* set the NI for this message */
> +	msg->msg_txni = src_ni;
> +	lnet_ni_addref_locked(msg->msg_txni, cpt);
>  
>  	rc = lnet_post_send_locked(msg, 0);
>  	lnet_net_unlock(cpt);
> @@ -1818,6 +1832,8 @@ lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid,
>  			return 0;
>  		goto drop;
>  	}
> +	msg->msg_rxni = ni;
> +	lnet_ni_addref_locked(ni, cpt);
>  
>  	if (lnet_isrouter(msg->msg_rxpeer)) {
>  		lnet_peer_set_alive(msg->msg_rxpeer);
> @@ -1934,6 +1950,7 @@ lnet_recv_delayed_msg_list(struct list_head *head)
>  		LASSERT(msg->msg_rx_delayed);
>  		LASSERT(msg->msg_md);
>  		LASSERT(msg->msg_rxpeer);
> +		LASSERT(msg->msg_rxni);
>  		LASSERT(msg->msg_hdr.type == LNET_MSG_PUT);
>  
>  		CDEBUG(D_NET, "Resuming delayed PUT from %s portal %d match %llu offset %d length %d.\n",
> 
> 
> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 10/34] lnet: add ni arg to lnet_cpt_of_nid()
  2018-09-07  0:49 ` [lustre-devel] [PATCH 10/34] lnet: add ni arg to lnet_cpt_of_nid() NeilBrown
@ 2018-09-10 23:32   ` Doug Oucharek
  2018-09-11  1:03   ` James Simmons
  1 sibling, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-10 23:32 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On Sep 6, 2018, at 5:49 PM, NeilBrown <neilb@suse.com> wrote:

When choosing a cpt to use for a given network (identified by nid),
the choice might depend on a particular interface which has
already been identified - different interfaces can have different
sets of cpts.

So add an 'ni' arg to lnet_cpt_of_nid(). If given, choose a cpt
from the cpts of that interface. If not given, choose one from
the set of all cpts associated with any interface on the network.
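
The selection order described above can be sketched in userspace as follows. Everything here is a hypothetical stand-in: SKETCH_CPT_NUMBER replaces LNET_CPT_NUMBER, the modulo hash replaces lnet_nid_cpt_hash(), and the net is passed in directly, whereas the patch looks it up from the nid.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CPT_NUMBER 4U	/* hypothetical stand-in for LNET_CPT_NUMBER */

struct sketch_net {
	const uint32_t *cpts;	/* union over all interfaces; NULL = all CPTs */
	uint32_t ncpts;
};

struct sketch_ni {
	const uint32_t *cpts;	/* CPTs of this interface; NULL = all CPTs */
	uint32_t ncpts;
};

/* Placeholder hash: maps nid into [0, number). */
static uint32_t sketch_nid_hash(uint64_t nid, uint32_t number)
{
	return (uint32_t)(nid % number);
}

/*
 * Pick a CPT for nid.  An explicit interface wins; otherwise fall back
 * to the net's cumulative list; otherwise hash across every CPT.
 */
static uint32_t sketch_cpt_of_nid(uint64_t nid, const struct sketch_ni *ni,
				  const struct sketch_net *net)
{
	if (ni != NULL) {
		if (ni->cpts != NULL)
			return ni->cpts[sketch_nid_hash(nid, ni->ncpts)];
		/* interface is bound to all CPTs */
		return sketch_nid_hash(nid, SKETCH_CPT_NUMBER);
	}

	if (net != NULL && net->cpts != NULL)
		return net->cpts[sketch_nid_hash(nid, net->ncpts)];

	return sketch_nid_hash(nid, SKETCH_CPT_NUMBER);
}
```

Passing ni == NULL gives the per-net behaviour; passing an interface narrows the choice to that interface's own CPTs.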

This is part of
   8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
      LU-7734 lnet: Multi-Rail local NI split

Signed-off-by: NeilBrown <neilb@suse.com>
---
.../staging/lustre/include/linux/lnet/lib-lnet.h   |    4 +-
.../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    4 +-
.../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |    2 -
.../staging/lustre/lnet/klnds/socklnd/socklnd.c    |    4 +-
drivers/staging/lustre/lnet/lnet/api-ni.c          |   41 ++++++++++++--------
drivers/staging/lustre/lnet/lnet/lib-move.c        |   12 +++---
drivers/staging/lustre/lnet/lnet/lib-ptl.c         |    2 -
drivers/staging/lustre/lnet/lnet/peer.c            |    4 +-
drivers/staging/lustre/lnet/lnet/router.c          |    4 +-
drivers/staging/lustre/lnet/selftest/brw_test.c    |    2 -
drivers/staging/lustre/lnet/selftest/framework.c   |    3 +
drivers/staging/lustre/lnet/selftest/selftest.h    |    2 -
12 files changed, 48 insertions(+), 36 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 34509e52bac7..e32dbb854d80 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -395,8 +395,8 @@ lnet_net2rnethash(__u32 net)
extern struct lnet_lnd the_lolnd;
extern int avoid_asym_router_failure;

-int lnet_cpt_of_nid_locked(lnet_nid_t nid);
-int lnet_cpt_of_nid(lnet_nid_t nid);
+int lnet_cpt_of_nid_locked(lnet_nid_t nid, struct lnet_ni *ni);
+int lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni);
struct lnet_ni *lnet_nid2ni_locked(lnet_nid_t nid, int cpt);
struct lnet_ni *lnet_net2ni_locked(__u32 net, int cpt);
struct lnet_ni *lnet_net2ni(__u32 net);
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
index ade566d20c69..958ac9a99045 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
@@ -320,7 +320,7 @@ int kiblnd_create_peer(struct lnet_ni *ni, struct kib_peer **peerp,
{
struct kib_peer *peer;
struct kib_net *net = ni->ni_data;
- int cpt = lnet_cpt_of_nid(nid);
+ int cpt = lnet_cpt_of_nid(nid, ni);
unsigned long flags;

LASSERT(net);
@@ -643,7 +643,7 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer *peer, struct rdma_cm_id *cm

dev = net->ibn_dev;

- cpt = lnet_cpt_of_nid(peer->ibp_nid);
+ cpt = lnet_cpt_of_nid(peer->ibp_nid, peer->ibp_ni);
sched = kiblnd_data.kib_scheds[cpt];

LASSERT(sched->ibs_nthreads > 0);
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index c266940cb2ae..e64c14914924 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -119,7 +119,7 @@ kiblnd_get_idle_tx(struct lnet_ni *ni, lnet_nid_t target)
struct kib_tx *tx;
struct kib_tx_poolset *tps;

- tps = net->ibn_tx_ps[lnet_cpt_of_nid(target)];
+ tps = net->ibn_tx_ps[lnet_cpt_of_nid(target, ni)];
node = kiblnd_pool_alloc_node(&tps->tps_poolset);
if (!node)
return NULL;
diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
index 2036a0ae5917..ba68bcee90bc 100644
--- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
+++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
@@ -101,7 +101,7 @@ static int
ksocknal_create_peer(struct ksock_peer **peerp, struct lnet_ni *ni,
    struct lnet_process_id id)
{
- int cpt = lnet_cpt_of_nid(id.nid);
+ int cpt = lnet_cpt_of_nid(id.nid, ni);
struct ksock_net *net = ni->ni_data;
struct ksock_peer *peer;

@@ -1099,7 +1099,7 @@ ksocknal_create_conn(struct lnet_ni *ni, struct ksock_route *route,
LASSERT(conn->ksnc_proto);
LASSERT(peerid.nid != LNET_NID_ANY);

- cpt = lnet_cpt_of_nid(peerid.nid);
+ cpt = lnet_cpt_of_nid(peerid.nid, ni);

if (active) {
ksocknal_peer_addref(peer);
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index c21aef32cdde..6e0b8310574d 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -713,31 +713,41 @@ lnet_nid_cpt_hash(lnet_nid_t nid, unsigned int number)
}

int
-lnet_cpt_of_nid_locked(lnet_nid_t nid)
+lnet_cpt_of_nid_locked(lnet_nid_t nid, struct lnet_ni *ni)
{
- struct lnet_ni *ni;
+ struct lnet_net *net;

/* must called with hold of lnet_net_lock */
if (LNET_CPT_NUMBER == 1)
return 0; /* the only one */

- /* take lnet_net_lock(any) would be OK */
- if (!list_empty(&the_lnet.ln_nis_cpt)) {
- list_for_each_entry(ni, &the_lnet.ln_nis_cpt, ni_cptlist) {
- if (LNET_NIDNET(ni->ni_nid) != LNET_NIDNET(nid))
- continue;
+ /*
+ * If NI is provided then use the CPT identified in the NI cpt
+ * list if one exists. If one doesn't exist, then that NI is
+ * associated with all CPTs and it follows that the net it belongs
+ * to is implicitly associated with all CPTs, so just hash the nid
+ * and return that.
+ */
+ if (ni != NULL) {
+ if (ni->ni_cpts != NULL)
+ return ni->ni_cpts[lnet_nid_cpt_hash(nid,
+     ni->ni_ncpts)];
+ else
+ return lnet_nid_cpt_hash(nid, LNET_CPT_NUMBER);
+ }

- LASSERT(ni->ni_cpts);
- return ni->ni_cpts[lnet_nid_cpt_hash
-   (nid, ni->ni_ncpts)];
- }
+ /* no NI provided so look at the net */
+ net = lnet_get_net_locked(LNET_NIDNET(nid));
+
+ if (net != NULL && net->net_cpts) {
+ return net->net_cpts[lnet_nid_cpt_hash(nid, net->net_ncpts)];
}

return lnet_nid_cpt_hash(nid, LNET_CPT_NUMBER);
}

int
-lnet_cpt_of_nid(lnet_nid_t nid)
+lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni)
{
int cpt;
int cpt2;
@@ -745,11 +755,10 @@ lnet_cpt_of_nid(lnet_nid_t nid)
if (LNET_CPT_NUMBER == 1)
return 0; /* the only one */

- if (list_empty(&the_lnet.ln_nis_cpt))
- return lnet_nid_cpt_hash(nid, LNET_CPT_NUMBER);
-
cpt = lnet_net_lock_current();
- cpt2 = lnet_cpt_of_nid_locked(nid);
+
+ cpt2 = lnet_cpt_of_nid_locked(nid, ni);
+
lnet_net_unlock(cpt);

return cpt2;
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index b6e81a693fc3..02cd1a5a466f 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -1095,7 +1095,9 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
msg->msg_sending = 1;

LASSERT(!msg->msg_tx_committed);
- cpt = lnet_cpt_of_nid(rtr_nid == LNET_NID_ANY ? dst_nid : rtr_nid);
+ local_ni = lnet_net2ni(LNET_NIDNET(dst_nid));
+ cpt = lnet_cpt_of_nid(rtr_nid == LNET_NID_ANY ? dst_nid : rtr_nid,
+      local_ni);
 again:
lnet_net_lock(cpt);

@@ -1188,7 +1190,7 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
* was changed when we release the lock
*/
if (rtr_nid != lp->lp_nid) {
- cpt2 = lnet_cpt_of_nid_locked(lp->lp_nid);
+ cpt2 = lp->lp_cpt;
if (cpt2 != cpt) {
if (src_ni)
lnet_ni_decref_locked(src_ni, cpt);
@@ -1677,7 +1679,7 @@ lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid,
payload_length = le32_to_cpu(hdr->payload_length);

for_me = (ni->ni_nid == dest_nid);
- cpt = lnet_cpt_of_nid(from_nid);
+ cpt = lnet_cpt_of_nid(from_nid, ni);

switch (type) {
case LNET_MSG_ACK:
@@ -2149,7 +2151,7 @@ lnet_create_reply_msg(struct lnet_ni *ni, struct lnet_msg *getmsg)
lnet_msg_attach_md(msg, getmd, getmd->md_offset, getmd->md_length);
lnet_res_unlock(cpt);

- cpt = lnet_cpt_of_nid(peer_id.nid);
+ cpt = lnet_cpt_of_nid(peer_id.nid, ni);

lnet_net_lock(cpt);
lnet_msg_commit(msg, cpt);
@@ -2160,7 +2162,7 @@ lnet_create_reply_msg(struct lnet_ni *ni, struct lnet_msg *getmsg)
return msg;

 drop:
- cpt = lnet_cpt_of_nid(peer_id.nid);
+ cpt = lnet_cpt_of_nid(peer_id.nid, ni);

lnet_net_lock(cpt);
the_lnet.ln_counters[cpt]->drop_count++;
diff --git a/drivers/staging/lustre/lnet/lnet/lib-ptl.c b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
index 90ce51801726..c8d8162cc706 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-ptl.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
@@ -220,7 +220,7 @@ lnet_match2mt(struct lnet_portal *ptl, struct lnet_process_id id, __u64 mbits)

/* if it's a unique portal, return match-table hashed by NID */
return lnet_ptl_is_unique(ptl) ?
-       ptl->ptl_mtables[lnet_cpt_of_nid(id.nid)] : NULL;
+       ptl->ptl_mtables[lnet_cpt_of_nid(id.nid, NULL)] : NULL;
}

struct lnet_match_table *
diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c
index ed29124ebded..808ce25f1f00 100644
--- a/drivers/staging/lustre/lnet/lnet/peer.c
+++ b/drivers/staging/lustre/lnet/lnet/peer.c
@@ -270,7 +270,7 @@ lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt)
return -ESHUTDOWN;

/* cpt can be LNET_LOCK_EX if it's called from router functions */
- cpt2 = cpt != LNET_LOCK_EX ? cpt : lnet_cpt_of_nid_locked(nid);
+ cpt2 = cpt != LNET_LOCK_EX ? cpt : lnet_cpt_of_nid_locked(nid, NULL);

ptable = the_lnet.ln_peer_tables[cpt2];
lp = lnet_find_peer_locked(ptable, nid);
@@ -362,7 +362,7 @@ lnet_debug_peer(lnet_nid_t nid)
int rc;
int cpt;

- cpt = lnet_cpt_of_nid(nid);
+ cpt = lnet_cpt_of_nid(nid, NULL);
lnet_net_lock(cpt);

rc = lnet_nid2peer_locked(&lp, nid, cpt);
diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index 72b8ca2b0fc6..5493d13de6d9 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -1207,7 +1207,7 @@ lnet_router_checker(void *arg)
version = the_lnet.ln_routers_version;

list_for_each_entry(rtr, &the_lnet.ln_routers, lp_rtr_list) {
- cpt2 = lnet_cpt_of_nid_locked(rtr->lp_nid);
+ cpt2 = rtr->lp_cpt;
if (cpt != cpt2) {
lnet_net_unlock(cpt);
cpt = cpt2;
@@ -1693,7 +1693,7 @@ lnet_notify(struct lnet_ni *ni, lnet_nid_t nid, int alive, time64_t when)
{
struct lnet_peer *lp = NULL;
time64_t now = ktime_get_seconds();
- int cpt = lnet_cpt_of_nid(nid);
+ int cpt = lnet_cpt_of_nid(nid, ni);

LASSERT(!in_interrupt());

diff --git a/drivers/staging/lustre/lnet/selftest/brw_test.c b/drivers/staging/lustre/lnet/selftest/brw_test.c
index f1ee219bc8f3..e372ff3044c8 100644
--- a/drivers/staging/lustre/lnet/selftest/brw_test.c
+++ b/drivers/staging/lustre/lnet/selftest/brw_test.c
@@ -124,7 +124,7 @@ brw_client_init(struct sfw_test_instance *tsi)
return -EINVAL;

list_for_each_entry(tsu, &tsi->tsi_units, tsu_list) {
- bulk = srpc_alloc_bulk(lnet_cpt_of_nid(tsu->tsu_dest.nid),
+ bulk = srpc_alloc_bulk(lnet_cpt_of_nid(tsu->tsu_dest.nid, NULL),
      off, npg, len, opc == LST_BRW_READ);
if (!bulk) {
brw_client_fini(tsi);
diff --git a/drivers/staging/lustre/lnet/selftest/framework.c b/drivers/staging/lustre/lnet/selftest/framework.c
index 944a2a6598fa..a82efc394659 100644
--- a/drivers/staging/lustre/lnet/selftest/framework.c
+++ b/drivers/staging/lustre/lnet/selftest/framework.c
@@ -1013,7 +1013,8 @@ sfw_run_batch(struct sfw_batch *tsb)
tsu->tsu_loop = tsi->tsi_loop;
wi = &tsu->tsu_worker;
swi_init_workitem(wi, sfw_run_test,
-  lst_test_wq[lnet_cpt_of_nid(tsu->tsu_dest.nid)]);
+  lst_test_wq[lnet_cpt_of_nid(tsu->tsu_dest.nid,
+  NULL)]);
swi_schedule_workitem(wi);
}
}
diff --git a/drivers/staging/lustre/lnet/selftest/selftest.h b/drivers/staging/lustre/lnet/selftest/selftest.h
index 9dbb0a51d430..edf783af90e8 100644
--- a/drivers/staging/lustre/lnet/selftest/selftest.h
+++ b/drivers/staging/lustre/lnet/selftest/selftest.h
@@ -527,7 +527,7 @@ srpc_init_client_rpc(struct srpc_client_rpc *rpc, struct lnet_process_id peer,

INIT_LIST_HEAD(&rpc->crpc_list);
swi_init_workitem(&rpc->crpc_wi, srpc_send_rpc,
-  lst_test_wq[lnet_cpt_of_nid(peer.nid)]);
+  lst_test_wq[lnet_cpt_of_nid(peer.nid, NULL)]);
spin_lock_init(&rpc->crpc_lock);
atomic_set(&rpc->crpc_refcount, 1); /* 1 ref for caller */





^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 06/34] lnet: store separate xmit/recv net-interface in each message.
  2018-09-07  0:49 ` [lustre-devel] [PATCH 06/34] lnet: store separate xmit/recv net-interface in each message NeilBrown
  2018-09-10 23:24   ` Doug Oucharek
  2018-09-10 23:29   ` James Simmons
@ 2018-09-10 23:36   ` James Simmons
  2 siblings, 0 replies; 98+ messages in thread
From: James Simmons @ 2018-09-10 23:36 UTC (permalink / raw)
  To: lustre-devel


> Currently we store the net-interface in the peer, but the
> peer should identify just the network, not the particular interface.
> To help track which actual interface is used for each
> message, store them explicitly.
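
For readers following along, here is a minimal user-space sketch of the ownership rule this adds. It assumes simplified stand-in types: `sketch_ni`, `sketch_msg` and the helper names are hypothetical, and a single counter replaces the per-CPT references that lnet_ni_addref_locked() and lnet_ni_decref_locked() manage in the patch. The message takes a reference when an NI is recorded and drops it exactly once on release.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct lnet_ni. */
struct sketch_ni {
	int refs;	/* simplified: one counter instead of per-CPT refs */
};

/* Hypothetical stand-in for the two new fields in struct lnet_msg. */
struct sketch_msg {
	struct sketch_ni *txni;	/* NI the message is sent over */
	struct sketch_ni *rxni;	/* NI the message was received on */
};

/* Record the sending NI; the message holds a reference from now on. */
void sketch_msg_set_txni(struct sketch_msg *msg, struct sketch_ni *ni)
{
	msg->txni = ni;
	ni->refs++;
}

/* Record the receiving NI; the message holds a reference from now on. */
void sketch_msg_set_rxni(struct sketch_msg *msg, struct sketch_ni *ni)
{
	msg->rxni = ni;
	ni->refs++;
}

/* Drop whichever NI references the message still holds, once each. */
void sketch_msg_release(struct sketch_msg *msg)
{
	if (msg->txni != NULL) {
		struct sketch_ni *ni = msg->txni;

		msg->txni = NULL;	/* clear first so a second call is a no-op */
		ni->refs--;
	}
	if (msg->rxni != NULL) {
		struct sketch_ni *ni = msg->rxni;

		msg->rxni = NULL;
		ni->refs--;
	}
}
```

Clearing the pointer before dropping the reference mirrors the order used in the quoted lnet_return_tx_credits_locked() and lnet_return_rx_credits_locked() hunks, so a message cannot release the same NI twice.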

Reviewed-by: James Simmons <jsimmons@infradead.org>

The part below needs fixing based on the response to the cover letter. With a
combined patch, the following works well:

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18274
Reviewed-on: http://review.whamcloud.com/20729
Reviewed-by: Doug Oucharek <dougso@me.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Signed-off-by: NeilBrown <neilb@suse.com>
 
> This is part of
>     8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>        LU-7734 lnet: Multi-Rail local NI split
> 
> and includes commit 63c3e5129873 ("LU-7734 lnet: Fix lnet_msg_free()")
>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  .../staging/lustre/include/linux/lnet/lib-types.h  |    3 +++
>  drivers/staging/lustre/lnet/lnet/lib-move.c        |   21 ++++++++++++++++++--
>  2 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> index 5f0d4703bf86..16a493529a46 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> @@ -98,6 +98,9 @@ struct lnet_msg {
>  
>  	void			*msg_private;
>  	struct lnet_libmd	*msg_md;
> +	/* the NI the message was sent or received over */
> +	struct lnet_ni       *msg_txni;
> +	struct lnet_ni       *msg_rxni;
>  
>  	unsigned int		 msg_len;
>  	unsigned int		 msg_wanted;
> diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
> index 1c874025fa74..b2a52ddcefcb 100644
> --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
> +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
> @@ -782,6 +782,7 @@ lnet_return_tx_credits_locked(struct lnet_msg *msg)
>  {
>  	struct lnet_peer *txpeer = msg->msg_txpeer;
>  	struct lnet_msg *msg2;
> +	struct lnet_ni	*txni = msg->msg_txni;
>  
>  	if (msg->msg_txcredit) {
>  		struct lnet_ni *ni = txpeer->lp_ni;
> @@ -829,6 +830,11 @@ lnet_return_tx_credits_locked(struct lnet_msg *msg)
>  		}
>  	}
>  
> +	if (txni != NULL) {
> +		msg->msg_txni = NULL;
> +		lnet_ni_decref_locked(txni, msg->msg_tx_cpt);
> +	}
> +
>  	if (txpeer) {
>  		msg->msg_txpeer = NULL;
>  		lnet_peer_decref_locked(txpeer);
> @@ -876,6 +882,7 @@ void
>  lnet_return_rx_credits_locked(struct lnet_msg *msg)
>  {
>  	struct lnet_peer *rxpeer = msg->msg_rxpeer;
> +	struct lnet_ni	*rxni = msg->msg_rxni;
>  	struct lnet_msg *msg2;
>  
>  	if (msg->msg_rtrcredit) {
> @@ -951,6 +958,10 @@ lnet_return_rx_credits_locked(struct lnet_msg *msg)
>  			(void)lnet_post_routed_recv_locked(msg2, 1);
>  		}
>  	}
> +	if (rxni != NULL) {
> +		msg->msg_rxni = NULL;
> +		lnet_ni_decref_locked(rxni, msg->msg_rx_cpt);
> +	}
>  	if (rxpeer) {
>  		msg->msg_rxpeer = NULL;
>  		lnet_peer_decref_locked(rxpeer);
> @@ -1218,9 +1229,12 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
>  
>  	LASSERT(!msg->msg_peertxcredit);
>  	LASSERT(!msg->msg_txcredit);
> -	LASSERT(!msg->msg_txpeer);
> +	LASSERT(msg->msg_txpeer == NULL);
>  
> -	msg->msg_txpeer = lp;		   /* msg takes my ref on lp */
> +	msg->msg_txpeer = lp;                   /* msg takes my ref on lp */
> +	/* set the NI for this message */
> +	msg->msg_txni = src_ni;
> +	lnet_ni_addref_locked(msg->msg_txni, cpt);
>  
>  	rc = lnet_post_send_locked(msg, 0);
>  	lnet_net_unlock(cpt);
> @@ -1818,6 +1832,8 @@ lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid,
>  			return 0;
>  		goto drop;
>  	}
> +	msg->msg_rxni = ni;
> +	lnet_ni_addref_locked(ni, cpt);
>  
>  	if (lnet_isrouter(msg->msg_rxpeer)) {
>  		lnet_peer_set_alive(msg->msg_rxpeer);
> @@ -1934,6 +1950,7 @@ lnet_recv_delayed_msg_list(struct list_head *head)
>  		LASSERT(msg->msg_rx_delayed);
>  		LASSERT(msg->msg_md);
>  		LASSERT(msg->msg_rxpeer);
> +		LASSERT(msg->msg_rxni);
>  		LASSERT(msg->msg_hdr.type == LNET_MSG_PUT);
>  
>  		CDEBUG(D_NET, "Resuming delayed PUT from %s portal %d match %llu offset %d length %d.\n",
> 
> 
> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 08/34] lnet: add cpt to lnet_match_info.
  2018-09-07  0:49 ` [lustre-devel] [PATCH 08/34] lnet: add cpt to lnet_match_info NeilBrown
  2018-09-10 23:25   ` Doug Oucharek
@ 2018-09-11  1:01   ` James Simmons
  2018-09-11  1:01   ` [lustre-devel] BRe: " James Simmons
  2 siblings, 0 replies; 98+ messages in thread
From: James Simmons @ 2018-09-11  1:01 UTC (permalink / raw)
  To: lustre-devel


> This seems to be a more direct way to get the cpt
> needed in lnet_mt_of_match().

After talking to Doug, a better commit description would be:

This allows LNet to handle a change to a different CPT if the peer
changes. The NID we are sending to can change based on multirail
behavior.
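
A small sketch of the selection this affects, assuming simplified stand-ins (`sketch_match_info` and `sketch_pick_cpt` are hypothetical names; `hash_routed` stands for the `portal_rotor == LNET_PTL_ROTOR_HASH_RT && routed` test in lnet_mt_of_match()). The point is only that the CPT is read from a value stored with the match info instead of being recomputed from the source NID.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the one field this patch adds. */
struct sketch_match_info {
	unsigned int mi_cpt;	/* CPT recorded when the message was parsed */
};

/*
 * Pick the CPT whose match table should be searched.
 * The rotor is advanced on every call, as in the quoted code;
 * it is only used for the round-robin case.
 */
unsigned int sketch_pick_cpt(const struct sketch_match_info *info,
			     bool hash_routed, unsigned int *rotor,
			     unsigned int ncpts)
{
	unsigned int r = (*rotor)++;	/* round-robin factor */

	if (hash_routed)
		return info->mi_cpt;	/* follow the peer's CPT */

	return r % ncpts;
}
```

With the stored value, a peer whose CPT changes is picked up the next time the match info is filled in, which is the behaviour the suggested description is getting at.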
 
I've CC'd Olaf as well for his comments. Code-wise it's good.

Reviewed-by: James Simmons <jsimmons@infradead.org>

The below needs fixing based on response to cover letter.

> This is part of
>     8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>        LU-7734 lnet: Multi-Rail local NI split
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  .../staging/lustre/include/linux/lnet/lib-types.h  |    1 +
>  drivers/staging/lustre/lnet/lnet/lib-move.c        |    1 +
>  drivers/staging/lustre/lnet/lnet/lib-ptl.c         |    2 +-
>  3 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> index 255c6c4bbb89..2d2c066a11ba 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> @@ -511,6 +511,7 @@ enum lnet_match_flags {
>  struct lnet_match_info {
>  	__u64			mi_mbits;
>  	struct lnet_process_id	mi_id;
> +	unsigned int		mi_cpt;
>  	unsigned int		mi_opc;
>  	unsigned int		mi_portal;
>  	unsigned int		mi_rlength;
> diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
> index b8b15f56a275..b6e81a693fc3 100644
> --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
> +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
> @@ -1303,6 +1303,7 @@ lnet_parse_put(struct lnet_ni *ni, struct lnet_msg *msg)
>  	info.mi_rlength	= hdr->payload_length;
>  	info.mi_roffset	= hdr->msg.put.offset;
>  	info.mi_mbits	= hdr->msg.put.match_bits;
> +	info.mi_cpt	= msg->msg_rxpeer->lp_cpt;
>  
>  	msg->msg_rx_ready_delay = !ni->ni_net->net_lnd->lnd_eager_recv;
>  	ready_delay = msg->msg_rx_ready_delay;
> diff --git a/drivers/staging/lustre/lnet/lnet/lib-ptl.c b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
> index 4c5737083422..90ce51801726 100644
> --- a/drivers/staging/lustre/lnet/lnet/lib-ptl.c
> +++ b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
> @@ -292,7 +292,7 @@ lnet_mt_of_match(struct lnet_match_info *info, struct lnet_msg *msg)
>  
>  	rotor = ptl->ptl_rotor++; /* get round-robin factor */
>  	if (portal_rotor == LNET_PTL_ROTOR_HASH_RT && routed)
> -		cpt = lnet_cpt_of_nid(msg->msg_hdr.src_nid);
> +		cpt = info->mi_cpt;
>  	else
>  		cpt = rotor % LNET_CPT_NUMBER;
>  
> 
> 
> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 09/34] lnet: add list of cpts to lnet_net.
  2018-09-07  0:49 ` [lustre-devel] [PATCH 09/34] lnet: add list of cpts to lnet_net NeilBrown
  2018-09-10 23:28   ` Doug Oucharek
@ 2018-09-11  1:02   ` James Simmons
  1 sibling, 0 replies; 98+ messages in thread
From: James Simmons @ 2018-09-11  1:02 UTC (permalink / raw)
  To: lustre-devel


> struct lnet_net now has a list of cpts, which is the union
> of the cpts for each lnet_ni.
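
To make "union of the cpts" concrete, here is a user-space sketch under stated assumptions: `sketch_net`, `sketch_in_array` and `sketch_net_append_cpts` are hypothetical names, malloc()/free() stand in for the kernel allocators, and the "NI exists on all CPTs" case (a NULL array) is left out. In this sketch `sketch_in_array()` returns true on a match, which is what a `!in_array()` caller expects, and every size is counted in elements times `sizeof(*array)`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the two fields added to struct lnet_net. */
struct sketch_net {
	uint32_t ncpts;		/* number of entries in cpts[] */
	uint32_t *cpts;		/* union of the CPTs of all NIs */
};

/* Return true if value is present in array[0..size). */
bool sketch_in_array(const uint32_t *array, uint32_t size, uint32_t value)
{
	uint32_t i;

	for (i = 0; i < size; i++) {
		if (array[i] == value)
			return true;
	}
	return false;
}

/* Merge cpts[0..ncpts) into the net's list; 0 on success, -1 on OOM. */
int sketch_net_append_cpts(struct sketch_net *net, const uint32_t *cpts,
			   uint32_t ncpts)
{
	uint32_t *merged;
	uint32_t i, n = net->ncpts;

	if (ncpts == 0)
		return 0;

	/* Worst case: every incoming CPT is new. */
	merged = malloc((n + ncpts) * sizeof(*merged));
	if (merged == NULL)
		return -1;

	if (n > 0)
		memcpy(merged, net->cpts, n * sizeof(*merged));

	for (i = 0; i < ncpts; i++) {
		if (!sketch_in_array(merged, n, cpts[i]))
			merged[n++] = cpts[i];
	}

	free(net->cpts);
	net->cpts = merged;
	net->ncpts = n;		/* keep the count in step with the array */
	return 0;
}
```

Appending {0, 2} and then {2, 3} to an empty net leaves it with the three CPTs 0, 2 and 3.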

Reviewed-by: James Simmons <jsimmons@infradead.org>
The below needs fixing based on response to cover letter.
 
> This is part of
>     8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>        LU-7734 lnet: Multi-Rail local NI split
>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  .../staging/lustre/include/linux/lnet/lib-types.h  |    6 +
>  drivers/staging/lustre/lnet/lnet/config.c          |  164 ++++++++++++++++++++
>  2 files changed, 170 insertions(+)
> 
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> index 2d2c066a11ba..22957d142cc0 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> @@ -266,6 +266,12 @@ struct lnet_net {
>  	 * lnet/include/lnet/nidstr.h */
>  	__u32			net_id;
>  
> +	/* total number of CPTs in the array */
> +	__u32			net_ncpts;
> +
> +	/* cumulative CPTs of all NIs in this net */
> +	__u32			*net_cpts;
> +
>  	/* network tunables */
>  	struct lnet_ioctl_config_lnd_cmn_tunables net_tunables;
>  
> diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
> index e83bdbec11e3..380a3fb1caba 100644
> --- a/drivers/staging/lustre/lnet/lnet/config.c
> +++ b/drivers/staging/lustre/lnet/lnet/config.c
> @@ -91,11 +91,169 @@ lnet_net_unique(__u32 net, struct list_head *netlist)
>  	return true;
>  }
>  
> +static bool
> +in_array(__u32 *array, __u32 size, __u32 value)
> +{
> +	int i;
> +
> +	for (i = 0; i < size; i++) {
> +		if (array[i] == value)
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
> +static int
> +lnet_net_append_cpts(__u32 *cpts, __u32 ncpts, struct lnet_net *net)
> +{
> +	__u32 *added_cpts = NULL;
> +	int i, j = 0, rc = 0;
> +
> +	/*
> +	 * no need to go further since a subset of the NIs already exist on
> +	 * all CPTs
> +	 */
> +	if (net->net_ncpts == LNET_CPT_NUMBER)
> +		return 0;
> +
> +	if (cpts == NULL) {
> +		/* there is an NI which will exist on all CPTs */
> +		if (net->net_cpts != NULL)
> +			kvfree(net->net_cpts);
> +		net->net_cpts = NULL;
> +		net->net_ncpts = LNET_CPT_NUMBER;
> +		return 0;
> +	}
> +
> +	if (net->net_cpts == NULL) {
> +		net->net_cpts = kmalloc_array(ncpts, sizeof(net->net_cpts),
> +					      GFP_KERNEL);
> +		if (net->net_cpts == NULL)
> +			return -ENOMEM;
> +		memcpy(net->net_cpts, cpts, ncpts);
> +		return 0;
> +	}
> +
> +	added_cpts = kmalloc_array(LNET_CPT_NUMBER, sizeof(*added_cpts),
> +				   GFP_KERNEL);
> +	if (added_cpts == NULL)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < ncpts; i++) {
> +		if (!in_array(net->net_cpts, net->net_ncpts, cpts[i])) {
> +			added_cpts[j] = cpts[i];
> +			j++;
> +		}
> +	}
> +
> +	/* append the new cpts if any to the list of cpts in the net */
> +	if (j > 0) {
> +		__u32 *array = NULL, *loc;
> +		__u32 total_entries = j + net->net_ncpts;
> +
> +		array = kmalloc_array(total_entries, sizeof(*net->net_cpts),
> +				      GFP_KERNEL);
> +		if (array == NULL) {
> +			rc = -ENOMEM;
> +			goto failed;
> +		}
> +
> +		memcpy(array, net->net_cpts,
> +		       net->net_ncpts * sizeof(*net->net_cpts));
> +		loc = array + net->net_ncpts;
> +		memcpy(loc, added_cpts, j * sizeof(*net->net_cpts));
> +
> +		kfree(net->net_cpts);
> +		net->net_ncpts = total_entries;
> +		net->net_cpts = array;
> +	}
> +
> +failed:
> +	kfree(added_cpts);
> +
> +	return rc;
> +}
> +
> +static void
> +lnet_net_remove_cpts(__u32 *cpts, __u32 ncpts, struct lnet_net *net)
> +{
> +	struct lnet_ni *ni;
> +	int rc;
> +
> +	/*
> +	 * Operation Assumption:
> +	 *	This function is called after an NI has been removed from
> +	 *	its parent net.
> +	 *
> +	 * if we're removing an NI which exists on all CPTs then
> +	 * we have to check if any of the other NIs on this net also
> +	 * exists on all CPTs. If none, then we need to build our Net CPT
> +	 * list based on the remaining NIs.
> +	 *
> +	 * If the NI being removed exists on a subset of the CPTs then we
> +	 * also rebuild the Net CPT list based on the remaining NIs, which
> +	 * should result in the expected Net CPT list.
> +	 */
> +
> +	/*
> +	 * sometimes this function can be called due to some failure
> +	 * creating an NI, before any of the cpts are allocated, so check
> +	 * for that case and don't do anything
> +	 */
> +	if (ncpts == 0)
> +		return;
> +
> +	if (ncpts == LNET_CPT_NUMBER) {
> +		/*
> +		 * first iteration through the NI list in the net to see
> +		 * if any of the NIs exist on all the CPTs. If one is
> +		 * found then our job is done.
> +		 */
> +		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
> +			if (ni->ni_ncpts == LNET_CPT_NUMBER)
> +				return;
> +		}
> +	}
> +
> +	/*
> +	 * Rebuild the Net CPT list again, thereby including only the
> +	 * CPTs which the remaining NIs are associated with.
> +	 */
> +	if (net->net_cpts != NULL) {
> +		kfree(net->net_cpts);
> +		net->net_cpts = NULL;
> +	}
> +
> +	list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
> +		rc = lnet_net_append_cpts(ni->ni_cpts, ni->ni_ncpts,
> +					  net);
> +		if (rc != 0) {
> +			CERROR("Out of Memory\n");
> +			/*
> +			 * do our best to keep on going. Delete
> +			 * the net cpts and set it to NULL. This
> +			 * way we can keep on going but less
> +			 * efficiently, since memory accesses might be
> +			 * across CPT lines.
> +			 */
> +			if (net->net_cpts != NULL) {
> +				kfree(net->net_cpts);
> +				net->net_cpts = NULL;
> +				net->net_ncpts = LNET_CPT_NUMBER;
> +			}
> +			return;
> +		}
> +	}
> +}
> +
>  void
>  lnet_ni_free(struct lnet_ni *ni)
>  {
>  	int i;
>  
> +	lnet_net_remove_cpts(ni->ni_cpts, ni->ni_ncpts, ni->ni_net);
> +
>  	if (ni->ni_refs)
>  		cfs_percpt_free(ni->ni_refs);
>  
> @@ -128,6 +286,9 @@ lnet_net_free(struct lnet_net *net)
>  		lnet_ni_free(ni);
>  	}
>  
> +	if (net->net_cpts != NULL)
> +		kfree(net->net_cpts);
> +
>  	kfree(net);
>  }
>  
> @@ -229,6 +390,9 @@ lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
>  		ni->ni_net_ns = NULL;
>  
>  	ni->ni_last_alive = ktime_get_real_seconds();
> +	rc = lnet_net_append_cpts(ni->ni_cpts, ni->ni_ncpts, net);
> +	if (rc != 0)
> +		goto failed;
>  	list_add_tail(&ni->ni_netlist, &net->net_ni_list);
>  
>  	return ni;
> 
> 
> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 10/34] lnet: add ni arg to lnet_cpt_of_nid()
  2018-09-07  0:49 ` [lustre-devel] [PATCH 10/34] lnet: add ni arg to lnet_cpt_of_nid() NeilBrown
  2018-09-10 23:32   ` Doug Oucharek
@ 2018-09-11  1:03   ` James Simmons
  1 sibling, 0 replies; 98+ messages in thread
From: James Simmons @ 2018-09-11  1:03 UTC (permalink / raw)
  To: lustre-devel


> When choosing a cpt to use for a given network (identified by nid),
> the choice might depend on a particular interface which has
> already been identified - different interfaces can have different
> sets of cpts.
> 
> So add an 'ni' arg to lnet_cpt_of_nid(). If given, choose a cpt
> from the cpts of that interface. If not given, choose one from
> the set of all cpts associated with any interface on the network.
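
The lookup order described above can be sketched as follows. This is an illustration only: `sketch_cpt_set`, `sketch_hash`, `sketch_cpt_of_nid` and `SKETCH_CPT_NUMBER` are hypothetical, the hash is a plain modulo and not lnet_nid_cpt_hash(), and locking is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CPT_NUMBER 4u	/* hypothetical number of CPTs */

/* Hypothetical stand-in for the cpts/ncpts pair on an NI or a net. */
struct sketch_cpt_set {
	uint32_t ncpts;
	const uint32_t *cpts;	/* NULL means "associated with all CPTs" */
};

/* Placeholder hash: NOT lnet_nid_cpt_hash(), just nid modulo number. */
unsigned int sketch_hash(uint64_t nid, unsigned int number)
{
	return (unsigned int)(nid % number);
}

/*
 * Choose a CPT for nid.
 *  1. NI given with its own CPT list: pick from that list.
 *  2. NI given without a list: it spans all CPTs, so hash over all.
 *  3. No NI: pick from the net's list if it has one.
 *  4. Otherwise hash over all CPTs.
 */
unsigned int sketch_cpt_of_nid(uint64_t nid, const struct sketch_cpt_set *ni,
			       const struct sketch_cpt_set *net)
{
	if (ni != NULL) {
		if (ni->cpts != NULL)
			return ni->cpts[sketch_hash(nid, ni->ncpts)];
		return sketch_hash(nid, SKETCH_CPT_NUMBER);
	}

	if (net != NULL && net->cpts != NULL)
		return net->cpts[sketch_hash(nid, net->ncpts)];

	return sketch_hash(nid, SKETCH_CPT_NUMBER);
}
```

Callers that already know which interface they are using pass it in; callers that only have a NID pass NULL for the NI and get the net-wide choice, as the selftest and lib-ptl.c call sites in the diff do.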

Reviewed-by: James Simmons <jsimmons@infradead.org>

The below needs fixing based on response to cover letter.

> This is part of
>     8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>        LU-7734 lnet: Multi-Rail local NI split
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  .../staging/lustre/include/linux/lnet/lib-lnet.h   |    4 +-
>  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |    4 +-
>  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |    2 -
>  .../staging/lustre/lnet/klnds/socklnd/socklnd.c    |    4 +-
>  drivers/staging/lustre/lnet/lnet/api-ni.c          |   41 ++++++++++++--------
>  drivers/staging/lustre/lnet/lnet/lib-move.c        |   12 +++---
>  drivers/staging/lustre/lnet/lnet/lib-ptl.c         |    2 -
>  drivers/staging/lustre/lnet/lnet/peer.c            |    4 +-
>  drivers/staging/lustre/lnet/lnet/router.c          |    4 +-
>  drivers/staging/lustre/lnet/selftest/brw_test.c    |    2 -
>  drivers/staging/lustre/lnet/selftest/framework.c   |    3 +
>  drivers/staging/lustre/lnet/selftest/selftest.h    |    2 -
>  12 files changed, 48 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
> index 34509e52bac7..e32dbb854d80 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
> @@ -395,8 +395,8 @@ lnet_net2rnethash(__u32 net)
>  extern struct lnet_lnd the_lolnd;
>  extern int avoid_asym_router_failure;
>  
> -int lnet_cpt_of_nid_locked(lnet_nid_t nid);
> -int lnet_cpt_of_nid(lnet_nid_t nid);
> +int lnet_cpt_of_nid_locked(lnet_nid_t nid, struct lnet_ni *ni);
> +int lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni);
>  struct lnet_ni *lnet_nid2ni_locked(lnet_nid_t nid, int cpt);
>  struct lnet_ni *lnet_net2ni_locked(__u32 net, int cpt);
>  struct lnet_ni *lnet_net2ni(__u32 net);
> diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> index ade566d20c69..958ac9a99045 100644
> --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> @@ -320,7 +320,7 @@ int kiblnd_create_peer(struct lnet_ni *ni, struct kib_peer **peerp,
>  {
>  	struct kib_peer *peer;
>  	struct kib_net *net = ni->ni_data;
> -	int cpt = lnet_cpt_of_nid(nid);
> +	int cpt = lnet_cpt_of_nid(nid, ni);
>  	unsigned long flags;
>  
>  	LASSERT(net);
> @@ -643,7 +643,7 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer *peer, struct rdma_cm_id *cm
>  
>  	dev = net->ibn_dev;
>  
> -	cpt = lnet_cpt_of_nid(peer->ibp_nid);
> +	cpt = lnet_cpt_of_nid(peer->ibp_nid, peer->ibp_ni);
>  	sched = kiblnd_data.kib_scheds[cpt];
>  
>  	LASSERT(sched->ibs_nthreads > 0);
> diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
> index c266940cb2ae..e64c14914924 100644
> --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
> +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
> @@ -119,7 +119,7 @@ kiblnd_get_idle_tx(struct lnet_ni *ni, lnet_nid_t target)
>  	struct kib_tx *tx;
>  	struct kib_tx_poolset *tps;
>  
> -	tps = net->ibn_tx_ps[lnet_cpt_of_nid(target)];
> +	tps = net->ibn_tx_ps[lnet_cpt_of_nid(target, ni)];
>  	node = kiblnd_pool_alloc_node(&tps->tps_poolset);
>  	if (!node)
>  		return NULL;
> diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> index 2036a0ae5917..ba68bcee90bc 100644
> --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
> @@ -101,7 +101,7 @@ static int
>  ksocknal_create_peer(struct ksock_peer **peerp, struct lnet_ni *ni,
>  		     struct lnet_process_id id)
>  {
> -	int cpt = lnet_cpt_of_nid(id.nid);
> +	int cpt = lnet_cpt_of_nid(id.nid, ni);
>  	struct ksock_net *net = ni->ni_data;
>  	struct ksock_peer *peer;
>  
> @@ -1099,7 +1099,7 @@ ksocknal_create_conn(struct lnet_ni *ni, struct ksock_route *route,
>  	LASSERT(conn->ksnc_proto);
>  	LASSERT(peerid.nid != LNET_NID_ANY);
>  
> -	cpt = lnet_cpt_of_nid(peerid.nid);
> +	cpt = lnet_cpt_of_nid(peerid.nid, ni);
>  
>  	if (active) {
>  		ksocknal_peer_addref(peer);
> diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
> index c21aef32cdde..6e0b8310574d 100644
> --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
> +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
> @@ -713,31 +713,41 @@ lnet_nid_cpt_hash(lnet_nid_t nid, unsigned int number)
>  }
>  
>  int
> -lnet_cpt_of_nid_locked(lnet_nid_t nid)
> +lnet_cpt_of_nid_locked(lnet_nid_t nid, struct lnet_ni *ni)
>  {
> -	struct lnet_ni *ni;
> +	struct lnet_net *net;
>  
>  	/* must called with hold of lnet_net_lock */
>  	if (LNET_CPT_NUMBER == 1)
>  		return 0; /* the only one */
>  
> -	/* take lnet_net_lock(any) would be OK */
> -	if (!list_empty(&the_lnet.ln_nis_cpt)) {
> -		list_for_each_entry(ni, &the_lnet.ln_nis_cpt, ni_cptlist) {
> -			if (LNET_NIDNET(ni->ni_nid) != LNET_NIDNET(nid))
> -				continue;
> +	/*
> +	 * If NI is provided then use the CPT identified in the NI cpt
> +	 * list if one exists. If one doesn't exist, then that NI is
> +	 * associated with all CPTs and it follows that the net it belongs
> +	 * to is implicitly associated with all CPTs, so just hash the nid
> +	 * and return that.
> +	 */
> +	if (ni != NULL) {
> +		if (ni->ni_cpts != NULL)
> +			return ni->ni_cpts[lnet_nid_cpt_hash(nid,
> +							     ni->ni_ncpts)];
> +		else
> +			return lnet_nid_cpt_hash(nid, LNET_CPT_NUMBER);
> +	}
>  
> -			LASSERT(ni->ni_cpts);
> -			return ni->ni_cpts[lnet_nid_cpt_hash
> -					   (nid, ni->ni_ncpts)];
> -		}
> +	/* no NI provided so look at the net */
> +	net = lnet_get_net_locked(LNET_NIDNET(nid));
> +
> +	if (net != NULL && net->net_cpts) {
> +		return net->net_cpts[lnet_nid_cpt_hash(nid, net->net_ncpts)];
>  	}
>  
>  	return lnet_nid_cpt_hash(nid, LNET_CPT_NUMBER);
>  }
>  
>  int
> -lnet_cpt_of_nid(lnet_nid_t nid)
> +lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni)
>  {
>  	int cpt;
>  	int cpt2;
> @@ -745,11 +755,10 @@ lnet_cpt_of_nid(lnet_nid_t nid)
>  	if (LNET_CPT_NUMBER == 1)
>  		return 0; /* the only one */
>  
> -	if (list_empty(&the_lnet.ln_nis_cpt))
> -		return lnet_nid_cpt_hash(nid, LNET_CPT_NUMBER);
> -
>  	cpt = lnet_net_lock_current();
> -	cpt2 = lnet_cpt_of_nid_locked(nid);
> +
> +	cpt2 = lnet_cpt_of_nid_locked(nid, ni);
> +
>  	lnet_net_unlock(cpt);
>  
>  	return cpt2;
> diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
> index b6e81a693fc3..02cd1a5a466f 100644
> --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
> +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
> @@ -1095,7 +1095,9 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
>  	msg->msg_sending = 1;
>  
>  	LASSERT(!msg->msg_tx_committed);
> -	cpt = lnet_cpt_of_nid(rtr_nid == LNET_NID_ANY ? dst_nid : rtr_nid);
> +	local_ni = lnet_net2ni(LNET_NIDNET(dst_nid));
> +	cpt = lnet_cpt_of_nid(rtr_nid == LNET_NID_ANY ? dst_nid : rtr_nid,
> +			      local_ni);
>   again:
>  	lnet_net_lock(cpt);
>  
> @@ -1188,7 +1190,7 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
>  		 * was changed when we release the lock
>  		 */
>  		if (rtr_nid != lp->lp_nid) {
> -			cpt2 = lnet_cpt_of_nid_locked(lp->lp_nid);
> +			cpt2 = lp->lp_cpt;
>  			if (cpt2 != cpt) {
>  				if (src_ni)
>  					lnet_ni_decref_locked(src_ni, cpt);
> @@ -1677,7 +1679,7 @@ lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid,
>  	payload_length = le32_to_cpu(hdr->payload_length);
>  
>  	for_me = (ni->ni_nid == dest_nid);
> -	cpt = lnet_cpt_of_nid(from_nid);
> +	cpt = lnet_cpt_of_nid(from_nid, ni);
>  
>  	switch (type) {
>  	case LNET_MSG_ACK:
> @@ -2149,7 +2151,7 @@ lnet_create_reply_msg(struct lnet_ni *ni, struct lnet_msg *getmsg)
>  	lnet_msg_attach_md(msg, getmd, getmd->md_offset, getmd->md_length);
>  	lnet_res_unlock(cpt);
>  
> -	cpt = lnet_cpt_of_nid(peer_id.nid);
> +	cpt = lnet_cpt_of_nid(peer_id.nid, ni);
>  
>  	lnet_net_lock(cpt);
>  	lnet_msg_commit(msg, cpt);
> @@ -2160,7 +2162,7 @@ lnet_create_reply_msg(struct lnet_ni *ni, struct lnet_msg *getmsg)
>  	return msg;
>  
>   drop:
> -	cpt = lnet_cpt_of_nid(peer_id.nid);
> +	cpt = lnet_cpt_of_nid(peer_id.nid, ni);
>  
>  	lnet_net_lock(cpt);
>  	the_lnet.ln_counters[cpt]->drop_count++;
> diff --git a/drivers/staging/lustre/lnet/lnet/lib-ptl.c b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
> index 90ce51801726..c8d8162cc706 100644
> --- a/drivers/staging/lustre/lnet/lnet/lib-ptl.c
> +++ b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
> @@ -220,7 +220,7 @@ lnet_match2mt(struct lnet_portal *ptl, struct lnet_process_id id, __u64 mbits)
>  
>  	/* if it's a unique portal, return match-table hashed by NID */
>  	return lnet_ptl_is_unique(ptl) ?
> -	       ptl->ptl_mtables[lnet_cpt_of_nid(id.nid)] : NULL;
> +	       ptl->ptl_mtables[lnet_cpt_of_nid(id.nid, NULL)] : NULL;
>  }
>  
>  struct lnet_match_table *
> diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c
> index ed29124ebded..808ce25f1f00 100644
> --- a/drivers/staging/lustre/lnet/lnet/peer.c
> +++ b/drivers/staging/lustre/lnet/lnet/peer.c
> @@ -270,7 +270,7 @@ lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt)
>  		return -ESHUTDOWN;
>  
>  	/* cpt can be LNET_LOCK_EX if it's called from router functions */
> -	cpt2 = cpt != LNET_LOCK_EX ? cpt : lnet_cpt_of_nid_locked(nid);
> +	cpt2 = cpt != LNET_LOCK_EX ? cpt : lnet_cpt_of_nid_locked(nid, NULL);
>  
>  	ptable = the_lnet.ln_peer_tables[cpt2];
>  	lp = lnet_find_peer_locked(ptable, nid);
> @@ -362,7 +362,7 @@ lnet_debug_peer(lnet_nid_t nid)
>  	int rc;
>  	int cpt;
>  
> -	cpt = lnet_cpt_of_nid(nid);
> +	cpt = lnet_cpt_of_nid(nid, NULL);
>  	lnet_net_lock(cpt);
>  
>  	rc = lnet_nid2peer_locked(&lp, nid, cpt);
> diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
> index 72b8ca2b0fc6..5493d13de6d9 100644
> --- a/drivers/staging/lustre/lnet/lnet/router.c
> +++ b/drivers/staging/lustre/lnet/lnet/router.c
> @@ -1207,7 +1207,7 @@ lnet_router_checker(void *arg)
>  		version = the_lnet.ln_routers_version;
>  
>  		list_for_each_entry(rtr, &the_lnet.ln_routers, lp_rtr_list) {
> -			cpt2 = lnet_cpt_of_nid_locked(rtr->lp_nid);
> +			cpt2 = rtr->lp_cpt;
>  			if (cpt != cpt2) {
>  				lnet_net_unlock(cpt);
>  				cpt = cpt2;
> @@ -1693,7 +1693,7 @@ lnet_notify(struct lnet_ni *ni, lnet_nid_t nid, int alive, time64_t when)
>  {
>  	struct lnet_peer *lp = NULL;
>  	time64_t now = ktime_get_seconds();
> -	int cpt = lnet_cpt_of_nid(nid);
> +	int cpt = lnet_cpt_of_nid(nid, ni);
>  
>  	LASSERT(!in_interrupt());
>  
> diff --git a/drivers/staging/lustre/lnet/selftest/brw_test.c b/drivers/staging/lustre/lnet/selftest/brw_test.c
> index f1ee219bc8f3..e372ff3044c8 100644
> --- a/drivers/staging/lustre/lnet/selftest/brw_test.c
> +++ b/drivers/staging/lustre/lnet/selftest/brw_test.c
> @@ -124,7 +124,7 @@ brw_client_init(struct sfw_test_instance *tsi)
>  		return -EINVAL;
>  
>  	list_for_each_entry(tsu, &tsi->tsi_units, tsu_list) {
> -		bulk = srpc_alloc_bulk(lnet_cpt_of_nid(tsu->tsu_dest.nid),
> +		bulk = srpc_alloc_bulk(lnet_cpt_of_nid(tsu->tsu_dest.nid, NULL),
>  				       off, npg, len, opc == LST_BRW_READ);
>  		if (!bulk) {
>  			brw_client_fini(tsi);
> diff --git a/drivers/staging/lustre/lnet/selftest/framework.c b/drivers/staging/lustre/lnet/selftest/framework.c
> index 944a2a6598fa..a82efc394659 100644
> --- a/drivers/staging/lustre/lnet/selftest/framework.c
> +++ b/drivers/staging/lustre/lnet/selftest/framework.c
> @@ -1013,7 +1013,8 @@ sfw_run_batch(struct sfw_batch *tsb)
>  			tsu->tsu_loop = tsi->tsi_loop;
>  			wi = &tsu->tsu_worker;
>  			swi_init_workitem(wi, sfw_run_test,
> -					  lst_test_wq[lnet_cpt_of_nid(tsu->tsu_dest.nid)]);
> +					  lst_test_wq[lnet_cpt_of_nid(tsu->tsu_dest.nid,
> +							  NULL)]);
>  			swi_schedule_workitem(wi);
>  		}
>  	}
> diff --git a/drivers/staging/lustre/lnet/selftest/selftest.h b/drivers/staging/lustre/lnet/selftest/selftest.h
> index 9dbb0a51d430..edf783af90e8 100644
> --- a/drivers/staging/lustre/lnet/selftest/selftest.h
> +++ b/drivers/staging/lustre/lnet/selftest/selftest.h
> @@ -527,7 +527,7 @@ srpc_init_client_rpc(struct srpc_client_rpc *rpc, struct lnet_process_id peer,
>  
>  	INIT_LIST_HEAD(&rpc->crpc_list);
>  	swi_init_workitem(&rpc->crpc_wi, srpc_send_rpc,
> -			  lst_test_wq[lnet_cpt_of_nid(peer.nid)]);
> +			  lst_test_wq[lnet_cpt_of_nid(peer.nid, NULL)]);
>  	spin_lock_init(&rpc->crpc_lock);
>  	atomic_set(&rpc->crpc_refcount, 1); /* 1 ref for caller */
>  
> 
> 
> 
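
For readers following the lnet_cpt_of_nid_locked() hunk quoted above, the
selection order (NI first, then net, then a plain hash) can be sketched in
user space as follows. This is only an illustration under stated
assumptions: the sketch_* names, SKETCH_CPT_NUMBER and the modulo hash are
hypothetical stand-ins, not the real LNet types or lnet_nid_cpt_hash(), and
the net is passed in directly instead of being looked up with
lnet_get_net_locked().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CPT_NUMBER 4	/* hypothetical stand-in for LNET_CPT_NUMBER */

/* Hypothetical, cut-down stand-in for struct lnet_ni. */
struct sketch_ni {
	const uint32_t *cpts;	/* NULL means the NI is bound to all CPTs */
	unsigned int ncpts;
};

/* Hypothetical, cut-down stand-in for struct lnet_net. */
struct sketch_net {
	const uint32_t *cpts;	/* NULL means the net is bound to all CPTs */
	unsigned int ncpts;
};

/* Placeholder hash; the real lnet_nid_cpt_hash() is different. */
static unsigned int sketch_hash(uint64_t nid, unsigned int number)
{
	assert(number > 0);
	return (unsigned int)(nid % number);
}

/* Pick a CPT for nid: prefer the NI's CPT list, then the net's. */
int sketch_cpt_of_nid(uint64_t nid, const struct sketch_ni *ni,
		      const struct sketch_net *net)
{
	if (SKETCH_CPT_NUMBER == 1)
		return 0;	/* the only one */

	if (ni) {
		/* An NI without a CPT list is associated with all CPTs. */
		if (ni->cpts)
			return (int)ni->cpts[sketch_hash(nid, ni->ncpts)];
		return (int)sketch_hash(nid, SKETCH_CPT_NUMBER);
	}

	/* No NI provided, so look at the net. */
	if (net && net->cpts)
		return (int)net->cpts[sketch_hash(nid, net->ncpts)];

	return (int)sketch_hash(nid, SKETCH_CPT_NUMBER);
}
```

Note that when an NI is supplied but has no CPT list, the net is not
consulted at all; that matches the comment in the quoted hunk, which reasons
that such a net is implicitly associated with all CPTs.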

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 11/34] lnet: pass tun to lnet_startup_lndni, instead of full conf
  2018-09-07  0:49 ` [lustre-devel] [PATCH 11/34] lnet: pass tun to lnet_startup_lndni, instead of full conf NeilBrown
@ 2018-09-11 18:31   ` Amir Shehata
  2018-09-12  4:03     ` NeilBrown
  2018-09-12  3:30   ` Doug Oucharek
  1 sibling, 1 reply; 98+ messages in thread
From: Amir Shehata @ 2018-09-11 18:31 UTC (permalink / raw)
  To: lustre-devel

The block of logic that was removed from lnet_startup_lndni() is done in
the LND. Each LND has its own defaults. As an example, look at
kiblnd_tunables_setup() and ksocknal_startup().
These tunables are LND specific and have different values per LND. So
instead of configuring them in the common LNet function and then having
them overwritten again by the LND, we let each LND take care of
initializing them to the default values it uses, if they haven't already
been set by the user.
Note that dynamic configuration of these parameters currently works only
for the o2iblnd. Socklnd and gnilnd appear not to make use of the dynamic
ability. I'll create an LU ticket to add the ability to dynamically set
these values to the socklnd.
The tunables are divided into two parts: a common set of tunables that
applies to all LNDs (although each LND could have different default
values), and a specific set of LND tunables which pertain to a specific
LND. Again, the latter is only used by the o2iblnd at the moment.
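
A minimal sketch of the split described above, assuming hypothetical names
throughout (the sketch_* types, functions and default values are invented
for illustration and are not the real LNet or LND API): the common code only
copies tunables that the user supplied, and the LND fills in its own
defaults when nothing was set.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical default values; real LNDs each choose their own. */
#define SKETCH_DEFAULT_PEER_CREDITS	8
#define SKETCH_DEFAULT_CONCURRENT_SENDS	8

/* Hypothetical stand-in for the LND-specific tunables. */
struct sketch_lnd_tunables {
	int peer_credits;
	int concurrent_sends;
};

/* Hypothetical stand-in for struct lnet_ni. */
struct sketch_ni {
	struct sketch_lnd_tunables tunables;
	bool tunables_set;
};

/* LND side: apply this LND's defaults only if the user set nothing. */
int sketch_lnd_startup(struct sketch_ni *ni)
{
	if (!ni->tunables_set) {
		ni->tunables.peer_credits = SKETCH_DEFAULT_PEER_CREDITS;
		ni->tunables.concurrent_sends =
			SKETCH_DEFAULT_CONCURRENT_SENDS;
	}
	return 0;
}

/* Common side: record user-supplied tunables, then call into the LND. */
int sketch_startup_ni(struct sketch_ni *ni,
		      const struct sketch_lnd_tunables *tun)
{
	if (tun) {
		memcpy(&ni->tunables, tun, sizeof(*tun));
		ni->tunables_set = true;
	}
	return sketch_lnd_startup(ni);
}
```

With this shape the common function never needs to know any LND's default
values, which is the point being made about not configuring them twice.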

On Thu, 6 Sep 2018 at 18:00, NeilBrown <neilb@suse.com> wrote:

> I don't understand parts of this change.
> Particularly the removal for
>        /* If given some LND tunable parameters, parse those now to
>         * override the values in the NI structure. */
>
> isn't clear to me.
>
> This is part of
>     8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>        LU-7734 lnet: Multi-Rail local NI split
>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  drivers/staging/lustre/lnet/lnet/api-ni.c |   41 ++++++++---------------------
>  1 file changed, 12 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
> index 6e0b8310574d..53ecfd700db3 100644
> --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
> +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
> @@ -1240,10 +1240,8 @@ lnet_shutdown_lndni(struct lnet_ni *ni)
>  }
>
>  static int
> -lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
> +lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
>  {
> -       struct lnet_ioctl_config_lnd_tunables *lnd_tunables = NULL;
> -       struct lnet_lnd_tunables *tun = NULL;
>         int rc = -EINVAL;
>         int lnd_type;
>         struct lnet_lnd *lnd;
> @@ -1296,36 +1294,12 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
>
>         ni->ni_net->net_lnd = lnd;
>
> -       if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf)) {
> -               lnd_tunables = (struct lnet_ioctl_config_lnd_tunables *)conf->cfg_bulk;
> -               tun = &lnd_tunables->lt_tun;
> -       }
> -
>         if (tun) {
>                 memcpy(&ni->ni_lnd_tunables, tun,
>                        sizeof(*tun));
>                 ni->ni_lnd_tunables_set = true;
>         }
>
> -       /*
> -        * If given some LND tunable parameters, parse those now to
> -        * override the values in the NI structure.
> -        */
> -       if (conf) {
> -               if (conf->cfg_config_u.cfg_net.net_peer_rtr_credits >= 0)
> -                       ni->ni_net->net_tunables.lct_peer_rtr_credits =
> -                               conf->cfg_config_u.cfg_net.net_peer_rtr_credits;
> -               if (conf->cfg_config_u.cfg_net.net_peer_timeout >= 0)
> -                       ni->ni_net->net_tunables.lct_peer_timeout =
> -                               conf->cfg_config_u.cfg_net.net_peer_timeout;
> -               if (conf->cfg_config_u.cfg_net.net_peer_tx_credits != -1)
> -                       ni->ni_net->net_tunables.lct_peer_tx_credits =
> -                               conf->cfg_config_u.cfg_net.net_peer_tx_credits;
> -               if (conf->cfg_config_u.cfg_net.net_max_tx_credits >= 0)
> -                       ni->ni_net->net_tunables.lct_max_tx_credits =
> -                               conf->cfg_config_u.cfg_net.net_max_tx_credits;
> -       }
> -
>         rc = lnd->lnd_startup(ni);
>
>         mutex_unlock(&the_lnet.ln_lnd_mutex);
> @@ -1861,9 +1835,13 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
>         struct list_head net_head;
>         struct lnet_remotenet *rnet;
>         int rc;
> +       struct lnet_ioctl_config_lnd_tunables *lnd_tunables = NULL;
>
>         INIT_LIST_HEAD(&net_head);
>
> +       if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf))
> +               lnd_tunables = (struct lnet_ioctl_config_lnd_tunables *)conf->cfg_bulk;
> +
>         /* Create a net/ni structures for the network string */
>         rc = lnet_parse_networks(&net_head, nets);
>         if (rc <= 0)
> @@ -1898,9 +1876,14 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
>                 goto failed0;
>
>         list_del_init(&net->net_list);
> +       if (lnd_tunables)
> +               memcpy(&net->net_tunables,
> +                      &lnd_tunables->lt_cmn, sizeof(lnd_tunables->lt_cmn));
> +
>         ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
> -       rc = lnet_startup_lndni(ni, conf);
> -       if (rc)
> +       rc = lnet_startup_lndni(ni, (lnd_tunables ?
> +                                    &lnd_tunables->lt_tun : NULL));
> +       if (rc < 0)
>                 goto failed1;
>
>         if (ni->ni_net->net_lnd->lnd_accept) {
>
>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 09/34] lnet: add list of cpts to lnet_net.
  2018-09-10 23:28   ` Doug Oucharek
@ 2018-09-12  2:16     ` NeilBrown
  0 siblings, 0 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-12  2:16 UTC (permalink / raw)
  To: lustre-devel

On Mon, Sep 10 2018, Doug Oucharek wrote:

> I agree with a comment from James Simmons: __u32 should only be used when the variable is being shared with user space.  We need to start converting all uses of __uXX in LNet to just uXX.  Perhaps that should be a set of future patches once all of MR/DD has landed?
>

That seems reasonable.  I don't think this series adds significant new
uses of __u32.
Changing them all to u32 should, as you suggest, come later.

> Reviewed-by: Doug Oucharek <dougso@me.com>

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 01/34] struct lnet_ni - reformat comments.
  2018-09-10 23:17   ` James Simmons
@ 2018-09-12  2:44     ` NeilBrown
  0 siblings, 0 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-12  2:44 UTC (permalink / raw)
  To: lustre-devel

On Tue, Sep 11 2018, James Simmons wrote:

>> This is part of
>> 
>> 8cbb8cd3e771e7f7e0f99cafc19fad32770dc015 LU-7734 lnet: Multi-Rail
>> local NI split
>
> Better commit message would be:
>
> Rework the commonents in lib-types.h to limit the checkpatch
> chatter of being over 80 characters.

Thanks, I added that text (except "commonents") and applied the
same change to a bunch of other comments, so that no other patch moves
comments like this.

NeilBrown


>
> Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
> WC-bug-id: https://jira.whamcloud.com/browse/LU-7734
> Reviewed-on: http://review.whamcloud.com/18274
> Reviewed-by: Doug Oucharek <dougso@me.com>
> Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
> Signed-off-by: NeilBrown <neilb@suse.com>
>
>> ---
>>  .../staging/lustre/include/linux/lnet/lib-types.h  |   38 +++++++++++++++-----
>>  1 file changed, 29 insertions(+), 9 deletions(-)
>> 
>> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
>> index 6d4106fd9039..078bc97a9ebf 100644
>> --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
>> +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
>> @@ -263,18 +263,38 @@ struct lnet_ni {
>>  	int			  ni_peerrtrcredits;
>>  	/* seconds to consider peer dead */
>>  	int			  ni_peertimeout;
>> -	int			  ni_ncpts;	/* number of CPTs */
>> -	__u32			 *ni_cpts;	/* bond NI on some CPTs */
>> -	lnet_nid_t		  ni_nid;	/* interface's NID */
>> -	void			 *ni_data;	/* instance-specific data */
>> +	/* number of CPTs */
>> +	int			ni_ncpts;
>> +
>> +	/* bond NI on some CPTs */
>> +	__u32			*ni_cpts;
>> +
>> +	/* interface's NID */
>> +	lnet_nid_t		ni_nid;
>> +
>> +	/* instance-specific data */
>> +	void			*ni_data;
>> +
>>  	struct lnet_lnd		 *ni_lnd;	/* procedural interface */
>> -	struct lnet_tx_queue	**ni_tx_queues;	/* percpt TX queues */
>> -	int			**ni_refs;	/* percpt reference count */
>> -	time64_t		  ni_last_alive;/* when I was last alive */
>> -	struct lnet_ni_status	 *ni_status;	/* my health status */
>> +
>> +	/* percpt TX queues */
>> +	struct lnet_tx_queue	**ni_tx_queues;
>> +
>> +	/* percpt reference count */
>> +	int			**ni_refs;
>> +
>> +	/* when I was last alive */
>> +	time64_t		ni_last_alive;
>> +
>> +	/* my health status */
>> +	struct lnet_ni_status	*ni_status;
>> +
>>  	/* per NI LND tunables */
>>  	struct lnet_ioctl_config_lnd_tunables *ni_lnd_tunables;
>> -	/* equivalent interfaces to use */
>> +	/*
>> +	 * equivalent interfaces to use
>> +	 * This is an array because socklnd bonding can still be configured
>> +	 */
>>  	char			 *ni_interfaces[LNET_MAX_INTERFACES];
>>  	/* original net namespace */
>>  	struct net		 *ni_net_ns;
>> 
>> 
>> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 05/34] lnet: begin separating "networks" from "network interfaces".
  2018-09-10 23:18   ` Doug Oucharek
@ 2018-09-12  2:48     ` NeilBrown
  0 siblings, 0 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-12  2:48 UTC (permalink / raw)
  To: lustre-devel

On Mon, Sep 10 2018, Doug Oucharek wrote:

> This patch is fine and can land, just one request: please keep style improvements, such as how comments look, in separate patches from functional changes.  Keeping them separate makes it much easier to review.  Style patches take a different reviewer mindset than functional changes.  The original MR patches mix these, and that made them hard to review too.
>
> Reviewed-by: Doug Oucharek <dougso@me.com>
>

Thanks.
I agree about style changes.  I was applying the patches
semi-automatically and so got some style changes mixed in with the good
stuff.
I've gone through the series and sorted most of that out now.
I'll do that with the next series *before* I post it :-)

Thanks,
NeilBrown
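
For readers new to the series, the list-of-lists walk that the quoted
commit message below describes can be sketched like this. It is a
simplified user-space illustration only: the sketch_* structures use plain
singly linked lists rather than the kernel's struct list_head, and the
locking and reference counting that the real helpers perform are left out.
All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lnet_ni: one network interface. */
struct sketch_ni {
	unsigned long long nid;
	struct sketch_ni *next;		/* next NI on the same net */
};

/* Hypothetical stand-in for struct lnet_net: one network. */
struct sketch_net {
	unsigned int net_id;
	struct sketch_ni *ni_list;	/* NIs on this net */
	struct sketch_net *next;	/* next net on the global list */
};

/* Count every NI by walking each net, then each NI on that net. */
int sketch_count_nis(const struct sketch_net *nets)
{
	const struct sketch_net *net;
	const struct sketch_ni *ni;
	int count = 0;

	for (net = nets; net; net = net->next)
		for (ni = net->ni_list; ni; ni = ni->next)
			count++;

	return count;
}

/* Find the NI with the given NID, or NULL if no net has one. */
const struct sketch_ni *sketch_nid2ni(const struct sketch_net *nets,
				      unsigned long long nid)
{
	const struct sketch_net *net;
	const struct sketch_ni *ni;

	for (net = nets; net; net = net->next)
		for (ni = net->ni_list; ni; ni = ni->next)
			if (ni->nid == nid)
				return ni;

	return NULL;
}
```

The nested loops have the same shape as the updated lnet_get_ni_count() and
lnet_nid2ni_locked() in the quoted diff.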


> Doug
>
> On Sep 6, 2018, at 5:49 PM, NeilBrown <neilb@suse.com> wrote:
>
> We already have "struct lnet_net" separate from "struct lnet_ni",
> but they are currently allocated together and freed together and
> it is assumed that they are 1-to-1.
>
> This patch starts breaking that assumption.  We have separate
> lnet_net_alloc() and lnet_net_free() to alloc/free the new lnet_net,
> though they are currently called only when lnet_ni_alloc/free are
> called.
>
> The netid is now stored in the lnet_net and fetched directly from
> there, rather than extracting it from the net-interface-id ni_nid.
>
> The linkage between these two structures is now richer: lnet_net
> can link to a list of lnet_ni.  struct lnet now has a list of lnet_net,
> so to find all the lnet_ni, we need to walk a list of lists.
> This need to walk a list-of-lists occurs in several places, and new
> helpers like lnet_get_ni_idx_locked() and lnet_get_next_ni_locked are
> introduced.
>
> Previously a list_head was passed to lnet_ni_alloc() for the new
> lnet_ni to be attached to.
> Now a list is passed to lnet_net_alloc() for the net to be attached
> to, and a lnet_net is passed to lnet_ni_alloc() for the ni to attach
> to.
> lnet_ni_alloc() also receives an interface name, but this is currently
> unused.
>
> This is part of
>    8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>       LU-7734 lnet: Multi-Rail local NI split
>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> .../staging/lustre/include/linux/lnet/lib-lnet.h   |   15 +
> .../staging/lustre/include/linux/lnet/lib-types.h  |   23 +-
> drivers/staging/lustre/lnet/lnet/acceptor.c        |    2
> drivers/staging/lustre/lnet/lnet/api-ni.c          |  255 ++++++++++++++------
> drivers/staging/lustre/lnet/lnet/config.c          |  135 +++++++----
> drivers/staging/lustre/lnet/lnet/lib-move.c        |    6
> drivers/staging/lustre/lnet/lnet/router.c          |   15 -
> drivers/staging/lustre/lnet/lnet/router_proc.c     |   16 -
> 8 files changed, 308 insertions(+), 159 deletions(-)
>
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
> index 0fecf0d32c58..4440b87299c4 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
> @@ -369,8 +369,14 @@ lnet_ni_decref(struct lnet_ni *ni)
> }
>
> void lnet_ni_free(struct lnet_ni *ni);
> +void lnet_net_free(struct lnet_net *net);
> +
> +struct lnet_net *
> +lnet_net_alloc(__u32 net_type, struct list_head *netlist);
> +
> struct lnet_ni *
> -lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist);
> +lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el,
> +      char *iface);
>
> static inline int
> lnet_nid2peerhash(lnet_nid_t nid)
> @@ -412,6 +418,9 @@ void lnet_destroy_routes(void);
> int lnet_get_route(int idx, __u32 *net, __u32 *hops,
>   lnet_nid_t *gateway, __u32 *alive, __u32 *priority);
> int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg);
> +struct lnet_ni *lnet_get_next_ni_locked(struct lnet_net *mynet,
> + struct lnet_ni *prev);
> +struct lnet_ni *lnet_get_ni_idx_locked(int idx);
>
> void lnet_router_debugfs_init(void);
> void lnet_router_debugfs_fini(void);
> @@ -584,7 +593,7 @@ int lnet_connect(struct socket **sockp, lnet_nid_t peer_nid,
> __u32 local_ip, __u32 peer_ip, int peer_port);
> void lnet_connect_console_error(int rc, lnet_nid_t peer_nid,
> __u32 peer_ip, int port);
> -int lnet_count_acceptor_nis(void);
> +int lnet_count_acceptor_nets(void);
> int lnet_acceptor_timeout(void);
> int lnet_acceptor_port(void);
>
> @@ -618,7 +627,7 @@ void lnet_swap_pinginfo(struct lnet_ping_info *info);
> int lnet_parse_ip2nets(char **networksp, char *ip2nets);
> int lnet_parse_routes(char *route_str, int *im_a_router);
> int lnet_parse_networks(struct list_head *nilist, char *networks);
> -int lnet_net_unique(__u32 net, struct list_head *nilist);
> +bool lnet_net_unique(__u32 net, struct list_head *nilist);
>
> int lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt);
> struct lnet_peer *lnet_find_peer_locked(struct lnet_peer_table *ptable,
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> index c5e3363de727..5f0d4703bf86 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> @@ -254,6 +254,15 @@ struct lnet_tx_queue {
> };
>
> struct lnet_net {
> + /* chain on the ln_nets */
> + struct list_head net_list;
> +
> + /* net ID, which is compoed of
> + * (net_type << 16) | net_num.
> + * net_type can be one of the enumarated types defined in
> + * lnet/include/lnet/nidstr.h */
> + __u32 net_id;
> +
> /* network tunables */
> struct lnet_ioctl_config_lnd_cmn_tunables net_tunables;
>
> @@ -264,11 +273,13 @@ struct lnet_net {
> bool  net_tunables_set;
> /* procedural interface */
> struct lnet_lnd *net_lnd;
> + /* list of NIs on this net */
> + struct list_head net_ni_list;
> };
>
> struct lnet_ni {
> - /* chain on ln_nis */
> - struct list_head  ni_list;
> + /* chain on the lnet_net structure */
> + struct list_head  ni_netlist;
> /* chain on ln_nis_cpt */
> struct list_head ni_cptlist;
>
> @@ -626,14 +637,16 @@ struct lnet {
> /* failure simulation */
> struct list_head  ln_test_peers;
> struct list_head  ln_drop_rules;
> - struct list_head  ln_delay_rules;
> + struct list_head ln_delay_rules;
>
> - struct list_head  ln_nis; /* LND instances */
> + /* LND instances */
> + struct list_head ln_nets;
> /* NIs bond on specific CPT(s) */
> struct list_head  ln_nis_cpt;
> /* dying LND instances */
> struct list_head  ln_nis_zombie;
> - struct lnet_ni *ln_loni; /* the loopback NI */
> + /* the loopback NI */
> + struct lnet_ni *ln_loni;
>
> /* remote networks with routes to them */
> struct list_head *ln_remote_nets_hash;
> diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
> index f8c921f0221c..88b90c1fdbaf 100644
> --- a/drivers/staging/lustre/lnet/lnet/acceptor.c
> +++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
> @@ -454,7 +454,7 @@ lnet_acceptor_start(void)
> if (rc <= 0)
> return rc;
>
> - if (!lnet_count_acceptor_nis())  /* not required */
> + if (lnet_count_acceptor_nets() == 0)  /* not required */
> return 0;
>
> task = kthread_run(lnet_acceptor, (void *)(uintptr_t)secure,
> diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
> index c944fbb155c8..05687278334a 100644
> --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
> +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
> @@ -537,7 +537,7 @@ lnet_prepare(lnet_pid_t requested_pid)
> the_lnet.ln_pid = requested_pid;
>
> INIT_LIST_HEAD(&the_lnet.ln_test_peers);
> - INIT_LIST_HEAD(&the_lnet.ln_nis);
> + INIT_LIST_HEAD(&the_lnet.ln_nets);
> INIT_LIST_HEAD(&the_lnet.ln_nis_cpt);
> INIT_LIST_HEAD(&the_lnet.ln_nis_zombie);
> INIT_LIST_HEAD(&the_lnet.ln_routers);
> @@ -616,7 +616,7 @@ lnet_unprepare(void)
>
> LASSERT(!the_lnet.ln_refcount);
> LASSERT(list_empty(&the_lnet.ln_test_peers));
> - LASSERT(list_empty(&the_lnet.ln_nis));
> + LASSERT(list_empty(&the_lnet.ln_nets));
> LASSERT(list_empty(&the_lnet.ln_nis_cpt));
> LASSERT(list_empty(&the_lnet.ln_nis_zombie));
>
> @@ -648,14 +648,17 @@ lnet_unprepare(void)
> }
>
> struct lnet_ni  *
> -lnet_net2ni_locked(__u32 net, int cpt)
> +lnet_net2ni_locked(__u32 net_id, int cpt)
> {
> - struct lnet_ni *ni;
> + struct lnet_ni   *ni;
> + struct lnet_net  *net;
>
> LASSERT(cpt != LNET_LOCK_EX);
>
> - list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> - if (LNET_NIDNET(ni->ni_nid) == net) {
> + list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> + if (net->net_id == net_id) {
> + ni = list_entry(net->net_ni_list.next, struct lnet_ni,
> + ni_netlist);
> lnet_ni_addref_locked(ni, cpt);
> return ni;
> }
> @@ -760,14 +763,17 @@ lnet_islocalnet(__u32 net)
> struct lnet_ni  *
> lnet_nid2ni_locked(lnet_nid_t nid, int cpt)
> {
> - struct lnet_ni *ni;
> + struct lnet_net  *net;
> + struct lnet_ni *ni;
>
> LASSERT(cpt != LNET_LOCK_EX);
>
> - list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> - if (ni->ni_nid == nid) {
> - lnet_ni_addref_locked(ni, cpt);
> - return ni;
> + list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> + list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
> + if (ni->ni_nid == nid) {
> + lnet_ni_addref_locked(ni, cpt);
> + return ni;
> + }
> }
> }
>
> @@ -790,16 +796,18 @@ lnet_islocalnid(lnet_nid_t nid)
> }
>
> int
> -lnet_count_acceptor_nis(void)
> +lnet_count_acceptor_nets(void)
> {
> /* Return the # of NIs that need the acceptor. */
> - int count = 0;
> - struct lnet_ni *ni;
> - int cpt;
> + int count = 0;
> + struct lnet_net  *net;
> + int cpt;
>
> cpt = lnet_net_lock_current();
> - list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> - if (ni->ni_net->net_lnd->lnd_accept)
> + list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> + /* all socklnd type networks should have the acceptor
> + * thread started */
> + if (net->net_lnd->lnd_accept)
> count++;
> }
>
> @@ -832,13 +840,16 @@ lnet_ping_info_create(int num_ni)
> static inline int
> lnet_get_ni_count(void)
> {
> - struct lnet_ni *ni;
> - int count = 0;
> + struct lnet_ni *ni;
> + struct lnet_net *net;
> + int count = 0;
>
> lnet_net_lock(0);
>
> - list_for_each_entry(ni, &the_lnet.ln_nis, ni_list)
> - count++;
> + list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> + list_for_each_entry(ni, &net->net_ni_list, ni_netlist)
> + count++;
> + }
>
> lnet_net_unlock(0);
>
> @@ -854,14 +865,17 @@ lnet_ping_info_free(struct lnet_ping_info *pinfo)
> static void
> lnet_ping_info_destroy(void)
> {
> + struct lnet_net *net;
> struct lnet_ni *ni;
>
> lnet_net_lock(LNET_LOCK_EX);
>
> - list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> - lnet_ni_lock(ni);
> - ni->ni_status = NULL;
> - lnet_ni_unlock(ni);
> + list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> + list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
> + lnet_ni_lock(ni);
> + ni->ni_status = NULL;
> + lnet_ni_unlock(ni);
> + }
> }
>
> lnet_ping_info_free(the_lnet.ln_ping_info);
> @@ -963,24 +977,28 @@ lnet_ping_md_unlink(struct lnet_ping_info *pinfo,
> static void
> lnet_ping_info_install_locked(struct lnet_ping_info *ping_info)
> {
> + int i = 0;
> struct lnet_ni_status *ns;
> struct lnet_ni *ni;
> - int i = 0;
> + struct lnet_net *net;
>
> - list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> - LASSERT(i < ping_info->pi_nnis);
> + list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> + list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
> + LASSERT(i < ping_info->pi_nnis);
>
> - ns = &ping_info->pi_ni[i];
> + ns = &ping_info->pi_ni[i];
>
> - ns->ns_nid = ni->ni_nid;
> + ns->ns_nid = ni->ni_nid;
>
> - lnet_ni_lock(ni);
> - ns->ns_status = (ni->ni_status) ?
> - ni->ni_status->ns_status : LNET_NI_STATUS_UP;
> - ni->ni_status = ns;
> - lnet_ni_unlock(ni);
> + lnet_ni_lock(ni);
> + ns->ns_status = ni->ni_status ?
> + ni->ni_status->ns_status :
> + LNET_NI_STATUS_UP;
> + ni->ni_status = ns;
> + lnet_ni_unlock(ni);
>
> - i++;
> + i++;
> + }
> }
> }
>
> @@ -1054,9 +1072,9 @@ lnet_ni_unlink_locked(struct lnet_ni *ni)
> }
>
> /* move it to zombie list and nobody can find it anymore */
> - LASSERT(!list_empty(&ni->ni_list));
> - list_move(&ni->ni_list, &the_lnet.ln_nis_zombie);
> - lnet_ni_decref_locked(ni, 0); /* drop ln_nis' ref */
> + LASSERT(!list_empty(&ni->ni_netlist));
> + list_move(&ni->ni_netlist, &the_lnet.ln_nis_zombie);
> + lnet_ni_decref_locked(ni, 0);
> }
>
> static void
> @@ -1076,17 +1094,17 @@ lnet_clear_zombies_nis_locked(void)
> int j;
>
> ni = list_entry(the_lnet.ln_nis_zombie.next,
> - struct lnet_ni, ni_list);
> - list_del_init(&ni->ni_list);
> + struct lnet_ni, ni_netlist);
> + list_del_init(&ni->ni_netlist);
> cfs_percpt_for_each(ref, j, ni->ni_refs) {
> if (!*ref)
> continue;
> /* still busy, add it back to zombie list */
> - list_add(&ni->ni_list, &the_lnet.ln_nis_zombie);
> + list_add(&ni->ni_netlist, &the_lnet.ln_nis_zombie);
> break;
> }
>
> - if (!list_empty(&ni->ni_list)) {
> + if (!list_empty(&ni->ni_netlist)) {
> lnet_net_unlock(LNET_LOCK_EX);
> ++i;
> if ((i & (-i)) == i) {
> @@ -1126,6 +1144,7 @@ lnet_shutdown_lndnis(void)
> {
> struct lnet_ni *ni;
> int i;
> + struct lnet_net *net;
>
> /* NB called holding the global mutex */
>
> @@ -1138,10 +1157,14 @@ lnet_shutdown_lndnis(void)
> the_lnet.ln_shutdown = 1; /* flag shutdown */
>
> /* Unlink NIs from the global table */
> - while (!list_empty(&the_lnet.ln_nis)) {
> - ni = list_entry(the_lnet.ln_nis.next,
> - struct lnet_ni, ni_list);
> - lnet_ni_unlink_locked(ni);
> + while (!list_empty(&the_lnet.ln_nets)) {
> + net = list_entry(the_lnet.ln_nets.next,
> + struct lnet_net, net_list);
> + while (!list_empty(&net->net_ni_list)) {
> + ni = list_entry(net->net_ni_list.next,
> + struct lnet_ni, ni_netlist);
> + lnet_ni_unlink_locked(ni);
> + }
> }
>
> /* Drop the cached loopback NI. */
> @@ -1212,7 +1235,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
>
> /* Make sure this new NI is unique. */
> lnet_net_lock(LNET_LOCK_EX);
> - rc = lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nis);
> + rc = lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nets);
> lnet_net_unlock(LNET_LOCK_EX);
> if (!rc) {
> if (lnd_type == LOLND) {
> @@ -1297,7 +1320,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
> lnet_net_lock(LNET_LOCK_EX);
> /* refcount for ln_nis */
> lnet_ni_addref_locked(ni, 0);
> - list_add_tail(&ni->ni_list, &the_lnet.ln_nis);
> + list_add_tail(&ni->ni_net->net_list, &the_lnet.ln_nets);
> if (ni->ni_cpts) {
> lnet_ni_addref_locked(ni, 0);
> list_add_tail(&ni->ni_cptlist, &the_lnet.ln_nis_cpt);
> @@ -1363,8 +1386,8 @@ lnet_startup_lndnis(struct list_head *nilist)
> int ni_count = 0;
>
> while (!list_empty(nilist)) {
> - ni = list_entry(nilist->next, struct lnet_ni, ni_list);
> - list_del(&ni->ni_list);
> + ni = list_entry(nilist->next, struct lnet_ni, ni_netlist);
> + list_del(&ni->ni_netlist);
> rc = lnet_startup_lndni(ni, NULL);
>
> if (rc < 0)
> @@ -1486,6 +1509,7 @@ LNetNIInit(lnet_pid_t requested_pid)
> struct lnet_ping_info *pinfo;
> struct lnet_handle_md md_handle;
> struct list_head net_head;
> + struct lnet_net *net;
>
> INIT_LIST_HEAD(&net_head);
>
> @@ -1505,8 +1529,15 @@ LNetNIInit(lnet_pid_t requested_pid)
> return rc;
> }
>
> - /* Add in the loopback network */
> - if (!lnet_ni_alloc(LNET_MKNET(LOLND, 0), NULL, &net_head)) {
> + /* create a network for Loopback network */
> + net = lnet_net_alloc(LNET_MKNET(LOLND, 0), &net_head);
> + if (net == NULL) {
> + rc = -ENOMEM;
> + goto err_empty_list;
> + }
> +
> + /* Add in the loopback NI */
> + if (lnet_ni_alloc(net, NULL, NULL) == NULL) {
> rc = -ENOMEM;
> goto err_empty_list;
> }
> @@ -1584,11 +1615,11 @@ LNetNIInit(lnet_pid_t requested_pid)
> LASSERT(rc < 0);
> mutex_unlock(&the_lnet.ln_api_mutex);
> while (!list_empty(&net_head)) {
> - struct lnet_ni *ni;
> + struct lnet_net *net;
>
> - ni = list_entry(net_head.next, struct lnet_ni, ni_list);
> - list_del_init(&ni->ni_list);
> - lnet_ni_free(ni);
> + net = list_entry(net_head.next, struct lnet_net, net_list);
> + list_del_init(&net->net_list);
> + lnet_net_free(net);
> }
> return rc;
> }
> @@ -1714,25 +1745,83 @@ lnet_fill_ni_info(struct lnet_ni *ni, struct lnet_ioctl_config_data *config)
> }
> }
>
> +struct lnet_ni *
> +lnet_get_ni_idx_locked(int idx)
> +{
> + struct lnet_ni *ni;
> + struct lnet_net *net;
> +
> + list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> + list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
> + if (idx-- == 0)
> + return ni;
> + }
> + }
> +
> + return NULL;
> +}
> +
> +struct lnet_ni *
> +lnet_get_next_ni_locked(struct lnet_net *mynet, struct lnet_ni *prev)
> +{
> + struct lnet_ni *ni;
> + struct lnet_net *net = mynet;
> +
> + if (prev == NULL) {
> + if (net == NULL)
> + net = list_entry(the_lnet.ln_nets.next, struct lnet_net,
> + net_list);
> + ni = list_entry(net->net_ni_list.next, struct lnet_ni,
> + ni_netlist);
> +
> + return ni;
> + }
> +
> + if (prev->ni_netlist.next == &prev->ni_net->net_ni_list) {
> + /* if you reached the end of the ni list and the net is
> + * specified, then there are no more nis in that net */
> + if (net != NULL)
> + return NULL;
> +
> + /* we reached the end of this net ni list. move to the
> + * next net */
> + if (prev->ni_net->net_list.next == &the_lnet.ln_nets)
> + /* no more nets and no more NIs. */
> + return NULL;
> +
> + /* get the next net */
> + net = list_entry(prev->ni_net->net_list.next, struct lnet_net,
> + net_list);
> + /* get the ni on it */
> + ni = list_entry(net->net_ni_list.next, struct lnet_ni,
> + ni_netlist);
> +
> + return ni;
> + }
> +
> + /* there are more nis left */
> + ni = list_entry(prev->ni_netlist.next, struct lnet_ni, ni_netlist);
> +
> + return ni;
> +}
> +
> static int
> lnet_get_net_config(struct lnet_ioctl_config_data *config)
> {
> struct lnet_ni *ni;
> + int cpt;
> int idx = config->cfg_count;
> - int cpt, i = 0;
> int rc = -ENOENT;
>
> cpt = lnet_net_lock_current();
>
> - list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> - if (i++ != idx)
> - continue;
> + ni = lnet_get_ni_idx_locked(idx);
>
> + if (ni != NULL) {
> + rc = 0;
> lnet_ni_lock(ni);
> lnet_fill_ni_info(ni, config);
> lnet_ni_unlock(ni);
> - rc = 0;
> - break;
> }
>
> lnet_net_unlock(cpt);
> @@ -1745,6 +1834,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
> char *nets = conf->cfg_config_u.cfg_net.net_intf;
> struct lnet_ping_info *pinfo;
> struct lnet_handle_md md_handle;
> + struct lnet_net *net;
> struct lnet_ni *ni;
> struct list_head net_head;
> struct lnet_remotenet *rnet;
> @@ -1752,7 +1842,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
>
> INIT_LIST_HEAD(&net_head);
>
> - /* Create a ni structure for the network string */
> + /* Create a net/ni structures for the network string */
> rc = lnet_parse_networks(&net_head, nets);
> if (rc <= 0)
> return !rc ? -EINVAL : rc;
> @@ -1760,14 +1850,14 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
> mutex_lock(&the_lnet.ln_api_mutex);
>
> if (rc > 1) {
> - rc = -EINVAL; /* only add one interface per call */
> + rc = -EINVAL; /* only add one network per call */
> goto failed0;
> }
>
> - ni = list_entry(net_head.next, struct lnet_ni, ni_list);
> + net = list_entry(net_head.next, struct lnet_net, net_list);
>
> lnet_net_lock(LNET_LOCK_EX);
> - rnet = lnet_find_net_locked(LNET_NIDNET(ni->ni_nid));
> + rnet = lnet_find_net_locked(net->net_id);
> lnet_net_unlock(LNET_LOCK_EX);
> /*
> * make sure that the net added doesn't invalidate the current
> @@ -1785,8 +1875,8 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
> if (rc)
> goto failed0;
>
> - list_del_init(&ni->ni_list);
> -
> + list_del_init(&net->net_list);
> + ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
> rc = lnet_startup_lndni(ni, conf);
> if (rc)
> goto failed1;
> @@ -1812,9 +1902,9 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
> failed0:
> mutex_unlock(&the_lnet.ln_api_mutex);
> while (!list_empty(&net_head)) {
> - ni = list_entry(net_head.next, struct lnet_ni, ni_list);
> - list_del_init(&ni->ni_list);
> - lnet_ni_free(ni);
> + net = list_entry(net_head.next, struct lnet_net, net_list);
> + list_del_init(&net->net_list);
> + lnet_net_free(net);
> }
> return rc;
> }
> @@ -1849,7 +1939,7 @@ lnet_dyn_del_ni(__u32 net)
>
> lnet_shutdown_lndni(ni);
>
> - if (!lnet_count_acceptor_nis())
> + if (!lnet_count_acceptor_nets())
> lnet_acceptor_stop();
>
> lnet_ping_target_update(pinfo, md_handle);
> @@ -2103,7 +2193,8 @@ EXPORT_SYMBOL(LNetDebugPeer);
> int
> LNetGetId(unsigned int index, struct lnet_process_id *id)
> {
> - struct lnet_ni *ni;
> + struct lnet_ni *ni;
> + struct lnet_net  *net;
> int cpt;
> int rc = -ENOENT;
>
> @@ -2111,14 +2202,16 @@ LNetGetId(unsigned int index, struct lnet_process_id *id)
>
> cpt = lnet_net_lock_current();
>
> - list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> - if (index--)
> - continue;
> + list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
> + list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
> + if (index-- != 0)
> + continue;
>
> - id->nid = ni->ni_nid;
> - id->pid = the_lnet.ln_pid;
> - rc = 0;
> - break;
> + id->nid = ni->ni_nid;
> + id->pid = the_lnet.ln_pid;
> + rc = 0;
> + break;
> + }
> }
>
> lnet_net_unlock(cpt);
> diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
> index 5646feeb433e..e83bdbec11e3 100644
> --- a/drivers/staging/lustre/lnet/lnet/config.c
> +++ b/drivers/staging/lustre/lnet/lnet/config.c
> @@ -78,17 +78,17 @@ lnet_issep(char c)
> }
> }
>
> -int
> -lnet_net_unique(__u32 net, struct list_head *nilist)
> +bool
> +lnet_net_unique(__u32 net, struct list_head *netlist)
> {
> - struct lnet_ni *ni;
> + struct lnet_net *net_l;
>
> - list_for_each_entry(ni, nilist, ni_list) {
> - if (LNET_NIDNET(ni->ni_nid) == net)
> - return 0;
> + list_for_each_entry(net_l, netlist, net_list) {
> + if (net_l->net_id == net)
> + return false;
> }
>
> - return 1;
> + return true;
> }
>
> void
> @@ -112,41 +112,78 @@ lnet_ni_free(struct lnet_ni *ni)
> if (ni->ni_net_ns)
> put_net(ni->ni_net_ns);
>
> - kvfree(ni->ni_net);
> kfree(ni);
> }
>
> -struct lnet_ni *
> -lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
> +void
> +lnet_net_free(struct lnet_net *net)
> {
> - struct lnet_tx_queue *tq;
> + struct list_head *tmp, *tmp2;
> struct lnet_ni *ni;
> - int rc;
> - int i;
> +
> + /* delete any nis which have been started. */
> + list_for_each_safe(tmp, tmp2, &net->net_ni_list) {
> + ni = list_entry(tmp, struct lnet_ni, ni_netlist);
> + list_del_init(&ni->ni_netlist);
> + lnet_ni_free(ni);
> + }
> +
> + kfree(net);
> +}
> +
> +struct lnet_net *
> +lnet_net_alloc(__u32 net_id, struct list_head *net_list)
> +{
> struct lnet_net *net;
>
> - if (!lnet_net_unique(net_id, nilist)) {
> - LCONSOLE_ERROR_MSG(0x111, "Duplicate network specified: %s\n",
> -   libcfs_net2str(net_id));
> + if (!lnet_net_unique(net_id, net_list)) {
> + CERROR("Duplicate net %s. Ignore\n",
> +       libcfs_net2str(net_id));
> return NULL;
> }
>
> - ni = kzalloc(sizeof(*ni), GFP_NOFS);
> net = kzalloc(sizeof(*net), GFP_NOFS);
> - if (!ni || !net) {
> - kfree(ni); kfree(net);
> + if (!net) {
> CERROR("Out of memory creating network %s\n",
>       libcfs_net2str(net_id));
> return NULL;
> }
> +
> + INIT_LIST_HEAD(&net->net_list);
> + INIT_LIST_HEAD(&net->net_ni_list);
> +
> + net->net_id = net_id;
> +
> /* initialize global paramters to undefiend */
> net->net_tunables.lct_peer_timeout = -1;
> net->net_tunables.lct_max_tx_credits = -1;
> net->net_tunables.lct_peer_tx_credits = -1;
> net->net_tunables.lct_peer_rtr_credits = -1;
>
> + list_add_tail(&net->net_list, net_list);
> +
> + return net;
> +}
> +
> +struct lnet_ni *
> +lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
> +{
> + struct lnet_tx_queue *tq;
> + struct lnet_ni *ni;
> + int rc;
> + int i;
> +
> + ni = kzalloc(sizeof(*ni), GFP_KERNEL);
> + if (ni == NULL) {
> + CERROR("Out of memory creating network interface %s%s\n",
> +       libcfs_net2str(net->net_id),
> +       (iface != NULL) ? iface : "");
> + return NULL;
> + }
> +
> spin_lock_init(&ni->ni_lock);
> INIT_LIST_HEAD(&ni->ni_cptlist);
> + INIT_LIST_HEAD(&ni->ni_netlist);
> ni->ni_refs = cfs_percpt_alloc(lnet_cpt_table(),
>       sizeof(*ni->ni_refs[0]));
> if (!ni->ni_refs)
> @@ -166,8 +203,9 @@ lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
> } else {
> rc = cfs_expr_list_values(el, LNET_CPT_NUMBER, &ni->ni_cpts);
> if (rc <= 0) {
> - CERROR("Failed to set CPTs for NI %s: %d\n",
> -       libcfs_net2str(net_id), rc);
> + CERROR("Failed to set CPTs for NI %s(%s): %d\n",
> +       libcfs_net2str(net->net_id),
> +       (iface != NULL) ? iface : "", rc);
> goto failed;
> }
>
> @@ -182,7 +220,7 @@ lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
>
> ni->ni_net = net;
> /* LND will fill in the address part of the NID */
> - ni->ni_nid = LNET_MKNID(net_id, 0);
> + ni->ni_nid = LNET_MKNID(net->net_id, 0);
>
> /* Store net namespace in which current ni is being created */
> if (current->nsproxy->net_ns)
> @@ -191,22 +229,24 @@ lnet_ni_alloc(__u32 net_id, struct cfs_expr_list *el, struct list_head *nilist)
> ni->ni_net_ns = NULL;
>
> ni->ni_last_alive = ktime_get_real_seconds();
> - list_add_tail(&ni->ni_list, nilist);
> + list_add_tail(&ni->ni_netlist, &net->net_ni_list);
> +
> return ni;
> - failed:
> +failed:
> lnet_ni_free(ni);
> return NULL;
> }
>
> int
> -lnet_parse_networks(struct list_head *nilist, char *networks)
> +lnet_parse_networks(struct list_head *netlist, char *networks)
> {
> struct cfs_expr_list *el = NULL;
> char *tokens;
> char *str;
> char *tmp;
> - struct lnet_ni *ni;
> - __u32 net;
> + struct lnet_net *net;
> + struct lnet_ni *ni = NULL;
> + __u32 net_id;
> int nnets = 0;
> struct list_head *temp_node;
>
> @@ -275,18 +315,21 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
>
> if (comma)
> *comma++ = 0;
> - net = libcfs_str2net(strim(str));
> + net_id = libcfs_str2net(strim(str));
>
> - if (net == LNET_NIDNET(LNET_NID_ANY)) {
> + if (net_id == LNET_NIDNET(LNET_NID_ANY)) {
> LCONSOLE_ERROR_MSG(0x113,
>   "Unrecognised network type\n");
> tmp = str;
> goto failed_syntax;
> }
>
> - if (LNET_NETTYP(net) != LOLND && /* LO is implicit */
> -    !lnet_ni_alloc(net, el, nilist))
> - goto failed;
> + if (LNET_NETTYP(net_id) != LOLND) { /* LO is implicit */
> + net = lnet_net_alloc(net_id, netlist);
> + if (!net ||
> +    !lnet_ni_alloc(net, el, NULL))
> + goto failed;
> + }
>
> if (el) {
> cfs_expr_list_free(el);
> @@ -298,14 +341,21 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
> }
>
> *bracket = 0;
> - net = libcfs_str2net(strim(str));
> - if (net == LNET_NIDNET(LNET_NID_ANY)) {
> + net_id = libcfs_str2net(strim(str));
> + if (net_id == LNET_NIDNET(LNET_NID_ANY)) {
> tmp = str;
> goto failed_syntax;
> }
>
> - ni = lnet_ni_alloc(net, el, nilist);
> - if (!ni)
> + /* always allocate a net, since we will eventually add an
> + * interface to it, or we will fail, in which case we'll
> + * just delete it */
> + net = lnet_net_alloc(net_id, netlist);
> + if (IS_ERR_OR_NULL(net))
> + goto failed;
> +
> + ni = lnet_ni_alloc(net, el, NULL);
> + if (IS_ERR_OR_NULL(ni))
> goto failed;
>
> if (el) {
> @@ -337,7 +387,7 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
> if (niface == LNET_MAX_INTERFACES) {
> LCONSOLE_ERROR_MSG(0x115,
>   "Too many interfaces for net %s\n",
> -   libcfs_net2str(net));
> +   libcfs_net2str(net_id));
> goto failed;
> }
>
> @@ -378,7 +428,7 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
> }
> }
>
> - list_for_each(temp_node, nilist)
> + list_for_each(temp_node, netlist)
> nnets++;
>
> kfree(tokens);
> @@ -387,11 +437,12 @@ lnet_parse_networks(struct list_head *nilist, char *networks)
>  failed_syntax:
> lnet_syntax("networks", networks, (int)(tmp - tokens), strlen(tmp));
>  failed:
> - while (!list_empty(nilist)) {
> - ni = list_entry(nilist->next, struct lnet_ni, ni_list);
> + /* free the net list and all the nis on each net */
> + while (!list_empty(netlist)) {
> + net = list_entry(netlist->next, struct lnet_net, net_list);
>
> - list_del(&ni->ni_list);
> - lnet_ni_free(ni);
> + list_del_init(&net->net_list);
> + lnet_net_free(net);
> }
>
> if (el)
> diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
> index 1bf12af87a20..1c874025fa74 100644
> --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
> +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
> @@ -2289,7 +2289,7 @@ EXPORT_SYMBOL(LNetGet);
> int
> LNetDist(lnet_nid_t dstnid, lnet_nid_t *srcnidp, __u32 *orderp)
> {
> - struct lnet_ni *ni;
> + struct lnet_ni *ni = NULL;
> struct lnet_remotenet *rnet;
> __u32 dstnet = LNET_NIDNET(dstnid);
> int hops;
> @@ -2307,9 +2307,9 @@ LNetDist(lnet_nid_t dstnid, lnet_nid_t *srcnidp, __u32 *orderp)
>
> cpt = lnet_net_lock_current();
>
> - list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> + while ((ni = lnet_get_next_ni_locked(NULL, ni))) {
> if (ni->ni_nid == dstnid) {
> - if (srcnidp)
> + if (srcnidp != NULL)
> *srcnidp = dstnid;
> if (orderp) {
> if (LNET_NETTYP(LNET_NIDNET(dstnid)) == LOLND)
> diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
> index 0c0ec0b27982..135dfe793b0b 100644
> --- a/drivers/staging/lustre/lnet/lnet/router.c
> +++ b/drivers/staging/lustre/lnet/lnet/router.c
> @@ -245,13 +245,10 @@ static void lnet_shuffle_seed(void)
> if (seeded)
> return;
>
> - /*
> - * Nodes with small feet have little entropy
> - * the NID for this node gives the most entropy in the low bits
> - */
> - list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> + /* Nodes with small feet have little entropy
> + * the NID for this node gives the most entropy in the low bits */
> + while ((ni = lnet_get_next_ni_locked(NULL, ni))) {
> __u32 lnd_type, seed;
> -
> lnd_type = LNET_NETTYP(LNET_NIDNET(ni->ni_nid));
> if (lnd_type != LOLND) {
> seed = (LNET_NIDADDR(ni->ni_nid) | lnd_type);
> @@ -807,8 +804,8 @@ lnet_router_ni_update_locked(struct lnet_peer *gw, __u32 net)
> static void
> lnet_update_ni_status_locked(void)
> {
> - struct lnet_ni *ni;
> - time64_t now;
> + struct lnet_ni *ni = NULL;
> + time64_t now;
> time64_t timeout;
>
> LASSERT(the_lnet.ln_routing);
> @@ -817,7 +814,7 @@ lnet_update_ni_status_locked(void)
>  max(live_router_check_interval, dead_router_check_interval);
>
> now = ktime_get_real_seconds();
> - list_for_each_entry(ni, &the_lnet.ln_nis, ni_list) {
> + while ((ni = lnet_get_next_ni_locked(NULL, ni))) {
> if (ni->ni_net->net_lnd->lnd_type == LOLND)
> continue;
>
> diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
> index f3ccd6a2b70e..2a366e9a8627 100644
> --- a/drivers/staging/lustre/lnet/lnet/router_proc.c
> +++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
> @@ -641,26 +641,12 @@ static int proc_lnet_nis(struct ctl_table *table, int write,
>      "rtr", "max", "tx", "min");
> LASSERT(tmpstr + tmpsiz - s > 0);
> } else {
> - struct list_head *n;
> struct lnet_ni *ni = NULL;
> int skip = *ppos - 1;
>
> lnet_net_lock(0);
>
> - n = the_lnet.ln_nis.next;
> -
> - while (n != &the_lnet.ln_nis) {
> - struct lnet_ni *a_ni;
> -
> - a_ni = list_entry(n, struct lnet_ni, ni_list);
> - if (!skip) {
> - ni = a_ni;
> - break;
> - }
> -
> - skip--;
> - n = n->next;
> - }
> + ni = lnet_get_ni_idx_locked(skip);
>
> if (ni) {
> struct lnet_tx_queue *tq;

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 07/34] lnet: change lnet_peer to reference the net, rather than ni.
  2018-09-10 23:17   ` James Simmons
@ 2018-09-12  2:56     ` NeilBrown
  0 siblings, 0 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-12  2:56 UTC (permalink / raw)
  To: lustre-devel

On Tue, Sep 11 2018, James Simmons wrote:
>> @@ -1164,10 +1164,12 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
>>  			/* ENOMEM or shutting down */
>>  			return rc;
>>  		}
>> -		LASSERT(lp->lp_ni == src_ni);
>> +		LASSERT(lp->lp_net == src_ni->ni_net);
>>  	} else {
>>  		/* sending to a remote network */
>> -		lp = lnet_find_route_locked(src_ni, dst_nid, rtr_nid);
>> +		lp = lnet_find_route_locked(src_ni != NULL ?
>> +					    src_ni->ni_net : NULL,
>> +					    dst_nid, rtr_nid);
>>  		if (!lp) {
>>  			if (src_ni)
>>  				lnet_ni_decref_locked(src_ni, cpt);
>> @@ -1203,10 +1205,11 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
>>  		       lnet_msgtyp2str(msg->msg_type), msg->msg_len);
>>  
>>  		if (!src_ni) {
>> -			src_ni = lp->lp_ni;
>> +			src_ni = lnet_get_next_ni_locked(lp->lp_net, NULL);
>> +			LASSERT(src_ni != NULL);
>
> Checkpatch will not like the above.

I think checkpatch is sometimes wrong.  However I went through the
series removing all "== NULL" and "!= NULL".


>>  
>> -	lp->lp_ni = lnet_net2ni_locked(LNET_NIDNET(nid), cpt2);
>> -	if (!lp->lp_ni) {
>> -		rc = -EHOSTUNREACH;
>> -		goto out;
>> -	}
>> -
>> -	lp->lp_txcredits = lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
>> -	lp->lp_mintxcredits = lp->lp_ni->ni_net->net_tunables.lct_peer_tx_credits;
>> -	lp->lp_rtrcredits = lnet_peer_buffer_credits(lp->lp_ni);
>> -	lp->lp_minrtrcredits = lnet_peer_buffer_credits(lp->lp_ni);
>> +	lp->lp_net = lnet_get_net_locked(LNET_NIDNET(!lp->lp_nid));
>
> This is the single error in your port that broke stuff. The correct code 
> is:
>
> lp->lp_net = lnet_get_net_locked(LNET_NIDNET(lp->lp_nid));
>

Thanks for spotting that!!
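
For anyone following along, here is a minimal sketch of why that stray "!" matters. It assumes the usual LNet NID layout (network number in the upper 32 bits, address in the lower 32). The helper names are made up for illustration and are not the kernel's LNET_NIDNET()/LNET_MKNID() macros:

```c
#include <stdint.h>

/* Assumed NID layout: net in the upper 32 bits, address in the lower 32. */
static inline uint32_t nid_to_net(uint64_t nid)
{
	return (uint32_t)(nid >> 32);
}

/* Hypothetical helper: build a NID from a net number and an address. */
static inline uint64_t mk_nid(uint32_t net, uint32_t addr)
{
	return ((uint64_t)net << 32) | addr;
}

/* Models the mis-ported line: !nid is 0 or 1, so the net is always 0. */
static inline uint32_t peer_net_buggy(uint64_t nid)
{
	return nid_to_net(!nid);
}

/* Models the corrected line: the net really comes from the NID. */
static inline uint32_t peer_net_fixed(uint64_t nid)
{
	return nid_to_net(nid);
}
```

With the "!" in place every peer resolves to net 0, whatever its NID, so the net lookup cannot match the peer's real network.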

>> @@ -952,6 +950,7 @@ lnet_ping_router_locked(struct lnet_peer *rtr)
>>  	struct lnet_rc_data *rcd = NULL;
>>  	time64_t now = ktime_get_seconds();
>>  	time64_t secs;
>> +	struct lnet_ni  *ni;
>
> Another gripe from Greg was the spacing in declared variables. As I port
> patches, the new code removes the spacing. Newer Lustre code no longer does
> this kind of spacing. Well, most of it :-)
>

I went through the series and removed all the stray space in local
variable decls.

Thanks,
NeilBrown


^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 11/34] lnet: pass tun to lnet_startup_lndni, instead of full conf
  2018-09-07  0:49 ` [lustre-devel] [PATCH 11/34] lnet: pass tun to lnet_startup_lndni, instead of full conf NeilBrown
  2018-09-11 18:31   ` Amir Shehata
@ 2018-09-12  3:30   ` Doug Oucharek
  1 sibling, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  3:30 UTC (permalink / raw)
  To: lustre-devel

With the suggested commit message from Amir:

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:53 PM, "NeilBrown" <neilb@suse.com> wrote:

    I don't understand parts of this change.
    In particular, the removal of
           /* If given some LND tunable parameters, parse those now to
            * override the values in the NI structure. */
    
    isn't clear to me.
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     drivers/staging/lustre/lnet/lnet/api-ni.c |   41 ++++++++---------------------
     1 file changed, 12 insertions(+), 29 deletions(-)
    
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index 6e0b8310574d..53ecfd700db3 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -1240,10 +1240,8 @@ lnet_shutdown_lndni(struct lnet_ni *ni)
     }
     
     static int
    -lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
    +lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
     {
    -	struct lnet_ioctl_config_lnd_tunables *lnd_tunables = NULL;
    -	struct lnet_lnd_tunables *tun = NULL;
     	int rc = -EINVAL;
     	int lnd_type;
     	struct lnet_lnd *lnd;
    @@ -1296,36 +1294,12 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data *conf)
     
     	ni->ni_net->net_lnd = lnd;
     
    -	if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf)) {
    -		lnd_tunables = (struct lnet_ioctl_config_lnd_tunables *)conf->cfg_bulk;
    -		tun = &lnd_tunables->lt_tun;
    -	}
    -
     	if (tun) {
     		memcpy(&ni->ni_lnd_tunables, tun,
     		       sizeof(*tun));
     		ni->ni_lnd_tunables_set = true;
     	}
     
    -	/*
    -	 * If given some LND tunable parameters, parse those now to
    -	 * override the values in the NI structure.
    -	 */
    -	if (conf) {
    -		if (conf->cfg_config_u.cfg_net.net_peer_rtr_credits >= 0)
    -			ni->ni_net->net_tunables.lct_peer_rtr_credits =
    -				conf->cfg_config_u.cfg_net.net_peer_rtr_credits;
    -		if (conf->cfg_config_u.cfg_net.net_peer_timeout >= 0)
    -			ni->ni_net->net_tunables.lct_peer_timeout =
    -				conf->cfg_config_u.cfg_net.net_peer_timeout;
    -		if (conf->cfg_config_u.cfg_net.net_peer_tx_credits != -1)
    -			ni->ni_net->net_tunables.lct_peer_tx_credits =
    -				conf->cfg_config_u.cfg_net.net_peer_tx_credits;
    -		if (conf->cfg_config_u.cfg_net.net_max_tx_credits >= 0)
    -			ni->ni_net->net_tunables.lct_max_tx_credits =
    -				conf->cfg_config_u.cfg_net.net_max_tx_credits;
    -	}
    -
     	rc = lnd->lnd_startup(ni);
     
     	mutex_unlock(&the_lnet.ln_lnd_mutex);
    @@ -1861,9 +1835,13 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
     	struct list_head net_head;
     	struct lnet_remotenet *rnet;
     	int rc;
    +	struct lnet_ioctl_config_lnd_tunables *lnd_tunables = NULL;
     
     	INIT_LIST_HEAD(&net_head);
     
    +	if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf))
    +		lnd_tunables = (struct lnet_ioctl_config_lnd_tunables *)conf->cfg_bulk;
    +
     	/* Create a net/ni structures for the network string */
     	rc = lnet_parse_networks(&net_head, nets);
     	if (rc <= 0)
    @@ -1898,9 +1876,14 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
     		goto failed0;
     
     	list_del_init(&net->net_list);
    +	if (lnd_tunables)
    +		memcpy(&net->net_tunables,
    +		       &lnd_tunables->lt_cmn, sizeof(lnd_tunables->lt_cmn));
    +
     	ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
    -	rc = lnet_startup_lndni(ni, conf);
    -	if (rc)
    +	rc = lnet_startup_lndni(ni, (lnd_tunables ?
    +				     &lnd_tunables->lt_tun : NULL));
    +	if (rc < 0)
     		goto failed1;
     
     	if (ni->ni_net->net_lnd->lnd_accept) {
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 12/34] lnet: split lnet_startup_lndni
  2018-09-07  0:49 ` [lustre-devel] [PATCH 12/34] lnet: split lnet_startup_lndni NeilBrown
@ 2018-09-12  3:39   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  3:39 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:53 PM, "NeilBrown" <neilb@suse.com> wrote:

    Split into
      lnet_startup_lndnet
    which starts all nis in a net, and
      lnet_startup_lndni
    which starts an individual ni.
    
    lnet_startup_lndni()  returns 0 on success, or -ve error.
    lnet_startup_lndnis() returned the count of interfaces started.
    
    The new lnet_startup_lndnet() returns the count of started interfaces.
    
    This requires adding lnet_shutdown_lndnet() to handle errors
    in lnet_dyn_add_ni(), which now uses the new lnet_startup_lndnet().
    
    We now drop the ln_lnd_mutex near the end of lnet_startup_lndnet(),
    and re-claim it for each lnet_startup_lndni().
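
The return conventions described above can be sketched in isolation. This is only a model: the demo_* types and functions are hypothetical stand-ins, each net holds a single NI as it does at this point in the series, and the locking and the cleanup on failure are left out:

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct lnet_ni and struct lnet_net. */
struct demo_ni {
	int start_rc;		/* 0 on success, negative errno on failure */
};

struct demo_net {
	struct demo_ni *ni;	/* one NI per net in this model */
};

/* Models lnet_startup_lndni(): 0 on success, negative error otherwise. */
static int demo_startup_ni(const struct demo_ni *ni)
{
	return ni->start_rc;
}

/* Models lnet_startup_lndnet(): count of NIs started, or negative error. */
static int demo_startup_net(const struct demo_net *net)
{
	int rc = demo_startup_ni(net->ni);

	if (rc < 0)
		return rc;
	return 1;
}

/* Models lnet_startup_lndnets(): sum per-net counts, stop at first error. */
static int demo_startup_nets(const struct demo_net *nets, size_t n)
{
	int ni_count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int rc = demo_startup_net(&nets[i]);

		if (rc < 0)
			return rc;
		ni_count += rc;
	}

	return ni_count;
}
```

The point of returning a count from the per-net function is that the caller can keep accumulating with "ni_count += rc" once a net is allowed to carry more than one NI.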
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     drivers/staging/lustre/lnet/lnet/api-ni.c |  142 +++++++++++++++++++++++------
     1 file changed, 111 insertions(+), 31 deletions(-)
    
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index 53ecfd700db3..8afddf11b5e2 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -1239,32 +1239,61 @@ lnet_shutdown_lndni(struct lnet_ni *ni)
     	lnet_net_unlock(LNET_LOCK_EX);
     }
     
    +static void
    +lnet_shutdown_lndnet(struct lnet_net *net)
    +{
    +	struct lnet_ni *ni;
    +
    +	lnet_net_lock(LNET_LOCK_EX);
    +
    +	list_del_init(&net->net_list);
    +
    +	while (!list_empty(&net->net_ni_list)) {
    +		ni = list_entry(net->net_ni_list.next,
    +				struct lnet_ni, ni_netlist);
    +		lnet_net_unlock(LNET_LOCK_EX);
    +		lnet_shutdown_lndni(ni);
    +		lnet_net_lock(LNET_LOCK_EX);
    +	}
    +
    +	/*
    +	 * decrement ref count on lnd only when the entire network goes
    +	 * away
    +	 */
    +	net->net_lnd->lnd_refcount--;
    +
    +	lnet_net_unlock(LNET_LOCK_EX);
    +
    +	lnet_net_free(net);
    +}
    +
     static int
    -lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
    +lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun);
    +
    +static int
    +lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     {
    -	int rc = -EINVAL;
    -	int lnd_type;
    -	struct lnet_lnd *lnd;
    -	struct lnet_tx_queue *tq;
    -	int i;
    -	u32 seed;
    +	struct lnet_ni		*ni;
    +	__u32			lnd_type;
    +	struct lnet_lnd		*lnd;
    +	int rc;
     
    -	lnd_type = LNET_NETTYP(LNET_NIDNET(ni->ni_nid));
    +	lnd_type = LNET_NETTYP(net->net_id);
     
     	LASSERT(libcfs_isknown_lnd(lnd_type));
     
     	/* Make sure this new NI is unique. */
     	lnet_net_lock(LNET_LOCK_EX);
    -	rc = lnet_net_unique(LNET_NIDNET(ni->ni_nid), &the_lnet.ln_nets);
    +	rc = lnet_net_unique(net->net_id, &the_lnet.ln_nets);
     	lnet_net_unlock(LNET_LOCK_EX);
     	if (!rc) {
     		if (lnd_type == LOLND) {
    -			lnet_ni_free(ni);
    +			lnet_net_free(net);
     			return 0;
     		}
     
     		CERROR("Net %s is not unique\n",
    -		       libcfs_net2str(LNET_NIDNET(ni->ni_nid)));
    +		       libcfs_net2str(net->net_id));
     		rc = -EEXIST;
     		goto failed0;
     	}
    @@ -1291,8 +1320,32 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
     	lnet_net_lock(LNET_LOCK_EX);
     	lnd->lnd_refcount++;
     	lnet_net_unlock(LNET_LOCK_EX);
    +	net->net_lnd = lnd;
    +	mutex_unlock(&the_lnet.ln_lnd_mutex);
     
    -	ni->ni_net->net_lnd = lnd;
    +	ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
    +
    +	rc = lnet_startup_lndni(ni, tun);
    +	if (rc < 0)
    +		return rc;
    +	return 1;
    +
    +failed0:
    +	lnet_net_free(net);
    +
    +	return rc;
    +}
    +
    +static int
    +lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
    +{
    +	int			rc = -EINVAL;
    +	struct lnet_tx_queue	*tq;
    +	int			i;
    +	struct lnet_net		*net = ni->ni_net;
    +	u32			seed;
    +
    +	mutex_lock(&the_lnet.ln_lnd_mutex);
     
     	if (tun) {
     		memcpy(&ni->ni_lnd_tunables, tun,
    @@ -1300,15 +1353,15 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
     		ni->ni_lnd_tunables_set = true;
     	}
     
    -	rc = lnd->lnd_startup(ni);
    +	rc = net->net_lnd->lnd_startup(ni);
     
     	mutex_unlock(&the_lnet.ln_lnd_mutex);
     
     	if (rc) {
     		LCONSOLE_ERROR_MSG(0x105, "Error %d starting up LNI %s\n",
    -				   rc, libcfs_lnd2str(lnd->lnd_type));
    +				   rc, libcfs_lnd2str(net->net_lnd->lnd_type));
     		lnet_net_lock(LNET_LOCK_EX);
    -		lnd->lnd_refcount--;
    +		net->net_lnd->lnd_refcount--;
     		lnet_net_unlock(LNET_LOCK_EX);
     		goto failed0;
     	}
    @@ -1324,7 +1377,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
     
     	lnet_net_unlock(LNET_LOCK_EX);
     
    -	if (lnd->lnd_type == LOLND) {
    +	if (net->net_lnd->lnd_type == LOLND) {
     		lnet_ni_addref(ni);
     		LASSERT(!the_lnet.ln_loni);
     		the_lnet.ln_loni = ni;
    @@ -1338,7 +1391,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
     	if (!ni->ni_net->net_tunables.lct_peer_tx_credits ||
     	    !ni->ni_net->net_tunables.lct_max_tx_credits) {
     		LCONSOLE_ERROR_MSG(0x107, "LNI %s has no %scredits\n",
    -				   libcfs_lnd2str(lnd->lnd_type),
    +				   libcfs_lnd2str(net->net_lnd->lnd_type),
     				   !ni->ni_net->net_tunables.lct_peer_tx_credits ?
     				   "" : "per-peer ");
     		/*
    @@ -1375,21 +1428,22 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
     }
     
     static int
    -lnet_startup_lndnis(struct list_head *nilist)
    +lnet_startup_lndnets(struct list_head *netlist)
     {
    -	struct lnet_ni *ni;
    +	struct lnet_net *net;
     	int rc;
     	int ni_count = 0;
     
    -	while (!list_empty(nilist)) {
    -		ni = list_entry(nilist->next, struct lnet_ni, ni_netlist);
    -		list_del(&ni->ni_netlist);
    -		rc = lnet_startup_lndni(ni, NULL);
    +	while (!list_empty(netlist)) {
    +		net = list_entry(netlist->next, struct lnet_net, net_list);
    +		list_del_init(&net->net_list);
    +
    +		rc = lnet_startup_lndnet(net, NULL);
     
     		if (rc < 0)
     			goto failed;
     
    -		ni_count++;
    +		ni_count += rc;
     	}
     
     	return ni_count;
    @@ -1552,7 +1606,7 @@ LNetNIInit(lnet_pid_t requested_pid)
     			goto err_empty_list;
     	}
     
    -	ni_count = lnet_startup_lndnis(&net_head);
    +	ni_count = lnet_startup_lndnets(&net_head);
     	if (ni_count < 0) {
     		rc = ni_count;
     		goto err_empty_list;
    @@ -1831,10 +1885,11 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
     	struct lnet_ping_info *pinfo;
     	struct lnet_handle_md md_handle;
     	struct lnet_net		*net;
    -	struct lnet_ni *ni;
     	struct list_head net_head;
     	struct lnet_remotenet *rnet;
     	int rc;
    +	int			num_acceptor_nets;
    +	__u32			net_type;
     	struct lnet_ioctl_config_lnd_tunables *lnd_tunables = NULL;
     
     	INIT_LIST_HEAD(&net_head);
    @@ -1876,22 +1931,47 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
     		goto failed0;
     
     	list_del_init(&net->net_list);
    +
     	if (lnd_tunables)
     		memcpy(&net->net_tunables,
     		       &lnd_tunables->lt_cmn, sizeof(lnd_tunables->lt_cmn));
     
    -	ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
    -	rc = lnet_startup_lndni(ni, (lnd_tunables ?
    +	/*
    +	 * before starting this network get a count of the current TCP
    +	 * networks which require the acceptor thread running. If that
    +	 * count is == 0 before we start up this network, then we'd want to
    +	 * start up the acceptor thread after starting up this network
    +	 */
    +	num_acceptor_nets = lnet_count_acceptor_nets();
    +
    +	/*
    +	 * lnet_startup_lndnet() can deallocate 'net' even if it returns
    +	 * success, because we ended up adding interfaces to an existing
    +	 * network. So grab the net_type now
    +	 */
    +	net_type = LNET_NETTYP(net->net_id);
    +
    +	rc = lnet_startup_lndnet(net, (lnd_tunables ?
     				     &lnd_tunables->lt_tun : NULL));
     	if (rc < 0)
     		goto failed1;
     
    -	if (ni->ni_net->net_lnd->lnd_accept) {
    +	/*
    +	 * Start the acceptor thread if this is the first network
    +	 * being added that requires the thread.
    +	 */
    +	if (net_type == SOCKLND && num_acceptor_nets == 0) {
     		rc = lnet_acceptor_start();
     		if (rc < 0) {
    -			/* shutdown the ni that we just started */
    +			/* shutdown the net that we just started */
     			CERROR("Failed to start up acceptor thread\n");
    -			lnet_shutdown_lndni(ni);
    +			/*
    +			 * Note that if we needed to start the acceptor
    +			 * thread, then 'net' must have been the first TCP
    +			 * network, therefore was unique, and therefore
    +			 * wasn't deallocated by lnet_startup_lndnet()
    +			 */
    +			lnet_shutdown_lndnet(net);
     			goto failed1;
     		}
     	}
    
    
    


* [lustre-devel] [PATCH 13/34] lnet: reverse order of lnet_startup_lnd{net, ni}
  2018-09-07  0:49 ` [lustre-devel] [PATCH 13/34] lnet: reverse order of lnet_startup_lnd{net, ni} NeilBrown
@ 2018-09-12  3:39   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  3:39 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:53 PM, "NeilBrown" <neilb@suse.com> wrote:

    Change the order - no other change.
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     drivers/staging/lustre/lnet/lnet/api-ni.c |  135 ++++++++++++++---------------
     1 file changed, 66 insertions(+), 69 deletions(-)
    
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index 8afddf11b5e2..09ea7e506128 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -1267,75 +1267,6 @@ lnet_shutdown_lndnet(struct lnet_net *net)
     	lnet_net_free(net);
     }
     
    -static int
    -lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun);
    -
    -static int
    -lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
    -{
    -	struct lnet_ni		*ni;
    -	__u32			lnd_type;
    -	struct lnet_lnd		*lnd;
    -	int rc;
    -
    -	lnd_type = LNET_NETTYP(net->net_id);
    -
    -	LASSERT(libcfs_isknown_lnd(lnd_type));
    -
    -	/* Make sure this new NI is unique. */
    -	lnet_net_lock(LNET_LOCK_EX);
    -	rc = lnet_net_unique(net->net_id, &the_lnet.ln_nets);
    -	lnet_net_unlock(LNET_LOCK_EX);
    -	if (!rc) {
    -		if (lnd_type == LOLND) {
    -			lnet_net_free(net);
    -			return 0;
    -		}
    -
    -		CERROR("Net %s is not unique\n",
    -		       libcfs_net2str(net->net_id));
    -		rc = -EEXIST;
    -		goto failed0;
    -	}
    -
    -	mutex_lock(&the_lnet.ln_lnd_mutex);
    -	lnd = lnet_find_lnd_by_type(lnd_type);
    -
    -	if (!lnd) {
    -		mutex_unlock(&the_lnet.ln_lnd_mutex);
    -		rc = request_module("%s", libcfs_lnd2modname(lnd_type));
    -		mutex_lock(&the_lnet.ln_lnd_mutex);
    -
    -		lnd = lnet_find_lnd_by_type(lnd_type);
    -		if (!lnd) {
    -			mutex_unlock(&the_lnet.ln_lnd_mutex);
    -			CERROR("Can't load LND %s, module %s, rc=%d\n",
    -			       libcfs_lnd2str(lnd_type),
    -			       libcfs_lnd2modname(lnd_type), rc);
    -			rc = -EINVAL;
    -			goto failed0;
    -		}
    -	}
    -
    -	lnet_net_lock(LNET_LOCK_EX);
    -	lnd->lnd_refcount++;
    -	lnet_net_unlock(LNET_LOCK_EX);
    -	net->net_lnd = lnd;
    -	mutex_unlock(&the_lnet.ln_lnd_mutex);
    -
    -	ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
    -
    -	rc = lnet_startup_lndni(ni, tun);
    -	if (rc < 0)
    -		return rc;
    -	return 1;
    -
    -failed0:
    -	lnet_net_free(net);
    -
    -	return rc;
    -}
    -
     static int
     lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
     {
    @@ -1427,6 +1358,72 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
     	return rc;
     }
     
    +static int
    +lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
    +{
    +	struct lnet_ni		*ni;
    +	__u32			lnd_type;
    +	struct lnet_lnd		*lnd;
    +	int			rc;
    +
    +	lnd_type = LNET_NETTYP(net->net_id);
    +
    +	LASSERT(libcfs_isknown_lnd(lnd_type));
    +
    +	/* Make sure this new NI is unique. */
    +	lnet_net_lock(LNET_LOCK_EX);
    +	rc = lnet_net_unique(net->net_id, &the_lnet.ln_nets);
    +	lnet_net_unlock(LNET_LOCK_EX);
    +	if (!rc) {
    +		if (lnd_type == LOLND) {
    +			lnet_net_free(net);
    +			return 0;
    +		}
    +
    +		CERROR("Net %s is not unique\n",
    +		       libcfs_net2str(net->net_id));
    +		rc = -EEXIST;
    +		goto failed0;
    +	}
    +
    +	mutex_lock(&the_lnet.ln_lnd_mutex);
    +	lnd = lnet_find_lnd_by_type(lnd_type);
    +
    +	if (!lnd) {
    +		mutex_unlock(&the_lnet.ln_lnd_mutex);
    +		rc = request_module("%s", libcfs_lnd2modname(lnd_type));
    +		mutex_lock(&the_lnet.ln_lnd_mutex);
    +
    +		lnd = lnet_find_lnd_by_type(lnd_type);
    +		if (!lnd) {
    +			mutex_unlock(&the_lnet.ln_lnd_mutex);
    +			CERROR("Can't load LND %s, module %s, rc=%d\n",
    +			       libcfs_lnd2str(lnd_type),
    +			       libcfs_lnd2modname(lnd_type), rc);
    +			rc = -EINVAL;
    +			goto failed0;
    +		}
    +	}
    +
    +	lnet_net_lock(LNET_LOCK_EX);
    +	lnd->lnd_refcount++;
    +	lnet_net_unlock(LNET_LOCK_EX);
    +	net->net_lnd = lnd;
    +	mutex_unlock(&the_lnet.ln_lnd_mutex);
    +
    +	ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
    +
    +	rc = lnet_startup_lndni(ni, tun);
    +	if (rc < 0)
    +		return rc;
    +	return 1;
    +
    +failed0:
    +	lnet_net_free(net);
    +
    +	return rc;
    +}
    +
     static int
     lnet_startup_lndnets(struct list_head *netlist)
     {
    
    
    


* [lustre-devel] [PATCH 14/34] lnet: rename lnet_find_net_locked to lnet_find_rnet_locked
  2018-09-07  0:49 ` [lustre-devel] [PATCH 14/34] lnet: rename lnet_find_net_locked to lnet_find_rnet_locked NeilBrown
@ 2018-09-12  3:40   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  3:40 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:53 PM, "NeilBrown" <neilb@suse.com> wrote:

    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     .../staging/lustre/include/linux/lnet/lib-lnet.h   |    2 +-
     drivers/staging/lustre/lnet/lnet/api-ni.c          |    2 +-
     drivers/staging/lustre/lnet/lnet/lib-move.c        |    2 +-
     drivers/staging/lustre/lnet/lnet/router.c          |    4 ++--
     4 files changed, 5 insertions(+), 5 deletions(-)
    
    diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    index e32dbb854d80..faa3f19dd844 100644
    --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    @@ -430,7 +430,7 @@ int lnet_rtrpools_adjust(int tiny, int small, int large);
     int lnet_rtrpools_enable(void);
     void lnet_rtrpools_disable(void);
     void lnet_rtrpools_free(int keep_pools);
    -struct lnet_remotenet *lnet_find_net_locked(__u32 net);
    +struct lnet_remotenet *lnet_find_rnet_locked(__u32 net);
     int lnet_dyn_add_ni(lnet_pid_t requested_pid,
     		    struct lnet_ioctl_config_data *conf);
     int lnet_dyn_del_ni(__u32 net);
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index 09ea7e506128..c3c568e63342 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -1909,7 +1909,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
     	net = list_entry(net_head.next, struct lnet_net, net_list);
     
     	lnet_net_lock(LNET_LOCK_EX);
    -	rnet = lnet_find_net_locked(net->net_id);
    +	rnet = lnet_find_rnet_locked(net->net_id);
     	lnet_net_unlock(LNET_LOCK_EX);
     	/*
     	 * make sure that the net added doesn't invalidate the current
    diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
    index 02cd1a5a466f..00a89221c9b3 100644
    --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
    +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
    @@ -1022,7 +1022,7 @@ lnet_find_route_locked(struct lnet_net *net, lnet_nid_t target,
     	 * If @rtr_nid is not LNET_NID_ANY, return the gateway with
     	 * rtr_nid nid, otherwise find the best gateway I can use
     	 */
    -	rnet = lnet_find_net_locked(LNET_NIDNET(target));
    +	rnet = lnet_find_rnet_locked(LNET_NIDNET(target));
     	if (!rnet)
     		return NULL;
     
    diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
    index 5493d13de6d9..1fce991fcb0e 100644
    --- a/drivers/staging/lustre/lnet/lnet/router.c
    +++ b/drivers/staging/lustre/lnet/lnet/router.c
    @@ -220,7 +220,7 @@ lnet_rtr_decref_locked(struct lnet_peer *lp)
     }
     
     struct lnet_remotenet *
    -lnet_find_net_locked(__u32 net)
    +lnet_find_rnet_locked(__u32 net)
     {
     	struct lnet_remotenet *rnet;
     	struct list_head *rn_list;
    @@ -347,7 +347,7 @@ lnet_add_route(__u32 net, __u32 hops, lnet_nid_t gateway,
     
     	LASSERT(!the_lnet.ln_shutdown);
     
    -	rnet2 = lnet_find_net_locked(net);
    +	rnet2 = lnet_find_rnet_locked(net);
     	if (!rnet2) {
     		/* new network */
     		list_add_tail(&rnet->lrn_list, lnet_net2rnethash(net));
    
    
    


* [lustre-devel] [PATCH 15/34] lnet: extend zombie handling to nets and nis
  2018-09-07  0:49 ` [lustre-devel] [PATCH 15/34] lnet: extend zombie handling to nets and nis NeilBrown
@ 2018-09-12  3:53   ` Doug Oucharek
  2018-09-12  4:10     ` NeilBrown
  0 siblings, 1 reply; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  3:53 UTC (permalink / raw)
  To: lustre-devel

Which refcount line are you referring to?  The call to lnet_ni_unlink_locked()?
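
For anyone following the zombie handling in this patch, here is a rough user-space model of the per-net zombie list. It is only a sketch under my own assumptions: struct ni_model, struct net_model and both helpers are hypothetical stand-ins, not the LNet structures, and the real lnet_clear_zombies_nis_locked() sleeps and retries under lnet_net_lock while references drain, which this model does not attempt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lnet_ni: one refcount, one list link. */
struct ni_model {
	int refs;
	struct ni_model *next;
};

/* Hypothetical stand-in for struct lnet_net with its two NI lists. */
struct net_model {
	struct ni_model *ni_list;	/* live NIs (net_ni_list in the patch) */
	struct ni_model *ni_zombie;	/* dying NIs (net_ni_zombie in the patch) */
};

/* Move the first live NI onto this net's own zombie list. */
static void net_model_unlink_first(struct net_model *net)
{
	struct ni_model *ni = net->ni_list;

	if (!ni)
		return;
	net->ni_list = ni->next;
	ni->next = net->ni_zombie;
	net->ni_zombie = ni;
}

/*
 * One reaping pass: drop every zombie whose refcount has reached zero
 * and leave busy ones on the list. Returns the number still busy.
 */
static int net_model_reap(struct net_model *net)
{
	struct ni_model **pp = &net->ni_zombie;
	int busy = 0;

	while (*pp) {
		struct ni_model *ni = *pp;

		if (ni->refs == 0) {
			*pp = ni->next;		/* unlink; the LND would be shut down here */
			ni->next = NULL;
		} else {
			busy++;
			pp = &ni->next;
		}
	}
	return busy;
}
```

The point of the model is just that each net drains its own zombies, so a busy NI on one net does not sit on a list shared with every other net.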

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:53 PM, "NeilBrown" <neilb@suse.com> wrote:

    A zombie lnet_ni is now attached to its lnet_net rather than to the
    global the_lnet.  Zombie lnet_net structures are attached to the_lnet.
    
    For some reason, we don't drop the refcount on the lnd before shutting
    it down now.
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     .../staging/lustre/include/linux/lnet/lib-types.h  |    9 ++-
     drivers/staging/lustre/lnet/lnet/api-ni.c          |   65 ++++++++++----------
     drivers/staging/lustre/lnet/lnet/config.c          |    3 +
     3 files changed, 42 insertions(+), 35 deletions(-)
    
    diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
    index 22957d142cc0..1d372672e2de 100644
    --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
    +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
    @@ -284,6 +284,9 @@ struct lnet_net {
     	struct lnet_lnd		*net_lnd;
     	/* list of NIs on this net */
     	struct list_head	net_ni_list;
    +
    +	/* dying LND instances */
    +	struct list_head	net_ni_zombie;
     };
     
     struct lnet_ni {
    @@ -653,11 +656,11 @@ struct lnet {
     	/* LND instances */
     	struct list_head		ln_nets;
     	/* NIs bond on specific CPT(s) */
    -	struct list_head		  ln_nis_cpt;
    -	/* dying LND instances */
    -	struct list_head		  ln_nis_zombie;
    +	struct list_head		ln_nis_cpt;
     	/* the loopback NI */
     	struct lnet_ni			*ln_loni;
    +	/* network zombie list */
    +	struct list_head		ln_net_zombie;
     
     	/* remote networks with routes to them */
     	struct list_head		 *ln_remote_nets_hash;
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index c3c568e63342..18d111cb826b 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -539,7 +539,6 @@ lnet_prepare(lnet_pid_t requested_pid)
     	INIT_LIST_HEAD(&the_lnet.ln_test_peers);
     	INIT_LIST_HEAD(&the_lnet.ln_nets);
     	INIT_LIST_HEAD(&the_lnet.ln_nis_cpt);
    -	INIT_LIST_HEAD(&the_lnet.ln_nis_zombie);
     	INIT_LIST_HEAD(&the_lnet.ln_routers);
     	INIT_LIST_HEAD(&the_lnet.ln_drop_rules);
     	INIT_LIST_HEAD(&the_lnet.ln_delay_rules);
    @@ -618,7 +617,6 @@ lnet_unprepare(void)
     	LASSERT(list_empty(&the_lnet.ln_test_peers));
     	LASSERT(list_empty(&the_lnet.ln_nets));
     	LASSERT(list_empty(&the_lnet.ln_nis_cpt));
    -	LASSERT(list_empty(&the_lnet.ln_nis_zombie));
     
     	lnet_portals_destroy();
     
    @@ -1095,34 +1093,35 @@ lnet_ni_unlink_locked(struct lnet_ni *ni)
     
     	/* move it to zombie list and nobody can find it anymore */
     	LASSERT(!list_empty(&ni->ni_netlist));
    -	list_move(&ni->ni_netlist, &the_lnet.ln_nis_zombie);
    +	list_move(&ni->ni_netlist, &ni->ni_net->net_ni_zombie);
     	lnet_ni_decref_locked(ni, 0);
     }
     
     static void
    -lnet_clear_zombies_nis_locked(void)
    +lnet_clear_zombies_nis_locked(struct lnet_net *net)
     {
     	int i;
     	int islo;
     	struct lnet_ni *ni;
    +	struct list_head *zombie_list = &net->net_ni_zombie;
     
     	/*
    -	 * Now wait for the NI's I just nuked to show up on ln_zombie_nis
    -	 * and shut them down in guaranteed thread context
    +	 * Now wait for the NIs I just nuked to show up on the zombie
    +	 * list and shut them down in guaranteed thread context
     	 */
     	i = 2;
    -	while (!list_empty(&the_lnet.ln_nis_zombie)) {
    +	while (!list_empty(zombie_list)) {
     		int *ref;
     		int j;
     
    -		ni = list_entry(the_lnet.ln_nis_zombie.next,
    +		ni = list_entry(zombie_list->next,
     				struct lnet_ni, ni_netlist);
     		list_del_init(&ni->ni_netlist);
     		cfs_percpt_for_each(ref, j, ni->ni_refs) {
     			if (!*ref)
     				continue;
     			/* still busy, add it back to zombie list */
    -			list_add(&ni->ni_netlist, &the_lnet.ln_nis_zombie);
    +			list_add(&ni->ni_netlist, zombie_list);
     			break;
     		}
     
    @@ -1138,18 +1137,13 @@ lnet_clear_zombies_nis_locked(void)
     			continue;
     		}
     
    -		ni->ni_net->net_lnd->lnd_refcount--;
     		lnet_net_unlock(LNET_LOCK_EX);
     
     		islo = ni->ni_net->net_lnd->lnd_type == LOLND;
     
     		LASSERT(!in_interrupt());
    -		ni->ni_net->net_lnd->lnd_shutdown(ni);
    +		net->net_lnd->lnd_shutdown(ni);
     
    -		/*
    -		 * can't deref lnd anymore now; it might have unregistered
    -		 * itself...
    -		 */
     		if (!islo)
     			CDEBUG(D_LNI, "Removed LNI %s\n",
     			       libcfs_nid2str(ni->ni_nid));
    @@ -1162,9 +1156,11 @@ lnet_clear_zombies_nis_locked(void)
     }
     
     static void
    -lnet_shutdown_lndnis(void)
    +lnet_shutdown_lndnet(struct lnet_net *net);
    +
    +static void
    +lnet_shutdown_lndnets(void)
     {
    -	struct lnet_ni *ni;
     	int i;
     	struct lnet_net *net;
     
    @@ -1173,30 +1169,35 @@ lnet_shutdown_lndnis(void)
     	/* All quiet on the API front */
     	LASSERT(!the_lnet.ln_shutdown);
     	LASSERT(!the_lnet.ln_refcount);
    -	LASSERT(list_empty(&the_lnet.ln_nis_zombie));
     
     	lnet_net_lock(LNET_LOCK_EX);
     	the_lnet.ln_shutdown = 1;	/* flag shutdown */
     
    -	/* Unlink NIs from the global table */
     	while (!list_empty(&the_lnet.ln_nets)) {
    +		/*
    +		 * move the nets to the zombie list to avoid them being
    +		 * picked up for new work. LONET is also included in the
    +		 * Nets that will be moved to the zombie list
    +		 */
     		net = list_entry(the_lnet.ln_nets.next,
     				 struct lnet_net, net_list);
    -		while (!list_empty(&net->net_ni_list)) {
    -			ni = list_entry(net->net_ni_list.next,
    -					struct lnet_ni, ni_netlist);
    -			lnet_ni_unlink_locked(ni);
    -		}
    +		list_move(&net->net_list, &the_lnet.ln_net_zombie);
     	}
     
    -	/* Drop the cached loopback NI. */
    +	/* Drop the cached loopback Net. */
     	if (the_lnet.ln_loni) {
     		lnet_ni_decref_locked(the_lnet.ln_loni, 0);
     		the_lnet.ln_loni = NULL;
     	}
    -
     	lnet_net_unlock(LNET_LOCK_EX);
     
    +	/* iterate through the net zombie list and delete each net */
    +	while (!list_empty(&the_lnet.ln_net_zombie)) {
    +		net = list_entry(the_lnet.ln_net_zombie.next,
    +				 struct lnet_net, net_list);
    +		lnet_shutdown_lndnet(net);
    +	}
    +
     	/*
     	 * Clear lazy portals and drop delayed messages which hold refs
     	 * on their lnet_msg::msg_rxpeer
    @@ -1211,8 +1212,6 @@ lnet_shutdown_lndnis(void)
     	lnet_peer_tables_cleanup(NULL);
     
     	lnet_net_lock(LNET_LOCK_EX);
    -
    -	lnet_clear_zombies_nis_locked();
     	the_lnet.ln_shutdown = 0;
     	lnet_net_unlock(LNET_LOCK_EX);
     }
    @@ -1222,6 +1221,7 @@ static void
     lnet_shutdown_lndni(struct lnet_ni *ni)
     {
     	int i;
    +	struct lnet_net *net = ni->ni_net;
     
     	lnet_net_lock(LNET_LOCK_EX);
     	lnet_ni_unlink_locked(ni);
    @@ -1235,7 +1235,7 @@ lnet_shutdown_lndni(struct lnet_ni *ni)
     	lnet_peer_tables_cleanup(ni);
     
     	lnet_net_lock(LNET_LOCK_EX);
    -	lnet_clear_zombies_nis_locked();
    +	lnet_clear_zombies_nis_locked(net);
     	lnet_net_unlock(LNET_LOCK_EX);
     }
     
    @@ -1445,7 +1445,7 @@ lnet_startup_lndnets(struct list_head *netlist)
     
     	return ni_count;
     failed:
    -	lnet_shutdown_lndnis();
    +	lnet_shutdown_lndnets();
     
     	return rc;
     }
    @@ -1492,6 +1492,7 @@ int lnet_lib_init(void)
     	the_lnet.ln_refcount = 0;
     	LNetInvalidateEQHandle(&the_lnet.ln_rc_eqh);
     	INIT_LIST_HEAD(&the_lnet.ln_lnds);
    +	INIT_LIST_HEAD(&the_lnet.ln_net_zombie);
     	INIT_LIST_HEAD(&the_lnet.ln_rcd_zombie);
     	INIT_LIST_HEAD(&the_lnet.ln_rcd_deathrow);
     
    @@ -1656,7 +1657,7 @@ LNetNIInit(lnet_pid_t requested_pid)
     	if (!the_lnet.ln_nis_from_mod_params)
     		lnet_destroy_routes();
     err_shutdown_lndnis:
    -	lnet_shutdown_lndnis();
    +	lnet_shutdown_lndnets();
     err_empty_list:
     	lnet_unprepare();
     	LASSERT(rc < 0);
    @@ -1703,7 +1704,7 @@ LNetNIFini(void)
     
     		lnet_acceptor_stop();
     		lnet_destroy_routes();
    -		lnet_shutdown_lndnis();
    +		lnet_shutdown_lndnets();
     		lnet_unprepare();
     	}
     
    diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
    index 380a3fb1caba..2588d67fea1b 100644
    --- a/drivers/staging/lustre/lnet/lnet/config.c
    +++ b/drivers/staging/lustre/lnet/lnet/config.c
    @@ -279,6 +279,8 @@ lnet_net_free(struct lnet_net *net)
     	struct list_head *tmp, *tmp2;
     	struct lnet_ni *ni;
     
    +	LASSERT(list_empty(&net->net_ni_zombie));
    +
     	/* delete any nis which have been started. */
     	list_for_each_safe(tmp, tmp2, &net->net_ni_list) {
     		ni = list_entry(tmp, struct lnet_ni, ni_netlist);
    @@ -312,6 +314,7 @@ lnet_net_alloc(__u32 net_id, struct list_head *net_list)
     
     	INIT_LIST_HEAD(&net->net_list);
     	INIT_LIST_HEAD(&net->net_ni_list);
    +	INIT_LIST_HEAD(&net->net_ni_zombie);
     
     	net->net_id = net_id;
     
    
    
    


* [lustre-devel] [PATCH 17/34] lnet: move lnet_shutdown_lndnets down to after first use
  2018-09-07  0:49 ` [lustre-devel] [PATCH 17/34] lnet: move lnet_shutdown_lndnets down to after first use NeilBrown
@ 2018-09-12  3:55   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  3:55 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:54 PM, "NeilBrown" <neilb@suse.com> wrote:

    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     drivers/staging/lustre/lnet/lnet/api-ni.c |   91 ++++++++++++++---------------
     1 file changed, 44 insertions(+), 47 deletions(-)
    
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index 2529a11c6c59..46c5ca71bc07 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -1155,53 +1155,6 @@ lnet_clear_zombies_nis_locked(struct lnet_net *net)
     	}
     }
     
    -static void
    -lnet_shutdown_lndnet(struct lnet_net *net);
    -
    -static void
    -lnet_shutdown_lndnets(void)
    -{
    -	struct lnet_net *net;
    -
    -	/* NB called holding the global mutex */
    -
    -	/* All quiet on the API front */
    -	LASSERT(!the_lnet.ln_shutdown);
    -	LASSERT(!the_lnet.ln_refcount);
    -
    -	lnet_net_lock(LNET_LOCK_EX);
    -	the_lnet.ln_shutdown = 1;	/* flag shutdown */
    -
    -	while (!list_empty(&the_lnet.ln_nets)) {
    -		/*
    -		 * move the nets to the zombie list to avoid them being
    -		 * picked up for new work. LONET is also included in the
    -		 * Nets that will be moved to the zombie list
    -		 */
    -		net = list_entry(the_lnet.ln_nets.next,
    -				 struct lnet_net, net_list);
    -		list_move(&net->net_list, &the_lnet.ln_net_zombie);
    -	}
    -
    -	/* Drop the cached loopback Net. */
    -	if (the_lnet.ln_loni) {
    -		lnet_ni_decref_locked(the_lnet.ln_loni, 0);
    -		the_lnet.ln_loni = NULL;
    -	}
    -	lnet_net_unlock(LNET_LOCK_EX);
    -
    -	/* iterate through the net zombie list and delete each net */
    -	while (!list_empty(&the_lnet.ln_net_zombie)) {
    -		net = list_entry(the_lnet.ln_net_zombie.next,
    -				 struct lnet_net, net_list);
    -		lnet_shutdown_lndnet(net);
    -	}
    -
    -	lnet_net_lock(LNET_LOCK_EX);
    -	the_lnet.ln_shutdown = 0;
    -	lnet_net_unlock(LNET_LOCK_EX);
    -}
    -
     /* shutdown down the NI and release refcount */
     static void
     lnet_shutdown_lndni(struct lnet_ni *ni)
    @@ -1253,6 +1206,50 @@ lnet_shutdown_lndnet(struct lnet_net *net)
     	lnet_net_free(net);
     }
     
    +static void
    +lnet_shutdown_lndnets(void)
    +{
    +	struct lnet_net *net;
    +
    +	/* NB called holding the global mutex */
    +
    +	/* All quiet on the API front */
    +	LASSERT(!the_lnet.ln_shutdown);
    +	LASSERT(!the_lnet.ln_refcount);
    +
    +	lnet_net_lock(LNET_LOCK_EX);
    +	the_lnet.ln_shutdown = 1;	/* flag shutdown */
    +
    +	while (!list_empty(&the_lnet.ln_nets)) {
    +		/*
    +		 * move the nets to the zombie list to avoid them being
    +		 * picked up for new work. LONET is also included in the
    +		 * Nets that will be moved to the zombie list
    +		 */
    +		net = list_entry(the_lnet.ln_nets.next,
    +				 struct lnet_net, net_list);
    +		list_move(&net->net_list, &the_lnet.ln_net_zombie);
    +	}
    +
    +	/* Drop the cached loopback Net. */
    +	if (the_lnet.ln_loni) {
    +		lnet_ni_decref_locked(the_lnet.ln_loni, 0);
    +		the_lnet.ln_loni = NULL;
    +	}
    +	lnet_net_unlock(LNET_LOCK_EX);
    +
    +	/* iterate through the net zombie list and delete each net */
    +	while (!list_empty(&the_lnet.ln_net_zombie)) {
    +		net = list_entry(the_lnet.ln_net_zombie.next,
    +				 struct lnet_net, net_list);
    +		lnet_shutdown_lndnet(net);
    +	}
    +
    +	lnet_net_lock(LNET_LOCK_EX);
    +	the_lnet.ln_shutdown = 0;
    +	lnet_net_unlock(LNET_LOCK_EX);
    +}
    +
     static int
     lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
     {
    
    
    


* [lustre-devel] [PATCH 18/34] lnet: add ni_state
  2018-09-07  0:49 ` [lustre-devel] [PATCH 18/34] lnet: add ni_state NeilBrown
@ 2018-09-12  3:59   ` Doug Oucharek
  2018-09-12  4:25     ` NeilBrown
  0 siblings, 1 reply; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  3:59 UTC (permalink / raw)
  To: lustre-devel

I believe this state machine was introduced to help us understand how healthy an NI is, so that we can avoid it when it is not healthy and other paths are still OK.
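
To make that concrete, here is a small stand-alone sketch of how such a state could steer NI selection. The enum values and the health predicate mirror the patch quoted below; struct ni_model and ni_model_pick() are hypothetical names of mine and are not part of LNet, and the real selection logic in the later multi-rail patches may look quite different.

```c
#include <assert.h>
#include <stdbool.h>

/* States as listed in the patch. */
enum lnet_ni_state {
	LNET_NI_STATE_INIT = 0,
	LNET_NI_STATE_ACTIVE,
	LNET_NI_STATE_FAILED,
	LNET_NI_STATE_DEGRADED,
	LNET_NI_STATE_DELETING
};

/* Hypothetical, stripped-down stand-in for struct lnet_ni. */
struct ni_model {
	enum lnet_ni_state ni_state;
};

/* Same test as lnet_is_ni_healthy_locked(): ACTIVE or DEGRADED counts. */
static bool ni_model_is_healthy(const struct ni_model *ni)
{
	return ni->ni_state == LNET_NI_STATE_ACTIVE ||
	       ni->ni_state == LNET_NI_STATE_DEGRADED;
}

/* Hypothetical helper: index of the first healthy NI, or -1 if none. */
static int ni_model_pick(const struct ni_model *nis, int count)
{
	for (int i = 0; i < count; i++)
		if (ni_model_is_healthy(&nis[i]))
			return i;
	return -1;
}
```

Note that a DEGRADED NI still counts as usable under this predicate; only INIT, FAILED and DELETING are skipped.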

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:54 PM, "NeilBrown" <neilb@suse.com> wrote:

    This is barely used.
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     .../staging/lustre/include/linux/lnet/lib-lnet.h   |    1 +
     .../staging/lustre/include/linux/lnet/lib-types.h  |   16 ++++++++++++++++
     drivers/staging/lustre/lnet/lnet/api-ni.c          |   16 ++++++++++++++++
     drivers/staging/lustre/lnet/lnet/config.c          |    1 +
     4 files changed, 34 insertions(+)
    
    diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    index faa3f19dd844..54a93235834c 100644
    --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    @@ -400,6 +400,7 @@ int lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni);
     struct lnet_ni *lnet_nid2ni_locked(lnet_nid_t nid, int cpt);
     struct lnet_ni *lnet_net2ni_locked(__u32 net, int cpt);
     struct lnet_ni *lnet_net2ni(__u32 net);
    +bool lnet_is_ni_healthy_locked(struct lnet_ni *ni);
     
     extern int portal_rotor;
     
    diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
    index 1d372672e2de..6c34ecf22021 100644
    --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
    +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
    @@ -256,6 +256,19 @@ struct lnet_tx_queue {
     	struct list_head	tq_delayed;	/* delayed TXs */
     };
     
    +enum lnet_ni_state {
    +	/* set when NI block is allocated */
    +	LNET_NI_STATE_INIT = 0,
    +	/* set when NI is started successfully */
    +	LNET_NI_STATE_ACTIVE,
    +	/* set when LND notifies NI failed */
    +	LNET_NI_STATE_FAILED,
    +	/* set when LND notifies NI degraded */
    +	LNET_NI_STATE_DEGRADED,
    +	/* set when shutting down NI */
    +	LNET_NI_STATE_DELETING
    +};
    +
     struct lnet_net {
     	/* chain on the ln_nets */
     	struct list_head	net_list;
    @@ -324,6 +337,9 @@ struct lnet_ni {
     	/* my health status */
     	struct lnet_ni_status	*ni_status;
     
    +	/* NI FSM */
    +	enum lnet_ni_state	ni_state;
    +
     	/* per NI LND tunables */
     	struct lnet_lnd_tunables ni_lnd_tunables;
     
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index 46c5ca71bc07..618fdf8141f0 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -780,6 +780,16 @@ lnet_islocalnet(__u32 net)
     	return !!ni;
     }
     
    +bool
    +lnet_is_ni_healthy_locked(struct lnet_ni *ni)
    +{
    +	if (ni->ni_state == LNET_NI_STATE_ACTIVE ||
    +	    ni->ni_state == LNET_NI_STATE_DEGRADED)
    +		return true;
    +
    +	return false;
    +}
    +
     struct lnet_ni  *
     lnet_nid2ni_locked(lnet_nid_t nid, int cpt)
     {
    @@ -1117,6 +1127,9 @@ lnet_clear_zombies_nis_locked(struct lnet_net *net)
     		ni = list_entry(zombie_list->next,
     				struct lnet_ni, ni_netlist);
     		list_del_init(&ni->ni_netlist);
    +		/* the ni should be in deleting state. If it's not it's
    +		 * a bug */
    +		LASSERT(ni->ni_state == LNET_NI_STATE_DELETING);
     		cfs_percpt_for_each(ref, j, ni->ni_refs) {
     			if (!*ref)
     				continue;
    @@ -1163,6 +1176,7 @@ lnet_shutdown_lndni(struct lnet_ni *ni)
     	struct lnet_net *net = ni->ni_net;
     
     	lnet_net_lock(LNET_LOCK_EX);
    +	ni->ni_state = LNET_NI_STATE_DELETING;
     	lnet_ni_unlink_locked(ni);
     	lnet_net_unlock(LNET_LOCK_EX);
     
    @@ -1291,6 +1305,8 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
     
     	lnet_net_unlock(LNET_LOCK_EX);
     
    +	ni->ni_state = LNET_NI_STATE_ACTIVE;
    +
     	if (net->net_lnd->lnd_type == LOLND) {
     		lnet_ni_addref(ni);
     		LASSERT(!the_lnet.ln_loni);
    diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
    index 2588d67fea1b..081812e19b13 100644
    --- a/drivers/staging/lustre/lnet/lnet/config.c
    +++ b/drivers/staging/lustre/lnet/lnet/config.c
    @@ -393,6 +393,7 @@ lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
     		ni->ni_net_ns = NULL;
     
     	ni->ni_last_alive = ktime_get_real_seconds();
    +	ni->ni_state = LNET_NI_STATE_INIT;
     	rc = lnet_net_append_cpts(ni->ni_cpts, ni->ni_ncpts, net);
     	if (rc != 0)
     		goto failed;
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 19/34] lnet: simplify lnet_islocalnet()
  2018-09-07  0:49 ` [lustre-devel] [PATCH 19/34] lnet: simplify lnet_islocalnet() NeilBrown
@ 2018-09-12  4:02   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  4:02 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:54 PM, "NeilBrown" <neilb@suse.com> wrote:

    Having lnet_get_net_locked() makes this (a little) simpler.
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     drivers/staging/lustre/lnet/lnet/api-ni.c |   14 +++++---------
     1 file changed, 5 insertions(+), 9 deletions(-)
    
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index 618fdf8141f0..546d5101360f 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -764,20 +764,16 @@ lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni)
     EXPORT_SYMBOL(lnet_cpt_of_nid);
     
     int
    -lnet_islocalnet(__u32 net)
    +lnet_islocalnet(__u32 net_id)
     {
    -	struct lnet_ni *ni;
    -	int cpt;
    +	struct lnet_net *net;
    +	int		cpt;
     
     	cpt = lnet_net_lock_current();
    -
    -	ni = lnet_net2ni_locked(net, cpt);
    -	if (ni)
    -		lnet_ni_decref_locked(ni, cpt);
    -
    +	net = lnet_get_net_locked(net_id);
     	lnet_net_unlock(cpt);
     
    -	return !!ni;
    +	return !!net;
     }
     
     bool
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 11/34] lnet: pass tun to lnet_startup_lndni, instead of full conf
  2018-09-11 18:31   ` Amir Shehata
@ 2018-09-12  4:03     ` NeilBrown
  0 siblings, 0 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-12  4:03 UTC (permalink / raw)
  To: lustre-devel

On Tue, Sep 11 2018, Amir Shehata wrote:

> The block of logic that was removed from lnet_startup_lndni() is done in the
> LND. Each LND has its own defaults. As an example, look at
> kiblnd_tunables_setup() and ksocknal_startup().
> These tunables are LND-specific and have different values per LND. So
> instead of configuring them in the common LNet function and then having
> them overwritten again in the LND, we let each LND initialize them to
> its own default values, if they haven't already been set by the user.
> Note that dynamic configuration of these parameters currently works only
> for the o2iblnd. Socklnd and gnilnd appear not to make use of the dynamic
> ability. I'll create an LU ticket to add the ability to dynamically set
> these values to the socklnd.
> The tunables are divided into two parts: a set of tunables that are
> common to all the LNDs (although each LND could have different default
> values), and a set of LND-specific tunables which pertain to a single
> LND; again, the latter is only used by the o2iblnd at the moment.

Thanks a lot.  That helps.
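
As a rough illustration of the "LND fills in its own defaults unless the
user already set a value" behaviour described above, here is a minimal
userspace sketch. The struct, the function names and the convention that
a negative value means "unset" are all assumptions made for this example;
they are not the real lnet_ioctl_config_lnd_cmn_tunables layout or the
code in kiblnd_tunables_setup().

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the common per-net tunables.
 * Assumption of this sketch: a negative value means "not set by the user".
 */
struct cmn_tunables {
	int peer_timeout;
	int peer_tx_credits;
	int max_tx_credits;
	int peer_rtr_credits;
};

/* Use the LND's default only if the user left the field unset. */
static void set_default(int *field, int dflt)
{
	if (*field < 0)
		*field = dflt;
}

/*
 * What an LND's startup path might do: apply its own defaults to
 * whatever the user did not configure, leaving user values alone.
 */
void lnd_apply_defaults(struct cmn_tunables *t,
			const struct cmn_tunables *dflt)
{
	assert(t && dflt);
	set_default(&t->peer_timeout, dflt->peer_timeout);
	set_default(&t->peer_tx_credits, dflt->peer_tx_credits);
	set_default(&t->max_tx_credits, dflt->max_tx_credits);
	set_default(&t->peer_rtr_credits, dflt->peer_rtr_credits);
}
```

With this shape the common LNet code only has to copy whatever the user
supplied, and each LND can keep different default values without the
common code knowing about them.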

NeilBrown

>
> On Thu, 6 Sep 2018 at 18:00, NeilBrown <neilb@suse.com> wrote:
>
>> I don't understand parts of this change.
>> In particular, the removal of
>>        /* If given some LND tunable parameters, parse those now to
>>         * override the values in the NI structure. */
>>
>> isn't clear to me.
>>
>> This is part of
>>     8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>>        LU-7734 lnet: Multi-Rail local NI split
>>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  drivers/staging/lustre/lnet/lnet/api-ni.c |   41
>> ++++++++---------------------
>>  1 file changed, 12 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c
>> b/drivers/staging/lustre/lnet/lnet/api-ni.c
>> index 6e0b8310574d..53ecfd700db3 100644
>> --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
>> +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
>> @@ -1240,10 +1240,8 @@ lnet_shutdown_lndni(struct lnet_ni *ni)
>>  }
>>
>>  static int
>> -lnet_startup_lndni(struct lnet_ni *ni, struct lnet_ioctl_config_data
>> *conf)
>> +lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
>>  {
>> -       struct lnet_ioctl_config_lnd_tunables *lnd_tunables = NULL;
>> -       struct lnet_lnd_tunables *tun = NULL;
>>         int rc = -EINVAL;
>>         int lnd_type;
>>         struct lnet_lnd *lnd;
>> @@ -1296,36 +1294,12 @@ lnet_startup_lndni(struct lnet_ni *ni, struct
>> lnet_ioctl_config_data *conf)
>>
>>         ni->ni_net->net_lnd = lnd;
>>
>> -       if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf)) {
>> -               lnd_tunables = (struct lnet_ioctl_config_lnd_tunables
>> *)conf->cfg_bulk;
>> -               tun = &lnd_tunables->lt_tun;
>> -       }
>> -
>>         if (tun) {
>>                 memcpy(&ni->ni_lnd_tunables, tun,
>>                        sizeof(*tun));
>>                 ni->ni_lnd_tunables_set = true;
>>         }
>>
>> -       /*
>> -        * If given some LND tunable parameters, parse those now to
>> -        * override the values in the NI structure.
>> -        */
>> -       if (conf) {
>> -               if (conf->cfg_config_u.cfg_net.net_peer_rtr_credits >= 0)
>> -                       ni->ni_net->net_tunables.lct_peer_rtr_credits =
>> -
>>  conf->cfg_config_u.cfg_net.net_peer_rtr_credits;
>> -               if (conf->cfg_config_u.cfg_net.net_peer_timeout >= 0)
>> -                       ni->ni_net->net_tunables.lct_peer_timeout =
>> -
>>  conf->cfg_config_u.cfg_net.net_peer_timeout;
>> -               if (conf->cfg_config_u.cfg_net.net_peer_tx_credits != -1)
>> -                       ni->ni_net->net_tunables.lct_peer_tx_credits =
>> -
>>  conf->cfg_config_u.cfg_net.net_peer_tx_credits;
>> -               if (conf->cfg_config_u.cfg_net.net_max_tx_credits >= 0)
>> -                       ni->ni_net->net_tunables.lct_max_tx_credits =
>> -
>>  conf->cfg_config_u.cfg_net.net_max_tx_credits;
>> -       }
>> -
>>         rc = lnd->lnd_startup(ni);
>>
>>         mutex_unlock(&the_lnet.ln_lnd_mutex);
>> @@ -1861,9 +1835,13 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct
>> lnet_ioctl_config_data *conf)
>>         struct list_head net_head;
>>         struct lnet_remotenet *rnet;
>>         int rc;
>> +       struct lnet_ioctl_config_lnd_tunables *lnd_tunables = NULL;
>>
>>         INIT_LIST_HEAD(&net_head);
>>
>> +       if (conf && conf->cfg_hdr.ioc_len > sizeof(*conf))
>> +               lnd_tunables = (struct lnet_ioctl_config_lnd_tunables
>> *)conf->cfg_bulk;
>> +
>>         /* Create a net/ni structures for the network string */
>>         rc = lnet_parse_networks(&net_head, nets);
>>         if (rc <= 0)
>> @@ -1898,9 +1876,14 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct
>> lnet_ioctl_config_data *conf)
>>                 goto failed0;
>>
>>         list_del_init(&net->net_list);
>> +       if (lnd_tunables)
>> +               memcpy(&net->net_tunables,
>> +                      &lnd_tunables->lt_cmn,
>> sizeof(lnd_tunables->lt_cmn));
>> +
>>         ni = list_first_entry(&net->net_ni_list, struct lnet_ni,
>> ni_netlist);
>> -       rc = lnet_startup_lndni(ni, conf);
>> -       if (rc)
>> +       rc = lnet_startup_lndni(ni, (lnd_tunables ?
>> +                                    &lnd_tunables->lt_tun : NULL));
>> +       if (rc < 0)
>>                 goto failed1;
>>
>>         if (ni->ni_net->net_lnd->lnd_accept) {
>>
>>
>> _______________________________________________
>> lustre-devel mailing list
>> lustre-devel at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
>>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 20/34] lnet: discard ni_cpt_list
  2018-09-07  0:49 ` [lustre-devel] [PATCH 20/34] lnet: discard ni_cpt_list NeilBrown
@ 2018-09-12  4:07   ` Doug Oucharek
  2018-09-12  5:48     ` NeilBrown
  2018-09-12 16:29   ` Amir Shehata
  1 sibling, 1 reply; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  4:07 UTC (permalink / raw)
  To: lustre-devel

I'm assuming that a future patch will be chaining the NI structure on to the NET structure it belongs to.  This patch is just not chaining the NIs on a global NIS list anymore.  As such, ni_cptlist is being "repurposed".

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:54 PM, "NeilBrown" <neilb@suse.com> wrote:

    This isn't used any more.
    The new comment is odd - there is no net_ni_cpt!
    The ni_cptlist linkage is no longer used - should it go too?
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     .../staging/lustre/include/linux/lnet/lib-types.h  |    4 +---
     drivers/staging/lustre/lnet/lnet/api-ni.c          |    7 -------
     2 files changed, 1 insertion(+), 10 deletions(-)
    
    diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
    index 6c34ecf22021..dc15fa75a9d2 100644
    --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
    +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
    @@ -305,7 +305,7 @@ struct lnet_net {
     struct lnet_ni {
     	/* chain on the lnet_net structure */
     	struct list_head	  ni_netlist;
    -	/* chain on ln_nis_cpt */
    +	/* chain on net_ni_cpt */
     	struct list_head	ni_cptlist;
     
     	spinlock_t		ni_lock;
    @@ -671,8 +671,6 @@ struct lnet {
     
     	/* LND instances */
     	struct list_head		ln_nets;
    -	/* NIs bond on specific CPT(s) */
    -	struct list_head		ln_nis_cpt;
     	/* the loopback NI */
     	struct lnet_ni			*ln_loni;
     	/* network zombie list */
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index 546d5101360f..960f235df5e7 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -538,7 +538,6 @@ lnet_prepare(lnet_pid_t requested_pid)
     
     	INIT_LIST_HEAD(&the_lnet.ln_test_peers);
     	INIT_LIST_HEAD(&the_lnet.ln_nets);
    -	INIT_LIST_HEAD(&the_lnet.ln_nis_cpt);
     	INIT_LIST_HEAD(&the_lnet.ln_routers);
     	INIT_LIST_HEAD(&the_lnet.ln_drop_rules);
     	INIT_LIST_HEAD(&the_lnet.ln_delay_rules);
    @@ -616,7 +615,6 @@ lnet_unprepare(void)
     	LASSERT(!the_lnet.ln_refcount);
     	LASSERT(list_empty(&the_lnet.ln_test_peers));
     	LASSERT(list_empty(&the_lnet.ln_nets));
    -	LASSERT(list_empty(&the_lnet.ln_nis_cpt));
     
     	lnet_portals_destroy();
     
    @@ -1294,11 +1292,6 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
     	/* refcount for ln_nis */
     	lnet_ni_addref_locked(ni, 0);
     	list_add_tail(&ni->ni_net->net_list, &the_lnet.ln_nets);
    -	if (ni->ni_cpts) {
    -		lnet_ni_addref_locked(ni, 0);
    -		list_add_tail(&ni->ni_cptlist, &the_lnet.ln_nis_cpt);
    -	}
    -
     	lnet_net_unlock(LNET_LOCK_EX);
     
     	ni->ni_state = LNET_NI_STATE_ACTIVE;
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 15/34] lnet: extend zombie handling to nets and nis
  2018-09-12  3:53   ` Doug Oucharek
@ 2018-09-12  4:10     ` NeilBrown
  0 siblings, 0 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-12  4:10 UTC (permalink / raw)
  To: lustre-devel

On Wed, Sep 12 2018, Doug Oucharek wrote:

> Which refcount line are you referring to?  The call to
> lnet_ni_unlink_locked()?

Line 1141, in lnet_clear_zombies_nis_locked():

>     -		ni->ni_net->net_lnd->lnd_refcount--;

Thanks,
NeilBrown

>
> Reviewed-by: Doug Oucharek <dougso@me.com>
>
> Doug
>
> On 9/6/18, 5:53 PM, "NeilBrown" <neilb@suse.com> wrote:
>
>     A zombie lnet_ni is now attached to the lnet_net rather than the
>     global the_lnet.  The zombie lnet_net are attached to the_lnet.
>     
>     For some reason, we don't drop the refcount on the lnd before shutting
>     it down now.
>     
>     This is part of
>         8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>            LU-7734 lnet: Multi-Rail local NI split
>     
>     Signed-off-by: NeilBrown <neilb@suse.com>
>     ---
>      .../staging/lustre/include/linux/lnet/lib-types.h  |    9 ++-
>      drivers/staging/lustre/lnet/lnet/api-ni.c          |   65 ++++++++++----------
>      drivers/staging/lustre/lnet/lnet/config.c          |    3 +
>      3 files changed, 42 insertions(+), 35 deletions(-)
>     
>     diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
>     index 22957d142cc0..1d372672e2de 100644
>     --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
>     +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
>     @@ -284,6 +284,9 @@ struct lnet_net {
>      	struct lnet_lnd		*net_lnd;
>      	/* list of NIs on this net */
>      	struct list_head	net_ni_list;
>     +
>     +	/* dying LND instances */
>     +	struct list_head	net_ni_zombie;
>      };
>      
>      struct lnet_ni {
>     @@ -653,11 +656,11 @@ struct lnet {
>      	/* LND instances */
>      	struct list_head		ln_nets;
>      	/* NIs bond on specific CPT(s) */
>     -	struct list_head		  ln_nis_cpt;
>     -	/* dying LND instances */
>     -	struct list_head		  ln_nis_zombie;
>     +	struct list_head		ln_nis_cpt;
>      	/* the loopback NI */
>      	struct lnet_ni			*ln_loni;
>     +	/* network zombie list */
>     +	struct list_head		ln_net_zombie;
>      
>      	/* remote networks with routes to them */
>      	struct list_head		 *ln_remote_nets_hash;
>     diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
>     index c3c568e63342..18d111cb826b 100644
>     --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
>     +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
>     @@ -539,7 +539,6 @@ lnet_prepare(lnet_pid_t requested_pid)
>      	INIT_LIST_HEAD(&the_lnet.ln_test_peers);
>      	INIT_LIST_HEAD(&the_lnet.ln_nets);
>      	INIT_LIST_HEAD(&the_lnet.ln_nis_cpt);
>     -	INIT_LIST_HEAD(&the_lnet.ln_nis_zombie);
>      	INIT_LIST_HEAD(&the_lnet.ln_routers);
>      	INIT_LIST_HEAD(&the_lnet.ln_drop_rules);
>      	INIT_LIST_HEAD(&the_lnet.ln_delay_rules);
>     @@ -618,7 +617,6 @@ lnet_unprepare(void)
>      	LASSERT(list_empty(&the_lnet.ln_test_peers));
>      	LASSERT(list_empty(&the_lnet.ln_nets));
>      	LASSERT(list_empty(&the_lnet.ln_nis_cpt));
>     -	LASSERT(list_empty(&the_lnet.ln_nis_zombie));
>      
>      	lnet_portals_destroy();
>      
>     @@ -1095,34 +1093,35 @@ lnet_ni_unlink_locked(struct lnet_ni *ni)
>      
>      	/* move it to zombie list and nobody can find it anymore */
>      	LASSERT(!list_empty(&ni->ni_netlist));
>     -	list_move(&ni->ni_netlist, &the_lnet.ln_nis_zombie);
>     +	list_move(&ni->ni_netlist, &ni->ni_net->net_ni_zombie);
>      	lnet_ni_decref_locked(ni, 0);
>      }
>      
>      static void
>     -lnet_clear_zombies_nis_locked(void)
>     +lnet_clear_zombies_nis_locked(struct lnet_net *net)
>      {
>      	int i;
>      	int islo;
>      	struct lnet_ni *ni;
>     +	struct list_head *zombie_list = &net->net_ni_zombie;
>      
>      	/*
>     -	 * Now wait for the NI's I just nuked to show up on ln_zombie_nis
>     -	 * and shut them down in guaranteed thread context
>     +	 * Now wait for the NIs I just nuked to show up on the zombie
>     +	 * list and shut them down in guaranteed thread context
>      	 */
>      	i = 2;
>     -	while (!list_empty(&the_lnet.ln_nis_zombie)) {
>     +	while (!list_empty(zombie_list)) {
>      		int *ref;
>      		int j;
>      
>     -		ni = list_entry(the_lnet.ln_nis_zombie.next,
>     +		ni = list_entry(zombie_list->next,
>      				struct lnet_ni, ni_netlist);
>      		list_del_init(&ni->ni_netlist);
>      		cfs_percpt_for_each(ref, j, ni->ni_refs) {
>      			if (!*ref)
>      				continue;
>      			/* still busy, add it back to zombie list */
>     -			list_add(&ni->ni_netlist, &the_lnet.ln_nis_zombie);
>     +			list_add(&ni->ni_netlist, zombie_list);
>      			break;
>      		}
>      
>     @@ -1138,18 +1137,13 @@ lnet_clear_zombies_nis_locked(void)
>      			continue;
>      		}
>      
>     -		ni->ni_net->net_lnd->lnd_refcount--;
>      		lnet_net_unlock(LNET_LOCK_EX);
>      
>      		islo = ni->ni_net->net_lnd->lnd_type == LOLND;
>      
>      		LASSERT(!in_interrupt());
>     -		ni->ni_net->net_lnd->lnd_shutdown(ni);
>     +		net->net_lnd->lnd_shutdown(ni);
>      
>     -		/*
>     -		 * can't deref lnd anymore now; it might have unregistered
>     -		 * itself...
>     -		 */
>      		if (!islo)
>      			CDEBUG(D_LNI, "Removed LNI %s\n",
>      			       libcfs_nid2str(ni->ni_nid));
>     @@ -1162,9 +1156,11 @@ lnet_clear_zombies_nis_locked(void)
>      }
>      
>      static void
>     -lnet_shutdown_lndnis(void)
>     +lnet_shutdown_lndnet(struct lnet_net *net);
>     +
>     +static void
>     +lnet_shutdown_lndnets(void)
>      {
>     -	struct lnet_ni *ni;
>      	int i;
>      	struct lnet_net *net;
>      
>     @@ -1173,30 +1169,35 @@ lnet_shutdown_lndnis(void)
>      	/* All quiet on the API front */
>      	LASSERT(!the_lnet.ln_shutdown);
>      	LASSERT(!the_lnet.ln_refcount);
>     -	LASSERT(list_empty(&the_lnet.ln_nis_zombie));
>      
>      	lnet_net_lock(LNET_LOCK_EX);
>      	the_lnet.ln_shutdown = 1;	/* flag shutdown */
>      
>     -	/* Unlink NIs from the global table */
>      	while (!list_empty(&the_lnet.ln_nets)) {
>     +		/*
>     +		 * move the nets to the zombie list to avoid them being
>     +		 * picked up for new work. LONET is also included in the
>     +		 * Nets that will be moved to the zombie list
>     +		 */
>      		net = list_entry(the_lnet.ln_nets.next,
>      				 struct lnet_net, net_list);
>     -		while (!list_empty(&net->net_ni_list)) {
>     -			ni = list_entry(net->net_ni_list.next,
>     -					struct lnet_ni, ni_netlist);
>     -			lnet_ni_unlink_locked(ni);
>     -		}
>     +		list_move(&net->net_list, &the_lnet.ln_net_zombie);
>      	}
>      
>     -	/* Drop the cached loopback NI. */
>     +	/* Drop the cached loopback Net. */
>      	if (the_lnet.ln_loni) {
>      		lnet_ni_decref_locked(the_lnet.ln_loni, 0);
>      		the_lnet.ln_loni = NULL;
>      	}
>     -
>      	lnet_net_unlock(LNET_LOCK_EX);
>      
>     +	/* iterate through the net zombie list and delete each net */
>     +	while (!list_empty(&the_lnet.ln_net_zombie)) {
>     +		net = list_entry(the_lnet.ln_net_zombie.next,
>     +				 struct lnet_net, net_list);
>     +		lnet_shutdown_lndnet(net);
>     +	}
>     +
>      	/*
>      	 * Clear lazy portals and drop delayed messages which hold refs
>      	 * on their lnet_msg::msg_rxpeer
>     @@ -1211,8 +1212,6 @@ lnet_shutdown_lndnis(void)
>      	lnet_peer_tables_cleanup(NULL);
>      
>      	lnet_net_lock(LNET_LOCK_EX);
>     -
>     -	lnet_clear_zombies_nis_locked();
>      	the_lnet.ln_shutdown = 0;
>      	lnet_net_unlock(LNET_LOCK_EX);
>      }
>     @@ -1222,6 +1221,7 @@ static void
>      lnet_shutdown_lndni(struct lnet_ni *ni)
>      {
>      	int i;
>     +	struct lnet_net *net = ni->ni_net;
>      
>      	lnet_net_lock(LNET_LOCK_EX);
>      	lnet_ni_unlink_locked(ni);
>     @@ -1235,7 +1235,7 @@ lnet_shutdown_lndni(struct lnet_ni *ni)
>      	lnet_peer_tables_cleanup(ni);
>      
>      	lnet_net_lock(LNET_LOCK_EX);
>     -	lnet_clear_zombies_nis_locked();
>     +	lnet_clear_zombies_nis_locked(net);
>      	lnet_net_unlock(LNET_LOCK_EX);
>      }
>      
>     @@ -1445,7 +1445,7 @@ lnet_startup_lndnets(struct list_head *netlist)
>      
>      	return ni_count;
>      failed:
>     -	lnet_shutdown_lndnis();
>     +	lnet_shutdown_lndnets();
>      
>      	return rc;
>      }
>     @@ -1492,6 +1492,7 @@ int lnet_lib_init(void)
>      	the_lnet.ln_refcount = 0;
>      	LNetInvalidateEQHandle(&the_lnet.ln_rc_eqh);
>      	INIT_LIST_HEAD(&the_lnet.ln_lnds);
>     +	INIT_LIST_HEAD(&the_lnet.ln_net_zombie);
>      	INIT_LIST_HEAD(&the_lnet.ln_rcd_zombie);
>      	INIT_LIST_HEAD(&the_lnet.ln_rcd_deathrow);
>      
>     @@ -1656,7 +1657,7 @@ LNetNIInit(lnet_pid_t requested_pid)
>      	if (!the_lnet.ln_nis_from_mod_params)
>      		lnet_destroy_routes();
>      err_shutdown_lndnis:
>     -	lnet_shutdown_lndnis();
>     +	lnet_shutdown_lndnets();
>      err_empty_list:
>      	lnet_unprepare();
>      	LASSERT(rc < 0);
>     @@ -1703,7 +1704,7 @@ LNetNIFini(void)
>      
>      		lnet_acceptor_stop();
>      		lnet_destroy_routes();
>     -		lnet_shutdown_lndnis();
>     +		lnet_shutdown_lndnets();
>      		lnet_unprepare();
>      	}
>      
>     diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
>     index 380a3fb1caba..2588d67fea1b 100644
>     --- a/drivers/staging/lustre/lnet/lnet/config.c
>     +++ b/drivers/staging/lustre/lnet/lnet/config.c
>     @@ -279,6 +279,8 @@ lnet_net_free(struct lnet_net *net)
>      	struct list_head *tmp, *tmp2;
>      	struct lnet_ni *ni;
>      
>     +	LASSERT(list_empty(&net->net_ni_zombie));
>     +
>      	/* delete any nis which have been started. */
>      	list_for_each_safe(tmp, tmp2, &net->net_ni_list) {
>      		ni = list_entry(tmp, struct lnet_ni, ni_netlist);
>     @@ -312,6 +314,7 @@ lnet_net_alloc(__u32 net_id, struct list_head *net_list)
>      
>      	INIT_LIST_HEAD(&net->net_list);
>      	INIT_LIST_HEAD(&net->net_ni_list);
>     +	INIT_LIST_HEAD(&net->net_ni_zombie);
>      
>      	net->net_id = net_id;
>      
>     
>     
>     
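
The per-net zombie handling in the patch quoted above can be sketched in
userspace roughly as follows. This is only an illustration under stated
assumptions: struct ni, struct net, ni_unlink() and clear_zombies_once()
are hypothetical names, a plain singly linked list stands in for
list_head, there is no locking, and a single pass replaces the kernel's
wait-and-retry loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for lnet_ni and lnet_net. */
struct ni {
	int refs;        /* outstanding references */
	int shut_down;   /* set once the "LND shutdown" has run */
	struct ni *next;
};

struct net {
	struct ni *active;  /* plays the role of net_ni_list */
	struct ni *zombie;  /* plays the role of net_ni_zombie */
};

/* Move an NI from the active list to the zombie list of its own net,
 * so that lookups on the active list can no longer find it. */
void ni_unlink(struct net *net, struct ni *ni)
{
	struct ni **pp;

	for (pp = &net->active; *pp; pp = &(*pp)->next) {
		if (*pp == ni) {
			*pp = ni->next;
			ni->next = net->zombie;
			net->zombie = ni;
			return;
		}
	}
}

/* One pass over the zombie list: shut down NIs with no references and
 * keep busy ones queued. Returns the number of NIs shut down. */
int clear_zombies_once(struct net *net)
{
	struct ni *busy = NULL;
	struct ni *ni;
	int done = 0;

	while ((ni = net->zombie) != NULL) {
		net->zombie = ni->next;
		if (ni->refs > 0) {
			/* still busy, keep it for a later pass */
			ni->next = busy;
			busy = ni;
			continue;
		}
		ni->next = NULL;
		ni->shut_down = 1;
		done++;
	}
	net->zombie = busy;
	return done;
}
```

A caller would repeat clear_zombies_once() until the zombie list is
empty, sleeping between passes while references drain.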

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 21/34] lnet: add net_ni_added
  2018-09-07  0:49 ` [lustre-devel] [PATCH 21/34] lnet: add net_ni_added NeilBrown
@ 2018-09-12  4:15   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  4:15 UTC (permalink / raw)
  To: lustre-devel

I have to say, there are way too many lists being managed in LNet.  I'm confusing myself looking at this code again.  Wish there was a better way.

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:54 PM, "NeilBrown" <neilb@suse.com> wrote:

    When we allocate an ni, it is now added to the new net_ni_added
    list of unstarted interfaces.
    lnet_startup_lndnet() now starts all those added interfaces.
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     .../staging/lustre/include/linux/lnet/lib-types.h  |    3 ++
     drivers/staging/lustre/lnet/lnet/api-ni.c          |   39 +++++++++++++++++---
     drivers/staging/lustre/lnet/lnet/config.c          |   13 ++++++-
     3 files changed, 48 insertions(+), 7 deletions(-)
    
    diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
    index dc15fa75a9d2..1faa247a93b8 100644
    --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
    +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
    @@ -298,6 +298,9 @@ struct lnet_net {
     	/* list of NIs on this net */
     	struct list_head	net_ni_list;
     
    +	/* list of NIs being added, but not started yet */
    +	struct list_head	net_ni_added;
    +
     	/* dying LND instances */
     	struct list_head	net_ni_zombie;
     };
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index 960f235df5e7..ce3dd0f32e12 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -1350,12 +1350,15 @@ static int
     lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     {
     	struct lnet_ni		*ni;
    +	struct list_head	local_ni_list;
    +	int			rc;
    +	int			ni_count = 0;
     	__u32			lnd_type;
     	struct lnet_lnd		*lnd;
    -	int			rc;
     
     	lnd_type = LNET_NETTYP(net->net_id);
     
    +	INIT_LIST_HEAD(&local_ni_list);
     	LASSERT(libcfs_isknown_lnd(lnd_type));
     
     	/* Make sure this new NI is unique. */
    @@ -1399,12 +1402,36 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     	net->net_lnd = lnd;
     	mutex_unlock(&the_lnet.ln_lnd_mutex);
     
    -	ni = list_first_entry(&net->net_ni_list, struct lnet_ni, ni_netlist);
    +	while (!list_empty(&net->net_ni_added)) {
    +		ni = list_entry(net->net_ni_added.next, struct lnet_ni,
    +				ni_netlist);
    +		list_del_init(&ni->ni_netlist);
     
    -	rc = lnet_startup_lndni(ni, tun);
    -	if (rc < 0)
    -		return rc;
    -	return 1;
    +		rc = lnet_startup_lndni(ni, tun);
    +
    +		if (rc < 0)
    +			goto failed1;
    +
    +		list_add_tail(&ni->ni_netlist, &local_ni_list);
    +
    +		ni_count++;
    +	}
    +	lnet_net_lock(LNET_LOCK_EX);
    +	list_splice_tail(&local_ni_list, &net->net_ni_list);
    +	lnet_net_unlock(LNET_LOCK_EX);
    +	return ni_count;
    +
    +failed1:
    +	/*
    +	 * shutdown the new NIs that are being started up
    +	 * free the NET being started
    +	 */
    +	while (!list_empty(&local_ni_list)) {
    +		ni = list_entry(local_ni_list.next, struct lnet_ni,
    +				ni_netlist);
    +
    +		lnet_shutdown_lndni(ni);
    +	}
     
     failed0:
     	lnet_net_free(net);
    diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
    index 081812e19b13..f886dcfc6d6e 100644
    --- a/drivers/staging/lustre/lnet/lnet/config.c
    +++ b/drivers/staging/lustre/lnet/lnet/config.c
    @@ -281,6 +281,16 @@ lnet_net_free(struct lnet_net *net)
     
     	LASSERT(list_empty(&net->net_ni_zombie));
     
    +	/*
    +	 * delete any nis that haven't been added yet. This could happen
    +	 * if there is a failure on net startup
    +	 */
    +	list_for_each_safe(tmp, tmp2, &net->net_ni_added) {
    +		ni = list_entry(tmp, struct lnet_ni, ni_netlist);
    +		list_del_init(&ni->ni_netlist);
    +		lnet_ni_free(ni);
    +	}
    +
     	/* delete any nis which have been started. */
     	list_for_each_safe(tmp, tmp2, &net->net_ni_list) {
     		ni = list_entry(tmp, struct lnet_ni, ni_netlist);
    @@ -314,6 +324,7 @@ lnet_net_alloc(__u32 net_id, struct list_head *net_list)
     
     	INIT_LIST_HEAD(&net->net_list);
     	INIT_LIST_HEAD(&net->net_ni_list);
    +	INIT_LIST_HEAD(&net->net_ni_added);
     	INIT_LIST_HEAD(&net->net_ni_zombie);
     
     	net->net_id = net_id;
    @@ -397,7 +408,7 @@ lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
     	rc = lnet_net_append_cpts(ni->ni_cpts, ni->ni_ncpts, net);
     	if (rc != 0)
     		goto failed;
    -	list_add_tail(&ni->ni_netlist, &net->net_ni_list);
    +	list_add_tail(&ni->ni_netlist, &net->net_ni_added);
     
     	return ni;
     failed:
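
The staging-list idea in the patch above (newly allocated NIs wait on
net_ni_added, and only become visible on net_ni_list once every one of
them has started) can be sketched in userspace like this. It is a
simplified illustration, not the kernel code: the types, the fixed-size
arrays, the will_fail test knob and the function names are assumptions
of the sketch, and there is no locking.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NIS 8

/* Hypothetical, simplified stand-in for lnet_ni. */
struct ni {
	int will_fail;  /* test knob: make "startup" fail for this NI */
	int started;
};

/* Hypothetical, simplified stand-in for lnet_net. */
struct net {
	struct ni *added[MAX_NIS];   /* allocated, not yet started */
	size_t n_added;
	struct ni *active[MAX_NIS];  /* started and visible to lookups */
	size_t n_active;
};

static int ni_startup(struct ni *ni)
{
	if (ni->will_fail)
		return -1;
	ni->started = 1;
	return 0;
}

static void ni_shutdown(struct ni *ni)
{
	ni->started = 0;
}

/* Start every staged NI. On success, publish them all on the active
 * list and return how many were started. On failure, shut down the
 * ones already started and return a negative value. */
int net_startup(struct net *net)
{
	struct ni *local[MAX_NIS];
	size_t n_local = 0;
	size_t i;

	assert(net->n_added + net->n_active <= MAX_NIS);

	for (i = 0; i < net->n_added; i++) {
		struct ni *ni = net->added[i];

		if (ni_startup(ni) < 0) {
			/* roll back the NIs started so far */
			while (n_local > 0)
				ni_shutdown(local[--n_local]);
			return -1;
		}
		local[n_local++] = ni;
	}
	net->n_added = 0;

	/* publish all started NIs in one step */
	for (i = 0; i < n_local; i++)
		net->active[net->n_active++] = local[i];

	return (int)n_local;
}
```

Keeping the started NIs on a private list until the loop finishes means
a failure part-way through never leaves a half-started net visible on
the active list.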
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 22/34] lnet: don't take reference in lnet_XX2ni_locked()
  2018-09-07  0:49 ` [lustre-devel] [PATCH 22/34] lnet: don't take reference in lnet_XX2ni_locked() NeilBrown
@ 2018-09-12  4:18   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  4:18 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:54 PM, "NeilBrown" <neilb@suse.com> wrote:

    lnet_net2ni_locked() and lnet_nid2ni_locked() no longer take
    a reference - as the lock is held, a ref isn't always needed.
    
    Instead, introduce lnet_nid2ni_addref() which does take the reference
    (but doesn't need the lock).
    Various places which called lnet_net2ni_locked() or
    lnet_nid2ni_locked() no longer need to drop the ref afterwards.
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     .../staging/lustre/include/linux/lnet/lib-lnet.h   |    1 +
     .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |    2 +
     drivers/staging/lustre/lnet/lnet/acceptor.c        |    2 +
     drivers/staging/lustre/lnet/lnet/api-ni.c          |   27 +++++++++++++-------
     drivers/staging/lustre/lnet/lnet/lib-move.c        |   17 +------------
     5 files changed, 21 insertions(+), 28 deletions(-)
    
    diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    index 54a93235834c..6401d9a37b23 100644
    --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    @@ -398,6 +398,7 @@ extern int avoid_asym_router_failure;
     int lnet_cpt_of_nid_locked(lnet_nid_t nid, struct lnet_ni *ni);
     int lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni);
     struct lnet_ni *lnet_nid2ni_locked(lnet_nid_t nid, int cpt);
    +struct lnet_ni *lnet_nid2ni_addref(lnet_nid_t nid);
     struct lnet_ni *lnet_net2ni_locked(__u32 net, int cpt);
     struct lnet_ni *lnet_net2ni(__u32 net);
     bool lnet_is_ni_healthy_locked(struct lnet_ni *ni);
    diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
    index e64c14914924..af8f863b6a68 100644
    --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
    +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
    @@ -2294,7 +2294,7 @@ kiblnd_passive_connect(struct rdma_cm_id *cmid, void *priv, int priv_nob)
     	}
     
     	nid = reqmsg->ibm_srcnid;
    -	ni = lnet_net2ni(LNET_NIDNET(reqmsg->ibm_dstnid));
    +	ni = lnet_nid2ni_addref(reqmsg->ibm_dstnid);
     
     	if (ni) {
     		net = (struct kib_net *)ni->ni_data;
    diff --git a/drivers/staging/lustre/lnet/lnet/acceptor.c b/drivers/staging/lustre/lnet/lnet/acceptor.c
    index 88b90c1fdbaf..25205f686801 100644
    --- a/drivers/staging/lustre/lnet/lnet/acceptor.c
    +++ b/drivers/staging/lustre/lnet/lnet/acceptor.c
    @@ -296,7 +296,7 @@ lnet_accept(struct socket *sock, __u32 magic)
     	if (flip)
     		__swab64s(&cr.acr_nid);
     
    -	ni = lnet_net2ni(LNET_NIDNET(cr.acr_nid));
    +	ni = lnet_nid2ni_addref(cr.acr_nid);
     	if (!ni ||	       /* no matching net */
     	    ni->ni_nid != cr.acr_nid) { /* right NET, wrong NID! */
     		if (ni)
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index ce3dd0f32e12..42e775e2a669 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -655,7 +655,6 @@ lnet_net2ni_locked(__u32 net_id, int cpt)
     		if (net->net_id == net_id) {
     			ni = list_entry(net->net_ni_list.next, struct lnet_ni,
     					ni_netlist);
    -			lnet_ni_addref_locked(ni, cpt);
     			return ni;
     		}
     	}
    @@ -794,16 +793,29 @@ lnet_nid2ni_locked(lnet_nid_t nid, int cpt)
     
     	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
     		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
    -			if (ni->ni_nid == nid) {
    -				lnet_ni_addref_locked(ni, cpt);
    +			if (ni->ni_nid == nid)
     				return ni;
    -			}
     		}
     	}
     
     	return NULL;
     }
     
    +struct lnet_ni *
    +lnet_nid2ni_addref(lnet_nid_t nid)
    +{
    +	struct lnet_ni *ni;
    +
    +	lnet_net_lock(0);
    +	ni = lnet_nid2ni_locked(nid, 0);
    +	if (ni)
    +		lnet_ni_addref_locked(ni, 0);
    +	lnet_net_unlock(0);
    +
    +	return ni;
    +}
    +EXPORT_SYMBOL(lnet_nid2ni_addref);
    +
     int
     lnet_islocalnid(lnet_nid_t nid)
     {
    @@ -812,8 +824,6 @@ lnet_islocalnid(lnet_nid_t nid)
     
     	cpt = lnet_net_lock_current();
     	ni = lnet_nid2ni_locked(nid, cpt);
    -	if (ni)
    -		lnet_ni_decref_locked(ni, cpt);
     	lnet_net_unlock(cpt);
     
     	return !!ni;
    @@ -1412,6 +1422,7 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     		if (rc < 0)
     			goto failed1;
     
    +		lnet_ni_addref(ni);
     		list_add_tail(&ni->ni_netlist, &local_ni_list);
     
     		ni_count++;
    @@ -2032,9 +2043,6 @@ lnet_dyn_del_ni(__u32 net)
     		goto failed;
     	}
     
    -	/* decrement the reference counter taken by lnet_net2ni() */
    -	lnet_ni_decref_locked(ni, 0);
    -
     	lnet_shutdown_lndni(ni);
     
     	if (!lnet_count_acceptor_nets())
    @@ -2264,7 +2272,6 @@ LNetCtl(unsigned int cmd, void *arg)
     		else
     			rc = ni->ni_net->net_lnd->lnd_ctl(ni, cmd, arg);
     
    -		lnet_ni_decref(ni);
     		return rc;
     	}
     	/* not reached */
    diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
    index 00a89221c9b3..60f34c4b85d3 100644
    --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
    +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
    @@ -1127,11 +1127,7 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
     		if (!src_ni) {
     			src_ni = local_ni;
     			src_nid = src_ni->ni_nid;
    -		} else if (src_ni == local_ni) {
    -			lnet_ni_decref_locked(local_ni, cpt);
    -		} else {
    -			lnet_ni_decref_locked(local_ni, cpt);
    -			lnet_ni_decref_locked(src_ni, cpt);
    +		} else if (src_ni != local_ni) {
     			lnet_net_unlock(cpt);
     			LCONSOLE_WARN("No route to %s via from %s\n",
     				      libcfs_nid2str(dst_nid),
    @@ -1149,16 +1145,10 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
     			/* No send credit hassles with LOLND */
     			lnet_net_unlock(cpt);
     			lnet_ni_send(src_ni, msg);
    -
    -			lnet_net_lock(cpt);
    -			lnet_ni_decref_locked(src_ni, cpt);
    -			lnet_net_unlock(cpt);
     			return 0;
     		}
     
     		rc = lnet_nid2peer_locked(&lp, dst_nid, cpt);
    -		/* lp has ref on src_ni; lose mine */
    -		lnet_ni_decref_locked(src_ni, cpt);
     		if (rc) {
     			lnet_net_unlock(cpt);
     			LCONSOLE_WARN("Error %d finding peer %s\n", rc,
    @@ -1173,8 +1163,6 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
     					    src_ni->ni_net : NULL,
     					    dst_nid, rtr_nid);
     		if (!lp) {
    -			if (src_ni)
    -				lnet_ni_decref_locked(src_ni, cpt);
     			lnet_net_unlock(cpt);
     
     			LCONSOLE_WARN("No route to %s via %s (all routers down)\n",
    @@ -1192,8 +1180,6 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
     		if (rtr_nid != lp->lp_nid) {
     			cpt2 = lp->lp_cpt;
     			if (cpt2 != cpt) {
    -				if (src_ni)
    -					lnet_ni_decref_locked(src_ni, cpt);
     				lnet_net_unlock(cpt);
     
     				rtr_nid = lp->lp_nid;
    @@ -1212,7 +1198,6 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
     			src_nid = src_ni->ni_nid;
     		} else {
     			LASSERT(src_ni->ni_net == lp->lp_net);
    -			lnet_ni_decref_locked(src_ni, cpt);
     		}
     
     		lnet_peer_addref_locked(lp);
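
The split this patch makes between a plain "_locked" lookup and a separate lookup that also takes a reference can be sketched in userspace as follows. This is only an illustration under stated assumptions: every demo_* name is hypothetical, a pthread mutex stands in for lnet_net_lock(), and a plain int stands in for the per-CPT reference counters.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct lnet_ni: an id plus a reference count. */
struct demo_ni {
	uint64_t nid;
	int refs;
	struct demo_ni *next;
};

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_ni *demo_ni_list;

/* Caller must hold demo_lock. Returns a borrowed pointer; no reference taken. */
static struct demo_ni *demo_nid2ni_locked(uint64_t nid)
{
	struct demo_ni *ni;

	for (ni = demo_ni_list; ni; ni = ni->next)
		if (ni->nid == nid)
			return ni;
	return NULL;
}

/* Takes the lock, looks up, and returns with a reference held (or NULL). */
static struct demo_ni *demo_nid2ni_addref(uint64_t nid)
{
	struct demo_ni *ni;

	pthread_mutex_lock(&demo_lock);
	ni = demo_nid2ni_locked(nid);
	if (ni)
		ni->refs++;
	pthread_mutex_unlock(&demo_lock);
	return ni;
}

/* Pure membership test: only a boolean escapes the lock, so no reference. */
static int demo_islocalnid(uint64_t nid)
{
	struct demo_ni *ni;

	pthread_mutex_lock(&demo_lock);
	ni = demo_nid2ni_locked(nid);
	pthread_mutex_unlock(&demo_lock);
	return ni != NULL;
}
```

With this shape, callers that only use the pointer while the lock is held no longer need a matching decref, which is what lets the lnet_send() hunks above drop theirs.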
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 18/34] lnet: add ni_state
  2018-09-12  3:59   ` Doug Oucharek
@ 2018-09-12  4:25     ` NeilBrown
  0 siblings, 0 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-12  4:25 UTC (permalink / raw)
  To: lustre-devel

On Wed, Sep 12 2018, Doug Oucharek wrote:

> I believe the introduction of this state machine is to help us understand how healthy an NI is so we can avoid it if it is not healthy and we have other paths which are still ok.
>
> Reviewed-by: Doug Oucharek <dougso@me.com>

Thanks.  Now reads:

-----------
lnet: add ni_state

This will be used more in later patches to track how healthy an NI is,
so we can avoid one if it isn't healthy and we have other paths which
are still OK.

Reviewed-by: Doug Oucharek <dougso@me.com>
Signed-off-by: NeilBrown <neilb@suse.com>
------------

I noticed that it was used more in later patches, which is why I didn't
discard the patch.  The original has a "net_state" in lnet_net - I
haven't included that change as net_state is still unused.

Thanks,
NeilBrown


>
> Doug
>
> On 9/6/18, 5:54 PM, "NeilBrown" <neilb@suse.com> wrote:
>
>     This is barely used.
>     
>     This is part of
>         8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>            LU-7734 lnet: Multi-Rail local NI split
>     
>     Signed-off-by: NeilBrown <neilb@suse.com>
>     ---
>      .../staging/lustre/include/linux/lnet/lib-lnet.h   |    1 +
>      .../staging/lustre/include/linux/lnet/lib-types.h  |   16 ++++++++++++++++
>      drivers/staging/lustre/lnet/lnet/api-ni.c          |   16 ++++++++++++++++
>      drivers/staging/lustre/lnet/lnet/config.c          |    1 +
>      4 files changed, 34 insertions(+)
>     
>     diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
>     index faa3f19dd844..54a93235834c 100644
>     --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
>     +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
>     @@ -400,6 +400,7 @@ int lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni);
>      struct lnet_ni *lnet_nid2ni_locked(lnet_nid_t nid, int cpt);
>      struct lnet_ni *lnet_net2ni_locked(__u32 net, int cpt);
>      struct lnet_ni *lnet_net2ni(__u32 net);
>     +bool lnet_is_ni_healthy_locked(struct lnet_ni *ni);
>      
>      extern int portal_rotor;
>      
>     diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
>     index 1d372672e2de..6c34ecf22021 100644
>     --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
>     +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
>     @@ -256,6 +256,19 @@ struct lnet_tx_queue {
>      	struct list_head	tq_delayed;	/* delayed TXs */
>      };
>      
>     +enum lnet_ni_state {
>     +	/* set when NI block is allocated */
>     +	LNET_NI_STATE_INIT = 0,
>     +	/* set when NI is started successfully */
>     +	LNET_NI_STATE_ACTIVE,
>     +	/* set when LND notifies NI failed */
>     +	LNET_NI_STATE_FAILED,
>     +	/* set when LND notifies NI degraded */
>     +	LNET_NI_STATE_DEGRADED,
>     +	/* set when shuttding down NI */
>     +	LNET_NI_STATE_DELETING
>     +};
>     +
>      struct lnet_net {
>      	/* chain on the ln_nets */
>      	struct list_head	net_list;
>     @@ -324,6 +337,9 @@ struct lnet_ni {
>      	/* my health status */
>      	struct lnet_ni_status	*ni_status;
>      
>     +	/* NI FSM */
>     +	enum lnet_ni_state	ni_state;
>     +
>      	/* per NI LND tunables */
>      	struct lnet_lnd_tunables ni_lnd_tunables;
>      
>     diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
>     index 46c5ca71bc07..618fdf8141f0 100644
>     --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
>     +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
>     @@ -780,6 +780,16 @@ lnet_islocalnet(__u32 net)
>      	return !!ni;
>      }
>      
>     +bool
>     +lnet_is_ni_healthy_locked(struct lnet_ni *ni)
>     +{
>     +	if (ni->ni_state == LNET_NI_STATE_ACTIVE ||
>     +	    ni->ni_state == LNET_NI_STATE_DEGRADED)
>     +		return true;
>     +
>     +	return false;
>     +}
>     +
>      struct lnet_ni  *
>      lnet_nid2ni_locked(lnet_nid_t nid, int cpt)
>      {
>     @@ -1117,6 +1127,9 @@ lnet_clear_zombies_nis_locked(struct lnet_net *net)
>      		ni = list_entry(zombie_list->next,
>      				struct lnet_ni, ni_netlist);
>      		list_del_init(&ni->ni_netlist);
>     +		/* the ni should be in deleting state. If it's not it's
>     +		 * a bug */
>     +		LASSERT(ni->ni_state == LNET_NI_STATE_DELETING);
>      		cfs_percpt_for_each(ref, j, ni->ni_refs) {
>      			if (!*ref)
>      				continue;
>     @@ -1163,6 +1176,7 @@ lnet_shutdown_lndni(struct lnet_ni *ni)
>      	struct lnet_net *net = ni->ni_net;
>      
>      	lnet_net_lock(LNET_LOCK_EX);
>     +	ni->ni_state = LNET_NI_STATE_DELETING;
>      	lnet_ni_unlink_locked(ni);
>      	lnet_net_unlock(LNET_LOCK_EX);
>      
>     @@ -1291,6 +1305,8 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
>      
>      	lnet_net_unlock(LNET_LOCK_EX);
>      
>     +	ni->ni_state = LNET_NI_STATE_ACTIVE;
>     +
>      	if (net->net_lnd->lnd_type == LOLND) {
>      		lnet_ni_addref(ni);
>      		LASSERT(!the_lnet.ln_loni);
>     diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
>     index 2588d67fea1b..081812e19b13 100644
>     --- a/drivers/staging/lustre/lnet/lnet/config.c
>     +++ b/drivers/staging/lustre/lnet/lnet/config.c
>     @@ -393,6 +393,7 @@ lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
>      		ni->ni_net_ns = NULL;
>      
>      	ni->ni_last_alive = ktime_get_real_seconds();
>     +	ni->ni_state = LNET_NI_STATE_INIT;
>      	rc = lnet_net_append_cpts(ni->ni_cpts, ni->ni_ncpts, net);
>      	if (rc != 0)
>      		goto failed;
>     
>     
>     
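
To illustrate how a health predicate like the one in this patch might be used for path selection, here is a small standalone sketch. The demo_* names and the "pick the first healthy NI" policy are assumptions made for the example; the thread only says that later patches use the state to avoid unhealthy NIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the states added by the patch. */
enum demo_ni_state {
	DEMO_NI_STATE_INIT = 0,
	DEMO_NI_STATE_ACTIVE,
	DEMO_NI_STATE_FAILED,
	DEMO_NI_STATE_DEGRADED,
	DEMO_NI_STATE_DELETING
};

struct demo_ni {
	enum demo_ni_state state;
};

/* Same rule as the patch: ACTIVE and DEGRADED both count as usable. */
static bool demo_ni_is_healthy(const struct demo_ni *ni)
{
	return ni->state == DEMO_NI_STATE_ACTIVE ||
	       ni->state == DEMO_NI_STATE_DEGRADED;
}

/* Assumed policy: return the index of the first healthy NI, or -1. */
static int demo_pick_healthy(const struct demo_ni *nis, int count)
{
	int i;

	for (i = 0; i < count; i++)
		if (demo_ni_is_healthy(&nis[i]))
			return i;
	return -1;
}
```

Note that INIT and DELETING are treated as unhealthy here simply because the predicate does not list them, which matches the patch as posted.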

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 23/34] lnet: don't need lock to test ln_shutdown.
  2018-09-07  0:49 ` [lustre-devel] [PATCH 23/34] lnet: don't need lock to test ln_shutdown NeilBrown
@ 2018-09-12  4:27   ` Doug Oucharek
  2018-09-12  5:54     ` NeilBrown
  0 siblings, 1 reply; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  4:27 UTC (permalink / raw)
  To: lustre-devel

It seems the selection code being affected by this patch later gets moved to its own routine called lnet_select_pathway().  The logic is completely re-written so it may not be important to fix the use of locks here.  However, it is not good that the lock is not being held while checking for shutdown.

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:54 PM, "NeilBrown" <neilb@suse.com> wrote:

    lnet_send() returns -ESHUTDOWN if ln_shutdown
    is already set.
    The lock is always taken to set ln_shutdown, but apparently
    we don't need to hold the lock for this test.
    I guess if it is set immediately after the test, and before
    we take the lock, then... can anything bad happen?
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     drivers/staging/lustre/lnet/lnet/lib-move.c |    7 ++-----
     1 file changed, 2 insertions(+), 5 deletions(-)
    
    diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
    index 60f34c4b85d3..46e593fbb44f 100644
    --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
    +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
    @@ -1099,12 +1099,9 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
     	cpt = lnet_cpt_of_nid(rtr_nid == LNET_NID_ANY ? dst_nid : rtr_nid,
     			      local_ni);
      again:
    -	lnet_net_lock(cpt);
    -
    -	if (the_lnet.ln_shutdown) {
    -		lnet_net_unlock(cpt);
    +	if (the_lnet.ln_shutdown)
     		return -ESHUTDOWN;
    -	}
    +	lnet_net_lock(cpt);
     
     	if (src_nid == LNET_NID_ANY) {
     		src_ni = NULL;
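
One common way to keep an unlocked early test while still closing the window discussed above is to repeat the test once the lock is held. The sketch below is not what this patch does (the patch only tests before taking the lock); it is a hypothetical userspace illustration using C11 atomics and a pthread mutex, with all demo_* names invented for the example.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool demo_shutdown;
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_sent;

/* The flag is only ever set with the lock held. */
static void demo_begin_shutdown(void)
{
	pthread_mutex_lock(&demo_lock);
	atomic_store(&demo_shutdown, true);
	pthread_mutex_unlock(&demo_lock);
}

static int demo_send(void)
{
	/* Unlocked fast path: a cheap early exit, nothing more. */
	if (atomic_load(&demo_shutdown))
		return -ESHUTDOWN;

	pthread_mutex_lock(&demo_lock);
	/* Recheck: the flag may have been set between the test and the lock. */
	if (atomic_load(&demo_shutdown)) {
		pthread_mutex_unlock(&demo_lock);
		return -ESHUTDOWN;
	}
	demo_sent++;
	pthread_mutex_unlock(&demo_lock);
	return 0;
}
```

Whether LNet needs the second test depends on what the rest of lnet_send() tolerates during shutdown, which this thread does not settle.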
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 24/34] lnet: don't take lock over lnet_net_unique()
  2018-09-07  0:49 ` [lustre-devel] [PATCH 24/34] lnet: don't take lock over lnet_net_unique() NeilBrown
@ 2018-09-12  4:29   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  4:29 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:54 PM, "NeilBrown" <neilb@suse.com> wrote:

    Holding ln_api_mutex is enough to keep the list
    stable.
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     drivers/staging/lustre/lnet/lnet/api-ni.c |    2 --
     1 file changed, 2 deletions(-)
    
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index 42e775e2a669..2b5c25a1dc7c 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -1372,9 +1372,7 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     	LASSERT(libcfs_isknown_lnd(lnd_type));
     
     	/* Make sure this new NI is unique. */
    -	lnet_net_lock(LNET_LOCK_EX);
     	rc = lnet_net_unique(net->net_id, &the_lnet.ln_nets);
    -	lnet_net_unlock(LNET_LOCK_EX);
     	if (!rc) {
     		if (lnd_type == LOLND) {
     			lnet_net_free(net);
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 25/34] lnet: swap 'then' and 'else' branches in lnet_startup_lndnet
  2018-09-07  0:49 ` [lustre-devel] [PATCH 25/34] lnet: swap 'then' and 'else' branches in lnet_startup_lndnet NeilBrown
@ 2018-09-12  4:32   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  4:32 UTC (permalink / raw)
  To: lustre-devel

Hmm...if we made lnd_refcount atomic, could we get rid of unnecessary calls to lnet_net_lock(LNET_LOCK_EX)?

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:54 PM, "NeilBrown" <neilb@suse.com> wrote:

    This swap makes the diff for the next patch more readable.
    We also stop storing the return value from lnet_net_unique()
    as it is never used.
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     drivers/staging/lustre/lnet/lnet/api-ni.c |   55 +++++++++++++++--------------
     1 file changed, 28 insertions(+), 27 deletions(-)
    
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index 2b5c25a1dc7c..ab4d093c04da 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -1372,8 +1372,34 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     	LASSERT(libcfs_isknown_lnd(lnd_type));
     
     	/* Make sure this new NI is unique. */
    -	rc = lnet_net_unique(net->net_id, &the_lnet.ln_nets);
    -	if (!rc) {
    +	if (lnet_net_unique(net->net_id, &the_lnet.ln_nets)) {
    +		mutex_lock(&the_lnet.ln_lnd_mutex);
    +		lnd = lnet_find_lnd_by_type(lnd_type);
    +
    +		if (lnd == NULL) {
    +			mutex_unlock(&the_lnet.ln_lnd_mutex);
    +			rc = request_module("%s", libcfs_lnd2modname(lnd_type));
    +			mutex_lock(&the_lnet.ln_lnd_mutex);
    +
    +			lnd = lnet_find_lnd_by_type(lnd_type);
    +			if (lnd == NULL) {
    +				mutex_unlock(&the_lnet.ln_lnd_mutex);
    +				CERROR("Can't load LND %s, module %s, rc=%d\n",
    +				libcfs_lnd2str(lnd_type),
    +				libcfs_lnd2modname(lnd_type), rc);
    +				rc = -EINVAL;
    +				goto failed0;
    +			}
    +		}
    +
    +		lnet_net_lock(LNET_LOCK_EX);
    +		lnd->lnd_refcount++;
    +		lnet_net_unlock(LNET_LOCK_EX);
    +
    +		net->net_lnd = lnd;
    +
    +		mutex_unlock(&the_lnet.ln_lnd_mutex);
    +	} else {
     		if (lnd_type == LOLND) {
     			lnet_net_free(net);
     			return 0;
    @@ -1385,31 +1411,6 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     		goto failed0;
     	}
     
    -	mutex_lock(&the_lnet.ln_lnd_mutex);
    -	lnd = lnet_find_lnd_by_type(lnd_type);
    -
    -	if (!lnd) {
    -		mutex_unlock(&the_lnet.ln_lnd_mutex);
    -		rc = request_module("%s", libcfs_lnd2modname(lnd_type));
    -		mutex_lock(&the_lnet.ln_lnd_mutex);
    -
    -		lnd = lnet_find_lnd_by_type(lnd_type);
    -		if (!lnd) {
    -			mutex_unlock(&the_lnet.ln_lnd_mutex);
    -			CERROR("Can't load LND %s, module %s, rc=%d\n",
    -			       libcfs_lnd2str(lnd_type),
    -			       libcfs_lnd2modname(lnd_type), rc);
    -			rc = -EINVAL;
    -			goto failed0;
    -		}
    -	}
    -
    -	lnet_net_lock(LNET_LOCK_EX);
    -	lnd->lnd_refcount++;
    -	lnet_net_unlock(LNET_LOCK_EX);
    -	net->net_lnd = lnd;
    -	mutex_unlock(&the_lnet.ln_lnd_mutex);
    -
     	while (!list_empty(&net->net_ni_added)) {
     		ni = list_entry(net->net_ni_added.next, struct lnet_ni,
     				ni_netlist);
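
As a rough picture of the atomic-refcount idea raised at the top of this reply, the following userspace sketch uses C11 atomics in place of a lock around the counter. The demo_* names are hypothetical. It only shows that the increment itself needs no lock; whether LNET_LOCK_EX also orders something else around lnd_refcount is an open question here.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for struct lnet_lnd with an atomic reference count. */
struct demo_lnd {
	int type;
	atomic_int refcount;
};

/* Take a reference without any external lock. */
static void demo_lnd_get(struct demo_lnd *lnd)
{
	atomic_fetch_add(&lnd->refcount, 1);
}

/* Drop a reference; returns 1 when this call released the last one. */
static int demo_lnd_put(struct demo_lnd *lnd)
{
	return atomic_fetch_sub(&lnd->refcount, 1) == 1;
}
```

In kernel code the equivalent would more likely be atomic_t or refcount_t rather than C11 atomics.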
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 26/34] lnet: only valid lnd_type when net_id is unique.
  2018-09-07  0:49 ` [lustre-devel] [PATCH 26/34] lnet: only valid lnd_type when net_id is unique NeilBrown
@ 2018-09-12  4:34   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  4:34 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:55 PM, "NeilBrown" <neilb@suse.com> wrote:

    If it isn't unique, we won't add it, so no need to validate.
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     drivers/staging/lustre/lnet/lnet/api-ni.c |    7 ++++---
     1 file changed, 4 insertions(+), 3 deletions(-)
    
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index ab4d093c04da..0dfd3004f735 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -1366,13 +1366,14 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     	__u32			lnd_type;
     	struct lnet_lnd		*lnd;
     
    -	lnd_type = LNET_NETTYP(net->net_id);
    -
     	INIT_LIST_HEAD(&local_ni_list);
    -	LASSERT(libcfs_isknown_lnd(lnd_type));
     
     	/* Make sure this new NI is unique. */
     	if (lnet_net_unique(net->net_id, &the_lnet.ln_nets)) {
    +		lnd_type = LNET_NETTYP(net->net_id);
    +
    +		LASSERT(libcfs_isknown_lnd(lnd_type));
    +
     		mutex_lock(&the_lnet.ln_lnd_mutex);
     		lnd = lnet_find_lnd_by_type(lnd_type);
     
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 27/34] lnet: make it possible to add a new interface to a network
  2018-09-07  0:49 ` [lustre-devel] [PATCH 27/34] lnet: make it possible to add a new interface to a network NeilBrown
@ 2018-09-12  4:38   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  4:38 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

?On 9/6/18, 5:55 PM, "NeilBrown" <neilb@suse.com> wrote:

    lnet_startup_lndnet() is enhanced to cope if the net already
    exists.
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     .../staging/lustre/include/linux/lnet/lib-lnet.h   |    3 +
     drivers/staging/lustre/lnet/lnet/api-ni.c          |   69 +++++++++++++++-----
     drivers/staging/lustre/lnet/lnet/config.c          |   12 ++-
     3 files changed, 61 insertions(+), 23 deletions(-)
    
    diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    index 6401d9a37b23..905213fc16c7 100644
    --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    @@ -630,7 +630,8 @@ void lnet_swap_pinginfo(struct lnet_ping_info *info);
     int lnet_parse_ip2nets(char **networksp, char *ip2nets);
     int lnet_parse_routes(char *route_str, int *im_a_router);
     int lnet_parse_networks(struct list_head *nilist, char *networks);
    -bool lnet_net_unique(__u32 net, struct list_head *nilist);
    +bool lnet_net_unique(__u32 net_id, struct list_head *nilist,
    +		     struct lnet_net **net);
     
     int lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt);
     struct lnet_peer *lnet_find_peer_locked(struct lnet_peer_table *ptable,
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index 0dfd3004f735..042ab0d9e318 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -1298,14 +1298,9 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
     		goto failed0;
     	}
     
    -	lnet_net_lock(LNET_LOCK_EX);
    -	/* refcount for ln_nis */
    -	lnet_ni_addref_locked(ni, 0);
    -	list_add_tail(&ni->ni_net->net_list, &the_lnet.ln_nets);
    -	lnet_net_unlock(LNET_LOCK_EX);
    -
     	ni->ni_state = LNET_NI_STATE_ACTIVE;
     
    +	/* We keep a reference on the loopback net through the loopback NI */
     	if (net->net_lnd->lnd_type == LOLND) {
     		lnet_ni_addref(ni);
     		LASSERT(!the_lnet.ln_loni);
    @@ -1360,6 +1355,7 @@ static int
     lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     {
     	struct lnet_ni		*ni;
    +	struct lnet_net		*net_l = NULL;
     	struct list_head	local_ni_list;
     	int			rc;
     	int			ni_count = 0;
    @@ -1368,8 +1364,14 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     
     	INIT_LIST_HEAD(&local_ni_list);
     
    -	/* Make sure this new NI is unique. */
    -	if (lnet_net_unique(net->net_id, &the_lnet.ln_nets)) {
    +	/*
    +	 * make sure that this net is unique. If it isn't then
    +	 * we are adding interfaces to an already existing network, and
    +	 * 'net' is just a convenient way to pass in the list.
    +	 * if it is unique we need to find the LND and load it if
    +	 * necessary.
    +	 */
    +	if (lnet_net_unique(net->net_id, &the_lnet.ln_nets, &net_l)) {
     		lnd_type = LNET_NETTYP(net->net_id);
     
     		LASSERT(libcfs_isknown_lnd(lnd_type));
    @@ -1400,23 +1402,41 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     		net->net_lnd = lnd;
     
     		mutex_unlock(&the_lnet.ln_lnd_mutex);
    -	} else {
    -		if (lnd_type == LOLND) {
    -			lnet_net_free(net);
    -			return 0;
    -		}
     
    -		CERROR("Net %s is not unique\n",
    -		       libcfs_net2str(net->net_id));
    -		rc = -EEXIST;
    -		goto failed0;
    +		net_l = net;
     	}
     
    +	/*
    +	 * net_l: if the network being added is unique then net_l
    +	 *        will point to that network
    +	 *        if the network being added is not unique then
    +	 *        net_l points to the existing network.
    +	 *
    +	 * When we enter the loop below, we'll pick NIs off he
    +	 * network beign added and start them up, then add them to
    +	 * a local ni list. Once we've successfully started all
    +	 * the NIs then we join the local NI list (of started up
    +	 * networks) with the net_l->net_ni_list, which should
    +	 * point to the correct network to add the new ni list to
    +	 *
    +	 * If any of the new NIs fail to start up, then we want to
    +	 * iterate through the local ni list, which should include
    +	 * any NIs which were successfully started up, and shut
    +	 * them down.
    +	 *
    +	 * After than we want to delete the network being added,
    +	 * to avoid a memory leak.
    +	 */
    +
     	while (!list_empty(&net->net_ni_added)) {
     		ni = list_entry(net->net_ni_added.next, struct lnet_ni,
     				ni_netlist);
     		list_del_init(&ni->ni_netlist);
     
    +		/* adjust the pointer the parent network, just in case it
    +		 * the net is a duplicate */
    +		ni->ni_net = net_l;
    +
     		rc = lnet_startup_lndni(ni, tun);
     
     		if (rc < 0)
    @@ -1427,9 +1447,22 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     
     		ni_count++;
     	}
    +
     	lnet_net_lock(LNET_LOCK_EX);
    -	list_splice_tail(&local_ni_list, &net->net_ni_list);
    +	list_splice_tail(&local_ni_list, &net_l->net_ni_list);
     	lnet_net_unlock(LNET_LOCK_EX);
    +
    +	/* if the network is not unique then we don't want to keep
    +	 * it around after we're done. Free it. Otherwise add that
    +	 * net to the global the_lnet.ln_nets */
    +	if (net_l != net && net_l != NULL) {
    +		lnet_net_free(net);
    +	} else {
    +		lnet_net_lock(LNET_LOCK_EX);
    +		list_add_tail(&net->net_list, &the_lnet.ln_nets);
    +		lnet_net_unlock(LNET_LOCK_EX);
    +	}
    +
     	return ni_count;
     
     failed1:
    diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
    index f886dcfc6d6e..fcae50676422 100644
    --- a/drivers/staging/lustre/lnet/lnet/config.c
    +++ b/drivers/staging/lustre/lnet/lnet/config.c
    @@ -79,13 +79,17 @@ lnet_issep(char c)
     }
     
     bool
    -lnet_net_unique(__u32 net, struct list_head *netlist)
    +lnet_net_unique(__u32 net_id, struct list_head *netlist,
    +		struct lnet_net **net)
     {
    -	struct lnet_net	 *net_l;
    +	struct lnet_net  *net_l;
     
     	list_for_each_entry(net_l, netlist, net_list) {
    -		if (net_l->net_id == net)
    +		if (net_l->net_id == net_id) {
    +			if (net != NULL)
    +				*net = net_l;
     			return false;
    +		}
     	}
     
     	return true;
    @@ -309,7 +313,7 @@ lnet_net_alloc(__u32 net_id, struct list_head *net_list)
     {
     	struct lnet_net		*net;
     
    -	if (!lnet_net_unique(net_id, net_list)) {
    +	if (!lnet_net_unique(net_id, net_list, NULL)) {
     		CERROR("Duplicate net %s. Ignore\n",
     		       libcfs_net2str(net_id));
     		return NULL;
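
The new out-parameter on lnet_net_unique() lets one call answer both "is this net id new?" and "if not, which net already has it?". A minimal standalone sketch of that shape, with a singly linked list and hypothetical demo_* names in place of the kernel list helpers:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct lnet_net. */
struct demo_net {
	uint32_t id;
	struct demo_net *next;
};

/*
 * Returns true if no net in the list has this id. Otherwise returns false
 * and, if 'found' is non-NULL, stores the existing net there.
 */
static bool demo_net_unique(uint32_t id, struct demo_net *list,
			    struct demo_net **found)
{
	struct demo_net *n;

	for (n = list; n; n = n->next) {
		if (n->id == id) {
			if (found)
				*found = n;
			return false;
		}
	}
	return true;
}

/* Pick the net that new interfaces should be attached to. */
static struct demo_net *demo_target_net(struct demo_net *incoming,
					struct demo_net *list)
{
	struct demo_net *existing = NULL;

	if (demo_net_unique(incoming->id, list, &existing))
		return incoming;	/* new net: use it directly */
	return existing;		/* duplicate id: reuse the old net */
}
```

Passing NULL for the out-parameter keeps the old yes/no behaviour, which is how lnet_net_alloc() calls it in the hunk above.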
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 28/34] lnet: add checks to ensure network interface names are unique.
  2018-09-07  0:49 ` [lustre-devel] [PATCH 28/34] lnet: add checks to ensure network interface names are unique NeilBrown
@ 2018-09-12  4:39   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  4:39 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:55 PM, "NeilBrown" <neilb@suse.com> wrote:

    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     .../staging/lustre/include/linux/lnet/lib-lnet.h   |    1 +
     drivers/staging/lustre/lnet/lnet/api-ni.c          |    8 ++++++
     drivers/staging/lustre/lnet/lnet/config.c          |   25 ++++++++++++++++++++
     3 files changed, 34 insertions(+)
    
    diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    index 905213fc16c7..ef551b571935 100644
    --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    @@ -632,6 +632,7 @@ int lnet_parse_routes(char *route_str, int *im_a_router);
     int lnet_parse_networks(struct list_head *nilist, char *networks);
     bool lnet_net_unique(__u32 net_id, struct list_head *nilist,
     		     struct lnet_net **net);
    +bool lnet_ni_unique_net(struct list_head *nilist, char *iface);
     
     int lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt);
     struct lnet_peer *lnet_find_peer_locked(struct lnet_peer_table *ptable,
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index 042ab0d9e318..3f6f5ead8a03 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -1433,6 +1433,14 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     				ni_netlist);
     		list_del_init(&ni->ni_netlist);
     
    +		/* make sure that the the NI we're about to start
    +		 * up is actually unique. if it's not fail. */
    +		if (!lnet_ni_unique_net(&net_l->net_ni_list,
    +					ni->ni_interfaces[0])) {
    +			rc = -EINVAL;
    +			goto failed1;
    +		}
    +
     		/* adjust the pointer the parent network, just in case it
     		 * the net is a duplicate */
     		ni->ni_net = net_l;
    diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
    index fcae50676422..11d6dbc80507 100644
    --- a/drivers/staging/lustre/lnet/lnet/config.c
    +++ b/drivers/staging/lustre/lnet/lnet/config.c
    @@ -95,6 +95,25 @@ lnet_net_unique(__u32 net_id, struct list_head *netlist,
     	return true;
     }
     
    +/* check that the NI is unique within the list of NIs already added to
    + * a network */
    +bool
    +lnet_ni_unique_net(struct list_head *nilist, char *iface)
    +{
    +	struct list_head *tmp;
    +	struct lnet_ni *ni;
    +
    +	list_for_each(tmp, nilist) {
    +		ni = list_entry(tmp, struct lnet_ni, ni_netlist);
    +
    +		if (ni->ni_interfaces[0] != NULL &&
    +		    strncmp(ni->ni_interfaces[0], iface, strlen(iface)) == 0)
    +			return false;
    +	}
    +
    +	return true;
    +}
    +
     static bool
     in_array(__u32 *array, __u32 size, __u32 value)
     {
    @@ -352,6 +371,12 @@ lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
     	int			rc;
     	int			i;
     
    +	if (iface != NULL)
    +		/* make sure that this NI is unique in the net it's
    +		 * being added to */
    +		if (!lnet_ni_unique_net(&net->net_ni_added, iface))
    +			return NULL;
    +
     	ni = kzalloc(sizeof(*ni), GFP_KERNEL);
     	if (ni == NULL) {
     		CERROR("Out of memory creating network interface %s%s\n",
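
One detail of the comparison in lnet_ni_unique_net() is worth seeing in isolation: strncmp() bounded by strlen(iface) is a prefix test, so a new interface "eth1" compares equal to an existing "eth10". The sketch below contrasts that with an exact strcmp(). The demo_* helper names are hypothetical, and the thread does not say whether the prefix behaviour is intended.

```c
#include <stdbool.h>
#include <string.h>

/* Same comparison shape as the patch: only strlen(iface) bytes are compared. */
static bool demo_iface_prefix_match(const char *existing, const char *iface)
{
	return strncmp(existing, iface, strlen(iface)) == 0;
}

/* Exact comparison, shown for contrast. */
static bool demo_iface_exact_match(const char *existing, const char *iface)
{
	return strcmp(existing, iface) == 0;
}
```

For example, with "eth10" already on the list, the prefix form reports a clash for a new "eth1" while the exact form does not; with "eth1" on the list and a new "eth10", neither form reports a clash.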
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 29/34] lnet: track tunables in lnet_startup_lndnet()
  2018-09-07  0:49 ` [lustre-devel] [PATCH 29/34] lnet: track tunables in lnet_startup_lndnet() NeilBrown
@ 2018-09-12  4:47   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  4:47 UTC (permalink / raw)
  To: lustre-devel

Amir: You need to comment on this one.

It looks to me like it starts off by making a copy of three tunables to see if the user has set them specifically for the NET.  Then, if the LND changes them, it changes them back to what the user set.  What I don't get: why would the LND change the values if they are not -1?

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:55 PM, "NeilBrown" <neilb@suse.com> wrote:

    Not really sure what this is yet.
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     drivers/staging/lustre/lnet/lnet/api-ni.c |   24 ++++++++++++++++++++++++
     1 file changed, 24 insertions(+)
    
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index 3f6f5ead8a03..f4efb48c4cf3 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -1361,6 +1361,12 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     	int			ni_count = 0;
     	__u32			lnd_type;
     	struct lnet_lnd		*lnd;
    +	int			peer_timeout =
    +		net->net_tunables.lct_peer_timeout;
    +	int			maxtxcredits =
    +		net->net_tunables.lct_max_tx_credits;
    +	int			peerrtrcredits =
    +		net->net_tunables.lct_peer_rtr_credits;
     
     	INIT_LIST_HEAD(&local_ni_list);
     
    @@ -1447,6 +1453,9 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     
     		rc = lnet_startup_lndni(ni, tun);
     
    +		LASSERT(ni->ni_net->net_tunables.lct_peer_timeout <= 0 ||
    +			ni->ni_net->net_lnd->lnd_query != NULL);
    +
     		if (rc < 0)
     			goto failed1;
     
    @@ -1464,8 +1473,23 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     	 * it around after we're done. Free it. Otherwise add that
     	 * net to the global the_lnet.ln_nets */
     	if (net_l != net && net_l != NULL) {
    +		/*
    +		 * TODO - note. currently the tunables can not be updated
    +		 * once added
    +		 */
     		lnet_net_free(net);
     	} else {
    +		/*
    +		 * restore tunables after it has been overwritten by the
    +		 * lnd
    +		 */
    +		if (peer_timeout != -1)
    +			net->net_tunables.lct_peer_timeout = peer_timeout;
    +		if (maxtxcredits != -1)
    +			net->net_tunables.lct_max_tx_credits = maxtxcredits;
    +		if (peerrtrcredits != -1)
    +			net->net_tunables.lct_peer_rtr_credits = peerrtrcredits;
    +
     		lnet_net_lock(LNET_LOCK_EX);
     		list_add_tail(&net->net_list, &the_lnet.ln_nets);
     		lnet_net_unlock(LNET_LOCK_EX);
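
The save-and-restore pattern in this patch can be sketched in userspace as follows. This is only an illustration under assumptions: the struct, the fake LND startup routine and its values (180, 256, 8) are hypothetical, and reading -1 as "not set by the user" follows Doug's interpretation above rather than anything the patch states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the three tunables copied by the patch. */
struct fake_tunables {
	int peer_timeout;
	int max_tx_credits;
	int peer_rtr_credits;
};

/* Hypothetical LND startup that overwrites every tunable. */
static void fake_lnd_startup(struct fake_tunables *t)
{
	t->peer_timeout = 180;
	t->max_tx_credits = 256;
	t->peer_rtr_credits = 8;
}

/* Save the values first, run startup, then put back anything that
 * was not -1 (assumed here to mean "unset"). */
static void startup_keep_user_values(struct fake_tunables *t)
{
	struct fake_tunables saved;

	assert(t != NULL);
	saved = *t;
	fake_lnd_startup(t);

	if (saved.peer_timeout != -1)
		t->peer_timeout = saved.peer_timeout;
	if (saved.max_tx_credits != -1)
		t->max_tx_credits = saved.max_tx_credits;
	if (saved.peer_rtr_credits != -1)
		t->peer_rtr_credits = saved.peer_rtr_credits;
}
```

Starting from { 50, -1, -1 }, the timeout stays at 50 while the two unset fields take whatever the fake LND wrote.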
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 30/34] lnet: fix typo
  2018-09-07  0:49 ` [lustre-devel] [PATCH 30/34] lnet: fix typo NeilBrown
@ 2018-09-12  4:47   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  4:47 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:55 PM, "NeilBrown" <neilb@suse.com> wrote:

    to -> too
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     drivers/staging/lustre/lnet/lnet/api-ni.c |    2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)
    
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index f4efb48c4cf3..cf0ffb8ac84b 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -1868,7 +1868,7 @@ lnet_fill_ni_info(struct lnet_ni *ni, struct lnet_ioctl_config_data *config)
     	if (config->cfg_hdr.ioc_len > min_size)
     		tunable_size = config->cfg_hdr.ioc_len - min_size;
     
    -	/* Don't copy to much data to user space */
    +	/* Don't copy too much data to user space */
     	min_size = min(tunable_size, sizeof(ni->ni_lnd_tunables));
     	lnd_cfg = (struct lnet_ioctl_config_lnd_tunables *)net_config->cfg_bulk;
     
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 31/34] lnet: lnet_dyn_add_ni: fix ping_info count
  2018-09-07  0:49 ` [lustre-devel] [PATCH 31/34] lnet: lnet_dyn_add_ni: fix ping_info count NeilBrown
@ 2018-09-12  4:48   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  4:48 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:55 PM, "NeilBrown" <neilb@suse.com> wrote:

    Use the correct count of interfaces when calling
       lnet_ping_info_setup()
    in lnet_dyn_add_ni()
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     drivers/staging/lustre/lnet/lnet/api-ni.c |   27 ++++++++++++++++++++++++++-
     1 file changed, 26 insertions(+), 1 deletion(-)
    
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index cf0ffb8ac84b..2ce0a7212dc2 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -871,6 +871,18 @@ lnet_ping_info_create(int num_ni)
     	return ping_info;
     }
     
    +static inline int
    +lnet_get_net_ni_count_locked(struct lnet_net *net)
    +{
    +	struct lnet_ni	*ni;
    +	int		count = 0;
    +
    +	list_for_each_entry(ni, &net->net_ni_list, ni_netlist)
    +		count++;
    +
    +	return count;
    +}
    +
     static inline int
     lnet_get_ni_count(void)
     {
    @@ -1977,6 +1989,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
     	struct list_head net_head;
     	struct lnet_remotenet *rnet;
     	int rc;
    +	int			net_ni_count;
     	int			num_acceptor_nets;
     	__u32			net_type;
     	struct lnet_ioctl_config_lnd_tunables *lnd_tunables = NULL;
    @@ -2014,7 +2027,19 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
     		goto failed0;
     	}
     
    -	rc = lnet_ping_info_setup(&pinfo, &md_handle, 1 + lnet_get_ni_count(),
    +	/*
    +	 * make sure you calculate the correct number of slots in the ping
    +	 * info. Since the ping info is a flattened list of all the NIs,
    +	 * we should allocate enough slots to accommodate the number of NIs
    +	 * which will be added.
    +	 *
    +	 * We can use lnet_get_net_ni_count_locked() since the net is not
    +	 * on a public list yet, so locking is not a problem
    +	 */
    +	net_ni_count = lnet_get_net_ni_count_locked(net);
    +
    +	rc = lnet_ping_info_setup(&pinfo, &md_handle,
    +				  net_ni_count + lnet_get_ni_count(),
     				  false);
     	if (rc)
     		goto failed0;
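
The slot calculation above can be sketched in userspace as follows. The list type and function names here are hypothetical simplifications (a singly linked list instead of the kernel's list_head); the point is only that the ping info needs the existing NI count plus every NI on the net being added, rather than a fixed "+ 1".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for lnet_ni and lnet_net. */
struct fake_ni {
	struct fake_ni *next;
};

struct fake_net {
	struct fake_ni *ni_list;
};

/* Walk the net's NI list and count the entries. */
static int fake_net_ni_count(const struct fake_net *net)
{
	const struct fake_ni *ni;
	int count = 0;

	assert(net != NULL);
	for (ni = net->ni_list; ni != NULL; ni = ni->next)
		count++;

	return count;
}

/* Slots needed once every NI of new_net has been added. */
static int ping_slots_after_add(int existing_nis,
				const struct fake_net *new_net)
{
	return existing_nis + fake_net_ni_count(new_net);
}
```

For a net carrying two NIs and three NIs already configured, this gives five slots where the old "1 + count" would have given four.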
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 32/34] lnet: lnet_dyn_del_ni: fix ping_info count
  2018-09-07  0:49 ` [lustre-devel] [PATCH 32/34] lnet: lnet_dyn_del_ni: fix ping_info count NeilBrown
@ 2018-09-12  4:49   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  4:49 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:55 PM, "NeilBrown" <neilb@suse.com> wrote:

    - use correct interface count for lnet_ping_info_setup().
    - also rename 'net' to 'net_id' so the name 'net' is free
      to identify the lnet_net.
    
    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     drivers/staging/lustre/lnet/lnet/api-ni.c |   35 +++++++++++++++++------------
     1 file changed, 20 insertions(+), 15 deletions(-)
    
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index 2ce0a7212dc2..ff5149da2d79 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -2109,40 +2109,45 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
     }
     
     int
    -lnet_dyn_del_ni(__u32 net)
    +lnet_dyn_del_ni(__u32 net_id)
     {
    -	struct lnet_ni *ni;
    +	struct lnet_net *net;
     	struct lnet_ping_info *pinfo;
     	struct lnet_handle_md md_handle;
     	int rc;
    +	int		  net_ni_count;
     
     	/* don't allow userspace to shutdown the LOLND */
    -	if (LNET_NETTYP(net) == LOLND)
    +	if (LNET_NETTYP(net_id) == LOLND)
     		return -EINVAL;
     
     	mutex_lock(&the_lnet.ln_api_mutex);
    +
    +	lnet_net_lock(0);
    +
    +	net = lnet_get_net_locked(net_id);
    +	if (net == NULL) {
    +		rc = -EINVAL;
    +		goto out;
    +	}
    +
    +	net_ni_count = lnet_get_net_ni_count_locked(net);
    +
    +	lnet_net_unlock(0);
    +
     	/* create and link a new ping info, before removing the old one */
     	rc = lnet_ping_info_setup(&pinfo, &md_handle,
    -				  lnet_get_ni_count() - 1, false);
    +				  lnet_get_ni_count() - net_ni_count, false);
     	if (rc)
     		goto out;
     
    -	ni = lnet_net2ni(net);
    -	if (!ni) {
    -		rc = -EINVAL;
    -		goto failed;
    -	}
    -
    -	lnet_shutdown_lndni(ni);
    +	lnet_shutdown_lndnet(net);
     
     	if (!lnet_count_acceptor_nets())
     		lnet_acceptor_stop();
     
     	lnet_ping_target_update(pinfo, md_handle);
    -	goto out;
    -failed:
    -	lnet_ping_md_unlink(pinfo, &md_handle);
    -	lnet_ping_info_free(pinfo);
    +
     out:
     	mutex_unlock(&the_lnet.ln_api_mutex);
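
Reading only the quoted hunk, the net == NULL branch jumps to "out", which drops ln_api_mutex but not the lock taken by lnet_net_lock(0); context outside the hunk may change that picture. The sketch below shows the general shape of releasing the inner lock before bailing out. It is a userspace illustration: the lock is a plain counter and every name in it is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the net lock: a depth counter. */
static int fake_net_lock_depth;

static void fake_net_lock(void)   { fake_net_lock_depth++; }
static void fake_net_unlock(void) { fake_net_lock_depth--; }

struct fake_net {
	int ni_count;
};

/*
 * Compute the ping-info slots left after removing 'net'.
 * 'net' is NULL when the lookup failed.  The inner lock is
 * released on both the error path and the normal path.
 */
static int slots_after_del(const struct fake_net *net, int total_nis,
			   int *slots)
{
	int net_ni_count;

	assert(slots != NULL);

	fake_net_lock();
	if (net == NULL) {
		fake_net_unlock();	/* drop the lock before bailing out */
		return -EINVAL;
	}
	net_ni_count = net->ni_count;
	fake_net_unlock();

	*slots = total_nis - net_ni_count;
	return 0;
}
```

After either call path, fake_net_lock_depth is back to zero.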
     
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 33/34] Completely re-write lnet_parse_networks().
  2018-09-07  0:49 ` [lustre-devel] [PATCH 33/34] Completely re-write lnet_parse_networks() NeilBrown
@ 2018-09-12  4:54   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  4:54 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:55 PM, "NeilBrown" <neilb@suse.com> wrote:

    From: Amir Shehata <amir.shehata@intel.com>
    
    Was:
    
    LU-7734 lnet: Multi-Rail local NI split
    
    This patch allows the configuration of multiple NIs under one Net.
    It is now possible to have multiple NIDs on the same network:
       Ex: <ip1>@tcp, <ip2>@tcp.
    This can be configured using the following syntax:
       Ex: tcp(eth0, eth1)
    
    The data structures for the example above can be visualized
    as follows
    
                   NET(tcp)
                    |
            -----------------
            |               |
          NI(eth0)        NI(eth1)
    
    For more details refer to the Multi-Rail Requirements and HLD
    documents
    
    Signed-off-by: Amir Shehata <amir.shehata@intel.com>
    Change-Id: Id7c73b9b811a3082b61e53b9e9f95743188cbd51
    Reviewed-on: http://review.whamcloud.com/18274
    Tested-by: Jenkins
    Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
    Tested-by: Maloo <hpdd-maloo@intel.com>
    Reviewed-by: Olaf Weber <olaf@sgi.com>
    ---
     drivers/staging/lustre/lnet/lnet/config.c |  341 ++++++++++++++++++-----------
     1 file changed, 217 insertions(+), 124 deletions(-)
    
    diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
    index 11d6dbc80507..0571fa6a7249 100644
    --- a/drivers/staging/lustre/lnet/lnet/config.c
    +++ b/drivers/staging/lustre/lnet/lnet/config.c
    @@ -48,8 +48,11 @@ static int lnet_tbnob;			/* track text buf allocation */
     #define LNET_MAX_TEXTBUF_NOB     (64 << 10)	/* bound allocation */
     #define LNET_SINGLE_TEXTBUF_NOB  (4 << 10)
     
    +#define SPACESTR " \t\v\r\n"
    +#define DELIMITERS ":()[]"
    +
     static void
    -lnet_syntax(char *name, char *str, int offset, int width)
    +lnet_syntax(const char *name, const char *str, int offset, int width)
     {
     	static char dots[LNET_SINGLE_TEXTBUF_NOB];
     	static char dashes[LNET_SINGLE_TEXTBUF_NOB];
    @@ -363,6 +366,42 @@ lnet_net_alloc(__u32 net_id, struct list_head *net_list)
     	return net;
     }
     
    +static int
    +lnet_ni_add_interface(struct lnet_ni *ni, char *iface)
    +{
    +	int niface = 0;
    +
    +	if (ni == NULL)
    +		return -ENOMEM;
    +
    +	/* Allocate a separate piece of memory and copy
    +	 * into it the string, so we don't have
    +	 * a depencency on the tokens string.  This way we
    +	 * can free the tokens at the end of the function.
    +	 * The newly allocated ni_interfaces[] can be
    +	 * freed when freeing the NI */
    +	while (niface < LNET_MAX_INTERFACES &&
    +	       ni->ni_interfaces[niface] != NULL)
    +		niface++;
    +
    +	if (niface >= LNET_MAX_INTERFACES) {
    +		LCONSOLE_ERROR_MSG(0x115, "Too many interfaces "
    +				   "for net %s\n",
    +				   libcfs_net2str(LNET_NIDNET(ni->ni_nid)));
    +		return -EINVAL;
    +	}
    +
    +	ni->ni_interfaces[niface] = kstrdup(iface, GFP_KERNEL);
    +
    +	if (ni->ni_interfaces[niface] == NULL) {
    +		CERROR("Can't allocate net interface name\n");
    +		return -ENOMEM;
    +	}
    +
    +	return 0;
    +}
    +
    +/* allocate and add to the provided network */
     struct lnet_ni *
     lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
     {
    @@ -439,24 +478,33 @@ lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
     		goto failed;
     	list_add_tail(&ni->ni_netlist, &net->net_ni_added);
     
    +	/* if an interface name is provided then make sure to add in that
    +	 * interface name in NI */
    +	if (iface != NULL)
    +		if (lnet_ni_add_interface(ni, iface) != 0)
    +			goto failed;
    +
     	return ni;
     failed:
     	lnet_ni_free(ni);
     	return NULL;
     }
     
    +/*
    + * Parse the networks string and create the matching set of NIs on the
    + * nilist.
    + */
     int
     lnet_parse_networks(struct list_head *netlist, char *networks)
     {
    -	struct cfs_expr_list *el = NULL;
    +	struct cfs_expr_list *net_el = NULL;
    +	struct cfs_expr_list *ni_el = NULL;
     	char *tokens;
     	char *str;
    -	char *tmp;
     	struct lnet_net *net;
     	struct lnet_ni *ni = NULL;
     	__u32 net_id;
     	int nnets = 0;
    -	struct list_head *temp_node;
     
     	if (!networks) {
     		CERROR("networks string is undefined\n");
    @@ -476,84 +524,108 @@ lnet_parse_networks(struct list_head *netlist, char *networks)
     		return -ENOMEM;
     	}
     
    -	tmp = tokens;
     	str = tokens;
     
    -	while (str && *str) {
    -		char *comma = strchr(str, ',');
    -		char *bracket = strchr(str, '(');
    -		char *square = strchr(str, '[');
    -		char *iface;
    -		int niface;
    +	/*
    +	 * Main parser loop.
    +	 *
    +	 * NB we don't check interface conflicts here; it's the LNDs
    +	 * responsibility (if it cares at all)
    +	 */
    +	do {
    +		char *nistr;
    +		char *elstr;
    +		char *name;
     		int rc;
     
     		/*
    -		 * NB we don't check interface conflicts here; it's the LNDs
    -		 * responsibility (if it cares at all)
    +		 * Parse a network string into its components.
    +		 *
    +		 * <name>{"("...")"}{"["<el>"]"}
     		 */
    -		if (square && (!comma || square < comma)) {
    -			/*
    -			 * i.e: o2ib0(ib0)[1,2], number between square
    -			 * brackets are CPTs this NI needs to be bond
    -			 */
    -			if (bracket && bracket > square) {
    -				tmp = square;
    +
    +		/* Network name (mandatory)
    +		 */
    +		while (isspace(*str))
    +			*str++ = '\0';
    +		if (!*str)
    +			break;
    +		name = str;
    +		str += strcspn(str, SPACESTR ":()[],");
    +		while (isspace(*str))
    +			*str++ = '\0';
    +
    +		/* Interface list (optional) */
    +		if (*str == '(') {
    +			*str++ = '\0';
    +			nistr = str;
    +			str += strcspn(str, ")");
    +			if (*str != ')') {
    +				str = nistr;
     				goto failed_syntax;
     			}
    +			do {
    +				*str++ = '\0';
    +			} while (isspace(*str));
    +		} else {
    +			nistr = NULL;
    +		}
     
    -			tmp = strchr(square, ']');
    -			if (!tmp) {
    -				tmp = square;
    +		/* CPT expression (optional) */
    +		if (*str == '[') {
    +			elstr = str;
    +			str += strcspn(str, "]");
    +			if (*str != ']') {
    +				str = elstr;
     				goto failed_syntax;
     			}
    -
    -			rc = cfs_expr_list_parse(square, tmp - square + 1,
    -						 0, LNET_CPT_NUMBER - 1, &el);
    +			rc = cfs_expr_list_parse(elstr, str - elstr + 1,
    +						0, LNET_CPT_NUMBER - 1,
    +						&net_el);
     			if (rc) {
    -				tmp = square;
    +				str = elstr;
     				goto failed_syntax;
     			}
    -
    -			while (square <= tmp)
    -				*square++ = ' ';
    +			*elstr = '\0';
    +			do {
    +				*str++ = '\0';
    +			} while (isspace(*str));
     		}
     
    -		if (!bracket || (comma && comma < bracket)) {
    -			/* no interface list specified */
    +		/* Bad delimiters */
    +		if (*str && (strchr(DELIMITERS, *str) != NULL))
    +			goto failed_syntax;
     
    -			if (comma)
    -				*comma++ = 0;
    -			net_id = libcfs_str2net(strim(str));
    +		/* go to the next net if it exists */
    +		str += strcspn(str, ",");
    +		if (*str == ',')
    +			*str++ = '\0';
     
    -			if (net_id == LNET_NIDNET(LNET_NID_ANY)) {
    -				LCONSOLE_ERROR_MSG(0x113,
    -						   "Unrecognised network type\n");
    -				tmp = str;
    -				goto failed_syntax;
    -			}
    -
    -			if (LNET_NETTYP(net_id) != LOLND) { /* LO is implicit */
    -				net = lnet_net_alloc(net_id, netlist);
    -				if (!net ||
    -				    !lnet_ni_alloc(net, el, NULL))
    -					goto failed;
    -			}
    +		/*
    +		 * At this point the name is properly terminated.
    +		 */
    +		net_id = libcfs_str2net(name);
    +		if (net_id == LNET_NIDNET(LNET_NID_ANY)) {
    +			LCONSOLE_ERROR_MSG(0x113,
    +					"Unrecognised network type\n");
    +			str = name;
    +			goto failed_syntax;
    +		}
     
    -			if (el) {
    -				cfs_expr_list_free(el);
    -				el = NULL;
    +		if (LNET_NETTYP(net_id) == LOLND) {
    +			/* Loopback is implicit, and there can be only one. */
    +			if (net_el) {
    +				cfs_expr_list_free(net_el);
    +				net_el = NULL;
     			}
    -
    -			str = comma;
    +			/* Should we error out instead? */
     			continue;
     		}
     
    -		*bracket = 0;
    -		net_id = libcfs_str2net(strim(str));
    -		if (net_id == LNET_NIDNET(LNET_NID_ANY)) {
    -			tmp = str;
    -			goto failed_syntax;
    -		}
    +		/*
    +		 * All network parameters are now known.
    +		 */
    +		nnets++;
     
     		/* always allocate a net, since we will eventually add an
     		 * interface to it, or we will fail, in which case we'll
    @@ -562,88 +634,107 @@ lnet_parse_networks(struct list_head *netlist, char *networks)
     		if (IS_ERR_OR_NULL(net))
     			goto failed;
     
    -		ni = lnet_ni_alloc(net, el, NULL);
    -		if (IS_ERR_OR_NULL(ni))
    -			goto failed;
    -
    -		if (el) {
    -			cfs_expr_list_free(el);
    -			el = NULL;
    -		}
    -
    -		niface = 0;
    -		iface = bracket + 1;
    +		if (!nistr) {
    +			/*
    +			 * No interface list was specified, allocate a
    +			 * ni using the defaults.
    +			 */
    +			ni = lnet_ni_alloc(net, net_el, NULL);
    +			if (IS_ERR_OR_NULL(ni))
    +				goto failed;
     
    -		bracket = strchr(iface, ')');
    -		if (!bracket) {
    -			tmp = iface;
    -			goto failed_syntax;
    +			if (net_el) {
    +				cfs_expr_list_free(net_el);
    +				net_el = NULL;
    +			}
    +			continue;
     		}
     
    -		*bracket = 0;
     		do {
    -			comma = strchr(iface, ',');
    -			if (comma)
    -				*comma++ = 0;
    -
    -			iface = strim(iface);
    -			if (!*iface) {
    -				tmp = iface;
    -				goto failed_syntax;
    +			elstr = NULL;
    +
    +			/* Interface name (mandatory) */
    +			while (isspace(*nistr))
    +				*nistr++ = '\0';
    +			name = nistr;
    +			nistr += strcspn(nistr, SPACESTR "[],");
    +			while (isspace(*nistr))
    +				*nistr++ = '\0';
    +
    +			/* CPT expression (optional) */
    +			if (*nistr == '[') {
    +				elstr = nistr;
    +				nistr += strcspn(nistr, "]");
    +				if (*nistr != ']') {
    +					str = elstr;
    +					goto failed_syntax;
    +				}
    +				rc = cfs_expr_list_parse(elstr,
    +							nistr - elstr + 1,
    +							0, LNET_CPT_NUMBER - 1,
    +							&ni_el);
    +				if (rc != 0) {
    +					str = elstr;
    +					goto failed_syntax;
    +				}
    +				*elstr = '\0';
    +				do {
    +					*nistr++ = '\0';
    +				} while (isspace(*nistr));
    +			} else {
    +				ni_el = net_el;
     			}
     
    -			if (niface == LNET_MAX_INTERFACES) {
    -				LCONSOLE_ERROR_MSG(0x115,
    -						   "Too many interfaces for net %s\n",
    -						   libcfs_net2str(net_id));
    -				goto failed;
    +			/*
    +			 * End of single interface specification,
    +			 * advance to the start of the next one, if
    +			 * any.
    +			 */
    +			if (*nistr == ',') {
    +				do {
    +					*nistr++ = '\0';
    +				} while (isspace(*nistr));
    +				if (!*nistr) {
    +					str = nistr;
    +					goto failed_syntax;
    +				}
    +			} else if (*nistr) {
    +				str = nistr;
    +				goto failed_syntax;
     			}
     
     			/*
    -			 * Allocate a separate piece of memory and copy
    -			 * into it the string, so we don't have
    -			 * a depencency on the tokens string.  This way we
    -			 * can free the tokens at the end of the function.
    -			 * The newly allocated ni_interfaces[] can be
    -			 * freed when freeing the NI
    +			 * At this point the name
    +			 is properly terminated.
     			 */
    -			ni->ni_interfaces[niface] = kstrdup(iface, GFP_KERNEL);
    -			if (!ni->ni_interfaces[niface]) {
    -				CERROR("Can't allocate net interface name\n");
    -				goto failed;
    -			}
    -			niface++;
    -			iface = comma;
    -		} while (iface);
    -
    -		str = bracket + 1;
    -		comma = strchr(bracket + 1, ',');
    -		if (comma) {
    -			*comma = 0;
    -			str = strim(str);
    -			if (*str) {
    -				tmp = str;
    +			if (!*name) {
    +				str = name;
     				goto failed_syntax;
     			}
    -			str = comma + 1;
    -			continue;
    -		}
     
    -		str = strim(str);
    -		if (*str) {
    -			tmp = str;
    -			goto failed_syntax;
    -		}
    -	}
    +			ni = lnet_ni_alloc(net, ni_el, name);
    +			if (IS_ERR_OR_NULL(ni))
    +				goto failed;
     
    -	list_for_each(temp_node, netlist)
    -		nnets++;
    +			if (ni_el) {
    +				if (ni_el != net_el) {
    +					cfs_expr_list_free(ni_el);
    +					ni_el = NULL;
    +				}
    +			}
    +		} while (*nistr);
    +
    +		if (net_el) {
    +			cfs_expr_list_free(net_el);
    +			net_el = NULL;
    +		}
    +	} while (*str);
     
     	kfree(tokens);
     	return nnets;
     
      failed_syntax:
    -	lnet_syntax("networks", networks, (int)(tmp - tokens), strlen(tmp));
    +	lnet_syntax("networks", networks, (int)(str - tokens), strlen(str));
      failed:
     	/* free the net list and all the nis on each net */
     	while (!list_empty(netlist)) {
    @@ -653,8 +744,10 @@ lnet_parse_networks(struct list_head *netlist, char *networks)
     		lnet_net_free(net);
     	}
     
    -	if (el)
    -		cfs_expr_list_free(el);
    +	if (ni_el && ni_el != net_el)
    +		cfs_expr_list_free(ni_el);
    +	if (net_el)
    +		cfs_expr_list_free(net_el);
     
     	kfree(tokens);
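
The in-place tokenising style of the rewritten parser (strcspn() to find the next delimiter, then overwrite delimiters and whitespace with '\0') can be sketched in userspace as below. This is a much reduced illustration, not the kernel function: it handles only "<name>" and "<name>(<if>[,<if>...])", ignores CPT expressions in square brackets, and the struct, the MAX_IFACES bound and the function name are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_IFACES 16	/* arbitrary bound for this sketch */

struct net_spec {
	char *name;
	char *ifaces[MAX_IFACES];
	int niface;
};

/*
 * Parse one network element in place.  Returns a pointer to the rest
 * of the string (just past an optional ','), or NULL on a syntax error.
 */
static char *parse_one_net(char *str, struct net_spec *out)
{
	char *nistr = NULL;

	assert(str != NULL && out != NULL);
	out->name = NULL;
	out->niface = 0;

	/* Network name (mandatory) */
	while (isspace((unsigned char)*str))
		str++;
	if (!*str)
		return NULL;
	out->name = str;
	str += strcspn(str, " \t\v\r\n:()[],");
	while (isspace((unsigned char)*str))
		*str++ = '\0';

	/* Interface list (optional) */
	if (*str == '(') {
		*str++ = '\0';
		nistr = str;
		str += strcspn(str, ")");
		if (*str != ')')
			return NULL;
		do {
			*str++ = '\0';
		} while (isspace((unsigned char)*str));
	}

	/* Only ',' or end of string may follow */
	if (*str == ',')
		*str++ = '\0';
	else if (*str)
		return NULL;

	/* Split the interface list on commas */
	while (nistr != NULL) {
		char *end;

		while (isspace((unsigned char)*nistr))
			nistr++;
		end = nistr + strcspn(nistr, " \t\v\r\n,");
		if (end == nistr || out->niface == MAX_IFACES)
			return NULL;	/* empty name or too many */
		out->ifaces[out->niface++] = nistr;
		nistr = end;
		while (isspace((unsigned char)*nistr))
			*nistr++ = '\0';
		if (*nistr == ',')
			*nistr++ = '\0';
		else if (*nistr == '\0')
			nistr = NULL;
		else
			return NULL;
	}

	return str;
}
```

Called on a writable copy of "tcp(eth0, eth1),o2ib", the first call yields the name "tcp" with two interfaces and returns a pointer to "o2ib"; a second call on that pointer yields "o2ib" with no interfaces.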
     
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 34/34] lnet: introduce use_tcp_bonding mod param
  2018-09-07  0:49 ` [lustre-devel] [PATCH 34/34] lnet: introduce use_tcp_bonding mod param NeilBrown
@ 2018-09-12  4:54   ` Doug Oucharek
  0 siblings, 0 replies; 98+ messages in thread
From: Doug Oucharek @ 2018-09-12  4:54 UTC (permalink / raw)
  To: lustre-devel

Reviewed-by: Doug Oucharek <dougso@me.com>

Doug

On 9/6/18, 5:56 PM, "NeilBrown" <neilb@suse.com> wrote:

    This is part of
        8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
           LU-7734 lnet: Multi-Rail local NI split
    
    Signed-off-by: NeilBrown <neilb@suse.com>
    ---
     .../staging/lustre/include/linux/lnet/lib-lnet.h   |    3 +
     drivers/staging/lustre/lnet/lnet/api-ni.c          |   22 ++++++++-
     drivers/staging/lustre/lnet/lnet/config.c          |   50 ++++++++++++++++----
     3 files changed, 61 insertions(+), 14 deletions(-)
    
    diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    index ef551b571935..5ee770cd7a5f 100644
    --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
    @@ -629,7 +629,8 @@ void lnet_swap_pinginfo(struct lnet_ping_info *info);
     
     int lnet_parse_ip2nets(char **networksp, char *ip2nets);
     int lnet_parse_routes(char *route_str, int *im_a_router);
    -int lnet_parse_networks(struct list_head *nilist, char *networks);
    +int lnet_parse_networks(struct list_head *nilist, char *networks,
    +			bool use_tcp_bonding);
     bool lnet_net_unique(__u32 net_id, struct list_head *nilist,
     		     struct lnet_net **net);
     bool lnet_ni_unique_net(struct list_head *nilist, char *iface);
    diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
    index ff5149da2d79..8ff386992c99 100644
    --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
    +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
    @@ -59,6 +59,11 @@ static int rnet_htable_size = LNET_REMOTE_NETS_HASH_DEFAULT;
     module_param(rnet_htable_size, int, 0444);
     MODULE_PARM_DESC(rnet_htable_size, "size of remote network hash table");
     
    +static int use_tcp_bonding = false;
    +module_param(use_tcp_bonding, int, 0444);
    +MODULE_PARM_DESC(use_tcp_bonding,
    +		 "Set to 1 to use socklnd bonding. 0 to use Multi-Rail");
    +
     static int lnet_ping(struct lnet_process_id id, signed long timeout,
     		     struct lnet_process_id __user *ids, int n_ids);
     
    @@ -1446,6 +1451,18 @@ lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun)
     	 * to avoid a memory leak.
     	 */
     
    +	/*
    +	 * When a network uses TCP bonding then all its interfaces
    +	 * must be specified when the network is first defined: the
    +	 * TCP bonding code doesn't allow for interfaces to be added
    +	 * or removed.
    +	 */
    +	if (net_l != net && net_l != NULL && use_tcp_bonding &&
    +	    LNET_NETTYP(net_l->net_id) == SOCKLND) {
    +		rc = -EINVAL;
    +		goto failed0;
    +	}
    +
     	while (!list_empty(&net->net_ni_added)) {
     		ni = list_entry(net->net_ni_added.next, struct lnet_ni,
     				ni_netlist);
    @@ -1702,7 +1719,8 @@ LNetNIInit(lnet_pid_t requested_pid)
     	 * routes if it has been loaded
     	 */
     	if (!the_lnet.ln_nis_from_mod_params) {
    -		rc = lnet_parse_networks(&net_head, lnet_get_networks());
    +		rc = lnet_parse_networks(&net_head, lnet_get_networks(),
    +					 use_tcp_bonding);
     		if (rc < 0)
     			goto err_empty_list;
     	}
    @@ -2000,7 +2018,7 @@ lnet_dyn_add_ni(lnet_pid_t requested_pid, struct lnet_ioctl_config_data *conf)
     		lnd_tunables = (struct lnet_ioctl_config_lnd_tunables *)conf->cfg_bulk;
     
     	/* Create a net/ni structures for the network string */
    -	rc = lnet_parse_networks(&net_head, nets);
    +	rc = lnet_parse_networks(&net_head, nets, use_tcp_bonding);
     	if (rc <= 0)
     		return !rc ? -EINVAL : rc;
     
    diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
    index 0571fa6a7249..abfc5d8dc219 100644
    --- a/drivers/staging/lustre/lnet/lnet/config.c
    +++ b/drivers/staging/lustre/lnet/lnet/config.c
    @@ -117,6 +117,21 @@ lnet_ni_unique_net(struct list_head *nilist, char *iface)
     	return true;
     }
     
    +/* check that the NI is unique to the interfaces within the same NI.
    + * This is only a consideration if use_tcp_bonding is set */
    +static bool
    +lnet_ni_unique_ni(char *iface_list[LNET_MAX_INTERFACES], char *iface)
    +{
    +	int i;
    +	for (i = 0; i < LNET_MAX_INTERFACES; i++) {
    +		if (iface_list[i] != NULL &&
    +		    strncmp(iface_list[i], iface, strlen(iface)) == 0)
    +			return false;
    +	}
    +
    +	return true;
    +}
    +
     static bool
     in_array(__u32 *array, __u32 size, __u32 value)
     {
    @@ -374,6 +389,9 @@ lnet_ni_add_interface(struct lnet_ni *ni, char *iface)
     	if (ni == NULL)
     		return -ENOMEM;
     
    +	if (!lnet_ni_unique_ni(ni->ni_interfaces, iface))
    +		return -EINVAL;
    +
     	/* Allocate a separate piece of memory and copy
     	 * into it the string, so we don't have
     	 * a depencency on the tokens string.  This way we
    @@ -495,7 +513,8 @@ lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
      * nilist.
      */
     int
    -lnet_parse_networks(struct list_head *netlist, char *networks)
    +lnet_parse_networks(struct list_head *netlist, char *networks,
    +		    bool use_tcp_bonding)
     {
     	struct cfs_expr_list *net_el = NULL;
     	struct cfs_expr_list *ni_el = NULL;
    @@ -634,7 +653,8 @@ lnet_parse_networks(struct list_head *netlist, char *networks)
     		if (IS_ERR_OR_NULL(net))
     			goto failed;
     
    -		if (!nistr) {
    +		if (!nistr ||
    +		    (use_tcp_bonding && LNET_NETTYP(net_id) == SOCKLND)) {
     			/*
     			 * No interface list was specified, allocate a
     			 * ni using the defaults.
    @@ -643,11 +663,13 @@ lnet_parse_networks(struct list_head *netlist, char *networks)
     			if (IS_ERR_OR_NULL(ni))
     				goto failed;
     
    -			if (net_el) {
    -				cfs_expr_list_free(net_el);
    -				net_el = NULL;
    +			if (!nistr) {
    +				if (net_el) {
    +					cfs_expr_list_free(net_el);
    +					net_el = NULL;
    +				}
    +				continue;
     			}
    -			continue;
     		}
     
     		do {
    @@ -704,17 +726,23 @@ lnet_parse_networks(struct list_head *netlist, char *networks)
     			}
     
     			/*
    -			 * At this point the name
    -			 is properly terminated.
    +			 * At this point the name is properly terminated.
     			 */
     			if (!*name) {
     				str = name;
     				goto failed_syntax;
     			}
     
    -			ni = lnet_ni_alloc(net, ni_el, name);
    -			if (IS_ERR_OR_NULL(ni))
    -				goto failed;
    +			if (use_tcp_bonding &&
    +			    LNET_NETTYP(net->net_id) == SOCKLND) {
    +				rc = lnet_ni_add_interface(ni, name);
    +				if (rc != 0)
    +					goto failed;
    +			} else {
    +				ni = lnet_ni_alloc(net, ni_el, name);
    +				if (IS_ERR_OR_NULL(ni))
    +					goto failed;
    +			}
     
     			if (ni_el) {
     				if (ni_el != net_el) {
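
As I read the config.c hunks above, the effect of use_tcp_bonding on the parser is: with bonding on a socklnd net, one NI is allocated and every listed interface is attached to it; otherwise each listed interface gets its own NI. The sketch below captures just that counting rule. It is an illustration with hypothetical names, and it ignores the LNET_MAX_INTERFACES limit and CPT expressions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the parser would build for one net. */
struct ni_layout {
	int ni_count;		/* number of NIs allocated */
	int ifaces_per_ni;	/* interfaces attached to each NI */
};

static struct ni_layout layout_for(int niface, bool is_socklnd,
				   bool use_tcp_bonding)
{
	struct ni_layout l;

	assert(niface >= 0);

	if (niface == 0) {
		/* no interface list: one NI using the defaults */
		l.ni_count = 1;
		l.ifaces_per_ni = 0;
	} else if (use_tcp_bonding && is_socklnd) {
		/* bonding: a single NI carrying every interface */
		l.ni_count = 1;
		l.ifaces_per_ni = niface;
	} else {
		/* Multi-Rail: one NI per interface */
		l.ni_count = niface;
		l.ifaces_per_ni = 1;
	}

	return l;
}
```

For "tcp(eth0, eth1)" this gives one NI with two interfaces when bonding is on, and two NIs with one interface each when it is off.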
    
    
    

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 20/34] lnet: discard ni_cpt_list
  2018-09-12  4:07   ` Doug Oucharek
@ 2018-09-12  5:48     ` NeilBrown
  2018-09-13 19:33       ` Amir Shehata
  0 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-12  5:48 UTC (permalink / raw)
  To: lustre-devel

On Wed, Sep 12 2018, Doug Oucharek wrote:

> I'm assuming that a future patch will be chaining the NI structure on to the NET structure it belongs to.  This patch is just not chaining the NIs on a global NIS list anymore.  As such, ni_cptlist is being "repurposed".

The NI is already chained onto the NET through
 lnet_net.net_ni_list and lnet_ni.ni_netlist

ni_cptlist is not used even in current master.
It is never added to any list, but lnet_ni_unlink_locked() does remove
it from a list.  Is that code wrong (should be checking ni_netlist), or
is it cruft that should be removed?

Thanks,
NeilBrown

>
> Reviewed-by: Doug Oucharek <dougso@me.com>
>
> Doug
>
> On 9/6/18, 5:54 PM, "NeilBrown" <neilb@suse.com> wrote:
>
>     This isn't used any more.
>     The new comment is odd - there is no net_ni_cpt !!
>     The ni_cptlist linkage is no longer used - should it go too?
>     
>     This is part of
>         8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>            LU-7734 lnet: Multi-Rail local NI split
>     
>     Signed-off-by: NeilBrown <neilb@suse.com>
>     ---
>      .../staging/lustre/include/linux/lnet/lib-types.h  |    4 +---
>      drivers/staging/lustre/lnet/lnet/api-ni.c          |    7 -------
>      2 files changed, 1 insertion(+), 10 deletions(-)
>     
>     diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
>     index 6c34ecf22021..dc15fa75a9d2 100644
>     --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
>     +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
>     @@ -305,7 +305,7 @@ struct lnet_net {
>      struct lnet_ni {
>      	/* chain on the lnet_net structure */
>      	struct list_head	  ni_netlist;
>     -	/* chain on ln_nis_cpt */
>     +	/* chain on net_ni_cpt */
>      	struct list_head	ni_cptlist;
>      
>      	spinlock_t		ni_lock;
>     @@ -671,8 +671,6 @@ struct lnet {
>      
>      	/* LND instances */
>      	struct list_head		ln_nets;
>     -	/* NIs bond on specific CPT(s) */
>     -	struct list_head		ln_nis_cpt;
>      	/* the loopback NI */
>      	struct lnet_ni			*ln_loni;
>      	/* network zombie list */
>     diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
>     index 546d5101360f..960f235df5e7 100644
>     --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
>     +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
>     @@ -538,7 +538,6 @@ lnet_prepare(lnet_pid_t requested_pid)
>      
>      	INIT_LIST_HEAD(&the_lnet.ln_test_peers);
>      	INIT_LIST_HEAD(&the_lnet.ln_nets);
>     -	INIT_LIST_HEAD(&the_lnet.ln_nis_cpt);
>      	INIT_LIST_HEAD(&the_lnet.ln_routers);
>      	INIT_LIST_HEAD(&the_lnet.ln_drop_rules);
>      	INIT_LIST_HEAD(&the_lnet.ln_delay_rules);
>     @@ -616,7 +615,6 @@ lnet_unprepare(void)
>      	LASSERT(!the_lnet.ln_refcount);
>      	LASSERT(list_empty(&the_lnet.ln_test_peers));
>      	LASSERT(list_empty(&the_lnet.ln_nets));
>     -	LASSERT(list_empty(&the_lnet.ln_nis_cpt));
>      
>      	lnet_portals_destroy();
>      
>     @@ -1294,11 +1292,6 @@ lnet_startup_lndni(struct lnet_ni *ni, struct lnet_lnd_tunables *tun)
>      	/* refcount for ln_nis */
>      	lnet_ni_addref_locked(ni, 0);
>      	list_add_tail(&ni->ni_net->net_list, &the_lnet.ln_nets);
>     -	if (ni->ni_cpts) {
>     -		lnet_ni_addref_locked(ni, 0);
>     -		list_add_tail(&ni->ni_cptlist, &the_lnet.ln_nis_cpt);
>     -	}
>     -
>      	lnet_net_unlock(LNET_LOCK_EX);
>      
>      	ni->ni_state = LNET_NI_STATE_ACTIVE;
>     
>     
>     

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 23/34] lnet: don't need lock to test ln_shutdown.
  2018-09-12  4:27   ` Doug Oucharek
@ 2018-09-12  5:54     ` NeilBrown
  0 siblings, 0 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-12  5:54 UTC (permalink / raw)
  To: lustre-devel

On Wed, Sep 12 2018, Doug Oucharek wrote:

> It seems the selection code being affected by this patch later gets moved to its own routine called lnet_select_pathway().  The logic is completely re-written so it may not be important to fix the use of locks here.  However,  it is not good that the lock is not being held while checking for shutdown.

Thanks... I might drop the patch here, and probably fold it into whichever
subsequent patch changes this code and gets rid of lnet_send().

I'll try that anyway.

NeilBrown
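
As a sketch of the race the commit message asks about: one common way to
keep an unlocked early test safe is to retest the flag once the lock is
held.  This is NOT what the posted patch does (it tests only before
taking the lock); it is a userspace illustration under assumptions.
demo_shutdown, demo_lock and the demo_* functions are hypothetical
stand-ins for the_lnet.ln_shutdown, lnet_net_lock() and lnet_send().

```c
#include <errno.h>	/* ESHUTDOWN is Linux-specific */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the_lnet.ln_shutdown and the net lock. */
static atomic_bool demo_shutdown;
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_sent;

/* The setter always takes the lock, as the commit message describes. */
static void demo_set_shutdown(void)
{
	pthread_mutex_lock(&demo_lock);
	atomic_store(&demo_shutdown, true);
	pthread_mutex_unlock(&demo_lock);
}

static int demo_send(void)
{
	/* Cheap early exit; a "false" result here may already be stale. */
	if (atomic_load(&demo_shutdown))
		return -ESHUTDOWN;

	pthread_mutex_lock(&demo_lock);

	/* Retest under the lock: the setter cannot run while we hold it. */
	if (atomic_load(&demo_shutdown)) {
		pthread_mutex_unlock(&demo_lock);
		return -ESHUTDOWN;
	}

	demo_sent++;	/* stands in for the real send path */
	pthread_mutex_unlock(&demo_lock);
	return 0;
}
```

With the retest, a shutdown that lands between the first test and the
lock is still caught; without it, the send path would run after
shutdown had been set.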


>
> Reviewed-by: Doug Oucharek <dougso@me.com>
>
> Doug
>
> On 9/6/18, 5:54 PM, "NeilBrown" <neilb@suse.com> wrote:
>
>     lnet_send() returns -ESHUTDOWN if ln_shutdown
>     is already set.
>     The lock is always taken to set ln_shutdown, but apparently
>     we don't need to hold the lock for this test.
>     I guess if it is set immediately after the test, and before
>     we take the lock then.... can anything bad happen?
>     
>     This is part of
>         8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>            LU-7734 lnet: Multi-Rail local NI split
>     
>     Signed-off-by: NeilBrown <neilb@suse.com>
>     ---
>      drivers/staging/lustre/lnet/lnet/lib-move.c |    7 ++-----
>      1 file changed, 2 insertions(+), 5 deletions(-)
>     
>     diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
>     index 60f34c4b85d3..46e593fbb44f 100644
>     --- a/drivers/staging/lustre/lnet/lnet/lib-move.c
>     +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
>     @@ -1099,12 +1099,9 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
>      	cpt = lnet_cpt_of_nid(rtr_nid == LNET_NID_ANY ? dst_nid : rtr_nid,
>      			      local_ni);
>       again:
>     -	lnet_net_lock(cpt);
>     -
>     -	if (the_lnet.ln_shutdown) {
>     -		lnet_net_unlock(cpt);
>     +	if (the_lnet.ln_shutdown)
>      		return -ESHUTDOWN;
>     -	}
>     +	lnet_net_lock(cpt);
>      
>      	if (src_nid == LNET_NID_ANY) {
>      		src_ni = NULL;
>     
>     
>     

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 20/34] lnet: discard ni_cpt_list
  2018-09-07  0:49 ` [lustre-devel] [PATCH 20/34] lnet: discard ni_cpt_list NeilBrown
  2018-09-12  4:07   ` Doug Oucharek
@ 2018-09-12 16:29   ` Amir Shehata
  1 sibling, 0 replies; 98+ messages in thread
From: Amir Shehata @ 2018-09-12 16:29 UTC (permalink / raw)
  To: lustre-devel

This area was re-implemented. There is no need for ni_cptlist any longer. I
looked at the current code and it's not being used.
Originally it was being used to place the ni on a global list,
ln_nis_cpt, which was traversed when attempting to calculate the cpt for a
NID using lnet_cpt_of_nid_locked(). However, that latter function has been
re-implemented due to how MR works now, so there is no need for ni_cptlist.

On Thu, 6 Sep 2018 at 18:05, NeilBrown <neilb@suse.com> wrote:

> This isn't used any more.
> The new comment is odd - there is no net_ni_cpt !!
> The ni_cptlist linkage is no longer used - should it go too?
>
> This is part of
>     8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>        LU-7734 lnet: Multi-Rail local NI split
>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  .../staging/lustre/include/linux/lnet/lib-types.h  |    4 +---
>  drivers/staging/lustre/lnet/lnet/api-ni.c          |    7 -------
>  2 files changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> index 6c34ecf22021..dc15fa75a9d2 100644
> --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> @@ -305,7 +305,7 @@ struct lnet_net {
>  struct lnet_ni {
>         /* chain on the lnet_net structure */
>         struct list_head          ni_netlist;
> -       /* chain on ln_nis_cpt */
> +       /* chain on net_ni_cpt */
>         struct list_head        ni_cptlist;
>
>         spinlock_t              ni_lock;
> @@ -671,8 +671,6 @@ struct lnet {
>
>         /* LND instances */
>         struct list_head                ln_nets;
> -       /* NIs bond on specific CPT(s) */
> -       struct list_head                ln_nis_cpt;
>         /* the loopback NI */
>         struct lnet_ni                  *ln_loni;
>         /* network zombie list */
> diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c
> b/drivers/staging/lustre/lnet/lnet/api-ni.c
> index 546d5101360f..960f235df5e7 100644
> --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
> +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
> @@ -538,7 +538,6 @@ lnet_prepare(lnet_pid_t requested_pid)
>
>         INIT_LIST_HEAD(&the_lnet.ln_test_peers);
>         INIT_LIST_HEAD(&the_lnet.ln_nets);
> -       INIT_LIST_HEAD(&the_lnet.ln_nis_cpt);
>         INIT_LIST_HEAD(&the_lnet.ln_routers);
>         INIT_LIST_HEAD(&the_lnet.ln_drop_rules);
>         INIT_LIST_HEAD(&the_lnet.ln_delay_rules);
> @@ -616,7 +615,6 @@ lnet_unprepare(void)
>         LASSERT(!the_lnet.ln_refcount);
>         LASSERT(list_empty(&the_lnet.ln_test_peers));
>         LASSERT(list_empty(&the_lnet.ln_nets));
> -       LASSERT(list_empty(&the_lnet.ln_nis_cpt));
>
>         lnet_portals_destroy();
>
> @@ -1294,11 +1292,6 @@ lnet_startup_lndni(struct lnet_ni *ni, struct
> lnet_lnd_tunables *tun)
>         /* refcount for ln_nis */
>         lnet_ni_addref_locked(ni, 0);
>         list_add_tail(&ni->ni_net->net_list, &the_lnet.ln_nets);
> -       if (ni->ni_cpts) {
> -               lnet_ni_addref_locked(ni, 0);
> -               list_add_tail(&ni->ni_cptlist, &the_lnet.ln_nis_cpt);
> -       }
> -
>         lnet_net_unlock(LNET_LOCK_EX);
>
>         ni->ni_state = LNET_NI_STATE_ACTIVE;
>
>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 20/34] lnet: discard ni_cpt_list
  2018-09-12  5:48     ` NeilBrown
@ 2018-09-13 19:33       ` Amir Shehata
  2018-09-24  6:03         ` NeilBrown
  0 siblings, 1 reply; 98+ messages in thread
From: Amir Shehata @ 2018-09-13 19:33 UTC (permalink / raw)
  To: lustre-devel

did you read my response to that question? Pasted below:

---
This area was re-implemented. There is no need for ni_cptlist any longer. I
looked at the current code and it's not being used.
Originally it was being used to place the ni on a global list,
ln_nis_cpt, which was traversed when attempting to calculate the cpt for a
NID using lnet_cpt_of_nid_locked(). However, that latter function has been
re-implemented due to how MR works now, so there is no need for ni_cptlist.
___

thanks
amir

On Wed, 12 Sep 2018 at 17:35, NeilBrown <neilb@suse.com> wrote:

> On Wed, Sep 12 2018, Doug Oucharek wrote:
>
> > I'm assuming that a future patch will be chaining the NI structure on to
> the NET structure it belongs to.  This patch is just not chaining the NIs
> on a global NIS list anymore.  As such, ni_cptlist is being "repurposed".
>
> The NI is already chained onto the NET through
>  lnet_net.net_ni_list and lnet_ni.ni_netlist
>
> ni_cptlist is not used even in current master.
> It is never added to any list, but lnet_ni_unlink_locked() does remove
> it from a list.  Is that code wrong (should be checking ni_netlist), or
> is it cruft that should be removed?
>
> Thanks,
> NeilBrown
>
> >
> > Reviewed-by: Doug Oucharek <dougso@me.com>
> >
> > Doug
> >
> > On 9/6/18, 5:54 PM, "NeilBrown" <neilb@suse.com> wrote:
> >
> >     This isn't used any more.
> >     The new comment is odd - there is no net_ni_cpt !!
> >     The ni_cptlist linkage is no longer used - should it go too?
> >
> >     This is part of
> >         8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
> >            LU-7734 lnet: Multi-Rail local NI split
> >
> >     Signed-off-by: NeilBrown <neilb@suse.com>
> >     ---
> >      .../staging/lustre/include/linux/lnet/lib-types.h  |    4 +---
> >      drivers/staging/lustre/lnet/lnet/api-ni.c          |    7 -------
> >      2 files changed, 1 insertion(+), 10 deletions(-)
> >
> >     diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> >     index 6c34ecf22021..dc15fa75a9d2 100644
> >     --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
> >     +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
> >     @@ -305,7 +305,7 @@ struct lnet_net {
> >      struct lnet_ni {
> >       /* chain on the lnet_net structure */
> >       struct list_head          ni_netlist;
> >     - /* chain on ln_nis_cpt */
> >     + /* chain on net_ni_cpt */
> >       struct list_head        ni_cptlist;
> >
> >       spinlock_t              ni_lock;
> >     @@ -671,8 +671,6 @@ struct lnet {
> >
> >       /* LND instances */
> >       struct list_head                ln_nets;
> >     - /* NIs bond on specific CPT(s) */
> >     - struct list_head                ln_nis_cpt;
> >       /* the loopback NI */
> >       struct lnet_ni                  *ln_loni;
> >       /* network zombie list */
> >     diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c
> b/drivers/staging/lustre/lnet/lnet/api-ni.c
> >     index 546d5101360f..960f235df5e7 100644
> >     --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
> >     +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
> >     @@ -538,7 +538,6 @@ lnet_prepare(lnet_pid_t requested_pid)
> >
> >       INIT_LIST_HEAD(&the_lnet.ln_test_peers);
> >       INIT_LIST_HEAD(&the_lnet.ln_nets);
> >     - INIT_LIST_HEAD(&the_lnet.ln_nis_cpt);
> >       INIT_LIST_HEAD(&the_lnet.ln_routers);
> >       INIT_LIST_HEAD(&the_lnet.ln_drop_rules);
> >       INIT_LIST_HEAD(&the_lnet.ln_delay_rules);
> >     @@ -616,7 +615,6 @@ lnet_unprepare(void)
> >       LASSERT(!the_lnet.ln_refcount);
> >       LASSERT(list_empty(&the_lnet.ln_test_peers));
> >       LASSERT(list_empty(&the_lnet.ln_nets));
> >     - LASSERT(list_empty(&the_lnet.ln_nis_cpt));
> >
> >       lnet_portals_destroy();
> >
> >     @@ -1294,11 +1292,6 @@ lnet_startup_lndni(struct lnet_ni *ni, struct
> lnet_lnd_tunables *tun)
> >       /* refcount for ln_nis */
> >       lnet_ni_addref_locked(ni, 0);
> >       list_add_tail(&ni->ni_net->net_list, &the_lnet.ln_nets);
> >     - if (ni->ni_cpts) {
> >     -         lnet_ni_addref_locked(ni, 0);
> >     -         list_add_tail(&ni->ni_cptlist, &the_lnet.ln_nis_cpt);
> >     - }
> >     -
> >       lnet_net_unlock(LNET_LOCK_EX);
> >
> >       ni->ni_state = LNET_NI_STATE_ACTIVE;
> >
> >
> >

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 20/34] lnet: discard ni_cpt_list
  2018-09-13 19:33       ` Amir Shehata
@ 2018-09-24  6:03         ` NeilBrown
  0 siblings, 0 replies; 98+ messages in thread
From: NeilBrown @ 2018-09-24  6:03 UTC (permalink / raw)
  To: lustre-devel

On Thu, Sep 13 2018, Amir Shehata wrote:

> did you read my response to that question? Pasted below:
>
> ---
> This area was re-implemented. There is no need for ni_cptlist any longer. I
> looked at the current code and it's not being used.
> Originally it was being used to place the ni on a global list,
> ln_nis_cpt, which was traversed when attempting to calculate the cpt for a
> NID using lnet_cpt_of_nid_locked(). However, that latter function has been
> re-implemented due to how MR works now, so there is no need for ni_cptlist.
> ___

Thanks.  I've queued this patch.

NeilBrown

From: NeilBrown <neilb@suse.com>
Date: Mon, 24 Sep 2018 15:57:13 +1000
Subject: [PATCH] lustre: remove ni_cptlist field.

This is never used in a meaningful way.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/staging/lustre/include/linux/lnet/lib-types.h | 2 --
 drivers/staging/lustre/lnet/lnet/api-ni.c             | 5 -----
 drivers/staging/lustre/lnet/lnet/config.c             | 1 -
 3 files changed, 8 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 89158b52dc1b..1423aea83747 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -309,8 +309,6 @@ struct lnet_ni {
 	spinlock_t		ni_lock;
 	/* chain on the lnet_net structure */
 	struct list_head	ni_netlist;
-	/* chain on net_ni_cpt */
-	struct list_head	ni_cptlist;
 
 	/* number of CPTs */
 	int			ni_ncpts;
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 69ea18ce2dcc..20fa3fea04b9 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -1115,11 +1115,6 @@ lnet_ni_tq_credits(struct lnet_ni *ni)
 static void
 lnet_ni_unlink_locked(struct lnet_ni *ni)
 {
-	if (!list_empty(&ni->ni_cptlist)) {
-		list_del_init(&ni->ni_cptlist);
-		lnet_ni_decref_locked(ni, 0);
-	}
-
 	/* move it to zombie list and nobody can find it anymore */
 	LASSERT(!list_empty(&ni->ni_netlist));
 	list_move(&ni->ni_netlist, &ni->ni_net->net_ni_zombie);
diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c
index 92694b51f223..9539ce07ae05 100644
--- a/drivers/staging/lustre/lnet/lnet/config.c
+++ b/drivers/staging/lustre/lnet/lnet/config.c
@@ -438,7 +438,6 @@ lnet_ni_alloc(struct lnet_net *net, struct cfs_expr_list *el, char *iface)
 	}
 
 	spin_lock_init(&ni->ni_lock);
-	INIT_LIST_HEAD(&ni->ni_cptlist);
 	INIT_LIST_HEAD(&ni->ni_netlist);
 	ni->ni_refs = cfs_percpt_alloc(lnet_cpt_table(),
 				       sizeof(*ni->ni_refs[0]));
-- 
2.14.0.rc0.dirty


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre
  2018-09-10 23:10 ` [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre James Simmons
@ 2018-09-24  6:58   ` NeilBrown
  2018-09-29 22:35     ` James Simmons
  0 siblings, 1 reply; 98+ messages in thread
From: NeilBrown @ 2018-09-24  6:58 UTC (permalink / raw)
  To: lustre-devel

On Tue, Sep 11 2018, James Simmons wrote:

>> The following series implements the first patch in the
>> multi-rail series:
>> Commit: 8cbb8cd3e771 ("LU-7734 lnet: Multi-Rail local NI split")
>> 
>> I split that commit up into 40 individual commits which can be found
>> at
>>   https://github.com/neilbrown/lustre/commits/multirail
>> though you need to scroll down a bit, as that contains all the
>> multi-rail series.
>> 
>> I then ported most of these patches to my mainline tree.
>> Some that I haven't included are:
>> - lnet: Move lnet_msg_alloc/free down a bit.
>>     lnet_msg_alloc/free don't exist any more
>> - lnet: lib-types: change some tabs to spaces
>> - lnet - assorted whitespace changes.
>> - lnet: change ni_last_alive from time64_t to long
>> - lnet: add lnet_net_state
>>     net_state is never used.
>> - lnet: remove 'static' from lnet_get_net_config()
>> 
>> I've also made a couple of minor changes to individual patches not
>> strictly related to porting (the net_prio field is never used, so I
>> never added it - I should have made that a separate patch).
>>
>> This series compiles, but doesn't work.  I get a NULL pointer
>> reference, then an assertion failure.  If I fix those, it hangs.
>> The NULL pointer ref and the failing assertion are gone with
>> later patches, so I hope the other problems are too.
>> 
>
> I have tried it and compared it to what landed in the OpenSFS branch.
> I saw the failures in my testing and found the mistake in the 7th patch.
>
>> Some of these patches have very poor descriptions, such as "I have no
>> idea what this does".  If someone would like to explain - or maybe say
>> "Oh, we really shouldn't have done that", I'd be very happy to
>> receive that, and update the description or patch accordingly.
>
> When I ran checkpatch it really dislikes:
>
> This is part of
>     8cbb8cd3e771e7f7e0f99cafc19fad32770dc015
>        LU-7734 lnet: Multi-Rail local NI split
>
> I don't recommend landing the above in the commit message, because
> a person outside of lustre will not know where to look for
> that git commit. Instead I recommend replacing it with:
>
> ------------------------------------------------------------------
> Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
> WC-bug-id: https://jira.whamcloud.com/browse/LU-7734
> Reviewed-on: http://review.whamcloud.com/18274
> Reviewed-by: Doug Oucharek <dougso@me.com>
> Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
> Signed-off-by: NeilBrown <neilb@suse.com>

Thanks for the suggestion.  I don't like that approach exactly because
it seems to be a lie.  The specific patch was not reviewed by those
people, and there is useful information which is not included there.
I have changed the patches to include:

    This is part of
        Commit: 8cbb8cd3e771 ("LU-7734 lnet: Multi-Rail local NI split")
    from upstream lustre, where it is marked:
        Signed-off-by: Amir Shehata <amir.shehata@intel.com>
        WC-bug-id: https://jira.whamcloud.com/browse/LU-7734
        Reviewed-on: http://review.whamcloud.com/18274
        Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
        Reviewed-by: Olaf Weber <olaf@sgi.com>

checkpatch is not happy with the indented tags, but checkpatch is a
servant, not the master.

I've made that change, moved this series to my 'lustre' branch, merged
in the latest -rc, and pushed it all out.

Now to post the rest of the 'MR' series, and then start looking at
Dynamic Discovery (756abb9cf00b9^..1c45d9051764)

Thanks,
NeilBrown

>
> This gives the reviewer a URL link for both the JIRA ticket that usually
> contains details not in the commit message as well as the gerrit URL
> for the original patch. This way if a future bug is found a comparison
> can be done against the original patch. 
>  
> The policy for the Lustre project is to preserve authorship for patches
> when porting to other branches, upstream or LTS.
>
>> These will all appear in my lustre-testing branch, but won't migrate
>> to 'lustre' until I, at least, have enough other patches that I can
>> get a successful test run.
>> 
>> Review and comments always welcome.
>> 
>> Thanks,
>> NeilBrown
>> 
>> 
>> ---
>> 
>> Amir Shehata (1):
>>       Completely re-write lnet_parse_networks().
>> 
>> NeilBrown (33):
>>       struct lnet_ni - reformat comments.
>>       lnet: Create struct lnet_net
>>       lnet: struct lnet_ni: move ni_lnd to lnet_net
>>       lnet: embed lnd_tunables in lnet_ni
>>       lnet: begin separating "networks" from "network interfaces".
>>       lnet: store separate xmit/recv net-interface in each message.
>>       lnet: change lnet_peer to reference the net, rather than ni.
>>       lnet: add cpt to lnet_match_info.
>>       lnet: add list of cpts to lnet_net.
>>       lnet: add ni arg to lnet_cpt_of_nid()
>>       lnet: pass tun to lnet_startup_lndni, instead of full conf
>>       lnet: split lnet_startup_lndni
>>       lnet: reverse order of lnet_startup_lnd{net,ni}
>>       lnet: rename lnet_find_net_locked to lnet_find_rnet_locked
>>       lnet: extend zombie handling to nets and nis
>>       lnet: lnet_shutdown_lndnets - remove some cleanup code.
>>       lnet: move lnet_shutdown_lndnets down to after first use
>>       lnet: add ni_state
>>       lnet: simplify lnet_islocalnet()
>>       lnet: discard ni_cpt_list
>>       lnet: add net_ni_added
>>       lnet: don't take reference in lnet_XX2ni_locked()
>>       lnet: don't need lock to test ln_shutdown.
>>       lnet: don't take lock over lnet_net_unique()
>>       lnet: swap 'then' and 'else' branches in lnet_startup_lndnet
>>       lnet: only valid lnd_type when net_id is unique.
>>       lnet: make it possible to add a new interface to a network
>>       lnet: add checks to ensure network interface names are unique.
>>       lnet: track tunables in lnet_startup_lndnet()
>>       lnet: fix typo
>>       lnet: lnet_dyn_add_ni: fix ping_info count
>>       lnet: lnet_dyn_del_ni: fix ping_info count
>>       lnet: introduce use_tcp_bonding mod param
>> 
>> 
>>  .../staging/lustre/include/linux/lnet/lib-lnet.h   |   31 -
>>  .../staging/lustre/include/linux/lnet/lib-types.h  |  142 ++-
>>  .../lustre/include/uapi/linux/lnet/lnet-dlc.h      |   18 
>>  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    |   10 
>>  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h    |    6 
>>  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |   12 
>>  .../lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c  |   74 +-
>>  .../staging/lustre/lnet/klnds/socklnd/socklnd.c    |   25 -
>>  drivers/staging/lustre/lnet/lnet/acceptor.c        |    8 
>>  drivers/staging/lustre/lnet/lnet/api-ni.c          |  939 +++++++++++++-------
>>  drivers/staging/lustre/lnet/lnet/config.c          |  688 +++++++++++----
>>  drivers/staging/lustre/lnet/lnet/lib-move.c        |  132 ++-
>>  drivers/staging/lustre/lnet/lnet/lib-ptl.c         |    6 
>>  drivers/staging/lustre/lnet/lnet/lo.c              |    2 
>>  drivers/staging/lustre/lnet/lnet/net_fault.c       |    3 
>>  drivers/staging/lustre/lnet/lnet/peer.c            |   31 -
>>  drivers/staging/lustre/lnet/lnet/router.c          |   51 +
>>  drivers/staging/lustre/lnet/lnet/router_proc.c     |   24 -
>>  drivers/staging/lustre/lnet/selftest/brw_test.c    |    2 
>>  drivers/staging/lustre/lnet/selftest/framework.c   |    3 
>>  drivers/staging/lustre/lnet/selftest/selftest.h    |    2 
>>  21 files changed, 1507 insertions(+), 702 deletions(-)
>> 
>> --
>> Signature
>> 
>> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre
  2018-09-24  6:58   ` NeilBrown
@ 2018-09-29 22:35     ` James Simmons
  0 siblings, 0 replies; 98+ messages in thread
From: James Simmons @ 2018-09-29 22:35 UTC (permalink / raw)
  To: lustre-devel


> > Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
> > WC-bug-id: https://jira.whamcloud.com/browse/LU-7734
> > Reviewed-on: http://review.whamcloud.com/18274
> > Reviewed-by: Doug Oucharek <dougso@me.com>
> > Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
> > Signed-off-by: NeilBrown <neilb@suse.com>
> 
> Thanks for the suggestion.  I don't like that approach exactly because
> it seems to be a lie.  The specific patch was not reviewed by those
> people, and there is useful information which is not included there.
> I have changed the patches to include:
> 
>     This is part of
>         Commit: 8cbb8cd3e771 ("LU-7734 lnet: Multi-Rail local NI split")
>     from upstream lustre, where it is marked:
>         Signed-off-by: Amir Shehata <amir.shehata@intel.com>
>         WC-bug-id: https://jira.whamcloud.com/browse/LU-7734
>         Reviewed-on: http://review.whamcloud.com/18274
>         Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
>         Reviewed-by: Olaf Weber <olaf@sgi.com>
> 
> checkpatch is not happy with the indented tags, but checkpatch is a
> servant, not the master.

To my knowledge there isn't really a policy about this. What I have been
doing is kind of following how LTS versions of lustre have been handled. 
For LTS versions patches are cherry-picked and two additional lines are
added:

Lustre-change:
Lustre-commit:

The original reviews are kept. Also, by including the original reviews the
people involved with those patches are poked. The only requirement is that
2 people review again. Not everyone has to review for it to land. Once
landed I don't see a clear way to tell who reviewed.

In any case the above approach seems reasonable as long as the original
author is preserved. The general rule is the original patch poster normally
keeps authorship. Also I noticed patches recently pushed are not reaching
the original authors and reviewers. We should make sure that still 
happens.

^ permalink raw reply	[flat|nested] 98+ messages in thread

end of thread, other threads:[~2018-09-29 22:35 UTC | newest]

Thread overview: 98+ messages
2018-09-07  0:49 [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre NeilBrown
2018-09-07  0:49 ` [lustre-devel] [PATCH 11/34] lnet: pass tun to lnet_startup_lndni, instead of full conf NeilBrown
2018-09-11 18:31   ` Amir Shehata
2018-09-12  4:03     ` NeilBrown
2018-09-12  3:30   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 12/34] lnet: split lnet_startup_lndni NeilBrown
2018-09-12  3:39   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 16/34] lnet: lnet_shutdown_lndnets - remove some cleanup code NeilBrown
2018-09-07  0:49 ` [lustre-devel] [PATCH 18/34] lnet: add ni_state NeilBrown
2018-09-12  3:59   ` Doug Oucharek
2018-09-12  4:25     ` NeilBrown
2018-09-07  0:49 ` [lustre-devel] [PATCH 14/34] lnet: rename lnet_find_net_locked to lnet_find_rnet_locked NeilBrown
2018-09-12  3:40   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 09/34] lnet: add list of cpts to lnet_net NeilBrown
2018-09-10 23:28   ` Doug Oucharek
2018-09-12  2:16     ` NeilBrown
2018-09-11  1:02   ` James Simmons
2018-09-07  0:49 ` [lustre-devel] [PATCH 06/34] lnet: store separate xmit/recv net-interface in each message NeilBrown
2018-09-10 23:24   ` Doug Oucharek
2018-09-10 23:29   ` James Simmons
2018-09-10 23:36   ` James Simmons
2018-09-07  0:49 ` [lustre-devel] [PATCH 03/34] lnet: struct lnet_ni: move ni_lnd to lnet_net NeilBrown
2018-09-10 23:04   ` Doug Oucharek
2018-09-10 23:19     ` James Simmons
2018-09-10 23:19       ` Doug Oucharek
2018-09-10 23:19     ` James Simmons
2018-09-10 23:24   ` James Simmons
2018-09-10 23:25   ` James Simmons
2018-09-07  0:49 ` [lustre-devel] [PATCH 15/34] lnet: extend zombie handling to nets and nis NeilBrown
2018-09-12  3:53   ` Doug Oucharek
2018-09-12  4:10     ` NeilBrown
2018-09-07  0:49 ` [lustre-devel] [PATCH 02/34] lnet: Create struct lnet_net NeilBrown
2018-09-10 22:56   ` Doug Oucharek
2018-09-10 23:23   ` James Simmons
2018-09-07  0:49 ` [lustre-devel] [PATCH 04/34] lnet: embed lnd_tunables in lnet_ni NeilBrown
2018-09-10 23:08   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 05/34] lnet: begin separating "networks" from "network interfaces" NeilBrown
2018-09-10 23:18   ` Doug Oucharek
2018-09-12  2:48     ` NeilBrown
2018-09-10 23:27   ` James Simmons
2018-09-07  0:49 ` [lustre-devel] [PATCH 13/34] lnet: reverse order of lnet_startup_lnd{net, ni} NeilBrown
2018-09-12  3:39   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 10/34] lnet: add ni arg to lnet_cpt_of_nid() NeilBrown
2018-09-10 23:32   ` Doug Oucharek
2018-09-11  1:03   ` James Simmons
2018-09-07  0:49 ` [lustre-devel] [PATCH 01/34] struct lnet_ni - reformat comments NeilBrown
2018-09-10 22:49   ` Doug Oucharek
2018-09-10 23:17   ` James Simmons
2018-09-12  2:44     ` NeilBrown
2018-09-07  0:49 ` [lustre-devel] [PATCH 07/34] lnet: change lnet_peer to reference the net, rather than ni NeilBrown
2018-09-10 23:17   ` James Simmons
2018-09-12  2:56     ` NeilBrown
2018-09-07  0:49 ` [lustre-devel] [PATCH 08/34] lnet: add cpt to lnet_match_info NeilBrown
2018-09-10 23:25   ` Doug Oucharek
2018-09-11  1:01   ` James Simmons
2018-09-11  1:01   ` [lustre-devel] BRe: " James Simmons
2018-09-07  0:49 ` [lustre-devel] [PATCH 17/34] lnet: move lnet_shutdown_lndnets down to after first use NeilBrown
2018-09-12  3:55   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 20/34] lnet: discard ni_cpt_list NeilBrown
2018-09-12  4:07   ` Doug Oucharek
2018-09-12  5:48     ` NeilBrown
2018-09-13 19:33       ` Amir Shehata
2018-09-24  6:03         ` NeilBrown
2018-09-12 16:29   ` Amir Shehata
2018-09-07  0:49 ` [lustre-devel] [PATCH 34/34] lnet: introduce use_tcp_bonding mod param NeilBrown
2018-09-12  4:54   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 28/34] lnet: add checks to ensure network interface names are unique NeilBrown
2018-09-12  4:39   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 27/34] lnet: make it possible to add a new interface to a network NeilBrown
2018-09-12  4:38   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 30/34] lnet: fix typo NeilBrown
2018-09-12  4:47   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 33/34] Completely re-write lnet_parse_networks() NeilBrown
2018-09-12  4:54   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 32/34] lnet: lnet_dyn_del_ni: fix ping_info count NeilBrown
2018-09-12  4:49   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 22/34] lnet: don't take reference in lnet_XX2ni_locked() NeilBrown
2018-09-12  4:18   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 29/34] lnet: track tunables in lnet_startup_lndnet() NeilBrown
2018-09-12  4:47   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 23/34] lnet: don't need lock to test ln_shutdown NeilBrown
2018-09-12  4:27   ` Doug Oucharek
2018-09-12  5:54     ` NeilBrown
2018-09-07  0:49 ` [lustre-devel] [PATCH 26/34] lnet: only valid lnd_type when net_id is unique NeilBrown
2018-09-12  4:34   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 31/34] lnet: lnet_dyn_add_ni: fix ping_info count NeilBrown
2018-09-12  4:48   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 19/34] lnet: simplify lnet_islocalnet() NeilBrown
2018-09-12  4:02   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 25/34] lnet: swap 'then' and 'else' branches in lnet_startup_lndnet NeilBrown
2018-09-12  4:32   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 24/34] lnet: don't take lock over lnet_net_unique() NeilBrown
2018-09-12  4:29   ` Doug Oucharek
2018-09-07  0:49 ` [lustre-devel] [PATCH 21/34] lnet: add net_ni_added NeilBrown
2018-09-12  4:15   ` Doug Oucharek
2018-09-10 23:10 ` [lustre-devel] [PATCH 00/34] Beginning of multi-rail support for drivers/staging/lustre James Simmons
2018-09-24  6:58   ` NeilBrown
2018-09-29 22:35     ` James Simmons
